Distributed Data Parallel PFAM Annotation Workflow

Short Description: 

This workflow performs functional annotation of protein sequences by running the HMMER 3.0 program against the PFAM database. 

Workflow Inputs: 

Protein sequences in FASTA format

User-alterable parameters: 

-E: E-value cutoff for prediction (default value = 0.001)

Reference information (not alterable by users): 

PFAM database 24.0
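
As a rough illustration of how the -E cutoff and the PFAM database fit together, the sketch below builds a command line for one HMMER run. It assumes the workflow calls the hmmscan program from HMMER 3.0 and writes hits with its --tblout option; the source only says "HMMER 3.0", so the choice of hmmscan is an assumption. The function name and the file names (proteins.fasta, Pfam-A.hmm, hits.tbl) are hypothetical. Nothing is executed; the command is only assembled and printed.

```python
import shlex

# Default cutoff taken from the parameter list above.
DEFAULT_E_VALUE = 0.001


def build_hmmscan_command(fasta_path, pfam_hmm_path, tblout_path,
                          e_value=DEFAULT_E_VALUE):
    """Return the argument list for one hmmscan run (nothing is executed).

    Assumption: the workflow uses hmmscan; all paths are hypothetical.
    """
    if e_value <= 0:
        raise ValueError("E-value cutoff must be positive")
    return [
        "hmmscan",
        "-E", str(e_value),        # user-alterable E-value cutoff
        "--tblout", tblout_path,   # per-sequence hit table
        pfam_hmm_path,             # profile database (first positional argument)
        fasta_path,                # protein sequences in FASTA format
    ]


command = build_hmmscan_command("proteins.fasta", "Pfam-A.hmm", "hits.tbl")
print(shlex.join(command))
```

The list form can be passed directly to subprocess.run if the command should actually be launched.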

Output: 

output.1: Table of HMMER hits
output.2: Table of GO mapping
output.3: Table of EC mapping

Validation and Test Plan: 

1). Select an input file (protein sequences in FASTA format).

2). Select an appropriate E-value cutoff for HMMER.

3). Run the workflow and check that the results are in the expected output format.
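
Before step 1, it may help to confirm that the selected file really looks like protein FASTA. The following is a minimal sketch of such a check, not part of the workflow itself. The function name check_protein_fasta and the accepted alphabet (the 20 standard amino acids plus B, Z, X, U, O, "*" and "-") are assumptions made for illustration.

```python
# Assumed alphabet: standard amino acids plus common ambiguity/stop/gap symbols.
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBZXUO*-")


def check_protein_fasta(text):
    """Return a list of problems found; an empty list means the input passed."""
    problems = []
    seen_header = False
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            seen_header = True
            if len(line) == 1:
                problems.append(f"line {number}: empty header")
        elif not seen_header:
            problems.append(f"line {number}: sequence data before first '>' header")
        else:
            bad = set(line.upper()) - PROTEIN_LETTERS
            if bad:
                problems.append(f"line {number}: unexpected characters {sorted(bad)}")
    if not seen_header:
        problems.append("no '>' header found")
    return problems


print(check_protein_fasta(">seq1 example\nMKVLAAGIVGL\n"))
```

A nucleotide file would pass this check because A, C, G and T are also valid amino acid letters, so the check only catches clearly malformed input.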

Parallelization Opportunities and Potential Future Extensions: 

Yes. The workflow needs parallelization to reduce run time. The input sequences can be split into smaller pieces, and HMMER can then be run on those pieces concurrently across multiple nodes.
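
The splitting step can be sketched as follows. This is one possible approach under stated assumptions, not the workflow's actual implementation: records are parsed from FASTA text and dealt round-robin into a fixed number of chunks, one per node. The function names read_fasta and split_records are hypothetical, and how each chunk is shipped to a node is left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)  # emit the last record


def split_records(records, n_chunks):
    """Distribute records round-robin over at most n_chunks non-empty lists."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


example = ">a\nMKV\n>b\nGGA\nLLT\n>c\nPQR\n".splitlines()
for number, chunk in enumerate(split_records(read_fasta(example), 2), start=1):
    print(number, [header for header, _ in chunk])
```

Round-robin assignment keeps the number of sequences per chunk nearly equal; balancing by total sequence length instead may give more even run times when sequence lengths vary widely.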