Distributed Data Parallel PFAM Annotation Workflow

Short Description:

This workflow performs functional annotation of protein sequences by running the HMMER 3.0 program against the PFAM database.

Workflow Inputs:

Protein sequences in FASTA format

User alterable parameters:

-E: E-value cutoff for prediction (default value = 0.001)

Reference information (not alterable by users):

PFAM database 24.0
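
As a rough illustration of how the E-value parameter and the reference database might be combined into one HMMER call, the sketch below builds a command line in Python. It assumes the workflow uses the hmmscan program from HMMER 3.0 (the description above names only "HMMER 3.0") and that the PFAM models are stored in a file such as Pfam-A.hmm that has already been prepared with hmmpress. The function name and all file names are hypothetical.

```python
def build_hmmscan_command(fasta_path, pfam_hmm_path, table_path, evalue=0.001):
    """Return an argument list for one hmmscan run (assumed program choice).

    evalue mirrors the workflow's -E parameter (default 0.001).
    All paths are placeholders supplied by the caller.
    """
    if evalue <= 0:
        raise ValueError("E-value cutoff must be positive")
    return [
        "hmmscan",
        "-E", str(evalue),        # per-sequence E-value reporting threshold
        "--tblout", table_path,   # tabular per-sequence hit output
        pfam_hmm_path,            # HMM database (assumed: Pfam-A.hmm)
        fasta_path,               # protein sequences in FASTA format
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without HMMER installed.
    print(" ".join(build_hmmscan_command("input.fasta", "Pfam-A.hmm", "hits.tbl")))
```

The returned list can be passed to subprocess.run on a machine where HMMER is installed. How the GO and EC mapping tables are derived from the hits is not covered by this sketch.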

Output:

output.1: Table of HMMER hits
output.2: Table of GO mapping
output.3: Table of EC mapping

Validation and Test Plan:

1). Select the input file (protein sequences in FASTA format).

2). Select an appropriate E-value cutoff for HMMER.

3). Check that the results are in the expected output format.

Parallelization Opportunities and Potential Future Extensions:

Yes. The workflow benefits from parallelization to reduce run time. The input sequences can be split into smaller chunks, and HMMER can then be run on each chunk on a separate node, so that multiple nodes work at the same time.
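
The splitting step could look roughly like the following Python sketch. It is a minimal illustration, not the workflow's actual implementation: the function names and the chunk size are hypothetical, and distributing the chunks to nodes and merging the per-chunk result tables are left out.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_records(records, chunk_size):
    """Group records into lists holding at most chunk_size records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def write_chunk(records, handle):
    """Write one chunk back out in FASTA format."""
    for header, sequence in records:
        handle.write(f">{header}\n{sequence}\n")


if __name__ == "__main__":
    example = [">seq1", "MKVLA", "GGHT", ">seq2", "PPQRS", ">seq3", "AAWY"]
    for index, chunk in enumerate(split_records(read_fasta(example), 2)):
        print(index, [header for header, _ in chunk])
```

Each chunk would be written to its own file with write_chunk and passed to a separate HMMER run. Because each sequence is searched independently of the other query sequences, the per-chunk hit tables can then be concatenated into a single table.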