rRNA Taxonomy Binning

Short Description: 

The rRNA Taxonomy Binning workflow takes rRNA sequences as input and runs the RDP rRNA Classifier and/or BLASTN against one of the following rRNA databases: Greengenes, RDP (Bacteria & Archaea), RDP Bacteria, RDP Archaea, Silva SSURef, and Silva LSURef.

Workflow Inputs: 

rRNA reads after cleaning

User alterable parameters: 

1). select the RDP classifier, blastn, or both methods
2). E-value cutoff for blastn (default < 1e-5)
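
The two parameters above could be turned into workflow steps roughly as follows. This is a minimal sketch, not the workflow's actual implementation: the function names, the method labels ("rdp", "blastn", "both"), and the file paths are hypothetical. It assumes NCBI BLAST+ `blastn` is installed and that the FASTA reference has already been indexed with `makeblastdb`.

```python
from typing import List

# Hypothetical labels for the user-selectable methods.
VALID_METHODS = {"rdp", "blastn", "both"}


def select_steps(method: str) -> List[str]:
    """Map the user's method choice to the list of steps to run."""
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {sorted(VALID_METHODS)}")
    return ["rdp", "blastn"] if method == "both" else [method]


def blastn_command(query: str, db: str, out: str, evalue: float = 1e-5) -> List[str]:
    """Build a blastn argument list with an E-value cutoff and tabular output."""
    if evalue <= 0:
        raise ValueError("evalue must be positive")
    return [
        "blastn",
        "-query", query,         # cleaned rRNA reads (FASTA)
        "-db", db,               # assumed: database built with makeblastdb
        "-evalue", str(evalue),  # E-value cutoff, default 1e-5
        "-outfmt", "6",          # tab-separated hit table
        "-out", out,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with file names.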

Reference information (not alterable by users): 

The reference database is one of the following rRNA reference databases, in FASTA format: Greengenes, RDP (Bacteria & Archaea), RDP Bacteria, RDP Archaea, Silva SSURef, and Silva LSURef

Output: 

1). RDP Classifier output
2). blastn results with the full taxonomy path
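
One way the taxonomy path could be attached to blastn hits is sketched below. It assumes the hits are in the default 12-column tabular format (`-outfmt 6`) and that a mapping from reference sequence ID to taxonomy path is available. The `taxonomy` dictionary, the function name, and the "unclassified" fallback are hypothetical; the actual workflow may derive the path differently.

```python
from typing import Dict, Iterable, Iterator, Tuple

Hit = Tuple[str, str, float, str]  # (query ID, subject ID, E-value, taxonomy path)


def annotate_hits(
    lines: Iterable[str],
    taxonomy: Dict[str, str],
    max_evalue: float = 1e-5,
) -> Iterator[Hit]:
    """Yield hits below the E-value cutoff, each with its taxonomy path."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        query_id, subject_id = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 of the default table is the E-value
        if evalue >= max_evalue:
            continue  # keep only hits with E-value < cutoff
        # Hypothetical fallback label when the reference ID has no known path.
        yield query_id, subject_id, evalue, taxonomy.get(subject_id, "unclassified")
```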

Validation and Test Plan: 

1). select only RDP classifier

2). select only blastn

3). select both

4). check that the output format of the results is correct

Parallelization Opportunities and Potential Future Extensions: 

Yes, parallelization is needed to reduce the run time. Because the workflow has a blastn option, both query and reference partitioning are necessary. However, the current workflow runs on only one node; an updated version of the BLAST bioActor is needed.
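
Query partitioning, as mentioned above, could look roughly like the following. This is a minimal sketch under stated assumptions: the helper names are hypothetical, records are distributed round-robin, and each resulting part would be written to its own FASTA file and searched independently. It does not reflect how the BLAST bioActor would implement partitioning.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA text into (header, sequence) records."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(seq)


def partition_records(records: Iterable[Record], n_parts: int) -> List[List[Record]]:
    """Distribute records round-robin into n_parts lists."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts
```

Query partitions can be searched independently and their hit tables concatenated. Reference partitioning needs more care, because BLAST E-values depend on the size of the database searched; the blastn `-dbsize` option is one way to keep E-values comparable across reference partitions.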