Scalable Bioinformatics Boot Camp

Tools and Techniques for Bioinformatics, Biomedical Analysis, and Computational Biology

Date: October 2-3, 2014, 9 am - 5 pm; check-in starts at 8:45 am.

Registration is required. Lunch and refreshments will be provided.

Registration: $1095. A discounted educational rate of $650 is available to graduate students, postdoctoral fellows, faculty, and university staff. Enter the educational discount code “EDU” along with your academic organization, title, and email when registering.

Location: The Synthesis Center E-B143 is located just off the lobby of SDSC’s east entrance off Hopkins Drive. Directions to SDSC: http://www.sdsc.edu/about/Visitorinfo.html.

Agenda: The agenda is here.

In the Big Data era, scalability is becoming a prerequisite for bioinformatics applications that must process large-scale datasets efficiently. This boot camp will explain how you can turn your bioinformatics applications into scalable workflows by analyzing the available options, techniques, and tools.

Learn about distributed platforms and systems
Learn about Cloud and Big Data
Learn about scalable workflow tools
Learn how to make your science reproducible
Gain hands-on experience with bioKepler tools to build scalable bioinformatics workflows

About the boot camp: This two-day accelerated training session will start with a crash course on workflow technology and a hands-on session on using the locally developed open source Kepler workflow system. We will then explore common computing platforms including Sun Grid Engine, NSF XSEDE high performance computing resources, the Amazon Cloud, and Hadoop. We will explain how workflow systems can help with rapid development of distributed and parallel applications on top of any of these platforms. We will then discuss how to track data flow and process executions within these workflows (i.e., provenance tracking), including the intermediate results, as a way to make workflow results reproducible. We will end with a session on using bioKepler to build and share scalable bioinformatics workflows in Kepler. Lab sessions at the end of each section of the course will apply the explained concepts to real application case studies.

Who should attend? Graduate students and researchers who are responsible for building bioinformatics and computational biology workflows, who are evaluating workflow systems as a means to conduct reproducible research, or who are curious to learn more about what workflows can help with are welcome to attend. Although the hands-on examples are in the bioinformatics domain, the content on workflow-driven reproducible science is designed to benefit a larger multidisciplinary audience.

Organizer: Workflows for Data Science (WorDS) Center of Excellence, SDSC and National Biomedical Computation Resource (NBCR)