Scalable Bioinformatics Boot Camp
Date: May 22nd, 2014, 9 am - 5 pm; check-in starts at 8:45 am.
This is a free introductory event with limited space available. Registration is required. Lunch and refreshments will be provided.
Agenda: The agenda is here.
Evaluation Form: The evaluation form is here.
Registration: This event is full. The next boot camp will be announced soon.
Location: The Synthesis Center E-B143 is located just off the lobby of SDSC’s east entrance off Hopkins Drive. Directions to SDSC
In the Big Data era, scalability is becoming a prerequisite for bioinformatics applications that must process large-scale datasets efficiently. This boot camp will explain how you can turn your bioinformatics applications into scalable workflows by analyzing the available options, techniques, and tools.
- Learn about distributed platforms and systems
- Learn about Cloud and Big Data
- Learn about scalable workflow tools
- Learn how to make your science reproducible
- Gain hands-on experience with bioKepler tools to build scalable bioinformatics workflows
About the day: The day will start with a crash course on workflow technology and a hands-on session on using the locally developed, open source Kepler workflow system. We will then explore common computing platforms, including Sun Grid Engine, NSF XSEDE high performance computing resources, the Amazon Cloud, and Hadoop. We will explain how workflow systems can help with rapid development of distributed and parallel applications on top of any of these platforms. We will then discuss how to track data flow and process executions within these workflows (i.e., provenance tracking), including intermediate results, as a way to make workflow results reproducible. We will end with a session on using bioKepler to learn how to build and share scalable bioinformatics workflows in Kepler. Each section of the course will end with a lab session that applies the explained concepts to real application case studies.
Who should attend? Graduate students and researchers who are responsible for building bioinformatics and computational biology workflows, who are evaluating workflow systems as a means to conduct reproducible research, or who are curious to learn more about what workflows can help with are welcome to attend. Although the hands-on examples are in the bioinformatics domain, the content on workflow-driven reproducible science is designed to benefit a larger multidisciplinary audience.
Organizer: Workflows for Data Science (WorDS) Center of Excellence, SDSC and National Biomedical Computation Resource (NBCR)