Scientific Computing Workflows Boot Camp

Tools and Techniques for Scientific Computing and Data Science Workflows

Date: February 25-26, 2015, 9 am - 5 pm; check-in starts at 8:45 am.

Registration is required. Lunch and refreshments will be provided.

Registration: $1095. A discounted educational rate of $650 is available to graduate students, postdoctoral fellows, faculty, and university staff. Enter the educational discount code “EDU” along with your academic organization, title, and email when registering.

Location: The Synthesis Center, E-B143, is located just off the lobby of SDSC’s east entrance, off Hopkins Drive. Directions to SDSC: http://www.sdsc.edu/about/Visitorinfo.html.

Agenda: The agenda is here.

Evaluation: The evaluation form is here.

Are you interested in learning new computational tools and techniques for your scientific computing and data science applications?

Interested in saving time, optimizing and scaling up your computations to produce results faster, or communicating your research results more effectively?

In the Big Data era, valuable information often gets buried in voluminous amounts of data. Scalability is becoming a prerequisite for applications that must efficiently process large-scale datasets. This is where scientific workflows (software applications composed of computational steps and data tools that scale up to run on high-performance computers, distributed environments, or commercial cloud systems) can make the critical difference. Workflows give you confidence in the accuracy of your results. They are science accelerators because they reduce the time to those results.

The Workflows for Data Science (WorDS) Center of Excellence, based at the San Diego Supercomputer Center at the University of California, San Diego, is dedicated to solving practical scientific problems through the adoption of scientific workflows. WorDS will help you focus on scientific questions and the end-to-end process, from data generation to journal publication or preparing for clinical trials. 

This boot camp will explain how you can turn your scientific computing applications into scalable workflows by analyzing available options, techniques and tools. The highly acclaimed WorDS Boot Camps are two-day workshops that focus on teaching methodologies to create efficient, scientifically rigorous, scalable workflow applications. Participants will also learn about Kepler, a comprehensive environment of reusable and extensible components to support distributed analysis of large-scale data. In particular, you will:

  • Learn about distributed platforms and systems
  • Learn about Cloud and Big Data
  • Learn about scalable workflow tools
  • Learn how to make your science reproducible
  • Gain hands-on experience with bioKepler tools to build scalable scientific workflows

WorDS Boot Camps are followed by a one-day ‘Hackathon’ that covers scientific computing, scalable applications, and data science while showing you how to build a customized workflow based on your specific application requirements.

About the boot camp: This two-day accelerated training session will start with a crash course on workflow technology and a hands-on session on using the locally developed open source Kepler workflow system. We will then explore common computing platforms, including Sun Grid Engine, NSF XSEDE high performance computing resources, the Amazon Cloud, and Hadoop. We will explain how workflow systems can help with rapid development of distributed and parallel applications on top of any of these platforms. We will then discuss how to track data flow and process executions within these workflows (i.e., provenance tracking), including the intermediate results, as a way to make workflow results reproducible. We will end with a session on using Kepler on the Amazon Cloud to learn how to build and share scalable scientific workflows in Kepler. Each section of the course will end with a lab session in which you apply the concepts covered to real application case studies.

Who should attend? Researchers and graduate students who are responsible for building computational and data science workflows, who are evaluating workflow systems as a means to conduct reproducible research, or who are curious to learn more about what workflows can help with are welcome to attend. The hands-on examples for this boot camp are selected from a variety of scientific domains, and the content on workflow-driven reproducible science is designed to benefit a broad multidisciplinary audience.

Organizer: Workflows for Data Science (WorDS) Center of Excellence, SDSC and National Biomedical Computation Resource (NBCR), UCSD