WorDS of Data Science beginning with K

The WorDS Center is actively compiling a dictionary of terms related to big data science. A new word is released every week.


Kepler

Kepler is a software application for creating scientific workflows. Previously, scientists were forced to use a range of applications that each offered different benefits when working with data. To make scientific workflows more efficient, Kepler was created to bring the benefits of these other programs together in a single, easy-to-use application. Kepler provides a graphical interface that lets users visualise the data flow of their model and publish or share a much clearer representation of the workflow with colleagues. This graphical interface also makes Kepler more user-friendly and easier to learn. Kepler also provides access to a range of data management middleware and repositories, and to a large library of data processing and analysis tools that can be imported directly into workflows as steps.

Kepler's interface is composed of 'Actors': nodes that data passes between, each performing a specific task. Actors fall into three categories: data sources, data analysis or processing tools, and outputs. To direct the flow of data, users drag and drop data lines between Actors' input and output ports. Kepler is high level and provides abstraction: complex workflows can be built from simpler sub-workflows that are hidden behind single Actors. The software also automates low-level data processing tasks, allowing for efficient creation of high-level workflows. External modules can be imported to customise or extend the program's functionality. When a large amount of computational power is required, Kepler works with a variety of heterogeneous computing platforms and technologies for native parallel processing, and optimises the scheduling of computing tasks across these platforms.
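
The Actor idea can be illustrated with a small conceptual sketch in Python. This is not Kepler's API: the names Actor, CompositeActor, and run_workflow are hypothetical, each Actor is assumed to have a single input and a single output port, and the workflow is assumed to be a simple linear chain. It shows a data source, a processing step built from a hidden sub-workflow, and an output.

```python
from typing import Callable, List


class Actor:
    """A workflow node: takes a list of tokens on its input port and
    returns a list of tokens on its output port (hypothetical model)."""

    def __init__(self, name: str, task: Callable[[list], list]):
        self.name = name
        self.task = task

    def fire(self, tokens: list) -> list:
        return self.task(tokens)


class CompositeActor(Actor):
    """Hides a linear sub-workflow behind a single Actor."""

    def __init__(self, name: str, inner: List[Actor]):
        self.inner = inner
        super().__init__(name, self._run_inner)

    def _run_inner(self, tokens: list) -> list:
        # Pass the tokens through each inner Actor in order.
        for actor in self.inner:
            tokens = actor.fire(tokens)
        return tokens


def run_workflow(actors: List[Actor]) -> list:
    """Connect each Actor's output port to the next Actor's input port."""
    tokens: list = []
    for actor in actors:
        tokens = actor.fire(tokens)
    return tokens


# Data source: ignores its input and emits fixed readings.
source = Actor("readings", lambda _: [1, 2, 3, 4])

# Processing: two simple steps wrapped in one composite Actor.
double = Actor("double", lambda xs: [x * 2 for x in xs])
keep_large = Actor("keep_large", lambda xs: [x for x in xs if x > 4])
clean = CompositeActor("clean", [double, keep_large])

# Output: stores whatever arrives and passes it through unchanged.
collected: list = []


def store(xs: list) -> list:
    collected.extend(xs)
    return xs


sink = Actor("store", store)

result = run_workflow([source, clean, sink])
```

From the outside, the composite Actor looks like any other single step, which is the kind of abstraction the paragraph above describes. A real workflow system would also need to handle branching, multiple ports, and scheduling, all of which this sketch leaves out.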