WorDS of Data Science beginning with C
- Cloud Computing
Cloud computing follows the “renting rather than buying” principle for computing resources. Using virtualization technology, it provides on-demand computing resources such as servers, storage, databases and applications over the Internet. Clouds eliminate the need for up-front capital investment, and for the time and effort of acquiring and maintaining expensive computing infrastructure. The cloud delivers computing resources as a metered service, i.e., you pay only for what you use. This makes clouds well suited to deploying small-scale (often experimental) applications and then scaling them up as workload requirements grow.
The features of Cloud computing include:
Elasticity: Clouds scale resources up and down rapidly as the workload changes, provisioning and releasing them automatically.
On demand: Users get the resources they request immediately; unlike on most traditional computing clusters, their requests do not wait in a queue. Although requested resources are ultimately limited by the provider's total physical capacity, in practice users can treat them as unlimited.
Virtualization: To meet varied user requirements for CPU, memory, operating system, software, etc., Cloud resources are provisioned on top of a virtualization layer rather than on bare physical machines. Virtualization lets a Cloud provider supply the required resources quickly.
Service oriented: Most Cloud computing software follows a service-oriented architecture (SOA), so users and providers communicate through Web services.
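Elasticity is usually implemented by an autoscaling policy that compares the observed load with the provisioned capacity. The sketch below is a minimal, hypothetical threshold-based policy: the class name, the per-instance capacity and the 70% target utilization are illustrative assumptions, not any Cloud provider's API.

```python
import math
from dataclasses import dataclass


@dataclass
class ThresholdAutoScaler:
    """Hypothetical autoscaling policy; not the API of any real Cloud provider."""
    capacity_per_instance: float      # load one instance can handle, e.g. requests/s (assumed)
    min_instances: int = 1
    max_instances: int = 100          # stands in for the provider's capacity limit (assumed)
    target_utilization: float = 0.7   # keep each instance at most ~70% busy (assumed)

    def desired_instances(self, load: float) -> int:
        # Smallest instance count whose utilization stays at or below the target.
        needed = math.ceil(load / (self.capacity_per_instance * self.target_utilization))
        # Never drop below the floor or exceed the ceiling.
        return max(self.min_instances, min(self.max_instances, needed))


def simulate(scaler: ThresholdAutoScaler, loads: list) -> list:
    """Replay a load trace and record how many instances are provisioned or released."""
    current = scaler.min_instances
    history = []
    for load in loads:
        desired = scaler.desired_instances(load)
        if desired > current:
            action = f"provision {desired - current}"
        elif desired < current:
            action = f"release {current - desired}"
        else:
            action = "no change"
        current = desired
        history.append((load, current, action))
    return history


if __name__ == "__main__":
    scaler = ThresholdAutoScaler(capacity_per_instance=100)
    # Instance counts over this trace: 1, 6, 18, 5, 1
    for load, instances, action in simulate(scaler, [50, 400, 1200, 300, 20]):
        print(f"load={load:>5} instances={instances:>3} ({action})")
```

Production autoscalers typically add cooldown periods and smoothing of the load signal; without them, a noisy workload would cause instances to be provisioned and released over and over.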
Cloud computing provides services that can be classified under three main service models:
Software as a service (SaaS): SaaS refers to software products that use Cloud resources to provide better scalability and availability. SaaS is the easiest model to use but also offers the least flexibility. It lets users access software applications that a service provider makes available in ready-to-use form. The service provider manages the entire technology stack, including servers, storage, networking, virtualization, security, OS, middleware, data and applications. End users access the application through a web browser or a client, which eliminates the need to download, install and run it locally.
Platform as a service (PaaS): PaaS restricts how Cloud resources can be used by exposing only the pre-defined services of each PaaS architecture. Some management tasks, such as operating system updates and scalability management, are taken care of by the service provider. Users, who are generally developers, follow the architecture to build and deploy their own components on the Cloud resources. The service provider manages servers, storage, networking, virtualization, OS, middleware and runtime.
Infrastructure as a service (IaaS): IaaS provides virtual instances over which users have full control; users can log in to them as if they were their own servers. Users have the most flexibility in how to use the virtual instances, but also the greatest management responsibility. This model provides virtualized computing infrastructure on which users can build scalable and cost-efficient PaaS or SaaS solutions. The service provider manages servers, storage, networking and virtualization, sparing users the cost, complexity, time and effort of acquiring and managing hardware.
Besides these three categories, there are other XaaS terms, such as Database as a Service and Network as a Service, but most of them are special forms or usages of one of the categories above.
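Division of responsibility under each service model (layers managed by the user versus layers delivered as a service by the vendor):
SaaS: the user manages nothing; the vendor delivers Applications, Data, Middleware, O/S, Virtualization, Servers, Storage and Networking. Examples: Facebook, Twitter, Google, Quora, Dropbox, Microsoft Office 365.
PaaS: the user manages Applications and Data; the vendor delivers Middleware, O/S, Virtualization, Servers, Storage and Networking. Examples: AWS Elastic Beanstalk, Google App Engine, Red Hat OpenShift.
IaaS: the user manages Applications, Data, Middleware and O/S; the vendor delivers Virtualization, Servers, Storage and Networking. Examples: Amazon AWS EC2, Microsoft Azure Compute, Google Compute Engine.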
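The user/vendor split for the three service models can also be encoded as a small data structure, which makes explicit that each model moves the same boundary down a shared stack of layers. This is a hypothetical sketch: the names are illustrative, and it assumes the user always manages a contiguous top portion of the stack, a simplification that real offerings do not always follow.

```python
from __future__ import annotations

# Stack layers from the top (closest to the user) to the bottom (closest to the hardware).
LAYERS = ["Applications", "Data", "Middleware", "O/S",
          "Virtualization", "Servers", "Storage", "Networking"]

# How many of the top layers the user manages in each model (assumes strict layering).
USER_MANAGED_TOP_LAYERS = {"SaaS": 0, "PaaS": 2, "IaaS": 4}


def split_responsibility(model: str) -> tuple[list[str], list[str]]:
    """Return (layers managed by the user, layers delivered as a service by the vendor)."""
    if model not in USER_MANAGED_TOP_LAYERS:
        raise ValueError(f"unknown service model: {model!r}")
    k = USER_MANAGED_TOP_LAYERS[model]
    return LAYERS[:k], LAYERS[k:]


if __name__ == "__main__":
    for model in ("SaaS", "PaaS", "IaaS"):
        user, vendor = split_responsibility(model)
        print(f"{model}: user manages {', '.join(user) or 'nothing'}; "
              f"vendor delivers {', '.join(vendor)}")
```

Representing each model as a single cut point keeps the three models consistent by construction: moving from SaaS to PaaS to IaaS only shifts layers from the vendor's side to the user's side.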
Cloud resources can also be categorized into public cloud and private cloud based on who can use them.
Public Cloud: Public cloud refers to Cloud resources open to the public. Many IT companies, such as Amazon and Microsoft, operate their own public Clouds, and users pay for their usage. Because the data and software of all users of a public cloud are managed by the same company, public clouds raise security concerns for applications that require a high level of security. From an economic perspective, a public cloud is suitable when a user needs Cloud resources only occasionally.
Private Cloud: Private cloud refers to Cloud resources that can be used only by certain users, often within one organization. It is often set up on a compute cluster using Cloud management software such as OpenStack and OpenShift. From an economic perspective, a private cloud is suitable when a user or organization uses it constantly.
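The economic argument for public versus private clouds can be made concrete with a break-even calculation: a metered public-cloud bill grows linearly with usage, while a private cloud costs roughly the same each month whether or not it is busy. The sketch below uses hypothetical prices and ignores real-world factors such as volume discounts, data-transfer fees and staffing; it only illustrates why occasional use favors the public cloud and constant use favors a private one.

```python
HOURS_PER_MONTH = 730  # 8,760 hours in a 365-day year / 12


def public_cost(hours_used: float, hourly_rate: float) -> float:
    """Metered public-cloud bill: pay only for the hours actually used."""
    return hours_used * hourly_rate


def break_even_hours(private_monthly_cost: float, hourly_rate: float) -> float:
    """Monthly usage at which the metered bill equals the fixed private-cloud cost."""
    return private_monthly_cost / hourly_rate


def cheaper_option(hours_used: float, hourly_rate: float, private_monthly_cost: float) -> str:
    """Pick the cheaper deployment for a given monthly usage (ties go to private)."""
    if public_cost(hours_used, hourly_rate) < private_monthly_cost:
        return "public"
    return "private"


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the trade-off.
    rate = 0.50       # public-cloud price per instance-hour
    private = 250.0   # amortized private-cloud cost per instance per month
    print(break_even_hours(private, rate))                 # 500.0
    print(cheaper_option(100, rate, private))              # public  (occasional use)
    print(cheaper_option(HOURS_PER_MONTH, rate, private))  # private (always on)
```

With these made-up figures, the private cloud pays off once an instance runs more than 500 of the roughly 730 hours in a month, i.e. at about 68% utilization.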
The bioKepler project provides virtual images that can be used via the IaaS model. The image is built on top of CloudBioLinux, which contains more than 500 biology-related tools. Additional software on the virtual image includes Kepler, Hadoop and Spark, for building and running workflows and Big Data applications. Instructions for using the bioKepler virtual image on EC2 are available at http://www.biokepler.org/using-biokepler-amazon-ec2-image.