Friday, December 9, 2011

Exploring Amazon's Cloud IaaS & PaaS

Amazon Web Services (AWS) is a broad collection of compute and application services. In this article, I will use it to help describe what “Infrastructure as a Service” (IaaS) and “Platform as a Service” (PaaS) are.

AWS is used by many large Internet companies, like active.com, Ericsson, The Guardian, Yelp, IMDB and others, to deliver their web applications or services. Although I’ve previously covered “Software as a Service” and its benefits, risks, and some specific applications, this article takes the next big step into another side of cloud computing: using public, scalable infrastructure to deploy your in-house applications.


Where “Software as a Service” (SaaS) normally refers to complete applications like CRM (from SalesForce.com) or invoicing (from zoho.com), IaaS and PaaS are the building blocks that developers use to build those applications. SaaS applications are used by businesses and consumers, whereas IaaS is used primarily by application builders. Companies such as those mentioned above use AWS to deliver consumer and business applications. Some companies are also starting to move internal applications out of private company data centers and into public cloud data centers, and AWS is one way of doing this.

Amazon’s implementation of IaaS is a model that much of the industry is following. AWS is a growing multi-billion dollar business for Amazon, and as more and more businesses choose the cloud over privately owned data centers, industry revenue projections for 2015 exceed $100 billion.

IaaS & PaaS Defined

IaaS

Defining Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) is pretty easy. Think of IaaS as almost bare-metal computing: it delivers what looks like your very own computer running within Amazon’s data center.

To spin up your server, you go through a step-by-step process in which you decide how many instances to create, what OS you want on them, how much compute capacity you need (think big server versus little server here), and what storage and other features, like availability services, you need. Once it is up and running, you have admin access to your own machine. This may be a dedicated machine or a "virtual system", and you can do whatever you want with it. You can install other software, like a database, web server, application server, or analytic database system. There may be some limitations, but for the most part it is just like any computer you may have under your desk or in your data center, except that it is located at the vendor’s data center. For AWS, this means 6 regions spanning the globe, from the US to Europe to Asia.

Adding storage is just as easy as creating compute instances. IaaS vendors let you quickly provision storage of the size you want and attach it to your compute instance. Unless you are provisioning very large amounts of space, you don’t need to reserve it or do much planning in advance with the vendor. Storage technology has moved well beyond attaching a physical hard disk to a computer. Storage today is, you guessed it, virtualized. Companies can now deploy devices dedicated to providing very large volumes of storage and then divide that pool into what look like dedicated disks, allowing multiple computers (and customers) to share it. The idea of virtualization extends into other services as well (e.g., networking).
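To make the idea of virtualized storage a little more concrete, here is a minimal Python sketch of one large storage pool carved into per-customer volumes. The class and method names (StoragePool, create_volume, delete_volume) are hypothetical and are not any vendor's API, and real storage arrays add features such as thin provisioning and replication that this toy model ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class StoragePool:
    """Toy model of one large storage array shared by many customers.

    Each "volume" is just a slice of the pool that looks like a private disk.
    """

    capacity_gib: int
    # volume id -> (owner, size in GiB)
    volumes: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def used_gib(self) -> int:
        return sum(size for _, size in self.volumes.values())

    def free_gib(self) -> int:
        return self.capacity_gib - self.used_gib()

    def create_volume(self, volume_id: str, owner: str, size_gib: int) -> None:
        if size_gib <= 0:
            raise ValueError("size must be positive")
        if volume_id in self.volumes:
            raise ValueError(f"volume {volume_id!r} already exists")
        # Only the shared pool's total free space limits a new volume.
        if size_gib > self.free_gib():
            raise RuntimeError("not enough free space in the pool")
        self.volumes[volume_id] = (owner, size_gib)

    def delete_volume(self, volume_id: str) -> None:
        # Deleting a volume returns its space to the shared pool.
        del self.volumes[volume_id]


pool = StoragePool(capacity_gib=10_000)
pool.create_volume("vol-a", "customer-1", 500)
pool.create_volume("vol-b", "customer-2", 2_000)
print(pool.free_gib())  # 7500
```

Each customer sees only its own volume, while the vendor manages one big pool; that sharing is what makes on-demand storage cheap to offer.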

Some of the key features of IaaS from various vendors are listed below.
  • Deploy compute services for applications in minutes
    • Linux or Windows servers on demand
    • Dynamic scaling at a moment’s notice
    • Open APIs to control your compute environment from within applications
    • Pay only for the compute cycles you use
    • Servers located in the part of the world of your choosing
  • Unlimited, on-demand storage
    • Scalable
    • Pay only for what you use
    • Flexible
  • Load balancing and other services
    • Dedicated and static IP addresses
    • Load balancing between multiple application servers



Virtual Machines

Before I go any further, I should give a quick overview of what a "virtual machine" (VM) is, since it is a key component of public clouds as well as an emerging implementation for local (private cloud) computing today.

Today's computer systems have more than enough capacity to run many applications, often leaving excess capacity unused. Companies split applications across multiple servers so that one application doesn’t adversely affect other applications on the same computer. An idea pulled from the archives of the mainframe days has been reworked into special software called "hypervisors". This software allows companies to run what look like completely separate computer systems (each with its own operating system (OS), applications, etc.) on a single piece of hardware. For example, even someone like me can now run a Windows server and a Linux server on the same computer in my office at the same time. The more powerful the server, the more virtual machines it can run at once, and most new hardware, from laptops to desktops, can run multiple VMs. VMs share resources like the network, USB, disk drives, and other hardware. VMs, and the ease of managing them on today’s hardware, are key enablers of the IaaS industry. They provide the "economy of scale" that allows companies like Amazon and Rackspace to provide secure, highly reliable compute services to thousands of customers. Although you can still request dedicated hardware at a vendor’s data center, VMs are becoming the norm because they are more flexible for customers and more economical.
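Here is a rough Python sketch of the bookkeeping behind "the more powerful the server, the more VMs it can run": it packs VM requests onto identical physical hosts using a simple first-fit-decreasing heuristic. The VM names and sizes are made up, and real hypervisors and cloud schedulers use far more elaborate placement (and may overcommit CPU), so treat this only as an illustration of how many VMs share one machine.

```python
from typing import Dict, List, Tuple

# A VM request: (name, vCPUs, memory in GiB). All sizes used here are made up.
VMRequest = Tuple[str, int, int]


def place_vms(vms: List[VMRequest], host_vcpus: int, host_mem_gib: int) -> List[Dict]:
    """Pack VMs onto identical physical hosts using first-fit decreasing."""
    hosts: List[Dict] = []
    # Placing the biggest VMs first usually leaves less stranded capacity.
    for name, vcpus, mem in sorted(vms, key=lambda vm: (vm[2], vm[1]), reverse=True):
        if vcpus > host_vcpus or mem > host_mem_gib:
            raise ValueError(f"{name} does not fit on any single host")
        # Reuse the first host that still has room for this VM.
        for host in hosts:
            if host["free_vcpus"] >= vcpus and host["free_mem_gib"] >= mem:
                break
        else:
            # No existing host has room, so bring another host into service.
            host = {"free_vcpus": host_vcpus, "free_mem_gib": host_mem_gib, "vms": []}
            hosts.append(host)
        host["free_vcpus"] -= vcpus
        host["free_mem_gib"] -= mem
        host["vms"].append(name)
    return hosts


requests = [("windows-server", 4, 8), ("linux-server", 2, 4), ("db", 8, 32), ("web", 2, 4)]
print(len(place_vms(requests, host_vcpus=16, host_mem_gib=64)))  # 1
print(len(place_vms(requests, host_vcpus=8, host_mem_gib=32)))   # 2
```

With a bigger host, all four VMs (including the Windows and Linux servers) share one machine; with a smaller host, the same workload needs two.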




PaaS

Think of Platform as a Service (PaaS) as a service used by your application code (rather than by end users). One good example is Amazon’s Relational Database Service (RDS). Cloud customers can install an Oracle database, or some other vendor’s database, on their virtual server and have complete control, but this comes at an often unnecessary cost. For one, you have to dedicate a lot of computing resources, people, and time to set it up, maintain it, back it up, and so on. By provisioning a database from Amazon’s RDS, developers get a database instance dedicated to them without all of the administrative overhead. The vendor takes care of upgrades, security patches, backups, and other mundane tasks, freeing up time to focus on more value-added tasks.

Amazon Web Services

Amazon Web Services (AWS) is one of the biggest providers of IaaS and PaaS.

When one thinks of Amazon, one thinks of a company that started as a book seller and became a platform for e-commerce, not a cloud services provider. AWS grew out of a couple of things. In order to do what Amazon does, they had to get good at setting up a scalable, flexible environment to handle their growing compute needs and deliver it with high availability, all while continuing to grow the breadth of their product line and become a platform through which others could sell. To accomplish their mission, they had to invest huge sums in multiple data centers, networks, and redundancy, and deploy their infrastructure world-wide. At times, this left Amazon sitting on a lot of excess capacity. The intellectual capital they had built up and the physical infrastructure they owned, combined with a dogged conversion to service-based architectures (as described in Steve Yegge’s article), positioned Amazon to launch AWS. Amazon leveraged their intellectual capital and pure compute muscle to offer what they do for themselves as a service to others.

As Amazon realized the potential ROI in this business, they continued to invest in AWS, adding capacity and delivering new reusable services for their customers. Amazon continues to improve and deliver new capabilities on a monthly basis. For example, in November 2011, Amazon released a new "Compute Region" in Oregon with two "availability zones", and they have released new services while this article was being written. In addition to adding capabilities, Amazon is growing their infrastructure at a phenomenal rate. In an interview, Amazon’s James Hamilton shared a factoid that gives a sense of the rapid growth of Amazon’s cloud platform: “Every day Amazon Web Services adds enough new capacity to support all of Amazon.com’s global infrastructure through the company’s first 5 years, when it was a $2.76 billion annual revenue enterprise.” (See “A Look Inside Amazon’s Data Centers” for more on this.)

AWS Features

Listed below are some of the core products and services that you will find as part of the AWS offering. You will also find many of these in competing offerings from companies like Rackspace.

Orchestration

AWS management is centered on the “AWS Management Console”. This console allows you to spin up, change, delete, and manage all of the services offered within AWS (compute, storage, network, etc.). This is referred to as "orchestration". This holistic management console is where everything starts and is a key enabler that makes AWS a viable service. Without it, managing or understanding how all your services are configured would be a nightmare.

Compute

Amazon Elastic Compute Cloud (EC2)

Amazon Elastic Compute Cloud (EC2) provides logical servers (called instances) within an AWS data center and is the most basic form of IaaS. Instances can be provisioned and started up in just minutes. To spin up a server, users are led through a series of options in which they can:
  • Choose the region, allowing architects to place compute resources close to each other and their customers.
  • Choose how many server instances to start.
  • Choose the operating system (various versions of Linux and Windows)
  • Select the size of the logical computer. This mainly equates to the number of CPUs and the amount of memory. Bigger means more dollars per compute hour.
  • Select the "Availability Zone", which lets users choose which data center in the region will host this server. Amazon has at least two "Availability Zones" per region. Think of each zone as a separate data center with its own power (and backup power), network, cooling, and other infrastructure services. Placing systems in different zones insulates them from a catastrophic failure in another zone. AWS provides multiple services that allow you to set up redundant components in a solution so that a major failure in one zone does not take it down. With a little more design, an application could even withstand the complete failure of an entire region. More on that later.
  • Choose the credentials that will allow you to access the server. AWS uses cryptographic key files that users create and can then re-use to access their systems.
  • Define which security group to place the server in. A security group is a sort of container that lets users define what traffic can come into and out of it. For example, my security group allows access to the server via ports 80 and 443 (web access), whereas other security groups may contain servers that only allow traffic to flow between servers as added protection. This allows designers to firewall some services while exposing others to the Internet.
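The security-group idea in the last bullet can be modeled in a few lines of Python. This is a simplified, hypothetical model, not the EC2 API: a group holds only "allow" rules, anything not explicitly allowed is dropped, and each rule matches one TCP port plus either a CIDR address block or a source group. Real security groups also deal with protocols, port ranges, and return traffic.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IngressRule:
    """One inbound "allow" rule in a simplified security-group model."""

    port: int
    cidr: Optional[str] = None          # e.g. "0.0.0.0/0" means the whole Internet
    source_group: Optional[str] = None  # or: traffic from members of another group


def is_allowed(rules: List[IngressRule], port: int, src_ip: str,
               src_group: Optional[str] = None) -> bool:
    """Default deny: traffic gets in only if some rule explicitly allows it."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule.port != port:
            continue
        if rule.cidr is not None and addr in ipaddress.ip_network(rule.cidr):
            return True
        if rule.source_group is not None and rule.source_group == src_group:
            return True
    return False


# Hypothetical groups: a public web tier, and a database tier (3306 is MySQL's
# default port) that only members of the "web" group can reach.
WEB_GROUP = [IngressRule(80, cidr="0.0.0.0/0"), IngressRule(443, cidr="0.0.0.0/0")]
DB_GROUP = [IngressRule(3306, source_group="web")]

print(is_allowed(WEB_GROUP, 443, "203.0.113.7"))                # True
print(is_allowed(DB_GROUP, 3306, "203.0.113.7"))                # False
print(is_allowed(DB_GROUP, 3306, "10.0.0.5", src_group="web"))  # True
```

The web servers are exposed to the Internet on ports 80 and 443, while the database is "firewalled" so that only the web tier can talk to it.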

When going through this process you can easily set the option to launch multiple instances, not just one.  The picture to the right is an example of a server all ready to be launched.

When it’s complete, you have a machine name that can be reached over the Internet, and you can log in with your key via an encrypted shell (SSH).
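If you would rather script the launch than click through the console, the choices above map onto the `run_instances` call of boto3, the current AWS SDK for Python. The sketch below plans how to spread instances round-robin across availability zones and only contacts AWS inside `launch()`. The AMI ID, instance type, key name, security group ID, and zone names are placeholders I made up, and `launch()` needs boto3 installed and AWS credentials configured.

```python
from typing import Dict, List


def build_launch_requests(image_id: str, instance_type: str, count: int,
                          zones: List[str], key_name: str,
                          security_group_ids: List[str]) -> List[Dict]:
    """Split `count` instances across availability zones, round-robin.

    Each dict holds keyword arguments for boto3's EC2 `run_instances` call.
    """
    if count < 1 or not zones:
        raise ValueError("need at least one instance and one zone")
    # Instance i goes to zone i mod (number of zones).
    per_zone: Dict[str, int] = {zone: 0 for zone in zones}
    for i in range(count):
        per_zone[zones[i % len(zones)]] += 1
    requests: List[Dict] = []
    for zone, n in per_zone.items():
        if n == 0:
            continue  # more zones than instances
        requests.append({
            "ImageId": image_id,                      # which OS image to boot
            "InstanceType": instance_type,            # server size
            "MinCount": n,
            "MaxCount": n,
            "KeyName": key_name,                      # key pair used to log in
            "SecurityGroupIds": security_group_ids,   # firewall rules
            "Placement": {"AvailabilityZone": zone},  # which data center
        })
    return requests


def launch(region: str, requests: List[Dict]) -> List[str]:
    """Send the planned requests to AWS and return the new instance IDs."""
    import boto3  # imported here so the planning code runs without boto3

    ec2 = boto3.client("ec2", region_name=region)
    instance_ids: List[str] = []
    for params in requests:
        response = ec2.run_instances(**params)
        instance_ids.extend(inst["InstanceId"] for inst in response["Instances"])
    return instance_ids


plan = build_launch_requests("ami-12345678", "t3.micro", 3,
                             ["us-west-2a", "us-west-2b"], "my-key", ["sg-12345678"])
for params in plan:
    print(params["Placement"]["AvailabilityZone"], params["MinCount"])
# us-west-2a 2
# us-west-2b 1
```

Spreading instances over two zones is the simplest way to get the zone-failure insulation described in the Availability Zone bullet.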

Amazon Elastic MapReduce

MapReduce is a fairly new category of computing that attacks large problems in parallel. It’s a compute platform that generally consists of many small machines working together in parallel to solve very large problems. Think of thousands of ants all working together to pick something apart, in the same way that an army of computers would attack a mountain of data to come up with a solution. Google uses this kind of parallel compute architecture to work through the mountain of Internet website data behind your search results, which come back in just seconds. Think about it for a moment. When you send a search query to Google, it finds the web sites relevant to your query, bundles them up with advertising, formats the results, and sends them back to you within seconds. If it takes a full second to return the results over the network and your results are back within 2 to 3 seconds, Google has only 1 to 2 seconds to do all of this. WOW! Google does this by employing hundreds of computers working together. This architecture even parcels out work to redundant systems, so that if one of the nodes in the hive is late getting back to the master with its small piece of work, the master just uses the result from the redundant worker.
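The map, shuffle, and reduce steps behind this "army of ants" can be shown with the classic word-count example. This is a single-machine Python sketch in which a thread pool stands in for the cluster's worker machines; the function names are my own, and it does not model the redundant backup workers described above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    # Each mapper emits a (word, 1) pair for every word in its chunk.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    # Group every emitted value by key so one reducer sees all counts for a word.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(item: Tuple[str, List[int]]) -> Tuple[str, int]:
    # Each reducer combines the partial counts for one word.
    word, counts = item
    return word, sum(counts)


def word_count(chunks: List[str], workers: int = 4) -> Dict[str, int]:
    # A thread pool stands in for the many machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        grouped = shuffle(mapped)
        return dict(pool.map(reduce_phase, grouped.items()))


print(word_count(["the cat sat", "the dog sat down"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}
```

Because each chunk is mapped independently and each word is reduced independently, both phases can be spread over as many machines as the data requires.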

Think of Elastic MapReduce as a supercomputer for your business needs. Offerings like this allow companies to quickly tap the massive compute power of a provider’s data center to solve a large, compute-intensive problem in minutes rather than days. The advantages are obvious: setup is fast, you pay only for what you use, and there’s no ongoing cost to hold onto a resource that you no longer need.

Auto Scaling

EC2 Auto Scaling, which grows or shrinks compute power as needed, is one of the most compelling aspects of AWS. By this point, you know you can easily set up more compute power for your web application as you need it. But what if this could be done automatically? By using Amazon CloudWatch to monitor your systems, you can have AWS automatically spin up new servers in your cluster and place them behind a load-balanced URL when certain conditions are met (like high CPU utilization). This allows companies to have EC2 automatically spin up additional copies of their production servers without manual intervention. In addition, developers can set conditions that will remove these compute instances when they are no longer needed. You can use cloud computing systems this way to cost-effectively provide temporary capacity. This is called "cloud bursting": ramp up when you need it, ramp down when you are done.
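The scale-out and scale-in conditions described above boil down to a simple rule. The Python sketch below assumes a made-up policy: add a server when average CPU is above 70%, remove one when it drops below 30%, and always keep between 2 and 10 servers. These numbers are illustrative, not AWS defaults, and real Auto Scaling also applies cooldown periods and evaluates alarms over several monitoring periods.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Threshold rules in the spirit of CloudWatch-triggered Auto Scaling.

    All thresholds and limits here are illustrative values, not AWS defaults.
    """

    scale_out_above: float = 70.0  # average CPU % that triggers adding servers
    scale_in_below: float = 30.0   # average CPU % that triggers removing servers
    min_size: int = 2
    max_size: int = 10
    step: int = 1


def desired_capacity(policy: ScalingPolicy, current: int, avg_cpu: float) -> int:
    """Return how many servers the group should run after one evaluation."""
    if avg_cpu > policy.scale_out_above:
        target = current + policy.step
    elif avg_cpu < policy.scale_in_below:
        target = current - policy.step
    else:
        target = current
    # Never go below the group's floor or above its ceiling.
    return max(policy.min_size, min(policy.max_size, target))


policy = ScalingPolicy()
size, history = 2, []
for cpu in [85, 90, 75, 50, 20, 10]:  # a burst of traffic, then a quiet period
    size = desired_capacity(policy, size, cpu)
    history.append(size)
print(history)  # [3, 4, 5, 5, 4, 3]
```

The group ramps up while the CPU stays hot and ramps back down once the burst is over, which is exactly the pay-for-what-you-use behavior that makes cloud bursting attractive.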

Database 

Amazon offers two types of database (DB) platforms: Relational Database Service (RDS) and SimpleDB.

RDS

Amazon Relational Database Service (RDS) is a database PaaS that allows users to provision a database instance without installing the software on one of their compute instances.  Setup includes the following:  

  • Choose the database engine: Oracle or the license-free MySQL.
  • Choose the database version.
  • Define the “instance class” (the size of the server used).
  • Define the amount of data storage to reserve.
  • Choose a "Multi-availability zone deployment", which keeps a synchronously replicated standby copy of your database in another "availability zone" for redundancy.
  • Define the backup retention period (the number of days for which automated backups are retained).
  • Define the connection user name and password.
  • Set the “security group”, which defines which of your servers are allowed to communicate with the DB.
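The same setup choices can be made in code. Here is a sketch that maps each bullet above onto the keyword arguments of `create_db_instance` in boto3, the current AWS SDK for Python. The identifier, instance class, credentials, and security group ID are placeholders, and today's accounts use VPC security group IDs rather than the DB security groups of 2011. `create_database()` needs boto3 and configured AWS credentials; the request builder does not.

```python
from typing import Dict, List, Optional


def build_rds_request(identifier: str, instance_class: str, storage_gib: int,
                      username: str, password: str, security_group_ids: List[str],
                      engine: str = "mysql", engine_version: Optional[str] = None,
                      multi_az: bool = True, backup_days: int = 7) -> Dict:
    """Map the RDS setup choices onto boto3 `create_db_instance` arguments."""
    # RDS accepts 0-35 days of automated backup retention (0 disables backups).
    if not 0 <= backup_days <= 35:
        raise ValueError("backup_days must be between 0 and 35")
    params = {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,                          # e.g. "mysql" or "oracle-ee"
        "DBInstanceClass": instance_class,         # the "instance class" (server size)
        "AllocatedStorage": storage_gib,           # data space to reserve, in GiB
        "MultiAZ": multi_az,                       # standby copy in another zone
        "BackupRetentionPeriod": backup_days,      # days of automated backups
        "MasterUsername": username,
        "MasterUserPassword": password,
        "VpcSecurityGroupIds": security_group_ids, # which servers may connect
    }
    if engine_version is not None:
        params["EngineVersion"] = engine_version   # otherwise RDS picks a default
    return params


def create_database(region: str, params: Dict) -> str:
    """Create the instance, wait until it is available, return its host name."""
    import boto3  # imported here so the request builder runs without boto3

    rds = boto3.client("rds", region_name=region)
    rds.create_db_instance(**params)
    ident = params["DBInstanceIdentifier"]
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=ident)
    info = rds.describe_db_instances(DBInstanceIdentifier=ident)
    return info["DBInstances"][0]["Endpoint"]["Address"]


request = build_rds_request("demo-db", "db.t3.micro", 20, "admin", "change-me",
                            ["sg-12345678"])
print(request["Engine"], request["MultiAZ"], request["BackupRetentionPeriod"])
```

In a real deployment the password would come from a secrets store or environment variable rather than being written into the script.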

Once setup is complete, AWS makes the database available to applications within 5 minutes. Not only do customers shed the administrative burden that comes with operating a database server like this, but they can still modify some of the detailed parameters if needed. Here’s a quick laundry list of features:
  • Pre-configured parameters
  • Monitoring and metrics
  • Automatic software patching
  • Automated backups
  • DB snapshots
  • Push-button scaling
  • Automatic host replacement in the event of a failure
  • Replication for enhanced database availability and performance


SimpleDB

Amazon also provides "Amazon SimpleDB", Amazon’s non-relational database (DB). It is optimized to provide high availability, flexibility, and easy scalability with no administrative burden.

Amazon ElastiCache is Amazon’s in-memory cache for the cloud.

Deployment Management

Amazon has deployment management tools, AWS Elastic Beanstalk and AWS CloudFormation, that allow you to easily deploy or roll back your Java apps and operate a mini-cloud within the Amazon cloud.

Storage

Storage is another area that Amazon lets you scale, and they offer two types. Their Simple Storage Service (S3) is storage as a web service. Amazon Elastic Block Store (EBS) is the storage that backs EC2 instances and acts as the typical “local disk”. Need a new disk volume added to a server? No problem, it can be done in minutes.
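As an example of "a new disk in minutes", here is a sketch of adding an EBS volume to a running server with boto3, the current AWS SDK for Python. It assumes you pass in a boto3 EC2 client (which also makes it easy to test with a fake), and the instance ID, zone, size, volume type, and device name are placeholders. The volume must be created in the same availability zone as the instance it will be attached to.

```python
def add_disk(ec2, instance_id: str, zone: str, size_gib: int,
             device: str = "/dev/sdf") -> str:
    """Create an EBS volume and attach it to a running EC2 instance.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2", region_name="us-west-2").
    Returns the new volume's ID.
    """
    # The volume and the instance must live in the same availability zone.
    volume = ec2.create_volume(AvailabilityZone=zone, Size=size_gib, VolumeType="gp3")
    volume_id = volume["VolumeId"]
    # Block until EBS reports the volume as available for attachment.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)
    return volume_id
```

After the volume is attached, you still need to format and mount it from inside the operating system before applications can use it.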

Supporting Services

I’ll close my tour of the AWS offering by loosely grouping the remaining services together. Not to be underestimated, these powerful, scalable services enhance and enable massive application deployments used by some very large Internet SaaS companies today.

  • Performance information like CPU utilization or things like the number of database connections is easily available for most products like EC2, RDS, and other assets via a “Monitoring” tab on many of the AWS assets.  [This can be really handy when trying to understand the utilization of your various infrastructure assets for capacity planning.]
  • Amazon CloudFront - A way to deliver static content from locations closer to your customers. Once content is placed in a container, it is replicated / distributed to regional data centers, providing better performance because content is delivered to end users from the closest source.
  • Amazon Simple Queue Service - An application messaging service that lets developers bridge applications with reliable application-to-application data transfer inside AWS or between AWS and a company's private systems.
  • Amazon Simple Notification Service - Used for sending messages to subscribers.
  • Amazon Simple Email Service - For bulk and transactional email sending for businesses and developers.
  • Amazon CloudWatch - Allows administrators to monitor various metrics of a cloud service (like CPU utilization or available storage space) and send alerts or initiate elastic scaling commands.
  • Amazon Route 53 - A highly available and scalable Domain Name System (DNS) web service.
  • Amazon Virtual Private Cloud - Allows users to provision a private, isolated section of the AWS cloud and launch AWS resources in a virtual network, as well as create private subnets.
  • Elastic Load Balancing - Provides traffic load balancing across multiple Amazon EC2 instances.
  • AWS Direct Connect - Allows users to set up direct network connections from private systems to AWS.
  • Amazon Flexible Payments Service - Allows developers to build on top of Amazon’s reliable and scalable payment infrastructure.
  • AWS Import/Export - Allows users to ship storage devices to Amazon for direct bulk uploading or downloading of large amounts of data, reducing network costs and time to load.
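To show how Simple Queue Service decouples applications, here is a small producer and consumer sketch using boto3's SQS calls (`send_message`, `receive_message`, `delete_message`). It assumes you pass in a boto3 SQS client and a queue URL (both placeholders here), and the `handle` callback is a hypothetical stand-in for your own application logic.

```python
from typing import Callable


def publish(sqs, queue_url: str, body: str) -> str:
    """Producer side: put one message on the queue and return its ID."""
    return sqs.send_message(QueueUrl=queue_url, MessageBody=body)["MessageId"]


def drain_queue(sqs, queue_url: str, handle: Callable[[str], None],
                max_batches: int = 10) -> int:
    """Consumer side: receive, process, and delete messages in batches.

    `sqs` is a boto3 SQS client, e.g. boto3.client("sqs", region_name="us-west-2").
    Returns the number of messages processed.
    """
    processed = 0
    for _ in range(max_batches):
        # Ask for up to 10 messages, long-polling up to 20 seconds.
        response = sqs.receive_message(QueueUrl=queue_url,
                                       MaxNumberOfMessages=10, WaitTimeSeconds=20)
        messages = response.get("Messages", [])
        if not messages:
            break  # the queue is empty for now
        for msg in messages:
            handle(msg["Body"])
            # Delete only after successful processing.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
            processed += 1
    return processed
```

The design choice worth noting is that a message is deleted only after it has been processed. If `handle` raises, the message is not deleted, and SQS makes it visible again after its visibility timeout so that another worker can retry it.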

Customer Apps and Support for AWS  

There is a strong community of developers that add value to AWS. Within AWS Resources, you will find a listing of 590 Customer Apps available to others. In addition, there is a listing of “Amazon Machine Images” (AMIs) that other customers have created, along with developer tools, sample code, public data sets, and discussion forums that are easy to use. This is an amazing resource for AWS customers. Because it’s integrated, you can spin up a shared AMI in minutes and start using public data sets immediately on your AWS EC2 instance.

Looking around the Internet, I found a growing cottage industry and community of tool makers for AWS. If you don’t fully understand how to use a tool or service within AWS, you’ll probably find the help you need on the AWS site or after about a 2-minute Google search. There are also tools that leverage the AWS APIs to improve the manageability of large AWS implementations.

Huge Potential Benefits of Cloud Computing Await You

Take a look at my previous article on the benefits and risks of cloud computing.  Aside from the things that are specific to “Software as a Service”, all of the potential benefits can be realized with AWS.  If you have an application that you need to develop and deliver yourself, why not forgo the headache and cost of running your own data center and move to the cloud?  It’s not the right fit for all companies or applications, and there are risks and challenges, but the reasons to not use public clouds are falling away. 


In my next article, "Using Amazon's AWS", I’ll write about how I took an example real-world application and deployed it on the AWS cloud. I’ll demonstrate not only the ease of implementation but also how I flexed the “elastic” muscle that AWS is known for. A third installment will discuss possible hybrid deployments that use the cloud to augment existing data center solutions.

References

Advantages of Cloud Computing
Risks and Challenges of Cloud Computing
AWS Case Studies
A Look Inside Amazon’s Data Centers
Steve Yegge’s Post about Designing for Reusable Services and Amazon

- Chris Claborne
