Saturday, January 18, 2014

2013 AWS re:Invent Conference Was Excellent

I had the opportunity to attend Amazon Web Services’ (AWS) cloud computing conference, “AWS re:Invent”, at the end of 2013.  I had a lot of fun, it was a lot of work, and I went through the typical three phases of a conference attendee:
  • Shock and Awe -- The overwhelming stimulus input at the beginning of the show
  • Bewitchment -- The infatuation with the great ideas and marketing
  • Oversaturation -- The inability to take in one more presentation, product, or discussion about cloud computing
Overall, AWS re:Invent was one of the best conferences I’ve been to in a while.  Here are some of my key takeaways from the event.

AWS is clearly the market leader

One of the reasons to attend the AWS re:Invent geek love-in is that AWS is by far the market leader.  If you want to watch where cloud computing is going, they are the ones to watch, as many competitors are following their lead.

If you look at the Gartner Magic Quadrant for infrastructure as a service (IaaS) over the last three years (see below), not only has AWS moved to the head of the pack in cloud IaaS, they’ve pulled away from it.  The 2013 chart looks like it needed to be resized just to keep their competitors on it.



AWS continues to pull away from the pack for a few reasons.  They not only got the jump on the competition, they now benefit from their size and gravitational effect on the market.  In addition to scale and size, pure intellectual capital is a big barrier to entry in the IaaS business.  Until IaaS becomes a commodity, the know-how (intellectual capital) to do it at the scale AWS does has to be created by each company in the market.  You can’t simply hire people (or buy the intellectual capital) needed to compete; you have to build it within your organization by learning.  It takes time, learning from your mistakes, and customers willing to help you learn how to deliver IaaS and other services at the massive scale and reliability that AWS has today.  It doesn’t hurt that AWS has customers willing to stick with them through mistakes, most notably the key account, Netflix.  Customers like Netflix are also sharing what they’ve learned and open sourcing some of the code and tools they use to achieve massive scale and unprecedented availability of their services.



Core Themes


Below are some of the core themes that I caught during the 2013 conference.  

Enabling Hockey Stick Growth


Many of the presentations I attended had a “hockey stick” graph that told the story of how AWS enabled the presenter to grow customers, compute capacity, storage, or all three in a matter of months.  By designing for the cloud, NASA JPL was able to support a massive increase in visitors as the Mars landing approached, then just as easily scale that environment back down, all while taking advantage of the massive amount of data they receive and make available to scientists.


A special effects company, Atomic Fiction, talked about how they quickly scale up compute capacity to render massive movie special effects and then shut it all down to avoid costs between workloads.  (The example clips from the latest Star Trek special effects were fun too.)

DropCam showed how they utilize AWS storage to persist customer data at a rate that now exceeds YouTube uploads.  

Another area of hockey stick growth is AWS itself.  AWS doesn’t just build data centers; they also continuously release new services, feature enhancements, and capabilities.  By the end of 2013, AWS will have made over 236 announcements, up from 159 the year before.


Enabling Enormous Scale


Multiple customers, in break-out sessions and the keynote presentations, discussed how AWS enables massive scale for startups as well as established companies.  Every one of these presenters showed hockey stick growth in their service use, and therefore in their compute and storage needs.  NASA JPL talked about how they used AWS to quickly scale up an environment that allowed millions of people to participate in and follow the Mars landing, as well as using AWS’s massive storage to house and share the science data being collected.  DropCam used AWS to scale their network camera service to petabytes of data within months, with an IT department of only 5 people.  The compute capacity available to DropCam will also let them deliver additional products and services to their customers, allowing them to focus on product and leave the job of building and operating massive data centers to someone else.


By enabling this level of scale with so little effort, AWS lets new companies invent new markets or attack existing ones held by tech giants.  These companies can go from idea to web-scale product in months.

Economies of Scale

To enable on-demand growth, AWS needs to build ahead of demand and be big, and wow, is it.  James Hamilton rocked the industry with new details about how AWS scales up to support big customers like Netflix (who drive 30% of North American traffic during peak hours) as well as small, growing companies like DropCam.  In his talk, titled “Why Scale Matters and How the Cloud Really is Different”, James discussed some of the details.  AWS designs their own power sub-stations, some as large as 50 megawatts.  They do this to achieve better time to market, and because their goals differ from the power companies’, a custom design aligns better with their business.

At the scale AWS operates, it makes economic sense for Amazon to design their own routers and protocols within the AWS data centers.  James drove home that the cloud is different from traditional data centers, and that off-the-shelf network hardware isn’t optimal for it.  Traditional corporate networks are typically over-subscribed because each server only occasionally uses the network.  In cloud environments, AWS benefits when its servers run close to max capacity, and new architectures, like massive MapReduce clusters, would crush most corporate networks.  By owning the software on their routers and other network equipment, AWS can optimize for their environment and fix problems in hours rather than the months it might take with off-the-shelf equipment.

To give conference goers a sense of how large and dense AWS storage racks are, James showed a commercially available storage rack that holds 600 disks and weighs 3/4 of a ton.  AWS achieves higher density with lower cost and power utilization using custom storage racks that weigh over a ton.  AWS stores trillions of objects and serves 1.5 million requests per second at peak demand.  And remember, these are historical figures; they are still growing!

AWS has five times the infrastructure of their fourteen closest competitors combined.  WOW!  And to keep that lead, or even pull away, AWS adds enough infrastructure every day to power Amazon back when it was a $7 billion company.  They do this every day, including Sundays, according to James.  In 2013, AWS did all of this within 9 data center regions and 25 availability zones around the world.  (Each availability zone has at least 1 data center.)

I could keep quoting statistics about the size of AWS, but I think by now you get the message: AWS is mega-big.

Speed to Learn, Deploy, Improve, and Scale

During Werner Vogels’ keynote, he talked about agility, but not in the way most think of business agility; he was talking about engineering agility.  AWS lets engineers use an iterative design process: tinker, rebuild, scale, change, and improve at very low risk and cost.  Because designers are not locked into a particular architecture by huge capital outlays, changes are much easier to make.  No longer do major architectural design changes mean physical data center modifications and juggling the capital costs that come with them.

I loved Werner Vogels’ term “overclock your application”.  He showed example after example of small businesses using the constellation of AWS services and scalable capabilities to build new markets or dominate existing ones.  Companies of five people in a living room, with no data center, overtake companies with legions of workers and huge data centers.

Maturity

Another theme at re:Invent was maturity: AWS is focusing on driving toward consistent delivery performance.  One example Werner Vogels showed was the response times of DynamoDB, the AWS NoSQL database.  DynamoDB is provisioned by throughput (read and write operations per second) rather than forcing customers to choose a particular size of system.  They’ve long had sub-10ms response times, but latency would spike at times.  Their efforts to smooth this out have resulted in a nice, consistent sub-10ms response time with no spiking at any scale.  All data in DynamoDB is stored on solid state drives and synchronously replicated across three availability zones.  AWS has a compelling story with DynamoDB, and if I get some time, I’ll play around with it and do a writeup.
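To get a feel for that provisioning model, here’s a minimal sketch using today’s boto3 Python SDK (the table and attribute names are hypothetical, not anything shown at the conference): you declare the read and write throughput you want, not a box size.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Create a table sized by throughput, not by machine size.
dynamodb.create_table(
    TableName="camera-events",  # hypothetical table name
    AttributeDefinitions=[
        {"AttributeName": "DeviceId", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "DeviceId", "KeyType": "HASH"},
    ],
    # The provisioning knobs: reads and writes per second.
    ProvisionedThroughput={
        "ReadCapacityUnits": 100,
        "WriteCapacityUnits": 50,
    },
)
```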

Resiliency

Resiliency was another theme (or marketing push) at the conference.  This isn’t new; it’s been a constant of the AWS show-and-tell for at least the last three shows.  It’s obvious that customers are catching on, though: this conference showcased more customers teaching attendees how to achieve higher availability.  Multiple breakout sessions discussed how to deploy applications on AWS’s automatically redundant infrastructure, from RDS, to DynamoDB, to S3.  AWS customers don’t depend on one monolithic, highly reliable physical system; AWS IS THE PLATFORM.  Customers should assume every single component of AWS will fail at some point (a good assumption no matter where you deploy, local or cloud).  AWS designs to this axiom, making it easy for customers to deploy EC2 instances in multiple availability zones (think separate data centers) and add them to load balancers, as in the sketch below.  The same goes for provisioning auto-redundant databases, and all of it can be automated.  Customers like Netflix use automated tools to test what happens when servers in their EC2 clusters are shut down in production, but they go further: their tools verify that their designs can withstand a complete zone failure and, more recently, complete region failures (extremely rare, but it can happen).
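Here’s a hedged sketch of that multi-AZ pattern using the boto3 Python SDK; the AMI ID, instance type, zones, and load balancer name are hypothetical placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
elb = boto3.client("elb", region_name="us-east-1")

# Launch one instance in each of two availability zones.
instance_ids = []
for zone in ["us-east-1a", "us-east-1b"]:
    resp = ec2.run_instances(
        ImageId="ami-12345678",  # hypothetical AMI
        InstanceType="m3.medium",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
    instance_ids.append(resp["Instances"][0]["InstanceId"])

# Front both zones with a classic Elastic Load Balancer.
elb.create_load_balancer(
    LoadBalancerName="web-frontend",
    Listeners=[{"Protocol": "HTTP", "LoadBalancerPort": 80,
                "InstanceProtocol": "HTTP", "InstancePort": 80}],
    AvailabilityZones=["us-east-1a", "us-east-1b"],
)
elb.register_instances_with_load_balancer(
    LoadBalancerName="web-frontend",
    Instances=[{"InstanceId": i} for i in instance_ids],
)
```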

AWS took resilience to the next level this year, announcing new tools to easily replicate your environment across regions and to use Route 53 as a global load balancer.  In one of the breakouts I attended, Netflix explained the complexities of designing highly reliable, redundant persistence layers and how they do it.  They also discussed the new member of their simian army, Chaos Kong.
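As a rough illustration of the DNS piece, here’s a hedged boto3 sketch (the hosted zone ID, domain, endpoints, and health-check ID are hypothetical) of Route 53 failover records that shift traffic to a secondary region when the primary region’s health check fails.

```python
import boto3

r53 = boto3.client("route53")

def failover_record(role, dns_name, health_check_id=None):
    """Build a Route 53 failover record set (role is PRIMARY or SECONDARY)."""
    record = {
        "Name": "app.example.com.",
        "Type": "CNAME",
        "SetIdentifier": role,
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": dns_name}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

r53.change_resource_record_sets(
    HostedZoneId="Z_HYPOTHETICAL",  # hypothetical hosted zone
    ChangeBatch={"Changes": [
        failover_record("PRIMARY", "elb-us-east.example.com",
                        health_check_id="hc-primary"),
        failover_record("SECONDARY", "elb-eu-west.example.com"),
    ]},
)
```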

Netflix

Although not really a “theme” per se, Netflix plays an important part at AWS re:Invent conferences.  Netflix is probably one of AWS’s largest customers, at times accounting for a third of all North American internet traffic.  I’m pretty sure Netflix has helped AWS learn how to become even more resilient and gain the intellectual capital it takes to manage a monstrous infrastructure reliably and consistently.  Netflix is a big draw for any attendee who wants to learn how to scale and design an always-on delivery platform.  Given that AWS feature enhancements and introductions are customer driven and Netflix is their flagship customer, I expect a lot of the new region-replication features were driven by them, including the region-to-region replication introduced at the show in 2013.

Many AWS customers come just to attend the Netflix presentations to learn how they scale, and Netflix is happy to pass along what it has learned about improving reliability on AWS.  Netflix’s AWS utilization is massive, and they’ve helped AWS improve availability by pushing the infrastructure to its limit.  Netflix now has region-to-region failover and has open sourced some of the tools that abstract the services they consume in ways that improve fault tolerance.  In addition to sharing architecture designs and code, they’ve open-sourced their “Simian Army”, a suite of tools that helps test the resiliency of solutions.  These tools let teams test what happens when a compute instance fails, when an entire data center (zone) fails, and even when a region fails.  The mantra you always hear from AWS and its customers: plan for failure at every level if you want reliable, highly available systems.  AWS makes deploying highly available solutions economically feasible for even the smallest of applications.  For example, when deploying a database on RDS, all you need to do is select the “Multi-AZ Deployment” option to provision a database that is automatically replicated across two data centers with automatic failover.  A redundant MySQL instance costs less than double a single instance, whereas doing this yourself would cost well over 2X once you account for setup, maintenance, network, and facilities.
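That console checkbox is a single flag in the API as well.  A minimal sketch with the boto3 Python SDK (identifiers, sizes, and credentials are hypothetical):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="app-db",       # hypothetical name
    DBInstanceClass="db.m3.medium",
    Engine="mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me",      # use a real secret in practice
    AllocatedStorage=100,                # GB
    MultiAZ=True,  # synchronous standby in a second AZ, automatic failover
)
```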

As you can tell, I’m bullish on AWS availability, and so are their customers, some of whom say they have higher availability in the cloud.  I think AWS learned from their failures (most notably the 2011 region failure) and strengthened their processes and architectures, resulting in improved up-time.  I wrote about this in 2011.

Making IT / Operations invisible to the Business

It was interesting to listen to AWS customers talk about how few people it takes to run extremely large and complex infrastructures.  Now that companies can get out of the business of racking and stacking equipment, troubleshooting hardware, networking, and maintenance, they can focus on their core competency and deliver their products at a lower overall cost.  These companies rely on automation to scale, monitor, and resolve infrastructure issues automatically.

Innovation

Woven into all of these themes was innovation.  AWS, as it turns out, is giving budding companies, entrepreneurs, and dreamers a chance to put their ideas to the test without a massive initial capital investment.  Many of the companies that presented could never have broken into their industries or invented new markets without AWS, because the barrier to entry, the massive compute infrastructure needed to launch and run the business, requires a huge bet by investors.  AWS lets inventors tinker, adjust, and trial-launch services that would previously have been prohibitively expensive if they were forced to build their own data centers.  And because of the flexibility of cloud computing, a mistake in how a solution is architected can be corrected without replacing capital equipment.  More and more companies are willing to take a risk, knowing that if their invention is a flop, they can just turn it all off and try something else.

AWS Trends

Here are the trends I see not only at AWS but in the cloud industry as a whole (which is mostly AWS).

More enterprise focus

There was more enterprise focus this year in the customer presentations I watched.  A couple of presenters strongly recommended AWS enterprise support, and global companies demonstrated how they achieve more secure solutions using virtual private cloud (VPC), security controls, and encryption.

AWS seems very focused on security, a top priority for any CIO looking at the cloud.  One example is the introduction of CloudHSM.  The CloudHSM service lets customers securely generate, store, and manage the cryptographic keys used for data encryption in a way that makes the keys accessible only to the customer.  This means that if a company’s data is swept up by authorities as part of a legal discovery process, they still need to properly serve the company to obtain the keys.  Many other attributes come together to bolster the AWS secure-cloud story, and companies like NASA JPL publicly state that they believe AWS allows them to be more secure than their traditional on-premises implementations.

Companies touting “Cloud First”

I’m not naive enough to think that AWS doesn’t carefully curate their presentations, but I still see this as a trend, with JPL and others using the term to describe their IT strategy.

The cloud is the backbone of the new Internet company

More new companies are using AWS to dominate markets with hockey stick growth in customers and capacity, creating new markets or taking share from their competitors.  These companies operate at internet speed and change direction like an Olympic skater.

More tools, patterns, features, services

AWS used the conference to launch new services and features like these:
  • “CloudTrail”, logging of AWS API calls (introduced with partner services like Splunk)
  • “Kinesis”, real-time processing of streaming data (see the sketch after this list)
  • “AppStream”, streaming of resource-intensive applications to users (mobile is a target)
  • PostgreSQL on RDS and larger EC2 instances
  • “WorkSpaces”, a virtual desktop service
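To get a feel for the Kinesis model, here’s a minimal sketch using today’s boto3 Python SDK; the stream name, shard count, and record payload are hypothetical, not from any session.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Create a stream; capacity scales with the number of shards.
kinesis.create_stream(StreamName="clickstream", ShardCount=2)

# Stream creation is asynchronous; wait until it becomes ACTIVE.
kinesis.get_waiter("stream_exists").wait(StreamName="clickstream")

# Producers push records; the partition key determines the shard.
kinesis.put_record(
    StreamName="clickstream",
    Data=b'{"event": "page_view", "user": "123"}',
    PartitionKey="123",
)
```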

Closing Thoughts

Intellectual Capital (IC) is the key barrier to entry in the cloud IaaS business.  AWS’s size has a compounding effect on its IC growth, making the company’s lead very hard to challenge.  (Sort of like the Apple iPod effect.)

AWS enjoys a growing constellation of businesses that consult on, build products specifically for, or otherwise work with AWS.  This is similar to the marketing help Apple receives from all the third-party vendors that make accessories for Apple products.  Open a SkyMall magazine and the pages of Apple-related accessories read like one big advertisement for iPhones and iPads.  The same can be said for the growing list of third-party tools and services for AWS; it deepens the “gravity well” that AWS is creating in the cloud universe.

Amazon has capabilities that most companies could never afford on their own.  For example, by leveraging the AWS cloud, companies can deploy a new iteration of their product alongside the old one (temporarily doubling their compute footprint), switch traffic over, and then turn off the old instances once they’re confident the new version is working correctly.  Using Redshift and Hadoop clusters, companies can apply the massive analytics capabilities of AWS even to small jobs.
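A hedged sketch of that cut-over pattern on a classic load balancer, again with boto3 (the load balancer name and instance IDs are hypothetical): register the new fleet, then deregister the old one.

```python
import boto3

elb = boto3.client("elb", region_name="us-east-1")

new_fleet = ["i-new11111", "i-new22222"]  # hypothetical instance IDs
old_fleet = ["i-old11111", "i-old22222"]

# Bring the new version into rotation alongside the old one.
elb.register_instances_with_load_balancer(
    LoadBalancerName="web-frontend",
    Instances=[{"InstanceId": i} for i in new_fleet],
)

# Once health checks pass and the new version looks good,
# pull the old fleet out (and terminate it later).
elb.deregister_instances_from_load_balancer(
    LoadBalancerName="web-frontend",
    Instances=[{"InstanceId": i} for i in old_fleet],
)
```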

Security in the cloud is front and center, and it’s at a tipping point.  Not only is AWS delivering new capabilities in this area, like CloudHSM and VPC features; the vendor floor was packed with companies addressing multiple areas of security as well.  Companies like JPL stating that they feel they can be more secure in the cloud than on-premises helps influence others.

To be able to leverage AWS when the time is right, you need to build a competency.  I think one of the biggest advantages of AWS is what I like to call “return on agility”: time to market and getting things knocked out fast is a huge benefit.  If you don’t prepare and build an AWS competency, you’ll miss opportunities when they pop up.  If companies think AWS might fit some future use cases, they need to build a team that can use AWS with the same competency they have with their local infrastructure.

AWS re:Invent is a great place to learn from other companies.  A lot of companies started small and slow, learning and growing into the cloud where it made sense, and they passed along what they learned at the conference.  If AWS makes sense for you, leverage what’s been published by their customers and attend re:Invent as part of your research.

- Chris Claborne
