Saturday, September 7, 2013

Using Amazon Glacier

I’ve been wanting to test out the Amazon Web Services (AWS) Glacier storage service since it was announced in August 2012.  If you’ve been reading my blog for a while, you’ll know that I am a little crazy when it comes to backing up my data.  The attraction of Glacier is its low cost ($0.01/GB per month) and its high durability (eleven 9s).  I was waiting for client-side tools so that I could test it, and I now have them.  Easy-to-use Windows clients came out pretty quickly; FastGlacier was one of the first and has seen a major release in the last month.  I decided to give it a try and back up my family and client photos as well as my music.  I think I have a strong use case for it, and I’ve estimated that I can save $180+ per year by moving some of my archives to Glacier.  So, I’ll save some money while I’m having some fun.


What is Glacier?
Glacier is Amazon’s storage platform in the cloud, dubbed “cold storage”.  AWS has multiple persistence products (fancy way of saying you can store data multiple ways).  There are tradeoffs between speed, cost, and durability (probability of loss of data).  Among their offerings, they have the following products that serve as a means of easily storing files:
  • Elastic Block Store (EBS).  EBS storage is comparatively expensive but very fast (think of it as local disk for an AWS server), and that is in fact how many AWS EC2 users use it.  EBS probably has the lowest durability of the three covered here.  AWS does replicate the storage volumes within the same data center, but because EBS is built for speed and a high rate of changes, the probability of losing data is higher than with the next two options.
  • Amazon Simple Storage Service (S3).  It’s slower than EBS and can be treated like object storage that is sharable.  Applications that need to persist images and file attachments may use S3 because they can share the data between applications and a server farm, and because the files don’t change often, access latency isn’t as important.  S3 has a higher durability because Amazon stores data in multiple facilities, on multiple devices within each facility.  This replication happens before the service returns “success” to the application using it (indicating the save is complete and safe).  S3 also has Versioning, allowing you to preserve, retrieve, and restore every version of every object stored in S3.  Standard S3 storage is designed for 99.999999999% durability and 99.99% availability of objects over a given year and can withstand the concurrent loss of data in two facilities.  Customers also have the option to reduce the cost of their S3 storage by choosing Reduced Redundancy Storage, which keeps fewer replicas and is designed for 99.99% durability.
  • Amazon Glacier.  Glacier is the lowest cost option of the three and is meant for cold storage, but restores are glacially slow.  Slow as in, it could take up to 4 hours to start a restore or service an inventory request.  Uploads are pretty fast, however.  The key differentiators are durability and cost: Glacier is the slowest of the three, but it provides the same availability as S3 and is designed for 99.999999999% durability.
The key tradeoffs above are cost vs. speed and durability.  Slower and more durable = lower cost.  The target use case for Glacier is long term storage of files that really don’t change and are rarely downloaded. Examples are music, video, photos and other media.


    AWS Pricing as of 9/2013

    AWS Storage        EBS     S3     Glacier
    1TB for 1 month    $100    $95    $10

Target Use Case


First and foremost, you need to understand how the pricing model will affect your use case, because there is more to the story.
  • Glacier stores a gigabyte of data for one month for one cent ($0.01/GB).  
  • Upload and retrieval requests are $0.05/1,000 requests.  
  • Data transfer charges for uploading to Glacier are zero.
  • Downloading data costs $0.12/GB for the first 10 TB (the first GB is on the house).
  • Deleting files within 90 days of uploading to Glacier incurs a delete fee.
The other big cost is your time.  A request for the inventory (a list of all the files in your vault(s)) can take up to 4 hours, and once you have the list, it can take up to another 4 hours to start the restore.  When you take all of this together, it’s easy to see that Glacier is meant for files that don’t change often and rarely need to be downloaded.
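
To get a feel for how the prices listed above add up, here is a minimal Python sketch of a monthly estimate.  It uses only the figures quoted in this post; the function name and the sample archive sizes are made up for illustration, and it deliberately ignores the early-delete fee and any pricing beyond the first 10 TB of downloads, so treat it as a rough guide rather than an AWS bill.

```python
# Prices quoted in the list above (September 2013); inputs, not current AWS rates.
STORAGE_PER_GB_MONTH = 0.01      # $ per GB stored per month
REQUEST_PRICE_PER_1000 = 0.05    # $ per 1,000 upload/retrieval requests
DOWNLOAD_PER_GB = 0.12           # $ per GB downloaded (first 10 TB)
FREE_DOWNLOAD_GB = 1.0           # the first GB downloaded is free


def monthly_cost(stored_gb, requests=0, downloaded_gb=0.0):
    """Rough monthly cost in dollars; ignores early-delete fees and higher tiers."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    request_fees = requests / 1000 * REQUEST_PRICE_PER_1000
    billable_download = max(0.0, downloaded_gb - FREE_DOWNLOAD_GB)
    download = billable_download * DOWNLOAD_PER_GB
    return round(storage + request_fees + download, 2)


if __name__ == "__main__":
    # Hypothetical archive: 500 GB of photos uploaded as 20,000 files.
    print(monthly_cost(500, requests=20_000))
    # The same archive in a month where 100 GB is restored.
    print(monthly_cost(500, downloaded_gb=100))
```

The second call shows why restores matter: pulling back a fifth of the archive costs more than twice as much as storing all of it for the month.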

The obvious choices that fit the use case and cost model are:

  • Media files
  • Backup archive containers that need to be kept for multiple years.


Typical system backups are constantly backing up new data and changed data files.  Consider the documents directory on your PC.  You add new documents, and the ones you have may change several times per day.  Glacier doesn’t support changing a file in place; instead, you would delete the object and re-upload it, or just upload a new copy.  The pricing model doesn’t really support this either.


For me, photos won’t change much after just a few days on my computer.  Once I post-process my images (keeping the originals and creating new adjusted images), I’m done.  Glacier’s low cost allows me to back up everything, including the original RAW files.  I’ll upload my images, and if I do have to make a change, I’ll just re-upload the image since the cost is so incredibly low.  Uploading a file a second time won’t overwrite the file in Glacier, because Glacier doesn’t store it under the filename you give it.  Instead, it assigns a unique ID to the file and keeps the name you gave it alongside other metadata.  An application like FastGlacier normally just shows the filenames, so they are still visible, and that’s really how you work.  If I upload the image twice to a folder, I’d see it listed twice with the same name, which would throw a lot of people off (FastGlacier does allow you to see the UID of the object you uploaded if you really want to).
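
The "same name, two copies" behavior is easier to picture with a toy model.  The Python sketch below is not the Glacier API and not how FastGlacier works internally; `ToyVault` and the file name are hypothetical, and the only thing it illustrates is that every upload gets its own ID while the name is just metadata.

```python
import uuid


class ToyVault:
    """In-memory stand-in for a vault; an illustration, not the Glacier API."""

    def __init__(self):
        self.archives = {}  # archive ID -> (description, data)

    def upload(self, description, data):
        # Every upload gets a fresh ID, so nothing is ever overwritten.
        archive_id = uuid.uuid4().hex
        self.archives[archive_id] = (description, data)
        return archive_id

    def list_names(self):
        # What a client that shows only file names would display.
        return sorted(desc for desc, _ in self.archives.values())


vault = ToyVault()
first = vault.upload("2013/beach/IMG_0001.CR2", b"original raw bytes")
second = vault.upload("2013/beach/IMG_0001.CR2", b"re-exported raw bytes")
print(first != second)      # two distinct archive IDs
print(vault.list_names())   # the same name appears twice
```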


Because of the size of image files, using my current cloud backup provider, iDrive, to store them is much more expensive.  I’ll use iDrive for everything else because it provides unattended automatic continuous backups, Versioning, client-side encryption and restores are fast.  The “restore” use case is more likely for documents.  I’ve used iDrive a few times to not only recover a lost document but also to grab a previous version of the document because changes corrupted the document in some way.  I had my wife up and running in just a few minutes when I had to restore a critical data file that was part of her research.  The use case that would drive me to restore images and music is most likely a system crash where my redundant drives failed and my network attached storage device also failed (think really bad fire).

FastGlacier

This really isn’t a review of FastGlacier; there are a lot of those out there already.  I’ve got to say, though, that it makes uploading and restoring files to and from the Glacier cloud service easy.  The UI is simple and supports drag-and-drop.  The tool has some other features that I like as well.
  • Support for multipart uploads for large files enabling better performance.
  • Multi-threaded uploads.  It uploads multiple files in parallel, providing better performance and allows you to utilize your full internet upload capacity if that’s your desire.  
  • Ability to tune the application, controlling the number of threads to open, size of files that should be broken apart and other parameters that allow you to throttle your network usage.
  • Drag-and-Drop.  This is nice because the UI “Upload” button allows only one directory or file to be highlighted at a time, which can be a real pain.  Drag-and-drop allows you to use the file manager to select multiple objects and drop them on the UI.  I grab multiple directories, drop them on the Glacier vault and folder I want them uploaded to, and press the start button.  If FastGlacier is already uploading, it just adds them to the queue.
  • A console based application allowing you to automate uploads using batch files as well as the ability to conduct command line folder synchronization.
  • Support for multiple AWS accounts.
  • The UI allows you to manage Glacier “Vaults” so you don’t have to go into the AWS management console once you are set up.
  • Queue management.  FastGlacier queues all of the jobs and keeps track of them, so you can keep dragging and dropping files onto it and retry failed jobs.
  • Queue management controls provide the ability to pause and restart activity.
  • Long-running requests, like a restore or inventory request, are picked up when they are ready.  You don’t need to keep the application running the entire time while waiting.  If you start the job, shut down the app and come back later, you can pick up what is waiting for you.
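
For the curious, the multipart idea from the list above can be sketched in a few lines of Python.  This is not FastGlacier's code; the function names and the sample file size are hypothetical.  It assumes Glacier's documented multipart rules: each part is 1 MiB multiplied by a power of two, only the last part may be smaller, and an upload may have at most 10,000 parts.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000


def choose_part_size(total_bytes, minimum=MIB):
    """Smallest doubling of `minimum` that keeps the upload within MAX_PARTS.

    `minimum` is assumed to be 1 MiB times a power of two.
    """
    part = minimum
    while part * MAX_PARTS < total_bytes:
        part *= 2
    return part


def plan_parts(total_bytes, part_size):
    """Return inclusive (start, end) byte ranges; only the last part may be short."""
    return [
        (start, min(start + part_size, total_bytes) - 1)
        for start in range(0, total_bytes, part_size)
    ]


if __name__ == "__main__":
    size = 10 * MIB + 123  # hypothetical file a bit over 10 MiB
    part = choose_part_size(size, minimum=4 * MIB)
    for start, end in plan_parts(size, part):
        print(f"bytes {start}-{end}")
```

Each range can be handed to a separate worker thread, which is roughly where the parallel upload speed comes from; a larger part size means fewer requests, while a smaller one means less to resend when a part fails.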


The one typical feature you won't find in FastGlacier is local encryption before upload.  iDrive and others do this on your PC before files are uploaded.  The advantage of this is that only YOU have the encryption keys to the data.  Glacier does encrypt the data with 256-bit AES encryption on the AWS servers, but the keys are held by Amazon.

For the content that I’m backing up, I don’t need encryption.  In fact, I don’t really want that feature for my photos.  The reason is that I don’t want to depend on one particular vendor’s software being available on the target machine when the time comes to restore, since most solutions do encryption differently.  There are several clients that will most likely be available to access Glacier as long as Glacier is around.  Worst case, I can always write my own utility (although I’ll probably task my son to create a utility for me).  If I do need to encrypt data before I upload, I’ll try to use a very basic method that will hopefully have a long shelf life.  If I need to pull something out of cold storage and it’s encrypted, I’ll need to ensure that I can decrypt it in the future.  If the tool is basic enough, I should be able to use it on new hardware and operating systems.  Open source tools may provide a better solution, allowing users to hold onto the source code should they need to build it on a future platform (whatever that might be).  GPG is one example of an open source tool that fits this category.
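
As a rough illustration of the "very basic method" idea, here is a small Python sketch that builds GPG command lines for passphrase-based (symmetric) encryption and the matching decryption.  The helper names and the archive name are hypothetical, it assumes `gpg` is installed, and AES256 is just one reasonable cipher choice, not a recommendation from Amazon or FastGlacier.

```python
import shlex


def gpg_encrypt_command(source, encrypted=None):
    """Command line for symmetric (passphrase) AES-256 encryption with GnuPG."""
    encrypted = encrypted or source + ".gpg"
    return ["gpg", "--cipher-algo", "AES256", "--output", encrypted,
            "--symmetric", source]


def gpg_decrypt_command(encrypted, restored):
    """Command line that reverses gpg_encrypt_command; gpg asks for the passphrase."""
    return ["gpg", "--output", restored, "--decrypt", encrypted]


if __name__ == "__main__":
    # Hypothetical archive name; the commands are printed, not executed.
    print(shlex.join(gpg_encrypt_command("photos-2013.tar")))
    print(shlex.join(gpg_decrypt_command("photos-2013.tar.gpg", "photos-2013.tar")))
```

Either list can be passed to `subprocess.run(..., check=True)` to actually run it.  Whatever tool you pick, do a test decrypt on a second machine before trusting it with your only copy.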

I suggest you play around with your preferred Glacier tool and run through some use-cases in a test vault before you go to “full production”.  Try installing the tool on a different PC and simulate a system crash recovery scenario.  
Hint: During a crash recovery scenario, just install FastGlacier and request a new set of vault inventories be sent down.  You won’t see any files to recover until you do this.

If you don’t like FastGlacier or you have a Mac, there are several other tools on the market that do the same thing.  If you have a Linux system, try using this method to back up to Glacier.


I’ll continue to use iDrive on all my home PCs for its security, speed and ease of use, but I’ll direct my large archive of images and music to Glacier.  I expect to save over $180/year in cloud storage costs by moving my media files to Glacier.

Moving In
Now, how do users move into AWS Glacier? You can upload your data over the net but if you need to move a terabyte or more, use the AWS moving truck. I'll cover Amazon Import/Export in my next article.


If I were a small business, I’d certainly consider AWS Glacier in place of off-site backups.  Managing tapes or other media can be a real headache (and costly).  You are hoping that the media is durable and readable should you need it, and that the hardware you created the media with is still operational.


For a small business, and certainly a home user, cloud backups alleviate the cost and management of off-site storage services for data, and restorations are just a click away.  With Amazon Glacier, people with lots of large data files that fit the use cases mentioned above can now access lower cost durable storage in the cloud.


- Christian Claborne

2 comments:

  1. I was able to run FastGlacier on Ubuntu using Wine and Mono (2.x) for Windows. Not as ideal as a native Linux app, but really simplifies things.

  2. http://fiftin.github.io/oblqo - good client for Glacier, written on C# and works under Mono on Linux. Tested on Ubuntu.
