Versioning data in S3 on AWS

One of the problem with Amazon’s S3 was the inability to take a “snapshot” of the state of S3 at anyAmazon Web Services given moment. This is one of the most important DR (disaster recovery) steps of any major upgrade which could potentially corrupt data during a release. Until now the applications using S3 would have had to manage versioning of data, but it seems Amazon has launched a versioning feature built into S3 itself to do this particular task. In addition to that, they have made it a requirement that delete operations on versioned data can only be done using MFA (Multi factor authentication).

Versioning allows you to preserve, retrieve, and restore every version of every object in an Amazon S3 bucket. Once you enable Versioning for a bucket, Amazon S3 preserves existing objects any time you perform a PUT, POST, COPY, or DELETE operation on them. By default, GET requests will retrieve the most recently written version. Older versions of an overwritten or deleted object can be retrieved by specifying a version in the request.

The way AWS Blog describes the feature, it looks like a version would be created every time an object is modified and each object in S3 could have different number of copies depending on the number of times it was modified.

This kind of reminds me of SVN/CVS like versioning control system and I wonder how long it will take for someone to build a source code versioning system on S3.

BTW, data requests to a versioned object is priced the same way as regular data, which basically means you are getting this feature for free.

References

3 comments

  1. Actually you’re not getting it for free as you have to pay for all the versions being stored on S3. Here’s what the AWS blog is saying:
    “Normal S3 pricing applies to each version of an object. You can store any number of versions of the same object, so you may want to implement some expiration and deletion logic if you plan to make use of this feature.”

    So depending on how many versions you need to store and supposing all your object versions will have about the same size, it’s strictly proportional:
    2 versions – pay twice the original amount for storage

    N versions – pay N times the amount for storage

  2. @Adrian: Thats true but if I understand that correctly its only for that particular object, not for your entire S3 store. Since they also talk about being able to configure how to set expiry, I’m hoping users could tune that to their satisfaction. Personally, I think for some providers the cost of storage is insignificant compared to risk of data corruption.

    But, I agree its not entirely free :)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>