More on Amazon S3 versioning (webinar)

If you missed the AWS S3 versioning webcast, I have a copy of the video here. Here are the highlights:


  • You can enable and disable this at the bucket level
  • They don’t think there is a performance penalty for turning versioning on (though it seems obvious that S3 would be doing slightly more work to figure out which is the latest version of any object you have)
  • There isn’t any additional charge for the versioning feature itself, but you do pay for storing the extra copies of each object.
  • MFA (multi-factor authentication) for deleting objects is not mandatory when versioning is turned on. It has to be enabled separately. This was slightly confusing in the original email I got from AWS.
  • If you are planning to use this, please watch this video. There is a part where they explain what happens if you disable versioning after using the feature. This is something you might like to know about.
  • They use a GUID to identify each version of an object
  • You can iterate over objects and figure out how many versions you have for each object, but currently it’s not possible to ask S3 for all objects which have versions older than date X. This is important if you are planning to do garbage collection (cleaning up older copies of data) at a later time.
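Since the cleanup has to happen on the client side, here is a minimal sketch of what that iteration could look like once you have the version listing in hand. It assumes each record is a dict with the fields `Key`, `VersionId`, `IsLatest` and `LastModified` (modeled on what an S3 version listing returns); the function names `count_versions` and `stale_versions` are made up for this example, and the call that fetches the listing from S3 is not shown.

```python
from datetime import datetime, timedelta, timezone


def count_versions(versions):
    """Count how many versions exist for each object key."""
    counts = {}
    for v in versions:
        counts[v["Key"]] = counts.get(v["Key"], 0) + 1
    return counts


def stale_versions(versions, cutoff):
    """Return (key, version_id) pairs for non-current versions modified before cutoff."""
    stale = []
    for v in versions:
        if v["IsLatest"]:
            continue  # never collect the current version of an object
        if v["LastModified"] < cutoff:
            stale.append((v["Key"], v["VersionId"]))
    return stale


# Hand-made sample records (assumed shape, not real S3 output).
now = datetime(2010, 3, 1, tzinfo=timezone.utc)
sample = [
    {"Key": "a.txt", "VersionId": "v3", "IsLatest": True, "LastModified": now},
    {"Key": "a.txt", "VersionId": "v2", "IsLatest": False, "LastModified": now - timedelta(days=10)},
    {"Key": "a.txt", "VersionId": "v1", "IsLatest": False, "LastModified": now - timedelta(days=90)},
    {"Key": "b.txt", "VersionId": "v1", "IsLatest": True, "LastModified": now - timedelta(days=200)},
]
cutoff = now - timedelta(days=30)

print(count_versions(sample))
print(stale_versions(sample, cutoff))
```

Note that the filter uses the date a version was written, not the date it was superseded, so treat the cutoff as approximate if that difference matters to you.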

More References

Windows Azure

Windows Azure is an application platform provided by Microsoft to allow others to run applications on Microsoft’s “cloud” infrastructure. It’s finally open for business (as of Feb 1, 2010). Below are some links about Azure for those who are still catching up.

Wikipedia: Windows Azure has three core components: Compute, Storage and Fabric. As the names suggest, Compute provides computation environment with Web Role and Worker Role while Storage focuses on providing scalable storage (Blobs, Tables, Queue) for large scale needs.

The hosting environment of Windows Azure is called the Fabric Controller – which pools individual systems into a network that automatically manages resources, load balancing, geo-replication and application lifecycle without requiring the hosted apps to explicitly deal with those requirements.[3] In addition, it also provides other services that most applications require — such as the Windows Azure Storage Service that provides applications with the capability to store unstructured data such as binary large objects, queues and non-relational tables.[3] Applications can also use other services that are a part of the Azure Services Platform.

The real concerns about Cloud infrastructure (as it is today)

While “private clouds may not be the future” they are definitely needed today. Here are some of the top issues bothering organizations which have been thinking about going into the cloud. Some of these issues are based on Craig Bolding’s talk on “Guide to cloud security”.

  • Unlike your own data center, you will never know what the cloud vendors are running, how they back up, or what their DR plans are. They will say you shouldn’t care, but do you remember what happened to the T-Mobile customers on Danger?
  • Uptime, availability and responsiveness are less predictable than in a self-hosted environment. In most cases the cloud vendors may not even choose to let customers know about major maintenance if they don’t anticipate any issues. Organizations that manage their own infrastructure would always try to avoid making two major, interdependent changes at the same time.
  • Multi-Tenancy means you may have to worry about a noisy neighbor.
  • Multi-Tenancy could also lead to interesting issues which were never thought about before. What if there was a way to do an “injection attack”? Depending on how Multi-Tenancy is implemented, you could potentially touch other customers’ data.
  • Infrastructure and platform lock-in issues are worrying for many organizations who are thinking long term. Most cloud vendors don’t really have a long history to show their track record.
  • Change control and detailed change logs are missing.
  • Individual customers don’t have much decision-making power on what a vendor should do next. In a privately hosted environment the stakeholders are asked before something is done, but in a larger infrastructure, you are a small fish in a huge pond.
  • Most cloud vendors have multiple layers of cloud infrastructure dependent on each other. It’s hard to understand how issues around one type of cloud could impact others. This is especially true from a security viewpoint. A bad flaw in a lower layer of the architecture could impact all other platforms built over it.
  • Moving applications to cloud means dealing with a different style of programming designed for horizontal scalability, data consistency issues, health monitoring, load balancing, managing state, etc.
  • Identity management is still in its early stages. Integration with corporate identity management infrastructure would be important to make it easy for individuals from large organizations to work on external clouds.
  • Who takes care of scrubbing disks when data is moved around? What about data on backup tapes? This is very important for applications handling highly sensitive data.
  • Just like credit card fraud, one has to worry about CPU time fraud. Is the current billing and reporting good enough to help large organizations figure out what is real and what could be fraud? They need a real-time fraud detection mechanism. And what about loss of service due to DoS attacks? Who pays for that?
  • Need a better mechanism to bill large corporations.
  • On the non-technical side, there are a lot of questions related to SLAs, compliance issues, terms of service, legal issues around cross-border services, and even questions about whether law enforcement has a different set of rules when search and seizure is required.
  • Not too far from being another form of “outsourcing”.
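To make the “CPU time fraud” point a little more concrete, here is a rough sketch of the kind of check a customer could run over their own usage reports. It is only an illustration under assumptions: the input is a plain list of hourly CPU-hour figures, and the function name `flag_usage_spikes` and its `window`, `threshold` and `min_sigma` parameters are invented for this example. No cloud vendor’s billing API is involved.

```python
from statistics import mean, pstdev


def flag_usage_spikes(cpu_hours, window=24, threshold=3.0, min_sigma=1.0):
    """Return indexes of samples far above the trailing average.

    A sample is flagged when it exceeds the mean of the previous
    `window` samples by more than `threshold` standard deviations.
    `min_sigma` keeps a perfectly flat history from flagging tiny bumps.
    """
    flagged = []
    for i in range(window, len(cpu_hours)):
        history = cpu_hours[i - window:i]
        limit = mean(history) + threshold * max(pstdev(history), min_sigma)
        if cpu_hours[i] > limit:
            flagged.append(i)
    return flagged


# Made-up hourly usage with one obvious spike at index 9.
usage = [10, 11, 9, 10, 12, 10, 11, 9, 10, 95, 10, 11]
print(flag_usage_spikes(usage, window=6, threshold=3.0))
```

A trailing window like this is crude (one spike inflates the average for the hours that follow), but it shows why per-hour usage data, and not just a monthly invoice, is needed before anything like real-time detection is possible.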

Photo credit: akakumo

New Talks and Slides links from Aug 5 2007

If you haven’t seen these links before, you should check this page first: “Talks and slides from web architects”. But if you have already seen that page, here are the updates from last week.

  PDF Case for Shared Nothing
  PDF The Chubby Lock Service for Loosely-Coupled Distributed Systems
    Building Highly Scalable Web Applications
1/1/2006 Slides The Ebay architecture
1/1/2007 Slides PHP & Performance
4/20/2007 Video Brad Fitzpatrick – Behind the Scenes at LiveJournal: Scaling Storytime
5/4/2006   Scalable computing with Hadoop
6/3/2007   Hadoop Map/Reduce
8/3/2007   Introduction to Hadoop
6/1/2007 Slides Hadoop distributed file system
    Yahoo experience with Hadoop
7/25/2007   Meet Hadoop
8/3/2007 webpage The Hadoop Distributed File System: Architecture and Design
7/25/2007 Blog Yahoo’s Hadoop Support
7/18/2007 Blog Running Hadoop MapReduce on Amazon EC2 and Amazon S3
8/3/2007   Interpreting the Data: Parallel Analysis with Sawzall
10/18/2005 Video BigTable: A Distributed Structured Storage System
1/1/2006 PDF Bigtable: A Distributed Storage System for Structured Data
1/1/2004 PDF MapReduce: Simplified Data Processing on Large Clusters
1/1/2003 PDF Google File System
8/3/2007 PDF ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
8/3/2007 PDF SEDA: An Architecture for well conditioned scalable internet services
8/3/2007 PDF A scalable architecture for Global web service hosting service
6/23/2007 Blog Getting Started with Drupal
6/23/2007 Blog 4 Problems with Drupal

Talks and slides from various web architects

For the latest set of links, go here.

This is a collection of various slides, PDFs and videos about designing scalable websites that I have collected over time. If you have something interesting which might go in here, please let me know.

Date Type Title
6/23/2007 Blog Getting Started with Drupal
6/23/2007 Blog 4 Problems with Drupal
6/23/2007 Video Seattle Conference on Scalability: MapReduce Used on Large Data Sets
6/23/2007 Video Seattle Conference on Scalability: Scaling Google for Every User
6/23/2007 Video Seattle Conference on Scalability: VeriSign’s Global DNS Infrastructure
6/23/2007 Video Seattle Conference on Scalability: YouTube Scalability
6/23/2007 Video Seattle Conference on Scalability: Abstractions for Handling Large Datasets
6/23/2007 Video Seattle Conference on Scalability: Building a Scalable Resource Management
6/23/2007 Video Seattle Conference on Scalability: SCTP’s Reliability and Fault Tolerance
6/23/2007 Video Seattle Conference on Scalability: Lessons In Building Scalable Systems
6/23/2007 Video Seattle Conference on Scalability: Scalable Test Selection Using Source Code
6/23/2007 Video Seattle Conference on Scalability: Lustre File System
6/9/2007 Slides Technology at Digg.com
6/9/2007 Blog Extreme Makeover: Database or MySQL@YouTube
4/26/2007 Blog Mysql at Google
4/1/2007 Slides Scaling Twitter
4/1/2007 Slides How we build Vox
4/1/2007 Slides High Performance websites
4/1/2007 Slides Beyond the file system design
4/1/2007 Slides Scalable web architectures
3/1/2007 Slides Scalability set Amazon’s servers on fire not yours
3/1/2007 Slides Hardware layouts for LAMP installations
3/1/2007 Video Mysql scaling and high availability architectures
3/1/2007 Audio Lessons from Building world’s largest social music platform
3/1/2007 PDF Lessons from Building world’s largest social music platform
3/1/2007 Slides Lessons from Building world’s largest social music platform
11/1/2006 PDF Livejournal’s backend: history of scaling
11/1/2006 Slides Livejournal’s backend: history of scaling
11/1/2006 Slides Scalable Web Architectures (w/ Ruby and Amazon S3)
10/26/2006 Blog Yahoo! bookmarks uses symfony
7/26/2006 Slides Getting Rich with PHP 5
7/26/2006 Audio Getting Rich with PHP 5
3/7/2006 Blog Scaling Fast and Cheap – How We Built Flickr
3/1/2005 News Open source helps Flickr share photos
  Slides Flickr and PHP
  Slides Wikipedia: Cheap and explosive scaling with LAMP
  Blog YouTube Scalability Talk
    High Order Bit: Architecture for Humanity
  PDF Mysql and Web2.0 companies
8/3/2007   Building Highly Scalable Web Applications
8/3/2007   Introduction to Hadoop
8/3/2007 webpage The Hadoop Distributed File System: Architecture and Design
8/3/2007   Interpreting the Data: Parallel Analysis with Sawzall
8/3/2007 PDF ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
8/3/2007 PDF SEDA: An Architecture for well conditioned scalable internet services
8/3/2007 PDF A scalable architecture for Global web service hosting service
7/25/2007   Meet Hadoop
7/25/2007 Blog Yahoo’s Hadoop Support
7/18/2007 Blog Running Hadoop MapReduce on Amazon EC2 and Amazon S3
6/22/2007   LH*RSP2P : A Scalable Distributed Data Structure for P2P Environment
6/12/2007   Scaling the Internet routing table with Locator/ID Separation Protocol (LISP)
6/3/2007   Hadoop Map/Reduce
6/1/2007 Slides Hadoop distributed file system
4/20/2007 Video Brad Fitzpatrick – Behind the Scenes at LiveJournal: Scaling Storytime
2/1/2007 Slides Inside LiveJournal’s Backend (April 2004)
2/1/2007 Slides How to scale
1/23/2007   Testing Oracle 10g RAC Scalability
1/1/2007 Slides PHP & Performance
12/22/2006   SQL Performance Optimization
10/13/2006   Building_a_Scalable_Software_Security_Practice
5/31/2006   Building Large Systems at Google
5/4/2006   Scalable computing with Hadoop
1/1/2006 Slides The Ebay architecture
1/1/2006 PDF Bigtable: A Distributed Storage System for Structured Data
1/1/2006 PDF Fault-Tolerant and scalable TCP splice and web server architecture
10/18/2005 Video BigTable: A Distributed Structured Storage System
1/1/2004 PDF MapReduce: Simplified Data Processing on Large Clusters
8/3/2003 PDF Google Cluster architecture
1/1/2003 PDF Google File System
11/1/2002 Doc Implementing a Scalable Architecture
10/30/2001 News How linux saved Millions for Amazon
    Yahoo experience with Hadoop
  Slides Scalable web application using Mysql and Java
  Slides Friendster: scaling for 1 Billion Queries per day
  Blog Lightweight web servers
  PDF Mysql Scale out by application partitioning
  PDF Replication under scalable hashing: A family of algorithms for Scalable decentralized data distribution
  Product Clustered storage revolution
  Blog Early Amazon Series
  Web Wikimedia Server info
  Slides Wikimedia Architecture
  Slides MySpace presentation
  PDF A scalable and fault-tolerant architecture for distributed web resource discovery
8/4/2007 PDF The Chubby Lock Service for Loosely-Coupled Distributed Systems
8/5/2007 Slides Real world Mysql tuning
8/5/2007 Slides Real world Mysql performance tuning
8/5/2007 Slides Learning MogileFS: Building scalable storage system
8/5/2007 Slides Reverse Proxy and Webserver
8/5/2007 PDF Case for Shared Nothing
7/1/2007 Slides A scalable stateless proxy for DBI
1/1/2006 Slides Real world scalability web builder 2006
8/5/2005 Slides Real world web scalability