We discussed Brewer's Theorem a few days ago and how challenging it is to obtain Consistency, Availability and Partition tolerance in any distributed system. We also discussed that many distributed datastores allow CAP to be tweaked to attain certain operational goals.
Amazon SimpleDB, which was released as an "Eventually Consistent" datastore, today launched a few features to do just that.
- Consistent reads: Select and GetAttributes requests now include an optional Boolean flag "ConsistentRead" which asks the datastore to return consistent results only. If you have noticed scenarios where a read right after a write returned an old value, that shouldn't happen anymore.
- Conditional put/puts, delete/deletes: By providing "conditions" in the form of a key/value pair, SimpleDB can now conditionally execute or discard an operation. This might look like a minor feature, but it can go a long way toward providing reliable datastore operations.
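To make the conditional-operation semantics concrete, here is a minimal in-memory sketch. This is plain Python, not the SimpleDB API; the `Domain` class and method names are my own invention, purely to illustrate how a conditional put applies a write only when an expected attribute value still holds:

```python
# Illustrative in-memory model of SimpleDB-style conditional put semantics.
# (Hypothetical names; real SimpleDB is accessed via its HTTP Query API.)

class Domain:
    def __init__(self):
        self.items = {}  # item_name -> {attribute: value}

    def conditional_put(self, item, attrs, expected_name=None, expected_value=None):
        """Apply `attrs` to `item`, but only if the item's current value
        for `expected_name` equals `expected_value` (when a condition is given)."""
        current = self.items.setdefault(item, {})
        if expected_name is not None and current.get(expected_name) != expected_value:
            return False  # condition failed; the operation is discarded
        current.update(attrs)
        return True

d = Domain()
d.conditional_put("job42", {"status": "queued", "version": "1"})

# Succeeds: version is still "1", so this writer wins the race.
ok = d.conditional_put("job42", {"status": "running", "version": "2"},
                       expected_name="version", expected_value="1")

# Fails: version has moved on to "2", so this stale writer is rejected.
stale = d.conditional_put("job42", {"status": "running", "version": "2"},
                          expected_name="version", expected_value="1")
```

The version-counter pattern shown here is why the feature matters: it gives you optimistic concurrency control, so two clients racing on the same item cannot silently clobber each other's writes.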
Even though SimpleDB now enables operations that support a stronger consistency model, under the covers SimpleDB remains the same highly scalable, highly available, and highly durable structured data store. Even under extreme failure scenarios, such as complete datacenter failures, SimpleDB is architected to continue to operate reliably. However, when one of these extreme failure conditions occurs, the stronger consistency options may briefly be unavailable while the software reorganizes itself to ensure that it can provide strong consistency. Under those conditions the default, eventually consistent read remains available to use.
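That fallback behavior is easy to build into application code. Below is a sketch of the idea, assuming a hypothetical `read_fn` standing in for whatever client call performs the actual GetAttributes request, and a made-up exception type for the "strong consistency briefly unavailable" case:

```python
class StrongConsistencyUnavailable(Exception):
    """Signals that the datastore temporarily cannot serve consistent reads."""

def read_with_fallback(read_fn, item_name):
    # Prefer a consistent read; if strong consistency is briefly
    # unavailable (e.g. during a datacenter failure), fall back to the
    # default eventually consistent read instead of failing outright.
    try:
        return read_fn(item_name, consistent_read=True)
    except StrongConsistencyUnavailable:
        return read_fn(item_name, consistent_read=False)

# Fake reader standing in for a real GetAttributes call:
def flaky_read(item_name, consistent_read):
    if consistent_read:
        raise StrongConsistencyUnavailable()
    return {"item": item_name, "status": "possibly-stale"}

result = read_with_fallback(flaky_read, "job42")
```

The design choice here mirrors the service's own: degrade to eventual consistency rather than become unavailable.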
The initial version of this tool used a MySQL database. The original application architecture was very simple, and other than the database it could have scaled horizontally. Over the weekend I played a little with SimpleDB and was able to convert my code to use it in a matter of hours.
Here are some things I observed during my experimentation:
- It's not a relational database.
- You can't do joins in the database. If joins are needed, they have to be done in the application, which can be very expensive.
- De-normalizing data is recommended.
- Schemaless: You can add new columns (which are actually just new row attributes) anytime you want.
- You have to create your own unique row identifiers; SimpleDB doesn't have a concept of auto-increment.
- All attributes are auto-indexed. I think in Google App Engine you had to specify which columns need indexing. I'm wondering if this increases the cost of using SimpleDB.
- Data is automatically replicated across Amazon's huge SimpleDB cloud, but they only guarantee something called "Eventual Consistency", which means data that is "put" into the system is not guaranteed to be visible in the next "get".
- I couldn't find a GUI-based tool to browse my SimpleDB the way some S3 browsers do. I'm sure someone will come up with something soon. [Updated: Jeff blogged about some SimpleDB tools here]
- There are limits imposed by SimpleDB on the amount of data you can store and query. Look at the table below.
| Limit | Value |
|---|---|
| Active domains | 100 |
| Size of domains | |
| Attributes per domain | |
| Attributes per item | |
| Size per attribute | |
| Items returned in a query response | |
| Seconds a query may run | |
| Attribute names per query predicate | 1 |
| Comparisons per predicate | |
| Predicates per query expression | |
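On the "no auto-increment" point above: item names have to be generated client-side. A minimal sketch of one common approach, using random UUIDs (the helper name is my own):

```python
import uuid

def new_item_name() -> str:
    # SimpleDB offers no auto-increment, so the client mints its own
    # unique item names; random UUIDs need no central coordination,
    # which fits a replicated, eventually consistent store.
    return uuid.uuid4().hex  # 32 lowercase hex characters

a, b = new_item_name(), new_item_name()
```

Unlike a sequence counter, UUIDs can be generated concurrently by any number of application servers without ever contacting the datastore first.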
Other related discussions (do check out CouchDB)