I’ve been hearing this word “sharding” more and more often, and it’s spreading like wildfire. Theo Schlossnagle, the author of “Scalable Internet Architectures”, argues that federation is a form of partitioning, and that sharding is nothing but a form of partitioning and federation. In fact, according to him, sharding has already been in use for a long time.
I’m not a DBA, and I don’t pretend to be one in my free time either, so to understand the differences I did some research and found some interesting posts.
Splitting up the user data so that User A exists on one server while User B exists on another server, each server now holds a shard of the data in this federated model.
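To make that concrete for myself, here is a minimal Python sketch of splitting users across servers. It assumes a simple hash-then-modulo rule, which is only one of several ways to pick a server, and the server names and the `shard_for_user` function are hypothetical, not taken from any of the posts I read.

```python
import hashlib

# Hypothetical server names; a real setup would hold connection details instead.
SHARDS = ["db-server-0", "db-server-1"]

def shard_for_user(user_id: str) -> str:
    """Map a user ID to one server using a stable hash of the ID."""
    # md5 is used only to get a hash that is stable across runs, not for security.
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same user always lands on the same server.
print(shard_for_user("user_a"))
print(shard_for_user("user_b"))
```

The catch with a fixed rule like this is that adding a server changes where most users map to, which is probably one reason a lookup service can be attractive instead.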
A couple of months ago Highscalability.com picked it up and made it sound (probably unintentionally) as if sharding is actually different from federation and partitioning. Todd’s post also points at Flickr using sharding. The search for Flickr’s architecture led me to Colin Charles’ post about Federation at Flickr: A tour of the Flickr architecture, where he does mention shards as a key component of federation. Again, no mention of sharding being anything new.
Federation Key Components:
- Shards: My data gets stored on my shard, but the record of performing action on your comment, is on your shard. When making a comment on someone else’s blog
- Global Ring: It’s like DNS: you need to know where to go and who controls where you go. On every page view, calculate where your data is at that moment in time.
- PHP logic to connect to the shards and keep the data consistent (10 lines of code with comments!)
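The three components above can be sketched roughly as follows. This is my own illustration in Python, not Flickr’s PHP code: the `GlobalRing` and `Federation` classes, their methods, and the in-memory lists standing in for databases are all hypothetical. It also assumes one reading of the shard description, namely that a comment is stored on the blog owner’s shard while a record of the action is kept on the commenter’s shard.

```python
class GlobalRing:
    """Lookup table from user to shard, consulted on every request (like DNS)."""

    def __init__(self):
        self._user_to_shard = {}

    def assign(self, user, shard_id):
        # Re-assigning a user models moving their data to another shard.
        self._user_to_shard[user] = shard_id

    def lookup(self, user):
        return self._user_to_shard[user]


class Federation:
    """Routes writes to shards; each shard is a plain list standing in for a database."""

    def __init__(self, ring, shard_count):
        self.ring = ring
        self.shards = {i: [] for i in range(shard_count)}

    def add_comment(self, commenter, blog_owner, text):
        # Assumption: the comment lives with the blog owner's data,
        # and the commenter's shard keeps a record of the action.
        owner_shard = self.ring.lookup(blog_owner)
        commenter_shard = self.ring.lookup(commenter)
        self.shards[owner_shard].append(("comment", blog_owner, commenter, text))
        self.shards[commenter_shard].append(("activity", commenter, blog_owner, text))


ring = GlobalRing()
ring.assign("alice", 0)
ring.assign("bob", 1)

fed = Federation(ring, shard_count=2)
fed.add_comment("alice", "bob", "Nice photo")
print(fed.shards)
```

Note that the two writes here are not atomic. Keeping both shards consistent when one write fails is presumably the job of the connection logic in the last bullet, and this sketch does not attempt it.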
Based on the discussions on these and other blogs, “shards” sounds more like a term for the fragments of data that are federated across multiple databases than an architecture in itself. I think Theo Schlossnagle has a valid argument. If any of you disagree, I’m interested to hear what you have to say. A clearer distinction between sharding and federation would be very helpful as well.
Here are more references to Shard/Sharding.
- Hibernate Shards: Hibernate Shards allows you to continue using the Hibernate APIs you know and love: SessionFactory, Session, Criteria, Query. If you already know how to use Hibernate, you already know how to use Hibernate Shards.
- Scaling Digg… Shards and DB: Another discussion on Digg here.
- Notes on sharding, unique keys, foreign keys
- Sharding databases with MySQL
- Database sharding helps high-traffic sites