Showing posts from September 28, 2007

Scalable products: KFS released

Kosmix, a search startup, has released the source for a C++ implementation of what looks like a clustered file system. It looks very similar to Hadoop/HDFS, though being written in C++ may give it a performance advantage. From the Skrenta blog:

Incremental scalability - New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes.

Availability - Replication is used to provide availability due to chunk server failures.

Re-balancing - Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is done to help with balancing disk space utilization amongst nodes.

Data integrity - To handle disk corruptions to data blocks, data blocks are checksummed. Checksum verification is done on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk.

Client side fail-over - During reads, if the client library determines that the chunkserver it is communicating with is unreacha
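To make the data integrity item above concrete, here is a minimal sketch in Python of checksum-on-read with repair from a healthy replica. It is not KFS code: the `Replica` class, the `read_chunk` function, the 4-byte block size and the choice of CRC32 are all assumptions made for illustration, and the real system may use a different checksum and recovery path.

```python
import zlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


def block_checksums(data):
    """Return one CRC32 value per fixed-size block of `data`."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


class Replica:
    """Hypothetical stand-in for one chunkserver's copy of a chunk."""

    def __init__(self, data):
        self.data = bytearray(data)
        # Checksums are recorded once, when the chunk is written.
        self.checksums = block_checksums(bytes(data))

    def is_intact(self):
        """Recompute the block checksums and compare with the stored ones."""
        return block_checksums(bytes(self.data)) == self.checksums


def read_chunk(replicas):
    """Return data from the first replica that verifies.

    Any replica that fails verification is overwritten from the good copy,
    which stands in for the re-replication step described in the post.
    """
    for good in replicas:
        if good.is_intact():
            for other in replicas:
                if not other.is_intact():
                    other.data = bytearray(good.data)
                    other.checksums = list(good.checksums)
            return bytes(good.data)
    raise IOError("no intact replica available")
```

A short usage example: create three replicas, corrupt one byte in the first, and read. The read is served from an intact copy and the damaged replica is repaired along the way.

```python
replicas = [Replica(b"hello world") for _ in range(3)]
replicas[0].data[2] ^= 0xFF          # simulate disk corruption
data = read_chunk(replicas)          # verifies, falls back, and repairs
```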