September 28, 2007

Scalable products: KFS released

Kosmix, a search startup, has released the source code to a C++ implementation of what looks like a clustered file system. It looks very similar to Hadoop/HDFS, but being written in C++ could give it a big performance boost.

From the Skrenta blog:



    • Incremental scalability - New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes.

    • Availability - Replication is used to keep data available when chunkservers fail.

    • Re-balancing - Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is done to help with balancing disk space utilization amongst nodes.

    • Data integrity - To handle disk corruption of data blocks, data blocks are checksummed. Checksums are verified on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk.

    • Client side fail-over - During reads, if the client library determines that the chunkserver it is communicating with is unreachable, the client library will fail-over to another chunkserver and continue the read. This fail-over is transparent to the application.

    • Language support - The KFS client library can be accessed from C++, Java, and Python.

    • FUSE support on Linux - Mounting KFS via FUSE lets existing Linux utilities (such as ls) interface with KFS.

    • Leases - The KFS client library uses caching to improve performance, and leases are used to keep those caches consistent.
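The data-integrity and client-side fail-over items together describe a read path: verify each block's checksum, and move on to another replica when a chunkserver is unreachable or returns corrupt data. Here is a minimal Python sketch of that idea, under loudly stated assumptions. The `Replica` class, `make_replica`, `verify`, `read_chunk` and the tiny `BLOCK_SIZE` are all hypothetical; Adler-32 from Python's `zlib` is only a stand-in for whatever checksum KFS actually uses; and the real client library talks to chunkservers over the network rather than reading in-memory lists. Collecting the names of corrupt replicas stands in for handing them off for re-replication.

```python
import zlib
from dataclasses import dataclass

# Hypothetical, deliberately tiny block size so the example is easy to trace.
BLOCK_SIZE = 4


class ChunkserverUnreachable(Exception):
    """Raised when a (simulated) chunkserver cannot be contacted."""


class ChecksumMismatch(Exception):
    """Raised when a block's stored checksum does not match its data."""


@dataclass
class Replica:
    """One chunkserver's copy of a chunk (hypothetical in-memory model)."""
    name: str
    blocks: list       # list of bytes objects, one per data block
    checksums: list    # checksum of each block, computed at write time
    reachable: bool = True


def make_replica(name, data, block_size=BLOCK_SIZE):
    """Split data into blocks and checksum each one, as a writer would."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return Replica(name, blocks, [zlib.adler32(b) for b in blocks])


def verify(replica):
    """Check every block of a replica; raise on the first problem found."""
    if not replica.reachable:
        raise ChunkserverUnreachable(replica.name)
    for block, expected in zip(replica.blocks, replica.checksums):
        if zlib.adler32(block) != expected:
            raise ChecksumMismatch(replica.name)


def read_chunk(replicas):
    """Return (data, serving replica name, names of corrupt replicas).

    Replicas are tried in order. An unreachable chunkserver or a checksum
    mismatch makes the reader fail over to the next replica; corrupt
    replicas are collected so they could be handed off for re-replication.
    """
    corrupt = []
    for replica in replicas:
        try:
            verify(replica)
        except ChunkserverUnreachable:
            continue  # fail over silently: the caller never sees this error
        except ChecksumMismatch:
            corrupt.append(replica.name)
            continue
        return b"".join(replica.blocks), replica.name, corrupt
    raise OSError("no healthy replica of this chunk is available")
```

In this sketch, a caller passing three replicas where the first server is down and the second has one corrupted block still gets the original bytes, served by the third replica, plus the name of the corrupt replica. Verifying every block before returning keeps the example short; a real client would more likely verify and stream block by block.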




If anyone has experience with KFS or has more information, please leave a comment here.
