Scalable products: KFS released

Kosmix, a search startup, has released the source of a C++ implementation of what looks like a clustered file system. It looks very similar to Hadoop/HDFS, but being written in C++ could give it a performance edge.

From Skrenta's blog:

    • Incremental scalability – New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes.
    • Availability – Replication is used to provide availability in the face of chunkserver failures.
    • Re-balancing – Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is done to help with balancing disk space utilization amongst nodes.
    • Data integrity – To handle disk corruptions to data blocks, data blocks are checksummed. Checksum verification is done on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk.
    • Client side fail-over – During reads, if the client library determines that the chunkserver it is communicating with is unreachable, the client library will fail-over to another chunkserver and continue the read. This fail-over is transparent to the application.
    • Language support – The KFS client library can be accessed from C++, Java, and Python.
    • FUSE support on Linux – Mounting KFS via FUSE allows existing Linux utilities (such as ls) to interface with KFS.
    • Leases – The KFS client library uses caching to improve performance. Leases are used to support cache consistency.
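
To make the data integrity and client-side fail-over points a bit more concrete, here is a minimal sketch of how a checksummed read with fail-over might work. This is not the KFS API: `ChunkServer`, `ChecksumMismatch`, and `read_with_failover` are hypothetical names invented for illustration, the "chunkservers" are plain in-memory objects, CRC32 is an assumed checksum, and the repair step is done by the reader here purely to keep the example short (the list above does not say which component performs re-replication).

```python
import zlib


class ChecksumMismatch(Exception):
    """Raised when a stored block no longer matches its checksum."""


class ChunkServer:
    """Hypothetical in-memory stand-in for a chunkserver (not the KFS API)."""

    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.blocks = {}  # block_id -> (data, crc32 recorded at write time)

    def write(self, block_id, data):
        # Store the block together with its checksum.
        self.blocks[block_id] = (data, zlib.crc32(data))

    def read(self, block_id):
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        data, stored_crc = self.blocks[block_id]
        # Verify the checksum on every read.
        if zlib.crc32(data) != stored_crc:
            raise ChecksumMismatch(f"{self.name}: block {block_id} is corrupt")
        return data


def read_with_failover(replicas, block_id):
    """Try each replica in order; return (data, name of the server used).

    Unreachable servers are skipped. Replicas that fail checksum
    verification are overwritten from the first good copy found
    (a simplified stand-in for re-replication).
    """
    corrupt = []
    for server in replicas:
        try:
            data = server.read(block_id)
        except ConnectionError:
            continue  # fail over to the next replica
        except ChecksumMismatch:
            corrupt.append(server)
            continue
        for bad in corrupt:
            bad.write(block_id, data)  # repair the corrupted copy
        return data, server.name
    raise OSError(f"no healthy replica for block {block_id}")


# Three replicas of one block: the first is down, the second is corrupted.
servers = [ChunkServer("cs1"), ChunkServer("cs2"), ChunkServer("cs3")]
for s in servers:
    s.write("b0", b"hello kfs")
servers[0].reachable = False
_, old_crc = servers[1].blocks["b0"]
servers[1].blocks["b0"] = (b"hellX kfs", old_crc)  # simulate disk corruption

data, used = read_with_failover(servers, "b0")
```

In this toy run the caller just gets its data back from the third replica and never sees the failures, which is the "transparent to the application" part, and the corrupted copy on the second server is rewritten along the way.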

If anyone has experience with KFS, or has more information, please leave a comment here.
