Scalable products: KFS released
Kosmix, a search startup, has released the source code for a C++ implementation of what looks like a clustered file system, called KFS. It looks very similar to Hadoop/HDFS, but the fact that it is written in C++ could be a big performance boost.
From Skrenta's blog:
- Incremental scalability - New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the new nodes.
- Availability - Replication is used to provide availability in the face of chunkserver failures.
- Re-balancing - Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is done to help with balancing disk space utilization amongst nodes.
- Data integrity - To handle disk corruptions to data blocks, data blocks are checksummed. Checksum verification is done on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk.
- Client side fail-over - During reads, if the client library determines that the chunkserver it is communicating with is unreachable, the client library will fail-over to another chunkserver and continue the read. This fail-over is transparent to the application.
- Language support - KFS client library can be accessed from C++, Java, and Python.
- FUSE support on Linux - By mounting KFS via FUSE, this support allows existing Linux utilities (such as ls) to interface with KFS.
- Leases - KFS client library uses caching to improve performance. Leases are used to support cache consistency.
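To make the data integrity and client side fail-over items a bit more concrete, here is a rough Python sketch of how a read could verify a checksum and move on to another replica. This is not the KFS client API: `ToyChunkServer`, `ChunkServerDown`, and `read_block` are hypothetical names made up for illustration, the "servers" are just in-memory objects, and CRC32 is only an assumed checksum, since the feature list does not say which one KFS uses.

```python
import zlib


class ChunkServerDown(Exception):
    """Raised by a toy replica that cannot be reached."""


class ToyChunkServer:
    """Hypothetical in-memory stand-in for a chunkserver (not the KFS API)."""

    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.blocks = {}  # block_id -> (data, checksum)

    def write(self, block_id, data):
        # Store the checksum alongside the data at write time.
        self.blocks[block_id] = (data, zlib.crc32(data))

    def read(self, block_id):
        if not self.reachable:
            raise ChunkServerDown(self.name)
        return self.blocks[block_id]


def read_block(replicas, block_id):
    """Return (data, server_name) from the first replica that is
    reachable and whose data matches its stored checksum."""
    for server in replicas:
        try:
            data, stored_checksum = server.read(block_id)
        except ChunkServerDown:
            continue  # fail over to the next replica
        if zlib.crc32(data) != stored_checksum:
            # Corrupted copy: skip it. A real system would also
            # schedule re-replication of this chunk.
            continue
        return data, server.name
    raise OSError(f"no healthy replica for block {block_id!r}")


if __name__ == "__main__":
    servers = [ToyChunkServer(n) for n in ("cs1", "cs2", "cs3")]
    for s in servers:
        s.write("b0", b"hello")

    servers[0].reachable = False                           # unreachable replica
    servers[1].blocks["b0"] = (b"jello", zlib.crc32(b"hello"))  # corrupted replica

    print(read_block(servers, "b0"))
```

The caller only ever sees the data and never learns that two replicas were skipped, which is the sense in which the fail-over is transparent to the application.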
If anyone has experience with KFS or has more information, please leave a comment here.