Posted on January 25, 2010 at 11:45 pm by Royans
Hive is a data warehouse infrastructure built on top of Hadoop. It provides tools for easy data ETL, a mechanism to impose structure on the data, and the ability to query and analyze large data sets stored in Hadoop files. Hive defines a simple SQL-like query language, called QL, that enables users familiar with SQL [...]
hadoop, highavailability, hive, product, scalability, scaledatabase, datawarehouse, facebook
Posted on September 28, 2007 at 9:41 am by Royans
Kosmix, a search startup, has released the source code for a C++ implementation of what looks like a clustered file system. It looks very similar to Hadoop/HDFS, but being written in C++ should give it a big performance boost.
From Skrenta's blog:
Incremental scalability – New chunkserver nodes can be added as storage needs increase; the system automatically adapts to the [...]
Posted on August 4, 2007 at 6:23 pm by Royans
This may not be a surprise to a lot of people, but it was to me. Even though I have been using Lucene and Nutch for some time, I didn’t really know enough about Hadoop and HBase until recently.
Hadoop
Scalable: Hadoop can reliably store and process petabytes.
Economical: It distributes the data and processing across clusters [...]