ApacheCon US 2008 Session

A Tour of Apache Hadoop

Apache Hadoop is a rapidly growing project for building distributed data-processing systems. This talk will explain the rationale and use cases for several Hadoop components, including the distributed file system, MapReduce, and HBase (which provides Bigtable-like structured storage), along with an example of each. The emphasis will be on how the components relate to one another and how they can be used together to build highly scalable applications.