ApacheCon NA 2010 Session

Getting Apache Hadoop Production Ready

At Yahoo!, we currently run several Apache Hadoop clusters of 4000 nodes each. Getting there has been a long journey over the past 12 months. This talk covers the challenges we faced, including scale (4000 nodes), multi-tenancy (different organizations and users sharing the same Apache Hadoop installations), varying workloads (batch jobs and SLA-sensitive jobs), and classes of clusters (research, production), along with the ways we have addressed them in Apache Hadoop HDFS and Apache Hadoop Map-Reduce. The aim is to tell the community at large about the chasm we have crossed with Apache Hadoop 0.20 at Yahoo!, and to reinforce the message that Apache Hadoop is ready for enterprise adoption.