Hadoop Map-Reduce: Tuning and Debugging

As Apache Hadoop and Hadoop Map-Reduce become widely adopted, especially in real-world applications that drive revenue, it becomes increasingly important to get the most out of Hadoop installations and the Map-Reduce applications that run on them. Debugging and profiling distributed Map-Reduce applications is hard, but critical for success. This talk will cover several ways to peer into Map-Reduce applications as they crunch terabytes of data. The wide-ranging discussion will include using debuggers and profilers on your applications, using Map-Reduce Counters, other simple ways to tune your applications, and how to avoid common pitfalls. We will also look at the critical 'data-path' that application data follows as it flows from the map step to the reduce step, and how to tune that path for optimal performance in user applications.