Hadoop Tools and Tricks for Data Processing Pipelines (Part 2)

Presenters: Christophe Bisciglia, Aaron Kimball, and Tom White

There's more to Hadoop than just getting your cluster running. In this session, we will look closely at data processing pipelines, and teach participants how to leverage Hadoop and related tools to conduct complex data analysis tasks.

We will assume a functioning Hadoop cluster with properly configured supporting tools such as Hive, Pig, and Streaming. Cloudera will provide this cluster free of charge to training participants for the duration of the conference, along with support for user issues. We will go over at least one data processing pipeline in depth, covering data collection, formatting, processing, and presentation. Participants will see code examples and the glue needed to start with the data you have and end with the results you want.

Expertise Level: Intermediate

This session is intended for intermediate Hadoop users. Attendees should have used Hadoop before and be familiar with the basics, and should be comfortable with Java programming and with scripting in their language of choice. We will not cover the basics or cluster configuration; instead, we will focus on enabling people who need to process large amounts of data to leverage Hadoop and related tools such as Hive, Pig, and Streaming. Target users may have large volumes of log files or user data that they would like to explore.

About the Presenters:

Christophe Bisciglia joins Cloudera from Google, where he worked extensively with the academic community. Results from this work include the integration of Hadoop into many undergraduate and graduate curricula, workshops for educators around the world, and an extensive partnership with the National Science Foundation to support researchers in various fields dealing with big data.

Aaron Kimball and Tom White both join Cloudera after working as Hadoop consultants. Aaron has been actively involved in teaching Hadoop and training users, and Tom is currently completing "Hadoop: The Definitive Guide", to be published by O'Reilly.