ApacheCon NA 2010 Session

Hadoop Security: Keeping out the Looky Loos

Hadoop 0.20, the current release of Hadoop, implicitly trusts users when they state their username and group membership. That is acceptable for small teams, but large corporations need more control. For example, large corporations must run an independent cluster for each kind of sensitive information (financial, personally identifiable information, etc.) and control access to the data by limiting access to those clusters. With Hadoop's new security features and its integration with Kerberos, it is possible to verify that users are who they claim to be and to ensure that they can access only the data and resources they are authorized to use. This lets corporations grant finer-grained access to information and reduce their operational overhead by coalescing their distinct clusters. This presentation will cover the goals of Hadoop security and how to use the new features to secure HDFS and MapReduce clusters. I will also discuss Yahoo's experiences deploying the back-ported Hadoop Security features on its science and production clusters.