Amazon EMR Security: © 2018, Amazon Web Services, Inc. or Its Affiliates. All Rights Reserved
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Key takeaways
Amazon EMR Security
Security needs are continuously evolving
• Authentication
  • Authenticate users and systems
• Authorization
  • Provision access to data
• Data Protection
  • Protect data at rest and in transit
• Audit
  • Maintain a record of data access
• Administration
  • Central management and consistent security
Security today in Hadoop with EMR 5.x
• Authentication (Who am I?): Linux users, adding public keys, Kerberos
• Authorization (What can I do?): POSIX-based authorization
• Audit (What did I do?): log analysis
• Data Protection (Can data be encrypted at rest and over the wire?): encryption at rest, encryption in transit
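At-rest and in-transit encryption on EMR 5.x are switched on through a security configuration. The sketch below builds one as a Python dict and prints it as JSON. It assumes SSE-S3 for EMRFS data, a KMS key for local disks, and a zip of PEM certificates in S3 for TLS; the key ARN and S3 path are hypothetical placeholders.

```python
import json

# Hypothetical placeholders: substitute your own KMS key ARN and certificate zip location.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:123456789101:key/EXAMPLE-KEY-ID"
CERTS_S3_OBJECT = "s3://my-config-bucket/certs/my-certs.zip"


def encryption_security_configuration(kms_key_arn: str, certs_s3_object: str) -> dict:
    """Build an EMR security configuration enabling at-rest and in-transit encryption."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # EMRFS objects written to S3 use S3-managed keys.
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                # Local disks on cluster nodes are encrypted with a KMS key.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                # TLS certificates come from a zip of PEM files stored in S3.
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": certs_s3_object,
                }
            },
        }
    }


config = encryption_security_configuration(KMS_KEY_ARN, CERTS_S3_OBJECT)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and registered with `aws emr create-security-configuration --name <name> --security-configuration file://<file>`.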
Authentication
• LDAP: HiveServer2, Presto coordinator, Spark Thrift Server, Hue server, Zeppelin server
• AWS credentials: EMR steps (EMR API)
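For HiveServer2, LDAP authentication can be expressed as a `hive-site` configuration classification supplied at cluster creation. This minimal sketch assumes a single LDAP URL and base DN; both values are hypothetical, and Presto, Hue, and Zeppelin each need their own settings, which are not shown.

```python
import json

# Hypothetical directory settings: replace with your LDAP server and base DN.
LDAP_URL = "ldaps://ldap.example.com:636"
BASE_DN = "ou=users,dc=example,dc=com"


def hive_ldap_configuration(ldap_url: str, base_dn: str) -> list:
    """Return an EMR Configurations list that enables LDAP auth for HiveServer2."""
    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.server2.authentication": "LDAP",
                "hive.server2.authentication.ldap.url": ldap_url,
                "hive.server2.authentication.ldap.baseDN": base_dn,
            },
        }
    ]


configurations = hive_ldap_configuration(LDAP_URL, BASE_DN)
print(json.dumps(configurations, indent=2))
```

The resulting list is the shape accepted by the `Configurations` parameter of cluster creation (or the CLI's `--configurations` option).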
New – Authentication with Kerberos
Diagram components:
• Users
• Microsoft Active Directory
• KDC on the master node, with service principals for all cluster nodes
• YARN ResourceManager (DoAs: acting on behalf of the calling user)
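Kerberos is also enabled through a security configuration. The sketch below assumes a cluster-dedicated KDC on the master node with a cross-realm trust to an Active Directory domain; the realm and domain names are hypothetical. At cluster creation EMR additionally expects Kerberos attributes such as the realm and KDC admin password, which are omitted here.

```python
import json


def kerberos_security_configuration(ad_realm: str, ad_domain: str, ticket_hours: int = 24) -> dict:
    """Build an EMR security configuration using a cluster-dedicated KDC
    with a cross-realm trust to Active Directory."""
    return {
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_hours,
                    "CrossRealmTrustConfiguration": {
                        # By convention the realm is the upper-case DNS domain.
                        "Realm": ad_realm,
                        "Domain": ad_domain,
                        # Assumption: the AD domain controller serves as both admin server and KDC.
                        "AdminServer": ad_domain,
                        "KdcServer": ad_domain,
                    },
                },
            }
        }
    }


# Hypothetical Active Directory domain.
kerberos_config = kerberos_security_configuration("AD.EXAMPLE.COM", "ad.example.com")
print(json.dumps(kerberos_config, indent=2))
```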
Authorization
• Storage-based
  • EMRFS/S3
  • HDFS
• HiveServer2 and Presto (SQL-based)
• HBase
• YARN queues
• Fine-grained access control by cluster tag (IAM)
• Apache Ranger on edge node (using CloudFormation)
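Tag-based access control uses an IAM condition on the cluster's resource tags. The sketch below builds a policy that allows a few EMR actions only on clusters carrying a given tag; the tag key and value and the chosen set of actions are illustrative assumptions.

```python
import json


def tag_scoped_emr_policy(tag_key: str, tag_value: str) -> dict:
    """IAM policy allowing selected EMR actions only on clusters tagged tag_key=tag_value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:DescribeCluster",
                    "elasticmapreduce:ListSteps",
                    "elasticmapreduce:AddJobFlowSteps",
                    "elasticmapreduce:TerminateJobFlows",
                ],
                "Resource": "*",
                # The request succeeds only if the target cluster carries this tag.
                "Condition": {
                    "StringEquals": {f"elasticmapreduce:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


# Hypothetical tag: only clusters tagged department=analytics are reachable.
policy = tag_scoped_emr_policy("department", "analytics")
print(json.dumps(policy, indent=2))
```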
New – EMRFS fine-grained authorization
Example context:
• User: aduser
• Group: analyst
• IAM role: analytics_prod
EMRFS Security Configuration - Example
{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [
        {
          "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
          "IdentifierType": "User",
          "Identifiers": ["user1"]
        },
        {
          "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets",
          "IdentifierType": "Prefix",
          "Identifiers": ["s3://MyBucket/", "s3://MyOtherBucket/"]
        },
        {
          "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
          "IdentifierType": "Group",
          "Identifiers": ["AdminGroup"]
        }
      ]
    }
  }
}
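The role-mapping idea can be sketched as a lookup over the `RoleMappings` list. This is an illustration, not EMRFS's own code: it assumes mappings are evaluated top-down in the order listed, that the first match wins, and that an unmatched request falls back to a default role (for example the cluster's EC2 instance profile role). `resolve_role` and its parameters are hypothetical names.

```python
from typing import List, Optional

# Mirrors the RoleMappings in the security configuration example above.
ROLE_MAPPINGS = [
    {"Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
     "IdentifierType": "User", "Identifiers": ["user1"]},
    {"Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_MyBuckets",
     "IdentifierType": "Prefix", "Identifiers": ["s3://MyBucket/", "s3://MyOtherBucket/"]},
    {"Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
     "IdentifierType": "Group", "Identifiers": ["AdminGroup"]},
]


def resolve_role(user: str, groups: List[str], s3_uri: str, mappings: List[dict],
                 default_role: Optional[str] = None) -> Optional[str]:
    """Return the role of the first mapping matching the user, a group, or the S3 prefix.

    Assumption: top-down, first-match-wins evaluation with a default-role fallback.
    """
    for mapping in mappings:
        kind, ids = mapping["IdentifierType"], mapping["Identifiers"]
        if kind == "User" and user in ids:
            return mapping["Role"]
        if kind == "Group" and any(group in ids for group in groups):
            return mapping["Role"]
        if kind == "Prefix" and any(s3_uri.startswith(prefix) for prefix in ids):
            return mapping["Role"]
    return default_role


# The Prefix mapping is listed before the Group mapping, so it wins here.
print(resolve_role("user2", ["AdminGroup"], "s3://MyBucket/data/part-0", ROLE_MAPPINGS))
# prints the allow_EMRFS_access_to_MyBuckets role ARN
```

Under this assumption, the order of entries in `RoleMappings` matters whenever a request could match more than one mapping.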
Security - Authentication and authorization
Apache Ranger
• Spark
• Tez
• MapReduce
• Presto
• HBase
• Hive
• Pig
Security - Governance and auditing
• Custom AMIs
• AWS CloudTrail for EMR APIs
• S3 access logs for cluster S3 access
• YARN and application logs
• Apache Ranger UI for application-level auditing
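CloudTrail records EMR API calls with the event source `elasticmapreduce.amazonaws.com`. The sketch below filters those calls out of a CloudTrail log file, assuming the file has already been downloaded from S3 and decompressed; the sample records are hypothetical and trimmed to the fields used.

```python
import json
from typing import List, Tuple

EMR_EVENT_SOURCE = "elasticmapreduce.amazonaws.com"


def emr_api_calls(cloudtrail_log: str) -> List[Tuple[str, str, str]]:
    """Return (eventTime, eventName, caller ARN) for each EMR API call in a CloudTrail log."""
    records = json.loads(cloudtrail_log).get("Records", [])
    calls = []
    for record in records:
        if record.get("eventSource") == EMR_EVENT_SOURCE:
            caller = record.get("userIdentity", {}).get("arn", "unknown")
            calls.append((record.get("eventTime", ""), record.get("eventName", ""), caller))
    return calls


# Hypothetical, trimmed CloudTrail log content: one EMR call and one S3 call.
SAMPLE_LOG = json.dumps({
    "Records": [
        {"eventTime": "2018-06-01T12:00:00Z", "eventSource": "elasticmapreduce.amazonaws.com",
         "eventName": "RunJobFlow", "userIdentity": {"arn": "arn:aws:iam::123456789101:user/admin"}},
        {"eventTime": "2018-06-01T12:05:00Z", "eventSource": "s3.amazonaws.com",
         "eventName": "CreateBucket", "userIdentity": {"arn": "arn:aws:iam::123456789101:user/admin"}},
    ]
})

for call in emr_api_calls(SAMPLE_LOG):
    print(call)
```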
Custom AMIs
• Benefits
  • Reduced cluster start time
  • Prevention of unexpected bootstrap action failures
  • Support for Amazon EBS root volume encryption
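One way to get an encrypted EBS root volume is to copy an existing AMI with encryption enabled and then launch the cluster from the copy. The sketch below only builds the request arguments for EC2 `CopyImage` and EMR `RunJobFlow` (`CustomAmiId`); all IDs, names, and the release label are hypothetical, and required `RunJobFlow` arguments such as `Instances` and the IAM roles are omitted.

```python
def encrypted_copy_image_kwargs(source_ami_id: str, source_region: str, kms_key_id: str) -> dict:
    """Arguments for EC2 CopyImage that produce an AMI backed by encrypted EBS snapshots."""
    return {
        "Name": "emr-custom-ami-encrypted",  # hypothetical AMI name
        "SourceImageId": source_ami_id,
        "SourceRegion": source_region,
        "Encrypted": True,  # encrypt the snapshots of the copied AMI
        "KmsKeyId": kms_key_id,
    }


def custom_ami_cluster_kwargs(custom_ami_id: str, release_label: str) -> dict:
    """Partial EMR RunJobFlow arguments; Instances, roles, and applications are omitted."""
    return {
        "Name": "custom-ami-cluster",  # hypothetical cluster name
        "ReleaseLabel": release_label,
        "CustomAmiId": custom_ami_id,
    }


copy_args = encrypted_copy_image_kwargs("ami-0123456789abcdef0", "us-east-1", "alias/my-ebs-key")
cluster_args = custom_ami_cluster_kwargs("ami-0fedcba9876543210", "emr-5.x.x")
print(copy_args)
print(cluster_args)
```

With boto3, these dicts could be passed as `ec2.copy_image(**copy_args)` and, after adding the omitted arguments, `emr.run_job_flow(**cluster_args, ...)`.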
Custom AMIs
• Requirements
  • Must be an Amazon Linux AMI
  • Must be an HVM AMI
  • Must be an EBS-backed AMI
  • Must not have multiple EBS volumes
  • Must be a 64-bit AMI
  • Must not have users with the same name as applications (example: hadoop, hdfs, yarn, or spark)
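Several of these requirements can be pre-checked against the image metadata that EC2 `DescribeImages` returns. The sketch below is a hypothetical helper over one image record; it assumes "64-bit" means `x86_64` and uses an `amzn` name prefix as a rough Amazon Linux heuristic. The user-name requirement cannot be checked from metadata and is left out.

```python
from typing import List


def ami_requirement_problems(image: dict) -> List[str]:
    """Check a DescribeImages-style image record against the custom AMI requirements.

    Heuristics and assumptions: 64-bit is taken to mean x86_64; Amazon Linux is
    guessed from an "amzn" name prefix; OS users cannot be inspected here.
    """
    problems = []
    if not image.get("Name", "").startswith("amzn"):
        problems.append("name does not look like Amazon Linux")
    if image.get("VirtualizationType") != "hvm":
        problems.append("not an HVM AMI")
    if image.get("RootDeviceType") != "ebs":
        problems.append("not EBS-backed")
    if image.get("Architecture") != "x86_64":
        problems.append("not a 64-bit (x86_64) AMI")
    ebs_volumes = [m for m in image.get("BlockDeviceMappings", []) if "Ebs" in m]
    if len(ebs_volumes) > 1:
        problems.append("has multiple EBS volumes")
    return problems


# Hypothetical image record, trimmed to the fields checked above.
SAMPLE_IMAGE = {
    "Name": "amzn-ami-hvm-example-x86_64-gp2",
    "Architecture": "x86_64",
    "VirtualizationType": "hvm",
    "RootDeviceType": "ebs",
    "BlockDeviceMappings": [{"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 10}}],
}

print(ami_requirement_problems(SAMPLE_IMAGE))  # prints []
```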