© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Built-in Security For The Cloud
DataWorks Summit Sydney
September 2017
Presenters
Jeff Sposetti
Senior Director of Product Management, Cloud
Hortonworks Data Cloud, Cloudbreak
Agenda
• Introduction
• Quick Demo
• Security Building Blocks: Apache Ranger and Knox
• Bringing It Together: Cloud and Data Lake Security
• Longer Demo
• Wrap Up
• Q & A
Background: Ephemeral Workloads + Cloud Storage
• Cloud is driving more ephemeral data processing use cases
• Cloud requires a robust integration with cloud storage
CLOUD STORAGE: S3, ADLS, WASB
WORKLOAD CLUSTERS: Durable, Ephemeral
Background: Hortonworks Data Cloud for AWS
• Focuses on business agility, rather than infinite configurability and cluster management
• Addresses prescriptive, ephemeral use cases around Apache Spark + Apache Hive
• Pre-tuned and configured for use with Amazon S3
Learn more: http://hortonworks.com/products/cloud/aws/
Quick demo…
Security Building Blocks: Apache Ranger and Knox
Protecting the Elephant in the Castle…
• Kerberos, Wire Encryption
• HDFS Encryption
• Apache Ranger
• Network Segmentation, Firewalls
• LDAP/AD
• Apache Knox
Apache Knox
Proxying Services
★ Provide access to Hadoop via proxying of HTTP resources
★ Ecosystem APIs and UIs + Hadoop oriented dispatching for Kerberos + doAs (impersonation) etc.
Authentication Services
★ REST API access, WebSSO flow for UIs
★ LDAP/AD, Header based PreAuth
★ Kerberos, SAML, OAuth
Client DSL/SDK Services
★ Scripting through DSL
★ Using Knox Shell classes directly as SDK
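To make the proxying idea concrete, here is a minimal Python sketch of how a client might address WebHDFS through a Knox gateway instead of talking to the cluster directly. The host name, the user, and the `default` topology name are assumptions for illustration and do not come from the slides; the `/gateway/{topology}/webhdfs/v1/...` layout follows Knox's usual URL pattern, where `gateway` is the configurable gateway path. The sketch only builds the request pieces and does not contact any server.

```python
from base64 import b64encode
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op, port=443):
    """Build a WebHDFS URL that is routed through a Knox gateway."""
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": op})
    return f"https://{gateway_host}:{port}/gateway/{topology}/webhdfs/v1/{path}?{query}"


def basic_auth_header(user, password):
    """Build an HTTP Basic header; Knox can validate it against LDAP/AD."""
    token = b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical gateway host, topology, and credentials.
url = knox_webhdfs_url("gateway.example.com", "default", "/tmp", "LISTSTATUS")
headers = basic_auth_header("alice", "not-a-real-password")
print(url)
```

The client only ever sees one HTTPS endpoint; which cluster service actually answers the call is decided by the gateway.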
Apache Ranger
Comprehensive and Extensible Security Model
• Centralized platform to define, administer and manage security policies across Hadoop components (HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas)
• Extensible architecture with the ability to add custom policy conditions and user context enrichers
Fine-Grained Authorization
• Data access control by Database, Table and Column, for LDAP Groups & Specific Users
Centralized Auditing
• Central audit location for all access requests
• Supports multiple audit destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Advanced Security
• Dynamic Security Policies: Prohibition, Time, Location and Tag (Atlas)
• Dynamic Column Masking & Row Filtering
[Diagram: platform layers labeled Operations, Security, Governance and Storage; workload types Machine Learning, Batch, Streaming, Interactive and Search]
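As a rough illustration of fine-grained authorization combined with centralized auditing, the following Python sketch models a policy check at the database, table and column level for LDAP-style groups, and records every decision in one audit list. This is a toy model written for this deck, not Ranger's API or policy format; the `Policy` class, `check_access` function, and all sample names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Toy allow-policy: which groups may read which columns of a table."""
    database: str
    table: str
    columns: frozenset  # column names, or {"*"} for every column
    groups: frozenset   # directory groups the policy applies to


def check_access(policies, audit_log, user, user_groups, database, table, column):
    """Return True if any policy allows the read; audit every request."""
    allowed = any(
        p.database == database
        and p.table == table
        and ("*" in p.columns or column in p.columns)
        and p.groups & set(user_groups)
        for p in policies
    )
    # Denied requests are audited as well, not only successful ones.
    audit_log.append(
        {"user": user, "resource": f"{database}.{table}.{column}", "allowed": allowed}
    )
    return allowed


policies = [
    Policy("sales", "orders", frozenset({"id", "amount"}), frozenset({"analysts"})),
]
audit_log = []
print(check_access(policies, audit_log, "alice", ["analysts"], "sales", "orders", "amount"))
print(check_access(policies, audit_log, "alice", ["analysts"], "sales", "orders", "card_number"))
print(check_access(policies, audit_log, "bob", ["interns"], "sales", "orders", "amount"))
```

The default is deny: a request is allowed only when some policy matches the resource and one of the user's groups.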
Bringing It Together: Cloud and Data Lake Services
CLOUD
DATA LAKE
SECURITY
Key Components for Enterprise Security

SCHEMA
WHAT: Provides Hive schema (tables, views, etc.).
WHY: If you have 2+ workloads accessing the same data, you need to share schema across those workloads.
HOW: Externalize the Hive Metastore for schema definition.

POLICY
WHAT: Defines security policies around Hive schema.
WHY: If you have 2+ users accessing the same data, you need policies to be consistently available and enforced.
HOW: Externalize and share Ranger across workloads and store policies externally.

AUDIT
WHAT: Audit user access.
WHY: Capture data access activity.
HOW: Externalize and share Ranger across workloads, leverage cloud storage for audit data.

GATEWAY
WHAT: Provide a single endpoint that can be protected with SSL and enabled for authentication to access cluster resources.
WHY: Avoid opening many ports, some potentially without authentication or SSL protection.
HOW: Deploy a centralized protected gateway automatically.

DIRECTORY
WHAT: Users and groups.
WHY: Provide an authentication source for users and an authorization source for groups.
HOW: Leverage external LDAP or Active Directory.
Ephemeral Workloads: With Enterprise Security
• Tuned and Optimized Infrastructure
• Simplified, Automated Operations
• S3 Integration
• Protected Network Access

                 Ephemeral                 Enterprise Security
Schema           Shared (Hive Metastore)   Shared (Hive Metastore)
Authentication   Single-user               Multi-User (LDAP/AD)
Authorization    -                         Security Policies (Ranger)
Audit            -                         Audit (Ranger)
Ephemeral Workloads + Cloud Storage + Shared “Data Lake” Services
CLOUD STORAGE: S3, ADLS, WASB
WORKLOAD CLUSTERS: Durable, Ephemeral
SHARED DATA LAKE SERVICES (Long Running): Metastore (SCHEMA), Ranger (POLICY)
• Define your data schema and security policies once for your ephemeral and always-on workloads.
• Secure access to workload clusters via a Protected Gateway enabled for AuthN and HTTPS.
Shared Schema: Hive Metastore
• Register external “Amazon RDS” instances to use with Hive Metastore
• Preserve Hive schema across multiple ephemeral clusters
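One way to picture an externalized metastore is as a handful of Hive connection properties that point every ephemeral cluster at the same database. The Python sketch below renders such properties as a `hive-site.xml` fragment. The `javax.jdo.option.*` names are standard Hive Metastore settings; the RDS endpoint, database name, user, and the choice of PostgreSQL are assumptions for illustration, since the slides do not name an engine. The password property is deliberately left out of the sketch.

```python
from xml.sax.saxutils import escape


def metastore_db_properties(jdbc_url, driver, user):
    """Hive properties that point the metastore at an external database."""
    return {
        "javax.jdo.option.ConnectionURL": jdbc_url,
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionUserName": user,
    }


def to_hive_site_xml(props):
    """Render a property dict as a hive-site.xml style fragment."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Hypothetical RDS endpoint and credentials, assuming a PostgreSQL instance.
props = metastore_db_properties(
    "jdbc:postgresql://my-metastore.example.rds.amazonaws.com:5432/hive",
    "org.postgresql.Driver",
    "hive",
)
xml_fragment = to_hive_site_xml(props)
print(xml_fragment)
```

Because the schema lives in the external database rather than on cluster nodes, a cluster can be terminated and a new one started with the same properties without losing table definitions.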
Protected Network Access: Knox
Shared Security Policies: Ranger
• Create a set of “Shared Data Lake Services”
• Preserve Ranger Security Policies across multiple ephemeral clusters
Deployment Architecture
Access your cluster components through the protected gateway via SSL on port 443, which is open on the controller security group.
[Diagram: USER ACCESS enters through the PROTECTED GATEWAY on the CONTROLLER; HIVE LLAP / SPARK WORKLOADS include Zeppelin, Hive LLAP and Spark; SHARED DATA LAKE SERVICES include Ranger and the Hive Metastore, with POLICY (RDS), AUDIT (S3), SCHEMA (RDS) and DIRECTORY (LDAP/AD)]
Hortonworks Data Cloud + Shared Data Lake Services
1. Register an Authentication Source (i.e. LDAP/AD).
2. Create a “Shared Data Lake”, specify S3 Bucket & RDS.
3. When you create a cluster, “attach” to the Shared Data Lake Services:
• for Multi-User AuthN (LDAP/AD)
• for AuthZ + Audit (Ranger)
• for Schema (Hive Metastore)
PREREQUISITES
• LDAP/AD
• S3 Bucket
• RDS Instance
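The three steps and their prerequisites can be sketched as a small data model in Python. This is only an illustration of the ordering and dependencies; it is not the Hortonworks Data Cloud API or CLI, and every class, function, and value below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthSource:
    """Step 1: a registered LDAP/AD authentication source."""
    name: str
    ldap_url: str


@dataclass(frozen=True)
class SharedDataLake:
    """Step 2: long-running shared services backed by S3 and RDS."""
    name: str
    auth_source: AuthSource
    s3_bucket: str      # audit data
    rds_endpoint: str   # policies and Hive schema


@dataclass(frozen=True)
class Cluster:
    """Step 3: an ephemeral cluster attached to a shared data lake."""
    name: str
    data_lake: SharedDataLake


def create_data_lake(name, auth_source, s3_bucket, rds_endpoint):
    """Refuse to create a data lake unless every prerequisite is present."""
    missing = [
        label
        for label, value in [
            ("LDAP/AD", auth_source),
            ("S3 Bucket", s3_bucket),
            ("RDS Instance", rds_endpoint),
        ]
        if not value
    ]
    if missing:
        raise ValueError(f"missing prerequisites: {', '.join(missing)}")
    return SharedDataLake(name, auth_source, s3_bucket, rds_endpoint)


ldap = AuthSource("corp-ad", "ldaps://ad.example.com:636")
lake = create_data_lake("analytics-lake", ldap, "example-audit-bucket", "lake-db.example.com:5432")
# Two ephemeral clusters share the same authentication, policies, audit and schema.
clusters = [Cluster("etl-nightly", lake), Cluster("adhoc-spark", lake)]
print([c.data_lake.name for c in clusters])
```

The point of the model is that clusters hold only a reference to the shared services, so deleting a cluster leaves users, policies, audit data and schema in place.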
Longer demo…
General Guidelines
• Think ephemeral. Keep all of your data in S3 and all of your metadata in RDS; do not create tables or files in the local HDFS.
• The Hive warehouse is set up on S3 for data lakes. Create tables in this location instead of in individual S3 buckets; this makes them easier to manage.
• Use Hive “external tables” for tables that are outside this warehouse, typically when the data is ingested through some path outside of Hadoop.
• Create S3 bucket policies that exactly match usage so that you can spin up clusters with the least privilege.
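The least-privilege guideline can be illustrated with a small Python helper that builds an S3 bucket policy scoped to a single warehouse prefix. The policy document layout (`Version`, `Statement`, `Effect`, `Principal`, `Action`, `Resource`, `Condition`) and the `s3:` action names are standard AWS policy elements; the bucket name, prefix, account ID, and role ARN are made-up values, and the helper itself is a sketch rather than anything shown in the talk.

```python
import json


def least_privilege_bucket_policy(bucket, prefix, role_arn, read_only=True):
    """Allow one role to list and access objects under a single prefix only."""
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListWarehousePrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is limited to keys under the warehouse prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "AccessWarehouseObjects",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket, prefix, and cluster instance role.
policy = least_privilege_bucket_policy(
    "example-datalake",
    "warehouse",
    "arn:aws:iam::123456789012:role/example-cluster-role",
)
print(json.dumps(policy, indent=2))
```

A read-only reporting cluster would use the default, while an ETL cluster that writes tables would pass `read_only=False`, so each cluster gets only the actions its workload needs.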
Wrap Up
Takeaways
• Cloud is driving more ephemeral data processing use cases
• Ephemeral workloads leverage cloud storage
• This pattern is driving an architectural approach for “Shared Data Lake Services”
• Building blocks are Apache Ranger and Apache Knox

Resource                 Link
Hortonworks Data Cloud   https://hortonworks.com/products/cloud/aws/
Apache Ranger            https://hortonworks.com/apache/ranger/
Apache Knox              https://hortonworks.com/apache/knox-gateway/
Learn More
Breakout Session: Enterprise ready security and governance for Hadoop ecosystem
Thursday, September 21 @ 3:10p
https://dataworkssummit.com/sydney-2017/sessions/treat-your-enterprise-data-lake-indigestion-enterprise-ready-security-and-governance-for-hadoop-ecosystem
Birds of a Feather: Security, Governance and Cybersecurity
Thursday, September 21 @ 6:00p
https://dataworkssummit.com/sydney-2017/birds-of-a-feather/security-governance-cybersecurity/
Thank You
https://hortonworks.com/products/cloud/aws/
https://hortonworks.com/apache/ranger/
https://hortonworks.com/apache/atlas/
