Big Data, AWS, and Security
How FINRA Secures Its Big Data and Data
Science Platform on AWS
Vincent Saulys
David Yacono
ABD310
November 29, 2017
Who is FINRA?
• Financial Industry Regulatory Authority.
• Our Mission: “Investor Protection—Market Integrity.”
• We are a private sector not-for-profit organization authorized by Congress to protect America’s
• We do this by:
• Writing and enforcing rules that govern the activities of 3,800 broker-dealers with 634,000 brokers.
• Examining firms for compliance with those rules.
• Fostering market transparency.
• Educating investors.
And most significant to this discussion:
• FINRA uses big data and data science technologies to detect and analyze fraud, market manipulation,
and insider trading across US capital markets.
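FINRA Technology
to protect investors and ensure market integrity
[Infographic of platform statistics. Surviving labels: "of storage", "of nodes and edges", "3-4 years", "data online", "4 years", "Up to", "per day"; the figures themselves are missing.]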
Session takeaways
• Learn to be realistic in your risk assessment of
using the cloud.
• Learn about Amazon Web Services and its
foundational security controls and practices
relevant to safeguarding your big data workloads.
• See how FINRA securely enables our data
scientists, and other big data projects, in AWS by
achieving a balance between productivity and security.
Data science needs
• Data discovery and exploration.
• Bring disparate sources of data together.
• Semantic understanding of the data sets.
• Ease of use: Enable users without requiring them to understand the underlying
data infrastructure.
• Safeguard information with a high degree of security and least-privilege
access.
• Model migration from research to prototype to production.
• Avoid time spent on environment administration.
[Diagram: tools built on the Data Lake: Diver, MIRS, DOMT, Ad-Hoc, FOLA, Marketspace, Crosstab UI. Surviving captions: personal marts; "with billions of"; interactive reports and visualizations; depth of market over time; data profiling via SQL; retrieve market events to render order lifecycle; exception and alert viewer.]
Interactive big data portfolio
Threat sources: Private data center
Threat sources: Cloud-based architecture
Evaluating the risk
Key factors
Compare risk of alternatives
• Cloud vs. legacy data center
• The “idealized zero-risk scenario” is unrealistic.
• Legacy data centers have risk!
Evaluate risk in context
• “Cloud” risks get the most press.
• Many existing risks are unchanged by adoption of cloud.
• Many existing risks are significantly “riskier” than new “cloud” risks.
• The “unknown” creates a powerful perception of risk.
Cybersecurity is only one dimension of risk
• Operational, legal, and opportunity risks as well
Shared responsibility model plus
Security “ABOVE” the cloud
• All the security controls
you’re already using.
Security “IN” the cloud
• Controls to supplement
Cloud Service Provider
(CSP) controls.
Security “OF” the cloud
• CSP provides these controls.
Customer due diligence through
third-party risk management.
• AWS security best practices
• Access management
• Authentication
• Authorization/Separation of Duties
• Policy enforcement and oversight
• Logging/monitoring/alerting/UEBA
• Controls for Economic DoS
• Network architecture
• Encryption and Key Management (KMS)
• Governance
Foundational security controls
Access mgmt: Human
Access mgmt: Machine
Access mgmt: Robust SoD
Almost 100,000 entitlements
Access mgmt: Entitlements
Secrets management
• Credstash: Application Credential
• Minimize Exposure of privileged
• Secrets stored in DynamoDB,
encrypted with KMS
• Resource/Object Owner
creates/stores Secrets
• Automation deploys secrets to
• IAM Role limits secrets access
• Minimum exposure of credentials
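The "IAM Role limits secrets access" bullet can be illustrated with a minimal sketch of a read-only policy document for one application's role. This is not FINRA's actual policy: the table ARN, key ARN, the `app` encryption-context key, and the function name are all hypothetical, and the sketch assumes secrets are encrypted with a KMS encryption context that names the owning application.

```python
import json


def secrets_reader_policy(table_arn: str, kms_key_arn: str, app: str) -> dict:
    """Build a read-only IAM policy document for one application's secrets.

    Assumption: each secret was encrypted with the KMS encryption context
    {"app": <application name>}, so the role can decrypt only its own secrets.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSecretsTable",
                "Effect": "Allow",
                # Read-only DynamoDB actions; no Put/Update/Delete.
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            },
            {
                "Sid": "DecryptOwnSecretsOnly",
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                "Resource": kms_key_arn,
                "Condition": {
                    "StringEquals": {"kms:EncryptionContext:app": app}
                },
            },
        ],
    }


# Hypothetical ARNs and application name, for illustration only.
policy = secrets_reader_policy(
    "arn:aws:dynamodb:us-east-1:111122223333:table/credential-store",
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    "example-app",
)
print(json.dumps(policy, indent=2))
```

Scoping `kms:Decrypt` by encryption context means that even a role able to read every row of the table can decrypt only the secrets written for its own application.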
Pervasive logging
• AWS service layer (AWS CloudTrail)
• OS (Splunk agent in Amazon EC2 AMI)
• Platform layer
• FINRA Platforms of course
• Amazon EMR – Splunk agent for underlying
Amazon EC2 bootstrapped into cluster launch
• Application layer
• AWS CloudTrail logging is robust.
Logging, monitoring, and alerting
Avoid Economic DoS
• Architecture distributed across multiple
Availability Zones
• Cross-region replication used where:
• Geographic dispersion is needed
• Single region data durability is not
considered sufficient
Network architecture
Encryption and AWS KMS
Security governance
Security controls must comply with and enhance the organization’s overall governance policies.
FINRA Cybersecurity has a uniform process for creating and updating governance policies and standards.
FINRA Cybersecurity governance policies and standards are approved and monitored by:
• Cloud Compliance Working Group (CCWG)
• Infrastructure Security Posture Review
• Information Security Steering Committee
AMI updates
• Created at least monthly
• Plus out-of-cycle for critical security
• Start: Latest Amazon AMI
• Harden: Remove unneeded packages,
update remaining packages (security
patches) [Yum], apply compliance modules
• Extend: Install common tools (AWS CLI,
Puppet agent, Splunk agent, Trend Micro
agent, etc.)
• Snapshot new AMI.
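A monthly AMI cadence is only useful if running instances actually pick up the new images. The sketch below flags instances whose AMI is older than a threshold. The 31-day limit, the record layout, and the function names are assumptions for illustration; the timestamp format matches the ISO 8601 `CreationDate` string that EC2 reports for an image.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: "at least monthly" is read here as 31 days.
MAX_AMI_AGE = timedelta(days=31)


def parse_creation_date(value: str) -> datetime:
    """Parse an EC2-style timestamp such as '2017-11-01T12:00:00.000Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def stale_instances(instances: list, now: datetime) -> list:
    """Return the IDs of instances whose AMI is older than MAX_AMI_AGE."""
    return [
        inst["instance_id"]
        for inst in instances
        if now - parse_creation_date(inst["ami_creation_date"]) > MAX_AMI_AGE
    ]


# Hypothetical inventory records; in practice these would come from an
# inventory query rather than a literal list.
inventory = [
    {"instance_id": "i-0aaa", "ami_creation_date": "2017-11-01T12:00:00.000Z"},
    {"instance_id": "i-0bbb", "ami_creation_date": "2017-09-15T08:30:00.000Z"},
]
check_time = datetime(2017, 11, 29, tzinfo=timezone.utc)
print(stale_instances(inventory, check_time))
```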
Security Groups
• Goals:
• Narrowly crafted (microsegmentation),
• Policy of least privilege,
• Separation of Duties
• Challenges: Many groups to manage!
• Solution: FINRA Portus
Strictly Limited Access
• Goal: No access to production.
• Reality: Occasional prod access may be needed
• Modified Goals: Temporary, just-in-time access
• Restricted by IAM Role, AWS Tag
• Approved and Logged
• Solution: FINRA Gatekeeper
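The modified goals above (temporary, restricted by IAM role and tag, approved and logged) can be sketched as a small in-memory model. This is not how Gatekeeper is implemented: the `Grant` type, the one-hour default lifetime, the rule against self-approval, and the example names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    user: str
    iam_role: str
    tag_key: str
    tag_value: str
    approved_by: str
    expires_at: datetime


audit_log = []  # every approval and every access decision is recorded


def approve(user, iam_role, tag_key, tag_value, approver, now,
            ttl=timedelta(hours=1)):
    """Create a time-limited grant. Assumed rule: no self-approval."""
    if approver == user:
        raise ValueError("requester cannot approve their own access")
    grant = Grant(user, iam_role, tag_key, tag_value, approver, now + ttl)
    audit_log.append(("approved", user, now.isoformat()))
    return grant


def may_access(grant, iam_role, instance_tags, now):
    """Allow access only before expiry, for the granted role and tag value."""
    allowed = (
        now < grant.expires_at
        and iam_role == grant.iam_role
        and instance_tags.get(grant.tag_key) == grant.tag_value
    )
    audit_log.append(("allowed" if allowed else "denied", grant.user, now.isoformat()))
    return allowed


# Hypothetical request: alice needs one hour on instances tagged AGS=EXAMPLE.
start = datetime(2017, 11, 29, 12, 0, tzinfo=timezone.utc)
example_grant = approve("alice", "prod-support", "AGS", "EXAMPLE", "bob", start)
print(may_access(example_grant, "prod-support", {"AGS": "EXAMPLE"},
                 start + timedelta(minutes=30)))
```

Because the grant carries its own expiry, access lapses without anyone having to remember to revoke it.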
Securing the services—Amazon EC2
Security group mgmt: Portus
FINRA-developed Centralized Security Groups Management tool for Developers and the
FINRA Cyber & Information Security Department
• Cyber & Information Security DEFINE security policies
• Developers SELF MANAGE AGS security groups
• Maximizes FLEXIBILITY for developers
• SIMPLIFIES administration for InfoSec
Portus dashboard
Security Policy for each type of logical system
Policy is a set of Whitelisted Rules (inbound and outbound)
Only these Whitelisted Rules are allowed in Security Groups
Only rules Whitelisted in the policy allowed
Only five Security Groups allowed per AGS
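The policy model described here (a whitelist of inbound and outbound rules per system type, and at most five security groups per AGS) can be sketched as a simple validator. The `Rule` type, the function name, and the sample policy are hypothetical; Portus's real data model is not shown in the slides.

```python
from dataclasses import dataclass

MAX_GROUPS_PER_AGS = 5  # limit stated on the slide


@dataclass(frozen=True)
class Rule:
    direction: str  # "inbound" or "outbound"
    protocol: str
    port: int
    cidr: str


def validate_group(rules, policy_whitelist, existing_group_count):
    """Return a list of violations; an empty list means the request passes."""
    violations = []
    if existing_group_count >= MAX_GROUPS_PER_AGS:
        violations.append(
            f"AGS already has {existing_group_count} security groups "
            f"(maximum {MAX_GROUPS_PER_AGS})"
        )
    for rule in rules:
        # Only rules that appear verbatim in the policy are allowed.
        if rule not in policy_whitelist:
            violations.append(f"rule not whitelisted: {rule}")
    return violations


# Hypothetical policy for a "web server" system type.
web_policy = {
    Rule("inbound", "tcp", 443, "10.0.0.0/8"),
    Rule("outbound", "tcp", 5432, "10.1.0.0/16"),
}
requested = [
    Rule("inbound", "tcp", 443, "10.0.0.0/8"),
    Rule("inbound", "tcp", 22, "0.0.0.0/0"),  # not in the policy
]
for problem in validate_group(requested, web_policy, existing_group_count=2):
    print(problem)
```

Checking requests against a fixed whitelist is what lets developers self-manage their groups while InfoSec reviews only the policies.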
Portus developer dashboard
Select AGS from drop down list
Create Security Group by selecting a Policy
Access only to the AGS owned by developer (priv_aws_<AGS>_dev_d AD group)
Gatekeeper (High level)
[Diagram: Gatekeeper request flow. Surviving labels: "Call SSM", "on VPC", "Store request", "Search EC2 &"]
Gatekeeper detailed
Securing the services–Amazon S3
• AWS::EMR::SecurityConfiguration
• At-rest encryption
• Local volumes (LUKS), HDFS
• In-transit encryption
• Inter-node: Spark/Presto/Hive
(TLS). HBase in 2018?
Securing big data services–Amazon EMR
• Logging–Splunk agent in bootstrap:
• JobCode ID, project code logged
• Access Controls
• No access to underlying cluster. (App
layer AuthN/AuthZ)
• Gatekeeper for admin access (rare)
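As an illustration of the at-rest and in-transit settings listed for the EMR security configuration, the sketch below builds the JSON document that such a configuration carries. The KMS key ARN and the S3 location of the TLS certificate bundle are hypothetical placeholders, and the choice of SSE-S3 for S3 data is an assumption, not something the slides state.

```python
import json


def build_emr_security_configuration(kms_key_arn: str, tls_cert_s3_object: str) -> dict:
    """Build an EMR security configuration enabling both encryption modes."""
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Assumption: S3 data uses SSE-S3; other modes are possible.
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                # Local volumes are encrypted with a KMS-managed key.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                # TLS certificates supplied as a PEM bundle stored in S3.
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_cert_s3_object,
                }
            },
        }
    }


# Hypothetical key ARN and bucket, for illustration only.
config = build_emr_security_configuration(
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    "s3://example-bucket/emr/certs.zip",
)
print(json.dumps(config, indent=2))
```

The resulting document is the kind of value supplied as the security configuration of an `AWS::EMR::SecurityConfiguration` resource.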
Data sanitization
• One-way hash/tokenization
Preserves ability to associate related
records by the sensitive data element,
search on tokenized values
• Format-preserving encryption
Preserves ability to associate related
records, some limited ability to operate on
data (search, sort, categorize)
• Generalization, subsetting
• De-identification
• Be wary of re-identification strategies
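Two of the techniques above lend themselves to a short sketch: a keyed one-way hash that keeps related records joinable, and generalization of a numeric value into a range. The key handling, the normalization rule, and the function names are assumptions for illustration; a real deployment would fetch the key from a secrets store rather than embedding it in code.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Keyed one-way hash: the same input always yields the same token,
    so records can still be joined or searched on the tokenized value."""
    normalized = value.strip().upper()  # assumed normalization rule
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, bucket: int = 10) -> str:
    """Generalization: replace an exact age with the range that contains it."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


# Hypothetical key and account identifiers, for illustration only.
demo_key = b"example-key-not-for-production"
token_a = tokenize("ACCT-001234", demo_key)
token_b = tokenize(" acct-001234 ", demo_key)  # same account, messier input
token_c = tokenize("ACCT-009999", demo_key)

print(token_a == token_b)  # related records still match
print(token_a == token_c)  # different accounts get different tokens
print(generalize_age(37))
```

Using a keyed hash (HMAC) rather than a bare hash means someone without the key cannot rebuild the mapping by hashing guessed values, which is one of the re-identification routes the last bullet warns about.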
Limit Credential Exposure
• IAM Role-based access is ideal
• Secrets Management (Credstash)
Make Security Easy
• Internal mirrors of external resources
preserve isolation
• Empower users and managers with
utilization/cost information and the
entitlements necessary to provide oversight.
Securing the architecture
• The same security policies apply to all
systems within the organization.
• Risks must be identified and mitigated.
• Sufficient controls need to be in place to
protect the work being done.
• Security should not get in the way of getting
the work done.
• When possible, use security tools to make
doing the right thing the easiest thing.
Productivity / Security
Striking a balance
Data science tooling before UDSP
Data science platform V1
Security & productivity collide
Users were fine if they only needed what the system provided.
Users, unfortunately, may NEED to add packages/libraries.
How did one add new packages/libraries with V1?
• The official way
1. Put in a request to Technology
2. Technology downloads, builds, and bundles into next release
3. Available when a release deploys
• The unofficial way
1. Download package to local machine
2. Upload to cloud
3. Build and install to instance
What went wrong?
Needs driven by technology
• IT: Reduce costs
• Users: Need more compute
Secure but inflexible
• Local machines were more flexible
• Install any package and experiment
Data availability
• On-premises databases not reachable
Setup still required
• Driver configuration to connect to databases
Technology in the way
• Technology required to install any new package
Universal Data Science Platform
Completely self service, no Technology
• Users select UDSP version and machine capacity
Users associated to groups
• Users manage their instances
• AWS billing tags and machine selection choices to
Create, stop, terminate (delete)
• Managers can administer their teams’ instances
Dashboard to monitor resource usage
• Stop instances from the dashboard
Reports for historical usage
What went right?
20 FOLD!!!!
• Be realistic in your risk assessment. The
security risks of running your own data
center are equal to or greater than those
of moving to the cloud.
• Use of strong foundational cloud
security controls is paramount.
• AWS provides controls which, when
properly applied, balance productivity and security.
Related FINRA presentations
2017 re:Invent
• SID326 - AWS Security State of the Union
Steve Schmidt, chief information security officer of AWS, addresses the current state of security in the cloud. As part of this presentation, John Brady
(CISO of FINRA) shares the FINRA journey to the cloud. Wednesday, Nov 29, 12:15 p.m. – 1:15 p.m. MGM, Level 3, Premier Ballroom 316
• FSV307 - Capital Markets Discovery: How FINRA Runs Trade Analytics and Surveillance on AWS
The FINRA analytics platform unlocks the value in capital markets data by accelerating trade analytics and providing a foundation for machine learning at
scale. Monday, Nov 27, 10:45 a.m. – 11:45 a.m. Venetian, Level 5, Palazzo P
• ENT328 - FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud
The Financial Industry Regulatory Authority (FINRA) Technology Group has changed its customers' relationships with data by creating a managed data lake
Thursday, Nov 30, 1 p.m. – 2 p.m. MGM, Level 3, Premier Ballroom 319
• DEV335 - Manage Infrastructure Securely at Scale and Eliminate Operational Risks
Managing AWS and hybrid environments securely and safely while having actionable insights is an operational priority and business driver for all
customers. Thursday, Nov 30, 4 p.m. – 4 p.m. Venetian, Level 2, Venetian E
2016 re:Invent
• BDM203: Building a Secure Data Science Platform on AWS
Thank you!
To learn more: technology.finra.org

