As a 120-year-old company, Nordstrom faced numerous challenges stemming from an aging service-oriented architecture. Because developers had to implement reporting for analytics separately from core functionality, the quality of analytical data was questionable. Many teams, if not most, struggled to scale dependent services in step so that they did not overwhelm one another. Several years into a company-wide transition to an event-sourced architecture, Nordstrom has solved these and various other problems. With Apache Kafka and Confluent, combined with a deep organizational focus on well-defined business event schemas, a single event can serve analytical, functional, operational, and model-building purposes. This session describes the architecture and the lessons learned while building it, with a focus on the internally built, multi-tenant, multi-cluster Kafka-as-a-Service platform that enables it.
Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant and Beau Bender, Nordstrom
2. PRESENTERS
BEAU BENDER
Beau strives to democratize ML & AI within Nordstrom by empowering employees to seamlessly deliver production-quality solutions at scale.
ADAM WEYANT
Adam supports Nordstrom’s multi-tenant Kafka platform and related tools.
Beau and Adam are honored to work at Nordstrom, a retailer that strives to be the center of fashion authority. As part of the Data and Analytical Services organization, they work on solutions that enable world-class customer service and personalization.
4. ABOUT NORDSTROM
1901: Wallin & Nordstrom store opened
1998: Nordstrom.com launches
2017: First-generation custom analytical platform launches
2019: Design of Nordstrom Analytical Platform begins
2020: Real-time store analytics launches
2023: Majority of business decisions automated
6. NORDSTROM ANALYTICAL PLATFORM - CONSIDERATIONS
Hackathonability
Security and tokenization
CCPA/GDPR compliance
Data quality/discovery
Acceptable staleness
7. NORDSTROM ANALYTICAL PLATFORM - ALIGNMENT
Event-first design
Processes
Application design review
Centralized schema advocate group
SDKs
Engineering standards
Ownership of data quality
[Figure: axes "Number of Use-Cases" and "Data Quality"; labels: Clickstream, NAP, ORDER SUBMITTED]
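The event-first idea on this slide, one well-defined business event serving several use cases, can be sketched as follows. This is a minimal in-memory illustration, not Nordstrom's implementation: the `OrderSubmitted` fields, the `EventBus` class, and both consumers are hypothetical names introduced here, and the bus only stands in for a Kafka topic with independent consumers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class OrderSubmitted:
    """Hypothetical business event; the field names are illustrative only."""
    order_id: str
    customer_id: str
    total_cents: int
    submitted_at: str  # ISO 8601 timestamp


class EventBus:
    """In-memory stand-in for a topic that several consumers read independently."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[OrderSubmitted], None]] = []

    def subscribe(self, handler: Callable[[OrderSubmitted], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: OrderSubmitted) -> None:
        # Every subscriber receives the same event, unchanged.
        for handler in self._subscribers:
            handler(event)


revenue_by_customer: Dict[str, int] = {}  # analytical use case
fulfilment_queue: List[str] = []          # operational use case


def analytics_consumer(event: OrderSubmitted) -> None:
    current = revenue_by_customer.get(event.customer_id, 0)
    revenue_by_customer[event.customer_id] = current + event.total_cents


def fulfilment_consumer(event: OrderSubmitted) -> None:
    fulfilment_queue.append(event.order_id)


bus = EventBus()
bus.subscribe(analytics_consumer)
bus.subscribe(fulfilment_consumer)
bus.publish(OrderSubmitted("o-1", "c-42", 12999, "2021-01-01T00:00:00Z"))
```

The producer publishes once and knows nothing about its consumers, so adding another use case means adding a subscriber rather than a new reporting path.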
8. NORDSTROM ANALYTICAL PLATFORM – BEFORE/AFTER
Considerations
Before NAP: Data is an afterthought of design and siloed.
With NAP: Analytics is a first-class consideration.
Workflow
Before NAP: Data is collected with opaque process and transformations, and can only be accessed by data scientists once processed the next day.
With NAP: Well-defined business events are streamed live to any system that is interested.
Burdens
Before NAP: Ownership of quality is not well defined.
With NAP: Data quality is owned by producers, but all have a responsibility to drive improvements.
“If it’s not in NAP, it didn’t happen.”
9. NORDSTROM ANALYTICAL PLATFORM – WHAT’S NEXT?
Majority of business decisions automated
All business operations flow through NAP
Staleness reporting
Data quality
Data discoverability
Data lineage
10. A DISTRIBUTED STREAMING PLATFORM FOR NORDSTROM, BASED ON APACHE KAFKA.
190 TEAMS
5-6X READ:WRITE RATIO
150TB STORED
1.25Gbps PEAK READ
0.3Gbps PEAK WRITE
500K CONCURRENT CONNECTIONS
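For a sense of scale, the peak figures above can be converted to megabytes per second. The sketch below assumes "Gbps" means gigabits per second in decimal units; the variable names are illustrative. The ratio of the two peaks works out to roughly 4.2, slightly below the quoted 5-6X read:write ratio, and the slide does not say how that ratio is measured.

```python
# Assumption: 1 Gbps = 1000 megabits per second = 125 megabytes per second.
GBPS_TO_MB_PER_S = 1000 / 8

peak_read_gbps = 1.25   # from the slide
peak_write_gbps = 0.3   # from the slide

peak_read_mb_s = peak_read_gbps * GBPS_TO_MB_PER_S
peak_write_mb_s = peak_write_gbps * GBPS_TO_MB_PER_S
peak_ratio = peak_read_gbps / peak_write_gbps

print(f"peak read:  {peak_read_mb_s:.2f} MB/s")
print(f"peak write: {peak_write_mb_s:.2f} MB/s")
print(f"peak read:write ratio: {peak_ratio:.2f}")
```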
16. SELF-SERVICE LAYER
Proton resource management and core of multi-tenancy architecture.
PROTON API
Powers UI and Terraform provider.
PROTON UI
Proton resource management and documentation.
STREAMING AND SCHEDULED JOBS
ServiceNow integration, API key lifecycle, change-data-capture operations, and operational observability.
[Diagram: KAFKA-AS-A-SERVICE; labels: Primary, Secondary; components: Kafka Brokers, Schema Registry, Kafka Connect, Monitoring, Automation, Proton API, Proton UI, Streaming and Scheduled Jobs]
17. KAFKA INFRASTRUCTURE
Kafka streaming infrastructure and related services.
KAFKA BROKERS
Data streaming and retention.
SCHEMA REGISTRY
Avro schema storage and management.
KAFKA CONNECT
Managed S3, SQS, and Lambda sink.
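To make the Avro schema mention concrete, here is a minimal sketch of what a schema for a business event could look like. The record name, namespace, and fields are assumptions for illustration, not Nordstrom's actual schema. The helper is a simplified stand-in written for this example: it only reports fields that have no default and are absent from a record, and it is not an Avro validator.

```python
import json
from typing import Any, Dict, List

# Hypothetical Avro record schema (Avro schemas are written as JSON).
ORDER_SUBMITTED_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "OrderSubmitted",
    "namespace": "com.example.events",  # placeholder namespace
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "total_cents", "type": "long"},
        # Optional field: a union with "null" first and a null default.
        {"name": "coupon_code", "type": ["null", "string"], "default": None},
    ],
}


def missing_required_fields(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """Return names of fields without a default that the record does not supply."""
    return [
        field["name"]
        for field in schema["fields"]
        if "default" not in field and field["name"] not in record
    ]


# The JSON text is the form in which an Avro schema is stored and exchanged.
schema_json = json.dumps(ORDER_SUBMITTED_SCHEMA, indent=2)

incomplete = {"order_id": "o-1", "customer_id": "c-42"}
print(missing_required_fields(ORDER_SUBMITTED_SCHEMA, incomplete))
```

Giving new fields a default, as `coupon_code` does here, is the usual way to let a schema evolve without breaking consumers that still read older records.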
18. INFRASTRUCTURE AUTOMATION
Event-driven control plane and monitor for Kafka clusters, Kafka Connect, and Schema Registry.
AUTOMATION
Kafka topic, user, and connector resource provisioning and quota management.
MONITORING AND RECONCILIATION
Extract insights from infrastructure to self-heal and to improve visibility in Proton UI for Kafka connector status, schema details, etc.
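The reconciliation idea can be sketched as a comparison between desired state (what tenants requested through the self-service layer) and actual state (what the cluster reports). This is a generic illustration under stated assumptions, not Proton's code: the `reconcile` function, the action names, and the use of partition counts as the only compared attribute are all hypothetical.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (action name, topic name)


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Action]:
    """Compare desired and actual topic partition counts and list corrective actions."""
    actions: List[Action] = []
    for topic, partitions in sorted(desired.items()):
        if topic not in actual:
            actions.append(("create", topic))          # requested but missing
        elif actual[topic] != partitions:
            actions.append(("update", topic))          # exists but has drifted
    for topic in sorted(actual.keys() - desired.keys()):
        actions.append(("flag_unmanaged", topic))      # exists but was never requested
    return actions


desired_state = {"orders": 6, "returns": 3}
actual_state = {"orders": 3, "legacy": 1}
plan = reconcile(desired_state, actual_state)
print(plan)
```

Running a comparison like this on a schedule, or in response to change events, is one common way for a control plane to self-heal: the output is a plan that an automation layer can apply or surface in a UI.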