How the Data Mesh
is Driving Our
Platform
Trey Hicks
Director of Engineering
• Mentors
• Faith
• Recovery Centers
• Resources
Applications That Help People
Building Technologies To Connect People
• Diverse application types and purpose
• Serving several verticals
• Varying resource needs
• Apps are built internally by Gloo
or with partners
• Common means of connectivity to
data and services
Supporting The Mission
Common Platform Must Consider
Technical Landscape
• Microservices
• Datastores per service or
application domains
• Domain based services
• Event Driven
• Domain Driven
• Kubernetes
• AWS
• Confluent Cloud
Ø Kafka
Ø KsqlDB
• Kafka Connect cluster
• Docker
Our Approach Consists of
Architectural Infrastructure
• Heterogeneous apps
• Resource contention
• Gravitational pull to put application use-cases lower in the stack
• Tight coupling due to customization of shared services
• Blocking development due to cross-team dependencies
• Limits to our ability to scale the organization
Challenges
Challenges in Building the Platform
v Our value prop isn’t the applications, it’s the data
v Application-specific use-cases low in the stack cause problems
Platform Facts
Enter Data Mesh
• Domain-driven architecture
• Data as a product
• Self-serve architecture
• Governance
Zhamak Dehghani
https://martinfowler.com/articles/data-monolith-
to-mesh.html
Perhaps the ideas have existed before
• Data Emphasis
• Domain Driven Design
• Service Oriented Architectures
Provides terminology to shift the
conversation UPWARDS to form a
BROAD data strategy as opposed to
being a technical concern
Principles
Data Mesh Paradigm
Solving the Challenges
Principle → Appeal → Solves
• Domain-Driven Architecture
Ø Appeal: Microservice architecture
Ø Solves: Many apps; Resource contention
• Data As a Product
Ø Appeal: Primary value; Apps are transient
Ø Solves: App requirements in core services; Blocking development; Tight coupling
• Self-Serve Infrastructure
Ø Appeal: Easy connectivity to data and domains
Ø Solves: Blocking development; App requirements in stack
• Governance
Ø Appeal: Secure data ports; Community trust; Privacy
Ø Solves: Tight coupling; Blocking development
Adopting The Principles
• Establish common terminology and language
• Promote a data first philosophy
• Embrace democratized ownership and the associated responsibilities
• Accept eventual consistency
• In our case, embracing event streams
Culture Shift
Data As a Product
How We Define Data Products
• Our data is our unique value
• Foundation for apps and services that drive success
• Requires governance
Ø Security
Ø Availability
Ø Accessibility
Ø Change controls
• Free of application use-cases
• Integrity
• Person
• Organization
• Catalysts
• Relationships
Data Product Examples
Core Data Objects
Secondary Objects
• Cohorts/Collections
• Growth Intelligence
• Assessments
Access via Data Ports
Sharing the Data
• Distributed Data Products
• Domain boundaries
• Process/Application domains apply
their use-cases
• Domains may use sub-sets or
combinations
• Derived Data Products
Conceptual Architecture
Examples
• Campaign Data
• Event Sourcing
Implementation:
Campaign Data
Creating a Data Product
Connecting to the Data Mesh
Sharing the Data Product
• Governed data available
• Options for Access
Ø Download with ETL or ELT
Ø Kafka
• Both have complications
Ø Manual processes
Ø Lack of consuming process
Ø Skillsets not aligned
More Complexity
Enter Kafka Ecosystem
Data Mesh Platform Using Kafka
• Kafka is perfect for one to many
• Event streams/batches provide a means of keeping the consuming
domains in sync with the data product
• Kafka Connect is perfect for turning datastores into event streams
• Kafka Connect is perfect for sinking the streams into a datastore
• KsqlDB is perfect for selecting subsets of data or combining streams to
shape the data
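As a sketch of that last point, a ksqlDB query can register a stream over a data product's topic and shape a subset of it for one consuming domain. The stream and column names below are illustrative, not Gloo's actual schema:

```sql
-- Register a stream over the data product's topic (names are hypothetical)
CREATE STREAM campaign_stream (id VARCHAR KEY, org_id VARCHAR, status VARCHAR)
  WITH (KAFKA_TOPIC='campaign-data', VALUE_FORMAT='JSON');

-- Derive a subset shaped for one consuming domain
CREATE STREAM active_campaigns AS
  SELECT id, org_id
  FROM campaign_stream
  WHERE status = 'active'
  EMIT CHANGES;
```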
Kafka Connect
Building the Mesh
• Connect Data Product
Ø S3 Source Connector
• Connect Consumers
Ø JDBC Sinks
Ø ES Sink
Kafka Connect
S3 Source Connector
• S3 connection
• Policies
Ø Polling
Ø Subdirectories
• JSON = more approachable
* Mario Molina
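A minimal config sketch for this slide's bullets, assuming Mario Molina's open-source filesystem source connector (kafka-connect-fs); bucket, topic, and interval values are placeholders:

```json
{
  "name": "campaign-data-source",
  "config": {
    "connector.class": "com.github.mmolimar.kafka.connect.fs.FsSourceConnector",
    "fs.uris": "s3://example-bucket/campaign-data/",
    "topic": "campaign-data",
    "policy.class": "com.github.mmolimar.kafka.connect.fs.policy.SleepyPolicy",
    "policy.sleepy.sleep": "60000",
    "policy.recursive": "true",
    "file_reader.class": "com.github.mmolimar.kafka.connect.fs.file.reader.JsonFileReader"
  }
}
```

The polling policy and recursive flag cover the two policy bullets; the JSON file reader keeps the payloads approachable without a schema registry.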
Kafka Connect
JDBC Sink Connector
• DB Connection
• Dealing with Schema
Ø table.name.format
Ø auto.create and auto.evolve
• Single Message Transform
Ø Inject timestamp
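The bullets above map to connector properties roughly like this sketch of a Confluent JDBC sink config; the connection URL, topic, and field names are placeholders:

```json
{
  "name": "campaign-data-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://db.example.com:5432/campaigns",
    "topics": "campaign-data",
    "table.name.format": "${topic}",
    "auto.create": "true",
    "auto.evolve": "true",
    "transforms": "insertTS",
    "transforms.insertTS.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.insertTS.timestamp.field": "ingested_at"
  }
}
```

`auto.create`/`auto.evolve` let the sink create and alter the table from the record schema, and the `InsertField` single message transform stamps each row with the record timestamp.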
Kafka Connect
ES Sink Connector
• Uses REST client
• Single Message Transform
Ø Document id
Ø Index name
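A comparable sketch for the Elasticsearch sink; host and topic values are placeholders. With `key.ignore=false` the record key becomes the document id, and a `RegexRouter` transform rewrites the topic name into the target index name:

```json
{
  "name": "campaign-data-es-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "connection.url": "http://elasticsearch.example.com:9200",
    "topics": "campaign-data",
    "key.ignore": "false",
    "transforms": "renameIndex",
    "transforms.renameIndex.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.renameIndex.regex": "campaign-data",
    "transforms.renameIndex.replacement": "campaigns"
  }
}
```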
Derived Data Products
Implementation:
Event Sourcing
• Bloated infrastructure
Ø Expensive footprint
Ø K8s is great, maybe too easy to spin up new instances
• Experimentation leaves dead instances and other bones
• Complicated data model and APIs
Revisiting Technical Landscape
New Concerns
• Simplify the overall footprint
Ø Fewer and simpler services
Ø Smaller clusters
Ø Fewer instances
• Improve database schema
• Rethink our APIs
Going Forward In Reverse
Rethinking Parts of the Platform
Event Sourcing
● Major changes without
interruption
Ø Tables restructured
Ø Elements combined or removed
● Existing streams via
Connectors
● Need additional JDBC sinks
Changing the Schema
Applying KsqlDB
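One way to make such a schema change without interrupting consumers is a ksqlDB derived stream: old consumers keep reading the existing topic while a new JDBC sink reads the restructured one. Names below are hypothetical:

```sql
-- Stream over the existing topic (hypothetical schema)
CREATE STREAM person_stream (id VARCHAR KEY, first_name VARCHAR,
                             last_name VARCHAR, org_id VARCHAR)
  WITH (KAFKA_TOPIC='person', VALUE_FORMAT='JSON');

-- Restructured stream: combine fields, drop others, write a new topic
CREATE STREAM person_v2
  WITH (KAFKA_TOPIC='person-v2', VALUE_FORMAT='JSON') AS
  SELECT id, CONCAT(first_name, ' ', last_name) AS full_name, org_id
  FROM person_stream
  EMIT CHANGES;
```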
More On Infrastructure
• Structured like other engineering “pods”
Ø Engineers
Ø Product
• Charter is to build the self-serve connectivity
• Responsible for Data Mesh infrastructure
• Create reference configs for all Kafka Connectors
• Make it super simple to define, add, and govern new data products
• One team responsible for connectivity and data movement
Creation of Data Mesh Engineering
Discovery
• Provide a catalog of all data products
Ø Documentation or manual catalogs are DOA
Ø Must be automatic
• All data products
• Communication channels
• Consuming domains
• Provide schemas
• Data ports
Keeping Track of All the Things
Deployment
• Kafka Configs Project
Ø Project for all Connectors, KsqlDB, and topic configurations
Ø Updates trigger deployment
• Uses REST Proxies to deploy updates
• Open Source?
• Kafka JMX Exporter to collect metrics used in Grafana
dashboards
Continuous Deployment
Closure
• Data first organization
• Data mesh paradigm helps us solve problems
• Kafka ecosystem is the core of the data mesh driving the platform
• Serving our application domains by using Kafka Connect and KsqlDB
• Future
Ø Improve self-serve
Ø Discovery App → If you have experienced this problem, let’s chat!
Summary
Acknowledgments
● Collin Shaafsma – Leadership
● Ken Griesi – Inspiration, guidance, and discovering the articles
● Alex Lauderbaugh – All things Data and ghost writer
● Scott Symmank – Technical lead
● Hannah Manry – Amazing engineer
● Mitch Ertle – Resident BA expert and principal consumer
● Chicken – Mascot
* We’re Hiring
