Tristan Baker - linkedin.com/in/tristanbaker
Suresh Raman - linkedin.com/in/ramansuresh
Allison Bellah (in absentia) - linkedin.com/in/allisonbellah
Data Mesh at Intuit
May 13, 2021
©2021 Intuit Inc. All rights reserved. 2
Agenda
● Arriving at data mesh
● Our vision and four-part strategy (now with 25% more parts!)
● Q&A (fire away!)
Arriving at data mesh
A brief history of data infrastructure
Today
What “we cannot scale” sounds like from our users
Discovering Data
● Where can I find data about a particular thing (customer, company, etc.)?
● Where can I find the data sourced from a particular product or service?
Understanding Data
● Who can approve my access so that I can see samples of the data?
● What is the schema of the data?
● What is the business meaning and context of the data?
● Is this data related to other concepts? Is it joinable to other data? What is the meaning of the relationship?
Trusting Data
● What system produces this data and at what latency?
● What other systems use this data?
● What is the quality of this data? Is it ‘clean’?
● Which team supports this data if it breaks?
Publishing Data
● How do I describe my data so that others understand what it means and how to use it?
● Where do I host my data so that other systems can access it?
● Data systems are complicated; how can I build and operate my process on top of one?
● What are my operational responsibilities once my process/data is in production?
● How do I meet my compliance requirements for processing/storing/publishing data?
● Am I duplicating processing/data that already exists?
Consuming Data
● How is this table/topic partitioned?
● Who can approve my production system to access it?
● Will I get alerted if the schema changes?
The future of data infrastructure
● Data treated as code
● Data service as a facet of a product
● Data responsibility decentralized
● Producers take responsibility for data
● Producers serve consumers
● Data platform provides the ecosystem to
govern and manage the lifecycle of data and
machine learning
The provocation
Data Mesh is born
Our vision and four-part strategy
Enable more Intuit teams to more easily use and create data
Four-part strategy
• Stewardship
– Ensures that teams are accountable for a defined set of responsibilities in building and managing their solutions, including adherence to defined best practices so that they produce only high-quality data.
• Organizing people, code and data
– A systematic approach to organizing the people, code and data that clearly identifies the owners of a business problem and its solution.
• Self-serve products
– A rich suite of self-serve products that enable teams to more easily author, deploy, govern and operate their own solutions, aided by automation and processes that support best practices and make high quality a precondition for deployment.
• Rationalizing data definitions
– A process for rationalizing all critical data definitions at the company so that data concepts like Customer, Product and Entitlement are unique, reusable and non-conflicting.
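The self-serve bullet makes best practices and high quality "a precondition for deployment". One way such a gate could work is sketched below. It is a minimal illustration, not Intuit's implementation: the `DataProductSpec` descriptor, its fields, and the `deployment_blockers` check are all hypothetical, and a real gate would pull these facts from a catalog and a data-quality system rather than from a hand-filled object.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataProductSpec:
    """Hypothetical descriptor a team might submit when deploying a data product."""
    name: str
    owner_team: Optional[str]
    description: str
    schema: Dict[str, str]  # column name -> type name
    quality_checks_passed: bool


def deployment_blockers(spec: DataProductSpec) -> List[str]:
    """Return the reasons this deployment should be blocked (empty list means OK)."""
    blockers = []
    # Stewardship: every product needs an accountable owner.
    if not spec.owner_team:
        blockers.append("no owning team")
    # Publishing: others must be able to understand what the data means.
    if not spec.description.strip():
        blockers.append("missing description")
    if not spec.schema:
        blockers.append("missing schema")
    # Quality as a precondition rather than an afterthought.
    if not spec.quality_checks_passed:
        blockers.append("quality checks failing")
    return blockers
```

For example, a spec with `owner_team=None` and an empty description would be rejected with `["no owning team", "missing description"]`. Returning every reason at once, instead of failing on the first, lets the producing team fix all problems in one pass.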
Stewardship
Stewardship goals for next year
Organizing People, Code, and Data
Raw information about physical systems describes where data is stored and where code executes. It records where data is physically located so that it can be accessed.
Basic dependency, ownership and classification information provides additional context about physical
data and code locations so that data can be better governed, secured and operated by the owning
teams.
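The two layers above (raw physical locations, plus basic ownership, classification and dependency information) can be pictured as a small catalog. The sketch below is a hypothetical model, not a description of Intuit's catalog: `PhysicalAsset`, `AssetContext`, and `downstream_consumers` are illustrative names. It shows how recorded dependencies answer a user question from earlier in the deck, "What other systems use this data?"

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PhysicalAsset:
    """Raw information: where the data physically lives (hypothetical fields)."""
    asset_id: str
    system: str    # e.g. "event-bus" or "data-lake" (illustrative values)
    location: str  # topic name, table path, etc.


@dataclass
class AssetContext:
    """Basic information layered on top of a physical asset."""
    asset: PhysicalAsset
    owner_team: str
    classification: str  # e.g. "public" or "private"
    upstream: List[str] = field(default_factory=list)  # asset_ids this asset is derived from


def downstream_consumers(catalog: Dict[str, AssetContext], asset_id: str) -> Set[str]:
    """Return every asset that directly or transitively depends on asset_id."""
    # Invert the upstream edges into a producer -> consumers index.
    consumers: Dict[str, List[str]] = {}
    for ctx in catalog.values():
        for parent in ctx.upstream:
            consumers.setdefault(parent, []).append(ctx.asset.asset_id)
    # Walk the inverted edges; `seen` also guards against dependency cycles.
    seen: Set[str] = set()
    frontier = [asset_id]
    while frontier:
        current = frontier.pop()
        for child in consumers.get(current, []):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return seen
```

Storing only upstream edges keeps each asset's record local to its owning team, and the consumer view is derived on demand. That derived view is also what an owner would need to notify downstream teams before a breaking change.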
Why organizing people, code and data matters
Private vs Public
~50% of tables are temp/sandbox/staging/test/backup tables
- Messes up search & discovery
- Teams consume data not meant for external use
Data Ownership
~50% of tables don't have clearly identified owners
- Erodes trust
- Copies proliferate
- Operational and governance risk
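Both problems above can be measured before they can be fixed. The sketch below is an assumed heuristic, not how Intuit produced its figures: it flags tables whose names contain temp/sandbox/staging/test/backup tokens (the `PRIVATE_MARKERS` pattern, including the `tmp`, `stg` and `bkp` abbreviations, is hypothetical) and computes the fraction of tables with a named owner. Name-based rules produce false positives and negatives, so an explicit private/public flag set by the owning team is the more reliable long-term answer.

```python
import re
from typing import Dict, Optional

# Hypothetical naming heuristic: tokens suggesting a table is not meant for external use.
# A token only counts when bounded by the start/end of the name, "_" or ".",
# so "attempt_log" is not flagged just because it contains "temp".
PRIVATE_MARKERS = re.compile(
    r"(^|[._])(tmp|temp|sandbox|staging|stg|test|backup|bkp)([._]|$)",
    re.IGNORECASE,
)


def is_probably_private(table_name: str) -> bool:
    """Flag tables whose names look like temp/sandbox/staging/test/backup tables."""
    return PRIVATE_MARKERS.search(table_name) is not None


def ownership_coverage(owners: Dict[str, Optional[str]]) -> float:
    """Fraction of tables (table name -> owner) that have a non-empty owner."""
    if not owners:
        return 0.0
    owned = sum(1 for owner in owners.values() if owner)
    return owned / len(owners)
```

Applied to a catalog export, `is_probably_private` could hide likely-private tables from search by default, and `ownership_coverage` gives a single number to track as teams claim ownership.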
Self-Serve Products
Data Processing Capabilities
Data Serving Capabilities
Self-serve goals for next year
100% of the top 20 tasks in the data lifecycle are self-serve
Infra Provisioning
● Transactional Persistence
● Compute for stream and batch processing
● Monitor, Debug Infra
● Cost
Data Authoring
● Events, Schemas
● Ingestion
● Transformations
● Entities
● ML Features
● Data Quality, Observability
● Orchestration
Data Governance
● Access Management
● Key Management
● Compliance Controls & Audit
● Privacy
Rationalizing Data Definitions
Clean entity information with formally defined meaning and relationships enables better data understanding. This
is the purpose of entity definitions. They ensure that data is clean, organized, connected, discoverable and
documented in a formal way.
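The goal that concepts like Customer, Product and Entitlement be "unique, re-usable and non-conflicting" implies a single place where each definition is registered and checked. The sketch below assumes a simple in-memory registry; `EntityDefinition`, `EntityRegistry` and their fields are hypothetical, and a production version would persist definitions and route conflicts to a review process instead of raising an exception.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class EntityDefinition:
    name: str                       # e.g. "Customer"
    description: str                # formal business meaning
    key_attributes: FrozenSet[str]  # attributes that identify an instance


class EntityRegistry:
    """Company-wide registry allowing exactly one definition per entity name."""

    def __init__(self) -> None:
        self._defs: Dict[str, EntityDefinition] = {}

    def register(self, definition: EntityDefinition) -> EntityDefinition:
        # Names are compared case-insensitively so "customer" cannot shadow "Customer".
        key = definition.name.casefold()
        existing = self._defs.get(key)
        if existing is None:
            self._defs[key] = definition
            return definition
        if existing != definition:
            raise ValueError(f"conflicting definition for entity {definition.name!r}")
        # An identical definition is reused rather than duplicated.
        return existing

    def get(self, name: str) -> EntityDefinition:
        return self._defs[name.casefold()]
```

Re-registering an identical definition returns the existing object (reuse), while a second, different definition of the same concept is rejected (no conflicts).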
When you bring it all together, you get Intuit’s Data Mesh
Capturing meaning, relationships, ownership, and system dependencies builds a full, rich picture for everyone. No tribal knowledge needed!
In this example, the clean information describes the entities OII Account and Intuit Product and the Entitled To relationship between them.
The basic information describes how the data for these entities is sourced from the Identity Universal Service and the Entitlement Reference Service.
The raw information describes which Event Bus topic and Data Lake table contain the data for these entities.
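The example above layers clean, basic and raw information for the same two entities. The sketch below shows how a single lookup could combine all three layers. The entity, relationship and service names come from the slide; everything else is assumed: which service sources which entity, the owner teams, and the topic and table names are hypothetical placeholders, since the talk does not give them.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class RawInfo:
    event_bus_topic: str
    data_lake_table: str


@dataclass
class BasicInfo:
    source_service: str
    owner_team: str


# Clean information: entities and the relationships between them (from the slide).
ENTITIES: Set[str] = {"OII Account", "Intuit Product"}
RELATIONSHIPS: List[Tuple[str, str, str]] = [
    ("OII Account", "Entitled To", "Intuit Product"),
]

# Basic and raw information. The service-to-entity mapping, owner teams, topics
# and tables below are ASSUMED placeholders, not taken from the talk.
BASIC: Dict[str, BasicInfo] = {
    "OII Account": BasicInfo("Identity Universal Service", "hypothetical-identity-team"),
    "Intuit Product": BasicInfo("Entitlement Reference Service", "hypothetical-entitlement-team"),
}
RAW: Dict[str, RawInfo] = {
    "OII Account": RawInfo("hypothetical.oii_account.topic", "hypothetical_lake.oii_account"),
    "Intuit Product": RawInfo("hypothetical.intuit_product.topic", "hypothetical_lake.intuit_product"),
}


def describe(entity: str) -> Dict[str, object]:
    """Answer 'what is it, how does it relate, who supports it, where is it?' in one place."""
    if entity not in ENTITIES:
        raise KeyError(entity)
    # Outgoing relationships, then incoming ones labelled as inverses.
    related = [(rel, obj) for subj, rel, obj in RELATIONSHIPS if subj == entity]
    related += [(f"inverse of {rel}", subj) for subj, rel, obj in RELATIONSHIPS if obj == entity]
    basic, raw = BASIC[entity], RAW[entity]
    return {
        "entity": entity,
        "relationships": related,
        "source_service": basic.source_service,
        "owner_team": basic.owner_team,
        "event_bus_topic": raw.event_bus_topic,
        "data_lake_table": raw.data_lake_table,
    }
```

Calling `describe("OII Account")` returns its Entitled To relationship to Intuit Product together with its source service and physical locations, which is the "full, rich picture" the slide describes, assembled without asking anyone.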
Q&A
Tristan Baker - linkedin.com/in/tristanbaker
Suresh Raman - linkedin.com/in/ramansuresh
Allison Bellah (in absentia) - linkedin.com/in/allisonbellah