Scaling Data - Data Informed To Data Driven To Data Led - Reforge

The document discusses scaling data at startups by moving from being data-informed to data-driven to data-led. It argues that companies commonly view data as a team to hire or tools to implement, rather than a strategic lever for growth. The Scaling Data Framework outlines three stages - informed, driven, and led - and provides guidance on assessing a company's product/data maturity and aligning data strategy, team, and tools accordingly at each stage.

Uploaded by

Eka Ponkratova
Copyright
© All Rights Reserved

Scaling Data: Data Informed to Data Driven to Data Led

Brian Balfour (/blog?author=528c45f9e4b06be250a9fe2e)

This post is written by Crystal Widjaja (https://www.crissyw.com/), Reforge Partner/EIR, and the
former SVP Growth and Business Intelligence at Gojek, one of the largest super apps in Southeast
Asia. Crystal helped Gojek scale from 20K orders per day to 5M. Crystal is also a co-founder and
advisor to Generation Girl which is dedicated to helping young girls engage in STEM fields. Crystal
previously wrote Why Most Analytics Efforts Fail
(https://www.reforge.com/blog/why-most-analytics-efforts-fail) and currently leads Reforge’s
Advanced Growth Strategy (/advanced-growth-strategy-series) program.

Thanks to contributions from Dan Wolchonok (Head of Data at Reforge), Elena Verna (EIR at
Reforge, Advisor at Miro, Netlify, MongoDB), Behzod Sirjani (EIR at Reforge, ex-
Slack/Facebook), Shani Hadiyanto, and Sarah Catanzaro.

One of the most common questions I get from founders is: ‘When should I hire my first data
person?’ Invariably, the same types of questions are asked over the lifecycle of the company:

• Should I hire a data engineer or a data analyst?

• Do I need a data scientist right now?

• What kind of analysis should the data team be doing?

• Is the PM or data team responsible for data collection?

• Should I be using Looker (an advanced data transformation and visualization tool)?

• What’s the right ratio of analysts to PMs?

• Where do analysts report into?

At the core of these questions is the common mistake of viewing data as a team to hire or
set of tools to implement rather than as a strategic lever for growth. The answers to these
questions are dependent on your product, business, and points of leverage. In this article, I
lay out:
1. Why data is not a team to hire or set of tools to implement

2. How to approach data as a strategic lever for growth

3. The Scaling Data Framework: Informed > Driven > Led

4. A deeper dive into the three stages of scaling data.

Data Is Not A Team To Hire or Set of Tools To Implement

It is easy for orgs to settle into treating data as a team to hire or a set of tools to implement.
That is, the org thinks it just needs to adopt a new technology or grow the data team in a
certain way to fix its data needs. Often, the sources of data problems lie in other areas.

1. Data Capabilities Don't Match The Product Strategy

The answers to questions about who to hire, which tools to implement, and which analyses
need to be done are ultimately informed by what the product strategy is and how data
plays a role in helping achieve that product strategy. But often the product strategy
isn't well defined, and even when it is, where data fits in isn't.

2. Your Stage of Data Mismatches The Stage of Business

Often there is a mismatch between the stage a person has historical experience
with and the stage the company is at. For example, a Data PM may come into a new
company having worked only with a mature company's data. They never saw the steps it
took to get from 0 to great, and end up misapplying technology, misjudging team needs, and
more. Scaling data requires many evolutions, and it is rare that someone has seen the
entire lifecycle.

3. Incorrect Incentives Between Data and Other Teams

Often, the culture and incentives of the org create a non-functioning data environment.
For example, data teams should not be measured by the answers they give but rather
the impact of those answers on the business. In a lot of culture-poor organizations, PMs
or others take credit for "asking the right question" instead of attributing it to the data
team. This type of system rewards bad behavior and disincentivizes the data team from
doing more impactful analysis — it instead incentivizes them to design pretty result
tables. It may even incentivize the data team to seek out new questions that aren't
relevant to the business but can provide "interesting" answers, which leads to a negative
cycle.

Strategy → Stage → Team → Tools


Instead, data needs to be seen as a strategic lever for growth. Viewing data from this
perspective leads to different answers on the questions we started with around team and
tools. What does it look like when data is treated as a strategic lever for growth? I
recommend walking through four areas:

1. Strategy - What are your points of leverage? How does data improve those points of
leverage?

2. Stage - What stage of maturity is our product in? What stage of maturity is our Data in?

3. Team - What people do we need to achieve the data strategy? Are they set up for
success internally?

4. Tools - What tools do we need to adopt to facilitate the team's impact?

Strategy: What are your points of leverage?


Everyone thinks they need to have a highly scalable, mature data organization. The reality is
that most businesses don’t have the necessary scale to build advanced ML-led capabilities
that could meaningfully impact the business. The answer to questions like “when do I need a
data scientist?” really starts with an objective reflection of the company’s strategy, roadmap,
and goals (https://www.reforge.com/blog/the-product-strategy-stack):

• How much data do the product and business operations generate each day?

• How can customer value props be improved by leveraging data?

• What kinds of decisions could the data help inform today?

• How could decision-making change if we had 1000x the data?

• How much more efficient could business operations be with data automation?

It's more about identifying the right points of leverage — and not just jumping to the end
because you think everything else will come as a result of it.
Going through some of the above questions tends to reveal some uncomfortable truths. The
most common one is that the company doesn't have enough data for advanced data
infrastructure to be impactful to a company’s business operations. Even the most
sophisticated data science team and infrastructure will fail to add value to a business that
just isn’t generating enough usable data — there aren’t enough signups, retained users, or
actions in the product for meaningful data science solutions to exist.

Stage: What stage of maturity is our product? Data?

Identifying the points of leverage doesn't mean you are at the right stage
in data or product maturity to execute on those strategic initiatives. Questions the team
should ask:

• How much of this data is tracked, stored, and owned by the company?

• How consistent and descriptive is the data for our market and trends? (For example, in
Covid times (https://www.reforge.com/blog/2020/6/2/retention-in-the-times-of-covid-19),
current data might be out of whack with historical data and make it difficult to
leverage in a useful way.)

• Are you tracking data at the right level of granularity or asset class (event-based,
timebound, derived, aggregated)?

• How deterministic is the company's data? What level of granularity do we require to
make deterministic calls?

• How differentiated and proprietary is the company's data?

• How timely, recent, and accurate is the company's data?

• How accessible is the company's data — by both people and systems?

Understanding both what stage of maturity your product is at and where your data is at is
critical. It helps you understand where you should be, where you are, and informs the kind of
tools and team you need to fill the gap. The most common scenarios I see are:

Overbuilding - Data Stage Is Ahead of Product Maturity


If the company is still searching for product-market-fit, building data infrastructure that meets
the needs of a post-product-market-fit organization will actually impede growth. As an
example, the first data scientist I hired at Gojek was tasked with building a fraud detection
service before we even had the infrastructure in place to collect enough data. This was a
mistake — after a few unproductive weeks, we realized that a few simple business logic rules
could capture the majority of suspicious transactions. Many years later, those business rules
are still responsible for preventing 80% of bad actors, even with advanced data science
teams in place. Stories like these are incredibly common. The shiny, enterprise-level data
products prevent companies from making productive use of the data they have and tend to
favor high-risk, high-resource projects that fail to deliver impact.
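
To make "a few simple business logic rules" concrete, here is a minimal sketch in Python. The fields, thresholds, and rule names are hypothetical and chosen only for illustration; they are not the rules Gojek used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    """Hypothetical transaction record; field names are assumptions."""
    user_id: str
    amount: float
    account_age_days: int
    orders_last_hour: int


# Hypothetical thresholds. A real team would tune these against known bad actors.
MAX_AMOUNT_FOR_NEW_ACCOUNT = 500.0
NEW_ACCOUNT_AGE_DAYS = 1
MAX_ORDERS_PER_HOUR = 10


def suspicious_reasons(tx: Transaction) -> List[str]:
    """Return the names of every rule the transaction trips (empty if none)."""
    reasons = []
    if tx.amount > MAX_AMOUNT_FOR_NEW_ACCOUNT and tx.account_age_days < NEW_ACCOUNT_AGE_DAYS:
        reasons.append("large_amount_on_new_account")
    if tx.orders_last_hour > MAX_ORDERS_PER_HOUR:
        reasons.append("too_many_orders_per_hour")
    return reasons


if __name__ == "__main__":
    print(suspicious_reasons(Transaction("u1", 900.0, 0, 2)))
```

Rules like these need no training data and no model-serving infrastructure, and because each one is named, anyone in operations can read why a transaction was flagged.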

Companies that waste resources on projects like these failed to identify an appropriate data
strategy for their stage of the business, and instead of building appropriate capabilities,
looked to solve an advanced, specific problem. The key is to identify the right sequence of
problems to solve with the right foundations built in tandem. This means understanding how
data should be leveraged at each stage to meet the needs of the business today in
preparation for what the company will need in the (near) future.

Underbuilding - Data Stage Is Behind Product Maturity

Underbuilding is when the maturity of the product is ahead of data maturity. You can
underbuild in different areas: infrastructure, analytics, team, and operations. This is most
problematic when some of the company’s business operations are at scale but are totally
unprepared to leverage data as a strategic, competitive advantage. Some signals that you've
underbuilt:

• You have multiple products using inconsistent data attributes. For example, timestamp
fields use different time zone logic and definitions, or the taxonomy is all over the place and
inconsistent (https://www.reforge.com/blog/why-most-analytics-efforts-fail).

• Months or even years of data haven’t been tracked at all.

• Data that has been tracked is stuck in a 3rd party system that the company doesn’t
have ownership of. For example, I recently worked with a company that uses Firebase,
thinking they could eventually export logs. But Firebase does not store individual event
data making this impossible — they have literally wasted years of data collection.

• The business has been operating with sub-optimal decision-making without data for so
long, that it’s unlikely to change easily.
The realization of this opportunity cost is painful, and making up for it can take hours of
realigning metrics definitions, sourcing available data, backfilling data pipelines, and a
realignment of the company’s culture.
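
As one small example of what realigning inconsistent timestamp fields can look like, the sketch below normalizes two differently formatted sources to UTC. It assumes a hypothetical "product A" that logs naive local time at UTC+7 and a hypothetical "product B" that logs ISO 8601 strings with an explicit offset; both formats are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: product A writes naive timestamps in a fixed UTC+7 local time.
PRODUCT_A_OFFSET = timezone(timedelta(hours=7))


def product_a_to_utc(raw: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' local time from product A and convert to UTC."""
    local = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=PRODUCT_A_OFFSET)
    return local.astimezone(timezone.utc)


def product_b_to_utc(raw: str) -> datetime:
    """Parse an ISO 8601 string with an explicit offset from product B and convert to UTC."""
    return datetime.fromisoformat(raw).astimezone(timezone.utc)


if __name__ == "__main__":
    a = product_a_to_utc("2021-03-01 09:30:00")
    b = product_b_to_utc("2021-03-01T02:30:00+00:00")
    print(a == b)  # both describe the same instant
```

Converting once at ingestion, rather than in every downstream query, keeps the time zone logic in one place.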

Team + Tools
Once you understand what role data plays in the overall strategy, and what stage the product
and data are in, then you can begin to understand what team and tools you need and where
there are gaps. Team and tools is not just about having the right heads in place, but about
making sure that the org is working well together. Signals that teams are not aligned:

• Teams aren't collaborating on both problems and solutions with the data team. They are
instead coming to the data team with a hypothesis to validate.

• The data and product org don't have time to align on strategic initiatives because they
are bogged down by minutiae of tasks to be done.

• Analyses aren't treated as valuable findings that help people move closer to their
objectives, and instead simply evaluate whether something was a win or not.

The Three Stages Of Scaling Data


To help guide teams through the Strategy → Stage → Team → Tools sequence, I have laid out a
framework for designing a scalable data organization. The framework has two parts:

• 3 Stages of Data Maturity: What the business needs to grow and how data plays a role
informs the data strategy at each stage.

• 4 Capabilities Within Each Stage: The necessary building blocks and capabilities of
each stage across 4 key work streams (infrastructure, analytics, operations, and team).

The 3 Stages of Data Maturity

Most companies can fit themselves into one of three stages:

• Stage 1: Data Informed. These companies are focused on building the business and
getting to product-market-fit (stable user retention rates). The key business need is for
data to provide operational visibility.

• Stage 2: Data Driven. These companies have reached product-market-fit and are
actively optimizing for specific users, behaviors, and experiences in the product at the
feature-level. The key business need is for data to support the organization’s growth with
scalable tooling, data products, and deep-dive insights.
• Stage 3: Data Led. These companies are operationally run by data products,
infrastructure, and services. The key business need is the “productization” of data
services that unlock Product and Data Science teams, allowing them to automate
operational decision-making and user product experiences.

The successful advancement from one stage to the next requires two things:

• Needs: The company’s activities and desired business objectives have evolved due to
new levels of growth, scale, or product-market-fit

• Capabilities: The dependencies and foundations required for the next stage have been
built and unlock new leverage and capabilities

Note: Not all companies need to become Data Led.

The stages are presented as a linear progression, but it’s important to note that
not all companies become Data Led. Most companies today would describe themselves
as Data Informed or Data Driven (or as aspiring to reach those stages), and only some
businesses envision reaching the Data Led stage.

That stage does not apply to all businesses; it describes a globally scaled
organization in which data dictates what and how you operate. Businesses with meaningful
traction may find that building Stage 3: Data Led capabilities is possible, but would not
dramatically impact their strategy due to the nature of their business, such as having a small
number of SKUs to optimize for or a low-frequency product in an evolving market that renders
prediction and forecasting models less effective.

The 4 Capabilities
Founders should use this playbook by considering the needs of their business (what they
need to achieve) in comparison to the next section in this framework: the recommended
capabilities (what’s needed to fulfill their business needs) for each stage.

Mature data organizations work in conjunction with mature businesses by empowering,
unlocking, and building off of one another. The right strategy is to match the needs of the
business at its current stage with building the appropriate data capabilities in order for the
business to scale into its next growth phase. There are four pillars of data capabilities in any
organization:
• Infrastructure: the scalability of the tools, architecture, and technical projects that
enable analytics

• Analytics: the complexity and scalability of insights generated in the company

• Operations: the level of direct impact data has on business operations

• Team: who is hired to support the above capabilities

How and what to develop across these 4 capabilities differs by stage but ultimately leads to
building sustainable infrastructure, developing compounding insights, unlocking business
operations, and evolving skill sets.

Stage 1: Data Informed

The most common pitfall at the data informed stage is being indecisive about the truth
(and allowing multiple versions to co-exist). If the company has already reached product-
market-fit, but is missing one of the crucial capabilities above, teams might think they have a
shared understanding and single source of truth for data when in reality, they really don't.
If Finance believes we gained 100 new transacting users from Facebook Ads in October, but
Marketing thinks it was 120, we’re likely operating from different tools, metrics, definitions,
time zones, or even accrual vs. cash based accounting. This friction commonly leads to
wasting time on alignment, frustration, and avoiding using data at all. Organizations that do
not stamp this out quickly will fail to mature as a data-driven company.
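
The sketch below shows how a time zone difference alone can produce two "correct" counts from the same data. The timestamps and the UTC+7 reporting zone are made up for illustration; they are not figures from any real company.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
LOCAL = timezone(timedelta(hours=7))  # hypothetical reporting time zone

# Hypothetical first-transaction timestamps, all stored in UTC.
first_transactions = [
    datetime(2021, 9, 30, 18, 0, tzinfo=UTC),   # already Oct 1 in UTC+7
    datetime(2021, 9, 30, 20, 0, tzinfo=UTC),   # already Oct 1 in UTC+7
    datetime(2021, 10, 15, 12, 0, tzinfo=UTC),  # October in both zones
    datetime(2021, 10, 31, 10, 0, tzinfo=UTC),  # October in both zones
    datetime(2021, 10, 31, 20, 0, tzinfo=UTC),  # already Nov 1 in UTC+7
]


def count_october_2021(events, tz):
    """Count events whose timestamp falls in October 2021 when viewed in `tz`."""
    count = 0
    for event in events:
        local = event.astimezone(tz)
        if (local.year, local.month) == (2021, 10):
            count += 1
    return count


if __name__ == "__main__":
    print("UTC reporting:  ", count_october_2021(first_transactions, UTC))    # 3
    print("UTC+7 reporting:", count_october_2021(first_transactions, LOCAL))  # 4
```

Neither number is wrong. The two teams are simply answering slightly different questions, which is why the definition has to be agreed on before the count is.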

Data Informed Business Needs


• Business health monitoring

• Visibility of product KPIs/success metrics

• Functional operations support for a handful of multi-disciplinary individuals and ICs

• Getting to product-market fit (flattened retention rates across cohorts)

• Metrics definitions and alignment
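
One way to read "flattened retention rates across cohorts" is that a cohort's retention curve stops dropping from one period to the next. The following sketch checks that for a single cohort. The three-week window and one-percentage-point tolerance are hypothetical defaults, not thresholds taken from the article.

```python
from typing import List


def retention_curve(active_by_week: List[int]) -> List[float]:
    """Convert weekly active counts for one cohort into retention rates.

    active_by_week[0] is the cohort size at signup (week 0).
    """
    cohort_size = active_by_week[0]
    return [n / cohort_size for n in active_by_week]


def has_flattened(curve: List[float], window: int = 3, tolerance: float = 0.01) -> bool:
    """True if each of the last `window` week-over-week changes is within `tolerance`."""
    if len(curve) < window + 1:
        return False  # not enough history to judge
    tail = curve[-(window + 1):]
    return all(abs(a - b) <= tolerance for a, b in zip(tail, tail[1:]))


if __name__ == "__main__":
    stable = retention_curve([1000, 400, 300, 260, 255, 252, 250])
    leaking = retention_curve([1000, 700, 500, 350, 250])
    print(has_flattened(stable), has_flattened(leaking))
```

Running the same check across several signup cohorts shows whether the flattening holds in general or only for one lucky cohort.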

Data Informed Capabilities


• Reliable availability of transactional and financial data

• Off-the-shelf data visualization, integration, and tooling

• Broad organizational understanding of unique user, retention, and monetization metrics
through the use of company level KPI dashboards

• Aligned metric definitions through the use of a data dictionary
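
A data dictionary does not need special tooling to start. As a rough sketch, it can be a small, version-controlled structure that records what each metric means, who owns it, and which time zone it is reported in. Every metric name, table name, and owner below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One data dictionary entry; the fields chosen here are assumptions."""
    name: str
    description: str
    owner: str
    source_table: str
    time_zone: str


# Hypothetical entries for illustration only.
DATA_DICTIONARY = {
    "new_transacting_users": MetricDefinition(
        name="new_transacting_users",
        description="Users whose first completed transaction falls in the reporting period.",
        owner="finance",
        source_table="transactions",
        time_zone="UTC",
    ),
}


def lookup(metric_name: str) -> MetricDefinition:
    """Return the agreed definition, or fail loudly if the metric is undefined."""
    try:
        return DATA_DICTIONARY[metric_name]
    except KeyError:
        raise KeyError(
            f"'{metric_name}' is not defined; add it to the data dictionary before reporting on it"
        ) from None


if __name__ == "__main__":
    print(lookup("new_transacting_users").time_zone)
```

Failing loudly on an undefined metric is a deliberate choice in this sketch: it pushes teams to agree on a definition before a second version of the truth appears.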

Stage 2: Data Driven


The biggest pitfall at the data driven stage is misalignment of organizational incentives
in relation to data-driven decision making. Business decisions don't stop getting made in the
absence of data - they just get made with no data at all. Data Driven companies are focused
on scaling with the growing functional and product operations and refining their product
offering for an ever-expanding set of users. This usually starts by reorganizing teams or hiring
product- and feature-specific PMs and introducing functional specialization (e.g. separating
Sales into Sales Development Reps, Account Executives, Account Managers, and operational roles).

The increased capabilities of the data function unlock deeper accountability, as specific
teams can now be responsible for input metrics (# of hand-raisers on feature walls, #
contacts added, days to first call, or active team members per org) instead of a generalized,
org-wide shared responsibility for output metrics (revenue, retention, # of paid upgrades, etc).

Organizations at this stage leverage the Data team for decision-making guidance, as
opposed to operational data retrieval and visibility. To improve data-driven decision
making, the organization must have some self-serve access to information, comprehensive
insights that answer why something is happening (not just what is happening), and
an early set of productized data products that unlock operational capabilities.

Data Driven Business Needs


• Feature-level product optimization

• Smarter & faster function-specific business operations (sales, customer service, ops)

• Data-informed decision-making guidance on marketing campaigns, growth tactics, and
support operations

• Expanded monetization of core & adjacent users
(https://andrewchen.com/the-adjacent-user-theory/)

Data Driven Capabilities
• Proactive data governance policies

• Scalable data warehouse infrastructure and tooling through a data lake, customer data
platform, and more

• Self-serve analytics tools

• Org-wide experimentation and decision-making guidance through experimentation
tooling and a reporting framework.

Stage 3: Data Led

The most differentiated thing about Stage 3: Data Led businesses is that they cannot
operationally function without data products. The scale and complexity of both the
company’s operations and its active user base are such that relying solely on
business-generated recommendations, rule sets, and SOPs is not enough to maintain a defensible
product experience. A good example of this would be Amazon, which could not successfully
manage the scale of their business without the proprietary predictive models that power
fulfillment, logistics routing, and warehouse SKU storage.
At this stage, the Data team has built out a self-service data infrastructure platform that
solves for ingestion, governance, monitoring, and automation. It is no longer the data team’s
sole responsibility to take care of onboarding new data sources and integrating them into the
product’s feedback loops and ML models. This “productization” of data services unlocks
the Product and Data Science teams and allows them to quickly build new products and
features with the data they need.

Data Led Business Needs


• Data-leveraged defensibility

• Self-serve data onboarding and productization

• Prescriptive, automated operational decision-making

• Deepened user engagement, frequency, and monetization

Data Led Capabilities


• Platformized, scalable data warehouse infrastructure and self-serve tooling

• Feature engineering to feed and train data science models

• Near-real-time data availability

Thoughtful Sequencing, Not All At Once

A key to success is to not try to enable a stage all at once. At a high level, infrastructure
and analytics must be balanced between two things:

1. Unlocking product insights and business capabilities

2. Scaling data operations and infrastructure

The right balance is achieved with a thoughtful sequencing of architectural engineering work,
analytics, and application of analytics to business and product. The struggle for early-stage
Data Informed companies will be cultivating the necessary blend of technical and business
skills in the organization that can unlock meaningful insights efficiently. Teams with strong
communication lines between business, product, and engineering will sequence these efforts
more efficiently than teams with equivalent skill sets but siloed communications. Shared
business and technical fluency encourages the right sequencing by focusing on
understanding what grows the business and having the complementary technical know-how to
select tools, research solutions, and implement quickly without taking on large engineering
projects that would not add proportional value. It is a constant process of identifying
business needs, building the necessary capabilities, and seeing them unlock growth, which
leads to new business needs.
