Designing Data Products

Open Data for Agriculture
Intro to Big Data
29/11/2013
Athens, Greece
Joint offering by

Supported by EU projects

Designing Data Products

Dr. Vassilis Protonotarios
Agro-Know Technologies, Greece

Intro
• This presentation provides introductory information
about
– the (big) data products
– the design of (big) data products,
– the Drivetrain approach for the design of objective-based
(big) data products.

• The Drivetrain approach will be applied to
agricultural case studies in the next session
• The majority of the slides were based on the report
“Big Data Now: 2012 Edition. O’Reilly Media, Inc.”
Slide 3 of 66

Objectives
This presentation aims to:
• Provide an introduction to data products
• Define the “objective-based data products” concept
– Describe the Drivetrain approach in the design of (big) Data
products
– Analyze the design of data products
– Provide applications / case studies

In order to provide the methodology for the development
of data products
Slide 4 of 66

Structure of the presentation
1. Intro to designing (great) data products
2. Objective-based data products
– The Drivetrain approach

3. Case studies (x4): Application of the
Drivetrain approach
4. The future of data products

Slide 5 of 66

INTRO TO DESIGNING GREAT DATA
PRODUCTS

What is a (big) data product?
• What happens when (big) data becomes a
product
– specifically, a consumer product

• Produce (big) data based on inputs ?
= data producers

• Deliver results based on (big) data ?
= data processors

• Uses big data for providing useful outcomes?
Slide 7 of 66

Facts about (big) data products
• Enable their users to do whatever they want
– which most often has little to do with (big) data

• Replace physical products

Slide 8 of 66

The past: Predictive modeling
• Development of data products based on data
predictive modeling
– weather forecasting
– recommendation engines
– email spam filters
– services that predict airline flight times
• sometimes more accurately than the airlines
themselves.

Slide 9 of 66

The issue of predictive modeling
• Prediction technology: interesting, useful and
mathematically elegant
– BUT we need to take the next step because….

• These products just make predictions
– instead of asking what action they want someone
to take as a result of a prediction

Slide 10 of 66

The role of predictive modeling
• Great predictive modeling is still an important
part of the solution
– but it no longer stands on its own
– as products become more sophisticated, it
becomes less useful

Slide 11 of 66

A new, alternative approach
• the Drivetrain Approach
– a four-step approach, already applied in the
industry
– inspired by the emerging field of self-driving
vehicles
– Objective-based approach
B
A
http://www.popularmechanics.com/cars/how-to/repair-questions/1302716

Slide 12 of 66

A fine objective-based model

Slide 13 of 66

Case study
• A user of Google’s self-driving car
– is completely unaware of the hundreds (if not
thousands) of models and the petabytes of data
that make it work

BUT
• It is an increasingly sophisticated product built
by data scientists
– they need a systematic design approach
Slide 14 of 66

The Drivetrain approach

http://cdn.oreilly.com/radar/images/posts/0312-1-drivetrain-approach-lg.png
Slide 16 of 66

ource: http://en.wikipedia.org/wiki/File:Altavista-1999.png

Case study: Search engines

Why ?

1997
2013

Slide 17 of 66

The 4 steps in the Drivetrain approach
• The four steps in this transition:
– Identify the main objective
• For Google: show the most relevant search results

– Specify the system’s manageable inputs [levers]
• For Google: ranking the results

– Consider the data needed for managing the inputs
• Information about users’ activities in other web sites

– Building the predictive models
• For Google: PageRank algorithm
Slide 18 of 66

Drivetrain approach goal

NOT
use data not just to generate more data
– especially in the form of predictions

BUT
use data to produce actionable outcomes

Slide 19 of 66

[CASE STUDY 1] THE MODEL ASSEMBLY
LINE: A CASE STUDY OF INSURANCE
COMPANIES

The issue of insurance companies
• Case study: Insurance companies
– Their objective: maximizing the profit from each
policy price
– An optimal pricing model is to them what the
assembly line is to automobile manufacturing
– Despite their long experience in prediction, they
often fail to make optimal business decisions
about what price to charge each new customer

Slide 21 of 66

Transition to Drivetrain approach
• Identifying solutions to this issue
– Optimal Decisions Group (ODG) approached this
problem with an early use of the Drivetrain
Approach
– Resulted in a practical take on step 4 that can be
applied to a wide range of problems.

Model Assembly Line

Slide 22 of 66

Set a price for policies, maximizing profit
• price to charge each customer;
• types of accidents to cover;
• how much to spend on marketing and customer service;
• how to react to their competitors’ pricing decisions

• Data collected from real experiments on customers

Randomly changing the prices of hundreds of thousands of policies
over many months
•

• Develop a probability model for optimizing the insurer’s profit
Slide 23 of 66

of 66

http://cdn.oreilly.com/radar/images/posts/0312-2-drivetrain-step4-lg.png

Developing the modeler:
Model Assembly Line

The role of the Modeler
• Modeler Component 1
• model of price elasticity: the probability that a customer will
accept a given price (for new policies and renewals)

• relates price to the insurance company’s profit, conditional
on the customer accepting this price.

• Multiplying these two curves creates a final curve
• Shows price versus expected profit
• The final curve has a clearly identifiable local maximum that
represents the best price to charge a customer for the first
year.
Slide 25 of 66

The final curve

http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html

Slide 26 of 66

The role of the Simulator
• Lets ODG ask the “what if ” questions
• to see how the levers affect the distribution of the final
outcome

• Runs the models over a wide range of inputs
• The operator can adjust the input levers to answer specific
questions
• “What will happen if our company offers the customer a low
teaser price in year one but then raises the premiums in Y2?

• Explores the distribution of profit as affected the
inputs outside of the insurer’s control
• E.g. “What if the economy crashes and the customer loses
his job?”
Slide 27 of 66

The role of the Optimizer
• takes the surface of possible outcomes and
identifies the highest point
• finds the best outcomes
• identify catastrophic outcomes
– and show how to avoid them

Slide 28 of 66

Take-home message
Using a Drivetrain Approach combined with a
Model Assembly Line bridges the gap
between predictive models and actionable
outcomes.

Slide 29 of 66

[CASE STUDY 2] MARKETING INDUSTRY
RECOMMENDER SYSTEMS

Recommendation engines
• Recommendation engines
– data product based on well-built predictive
models that do not achieve an optimal objective.
– The current algorithms predict what products a
customer will like,
• based on purchase history and the histories of similar
customers.

Slide 31 of 66

The case of Amazon
• Amazon represents every purchase that has
ever been made as a giant sparse matrix
– with customers as the rows and products as the
columns.

• Once they have the data in this format, data
scientists apply some form of collaborative
filtering to “fill in the matrix.”

Slide 32 of 66

The case of Amazon
• Such models are good at predicting whether a
customer will like a given product
– but they often suggest products that the customer
already knows about or has already decided not
to buy.


Slide 33 of 66

Mixed-up recommendations

Slide 34 of 66

Drive additional sales
by surprising and delighting the customer with books not initially considered
without the recommendation

• Ranking of the recommendations

Data to derive from many randomized experiments about a
wide range of recommendations for a wide range of
customers:
To generate recommendations that will cause new sales

• Develop an algorithm providing recommendations which escape a
recommendation filter bubble
Slide 35 of 66

of 66


Developing the modeler

• purchase probabilities, conditional on seeing a recommendation

• purchase probabilities, conditional on not seeing a
recommendation

• The difference between these two probabilities is a
utility function for a given recommendation to a
customer
• Low in cases where the algorithm recommends a familiar book
that the customer has already rejected (both components are
small) or a book that he/she would have bought even without
the recommendation
Slide 37 of 66

The final curve


Slide 38 of 66

• Test the utility of each of the many possible
books in stock
• Alternatively just overall the outputs of a
collaborative filtering model of similar
customer purchases

Slide 39 of 66

• Rank and display the recommended books
based on their simulated utility
• Less emphasis on the “function” and more on
the “objective.”
– What is the objective of the person using our data
product?
– What choice are we actually helping him or her
make?
Slide 40 of 66

[CASE STUDY 3] OPTIMIZING
LIFETIME CUSTOMER VALUE

Customer value
• Includes all interactions between a retailer and
his customers outside the actual buy-sell
transaction
– making a product recommendation
– encouraging the customer to check out a new feature
of the online store
– sending sales promotions

• Making the wrong choices comes at a cost to the
retailer
– reduced margins (discounts that do not drive extra
sales)
– opportunity costs

Slide 42 of 66

optimize the lifetime value from each
customer
• Product recommendations

• Offer tailored discounts / special offers on products
• Make customer-care calls just to see how the user is
• Invite them to use the site and ask for their feedback

Zafu approach:
not send customers directly to clothes but ask a series of simple questions
about the customers’ body type, how well their other jeans fit and their fashion
preferences

• Develop an algorithm leading customer to browse a recommended selection
of Zafu’s inventory
Slide 43 of 66

• purchase probabilities, conditional on seeing a
recommendation

• purchase probabilities, conditional on not seeing a
recommendation

• price elasticity model to test how offering a discount might
change the probability that the customer will buy the item

• patience model for the customers’ tolerance for poorly
targeted communications
Slide 46 of 66

The final curve


Slide 47 of 66

• Test the utility of each of the many possible
clothes available
• Provide successful matches between
questions & recommendations

Slide 48 of 66

• Rank and display the recommended clothes
based on their simulated utility
– driving sales and improving the customer experience

• Less emphasis on the “function” and more on the
“objective”
– What is the objective of the person using our data
product?
– What choice are we actually helping him or her make?
Slide 49 of 66

[CASE STUDY 4] REAL LIFE
APPLICATION: SELF-DRIVING CAR

Building a car that drives itself (1/2)
• Alternative approach: Instead of being data
driven, we can now let the data drive us!
• Models required:
– model of distance / speed-limit to predict arrival
time; a ruler and a road map needed
– model for traffic congestion
– model to forecast weather conditions and their
effect on the safest maximum speed

Slide 51 of 66

Building a car that drives itself (2/2)
Plenty of cool challenges in building these
models but by themselves, they do not take us
to our destination

• Simulator: to predict the drive times along
various routes
• Optimizer: pick the shortest route subject to
constraints like avoiding tolls or maximizing
gas mileage
Slide 52 of 66

It is already implemented
According to Google, about a dozen self-driving cars
are on the road at any given time. They've already
logged more than 500,000 miles in beta tests.

Slide 53 of 66

Building a car that drives itself
Vehicle controls
• Steering wheel, Accelerator, Βrakes

Data from sensors etc.
• sensors that gather data about the road
• cameras that detect road signs, red or green lights & unexpected
obstacles

• Physics models to predict the effects of steering, braking &
acceleration
• Pattern recognition algorithms to interpret data from the road signs
Slide 54 of 66

Developing the Modeler


Slide 55 of 66

• Route selection, conditional on following a
recommendation

• Route selection, conditional on not following a
recommendation

Slide 56 of 66

• examine the results of the possible actions the
self-driving car could take
– If it turns left now, will it hit that pedestrian?
– If it makes a right turn at 55 km/h in these
weather conditions, will it skid off the road?

• Merely predicting what will happen isn’t good
enough.
Slide 57 of 66

• optimize the results of the simulation
– to pick the best combination of acceleration and
braking, steering and signaling

Prediction only tells us that there is going to be
an accident.
An optimizer tells us how to avoid accidents.
Slide 58 of 66

The present
• Drivetrain Approach:
– a framework for designing the next generation of
great data products
– heavily relies on optimization

• A need for the data science community to
educate others
– on how to derive value from their predictive
models
– Based on product design process
Slide 60 of 66

Current status of data products
• Data continuously provided in big data
providers
– Facebook, Twitter etc.

• Data are transformed -> they do not look like
data in the end
– Telematics, booking systems etc.

• Example: Music now lives on the cloud
– Amazon, Apple, Google, or Spotify
Slide 61 of 66

The future
• Optimization taught in business schools &
statistics departments.
• Data scientists ship products designed to
produce desirable business outcomes
Risk: Models using data to create more
data, rather than using data to create
actions, disrupt industries, and
transform lives.
Slide 62 of 66

To keep in mind for future big data
products
• when building a data product, it is critical to
integrate designers into the engineering team
from the beginning.
• Data products frequently have special
challenges around inputting or displaying
data.

Slide 63 of 66

What to expect in the future?
Google needs to move beyond the current
search format of you entering a query and
getting 10 results.
The ideal would be us knowing what you want
before you search for it…
Eric Schmidt
Executive Chairman of Google

Slide 64 of 66

The future is near!
25/11/2013

Slide 65 of 66

References
• Big Data Now: 2012 Edition. O’Reilly Media, Inc.
• O’Reilly Strata: Making Data Work
(http://strata.oreilly.com/tag/big-data)

• Jeremy Howard - The Drivetrain Approach: A four-step process
for building data products
(http://strata.oreilly.com/2012/03/drivetrain-approach-dataproducts.html)

• Mike Loukides - The evolution of data products
(http://strata.oreilly.com/2011/09/evolution-of-data-products.html)

• Wikipedia: Big data (http://en.wikipedia.org/wiki/Big_data)

Slide 66 of 66

Thank you!

Vassilis Protonotarios
Agro-Know Technologies
vprot@agroknow.gr

Designing Data Products

Recommended

More Related Content

What's hot (20)

Viewers also liked (9)

Similar to Designing Data Products (20)

More from Vassilis Protonotarios (20)

Recently uploaded (20)

Designing Data Products

Editor's Notes