Presentation delivered during the Introductory Course to Big Data in Agriculture. 29/11/2013, NCSR Demokritos, Athens, Greece.
The presentation is heavily based on the report titled “Big Data Now: 2012 Edition”, by O’Reilly Media, Inc.
More info about the event: http://wiki.agroknow.gr/agroknow/index.php/Athens_Green_Hackathon_2013
This document discusses transforming from data projects to data products. It outlines how companies can adopt a product mindset and focus on creating data products that solve specific customer problems. Key aspects include defining data product teams led by data product managers, adopting a product mindset of focusing on outcomes rather than outputs, and using storytelling to communicate insights from data products. The presentation argues that treating data as a product can create competitive advantages and that every company may need to become a data science company in the future.
Examples, techniques, and lessons learned building data products over the last 3 years at LinkedIn.
Pete Skomoroch is a Principal Data Scientist at LinkedIn where he leads a team focused on building data products leveraging LinkedIn's powerful identity and reputation data.
The talk describes some techniques and best practices applied to develop products like LinkedIn Skills & Endorsements.
This was the inaugural UberData Tech Talk, held in SF at Uber HQ.
What are data products and why are they different from other products? – inovex GmbH
This document discusses data products and how they differ from traditional products. It defines three types of data products: data as a service, data-enhanced products, and data as insights. The document highlights that data product management may require different processes and methods than traditional product management. It provides examples from interviews with companies about how they approach aspects like defining value propositions, positioning, and identifying data for data products. The conclusion is that data products are different, and that categorizing them helps with business modeling and product definition.
Building a Data Strategy – Practical Steps for Aligning with Business Goals – DATAVERSITY
Developing a Data Strategy for your organization can seem like a daunting task – but it’s worth the effort. Getting your Data Strategy right can provide significant value, as data drives many of the key initiatives in today’s marketplace – from digital transformation, to marketing, to customer centricity, to population health, and more. This webinar will help demystify Data Strategy and its relationship to Data Architecture and will provide concrete, practical ways to get started.
Data Architecture Strategies: Data Architecture for Digital Transformation – DATAVERSITY
Digital transformation rests on foundational data management approaches: MDM, data quality, data architecture, and more. At the same time, combining these foundational approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
Data-Ed Slides: Best Practices in Data Stewardship (Technical) – DATAVERSITY
In order to find value in your organization's data assets, heroic data stewards are tasked with saving the day, every single day! These heroes adhere to a data governance framework and work to ensure that data is captured right the first time, validated through automated means, and integrated into business processes. Whether it's data profiling or in-depth root cause analysis, data stewards can be counted on to ensure the organization's mission-critical data is reliable. In this webinar we will approach this framework and punctuate important facets of a data steward's role.
Learning Objectives:
- Understand the business need for a data governance framework
- Learn why embedded data quality principles are an important part of system/process design
- Identify opportunities to help drive your organization toward a data-driven culture
DAS Slides: Data Governance and Data Architecture – Alignment and Synergies – DATAVERSITY
Data Governance can have a varied definition, depending on the audience. To many, Data Governance consists of committee meetings and stewardship roles. To others, it focuses on technical Data Management and controls. Holistic Data Governance combines both of these aspects, and a robust Data Architecture and associated diagrams can be the “glue” that binds business and IT governance together. Join this webinar for practical tips and hands-on exercises for aligning Data Architecture and Data Governance for business and IT success.
This presentation reports on data governance best practices. Based on a definition of fundamental terms and the business rationale for data governance, a set of case studies from leading companies is presented. The content of this presentation is a result of the Competence Center Corporate Data Quality (CC CDQ) at the University of St. Gallen, Switzerland.
- Corporate data is growing at roughly 100% per year, and the data generated in the past 3 years is equivalent to that of the previous 30 years.
- With increasing data, organizations need tools to manage data and turn it into useful information for strategic decision making.
- Business intelligence provides interactive tools for analyzing large amounts of data from different sources and transforming it into insightful reports and dashboards to help organizations make better business decisions.
The document outlines several upcoming workshops hosted by CCG, an analytics consulting firm, including:
- An Analytics in a Day workshop focusing on Synapse on March 16th and April 20th.
- An Introduction to Machine Learning workshop on March 23rd.
- A Data Modernization workshop on March 30th.
- A Data Governance workshop with CCG and Profisee on May 4th focusing on leveraging MDM within data governance.
More details and registration information can be found on ccganalytics.com/events. The document encourages following CCG on LinkedIn for event updates.
Data Modeling, Data Governance, & Data Quality – DATAVERSITY
Data Governance is often referred to as the people, processes, and policies around data and information, and these aspects are critical to the success of any data governance implementation. But just as critical is the technical infrastructure that supports the diverse data environments that run the business. Data models can be the critical link between business definitions and rules and the technical data systems that support them. Without the valuable metadata these models provide, data governance often lacks the “teeth” to be applied in operational and reporting systems.
Join Donna Burbank and her guest, Nigel Turner, as they discuss how data models & metadata-driven data governance can be applied in your organization in order to achieve improved data quality.
BI Consultancy – Data, Analytics and Strategy – Shivam Dhawan
The presentation describes my views on the data we encounter in digital businesses, including:
- Common data collection methodologies,
- Common issues within the decision support system and optimization lifecycle,
- Where most of us are failing,
and most importantly, "How to connect the dots and move from Data to Strategy?"
I work with all facets of Web Analytics and Business Strategy, examining the structures and governance models of various domains to establish and analyze the key performance indicators that give you a 360º overview of the online and offline multi-channel environment.
Beyond my experience with the leading analytics tools in the market, like Google Analytics, Omniture, and BI tools for Big Data, I develop new solutions to solve complex digital and business problems.
As a resourceful consultant, I can connect with your team in whatever modality or form meets your needs and solves your data/strategy problem.
The right approach to data governance plays a crucial role in the success of AI and analytics initiatives within an organization. This is especially true for small to medium-sized companies that must harness the power of data to drive growth, innovation and competitiveness.
This guide aims to provide SMB organizations with a practical roadmap to successfully implement a data governance strategy that ensures data quality, security and compliance. Use it to unlock the full potential of your data assets.
Data Architecture Strategies: The Rise of the Graph Database – DATAVERSITY
Graph databases are growing in popularity, with their ability to quickly discover and integrate key relationships between enterprise data sets. Business use cases such as recommendation engines, master data management, social networks, enterprise knowledge graphs, and more provide valuable ways to leverage graph databases in your organization. This webinar provides an overview of graph database technologies and how they can be used in practical applications to drive business value.
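To illustrate why relationship-heavy use cases such as recommendation engines map naturally onto graph structures, here is a toy sketch (an assumed example, not from the webinar; the data and function names are hypothetical) that recommends items via co-purchase relationships in a tiny bipartite graph:

```python
# Toy illustration (assumed example): recommend items bought by other
# customers who share at least one purchase with the target customer.

# Edges of a tiny bipartite graph: customer -> set of purchased items.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "chair"},
    "carol": {"lamp", "desk"},
}

def recommend(customer):
    """Items bought by co-purchasers of this customer's items."""
    own = purchases[customer]
    recs = set()
    for other, items in purchases.items():
        if other != customer and own & items:  # shares at least one item
            recs |= items - own                # suggest what they lack
    return recs

print(sorted(recommend("alice")))  # ['chair', 'desk']
```

A dedicated graph database expresses this traversal declaratively and scales it to millions of edges, but the underlying idea is the same: value comes from walking relationships, not scanning rows.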
Adopting a Process-Driven Approach to Master Data Management – Software AG
What is a lasting solution to the sea of errors, headaches, and losses caused by inconsistent and inaccurate master data, such as customer and product records? This is the data that your business counts on to operate business processes and make decisions. But this data is often incomplete or in conflict because it resides in multiple IT systems. Master Data Management (MDM) programs are the solution to this problem, but these programs can fail without the investment and involvement of business managers.
Listen to Rob Karel, Forrester analyst, and Jignesh Shah from Software AG to learn about a new, process-driven approach to MDM and why it is a win-win for both business and IT managers.
Visit us at http://www.softwareag.com Become part of our growing community: Facebook: http://www.facebook.com/softwareag Twitter: http://www.twitter.com/softwareag LinkedIn: http://www.linkedin.com/company/software-ag YouTube: http://www.youtube.com/softwareag
Describes what Enterprise Data Architecture in a software development organization should cover, listing over 200 data-architecture-related deliverables an Enterprise Data Architect should remember to evangelize.
The document provides guidance on designing a data and analytics strategy. It discusses why data and analytics are important for business success in the digital age. It outlines 13 approaches to a data and analytics strategy organized by core business strategy and value proposition. It emphasizes the importance of data literacy, governance, and quality. It provides examples of how organizations have used data and analytics to improve outcomes. The overall message is that a clear strategy is needed to communicate the business value of data and maximize its impact.
Strategic Imperative: The Enterprise Data Model – DATAVERSITY
With today's increasingly complex data ecosystems, the Enterprise Data Model (EDM) is a strategic imperative that every organization should adopt. An Enterprise Data Model provides context and consistency for all organizational data assets, as well as a classification framework for data governance. Enterprise modeling is also totally consistent with agile workflows, evolving incrementally to keep pace with changing organizational factors. In this session, IDERA’s Ron Huizenga will discuss the increasing importance of the EDM, how it serves as a framework for all enterprise data assets, and provides a foundation for data governance.
Too often I hear the question “Can you help me with our Data Strategy?” Unfortunately, for most, this is the wrong request because it focuses on the least valuable component – the Data Strategy itself. A more useful request is this: “Can you help me apply data strategically?” Yes, at early maturity phases the process of developing strategic thinking about data is more important than the actual product! Trying to write a good (much less perfect) Data Strategy on the first attempt is generally not productive – particularly given the widespread acceptance of Mike Tyson’s truism: “Everybody has a plan until they get punched in the face.” Refocus on learning how to iteratively improve the way data is strategically applied. This will permit data-based strategy components to keep up with agile, evolving organizational strategies. This approach can also contribute to three primary organizational data goals.
In this webinar, you will learn how improving your organization’s data, the way your people use data, and the way data is applied to achieve your organizational strategy will help in ways never imagined. Data are your sole non-depletable, non-degradable, durable strategic assets, and they are pervasively shared across every organizational area. Addressing existing challenges programmatically includes overcoming necessary but insufficient prerequisites and developing a disciplined, repeatable means of improving business objectives. This process (based on the theory of constraints) is where the strategic data work really occurs, as organizations identify prioritized areas where better assets, literacy, and support (Data Strategy components) can help an organization better achieve specific strategic objectives. Then the process becomes lather, rinse, and repeat. Several complementary concepts are also covered, including:
- A cohesive argument for why Data Strategy is necessary for effective Data Governance
- An overview of prerequisites for effective strategic use of Data Strategy, as well as common pitfalls
- A repeatable process for identifying and removing data constraints
- The importance of balancing business operation and innovation
Data Governance Trends – A Look Backwards and Forwards – DATAVERSITY
As DATAVERSITY’s RWDG series hurtles into its 12th year, this webinar takes a quick look behind us, evaluates the present, and predicts the future of Data Governance. Based on webinar numbers, hot Data Governance topics have evolved over the years from policies and best practices, roles and tools, data catalogs and frameworks, to supporting data mesh and fabric, artificial intelligence, virtualization, literacy, and metadata governance.
Join Bob Seiner as he reflects on the past and what has and has not worked, while sharing examples of enterprise successes and struggles. In this webinar, Bob will challenge the audience to stay a step ahead by learning from the past and blazing a new trail into the future of Data Governance.
In this webinar, Bob will focus on:
- Data Governance’s past, present, and future
- How trials and tribulations evolve to success
- Leveraging lessons learned to improve productivity
- The great Data Governance tool explosion
- The future of Data Governance
This document discusses data governance and data architecture. It introduces data governance as the processes for managing data, including deciding data rights, making data decisions, and implementing those decisions. It describes how data architecture relates to data governance by providing patterns and structures for governing data. The document presents some common data architecture patterns, including a publish/subscribe pattern where a publisher pushes data to a hub and subscribers pull data from the hub. It also discusses how data architecture can support data governance goals through approaches like a subject area data model.
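The publish/subscribe pattern described above can be sketched minimally as follows (an illustrative sketch only, not taken from the document; the class and method names are hypothetical). A publisher pushes records to a central hub, and subscribers pull the records for the topics they care about:

```python
# Minimal sketch (hypothetical names) of the publish/subscribe hub pattern:
# publishers push records into a central hub; subscribers pull by topic.

from collections import defaultdict

class DataHub:
    """Central hub that stores published records, keyed by topic."""
    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, record):
        # A publisher pushes a record to the hub under a topic.
        self._topics[topic].append(record)

    def pull(self, topic):
        # A subscriber pulls all records currently held for a topic.
        return list(self._topics[topic])

hub = DataHub()
hub.publish("customer", {"id": 1, "name": "Acme"})
hub.publish("customer", {"id": 2, "name": "Globex"})
hub.publish("product", {"sku": "A-100"})

print(hub.pull("customer"))  # both customer records, in publish order
```

The governance benefit of this shape is that the hub becomes a single point where data standards, access decisions, and quality checks can be enforced, rather than policing every point-to-point feed.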
Real-World Data Governance: Data Governance Expectations – DATAVERSITY
When starting a Data Governance program, significant time, effort, and bandwidth is typically spent selling the concept of data governance and telling people in your organization what data governance will do for them. This may not be the best strategy to take. We should focus on making Data Governance THEIR idea, not ours.
Shouldn’t the strategy be that we get the business people from our organization to tell US why data governance is necessary and what data governance will do for them? If only we could get them to tell us these things. Maybe we can.
Join Bob Seiner and DATAVERSITY for this informative Real-World Data Governance webinar that will focus on getting THEM to tell US where data governance will add value. Seiner will review techniques for acquiring this information and will share examples of where it can add specific value to your data governance program. Some of those places may surprise you.
This document summarizes a research study that assessed the data management practices of 175 organizations between 2000-2006. The study had both descriptive and self-improvement goals, such as understanding the range of practices and determining areas for improvement. Researchers used a structured interview process to evaluate organizations across six data management processes based on a 5-level maturity model. The results provided insights into an organization's practices and a roadmap for enhancing data management.
Here are my slides on "Board and Cyber Security" that I presented at the Just People Information Security breakfast this morning. Thanks Adam for arranging the session and those who attended.
Big Data: Architecture and Performance Considerations in Logical Data Lakes – Denodo
This presentation explains in detail what a Data Lake Architecture looks like, how data virtualization fits into the Logical Data Lake, and offers some performance tips. It also includes an example demonstrating the model's performance.
This presentation is part of the Fast Data Strategy Conference; you can watch the video here: goo.gl/9Jwfu6.
Driving Data Intelligence in the Supply Chain Through the Data Catalog at TJX – DATAVERSITY
Roles and responsibilities are a critical component of every Data Governance program. Building a set of roles that are practical and that will not interfere with people’s “day jobs” is an important consideration that will influence how well your program is adopted. This tutorial focuses on sharing a proven model guaranteed to represent your organization.
Join Bob Seiner for this lively webinar where he will dissect a complete Operating Model of Roles and Responsibilities that encompasses all levels of the organization. Seiner will detail the roles and describe the most effective way to associate people with the roles. You will walk out of this webinar with a model to apply to your organization.
In this session Bob will share:
- The five levels of Data Governance roles
- A proven Operating Model of Roles and Responsibilities
- How to customize the model to meet your requirements
- Setting appropriate role expectations
- How to operationalize the roles and demonstrate value
Presentation about the agINFRA Germplasm Working Group (http://wiki.aginfra.eu/index.php/Germplasm_Working_Group). Presented during Session 1 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)
This document summarizes a presentation on metadata analysis of germplasm collections in the agINFRA project. It describes two main germplasm data sources - the Chinese Crop Germplasm Information System and the Italian National Germplasm Database. It discusses the schemas and descriptors used, mappings between schemas, and a linked data approach to connecting the different data sources. The overall goal is to facilitate interoperability between global germplasm databases.
Data Science & Data Products at Neue Zürcher ZeitungRené Pfitzner
1) The document discusses data science and data products at NZZ, a Swiss media company.
2) NZZ uses data science to build data products like article recommendations and the NZZ News Companion app to address challenges from declining newspaper revenues and readership.
3) Key aspects of NZZ's data stack include REST APIs, Spark for scalable data processing, and deploying products on-premise, in the cloud, or with microservices.
Great data leads to great insights which leads to great products.
Vitaly Gordon, senior products data scientist, talks about the culture, people and tools that have helped LinkedIn become the world’s leading professional social network and one of the most visited sites on the web.
Data Engineering: Elastic, Low-Cost Data Processing in the CloudCloudera, Inc.
3 Things to Learn About:
*On-premises versus the cloud: What’s the same and what’s different?
*Benefits of data processing in the cloud
*Best practices and architectural considerations
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process looks like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure London 25/01/13
The document discusses prescriptive analytics techniques such as optimization and simulation modeling. It provides examples of how organizations like a school district and ExxonMobil have used optimization models in Excel and GAMS respectively to make strategic decisions by evaluating multiple options and variables. Decision modeling approaches like mathematical programming, spreadsheets, and decision trees are described for representing decision situations involving alternatives under certainty, risk, and uncertainty.
Using PySpark to Scale Markov Decision Problems for Policy ExplorationDatabricks
Finding policies that lead to optimal outcomes for an organization are some of the most difficult challenges facing decision makers within an organization. The reason for it is the fact that policies are not made in a world with perfect information and markets in equilibrium. These are complex systems where the behavior of entities within the system are dynamic and generally uncertain. Reinforcement Learning (RL) has gained popularity for modeling complex behavior to identify optimal strategy. RL maps states or situations to actions in order to maximize some result or reward. The Markov Decision Process (MDP) is a core component of the RL methodology. The Markov chain is a probabilistic model that uses the current state to predict the next state.
This presentation discusses using PySpark to scale an MDP example problem. When simulating complex systems, it can be very challenging to scale to large numbers of agents, due to the amount of processing that needs to be performed in memory as each agent goes through a permutation. PySpark allows us to leverage Spark for the distributed data processing and Python to define the states and actions of the agents.
Roger S. Barga discusses his experience in data science and predictive analytics projects across multiple industries. He provides examples of predictive models built for customer segmentation, predictive maintenance, customer targeting, and network intrusion prevention. Barga also outlines a sample predictive analytics project for a real estate client to predict whether they can charge above or below market rates. The presentation emphasizes best practices for building predictive models such as starting small, leveraging third-party tools, and focusing on proxy metrics that drive business outcomes.
Smart solutions for productivity gain IQA conference 2017Steve Franklin
Presentation by Steve Franklin of Cement & Aggregate Consulting at the 2017 IQA conference in Toowoomba covering use of drones and quarry planning and scheduling tools.
The document discusses various aspects of prototyping, including prototype development methodologies, types of prototypes, evaluation techniques, and tools used in prototyping. Specifically, it covers methodology for prototype development, types of prototypes like throwaway, evolutionary, and incremental prototypes. It also discusses techniques for prototype evaluation like protocol analysis and cognitive walkthroughs, and the benefits of prototyping for software development.
This document discusses agile manufacturing systems from an automotive industry perspective. It presents two decision models - a spreadsheet model and decision tree model - to study whether to invest in a dedicated, agile, or flexible manufacturing system for engine and transmission parts. The models provide initial insights into assessing the business case for investing in agile manufacturing. Agile systems seem to enable fast and cost-effective responses to new product introductions and unpredictable demand. However, more research is needed to fully address the business case for agility across multiple manufacturing sites. The document also discusses key enablers of agile manufacturing like virtual enterprise tools and concurrent engineering.
The document summarizes a student project presentation on using machine learning to predict mobile phone prices based on features. It used various classification algorithms like logistic regression, decision trees, random forests and SVMs on a dataset of phone features to predict if a phone would be economical or expensive. The highest accuracy of 95% was achieved using support vector machines. The project aims to help users determine phone prices based on features.
Chad Richeson gave a presentation on harnessing big data. He discussed how nearly every industry is trying to apply big data concepts to improve opportunities, efficiencies, and minimize risk. Examples of big data applications in different industries were provided. Richeson emphasized that successful big data projects require blending analytics, business, and technical skills. He outlined key steps for moving big data projects from development to implementation, including focusing on business goals and gaining user agreement.
The document discusses various methods for conducting market analysis for new product development. It focuses on four main areas: idea generation, product optimization, marketing mix optimization, and market prediction. For idea generation, methods like brainstorming, focus groups, and morphological analysis are presented. Product optimization discusses approaches like Quality Function Deployment (QFD) and conjoint analysis to design the product based on customer needs. Test marketing and concept testing are described as ways to introduce the new product to the market and predict its anticipated success. Case studies on pharmaceutical companies and automakers are provided as examples.
AI-900 - Fundamental Principles of ML.pptxkprasad8
Automated machine learning uses algorithms to automate the machine learning workflow including data preprocessing, model selection, hyperparameter tuning, and evaluation to build an optimal machine learning model with little or no human involvement. It can save time by automating repetitive tasks and help identify the best performing models for various types of machine learning problems like classification, regression, and clustering. Automated machine learning tools provide an end-to-end experience to build, deploy, and manage machine learning models at scale with minimal coding or machine learning expertise required.
Artificial intelligence using mobile prediction hjshsgshksb hsbdhdnbdbbdbbd ndbbdhdbnd dhdhdbdbf bdhdhfhdnd dhdhdhdnf fbfhdbf dbhfbf fhf d fhfbd djbdbd fhfbbf d fb f fbf f fhfb fbf fbfhfvvf fbfjfnbfvfbfjfnf bfhd fhfbf. Fjf f fbfbf bfbf fbfbf bfbf fhfbvf fbdbbf f fjfbbf f fjbf f fn f f fnf. F fnfbfbf fnf f fbfbf fbnfbf fjfbf f fbfbf fbfbfb fbfbf bbahsh s sbbshshndd jdbdbdbdbdbdb dhdbdbbddhbd dbbdvdbdbd bdbdbdbd bdbdbbdbd dhdbbd dbfbdbfbfbfbxbfbfbdbbdhdhd d hdbdbvxxdbhd xbdbdhbx dhdbd fhdbd. Xbdbdbf dhbd dbd d bdbd dbdjbf fbfbf xbfbf hd f fbfbf f hdbdn xbdnfbfbbfjfjd xhfvbfbdndhfhf. Fhdbfnndndnfnndnfn
Multicriteria and cost benefit analysis for smart grid projectsLeonardo ENERGY
Cost-Benefit Analysis (CBA) is a well-established technique for Decision-Making (DM) in companies recently applied to Smart Grid projects whose impact can span over the electrical power system borders and cannot be easily monetized. Therefore, CBA lacks in describing the smart grid potential and Multi-Criteria Analysis (MCA) has been introduced for improving DM. The Webinar covers DM fundamentals focusing on MCA and CBA. Pros, cons and research gaps of each technique are analysed with the aid of real-world examples. Finally, a novel implementation of MCA-CBA is proposed with particular reference to Smart Grid application as proposed by ISGAN Annex 3.
The concept generation process begins with a set of customer needs and target specifications and results in a set of product concepts from which the team will make a final selection.
The document discusses building a customer churn prediction model for a telecom company in Syria using machine learning techniques. It proposes using the XGBoost algorithm to classify customers as churners or non-churners based on their customer data over 9 months. XGBoost builds sequential decision trees and increases the weights of misclassified variables to improve predictive performance. The model achieved an AUC of 93.3% and incorporated social network features to further enhance results. The document outlines the hardware, software and methodology used to develop and test the model on a large dataset from SyriaTel to predict customer churn.
The document discusses operationalizing analytics and Remsoft's 20 years of experience in this area. It describes operationalizing analytics as requiring a collaborative environment, reliable architecture, and repeatable processes. It outlines Remsoft's solutions over time for clients like Coillte, including optimization engines, integration with databases and models, and an interface in Excel. Lessons learned include the need to stay ahead of trends in technology and data and provide flexible modeling environments. Remsoft offers educational partnerships with discounted or free software access.
This document summarizes a research paper on developing a software tool called "Smart Sim Selector" to help users select simulation software. It describes the development of the tool, including:
1) Designing a database containing information on various simulation software packages based on over 200 evaluation criteria.
2) Creating an interface in Visual Basic that allows users to specify their requirements and priorities, then queries the database to recommend suitable software.
3) Implementing different techniques (AHP, weighted scoring, TOPSIS) to analyze users' inputs and software attributes to determine the best recommendation.
The tool aims to provide an unbiased approach to simulation software selection and reduce problems companies face in choosing inappropriate packages.
Advanced Optimization for the Enterprise WebinarSigOpt
SigOpt provides an optimization platform and techniques to help practitioners solve machine learning problems more efficiently. It allows users to frame problems as black box optimization and optimize multiple competing metrics. SigOpt's parallel optimization capabilities also help fully utilize available compute resources to accelerate results. Case studies demonstrated significant speedups and accuracy gains from SigOpt compared to random search and grid search.
KB Seminars: Working with Technology - Product Management; 10/13MDIF
The document provides an overview of the product management process, including defining roles and responsibilities. It discusses how product management fits between marketing/strategy and technology, and the key steps in the process including defining user needs, developing use cases, specifying product features, and testing the product. The goal is to translate user needs into technical specifications that guide development of the product.
The document describes the 3P (Production Preparation Process) tool, which is an advanced lean approach used to rapidly design new products and production processes. It involves cross-functional teams simulating and modeling the product and manufacturing processes using techniques like live-size mockups. The goal is to minimize resources and design an efficient production system before making commitments. 3P consists of a series of individual and group activities over 3-5 days to sketch alternatives and select the optimal design and process combination.
NEUROPUBLIC is a Greek company founded in 2003 that provides digital services for agriculture. It has established a coalition with farming organizations and a bank to support Greek agriculture through open data and technology. This coalition formed GAIA Business S.A., which offers various services to farmers through its GAIA cloud platform and network of environmental sensor stations. NEUROPUBLIC is building new business models and data-powered services for agriculture using open data sources like Copernicus data, remote sensing data, and data collected from its sensor network.
Agricultural Data Interest Group & Wheat Data Working Group of RDAVassilis Protonotarios
Presentation delivered during the "Engagement in RDA from Southern-Eastern Europe, Mediterranean and Caucasus region" Workshop. 25/6/2015, Athens, Greece
Presentation delivered during the Introductory Course: "Introduction to agricultural & food safety datasets and semantic technologies" (http://irss.iit.demokritos.gr/2014/hackathon/introductory_course) of the SemaGrow 2nd Hackathon (http://wiki.agroknow.gr/agroknow/index.php/SemaGrow_Hackathon)
4/7/2014, NCSR Demokritos, Athens, Greece
Seeding organic agriculture courses on Moodle: the agriMoodle CaseVassilis Protonotarios
Presentation on agriMoodle delivered at the "Life for Agriculture - Agriculture for Life" international Conference.
6/6/2014, USAMVB, Bucharest, Romania. More info at http://agricultureforlife.usamv.ro/index.php/en/
This document discusses the agINFRA project's efforts to enhance interoperability between agricultural data sources by developing a linked data framework for germplasm data. The agINFRA Germplasm Working Group aims to identify relevant standards, analyze existing schemas and vocabularies, and propose recommendations for exposing germplasm resources as linked open data. Key outcomes include a dossier of germplasm information and engagement with stakeholders. The proposed methodology involves defining a base schema, publishing local classifications as linked data, and linking data from different sources using common vocabularies. Implementation plans include publishing germplasm vocabularies and phenotypic data in 2014.
Presentation made in the context of the FAO AIMS Webinar titled “Knowledge Organization Systems (KOS): Management of Classification Systems in the case of Organic.Edunet” (http://aims.fao.org/community/blogs/new-webinaraims-knowledge-organization-systems-kos-management-classification-systems)
21/2/2014
Presentation of some of the major germplasm data sources, including aggregators, networks and individual data providers. Information based on the agINFRA Dossier on Germplasm Data sources (available at http://wiki.aginfra.eu/index.php/Germplasm_Working_Group)
Presented during Session 3 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)
Using Agricultural Learning Portals in Developing Countries: The case of Orga...Vassilis Protonotarios
The Organic.Edunet portal provides open access to educational resources on organic agriculture. It has over 11,000 resources, 3,500 registered users, and receives over 150,000 unique visitors annually from over 200 countries. Usage statistics show the highest traffic comes from regions where English, Spanish, and French (the supported languages) are spoken. Users from non-supported language regions had much lower usage. The portal's multilingual support increases user activity and behavior within the portal. English is the primary language used for queries both on web searches and within the portal.
Developing a network of content providers: The case of Organic.EdunetVassilis Protonotarios
This document discusses the development of the Organic.Edunet network of content providers on organic agriculture. It describes the network's expansion over three phases, adding new collections from projects and affiliated providers. Tools like Confolio and AgLR are used to harvest, ingest, or create metadata compliant with the Organic.Edunet Application Profile. Efforts are underway to improve multilingual support and user engagement within the network.
Presentation delivered during the Workshop on Agricultural Education, Methods, Practices and Technologies" (AgEdWS12). Pollenzo, Bra, Italy, 25/10/2012
Developing a network of content providers: The case of Organic.EdunetVassilis Protonotarios
This document summarizes a presentation about developing a network of content providers using the Organic.Edunet project as a case study. It discusses how Organic.Edunet aggregates educational resources on organic agriculture from various repositories and makes them accessible through a single portal. It also describes how the network has expanded over three phases to include additional collections from various countries and content types. Finally, it provides information on how new content providers can connect their resources to the Organic.Edunet network through harvesting, ingesting, or creating metadata records.
Introducing a content integration process for a federation of agricultural in...Vassilis Protonotarios
Presentation titled "Introducing a content integration process for a federation of agricultural institutional repositories". MTSR 2011, Izmir, Turkey, 12/10/2011
Presentation titled "Designing a Training Session for Public Authorities". Rural Inclusion Workshop / EFITA 2011 Conference. Prague, Czech Republic 11-14/7/2011.
Green Education Using Open Educational Resources (OER) (SPDECE 2012)Vassilis Protonotarios
Presentation titled "Green Education Using Open Educational Resources (OER): Setting up a Green OER Repository". SPDECE 2012 Symposium, Alicante, Spain, 14/6/2012 (http://transducens.dlsi.ua.es/congress/spdece2012)
Presentation titled "Innovation in the Teaching of Sustainable Development in Europe: The Case of ISLE Erasmus Network". SPDECE 2012 Symposium, Alicante, Spain, 14/6/2012 (http://transducens.dlsi.ua.es/congress/spdece2012)
This course provides students with a comprehensive understanding of strategic management principles, frameworks, and applications in business. It explores strategic planning, environmental analysis, corporate governance, business ethics, and sustainability. The course integrates Sustainable Development Goals (SDGs) to enhance global and ethical perspectives in decision-making.
Inventory Reporting in Odoo 17 - Odoo 17 Inventory AppCeline George
This slide will helps us to efficiently create detailed reports of different records defined in its modules, both analytical and quantitative, with Odoo 17 ERP.
Hannah Borhan and Pietro Gagliardi OECD present 'From classroom to community ...EduSkills OECD
Hannah Borhan, Research Assistant, OECD Education and Skills Directorate and Pietro Gagliardi, Policy Analyst, OECD Public Governance Directorate present at the OECD webinar 'From classroom to community engagement: Promoting active citizenship among young people" on 25 February 2025. You can find the recording of the webinar on the website https://oecdedutoday.com/webinars/
How to Configure Deliver Content by Email in Odoo 18 SalesCeline George
In this slide, we’ll discuss on how to configure proforma invoice in Odoo 18 Sales module. A proforma invoice is a preliminary invoice that serves as a commercial document issued by a seller to a buyer.
Unit 1 Computer Hardware for Educational Computing.pptxRomaSmart1
Computers have revolutionized various sectors, including education, by enhancing learning experiences and making information more accessible. This presentation, "Computer Hardware for Educational Computing," introduces the fundamental aspects of computers, including their definition, characteristics, classification, and significance in the educational domain. Understanding these concepts helps educators and students leverage technology for more effective learning.
Research Publication & Ethics contains a chapter on Intellectual Honesty and Research Integrity.
Different case studies of intellectual dishonesty and integrity were discussed.
Odoo 18 Accounting Access Rights - Odoo 18 SlidesCeline George
In this slide, we’ll discuss on accounting access rights in odoo 18. To ensure data security and maintain confidentiality, Odoo provides a robust access rights system that allows administrators to control who can access and modify accounting data.
3. Intro
• This presentation provides introductory information about:
– (big) data products
– the design of (big) data products
– the Drivetrain approach for the design of objective-based (big) data products
• The Drivetrain approach will be applied to agricultural case studies in the next session
• The majority of the slides are based on the report “Big Data Now: 2012 Edition”, O’Reilly Media, Inc.
Slide 3 of 66
4. Objectives
This presentation aims to:
• Provide an introduction to data products
• Define the “objective-based data products” concept
– Describe the Drivetrain approach to the design of (big) data products
– Analyze the design of data products
– Provide applications / case studies
…in order to provide a methodology for the development of data products.
5. Structure of the presentation
1. Intro to designing (great) data products
2. Objective-based data products
– The Drivetrain approach
3. Case studies (x4): application of the Drivetrain approach
4. The future of data products
7. What is a (big) data product?
• What happens when (big) data becomes a product – specifically, a consumer product?
• Does it produce (big) data based on inputs? = a data producer
• Does it deliver results based on (big) data? = a data processor
• Does it use big data to provide useful outcomes?
8. Facts about (big) data products
• They enable their users to do whatever they want
– which most often has little to do with (big) data
• They can replace physical products
9. The past: predictive modeling
• Data products have been developed on the basis of predictive modeling:
– weather forecasting
– recommendation engines
– email spam filters
– services that predict airline flight times
• sometimes more accurately than the airlines themselves
10. The issue with predictive modeling
• Prediction technology is interesting, useful and mathematically elegant
– BUT we need to take the next step, because…
• these products just make predictions
– instead of asking what action they want someone to take as a result of a prediction
11. The role of predictive modeling
• Great predictive modeling is still an important part of the solution
– but it no longer stands on its own
– as products become more sophisticated, it becomes less useful on its own
12. A new, alternative approach
• The Drivetrain Approach
– a four-step approach, already applied in industry
– inspired by the emerging field of self-driving vehicles
– an objective-based approach
http://www.popularmechanics.com/cars/how-to/repair-questions/1302716
14. Case study
• A user of Google’s self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work
BUT
• it is an increasingly sophisticated product built by data scientists
– who need a systematic design approach
18. The 4 steps in the Drivetrain approach
• The four steps in this transition:
– 1. Identify the main objective
• for Google: show the most relevant search results
– 2. Specify the system’s controllable inputs [levers]
• for Google: the ranking of the results
– 3. Consider the data needed for managing the inputs
• e.g. information about users’ activities on other web sites
– 4. Build the predictive models
• for Google: the PageRank algorithm
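To make the four steps concrete, they can be written down as a small checklist structure. This is an illustrative sketch of mine, not part of the original deck; the class and field names are invented, and only the Google example values come from the slide.

```python
from dataclasses import dataclass

@dataclass
class DrivetrainPlan:
    """Illustrative container for the four Drivetrain steps (names are mine)."""
    objective: str      # step 1: the main objective
    levers: list        # step 2: the inputs we control
    data_needed: list   # step 3: the data needed to manage the inputs
    models: list        # step 4: the predictive models to build

# The Google search example from the slide:
google = DrivetrainPlan(
    objective="show the most relevant search results",
    levers=["ranking of the results"],
    data_needed=["users' activities on other web sites"],
    models=["PageRank"],
)
print(google.objective)
```

Filling in such a plan before any modeling starts is the point of the approach: the models (step 4) come last, in service of the objective (step 1).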
19. Drivetrain approach goal
NOT: use data just to generate more data
– especially in the form of predictions
BUT: use data to produce actionable outcomes
20. [CASE STUDY 1] THE MODEL ASSEMBLY LINE: A CASE STUDY OF INSURANCE COMPANIES
21. The issue of insurance companies
• Case study: insurance companies
– Their objective: maximizing the profit from each policy
– An optimal pricing model is to them what the assembly line is to automobile manufacturing
– Despite their long experience in prediction, they often fail to make optimal business decisions about what price to charge each new customer
22. Transition to the Drivetrain approach
• Identifying solutions to this issue:
– Optimal Decisions Group (ODG) approached this problem with an early use of the Drivetrain Approach
– This resulted in a practical take on step 4 – the Model Assembly Line – that can be applied to a wide range of problems
23. The Drivetrain approach
• Objective: set a price for policies that maximizes profit
• Levers:
– the price to charge each customer
– the types of accidents to cover
– how much to spend on marketing and customer service
– how to react to competitors’ pricing decisions
• Data: collected from real experiments on customers
– randomly changing the prices of hundreds of thousands of policies over many months
• Models: develop a probability model for optimizing the insurer’s profit
24. Developing the modeler: Model Assembly Line
http://cdn.oreilly.com/radar/images/posts/0312-2-drivetrain-step4-lg.png
25. The role of the Modeler
• Modeler component 1:
– a model of price elasticity: the probability that a customer will accept a given price (for new policies and renewals)
• Modeler component 2:
– relates price to the insurance company’s profit, conditional on the customer accepting this price
• Multiplying these two curves creates a final curve showing price versus expected profit
– The final curve has a clearly identifiable local maximum that represents the best price to charge a customer for the first year
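The two components and their product can be sketched numerically. This is a minimal illustration, not ODG's actual models: the logistic acceptance curve, the fixed expected claims cost of 500, and every other number below are assumptions chosen only to show the "multiply two curves, find the maximum" mechanic.

```python
import numpy as np

prices = np.linspace(300, 1500, 200)            # candidate annual premiums

# Component 1: price elasticity -- P(customer accepts a given price).
# Assumed logistic curve, falling as the price rises.
accept_prob = 1.0 / (1.0 + np.exp((prices - 800) / 150))

# Component 2: profit per policy, conditional on acceptance.
# Assumed fixed expected claims cost of 500 per policy.
profit_if_accepted = prices - 500

# Multiplying the two curves gives the final curve: price vs. expected profit.
expected_profit = accept_prob * profit_if_accepted

# Its maximum is the best first-year price for this (toy) customer segment.
best_price = prices[np.argmax(expected_profit)]
print(f"best first-year price: {best_price:.0f}")
```

Note that neither component alone answers the pricing question: the elasticity curve favors low prices, the margin curve favors high ones, and only their product exposes the interior optimum.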
27. The role of the Simulator
• Lets ODG ask the “what if” questions
– to see how the levers affect the distribution of the final outcome
• Runs the models over a wide range of inputs
– The operator can adjust the input levers to answer specific questions
– e.g. “What will happen if our company offers the customer a low teaser price in year one but then raises the premiums in year two?”
• Explores the distribution of profit as affected by inputs outside of the insurer’s control
– e.g. “What if the economy crashes and the customer loses his job?”
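The teaser-price question above is exactly the kind of "what if" a simulator answers by running the component models over many sampled customers. The Monte Carlo sketch below is mine; the claim-cost distribution, the acceptance and renewal curves, and all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_two_year_profit(year1_price, year2_price, n=10_000):
    """Monte Carlo sketch: mean two-year profit per prospective customer."""
    # Uncertain claim costs per customer (gamma: mean = shape * scale = 500).
    claims = rng.gamma(shape=2.0, scale=250.0, size=n)
    # Lever: year-one price; acceptance falls with price (assumed logistic).
    accept1 = rng.random(n) < 1 / (1 + np.exp((year1_price - 800) / 150))
    # Renewal depends on how sharply the premium jumps in year two.
    jump = year2_price - year1_price
    accept2 = accept1 & (rng.random(n) < 1 / (1 + np.exp(jump / 200 - 1)))
    profit = accept1 * (year1_price - claims) + accept2 * (year2_price - claims)
    return float(profit.mean())

# Pull the levers and compare the two scenarios from the slide:
teaser = simulate_two_year_profit(year1_price=600, year2_price=1000)
steady = simulate_two_year_profit(year1_price=800, year2_price=800)
print(f"teaser: {teaser:.0f}, steady: {steady:.0f}")
```

The same loop can be rerun with a shocked claims distribution to explore inputs outside the insurer's control, such as the "economy crashes" scenario.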
28. The role of the Optimizer
• Takes the surface of possible outcomes and identifies its highest point
• Finds the best outcomes
• Identifies catastrophic outcomes
– and shows how to avoid them
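In its simplest form, the Optimizer is a search over the simulated outcome surface. The sketch below uses a brute-force grid search over two hypothetical levers (price and marketing spend); the response surface is entirely invented for illustration and stands in for the simulator's output.

```python
import numpy as np

prices = np.linspace(300, 1500, 60)      # lever 1: annual premium
marketing = np.linspace(0, 400, 40)      # lever 2: marketing spend per policy
P, M = np.meshgrid(prices, marketing, indexing="ij")

# Invented response surface: marketing lifts acceptance, but costs money.
accept = 1 / (1 + np.exp((P - 800 - 0.5 * M) / 150))
surface = accept * (P - 500) - M         # expected profit per policy

# Optimizer role 1: identify the highest point of the surface...
i, j = np.unravel_index(np.argmax(surface), surface.shape)
print(f"best levers: price={prices[i]:.0f}, marketing spend={marketing[j]:.0f}")

# ...and role 2: flag catastrophic regions to avoid (here: heavy losses).
catastrophic = surface < -100
print(f"share of lever settings to avoid: {catastrophic.mean():.0%}")
```

A real optimizer would use something smarter than a grid (gradient methods, constrained solvers), but the two roles — find the peak, map the cliffs — are the same.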
29. Take-home message
Using a Drivetrain Approach combined with a Model Assembly Line bridges the gap between predictive models and actionable outcomes.
31. Recommendation engines
• Recommendation engines are data products based on well-built predictive models that nevertheless do not achieve an optimal objective
– Current algorithms predict what products a customer will like, based on purchase history and the histories of similar customers
32. The case of Amazon
• Amazon represents every purchase that has ever been made as a giant sparse matrix
– with customers as the rows and products as the columns
• Once they have the data in this format, data scientists apply some form of collaborative filtering to “fill in the matrix”
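A toy version of this setup can be sketched in a few lines. This is not Amazon's algorithm — it is a minimal item-based collaborative filter over an invented 3-customer, 4-product purchase matrix, shown only to make the "sparse matrix, then fill it in" idea concrete.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy purchase matrix: customers as rows, products as columns (1 = bought).
purchases = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]))

# Item-based collaborative filtering: cosine similarity between product columns.
items = purchases.T.toarray().astype(float)           # one row per product
items /= np.linalg.norm(items, axis=1, keepdims=True)
sim = items @ items.T

# "Fill in the matrix": score every product for customer 0 and pick the
# highest-scoring product they have not bought yet.
row = purchases[0].toarray().ravel()
scores = row @ sim
best = int(np.argmax(np.where(row == 0, scores, -np.inf)))
print(f"recommend product {best} to customer 0")
```

At Amazon's scale the matrix never leaves its sparse representation; the dense conversion here works only because the toy matrix is tiny.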
33. The case of Amazon
• Such models are good at predicting whether a customer will like a given product
– but they often suggest products that the customer already knows about or has already decided not to buy
http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html
35. The Drivetrain approach
• Objective: drive additional sales by surprising and delighting the customer with books that would not have been considered without the recommendation
• Levers: the ranking of the recommendations
• Data: derived from many randomized experiments, covering a wide range of recommendations for a wide range of customers, to learn which recommendations generate new sales
• Models: develop an algorithm providing recommendations that escape the recommendation filter bubble
36. Developing the modeler
http://cdn.oreilly.com/radar/images/posts/0312-2-drivetrain-step4-lg.png
37. The role of the Modeler
• Modeler component 1:
– purchase probabilities, conditional on seeing a recommendation
• Modeler component 2:
– purchase probabilities, conditional on not seeing a recommendation
• The difference between these two probabilities is a utility function for a given recommendation to a customer
– It is low when the algorithm recommends a familiar book that the customer has already rejected (both probabilities are small) or a book that he/she would have bought even without the recommendation (both probabilities are large, so the difference is small)
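The utility described above is simply the probability lift a recommendation creates. A minimal sketch, with probabilities invented for illustration (in practice they would come from the randomized experiments on the previous slide):

```python
def recommendation_utility(p_buy_if_seen, p_buy_if_not_seen):
    """Utility of a recommendation = the purchase-probability lift it causes."""
    return p_buy_if_seen - p_buy_if_not_seen

# A familiar book the customer has already rejected: both probabilities small.
rejected = recommendation_utility(0.02, 0.01)
# A bestseller the customer would buy anyway: both probabilities large.
bestseller = recommendation_utility(0.90, 0.88)
# A surprising but relevant book: the recommendation itself causes the sale.
surprise = recommendation_utility(0.30, 0.02)
print(rejected, bestseller, surprise)
```

Ranking by this difference, rather than by predicted liking alone, is what pushes the engine out of the filter bubble: the bestseller scores near zero despite its high predicted appeal.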
39. The role of the Simulator
• Tests the utility of each of the many possible books in stock
• Alternatively, simulates just over the outputs of a collaborative filtering model of similar customer purchases
40. The role of the Optimizer
• Rank and display the recommended books
based on their simulated utility
• Less emphasis on the “function” and more on
the “objective.”
– What is the objective of the person using our data
product?
– What choice are we actually helping him or her
make?
Slide 40 of 66
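The optimizer step above is, at its simplest, a sort: rank the candidate books by their simulated utility and display the top of the list. The book names and utilities here are invented for illustration.

```python
# Hypothetical simulated utilities (purchase-probability lift) per book.
simulated_utility = {
    "book_already_rejected": 0.01,
    "book_bought_anyway":    0.02,
    "surprising_discovery":  0.35,
    "niche_match":           0.20,
}

# Optimizer: rank by simulated utility, highest first, and display the top.
ranked = sorted(simulated_utility, key=simulated_utility.get, reverse=True)
top_two = ranked[:2]
```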
42. Customer value
• Includes all interactions between a retailer and
its customers outside the actual buy-sell
transaction
– making a product recommendation
– encouraging the customer to check out a new feature
of the online store
– sending sales promotions
• Making the wrong choices comes at a cost to the
retailer
– reduced margins (discounts that do not drive extra
sales)
– opportunity costs
Slide 42 of 66
43. The Drivetrain approach
Objective: optimize the lifetime value of each
customer
Levers:
• Product recommendations
• Offer tailored discounts / special offers on products
• Make customer-care calls just to see how the user is doing
• Invite them to use the site and ask for their feedback
Zafu's approach:
do not send customers directly to clothes, but ask a series of simple
questions about the customer's body type, how well their other jeans fit,
and their fashion preferences
• Develop an algorithm leading the customer to browse a recommended
selection of Zafu's inventory
Slide 43 of 66
45. Developing the modeler
http://cdn.oreilly.com/radar/images/posts/0312-2-drivetrain-step4-lg.png
Slide 45 of 66
46. The role of the Modeler
• Modeler Component 1
• purchase probabilities, conditional on seeing a
recommendation
• Modeler Component 2
• purchase probabilities, conditional on not seeing a
recommendation
• Modeler Component 3
• price elasticity model to test how offering a discount might
change the probability that the customer will buy the item
• Modeler Component 4
• patience model for the customers’ tolerance for poorly
targeted communications
Slide 46 of 66
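The four-component modeler above can be combined into a single expected-value score for one candidate action (recommend an item, optionally at a discount). The functional forms below, an exponential for price elasticity and a geometric decay for patience, are illustrative assumptions, as are all the numbers.

```python
import math

def p_buy(base_prob: float, discount: float, elasticity: float = 4.0) -> float:
    """Component 3 sketch: a discount raises the purchase probability."""
    return min(1.0, base_prob * math.exp(elasticity * discount))

def patience(n_recent_messages: int, tolerance: float = 0.3) -> float:
    """Component 4 sketch: each recent contact erodes future responsiveness."""
    return (1.0 - tolerance) ** n_recent_messages

def action_value(p_seen, p_not_seen, price, discount, n_recent_messages):
    """Expected incremental margin of sending one targeted offer."""
    lift = p_buy(p_seen, discount) - p_not_seen      # components 1-3
    margin = price * (1.0 - discount)
    return lift * margin * patience(n_recent_messages)

# Compare sending a 10% offer to a fresh customer vs. one who has
# already received three promotions this week.
fresh = action_value(0.05, 0.02, price=50.0, discount=0.10, n_recent_messages=0)
fatigued = action_value(0.05, 0.02, price=50.0, discount=0.10, n_recent_messages=3)
```

The point of the sketch is that the same offer is worth less to a fatigued customer, which is exactly the trade-off the patience model lets the optimizer weigh.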
48. The role of the Simulator
• Test the utility of each of the many possible
clothes available
• Provide successful matches between
questions & recommendations
Slide 48 of 66
49. The role of the Optimizer
• Rank and display the recommended clothes
based on their simulated utility
– driving sales and improving the customer experience
• Less emphasis on the “function” and more on the
“objective”
– What is the objective of the person using our data
product?
– What choice are we actually helping him or her make?
Slide 49 of 66
51. Building a car that drives itself (1/2)
• Alternative approach: instead of being data
driven, we can now let the data drive us!
• Models required:
– a model of distance / speed limits to predict arrival
time; only a ruler and a road map are needed
– a model of traffic congestion
– a model to forecast weather conditions and their
effect on the safest maximum speed
Slide 51 of 66
52. Building a car that drives itself (2/2)
Plenty of cool challenges arise in building these
models, but by themselves they do not take us
to our destination
• Simulator: to predict the drive times along
various routes
• Optimizer: pick the shortest route subject to
constraints like avoiding tolls or maximizing
gas mileage
Slide 52 of 66
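The simulator/optimizer split on the slide above can be sketched as shortest-path search: the simulator supplies predicted drive times per road segment, and the optimizer (here, Dijkstra's algorithm) picks the fastest route subject to a constraint such as avoiding tolls. The road graph and times are invented.

```python
import heapq

# Toy road graph: edges are (neighbor, predicted minutes, is_toll).
ROADS = {
    "A": [("B", 10, False), ("C", 4, True)],
    "B": [("D", 5, False)],
    "C": [("D", 3, False)],
    "D": [],
}

def best_route(start, goal, avoid_tolls=False):
    """Dijkstra over simulated drive times; the avoid_tolls flag is the
    kind of constraint the optimizer must respect."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost, toll in ROADS[node]:
            if avoid_tolls and toll:
                continue  # constraint prunes this edge entirely
            heapq.heappush(queue, (minutes + cost, nxt, path + [nxt]))
    return None

fastest = best_route("A", "D")                  # free to use the toll road
toll_free = best_route("A", "D", avoid_tolls=True)
```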
53. It is already implemented
According to Google, about a dozen self-driving cars
are on the road at any given time. They've already
logged more than 500,000 miles in beta tests.
Slide 53 of 66
54. The Drivetrain approach
Building a car that drives itself
Vehicle controls
• Steering wheel, Accelerator, Brakes
Data from sensors etc.
• sensors that gather data about the road
• cameras that detect road signs, red or green lights & unexpected
obstacles
• Physics models to predict the effects of steering, braking &
acceleration
• Pattern recognition algorithms to interpret data from the road signs
Slide 54 of 66
56. The role of the Modeler
• Modeler Component 1
• Route selection, conditional on following a
recommendation
• Modeler Component 2
• Route selection, conditional on not following a
recommendation
Slide 56 of 66
57. The role of the Simulator
• examine the results of the possible actions the
self-driving car could take
– If it turns left now, will it hit that pedestrian?
– If it makes a right turn at 55 km/h in these
weather conditions, will it skid off the road?
• Merely predicting what will happen isn’t good
enough.
Slide 57 of 66
58. The role of the Optimizer
• optimize the results of the simulation
– to pick the best combination of acceleration and
braking, steering and signaling
Prediction only tells us that there is going to be
an accident.
An optimizer tells us how to avoid accidents.
Slide 58 of 66
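The simulate-then-optimize loop of the last two slides can be sketched as follows. The "physics" here is a crude stand-in with made-up thresholds; only the structure matters: the simulator predicts the outcome of each candidate action, and the optimizer discards unsafe ones and picks the best of the rest.

```python
# Candidate control actions the car could take right now.
CANDIDATES = [
    {"steer": "left",     "speed_kmh": 55},
    {"steer": "straight", "speed_kmh": 55},
    {"steer": "right",    "speed_kmh": 55},
    {"steer": "right",    "speed_kmh": 30},
]

def simulate(action, wet_road=True):
    """Simulator sketch: predict (is_safe, progress) for one action.
    Turning fast on a wet road is predicted to skid."""
    skids = wet_road and action["steer"] != "straight" and action["speed_kmh"] > 40
    progress = action["speed_kmh"] * (2 if action["steer"] == "right" else 1)
    return (not skids, progress)

def optimize(candidates):
    """Prediction alone only tells us an action ends in an accident;
    the optimizer rules those out and picks the best safe action."""
    safe = [a for a in candidates if simulate(a)[0]]
    return max(safe, key=lambda a: simulate(a)[1])

best = optimize(CANDIDATES)
```

In this toy setup the right turn is the best move, but only at the lower speed: the optimizer trades speed for safety, which prediction alone cannot do.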
60. The present
• Drivetrain Approach:
– a framework for designing the next generation of
great data products
– heavily relies on optimization
• A need for the data science community to
educate others
– on how to derive value from their predictive
models
– Based on product design process
Slide 60 of 66
61. Current status of data products
• Data are continuously provided by big data
providers
– Facebook, Twitter etc.
• Data are transformed, so that the end product
no longer looks like data
– Telematics, booking systems etc.
• Example: music now lives in the cloud
– Amazon, Apple, Google, or Spotify
Slide 61 of 66
62. The future
• Optimization will be taught in business schools &
statistics departments.
• Data scientists ship products designed to
produce desirable business outcomes
Risk: Models using data to create more
data, rather than using data to create
actions, disrupt industries, and
transform lives.
Slide 62 of 66
63. To keep in mind for future big data
products
• When building a data product, it is critical to
integrate designers into the engineering team
from the beginning.
• Data products frequently have special
challenges around inputting or displaying
data.
Slide 63 of 66
64. What to expect in the future?
Google needs to move beyond the current
search format of you entering a query and
getting 10 results.
The ideal would be us knowing what you want
before you search for it…
Eric Schmidt
Executive Chairman of Google
Slide 64 of 66
66. References
• Big Data Now: 2012 Edition. O’Reilly Media, Inc.
• O’Reilly Strata: Making Data Work
(http://strata.oreilly.com/tag/big-data)
• Jeremy Howard - The Drivetrain Approach: A four-step process
for building data products
(http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html)
• Mike Loukides - The evolution of data products
(http://strata.oreilly.com/2011/09/evolution-of-data-products.html)
• Wikipedia: Big data (http://en.wikipedia.org/wiki/Big_data)
Slide 66 of 66
#5: Predictive modelling is the process by which a model is created or chosen to try to best predict the probability of an outcome. In many cases the model is chosen on the basis of detection theory to try to guess the probability of an outcome given a set amount of input data; for example, given an email, determining how likely it is to be spam.
#6: A motor vehicle's driveline or drivetrain consists of the parts of the powertrain excluding the engine and transmission. It is the portion of a vehicle, after the transmission, that changes depending on whether a vehicle is front-wheel, rear-wheel, or four-wheel drive. The engineers' starting point is a clear objective: they want a car to drive safely from point A to point B without human intervention.
#8: Back in 1997, AltaVista was king of the algorithmic search world. While their models were good at finding relevant websites, the answer the user was most interested in was often buried on page 100 (i.e., no ranking). Google realized that the objective was to show the most relevant search results first for each unique user.
#9: Link to PageRank: http://en.wikipedia.org/wiki/PageRank
#11: recommendation filter bubble = the tendency of personalized news feeds to only display articles that are blandly popular or further confirm the readers’ existing biases.
#12: The figure refers to Step 4 of the Drivetrain approach
#13: Component 1: The price elasticity model is a curve of price versus the probability of the customer accepting the policy conditional on that price. This curve moves from almost certain acceptance at very low prices to almost never at high prices. Component 2: The profit for a very low price will be in the red by the value of expected claims in the first year, plus any overhead for acquiring and servicing the new customer.
#14: Amazon’s recommendation engine is probably the best one out there but it’s easy to get it to show its warts.