
Week 1-Introduction-V3-class

This document provides an overview of a course on organizing for business analytics platforms. The syllabus is reviewed which covers topics like data engineering, data science, data analysis, and new data frontiers. Grading will be based on completing AWS courses, case study presentations, individual papers, group projects, attendance, exams, and a final project. The document discusses how data and analytics can be used to make decisions and provide insights by examining patterns and trends. It also differentiates between data analytics, which uses programming logic on structured data, and AI/ML, which can analyze unstructured data and complex variables by learning from examples.

Uploaded by

Rutuja Pabale
Copyright
© All Rights Reserved

BUAN6335.501

Organizing for Business Analytics Platforms

Unless otherwise stated, this presentation refers to study material provided by AWS Academy
• Introductions
• Quick Survey
• Course Overview
• Syllabus Walkthrough
• Introduction to Data and More…

Syllabus Overview
What is Possible

• New topics (data and more)
• New ideas
• Early assignment submissions ☺
• Two assignment due dates are flexible, so use your veto for an extension carefully
• Flexible office hours
What may not be possible…

❑ The attendance sheet is your input. If it is not there, it is not there.
❑ Group assignments are a group effort. Figure out the group dynamics!
❑ The final exam is in the CLASS on December 12. Manage your travel accordingly.
❑ AWS Academy course due dates will not change.
❑ 92.74 is 92.74 and hence an A-! This is a data class ☺

Course Composition
Emphasis

All about data
• Data Eng
• Data Sci
• Data Analysis
• New frontiers
• Platform agnostic

Lot of Examples
• Case Studies
• HBR articles
• Classroom Discussions

Hands-on
• AWS Academy Courses
• Data Eng
• ML Foundation

Feedback
• Knowledge Checks
• Individual Papers
• Group Projects
Grading Criteria

Grading Milestones | Points | % of Total Score
AWS Data Engineering Course Labs Completion + Knowledge Checks | 150 | 15%
AWS Machine Learning Course Labs Completion | 200 | 20%
Harvard Case Study Presentation | 200 | 20%
Individual Papers (2) | 200 | 20%
Attendance | 50 | 5%
Final Project Presentation | 100 | 10%
Final Exam | 100 | 10%
Total | 1000 | 100%

What is Data?

Data is the …….


Everyone is talking about data

…for decades…
“Without big data, you are blind and deaf and in the middle of a freeway.” —
Geoffrey Moore

“Data is the new oil.” — Clive Humby

“Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has
to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives
profitable activity; so must data be broken down, analyzed for it to have value.”
- Michael Palmer

“With data collection, ‘the sooner the better’ is always the best answer.” — Marissa
Mayer

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”
— Jim Barksdale

“Above all else, show the data.” — Edward R. Tufte


Two More…

“You collect as much data as you can. You immerse yourself in that data, but then make the decision with your heart.”
– Jeff Bezos

Last but not least…

“Torture the data, and it will confess to anything.”


— Ronald Coase

How do we decide…with the data?

• Which website gives me better pricing with free shipping?
• How should I line up my fantasy football team?
    • knowing I have a pass-heavy points system?
• Which type of course or skills will provide me with more job opportunities?
• Which restaurant near me serves the best Thai food?
• Which restaurant has above-average reviews but is generally easy to get seating at for a group?
• What should I pay for this home?
• How much is my current home or car worth?

How do organizations decide…

• How is my infrastructure handling the peak season?
• Which of these customer transactions should be flagged as fraud?
• What customer experience should be enhanced in our mobile app or website to increase interactions?
• Which patients are most likely to have a relapse?
• How do we keep students engaged through the year and reduce the dropout rate?
• When is the optimum time to harvest this year's crop?
Fueling decisions with data science

Data analytics
• Is the systematic analysis of large datasets (big data) to find patterns and trends to produce actionable insights
• Uses programming logic to answer questions from data
• Is good for structured data with a limited number of variables

AI/ML
• Is a set of mathematical models that are used to make predictions from data at a scale that is difficult or impossible for humans
• Uses examples from large amounts of data to learn about the data and answer questions
• Is good for unstructured data and where the variables are complex

© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
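One way to picture the contrast between "programming logic" and "learning from examples" is the small sketch below. It is only an illustration: the transactions, the 500 threshold, and the function names are invented, and the "learning" step is a deliberately tiny one-feature decision stump rather than a production model.

```python
# Hypothetical labeled transactions: (amount_usd, is_fraud). Invented for illustration.
labeled = [
    (12.0, False), (40.0, False), (95.0, False), (150.0, False),
    (310.0, True), (420.0, True), (560.0, True),
]


def rule_based_flag(amount, threshold=500.0):
    """Data analytics style: a person writes the logic and picks the threshold."""
    return amount > threshold


def accuracy(threshold, examples):
    """Share of examples where 'amount > threshold' agrees with the label."""
    return sum((amount > threshold) == label for amount, label in examples) / len(examples)


def learn_threshold(examples):
    """ML style (one-feature decision stump): choose the cut that best fits the examples."""
    amounts = sorted(amount for amount, _ in examples)
    # Candidate cuts are the midpoints between neighbouring amounts.
    candidates = [(low + high) / 2 for low, high in zip(amounts, amounts[1:])]
    return max(candidates, key=lambda t: accuracy(t, examples))


learned_threshold = learn_threshold(labeled)
rule_accuracy = accuracy(500.0, labeled)
learned_accuracy = accuracy(learned_threshold, labeled)
print(learned_threshold, rule_accuracy, learned_accuracy)
```

With these made-up examples the hand-picked rule misses two fraud cases, while the threshold derived from the labeled data separates them. Real data would rarely be this clean.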
Example: Identify pictures of dogs

Data analytics approach
Based on a set of defined features, which of these pictures are dogs?

AI/ML approach
Which of these images have the same features as the pictures that I have seen that were labeled as dogs?

Business example: Customer relationship management

Data analytics approach
A retail business analyzes total revenue per customer and segments customers into categories based on spending.
The segmentation might be used to give a higher level of customer service to customers who spend more.

AI/ML approach
A retail business uses AI/ML to analyze customer churn (why and how often customers come and go).
AI/ML might uncover factors that influence churn so that the business can make changes to better retain customers.
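The data analytics side of the customer relationship example (total revenue per customer, then segments by spending) can be sketched in a few lines. The order records and the tier cutoffs below are assumptions made up for illustration; a real business would pick its own.

```python
from collections import defaultdict

# Hypothetical order records: (customer_id, order_total_usd). Invented for illustration.
orders = [
    ("c1", 120.0), ("c2", 35.5), ("c1", 410.0),
    ("c3", 980.0), ("c2", 20.0), ("c3", 150.0),
]

# Assumed spending tiers, checked from highest floor to lowest.
TIERS = [(1000.0, "high"), (500.0, "medium"), (0.0, "low")]


def revenue_per_customer(rows):
    """Aggregate order totals by customer."""
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


def segment(total):
    """Return the label of the first tier whose floor the total reaches."""
    for floor, label in TIERS:
        if total >= floor:
            return label
    return "low"


totals = revenue_per_customer(orders)
segments = {customer_id: segment(total) for customer_id, total in totals.items()}
print(segments)
```

The logic is fully written by a person, which is what makes it "programming logic" in the sense of the earlier slide. The churn analysis on the AI/ML side would instead need labeled history to learn from.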

More valuable insights are more difficult to derive

More Prescriptive
valuable
How can we
Predictive make
something
What will
happen or
happen?
Diagnostic prevent
something from
Why did happening?
Descriptive something
happen?
What
happened?
Less valuable,
easier to derive
More difficult

17
© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data Literacy and Gen D Worker

01 Reading data involves understanding what data is, and what aspects of the world it represents.
02 Working with data involves creating, acquiring, cleaning, and managing it.
03 Analyzing data involves filtering, sorting, aggregating, comparing, and performing other such analytic operations on it.
04 Arguing with data involves using data to support a larger narrative intended to communicate some message to a particular audience.
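The four analytic operations named under "Analyzing data" (filtering, sorting, aggregating, comparing) might look like this on a toy data set. The restaurant names and ratings are invented, and the sketch borrows the earlier "best Thai food" question only as a convenient example.

```python
from statistics import mean

# Hypothetical reviews: (restaurant, cuisine, rating). Invented for illustration.
reviews = [
    ("Bangkok Bowl", "thai", 4.5),
    ("Bangkok Bowl", "thai", 4.0),
    ("Siam Spoon", "thai", 3.5),
    ("Siam Spoon", "thai", 4.0),
    ("Pasta Place", "italian", 5.0),
]

# Filter: keep only Thai restaurants.
thai_reviews = [row for row in reviews if row[1] == "thai"]

# Aggregate: average rating per restaurant.
ratings_by_name = {}
for name, _cuisine, rating in thai_reviews:
    ratings_by_name.setdefault(name, []).append(rating)
averages = {name: mean(values) for name, values in ratings_by_name.items()}

# Sort: highest average first.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Compare: gap between the top two averages.
best, runner_up = ranked[0], ranked[1]
gap = best[1] - runner_up[1]
print(best, runner_up, gap)
```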
Progression in Organization Mindset

Data Indifference
Most decisions are made from gut feeling, without even being curious about data.

Data Aware
Data is collected but not necessarily used to make decisions. It is used at discretion or only on a need basis.

Data Informed
Business users often use data to make business decisions.

Data Driven
Data is the DNA of all decision making. Data collection, cleaning, analytics and insights are mature.

https://www.smartsheet.com/data-driven-decision-making-management

Data Driven Organizations


Prominent Examples…

Tesla
• Customer satisfaction
• Sensor data (10x more than competitors) to train models to design autopilot/self-driving cars
• Collects an ocean of data about traffic, driver, car, and battery behavior and may use it for monetization in the future

Netflix
• Personalized Movie Recommendation
• Auto Generated Video Thumbnails
• Marketing Optimization
• Predictive and prescriptive analysis

Instagram
• Personalized Feed
• AI Powered IGTV captioning
• Targeted Brand Marketing
Lead from the front with Data

Insurance Industry
• Targeted marketing
• Underwriting agility
• Setting pricing and reducing costs
• Claims management
• Fraud detection

Intelligence
• Monitoring and analyzing all possible sources for lawful surveillance
• Automate and apply AI wherever possible as data capture growth is exponential

Healthcare
• Pharmaceutical Research and Development
• Disease Detection
• Health Insurance Risk Assessment

Big data in Insurance: https://www.actuary.org/sites/default/files/files/publications/BigDataAndTheRoleOfTheActuary.pdf


Book: Army of None: Autonomous Weapons and the Future of War, by Paul Scharre
All is good…But then…

Per Gartner, NewVantage Partners…
around 70% of data initiatives fail

The real question is:
If Failure Is Not an Option, Why Are Successful Data Initiatives So Rare?

https://www.bcg.com/publications/2020/increasing-odds-of-success-in-digital-transformation
Hindrances to Data Progression

• Data does not lie, but wrong or stale data can misdirect
• Achieving data-driven leadership remains an elusive aspiration for most organizations
• Data initiatives cannot be executed and managed in silos
• Becoming data-driven requires an organizational focus on cultural change
• Data must be considered part of business strategy rather than just technology enablement
A Few 2022 Insights

Source: https://www.newvantage.com/_files/ugd/e5361a_ad5a8b3da8254a71807d2dccdb0844be.pdf
A Few More Insights
High Level Challenges…
Data becomes less valuable for decision-making over time

How quickly can you analyze incoming data?
• Near real time
• Within seconds
• Within minutes to hours
• Within days to months

Value of the resulting insight, from most to least valuable:
• Preventive/Predictive
• Actionable
• Reactive
• Historical
The trade-offs of data-driven decisions

Cost
• How much should you invest to go faster or predict more accurately?
• How much incremental improvement justifies additional cost?

Speed
• How quickly do you need an answer?
• Can you sacrifice accuracy for speed?

Accuracy
• How accurate does the prediction need to be?
• Does waiting for a better answer outweigh answering more quickly?
Five Key Challenges (5 V’s) in Data Domain

• volume

• velocity

• variety

• veracity

• value

Volume

Amount of Data
• It is the base of big data
• It is the initial size and amount of data that is collected
• If the volume of data is large enough, it can be considered big data

Velocity

• How quickly data is generated
• How quickly that data moves
• Companies need their data to flow quickly to make it available at the right time to enable business decision making

Example – In healthcare, there are many medical devices made today to monitor patients and collect data. From in-hospital medical equipment to wearable devices, collected data needs to be sent to its destination and analyzed quickly.

Variety

• Diversity of the data types

• Outside of the company

• Within the company

• Structured

• Unstructured

• Semi-structured
Type of Data

Structured
• Data that has been predefined and formatted to a set structure
• Easily used by business users
• Predefined structure makes it easy on machine learning algorithms
• Cons: Limited storage formats and choices, e.g. RDBMS or DW
• Cons: Predefined structure also forces limitations on use and manipulation
• Examples: Bank Account Data, HRMS, Company Financial System, CRM

Unstructured
• Data stored in its native format and not processed until it is used
• Needs analytics to derive patterns and behaviors
• Native format helps to use data as-is, and it is easy to collect as there are not many rules on the structure
• Requires more storage as compared to structured data, e.g. data lakes or cloud data DWs
• Limited skill set availability due to the technical nature of the toolset and frameworks
• Examples: IoT sensor data, Log files, Social Media networks, Collaboration tools/websites
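To make the structured versus unstructured distinction concrete, the sketch below takes one raw log line (listed above as an unstructured example) and turns it into a record with named, typed fields. The log line and its layout are assumptions chosen for illustration, not a format prescribed by the course material.

```python
import re

# Hypothetical web-server log line; the layout is an assumption for illustration.
raw_line = '203.0.113.7 - - [01/Jan/2024:10:15:32 +0000] "GET /cart HTTP/1.1" 200 512'

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def to_structured(line):
    """Return a dict with named fields, or None if the line does not fit the pattern."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    return record


record = to_structured(raw_line)
print(record)
```

Until this step runs, the line is just text in its native format. Afterwards it has a predefined structure that a report or a database table could use directly.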
Sources of Data

Source | Example | Type | Complexity | Velocity | Volume | Variety
Business or Enterprise Application | HRMS, ERP, CRM, PPM, EMR | Structured | Low | Mid | Mid | Low
Documents | PDF, XLS, JSON and so on | Unstructured | Mid | Low | Low | Mid
Collaboration Systems/Public Web sites | Emails, Slack, Teams, Govt sites, business sites | Unstructured | Mid | Mid | High | Mid
Media | Videos, Audio files, Images | Unstructured | High | High | High | High
Social Networks | LinkedIn, Twitter, TikTok, Instagram | Unstructured | High | High | High | Mid
Data Storage | File streams, NoSQL, R-ORDBMS | Structured/Hybrid | Low | Mid | High | Mid
Log files | Application, events, transactions, clickstream | Unstructured | Mid | High | High | High
Sensor Data/IoT Data | Medical devices, Household Devices, Security systems, Flight systems | Unstructured | Low | High | High | High
Variety – cont.

• Structured
    • Organized into a formatted repository
    • Data is made more addressable for effective data processing and analysis

• Unstructured
    • Emails
    • Text files

• Semi-structured
    • Hasn’t been organized
    • Has metadata, for example pictures grouped by tags
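The "pictures grouped by tags" example of semi-structured data can be sketched with a small JSON document. The file names, tags, and the extra "camera" field are invented. The point is only that records carry metadata without all sharing one fixed schema.

```python
import json
from collections import defaultdict

# Hypothetical photo metadata; field names and values are invented for illustration.
photos_json = """
[
  {"file": "img_001.jpg", "tags": ["dog", "beach"]},
  {"file": "img_002.jpg", "tags": ["dog"]},
  {"file": "img_003.jpg", "tags": ["beach", "sunset"], "camera": "phone"}
]
"""

photos = json.loads(photos_json)

# Group file names by tag; records may have extra or missing fields.
files_by_tag = defaultdict(list)
for photo in photos:
    for tag in photo.get("tags", []):
        files_by_tag[tag].append(photo["file"])

print(dict(files_by_tag))
```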

Veracity

• Defined as the accuracy or truthfulness of a data set
• In many cases, the veracity of the data sets can be traced back to the source provenance.
• However, when multiple data sources are combined, e.g. to increase variety, the interaction across data sets and the resultant non-homogeneous landscape of data quality can be difficult to track.

https://datascience.aero/big-data-veracity-value/
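One simple way to keep veracity traceable when sources are combined is to tag every record with where it came from, then measure quality per source. The sketch below assumes two made-up sources and uses a deliberately crude email check as the quality measure. Both are illustrative choices, not a recommended validation method.

```python
# Hypothetical records from two sources; values are invented for illustration.
crm_rows = [
    {"customer": "c1", "email": "a@example.com"},
    {"customer": "c2", "email": "b@example.com"},
]
web_form_rows = [
    {"customer": "c3", "email": "c@example.com"},
    {"customer": "c4", "email": "not-an-email"},
]


def tag_source(rows, source):
    """Attach provenance to each record before the sources are merged."""
    return [{**row, "source": source} for row in rows]


def looks_valid(email):
    """Crude illustrative check: something before '@' is not verified, only a dotted domain."""
    return "@" in email and "." in email.split("@")[-1]


def valid_share_by_source(rows):
    """Share of records per source that pass the check."""
    counts = {}
    for row in rows:
        valid, total = counts.get(row["source"], (0, 0))
        counts[row["source"]] = (valid + looks_valid(row["email"]), total + 1)
    return {source: valid / total for source, (valid, total) in counts.items()}


combined = tag_source(crm_rows, "crm") + tag_source(web_form_rows, "web_form")
quality = valid_share_by_source(combined)
print(quality)
```

Without the "source" field, the combined data set would show one blended quality figure and the weaker source would be hard to spot.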

Value

• What is the value of the data?

• What can the organization do with it?

• What insight can be gained?

• How could that impact bottom line?

• How could that help gain competitive advantage?


But where to start…?

The data landscape is an ocean

https://mattturck.com/data2021/
Data Life Cycle

https://online.hbs.edu/blog/post/data-life-cycle
Example: Data Sci Lifecycle

Reis, Joe; Housley, Matt. Fundamentals of Data Engineering (p. 34). O'Reilly Media. Kindle Edition.

How does big data analytics work?

• Data analysts, data scientists, predictive modelers, statisticians and other analytics professionals
    • collect,
    • process,
    • clean and
    • analyze
• growing volumes of structured transaction data as well as other forms of data not used by conventional BI and analytics programs.

4 Major steps of the big data analytics process

• Collect

• Process

• Clean

• Analyze
Collect

• Data professionals collect data from a variety of different sources. Often, it is a mix of semi-structured and unstructured data. While each organization will use different data streams, some common sources include:
    • internet clickstream data;
    • web server logs;
    • cloud applications;
    • mobile applications;
    • social media content;
    • text from customer emails and survey responses;
    • mobile phone records; and
    • machine data captured by sensors connected to the internet of things (IoT).
Process

• After data is collected and stored in a
    • data warehouse or
    • data lake,
• data professionals must
    • organize,
    • configure and
    • partition the data properly for analytical queries.
• Thorough data preparation and processing makes for higher performance from analytical queries.
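Partitioning for analytical queries, mentioned in the Process step, can be pictured with an in-memory sketch: events are grouped by date once, so a query for one day only touches that day's partition. The events and function names are hypothetical. Real warehouses and data lakes implement partitioning in their own storage layers.

```python
from collections import defaultdict

# Hypothetical clickstream events: (iso_date, user, page). Invented for illustration.
events = [
    ("2024-01-01", "u1", "/home"),
    ("2024-01-01", "u2", "/cart"),
    ("2024-01-02", "u1", "/checkout"),
]


def partition_by_date(rows):
    """Group rows by their date so each date can be read independently."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return dict(partitions)


partitions = partition_by_date(events)


def page_views_on(date):
    """Only the matching partition is scanned, not the whole data set."""
    return len(partitions.get(date, []))


print(page_views_on("2024-01-01"))
```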

Clean
• Data is cleansed to improve its quality.
• Data professionals scrub the data using
    • scripting tools or
    • data quality software
• They look for any errors or inconsistencies, such as
    • duplications or formatting mistakes, and
    • organize and tidy up the data
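A scripting-tool version of the Clean step might look like the sketch below, which removes duplicate customers and fixes inconsistent formatting. The rows and the two accepted date formats are assumptions for illustration. Real data would need rules agreed with whoever owns the source.

```python
from datetime import datetime

# Hypothetical customer rows with a duplicate and inconsistent formatting.
raw_rows = [
    {"email": " Ana@Example.com ", "signup": "2024-01-05"},
    {"email": "ana@example.com", "signup": "2024-01-05"},
    {"email": "bo@example.com", "signup": "05/01/2024"},
]

# Assumed input formats: ISO dates and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def clean_date(text):
    """Return the date in ISO format, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(rows):
    """Normalize emails, standardize dates, and keep the first row per email."""
    seen = set()
    cleaned_rows = []
    for row in rows:
        email = row["email"].strip().lower()
        if email in seen:
            continue  # duplicate after normalization
        seen.add(email)
        cleaned_rows.append({"email": email, "signup": clean_date(row["signup"])})
    return cleaned_rows


cleaned = clean(raw_rows)
print(cleaned)
```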

Analyze

The collected, processed and cleansed data is analyzed with analytics software. This includes tools for:

• data mining, which sifts through data sets in search of patterns and relationships
• predictive analytics, which builds models to forecast customer behavior and other future actions, scenarios and trends
• machine learning, which taps various algorithms to analyze large data sets
• deep learning, which is a more advanced offshoot of machine learning
• text mining and statistical analysis software
• artificial intelligence (AI)
• mainstream business intelligence software
• data visualization tools

This all sounds logical...

but how to realize it?

The Data Pipeline


Infrastructure for data-driven decisions for the data-driven organization

A data pipeline in its simplest terms

Collect data → Store and process data → Build something useful with data
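The three stages of the simplest pipeline can be sketched as three small functions chained together. The sensor readings and function names are hypothetical, and an in-memory dictionary stands in for real storage.

```python
from statistics import mean


def collect():
    """Stage 1: gather raw readings (device_id, temperature_c). Values are invented."""
    return [("d1", 21.5), ("d2", None), ("d1", 22.5), ("d2", 19.0)]


def store_and_process(readings):
    """Stage 2: drop missing values and store readings grouped by device."""
    store = {}
    for device, temperature in readings:
        if temperature is not None:
            store.setdefault(device, []).append(temperature)
    return store


def build_something_useful(store):
    """Stage 3: produce an average temperature per device."""
    return {device: mean(values) for device, values in store.items()}


report = build_something_useful(store_and_process(collect()))
print(report)
```

Each stage only depends on the output of the one before it, which is what lets the stages be swapped or scaled independently in a real pipeline.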

Work BACKWARDS to design your infrastructure

Weigh the trade-offs of cost, speed, and accuracy.

1. What decision are you trying to make?
2. What data do you need to support this?

Layers of the pipeline infrastructure

Data sources → Ingestion → Storage / Processing → Analysis & Visualization → Predictions and decisions

Actions taken with data in the pipeline

Data wrangling:
• Discover
• Clean
• Normalize
• Enrich
Transformation
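Two of the wrangling actions, Normalize and Enrich, are sketched below. "Normalize" is read here as rescaling a numeric field to a 0 to 1 range, which is only one of several meanings the word can have. The orders, the region lookup, and the field names are all invented for illustration.

```python
# Hypothetical orders and a separate region lookup; values are invented.
orders = [
    {"customer": "c1", "amount": 20.0},
    {"customer": "c2", "amount": 60.0},
    {"customer": "c3", "amount": 100.0},
]
regions = {"c1": "north", "c2": "south"}


def normalize(rows, field):
    """Add a min-max scaled copy of a numeric field (assumed meaning of 'normalize')."""
    values = [row[field] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero when all values are equal
    return [{**row, field + "_scaled": (row[field] - low) / span} for row in rows]


def enrich(rows, lookup, default="unknown"):
    """Add a region to each row from a second data source."""
    return [{**row, "region": lookup.get(row["customer"], default)} for row in rows]


wrangled = enrich(normalize(orders, "amount"), regions)
print(wrangled)
```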
Iterative processing through the pipeline

[Figure: (1) data sources feed predictions and decisions; (2) additional data is brought in; (3) results are evaluated and the pipeline iterates.]
Critical Elements of Data Lifecycle

Reis, Joe; Housley, Matt. Fundamentals of Data Engineering (p. 24). O'Reilly Media. Kindle Edition.
Modern Data Professional

• Must have a good understanding of the data lifecycle
• Must have a broad understanding of the technology stack
• Cannot remain an island within their own role
• Must continuously evolve and upgrade capabilities
Data Platform Architectural Considerations

• Volume
• Sources
• Throughput
• Latency
• Extensibility
• Data Quality
• Reliability
• Security
• Self-Service
• Cost
Data Value Chain

• Depicts the interdependencies of data elements and the respective steps involved in the data life cycle.
• Any missing piece will burden other elements in the chain and, worse, can break the chain, derailing analytics goals.

Source the data from relevant channels → Secure, transform and make the data usable → Prioritize backlog for improvements/new features → Extract + review insights and patterns → Repurpose, Manage, Automate

Feedback for continuous improvements

• Architect, Build skills, Organize
• Imbibe data-driven culture
• Refactor, Operationalize, Govern with agility

Effective data analysis solutions require both storage and the ability to analyze data in near real time, with low latency, while yielding high-value returns.

Additional Reference Slides


Big data analytics benefits

• Quickly analyzing large amounts of data from different sources, in many different formats and types.
• Rapidly making better-informed decisions for effective strategizing, which can benefit and improve the supply chain, operations and other areas of strategic decision-making.
• Cost savings, which can result from new business process efficiencies and optimizations.
• A better understanding of customer needs, behavior and sentiment, which can lead to better marketing insights, as well as provide information for product development.
• Improved, better informed risk management strategies that draw from large sample sizes of data.

Basic principles of analysis strategy

• Never break the laws and regulations that your enterprise needs to follow.
• Be professional and use up-to-date analytical techniques.
• Be prepared to reject every hypothesis.
• Respect the point of view of other professionals.
• Take care of security.
• Information governance needs to grow: Next to technological solutions, organizational measures help to prevent ethical issues. As with all staff handling sensitive issues, data scientists should regularly sign a legally binding nondisclosure agreement.
• Connect business to consumer value: This is an essential aspect, as all we are doing is ensuring better value is delivered to our consumers. We cannot make wrong decisions or analyze wrong data sets that do not deliver appropriate value to our consumers.
• Understand the origin of the data: Data is collected for a specific purpose and with a measurement instrument. In essence, from the time it is collected, data represents a point of view. Don’t collect data for the sake of its availability and the potential opportunity to use it in the future. Collect it for your current purpose.
• There is nothing objective in analytics: Many people want analysts to aim at truthfulness in their work. However, since the analysis depends on the primary data collected, an analyst can only ensure that the most appropriate data for the analysis has been collected.
Misconceptions of BIG Data

• 80% of all data is unstructured: This is one of the oldest misconceptions about data and data analytics. Given the variety of data sources, it appears true, but it is not exactly true. There wouldn’t be any patterns to discover in data if it had no structure. Use of non-relational databases like NoSQL and graph databases helps create the structures and the patterns for most data types.
• Advanced analytics is just an advanced version of normal analytics: Many believe they have mastered database analytics and just need the next step to move on to advanced analytics. Legacy analytics are mainly static reports from static databases. Advanced analytics give us the power to reach intelligent conclusions and solve real business problems by analyzing the data.
• Improved tools will replace the data scientist: Regardless of the type or advancement of the tools, you would need data scientists/analysts to use these tools to perform the analysis and get dynamic reports. In any case, there is a shortage of data scientists, so they would always be required.
• Data scientists need high-level education: Data scientists need an analytical, logical mind to define the rules and relationships between data sources to get positive outcomes. We believe education alone does not help if mind and logic are not applied properly. It is more important to be logical and understand the business needs.
• We can predict everything with big data: While we can use big data to form patterns and predict many things, big data cannot predict everything. Hospitals can analyze which kinds of people are at higher risk of heart ailments so that precautions can be taken, but many things in more complex domains such as law and politics cannot be predicted.
• Big data isn’t biased: Data is always biased regardless of the volume or data source. Data is a result of certain measurements and was collected with some purpose. You need to approach it with care and make sure that you have collected the dataset from a representative sample of the group you are studying, or else you get the wrong data.
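A quick check for the representativeness point is to compare group shares in the collected sample with the shares expected in the group being studied. Everything in the sketch below is an assumption for illustration: the age groups, the population shares, the sample, and the 10-point tolerance.

```python
from collections import Counter

# Hypothetical shares of each age group in the population being studied.
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

# Hypothetical collected sample, heavily skewed toward younger respondents.
sample = ["18-34"] * 70 + ["35-54"] * 25 + ["55+"] * 5

# Assumed tolerance: flag groups more than 10 percentage points below expectation.
TOLERANCE = 0.10


def representation_gaps(sample_rows, expected):
    """Sample share minus expected share, per group."""
    counts = Counter(sample_rows)
    size = len(sample_rows)
    return {group: counts.get(group, 0) / size - share for group, share in expected.items()}


gaps = representation_gaps(sample, population_share)
under_represented = sorted(group for group, gap in gaps.items() if gap < -TOLERANCE)
print(gaps, under_represented)
```

A large data set does not fix this by itself: a million rows with the same skew would show the same gaps.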
Big data analytics uses and examples

• Customer acquisition and retention. Consumer data can help the marketing efforts of companies, which can act on trends to increase customer satisfaction. For example, personalization engines for Amazon, Netflix and Spotify can provide improved customer experiences and create customer loyalty.
• Targeted ads. Personalization data from sources such as past purchases, interaction patterns and product page viewing histories can help generate compelling targeted ad campaigns for users on the individual level and on a larger scale.
• Product development. Big data analytics can provide insights to inform product viability, development decisions and progress measurement, and steer improvements in the direction of what fits a business' customers.
• Price optimization. Retailers may opt for pricing models that use and model data from a variety of data sources to maximize revenues.
• Risk management. Big data analytics can identify new risks from data patterns for effective risk management strategies.
• Supply chain and channel analytics. Predictive analytical models can help with preemptive replenishment, B2B supplier networks, inventory management, route optimizations and the notification of potential delays to deliveries.
• Improved decision-making. Insights business users extract from relevant data can help organizations make quicker and better decisions.


Definitions

• Analysis is a detailed examination of something in order to understand its nature or determine its essential features.
• Data analysis is the process of compiling, processing, and analyzing data so that you can use it to make decisions.
• Analytics is the systematic analysis of data.
• Data analytics is the specific analytical process being applied.

References:
• https://www.newvantage.com/_files/ugd/e5361a_ad5a8b3da8254a71807d2dccdb0844be.pdf
• https://online.hbs.edu/blog/post/data-life-cycle
• https://mattturck.com/data2021/