Chapter 1: Introduction to Data Analytics
At the end of this chapter, students should be able to understand:
➢1.3 Terminology
1.1 Background – Evolution of Data Analytics
Howard Dresner
1.1 Background- Data Makes Everything Clearer
1.1 Background - Big Data vs Traditional Datasets
Data characteristics Traditional Datasets Big Data
1.1 Background – Example of Big Data
NYC Taxi Data - includes driver details, pickup and drop-off locations, time of day, trip locations (longitude-latitude), cab fare and tip amounts. There are over 500,000 taxi trips daily in central NYC.
Source: N. Ferreira, J. Poco, H.T. Vo, J. Freire, C.T. Silva, Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips, IEEE Trans. Visual Comput. Graphics, 19 (12) (2013), pp. 2149-2158
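A single trip record of this kind can be sketched as a small data structure. The field names below are hypothetical, chosen only to mirror the attributes listed above; they are not the actual column names of the NYC taxi dataset, and the values are made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaxiTrip:
    """One taxi trip; all field names are illustrative assumptions."""
    driver_id: str
    pickup_time: datetime
    pickup_lon: float
    pickup_lat: float
    dropoff_lon: float
    dropoff_lat: float
    fare: float  # cab fare in dollars
    tip: float   # tip amount in dollars

    def tip_rate(self) -> float:
        """Tip as a fraction of the fare (0.0 when the fare is zero)."""
        return self.tip / self.fare if self.fare else 0.0


# Made-up example values, not real data.
trip = TaxiTrip("D001", datetime(2013, 1, 1, 8, 30),
                -73.99, 40.75, -73.97, 40.76, fare=12.0, tip=3.0)
print(trip.tip_rate())
```

With 500,000 such records arriving every day, even a simple derived quantity like the tip rate has to be computed at scale.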
1.1 Background – Why Big Data Analytics?
What is enabling it?
• Lower Cost
• Greater Storage (HD and RAM)
• Faster Input / Output Operations
• Faster Processing
• Increased Bandwidth
Since 1990, the average price per MB of memory has dropped from $59 to 0.49 cents, a price reduction of about 99.99%.
At the same time, the capacity of a memory module has increased from 8 MB to 8 GB.
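The price figures can be checked with a few lines of arithmetic. This is only a sketch: it reads 0.49 cents as $0.0049, which gives a drop of about 99.99%; a drop of about 99.2% would correspond to an end price of $0.49 instead. The capacity comparison assumes 1 GB = 1024 MB.

```python
def percent_reduction(old: float, new: float) -> float:
    """Percentage drop from old to new."""
    return (old - new) / old * 100


price_1990 = 59.0        # dollars per MB, from the slide
price_now = 0.49 / 100   # 0.49 cents expressed in dollars

print(round(percent_reduction(price_1990, price_now), 2))

# Capacity growth of a memory module: 8 MB to 8 GB (1 GB = 1024 MB assumed).
growth_factor = 8 * 1024 / 8
print(growth_factor)
```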
1.2 Data Analytics (DA) - Definitions
• A process of transforming data into actions through analysis and insight in the context of organizational decision making and problem-solving.
• The science of analyzing raw data in order to draw conclusions about that information.
• A process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.
1.2 Data Analytics - What is Data Analytics?
Analytics is the use of:
• Data,
• Information technology,
• Statistical analysis,
• Quantitative methods, and
• Mathematical or computer-based models
to help users/managers gain improved insight about
their business operations and make better, fact-based
decisions.
1.2 Data Analytics (DA) – Applications
Some of the applications are:
• Management of customer relationships
• Financial and marketing activities
• Supply chain management
• Human resource planning
• Pricing decisions
• Support team game strategies
1.2 Data Analytics (DA) - Importance
DA is strongly related to:
▪ Profitability of businesses
▪ Revenue of businesses
▪ Shareholder return
DA enhances understanding of data
DA is vital for businesses to remain competitive
DA enables creation of informative reports
Overview
➢1.1 Background
➢1.2 Data Analytics
➢1.3 Terminology
➢1.4 Big Data
➢1.5 Types of Data
➢1.6 Types of Analytics
➢1.7 Challenges
1.3 Terminology - Data Analytics
Data - collected facts and figures
Database - collection of computer files containing data
Information - comes from analyzing data
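The three terms can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, columns and values are made up for illustration, and an in-memory database stands in for the computer files a real database would use.

```python
import sqlite3

# Database: an in-memory SQLite database holding one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Data: collected facts and figures (made-up values).
rows = [("North", 120.0), ("South", 80.0), ("North", 100.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Information: what we learn by analyzing the data.
total_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(total_by_region)
conn.close()
```

The individual rows are data; the statement that the North region sold more than the South is information.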
1.4 Big Data - Definitions
• Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations.
• Big data refers to the large, diverse sets of information that grow at ever-increasing rates. It encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the data points being covered.
• “There is no standard threshold on minimum size of Big Data, although big data in 2013 was considered one petabyte (1,000 terabytes) or larger” (Dasgupta, 2013).
• “Volume of 100 terabytes to petabytes, have structured and unstructured formats, and have a constant flow of data” (Davenport, 2014).
1.4 Big Data – Size
So, we know that “big data” is BIG…
But what does that mean to us?
https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data – Sources
The main sources of big data can be grouped under the headings of social (human), machine (sensor) and transactional.
Social (human) – this source is becoming more and more relevant to organizations. It includes social media posts, posted videos, etc.
Machine (sensor) – this data comes from what the equipment in use can measure.
Transactional – this data comes from the transactions undertaken by the organization. It is perhaps the most traditional of the sources.
1.4 Big Data - Sources - Sensor Data
https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data – Sources - Smart “Things”
1.4 Big Data – Characteristics
Volume – One estimate indicates that 2.5 quintillion (2.5 × 10^18) bytes are generated daily worldwide.
Variety – Data appears in various forms (text, number, 2D, 3D, etc.).
Velocity – Data is generated at a very high speed.
Veracity – Are there biases, noise and abnormalities in the data? Is the data meaningful to the problem being analyzed?
Value – How useful the data is for decision making once it has been processed.
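To put the Volume figure in perspective, the sketch below converts a byte count into larger units. It assumes decimal units (1 KB = 1,000 bytes), consistent with the "one petabyte = 1,000 terabytes" definition quoted earlier; the helper name `humanize` is an illustration, not a standard function.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit (1 KB = 1,000 bytes)."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1,000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


daily_bytes = 2.5e18            # 2.5 quintillion bytes per day
print(humanize(daily_bytes))    # daily volume in exabytes
print(daily_bytes / 1e15)       # the same volume counted in petabytes
```

On this estimate, the world generates thousands of times the 2013 "big data" threshold of one petabyte every day.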
1.4 Big Data - Application
to produce:
1.5 Types of Data
There are many data types and scales of measurement.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of mathematical scaling that can be carried out on the data.
1.5 Types of Data – (i) Categorical/ Nominal
• Nominal or categorical data consists of categories that cannot be rank ordered – each category is just different.
• Categories bear no quantitative relationship to one another.
• Examples:
• Customer’s location (America, Europe, Asia)
• Employee classification (manager, supervisor, technician)
• Therefore, nominal data reflect qualitative differences rather than quantitative ones.
1.5 Types of Data – (i) Categorical/ Nominal - Examples
• True or False
• Color code (Blue/Red/Yellow)
• Sex (Male / Female)
• Blood Group types
• Coin toss result (Tail/Head)
• Country (Britain/Germany)
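Because nominal categories have no order or distance, the meaningful summaries are counts and the most frequent category (the mode). A minimal sketch with made-up blood group observations:

```python
from collections import Counter

# Made-up nominal observations: blood groups of eight people.
blood_groups = ["A", "O", "B", "O", "AB", "A", "O", "B"]

counts = Counter(blood_groups)               # frequency of each category
mode, mode_count = counts.most_common(1)[0]  # most frequent category

print(counts)
print(mode, mode_count)
```

Averaging or sorting these labels would be meaningless, which is why only frequencies are computed here.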
1.5 Types of Data – (ii) Ordinal Data
• Ordinal data consists of categories that can be rank ordered.
• As with categorical data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
• There are no fixed units of measurement.
Examples are:
▪ Size of T-shirt
▪ College football rankings
▪ Survey responses
▪ Income categories
▪ Course Grade point
▪ Age groups
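With ordinal data, sorting and picking a middle (median) category are meaningful, but averaging is not, because the gaps between categories are unknown. A small sketch using T-shirt sizes; the size order and the observations are assumptions for illustration.

```python
# T-shirt sizes listed from smallest to largest define the rank order.
SIZE_ORDER = ["S", "M", "L", "XL"]
rank = {size: i for i, size in enumerate(SIZE_ORDER)}

# Made-up observations.
sizes = ["L", "S", "M", "XL", "M", "S", "L"]

ordered = sorted(sizes, key=rank.__getitem__)
median_size = ordered[len(ordered) // 2]  # middle value of an odd-length list

print(ordered)
print(median_size)
```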
1.5 Types of Data – (iii) Interval and (iv) Ratio data
• Both interval and ratio data are examples of scale data.
• Scale data:
• Data is in numeric format ($50, $100, $150)
• Data can be measured on a continuous scale
• The distance between values can be observed and therefore measured
• The data can be placed in rank order.
1.5 Types of Data – (iii) Interval data
• Interval data is ordered data with constant differences between observations but without a “true zero.”
• Example:
Temperature – moves along a continuous measure of degrees and has no true zero (0 degrees does not mean “no temperature”).
Examples
• Temperature (Fahrenheit)
• Temperature (Celsius)
• pH
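The lack of a true zero has a practical consequence: differences between interval values are meaningful, but ratios are not. The sketch below shows this with two made-up temperatures expressed in Celsius and Fahrenheit.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


morning_c, noon_c = 10.0, 20.0
morning_f, noon_f = c_to_f(morning_c), c_to_f(noon_c)

# Differences are meaningful: a 10 degree Celsius rise is an 18 degree Fahrenheit rise.
print(noon_c - morning_c, noon_f - morning_f)

# Ratios are not: "twice as hot" in Celsius is not twice in Fahrenheit.
print(noon_c / morning_c, noon_f / morning_f)
```

The ratio changes with the choice of scale, so a statement such as "20 degrees is twice as warm as 10 degrees" carries no real meaning.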
1.5 Types of Data - (iv) Ratio data
Ratio data is measured on a continuous scale and has a natural zero point.
Ratios are meaningful.
Examples:
▪ Monthly sales
▪ Delivery times
▪ Weight
▪ Height
▪ Age
▪ Pulse
▪ Time
▪ Length
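Because ratio data has a natural zero, statements such as "one and a half times as much" or "50% more" are valid. A short sketch with made-up monthly sales figures:

```python
# Made-up monthly sales figures in dollars.
sales_january = 40_000.0
sales_february = 60_000.0

ratio = sales_february / sales_january
percent_change = (sales_february - sales_january) / sales_january * 100

print(ratio)           # how many times January's sales
print(percent_change)  # growth relative to January, in percent

# Zero means "none": a month with 0 sales has no sales at all.
```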
1.5 Types of Data - Summary
1.6 Types of Analytics – Traditional Techniques
What is enabling them?
• Classification
• Clustering
• Regression
• Simulation
• Anomaly Detection
• Numerical Forecasting
• Optimization
• Geographic Mapping
• …
Limitations:
• They tend to work best with “Small Data”
• Challenges in handling the 3 V’s (volume, velocity, and variety)
from https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
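As one concrete instance of the traditional techniques listed above, anomaly detection can be sketched with a simple z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the data are assumptions for illustration; this is one of many possible approaches, and it works comfortably only on "small data" that fits in memory.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up daily order counts with one unusual day.
orders = [10, 12, 11, 13, 12, 11, 50]
print(find_anomalies(orders))
```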
1.6 Types of Analytics - “Non-traditional” Techniques
• Ensemble methods
  • Combine multiple models (e.g. linear regression, decision tree, neural network, spatial autocorrelation) that work together to yield one answer.
• Commodity models
  • Apply complex models to address only the high-value data.
• Modern Data Visualization
  • Multiple graphs and charts linked to the same underlying Big Data, and displayed in Dashboards, including maps
  • 3-D Displays. 3-D Mapping.
• Text Analysis (Content Analysis)
  • Appropriate for unstructured text. Opens up social media, call center conversations, etc. for powerful analytics. Parse the text and use the components to extract meaning, valence, and feelings.
• Spatial Analysis
  • Spatial sampling, auto-correlation, continuous contours (ocean, air), etc.
• Analytic Point Solutions
  • Software to solve very specific Big Data and Analytics problems.
• Virtual Reality
  • Google VR
  • Can include fictional or actual geographic mapping
• Machine Learning
  • AI-based programs that can learn without having been specifically pre-programmed for the application.
  • “Intelligent” Robotics is one type
  • Neural networks verge on ML, but they are often restricted to learning in specialized ways
Adapted from Bill Franks. “Taming the big data tidal wave”. Wiley, 2012
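The text analysis idea of parsing text and extracting valence can be sketched as follows. The word lists are a hypothetical mini-lexicon invented for this example, and counting positive minus negative words is a deliberately crude stand-in for real sentiment analysis.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; real systems use much larger word lists.
POSITIVE = {"great", "good", "love", "helpful"}
NEGATIVE = {"bad", "slow", "hate", "rude"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def valence(text: str) -> int:
    """Count positive minus negative words; above 0 suggests a positive tone."""
    counts = Counter(tokenize(text))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return pos - neg


# Made-up call-center comments.
print(valence("Great service, very helpful agent!"))
print(valence("Slow response and a rude reply. Bad."))
```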
1.6 Types of Analytics - Models
A model is a representation of a real system, idea or object.
It captures the most important features.
It can be a written description, a visual display, a mathematical formula, or a spreadsheet representation.
Models are used to understand, analyze, or facilitate decision making.
Types of model input:
- Data
- Uncontrollable variables
- Decision variables (controllable)
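The three kinds of model input can be made concrete with a toy profit model written as a function. The linear demand formula, the parameter names and all numbers are assumptions for illustration only.

```python
def profit(price: float, data: dict, market_size: float) -> float:
    """Toy profit model.

    price       -- decision variable (controllable)
    data        -- known constants such as costs
    market_size -- uncontrollable variable (set by the outside world)
    """
    demand = max(0.0, market_size - data["price_sensitivity"] * price)
    revenue = price * demand
    cost = data["fixed_cost"] + data["unit_cost"] * demand
    return revenue - cost


# All numbers are made up for illustration.
data = {"fixed_cost": 1000.0, "unit_cost": 5.0, "price_sensitivity": 10.0}
print(profit(price=20.0, data=data, market_size=500.0))
```

A manager can change `price` freely, can look up `data`, but can only estimate `market_size`.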
1.6 Data Analytics - Types of Analytics
Descriptive analytics
- uses data to understand the past and present
Diagnostic analytics
- a form of advanced analytics that examines data or content to answer the question, “Why did it happen?”
Predictive analytics
- analyzes past performance to predict future outcomes
Prescriptive analytics
- uses optimization techniques to recommend the best course of action
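Descriptive analytics, the first type above, often starts with simple summaries of what has already happened. A minimal sketch using Python's statistics module on made-up monthly sales:

```python
from statistics import mean, median

# Made-up monthly sales (units) for the past six months.
monthly_sales = [120, 135, 128, 150, 142, 165]

summary = {
    "total": sum(monthly_sales),
    "mean": mean(monthly_sales),
    "median": median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}
print(summary)
```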
How do we use them for Analysis?
1.6 Types of Analytics – (ii) Diagnostic Analytics
Why did it happen?
The purpose of diagnostic analytics is to determine the root cause of an occurrence or trend. Often, a trend is first identified in a descriptive analysis step. The company can then apply diagnostic analytics to understand why the trend occurred.
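One common first step in diagnosis is to drill down: break an overall change into segments and see which one drives it. The sketch below does this for made-up regional sales. It only locates where the drop happened; finding the actual root cause would need further investigation.

```python
# Made-up sales by region for two consecutive quarters.
last_quarter = {"North": 500, "South": 450, "East": 400}
this_quarter = {"North": 510, "South": 300, "East": 395}

# Change per region, sorted so the largest drop comes first.
change = {r: this_quarter[r] - last_quarter[r] for r in last_quarter}
ranked = sorted(change.items(), key=lambda item: item[1])

total_change = sum(change.values())
main_driver, main_drop = ranked[0]

print(total_change)
print(main_driver, main_drop)
```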
1.6 Types of Analytics - (iii) Predictive Analytics Models
What will occur?
• Predictive analytics is a branch of advanced analytics that makes predictions about future outcomes using historical data combined with statistical modeling, data mining techniques and machine learning.
• Predictive analytics models often incorporate uncertainty to help managers analyze risk.
• They aim to predict what will happen in the future.
• Algorithms used for predictive analytics include regression analysis, machine learning, and neural networks.
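Regression analysis, the first algorithm mentioned, can be sketched in a few lines: fit a straight line to historical data by ordinary least squares and extend it one step into the future. The data is made up and perfectly linear to keep the arithmetic easy to follow; real data would leave residual error, and this sketch does not model uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: sales in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, sales)
forecast = intercept + slope * 6  # prediction for month 6
print(slope, intercept, forecast)
```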
1.6 Types of Analytics - (iv) Prescriptive Analytics Models
What should occur?
• Prescriptive analytics is the process of using data to determine an optimal course of action. By considering all relevant factors, this analysis yields recommendations for next steps.
• Prescriptive analytics is the use of advanced processes and tools to analyze data and content to recommend the optimal course of action or strategy moving forward.
• Using mathematical programming for revenue management is common among organizations that have “perishable” goods (e.g., rental cars, hotel rooms, airline seats).
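A hotel pricing decision gives a feel for the prescriptive idea. The sketch below is not mathematical programming proper: it simply tries every whole-dollar price and keeps the one with the highest revenue. The capacity, the linear demand curve and the price range are all made-up assumptions.

```python
CAPACITY = 80  # rooms available per night (made-up)


def demand(price: int) -> float:
    """Hypothetical linear demand: fewer bookings as the price rises."""
    return max(0.0, 100 - 0.5 * price)


def revenue(price: int) -> float:
    """Revenue is price times rooms sold, capped by capacity."""
    return price * min(CAPACITY, demand(price))


# Evaluate every whole-dollar price from 0 to 200 and keep the best.
best_price = max(range(0, 201), key=revenue)
print(best_price, revenue(best_price))
```

The output is a recommendation (the price to charge), not just a forecast, which is what separates prescriptive from predictive analytics.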
1.7 Challenges
Some of the challenges of data analytics are:
• The bottleneck is in technology
• New architectures, algorithms and techniques are needed
• Technical skill
• Experts are needed in using new technology and working with new skills
• How will Big Data affect organizational processes?
• One possible trend is towards centralization of data in the Cloud, after decades of decentralization
• Privacy
• Concern about privacy invasion and targeting from Big Data
• How will Big Data and Analytics change decision-making?
• To what extent will human managers and decision-makers override the results of Big Data?
• Database – historical data may not be fully documented and can be very complicated because of manual processes.
• There are more data sources than initially thought.
• The data is not as clean as expected.
• Historical data will be in a different format and will be difficult to merge.
• Not everyone agrees on what the ‘systems of record’ are.
• Resources may not be available.
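The format and cleanliness problems can be illustrated with a small sketch that merges records from two hypothetical systems: a legacy one using day/month/year dates and untidy amounts, and a newer one using ISO dates. The record layouts and date formats are assumptions for illustration.

```python
from datetime import date, datetime

# Date layouts assumed for two hypothetical source systems.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]


def parse_date(text: str) -> date:
    """Try each known layout in turn; raise if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


# Made-up records: a legacy system and a newer one.
legacy = [{"date": "31/01/2020", "amount": " 1,200 "}]
current = [{"date": "2021-02-15", "amount": "950"}]

# Normalize both sources to one format before merging.
merged = [
    {"date": parse_date(r["date"]),
     "amount": float(r["amount"].replace(",", "").strip())}
    for r in legacy + current
]
print(merged)
```

Every additional source tends to add another date layout or number format, which is one reason merging historical data takes longer than expected.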
1.7 Challenges – Storage Volume
1.7 Challenges - Converting big data
Converting big data into valuable insights is a tricky process
1.7 Challenges - Technologies
Confusing variety of big data technologies
1.7 Challenges – Cost
Big data solutions are expensive
1.7 Challenges - Security
High-risk security loopholes in big data