
Chapter 1 Introduction Data Analytics

This document provides an overview of data analytics and big data analytics. It discusses the evolution of data analytics, key terminology, different types of data and analytics. The background section describes how factors like lower storage costs, increased processing power and new tools have enabled the rise of big data analytics. It provides an example of analyzing NYC taxi trip data and how tools like TaxiVis allow users to visualize and explore patterns. The document also discusses some common applications of data analytics and its importance for businesses.

Uploaded by adisyahmi321
Copyright © All Rights Reserved

FEM 2063 - Data Analytics

CHAPTER 1
At the end of this chapter, students should be able to understand:
An Overview of Data Analytics and Big Data Analytics
1
Overview
➢1.1 Background
➢1.2 Data Analytics
➢1.3 Terminology
➢1.4 Big Data
➢1.5 Types of Data
➢1.6 Types of Analytics
➢1.7 Challenges

2
1.1 Background – Evolution of Data Analytics

[Figure: key figures in the evolution of data analytics: R.A. Fisher, W.E. Deming, Peter Luhn, Howard Dresner]

3
1.1 Background – Evolution of Data Analytics

4
1.1 Background – Data Makes Everything Clearer

5
1.1 Background - Big Data vs Traditional Datasets

Data characteristic   Traditional Datasets                     Big Data
--------------------  ---------------------------------------  --------------------------
Type of data          Formatted in columns and rows            Unstructured formats
Volume of data        Tens of terabytes or less                100 terabytes to petabytes
Flow of data          Static pool of data                      Continual flow
Analytical methods    Hypothesis-based                         Machine learning
Primary purpose       Internal decision support and services   Data-based products

6
1.1 Background – Example of Big Data
NYC Taxi Data - includes driver details, pickup and drop-off locations, time of day, trip locations (longitude-latitude), cab fare and tip amounts. There are over 500,000 taxi trips daily in central NYC.

Questions that can be asked of the data:
• Was a tip paid for the trip? (Binary Classification)
• What was the tip amount range? (Multiclass Classification)
• What was the tip amount? (Regression)
• How agglomerated are the origin points of the taxi rides? (Spatial Autocorrelation)

An analysis of the data, for instance, shows that:
• Almost 50% of the trips did not result in a tip,
• The median tip on Friday and Saturday nights was typically the highest, and
• The largest tips came from taxis going from Manhattan to Queens.
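
The first three questions differ only in how the tip is turned into a prediction target. The sketch below is a minimal illustration of that idea; the trip values and the tip-range bins are made up for this example and are not taken from the NYC taxi data.

```python
# Hypothetical sample of trips: (fare in dollars, tip in dollars).
# These numbers are invented for illustration only.
trips = [(12.5, 0.0), (8.0, 1.5), (30.0, 6.0), (15.0, 0.0), (22.0, 3.0), (9.5, 0.0)]


def tip_paid(tip):
    """Binary classification target: 1 if any tip was paid, else 0."""
    return 1 if tip > 0 else 0


def tip_range(tip):
    """Multiclass target using assumed (arbitrary) tip-amount bins."""
    if tip == 0:
        return "none"
    if tip <= 2:
        return "low"
    if tip <= 5:
        return "medium"
    return "high"


binary_labels = [tip_paid(tip) for _, tip in trips]
range_labels = [tip_range(tip) for _, tip in trips]
regression_targets = [tip for _, tip in trips]  # regression predicts the amount itself

# Descriptive check: share of trips in this sample without a tip.
share_without_tip = binary_labels.count(0) / len(trips)
print(binary_labels, range_labels, share_without_tip)
```

The same raw column (tip amount) therefore supports three different modelling tasks, depending on the question being asked.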
1.1 Background – Example of Big Data
TaxiVis is a user-friendly interface for viewing and analyzing the patterns and movements in NYC taxi data.

[Figure: Taxi trips from Lower Manhattan to JFK and LGA airports in May 2011. Left: trips on Sundays. Right: trips on Mondays. Blue dots: pickups. Red dots: drop-offs.]

[Figure: Scatter plots showing the relationship between hour of the day and trip duration. Blue: trips to JFK. Red: trips to LGA.]

Source: N. Ferreira, J. Poco, H.T. Vo, J. Freire, C.T. Silva, Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips, IEEE Trans. Visual Comput. Graphics, 19 (12) (2013), pp. 2149-2158
8
1.1 Background – Why Big Data Analytics?
What is enabling it?
• Lower Cost
• Greater Storage (HD and RAM)
• Faster Input / Output Operations
• Faster Processing
• Increased Bandwidth

Since 1990, the average price per MB of memory has dropped from $59 to 0.49 cents, a price reduction of more than 99.99%.
At the same time, the capacity of a memory module has increased from 8 MB to 8 GB.

(source: Microsoft, courtesy of Brian Hilton)
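
The figures quoted above can be checked with a few lines of arithmetic. This is a small sketch that takes the slide's two prices ($59 and 0.49 cents per MB) at face value; the helper name `percent_reduction` is introduced here for illustration, and 1 GB is assumed to be 1024 MB.

```python
def percent_reduction(old_price, new_price):
    """Percentage drop from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


price_1990 = 59.0     # dollars per MB, as quoted
price_now = 0.0049    # 0.49 cents per MB, expressed in dollars

reduction = percent_reduction(price_1990, price_now)

# Capacity growth from an 8 MB module to an 8 GB module (1 GB = 1024 MB assumed).
capacity_growth = (8 * 1024) / 8

print(f"price reduction: {reduction:.2f}%")
print(f"capacity growth: {capacity_growth:.0f}x")
```

With these inputs the reduction comes out at roughly 99.99%. A drop to $0.49 (49 cents) rather than 0.49 cents would instead give about 99.2%.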


1.1 Background – Why Big Data Analytics?
What is enabling it?
• Cloud / Distributed Computing
• New Data Management Tools (Hadoop, etc.)
• New Technologies (Spark, etc.)
• Ease-of-Use (Browser-based, etc.)
1.2 Data Analytics (DA) - Definitions
• A process of transforming data into actions through analysis and insight in the context of organizational decision making and problem-solving.
• The science of analyzing raw data in order to draw conclusions about that information.
• A process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.

12
1.2 Data Analytics - What is Data Analytics?
Analytics is the use of:
• Data,
• Information technology,
• Statistical analysis,
• Quantitative methods, and
• Mathematical or computer-based models
to help users/managers gain improved insight about
their business operations and make better, fact-based
decisions.

1-13
1.2 Data Analytics (DA) – Applications
Some of the applications are:
• Management of customer relationships
• Financial and marketing activities
• Supply chain management
• Human resource planning
• Pricing decisions
• Support for team game strategies
1-14
1.2 Data Analytics (DA) - Importance
• There is a strong relationship of DA with:
▪ Profitability of businesses
▪ Revenue of businesses
▪ Shareholder return
• DA enhances understanding of data
• DA is vital for businesses to remain competitive
• DA enables creation of informative reports

1-15
1.3 Terminology - Data Analytics
• Data - collected facts and figures
• Database - collection of computer files containing data
• Information - comes from analyzing data
• Metrics - are used to quantify performance.
• Measures - are numerical values of metrics.
• Discrete metrics - involve counting; e.g.
▪ On time or not on time
▪ Number of on-time deliveries
• Continuous metrics - are measured on a continuum; e.g.
▪ Delivery time
▪ Package weight
1-17
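
The distinction between discrete and continuous metrics can be sketched with the delivery example. The delivery records below are hypothetical and exist only to show how each kind of measure is computed.

```python
from statistics import mean

# Hypothetical deliveries: (actual delivery time in hours, promised time in hours).
deliveries = [(20.0, 24.0), (26.5, 24.0), (23.0, 24.0), (30.0, 24.0)]

# Discrete metrics: obtained by counting.
on_time_flags = [actual <= promised for actual, promised in deliveries]
on_time_count = sum(on_time_flags)  # number of on-time deliveries

# Continuous metric: measured on a continuum.
mean_delivery_time = mean(actual for actual, _ in deliveries)

print(on_time_flags, on_time_count, mean_delivery_time)
```

Here `on_time_count` can only take whole-number values, while `mean_delivery_time` can take any value in a range.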
1.3 Terminology - Data Types

18
1.4 Big Data - Definitions
• Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations.
• Big data refers to the large, diverse sets of information that grow at ever-increasing rates. It encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the data points being covered.
• “There is no standard threshold on minimum size of Big Data, although big data in 2013 was considered one petabyte (1,000 terabytes) or larger” (Dasgupta, 2013).
• “Volume of 100 terabytes to petabytes, have structured and unstructured formats, and have a constant flow of data” (Davenport, 2014)

20
1.4 Big Data - Definitions

21
1.4 Big Data
1.4 Big Data – Size
So, we know that “big data” is BIG…
But what does that mean to us?

Source: https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data - Size

24
1.4 Big Data - Size

25
1.4 Big Data – Sources
The main sources of big data can be grouped under the headings of social (human), machine (sensor) and transactional.
• Social (human) – this source is becoming more and more relevant to organizations. It includes all social media posts, posted videos, etc.
• Machine (sensor) – this data comes from what can be measured by the equipment used.
• Transactional – this comes from the transactions undertaken by the organization. This is perhaps the most traditional of the sources.

Lots of data is being collected and warehoused:
• Web data, e-commerce
• Financial transactions, bank/credit transactions
• Online trading and purchasing
• Social networks and mobile devices
• Sensors
26
1.4 Big Data - Sources - User-Generated Content

Sources of big data include:
• GPS
• Satellite remote sensing
• Aerial surveying
• Radar
• Sensor networks
• Digital cameras
• RFID location readings
• Internet of things

27
1.4 Big Data - Sources - Sensor Data

Source: https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data – Sources - Smart “Things”
1.4 Big Data – Characteristics

30
1.4 Big Data – Characteristics
• Volume: One estimate indicates that 2.5 quintillion (2.5 × 10^18) bytes are generated daily worldwide.
• Variety: Data appears in various forms (text, number, 2D, 3D, etc.).
• Velocity: Data is generated at a very high speed.
• Veracity: Are there biases, noise and abnormality in data? Is the data meaningful to the problem being analyzed?
• Value: The usefulness of the data after processing, for decision making.
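
To get a feel for the Volume figure, the byte count can be converted into larger units. The sketch below assumes decimal (SI) units, where each step is a factor of 1,000; the function name `humanize_bytes` is made up for this example.

```python
# Decimal (SI) units: each step is a factor of 1,000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]


def humanize_bytes(n):
    """Express a byte count in the largest unit that keeps the value below 1,000."""
    value = float(n)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


daily_bytes = 2.5e18  # 2.5 quintillion bytes per day, as quoted in the estimate
print(humanize_bytes(daily_bytes))
```

Under this convention, 2.5 quintillion bytes is 2.5 exabytes, or 2,500 petabytes, per day.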
1.4 Big Data - Application

Crowdsourcing + Physical modeling + Sensing + Data assimilation

to produce:

32
1.5 Types of Data – Scales of Measurement
There are many data types and scales of measurement.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of mathematical scaling that can be carried out on the data.

34
1.5 Types of Data

35
1.5 Types of Data – (i) Categorical/ Nominal
• Nominal or categorical data consists of categories that cannot be rank ordered; each category is just different.
• Categories bear no quantitative relationship to one another.
• Examples:
▪ Customer’s location (America, Europe, Asia)
▪ Employee classification (manager, supervisor, technician)
• Therefore, nominal data reflect qualitative differences rather than quantitative ones.

36
1.5 Types of Data – (i) Categorical/ Nominal - Examples
• True or False
• Color coded (Blue/Red /Yellow)
• Sex (Male / Female)
• Blood Group types
• Coin toss result (Tail/Head)
• Country (Britain/Germany)

37
1.5 Types of Data – (ii) Ordinal Data
• Ordinal data consists of categories that can be rank ordered.
• As with categorical data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
• No fixed units of measurement
• Examples are:
▪ Size of T-shirt
▪ College football rankings
▪ Survey responses
▪ Income categories
▪ Course Grade point
▪ Age groups
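
Because ordinal categories have an order but no measurable distance, they support sorting and ranking but not arithmetic. The sketch below uses T-shirt sizes; the size list and the sample orders are assumptions made for illustration.

```python
# Assumed rank order of T-shirt sizes (smallest to largest).
SIZE_ORDER = ["S", "M", "L", "XL"]
RANK = {size: position for position, size in enumerate(SIZE_ORDER)}

# Hypothetical orders received.
orders = ["L", "S", "XL", "M", "S", "L"]

# Ranking operations are meaningful for ordinal data.
sorted_orders = sorted(orders, key=RANK.__getitem__)
largest = max(orders, key=RANK.__getitem__)

# A median can be read off by rank; the lower median is used here so that
# the result is always one of the actual categories.
median_size = sorted_orders[(len(sorted_orders) - 1) // 2]

print(sorted_orders, largest, median_size)
```

Note that the sort relies on the explicit rank mapping. Sorting the labels alphabetically would place "L" before "M" and "S", which does not respect the size order. An average such as "(S + XL) / 2" has no meaning, since the distance between sizes is not defined.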
38
1.5 Types of Data – (iii) Interval and (iv) Ratio Data
• Both interval and ratio data are examples of scale data.
• Scale data:
▪ Data is in numeric format ($50, $100, $150)
▪ Data can be measured on a continuous scale
▪ The distance between values can be observed and, as a result, measured
▪ The data can be placed in rank order.

39
1.5 Types of Data – (iii) Interval data
• Interval data are ordinal data with constant differences between observations but without a “true zero.”
• Example: Temperature moves along a continuous measure of degrees and is without a true zero (0 degrees does not mean “no temperature”).
• Examples:
▪ Temperature (Fahrenheit)
▪ Temperature (Celsius)
▪ pH

40
1.5 Types of Data - (iv) Ratio data
• Ratio data are measured on a continuous scale and have a natural zero point.
• Ratios are meaningful
• Examples:
▪ Monthly sales
▪ Delivery times
▪ Weight
▪ Height
▪ Age
▪ Pulse
▪ Time
▪ Length
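
One way to see why ratios are meaningful for ratio data but not for interval data is to change the unit of measurement and check whether the ratio survives. The sketch below compares temperature (interval) with weight (ratio); the sample values are arbitrary, and the kilogram-to-pound factor is the usual approximation.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor


# Interval data: the ratio depends on the unit, so "twice as hot" is not meaningful.
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio data: the ratio is the same in any unit, so "twice as heavy" is meaningful.
weight_ratio_kg = 20 / 10
weight_ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(temp_ratio_c, temp_ratio_f)        # the two temperature ratios differ
print(weight_ratio_kg, weight_ratio_lb)  # the two weight ratios agree
```

The temperature ratio changes because the Celsius and Fahrenheit scales place zero at different, arbitrary points. Weight has a natural zero, so its ratio is preserved.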
41
1.5 Types of Data

42
1.5 Types of Data - Summary

43
1.6 Types of Analytics – Traditional Techniques
Traditional techniques include:
• Classification
• Clustering
• Regression
• Simulation
• Anomaly Detection
• Numerical Forecasting
• Optimization
• Geographic Mapping
• …

Limitations:
• They tend to work best with “Small Data”
• Challenges in handling the 3 V’s (volume, velocity, and variety)

Source: https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.6 Types of Analytics - “Non-traditional” Techniques
• Ensemble methods
▪ Combine multiple models (e.g. linear regression, decision tree, neural network, spatial autocorrelation) that work together to yield one answer.
• Commodity models
▪ Apply complex models to address only the high-value data.
• Modern Data Visualization
▪ Multiple graphs and charts linked to the same underlying Big Data, and displayed in dashboards, including maps
▪ 3-D displays. 3-D mapping.
• Text Analysis (Content Analysis)
▪ Appropriate for unstructured text. Opens up social media, call center conversations, etc. for powerful analytics. Parse the text and use the components to extract meaning, valence, and feelings.
• Spatial Analysis
▪ Spatial sampling, auto-correlation, continuous contours (ocean, air), etc.
• Analytic Point Solutions
▪ Software to solve very specific Big Data and Analytics problems.
• Virtual Reality
▪ Google VR
▪ Can include fictional or actual geographic mapping
• Machine Learning
▪ AI-based programs that can learn without having been specifically pre-programmed for the application.
▪ “Intelligent” robotics is one type
▪ Neural networks verge on ML, but they are often restricted to learning in specialized ways
Adapted from Bill Franks. “Taming the big data tidal wave”. Wiley, 2012
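
The ensemble idea, several models working together to yield one answer, can be sketched with a simple majority vote. The three "models" below are hypothetical hand-written rules, not trained models, and the trip fields are invented for illustration; real ensembles typically combine fitted models such as regressions or decision trees.

```python
from collections import Counter


# Three deliberately simple, hypothetical rule-based "models".
def model_distance(trip):
    return "tip" if trip["distance_km"] > 5 else "no tip"


def model_evening(trip):
    return "tip" if trip["hour"] >= 18 else "no tip"


def model_fare(trip):
    return "tip" if trip["fare"] > 20 else "no tip"


MODELS = [model_distance, model_evening, model_fare]


def ensemble_predict(trip):
    """Return the label predicted by the majority of the models."""
    votes = Counter(model(trip) for model in MODELS)
    return votes.most_common(1)[0][0]


long_evening_trip = {"distance_km": 8, "hour": 20, "fare": 15}
short_morning_trip = {"distance_km": 2, "hour": 9, "fare": 25}
print(ensemble_predict(long_evening_trip), ensemble_predict(short_morning_trip))
```

An odd number of models with two possible labels avoids ties. Other combination rules, such as averaging numeric predictions, follow the same pattern.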
1.6 Types of Analytics - Models
A model:
• Is a representation of a real system, idea or object
• Captures the most important features
• Can be a written description, a visual display, a mathematical formula, or a spreadsheet representation
• Is used to understand, analyze, or facilitate decision making.
• Types of model input:
- Data
- Uncontrollable variables
- Decision variables (controllable)
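
The three types of model input can be illustrated with a small mathematical model. The profit formula and the linear demand curve below are assumptions chosen for this sketch, and every parameter name is hypothetical.

```python
def profit(price, unit_cost, base_demand, price_sensitivity):
    """
    A simple pricing model (illustrative assumption, not a real system).

    price             -- decision variable (controllable by the manager)
    unit_cost         -- data (known from records)
    base_demand,
    price_sensitivity -- uncontrollable variables (set by the market);
                         demand is assumed to fall linearly with price
    """
    demand = max(0.0, base_demand - price_sensitivity * price)
    return (price - unit_cost) * demand


# Evaluate the model for two candidate prices.
print(profit(price=10, unit_cost=4, base_demand=100, price_sensitivity=5))
print(profit(price=25, unit_cost=4, base_demand=100, price_sensitivity=5))
```

The model captures only the most important features (price, cost, demand) and can be used to compare decisions before acting on them.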

1-47
1.6 Data Analytics - Types of Analytics
• Descriptive analytics
- uses data to understand the past and present
• Diagnostic analytics
- a form of advanced analytics that examines data or content to answer the question, “Why did it happen?”
• Predictive analytics
- analyzes past performance to predict future outcomes
• Prescriptive analytics
- uses optimization techniques to determine the best course of action
1-48
1.6 Data Analytics - Types of Analytics
How do we use them for Analysis?

(source: courtesy of Brian Hilton)


1.6 Types of Analytics – (i) Descriptive Analytics Models
What has occurred?
• Descriptive analytics focuses on summarizing and highlighting patterns in current and historical data, which helps companies understand what has happened to date.
• Descriptive analytics, such as reporting/online analytical processing (OLAP), dashboards, and data visualization, is important in helping users interpret the output.
• Descriptive models simply tell “what is”, to identify trends and relationships.
• They do not tell managers what to do.

1-50
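
A descriptive summary can be sketched in a few lines. The monthly sales figures below are hypothetical; the point is only that the output describes what has happened and makes no recommendation.

```python
from statistics import mean, median

# Hypothetical monthly sales (units sold).
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 128, "Apr": 150, "May": 162, "Jun": 158}
values = list(monthly_sales.values())

summary = {
    "total": sum(values),
    "mean": mean(values),
    "median": median(values),
    "best_month": max(monthly_sales, key=monthly_sales.get),
    # Month-over-month change highlights the trend in the historical data.
    "month_over_month_change": [later - earlier for earlier, later in zip(values, values[1:])],
}

for name, value in summary.items():
    print(name, value)
```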
1.6 Types of Analytics – (ii) Diagnostic Analytics
Why did it happen?
The purpose of diagnostic analytics is to determine the root cause of an occurrence or trend. Often, a trend is first identified in the descriptive analysis step. The company can then apply diagnostic analytics to understand why the trend occurred.
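
One common first step in diagnosing a trend is a drill-down: breaking an overall change into its parts to see where it comes from. The regional figures below are hypothetical, and a drill-down of this kind only localizes the change; it does not by itself prove the root cause.

```python
# Hypothetical sales by region for two consecutive months.
last_month = {"North": 500, "South": 480, "East": 510, "West": 495}
this_month = {"North": 505, "South": 300, "East": 515, "West": 490}

# Drill down: how much did each region contribute to the overall change?
change_by_region = {region: this_month[region] - last_month[region] for region in last_month}
total_change = sum(change_by_region.values())

# The region with the largest decline is the first place to investigate.
main_driver = min(change_by_region, key=change_by_region.get)

print(total_change, change_by_region, main_driver)
```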

1-51
1.6 Types of Analytics - (iii) Predictive Analytics Models
What will occur?
• Predictive analytics is a branch of advanced analytics that makes predictions about future outcomes using historical data combined with statistical modeling, data mining techniques and machine learning.
• Predictive analytics models often incorporate uncertainty to help managers analyze risk.
• They aim to predict what will happen in the future.
• Algorithms for predictive analytics include regression analysis, machine learning, and neural networks.
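
As a minimal sketch of regression-based prediction, the code below fits a straight line to past sales by ordinary least squares and extrapolates one month ahead. The sales figures are hypothetical, and a single trend line is a deliberate simplification that ignores the uncertainty a fuller model would report.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [102, 108, 121, 129, 140]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(f"slope={slope:.1f}, intercept={intercept:.1f}, forecast={forecast_month_6:.1f}")
```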

1-52
1.6 Types of Analytics - (iv) Prescriptive Analytics Models
What should occur?
• Prescriptive analytics is the process of using data to determine an optimal course of action. By considering all relevant factors, this analysis yields recommendations for next steps.
• Prescriptive analytics is the use of advanced processes and tools to analyze data and content to recommend the optimal course of action or strategy moving forward.
• Using mathematical programming for revenue management is common for organizations that have “perishable” goods (e.g., rental cars, hotel rooms, airline seats).

How does Netflix use prescriptive analytics?

Netflix uses AI-powered algorithms to make predictions based on the user's watch history, search history, demographics, ratings, and preferences. These predictions show with 80% accuracy what the user might be interested in seeing next.
53
1.6 Types of Analytics - (iv) Prescriptive Analytics Models
Prescriptive decision models help decision makers identify the best solution.
• Optimization - finding values of decision variables that minimize (or maximize) something such as cost (or profit).
From a marketing and sales perspective, prescriptive analytics can be used to:
➢ optimize the assortment of products in a retail store.
➢ optimally price items and services.
➢ find the best mix of marketing methods (online, print, radio, etc.)
➢ negotiate a better contract with customers and vendors.
In transportation and logistics, prescriptive analytics can be used to:
➢ improve driver retention to reduce training costs.
➢ eliminate unnecessary driving, flight, and sea transportation miles.
➢ increase driver productivity by improving routes and eliminating wait times to load/unload.
➢ increase speeds and reduce costs by optimizing distribution networks.
1-54
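
Optimization, as defined above, can be sketched as a search over a decision variable. The example below looks for the price that maximizes profit under an assumed linear demand curve; the cost and demand parameters are hypothetical, and a grid search stands in for the mathematical programming methods used in practice.

```python
def profit(price, unit_cost=4.0, base_demand=100.0, price_sensitivity=5.0):
    """Profit under an assumed linear demand curve (illustrative parameters)."""
    demand = max(0.0, base_demand - price_sensitivity * price)
    return (price - unit_cost) * demand


# Candidate values of the decision variable: prices from 4.0 to 20.0 in steps of 0.5.
candidate_prices = [p / 2 for p in range(8, 41)]

# Pick the candidate price with the highest profit.
best_price = max(candidate_prices, key=profit)
best_profit = profit(best_price)
print(best_price, best_profit)
```

Unlike a descriptive or predictive result, the output here is a recommendation: the price to charge, given the model's assumptions.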
1.6 Types of Analytics - Summary

55
1.7 Challenges
Some of the challenges of data analytics are:
• The bottleneck is in technology
▪ New architectures, algorithms and techniques are needed
• Technical skill
▪ Experts are needed who can use new technology and work with new skills
• How will Big Data affect organizational processes?
▪ One possible trend is towards centralization of data in the Cloud, after decades of decentralization
• Privacy
▪ Concern about privacy invasion and targeting from Big Data
• How will Big Data and Analytics change decision-making?
▪ To what extent will human managers and decision-makers override the results of Big Data?
57
1.7 Challenges
• Database – historical data may not be fully documented and may be very complicated due to manual processes.
• There are more data sources than initially thought.
• The data is not as clean as expected.
• Historical data will have a different format and will be difficult to merge.
• Not everyone agrees on what the ‘systems of record’ are.
• Resources may not be available.

58
1.7 Challenges – Storage Volume

59
1.7 Challenges - Converting big data
Converting big data into valuable insights is a tricky process

60
1.7 Challenges - Technologies
Confusing variety of big data technologies

61
1.7 Challenges – Cost
Costs are high

62
1.7 Challenges - Security
High-risk security loopholes in big data

63
64
