Chapter 1 Introduction Data Analytics

This document provides an overview of data analytics and big data analytics. It discusses the background and history of data analytics, including key figures. It defines data analytics and big data, and describes the types of data and analytics. Challenges of working with big data are also mentioned. The document uses examples from New York City taxi data to illustrate big data concepts. It explains how lower costs, greater storage, faster processing, and cloud computing have enabled big data analytic platforms.

Uploaded by

shah reza
Copyright
© All Rights Reserved

FEM 2063 - Data Analytics

CHAPTER 1
At the end of this chapter, students should be able to understand:
An Overview of Data Analytics and Big Data Analytics
Overview
➢1.1 Background
➢1.2 Data Analytics
➢1.3 Terminology
➢1.4 Big Data
➢1.5 Type of Data
➢1.6 Type of Analytics
➢1.7 Challenges
1.1 Background - Data Analytics has been around
Key figures: W.E. Deming, Peter Luhn, R.A. Fisher, Howard Dresner

1.1 Background - Data Makes Everything Clearer
1.1 Background - Big Data vs Traditional Datasets
Data characteristics | Traditional Datasets | Big Data
Type of data | Formatted in columns and rows | Unstructured formats
Volume of data | 10s of terabytes or less | 10 terabytes to petabytes
Flow of data | Static pool of data | Continual flow
Analytical methods | Hypothesis-based | Machine learning
Primary purpose | Internal decision support and services | Data-based products
1.1 Background- Big Data – Example
NYC Taxi Data - includes driver details, pickup and drop-off locations, time of day, trip locations (longitude-latitude), cab fare and tip amounts. There are over 500,000 taxi trips daily in central NYC.

• Was a tip paid for the trip? (Binary Classification)
• What was the tip amount range? (Multiclass Classification)
• What was the tip amount? (Regression)
• How agglomerated are the origin points of the taxi rides? (Spatial Autocorrelation)

An analysis of the data, for instance, shows that:
• Almost 50% of the trips did not result in a tip,
• The median tip on Friday and Saturday nights was typically the highest, and
• The largest tips came from taxis going from Manhattan to Queens.
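The first three questions differ mainly in how the target variable is built from the same trip record. A minimal sketch in Python, assuming hypothetical tip amounts and arbitrary tip-range boundaries (neither is taken from the NYC data set); the spatial autocorrelation question needs coordinates and is not covered here:

```python
from bisect import bisect_right

# Hypothetical tip amounts in dollars; not taken from the NYC taxi data.
tips = [0.0, 1.5, 0.0, 7.0, 12.0, 0.0]

# Assumed boundaries for the tip ranges: [0, 5), [5, 10), [10, inf).
RANGE_EDGES = [5.0, 10.0]


def binary_label(tip):
    """Binary classification target: was a tip paid?"""
    return 1 if tip > 0 else 0


def range_label(tip):
    """Multiclass target: index of the tip range the amount falls in."""
    return bisect_right(RANGE_EDGES, tip)


def regression_target(tip):
    """Regression target: the tip amount itself."""
    return float(tip)


binary = [binary_label(t) for t in tips]
classes = [range_label(t) for t in tips]
amounts = [regression_target(t) for t in tips]

# Share of trips without a tip, the kind of summary quoted above.
share_without_tip = binary.count(0) / len(binary)
print(binary, classes, share_without_tip)
```

The same record feeds all three tasks; only the label changes, which is why one data set can support several kinds of model.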
1.1 Background- Big Data – Example
TaxiVis is a user-friendly interface for viewing and analyzing the patterns and movements of NYC taxi data.

Maps: taxi trips from Lower Manhattan to JFK and LGA airports in May 2011. Left: trips on Sundays. Right: trips on Mondays. Blue dots: pickups. Red dots: drop-offs.

Scatter plots: the relationship between hour of the day and trip duration. Blue: trips to JFK. Red: trips to LGA.

Source: N. Ferreira, J. Poco, H.T. Vo, J. Freire, C.T. Silva, Visual exploration of big spatio-temporal urban data: a study of New York City taxi trips, IEEE Trans. Visual Comput. Graphics, 19 (12) (2013), pp. 2149-2158
1.1 Background – Why Big Data Analytic Platforms
What is enabling them?
• Lower Cost
• Greater Storage (HD and RAM)
• Faster Input / Output Operations
• Faster Processing
• Increased Bandwidth

Since 1990, the average price per MB of memory has dropped from $59
to 0.49 cents – a 99.2% price reduction.
At the same time, the capacity of a memory module has increased from
8MB to 8GB.

(source: Microsoft, courtesy of Brian Hilton)
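The price and capacity figures can be checked with simple arithmetic. The sketch below is only an illustration: it assumes two possible readings of the later price (a literal 0.49 cents, i.e. $0.0049, and 49 cents, i.e. $0.49), since the slide does not say which was meant, and it assumes 1 GB = 1024 MB.

```python
def percent_reduction(old_price, new_price):
    """Percentage drop from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


OLD_PRICE_PER_MB = 59.0  # dollars per MB in 1990, as quoted above

# "0.49 cents" read literally is $0.0049; read as 49 cents it is $0.49.
literal_reading = percent_reduction(OLD_PRICE_PER_MB, 0.0049)
cents_reading = percent_reduction(OLD_PRICE_PER_MB, 0.49)

# Capacity growth from 8 MB to 8 GB, taking 1 GB = 1024 MB (assumption).
capacity_factor = (8 * 1024) / 8

print(f"{literal_reading:.2f}% or {cents_reading:.2f}%, capacity x{capacity_factor:.0f}")
```

Under the 49-cent reading the reduction is about 99.2%, which matches the quoted figure; under the literal reading it is closer to 99.99%.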


1.1 Background – Why Big Data Analytic Platforms
What is enabling them?

• Cloud / Distributed Computing

• New Data Management Tools (Hadoop, etc.)

• New Technologies (Spark, etc.)

• Ease-of-Use (Browser-based, etc.)


1.2 Data Analytics (DA) - Definition
• A process of transforming data into actions through analysis and insight in the context of organizational decision making and problem-solving.
• The science of analyzing raw data in order to draw conclusions about that information.
• A process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making.
1.2 Data Analytics - What is Data Analytics?
Analytics is the use of:
• Data,
• Information technology,
• Statistical analysis,
• Quantitative methods, and
• Mathematical or computer-based models
to help managers gain improved insight about their
business operations and make better, fact-based
decisions.

1.2 Data Analytics (DA) – Applications
• Management of customer relationships
• Financial and marketing activities
• Supply chain management
• Human resource planning
• Pricing decisions
• Sport team game strategies
1.2 Data Analytics (DA) - Importance
• There is a strong relationship of DA with:
▪ Profitability of businesses
▪ Revenue of businesses
▪ Shareholder return
• DA enhances understanding of data
• DA is vital for businesses to remain competitive
• DA enables creation of informative reports
1.2 Data Analytics (DA) - Example

1.2 Data Analytics - Types with Examples
Retail Market
• Most department stores clear seasonal inventory by reducing prices.
• The question is: When to reduce the price and by how much?
• Descriptive analytics: examine historical data for similar products (prices, units sold, advertising, …)
• Predictive analytics: predict sales based on price
• Prescriptive analytics: find the best sets of pricing and advertising to maximize sales revenue
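The three steps in the retail example can be sketched in a few lines of Python. Everything here is an assumption for illustration: the price and sales history is invented, demand is modeled as a straight line in price, advertising is left out, and the candidate price grid is arbitrary.

```python
# Hypothetical history for similar products: (price, units sold per week).
history = [(50.0, 100), (45.0, 130), (40.0, 155), (35.0, 190), (30.0, 210)]

prices = [p for p, _ in history]
units = [u for _, u in history]
n = len(history)

# Descriptive: summarize what happened.
mean_price = sum(prices) / n
mean_units = sum(units) / n

# Predictive: least-squares line units = intercept + slope * price.
sxy = sum((p - mean_price) * (u - mean_units) for p, u in history)
sxx = sum((p - mean_price) ** 2 for p in prices)
slope = sxy / sxx
intercept = mean_units - slope * mean_price


def predict_units(price):
    """Predicted weekly units at a given price (never below zero)."""
    return max(0.0, intercept + slope * price)


# Prescriptive: search candidate prices for the highest predicted revenue.
candidates = [20.0 + 0.5 * k for k in range(81)]  # 20.00 to 60.00
best_price = max(candidates, key=lambda p: p * predict_units(p))
best_revenue = best_price * predict_units(best_price)

print(mean_price, mean_units, slope, intercept, best_price, best_revenue)
```

A real pricing decision would also account for advertising, remaining inventory and timing; the sketch only shows how the output of each stage feeds the next.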
1.3 Terminology - Data Analytics
• Data - collected facts and figures
• Database - collection of computer files containing data
• Information - comes from analyzing data
• Metrics - are used to quantify performance.
• Measures - are numerical values of metrics.
• Discrete metrics - involve counting; e.g.
  - on time or not on time
  - number of on-time deliveries
• Continuous metrics - are measured on a continuum; e.g.
  - delivery time
  - package weight
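The difference between a discrete and a continuous metric shows up in how each measure is computed. A small sketch, assuming made-up delivery records of the form (promised hours, actual hours):

```python
# Hypothetical delivery records: (promised hours, actual hours).
deliveries = [(48, 40.5), (48, 52.0), (24, 23.0), (72, 70.25)]

# Discrete metric: on time or not, and the count of on-time deliveries.
on_time_flags = [actual <= promised for promised, actual in deliveries]
on_time_count = sum(on_time_flags)

# Continuous metric: delivery time, measured on a continuum.
delivery_times = [actual for _, actual in deliveries]
mean_delivery_time = sum(delivery_times) / len(delivery_times)

print(on_time_count, mean_delivery_time)
```

The count can only take whole-number values, while the mean delivery time can fall anywhere on the time scale.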
1.3 Terminology-Data Types

1.4 Big Data - Definition
Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations

• “data sets that are so big they cannot be handled efficiently by common database management systems” (Dasgupta, 2013).
• “There is no standard threshold on minimum size of Big Data, although big data in 2013 was considered one petabyte (1,000 terabytes) or larger” (Dasgupta, 2013).
• “Volume of 100 terabytes to petabytes, have structured and unstructured formats, and have a constant flow of data” (Davenport, 2014)
1.4 Big Data – Size
So, we know that “big data” is BIG…
But what does that mean to us?

https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data - Size
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• Google processes 20 PB a day (2008)
• Facebook has 60 TB of daily logs
• 1000 genomes project: 200 TB
• Time to read 1 TB disk: 3 hrs (100 MB/s)
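The read-time figure follows directly from size divided by throughput. A short sketch, assuming decimal units (1 TB = 10^12 bytes) and a single disk sustaining 100 MB/s:

```python
MB = 10**6   # decimal units, as commonly used for disk sizes
TB = 10**12
PB = 10**15


def hours_to_read(size_bytes, rate_bytes_per_s):
    """Time in hours to stream size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 3600


one_tb_hours = hours_to_read(1 * TB, 100 * MB)       # about 2.8 hours
twenty_pb_hours = hours_to_read(20 * PB, 100 * MB)   # one disk, 20 PB
twenty_pb_years = twenty_pb_hours / 24 / 365

print(f"1 TB: {one_tb_hours:.1f} h, 20 PB: {twenty_pb_years:.1f} years")
```

At that rate a single disk would need several years to read one day of the 20 PB volume quoted above, which is why such workloads are spread across many machines.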
Example data set (50,552 rows): Seattle building permit records. Each record lists a permit number, permit type, address, description of work, building category, type of work, review type, project value, applicant name, dates, status, a permit-status URL, and latitude/longitude. Two sample records:
6521205 | Construction | 1326 5TH AVE | Replacement of existing theater sound room. | COMMERCIAL | ADD/ALT | Plan Review | $90,000.00 | WEAVER, HANK | Application Accepted | http://web6.seattle.gov/dpd/PermitStatus/Project.aspx?id=6521205 | 47.60932, -122.334
6535314 | Site Development | 7309 30TH AVE SW | Hazard tree removal western cedar. | TREE/VEGETATION | MAINT/RESTORE | No plan review | $0.00 | TREECYCLE, | Application Accepted | http://web6.seattle.gov/dpd/PermitStatus/Project.aspx?id=6535314 | 47.53702, -122.371
1.4 Big Data - Size

1.4 Big Data - Sources
Lots of data is being collected and warehoused:

• Web data, e-commerce

• Financial transactions, bank/credit transactions

• Online trading and purchasing

• Social network

• Mobile devices

1.4 Big Data - Sources

Sources of Big Data include:
• GPS
• Satellite remote sensing
• Aerial surveying
• Radar
• Sensor networks
• Digital cameras
• RFID location readings
• Internet of things

Source categories (figure labels): Internet of Things; User Generated (Web & Mobile); Health/Scientific Computing
1.4 Big Data – Sources
It’s User-Generated Content…

https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data - Sources
It’s Sensor Data…

https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data – Sources
It’s all these “Smart” “Things”…

https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.4 Big Data – Characteristics
Volume
• Sensors are expanding worldwide at a rapid rate.
• Digital cameras have reached several billion through spatially-referenced cell phones.
• One estimate indicates that 2.5 quintillion (2.5 × 10^18) bytes are generated daily worldwide.
Variety
• Data appears in various forms (text, number, 2D, 3D, etc.)
Velocity
• Data is generated at a very high speed.
Veracity
• Are there biases, noise and abnormality in data?
• Is the data meaningful to the problem being analyzed?
1.4 Big Data - Application

Crowdsourcing + Physical modeling + Sensing + Data assimilation

to produce:

1.4 Big Data - Applications
Application areas:
• Politics
• Transportation
• Supply Chain Management
• Public Safety
• Urban Traffic
• Emergency Management
• Healthcare
• Energy and Environment
• Climate Science
• Marketing/Advertising

• Companies leverage data to adapt products and services to:
  • Meet customer needs
  • Optimize operations
  • Optimize infrastructure
  • Find new sources of revenue
• Can reveal more patterns and anomalies
• IBM estimates that by 2015 4.4 million jobs will be created globally to support big data (1.9 million of these jobs in USA)
1.5 Types of Data
There are four types of data or levels of measurement.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of mathematical scaling that can be carried out on the data.
1.5 Types of Data

1.5 Types of Data – (i) Categorical/ Nominal
• Nominal or categorical data consists of categories that cannot be rank ordered; each category is just different.
• Categories bear no quantitative relationship to one another
• Examples:
  • Customer’s location (America, Europe, Asia)
  • Employee classification (manager, supervisor, technician)
• Therefore, nominal data reflect qualitative differences rather than quantitative ones.
1.5 Types of Data – (i) Categorical/ Nominal - Examples

• True or False
• Color coded (Blue/Red /Yellow)
• Sex (Male / Female)
• Blood Group types
• Coin toss result (Tail/Head)
• Country (Britain/Germany)

1.5 Types of Data – (ii) Ordinal Data
• Ordinal data consists of categories that can be rank ordered.
• As with categorical data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
• No fixed units of measurement
• Examples are:
▪ Size of T-shirt
▪ College football rankings
▪ Survey responses
▪ Income categories
▪ Course Grade point
▪ Age groups
1.5 Types of Data – (iii) Interval and (iv) Ratio data
• Both interval and ratio data are examples of scale data.
• Scale data:
  • Data is in numeric format ($50, $100, $150)
  • Data can be measured on a continuous scale
  • The distance between values can be observed and, as a result, measured
  • The data can be placed in rank order.
1.5 Types of Data – (iii) Interval data
• Interval data is ordinal data with constant differences between observations, but without a “true zero.”
• Example: temperature moves along a continuous measure of degrees and is without a true zero (0 degrees does not mean “no temperature”).
• Examples:
  • Temperature (Fahrenheit)
  • Temperature (Celsius)
  • pH
1.5 Types of Data - (iv) Ratio data
Ratio data is measured on a continuous scale and has a natural zero point.
• Ratios are meaningful
• Examples:
▪ Monthly sales
▪ Delivery times
▪ Weight
▪ Height
▪ Age
▪ Pulse
▪ Time
▪ Length
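The four levels of measurement differ in which operations give meaningful results. The sketch below illustrates this with invented values; the category lists, the T-shirt size order and the kilogram-to-pound factor are assumptions chosen for the example.

```python
from statistics import median, mode

# Nominal: categories only; the mode is the meaningful summary.
locations = ["Asia", "Europe", "Asia", "America"]
most_common_location = mode(locations)

# Ordinal: categories with a rank order; a median makes sense.
SIZE_ORDER = ["S", "M", "L", "XL"]
shirts = ["M", "S", "XL", "M", "L"]
ranks = sorted(SIZE_ORDER.index(s) for s in shirts)
median_size = SIZE_ORDER[median(ranks)]


# Interval: differences are meaningful, ratios are not (no true zero).
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


ratio_in_celsius = 20 / 10                      # 2.0
ratio_in_fahrenheit = c_to_f(20) / c_to_f(10)   # 68 / 50 = 1.36

# Ratio: a true zero exists, so ratios survive a change of units.
KG_TO_LB = 2.20462
ratio_in_kg = 20 / 10
ratio_in_lb = (20 * KG_TO_LB) / (10 * KG_TO_LB)

print(most_common_location, median_size, ratio_in_fahrenheit, ratio_in_lb)
```

"Twice as hot" changes with the temperature scale, so it is not a meaningful statement, while "twice as heavy" holds in kilograms and pounds alike.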
1.5 Types of Data

1.3 Terminology - Summary

1.6 Type of Analytic – Traditional Techniques
What is enabling them?
• Classification
• Clustering
• Regression
• Simulation
• Anomaly Detection
• Numerical Forecasting
• Optimization
• Geographic Mapping
• …

Limitations:
• They tend to work best with “Small Data”
• Challenges in handling the 3 V’s (volume, velocity, and variety)

from https://www.redlands.edu/globalassets/depts/school-of-business/gisab/workshops-conferences/brian-hilton-icis_2015_bnh.pdf
1.6 Type of Analytic - “Non-traditional” Techniques
• Ensemble methods
  • Combine multiple models, e.g. linear regression, decision tree, neural network, spatial autocorrelation work together to yield one answer.
• Commodity models
  • Apply complex models to address only the high-value data.
  • For most of the data, use simple, less resource-intensive model(s)
• Modern Data Visualization
  • Multiple graphs and charts linked to the same underlying Big Data, and displayed in Dashboards, including maps
  • Space-Time slider visualizations, showing locational changes in a movie-like sequence.
  • 3-D Displays. 3-D Mapping.
• Text Analysis (Content Analysis)
  • Appropriate for unstructured text. Opens up social media, call center conversations, etc. for powerful analytics. Parse the text and use the components to extract meaning, valence, and feelings.
• Spatial Analysis
  • Spatial sampling, auto-correlation, continuous contours (ocean, air), etc.
• Analytic Point Solutions
  • Software to solve very specific Big Data, Analytics problems.
• Virtual Reality
  • Google VR
  • Can include fictional or actual geographic mapping
• Machine Learning
  • AI-based programs that can learn without having been specifically pre-programmed for the application.
  • “Intelligent” Robotics is one type
  • Neural networks verge on ML, but they are often restricted to learning in specialized ways
Adapted from Bill Franks. “Taming the big data tidal wave”. Wiley, 2012
1.6 Types of Analytics - Models
• Representation of a real system, idea or object
• Captures the most important features
• Can be a written description, a visual display, a mathematical formula, or a spreadsheet representation
• Are used to understand, analyze, or facilitate decision making.
• Types of model input
  - Data
  - Uncontrollable variables
  - Decision variables (controllable)
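The three types of model input can be made concrete with a tiny mathematical model. The profit function and all numbers below are hypothetical and serve only to show which argument plays which role.

```python
def profit_model(unit_cost, demand, price, quantity):
    """Profit for one period.

    unit_cost       : data (a known constant)
    demand          : uncontrollable variable (set by the market)
    price, quantity : decision variables (chosen by the manager)
    """
    units_sold = min(demand, quantity)
    return price * units_sold - unit_cost * quantity


# Same decisions, two demand scenarios (hypothetical numbers).
profit_if_demand_low = profit_model(unit_cost=6.0, demand=80, price=10.0, quantity=100)
profit_if_demand_high = profit_model(unit_cost=6.0, demand=120, price=10.0, quantity=100)

print(profit_if_demand_low, profit_if_demand_high)
```

Keeping the decision variables separate from the uncontrollable ones makes it easy to ask "what if demand changes?" without touching the decisions.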
1.6 Data Analytics (DA) - Types
• Descriptive analytics - uses data to understand the past and present
• Diagnostic analytics - a form of advanced analytics that examines data or content to answer the question, “Why did it happen?”
• Predictive analytics - analyzes past performance to predict future outcomes
• Prescriptive analytics - uses optimization techniques to identify the best course of action
1.6 Data Analytic
How do we use them for Analysis?

(source: courtesy of Brian Hilton)


1.6 Types of Analytics – (i) Descriptive Analytics Models
What has occurred?
• Descriptive analytics focuses on summarizing and highlighting patterns in current and historical data, which helps companies understand what has happened to date.
• Descriptive analytics, such as reporting/online analytical processing (OLAP), dashboards, and data visualization, is important in helping users interpret the output.
• Descriptive models simply tell “what is” and help identify trends and relationships.
• They do not tell managers what to do.
1.6 Types of Analytics – (ii) Diagnostic Analytics
Why did it happen?
The purpose of diagnostic analytics is to determine the root cause of an occurrence or trend. Often, a trend is identified in the descriptive analytics step. The company can then apply diagnostic analytics to understand why the trend occurred.
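One common first move in diagnostic work is to break an overall change down by segment to see where it came from. A minimal sketch with invented regional sales figures; locating the segment is only a step toward the root cause, not the cause itself:

```python
# Hypothetical sales by region for two consecutive months.
last_month = {"North": 120, "South": 95, "East": 110, "West": 105}
this_month = {"North": 118, "South": 60, "East": 112, "West": 104}

# Descriptive step: total sales fell.
total_change = sum(this_month.values()) - sum(last_month.values())

# Diagnostic step: break the change down by region to see where it came from.
change_by_region = {r: this_month[r] - last_month[r] for r in last_month}
main_driver = min(change_by_region, key=change_by_region.get)
share_of_drop = change_by_region[main_driver] / total_change

print(total_change, main_driver, round(share_of_drop, 2))
```

Here almost all of the decline sits in one region, which tells the analyst where to look next (pricing, stock-outs, a competitor, and so on).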
1.6 Types of Analytics - (iii) Predictive Analytics Models
What will occur?

• Predictive analytics is a branch of advanced analytics that makes predictions about future outcomes using historical data combined with statistical modeling, data mining techniques and machine learning.
• Predictive analytics models often incorporate uncertainty to help managers analyze risk.
• They aim to predict what will happen in the future.
• Algorithms for predictive analytics include regression analysis, machine learning, and neural networks.
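A forecast is more useful to a manager when it comes with a sense of its uncertainty. The sketch below fits a straight-line trend to invented monthly sales and attaches a rough band of plus or minus two residual standard deviations; the data and the band width are assumptions, and the band is not a formal prediction interval.

```python
from statistics import stdev

# Hypothetical monthly sales for six past months.
sales = [100.0, 104.0, 109.0, 111.0, 118.0, 121.0]
months = list(range(len(sales)))  # 0, 1, ..., 5

# Fit a straight-line trend by least squares.
mean_x = sum(months) / len(months)
mean_y = sum(sales) / len(sales)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
sxx = sum((x - mean_x) ** 2 for x in months)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual spread gives a rough measure of forecast uncertainty.
residuals = [y - (intercept + slope * x) for x, y in zip(months, sales)]
spread = stdev(residuals)

# Point forecast for the next month with a rough +/- 2 spread band.
next_month = len(sales)
forecast = intercept + slope * next_month
low, high = forecast - 2 * spread, forecast + 2 * spread

print(f"forecast {forecast:.1f}, rough band {low:.1f} to {high:.1f}")
```

Reporting the band alongside the point forecast lets the manager weigh the risk of the forecast being off, rather than treating the number as certain.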
1.6 Types of Analytics - (iv) Prescriptive Analytics Models
What should occur?
• Prescriptive analytics is the process of using data to determine an optimal course of action. By considering all relevant factors, this type of analysis yields recommendations for next steps.
• Prescriptive analytics is often referred to as advanced analytics. Examples of techniques are:
  • Regression analysis, machine learning, neural networks
• Often used for the allocation of scarce resources
• Using mathematical programming for revenue management is common for organizations that have “perishable” goods (e.g., rental cars, hotel rooms, airline seats).
1.6 Types of Analytics - (iv) Prescriptive Analytics Models
Prescriptive decision models help decision makers identify the best solution.
• Optimization - finding values of decision variables that minimize (or maximize) something such as cost (or profit).
From a marketing and sales perspective, prescriptive analytics can be used to:
➢ optimize the assortment of products in a retail store.
➢ optimally price items and services.
➢ find the best mix of marketing methods (online, print, radio, etc.)
➢ negotiate a better contract with customers and vendors.
In transportation and logistics, prescriptive analytics can be used to:
➢ improve driver retention to reduce training costs.
➢ eliminate unnecessary driving, flight, and sea transportation miles.
➢ increase driver productivity by improving routes and eliminating wait times to load/unload.
➢ increase speeds and reduce costs by optimizing distribution networks.
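Optimization means searching over the decision variables for the values that maximize (or minimize) an objective. The sketch below applies this to the marketing-mix case using an exhaustive search; the budget, the square-root response curves and their coefficients are all assumptions made up for the example, not figures from the source.

```python
from math import sqrt

BUDGET = 100  # total marketing budget in thousands (hypothetical)


def expected_sales(online, print_ads, radio):
    """Assumed response curves with diminishing returns (illustrative only)."""
    return 12 * sqrt(online) + 8 * sqrt(print_ads) + 6 * sqrt(radio)


# Decision variables: spend per channel, in steps of 1, using the full budget.
best_mix, best_value = None, float("-inf")
for online in range(BUDGET + 1):
    for print_ads in range(BUDGET + 1 - online):
        radio = BUDGET - online - print_ads
        value = expected_sales(online, print_ads, radio)
        if value > best_value:
            best_mix, best_value = (online, print_ads, radio), value

print(best_mix, round(best_value, 2))
```

Exhaustive search works here because there are only a few thousand combinations; larger problems of this kind are usually handed to mathematical programming solvers.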
1.6 Types of Analytics - Summary

1.7 Challenges
Some of the challenges of data analytics are:
• The bottleneck is in technology
  • New architectures, algorithms and techniques are needed
• Technical skill
  • Experts are needed who can use new technology and apply new skills
• How will Big Data affect organizational processes?
  • One possible trend is towards centralization of data in the Cloud, after decades of decentralization
• Privacy
  • Concern about privacy invasion and targeting from Big Data
• How will Big Data and Analytics change decision-making?
  • To what extent will human managers and decision-makers override the results of Big Data?
1.7 Challenges
• Database: historical data may not be fully documented and can be very complicated because of manual processes.
• There are more data sources than you initially thought
• The data is not as clean as you thought
• Historical data will have a different format and will be difficult to merge
• Not everyone agrees on what the ‘systems of record’ are
• Resources may not be available
1.7 Challenges – Storage Volume

1.7 Challenges - Converting big data
➢ Converting big data into valuable insights is a tricky process

1.7 Challenges - Technologies
➢ Confusing variety of big data technologies

1.7 Challenges – Cost
➢ High cost

1.7 Challenges - Security
➢ High-risk big data security loopholes
