Introduction To Data Science

INTRODUCTION TO DATA SCIENCE

OUTLINE

• Data, Big Data and Challenges
• Data Science
  • Introduction
  • Why Data Science
• Data Scientists
  • What do they do?
• Major/Concentration in Data Science
  • What courses to take
DATA ALL AROUND
• Lots of data is being collected and warehoused
  • Web data, e-commerce
  • Financial transactions, bank/credit transactions
  • Online trading and purchasing
  • Social networks
HOW MUCH DATA DO WE HAVE?
• Google processes 20 PB a day (2008)
• Facebook has 60 TB of daily logs
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• 1000 genomes project: 200 TB
• Cost of 1 TB of disk: $35
• Time to read 1 TB disk: 3 hrs (100 MB/s)
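The 3-hour read time follows from simple division. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a single disk read sequentially at the stated 100 MB/s; the function name `read_time_hours` is illustrative and not from the slides:

```python
# Decimal storage units (an assumption; the slides do not say decimal vs. binary).
MB = 10**6
TB = 10**12
PB = 10**15


def read_time_hours(size_bytes, throughput_bytes_per_s):
    """Hours needed to read size_bytes sequentially at a constant throughput."""
    seconds = size_bytes / throughput_bytes_per_s
    return seconds / 3600


# 1 TB at 100 MB/s: 10,000 seconds, which the slide rounds up to 3 hrs.
one_tb_hours = read_time_hours(1 * TB, 100 * MB)

# Hypothetical extension: one day of Google's 2008 volume (20 PB)
# read through the same single disk, expressed in days.
google_days = read_time_hours(20 * PB, 100 * MB) / 24

print(f"1 TB at 100 MB/s: {one_tb_hours:.2f} hours")
print(f"20 PB at 100 MB/s: {google_days:.0f} days")
```

At a single-disk rate, one day of the 20 PB figure would take more than six years to read, which is why volumes of this size are read in parallel across many disks.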
BIG DATA
Big Data is any data that is expensive to manage and hard to extract value from
• Volume
  • The size of the data
• Velocity
  • The latency of data processing relative to the growing demand for interactivity
• Variety and Complexity
  • The diversity of sources, formats, quality, structures
BIG DATA AND DATA SCIENCE
• “… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist
• The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011)
• New Data Science institutes being created or repurposed
  • NYU, Columbia, Washington, UCB,...
• New degree programs, courses, boot-camps:
  • e.g., at Berkeley: Stats, I-School, CS, Astronomy…
  • One proposal (elsewhere) for an MS in “Big Data Science”
WHAT IS DATA SCIENCE?

• An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data
• Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges in big data
• Data science principles apply to all data, big and small

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
WHAT IS DATA SCIENCE?
• Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education
• Computer Science
  • Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
• Mathematics
  • Mathematical modeling
• Statistics
  • Statistical and stochastic modeling, probability
REAL LIFE EXAMPLES

• Companies learn your secrets, shopping patterns, and preferences
  • For example, can we know if a woman is pregnant, even if she doesn’t want us to know? Target case study
• Data science and elections (2008, 2012)
  • 1 million people installed the Obama Facebook app that gave access to info on “friends”
DATA SCIENTISTS

• Data Scientist
  • The Sexiest Job of the 21st Century
• They find stories, extract knowledge. They are not reporters
DATA SCIENTISTS

• Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions
WHAT DO DATA SCIENTISTS DO?
• National Security
• Cyber Security
• Business Analytics
• Engineering
• Healthcare
• And more …
CONCENTRATION IN DATA SCIENCE
• Mathematics and Applied Mathematics
• Applied Statistics/Data Analysis
• Solid Programming Skills (R, Python, Julia, SQL)
• Data Mining
• Database Storage and Management
• Machine Learning and discovery
