Lecture 1: Introduction to Data Science
Why Is Data Important?
● Improves people’s lives
○ Health monitoring, AI-based diagnosis
● Quality monitoring
○ TDS alarm, low-cartridge alarm, monitoring a complex system with
many parameters
● Stop guessing
○ “I think this would work” – no more trial and error; go with the data.
● Examples: Ola, Uber, OYO
● New models are estimating which cities are most at risk for
the spread of the Ebola virus.
… Health/Scientific
Internet of Things / M2M Computing
Datafication
● How to quantify friendship?
● How to rate a product?
● Taking all aspects of life and turning them
into data
○ Google’s augmented-reality glasses datafy the gaze
○ LinkedIn datafies our professional networks
● When we like something or someone online,
we are helping to datafy it.
How Big Is the Data?
● About 2.5 exabytes (1 exabyte = 10^18 bytes) of data are created each day
● Internet
○ More than 3.7 billion humans use the internet
○ On average, Google now processes more than 3.5 billion searches per day
● Digital Photo
○ People take around 1.2 trillion photos per year
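To get a feel for these magnitudes, the daily figures above can be converted to per-second rates. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the inputs are simply the slide's own round numbers (2.5 exabytes and 3.5 billion searches per day).

```python
# Convert the per-day figures quoted on the slide into per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

bytes_per_day = 2.5 * 10**18    # 2.5 exabytes, with 1 exabyte = 10^18 bytes
searches_per_day = 3.5e9        # Google searches per day (slide figure)

bytes_per_second = bytes_per_day / SECONDS_PER_DAY
searches_per_second = searches_per_day / SECONDS_PER_DAY

# Express the data rate in terabytes (10^12 bytes) for readability.
print(f"{bytes_per_second / 1e12:.1f} TB of data per second")
print(f"{searches_per_second:,.0f} searches per second")
```

Under these figures the rates come to roughly 29 terabytes of new data and about 40,000 searches every second.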
Data generated in a Day
The Data Equation
Oceans of Data
Drops of Understanding
(Nix 1984)
What is Data Science?
Like any emerging field, it isn’t yet well defined,
but incorporates elements of:
● Exploratory Data Analysis and Visualization
● Machine Learning and Statistics
● High-Performance Computing technologies
for dealing with scale.
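As a small illustration of the first two ingredients (exploratory analysis and basic statistics), here is a hedged sketch using only Python's standard library. The readings are hypothetical TDS values invented for illustration, echoing the quality-monitoring example earlier in the lecture, and the two-standard-deviation rule is just one crude way to flag unusual values, not a recommended method.

```python
import statistics

# Hypothetical daily TDS readings in ppm (illustrative values, not real data).
readings = [112, 118, 109, 121, 115, 240, 117]

# Basic summary statistics: a first step in exploratory data analysis.
mean = statistics.mean(readings)
median = statistics.median(readings)
stdev = statistics.stdev(readings)  # sample standard deviation

# Crude outlier rule (an assumption for this sketch):
# flag readings more than 2 standard deviations from the mean.
outliers = [r for r in readings if abs(r - mean) > 2 * stdev]

print(f"mean={mean:.1f}, median={median}, stdev={stdev:.1f}")
print("possible outliers:", outliers)
```

The gap between the mean and the median is itself a hint that one reading is pulling the average upward, which is the kind of observation exploratory analysis is meant to surface before any modelling.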
What is Data Science?
● Data science is an interdisciplinary field that uses
scientific methods, processes, algorithms and systems to
extract knowledge and insights from data in various forms.
● Datasets:
○ https://www.kaggle.com/datasets?tagids=3022
○ https://www.data.gov/
○ https://data.gov.in/
● Some Project Ideas:
○ https://www.analyticsvidhya.com/blog/2018/05/24-ultimate-data-science-projects-to-boost-your-knowledge-and-skills/
○ KDnuggets
○ https://www.analyticsindiamag.com/popular-data-science-projects-for-aspiring-data-scientists
Course Evaluation
● Quizzes: 4-6 (40)
● Mid Sem: quiz (10) + project/assignment (15)
● End Sem: quiz (15) + project/assignment (20)
Reference Books
● The Data Science Design Manual, Steven S. Skiena
● Probability and Statistics for Engineers and
Scientists, Ronald E. Walpole, Raymond H.
Myers, Sharon L. Myers, Keying Ye