Introduction to Data Science
Introduction: What is Data Science?
• Data science is one of the most promising and in-demand career paths for skilled professionals.
• The term “data scientist” was coined as recently as 2008.
• It is a blend of three areas: data, business, and statistics.
The size of data is increasing
• The amount of data in the world was estimated to be 44 zettabytes at the dawn of 2020.
• By 2025, the amount of data generated each day is expected to reach 463 exabytes globally.
• Google, Facebook, Microsoft, and Amazon store at least 1,200 petabytes of information.
• The world spends almost $1 million per minute on commodities on the Internet.
• Electronic Arts processes roughly 50 terabytes of data every day.
• By 2025, there are expected to be 75 billion Internet-of-Things (IoT) devices in the world.
• By 2030, nine out of every ten people aged six and above are expected to be digitally active.
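To put the two headline figures on the same scale, the short sketch below converts the projected daily volume into a yearly total and compares it with the 2020 estimate. It assumes decimal (SI) units, where 1 zettabyte equals 1,000 exabytes; the variable names are only illustrative.

```python
# Decimal (SI) storage units, in bytes (an assumption; the slide does not say which convention it uses).
EXABYTE = 10**18
ZETTABYTE = 10**21

daily_exabytes = 463        # projected data generated per day by 2025 (from the slide)
total_2020_zettabytes = 44  # estimated data in the world at the start of 2020 (from the slide)

# One year of data at the projected daily rate, expressed in zettabytes.
yearly_zettabytes = daily_exabytes * EXABYTE * 365 / ZETTABYTE

# How many times larger one projected year is than the whole 2020 estimate.
ratio = yearly_zettabytes / total_2020_zettabytes

print(f"{yearly_zettabytes:.1f} ZB generated per year")
print(f"{ratio:.1f} times the 2020 estimate")
```

Under these assumptions, a single year at the 2025 rate would produce several times the total amount of data estimated to exist at the start of 2020.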
What Does a Data Scientist Do?
• Understand the business problem
  • How can I improve the sales of an e-commerce platform?
• Analyse the data provided by the company
  • The sales data: which products customers buy and when (e.g. before a festival), how much time they spend on the app (app time), etc.
• Visualize the data and get an intuition
  • Visualize all the data and try to find patterns.
• Recommend a product/service based on past data
  • Which product to recommend to the user (if they are buying a phone, recommend accessories, a warranty, or similar phones)
• Predict future uncertainties/values
  • Predict the sales, revenue, or stock price of the company
Data Science tasks
Data Science as a multidisciplinary domain
• Data science is a multidisciplinary field that draws on many other domains.
• The following Venn diagram illustrates this. Three main domains primarily make up data science:
  • For theory:
    • Theoretical computer science
    • Statistics (why)
  • For application:
    • Application-oriented computer science
    • Different tools: Python/R/Java
    • Software development
Where does data science come from?
• The core of data science is statistics, wrapped in a layer of computer science.
• Statistics is the other domain that handles data, but it did so without computers, as they were unavailable until recently.
Applications of Data Science
Example-1: Email spam detection
• One of the classic examples.
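As a rough illustration of how spam detection can be framed as a learning problem, here is a minimal naive Bayes sketch in plain Python. Naive Bayes is one common choice for this task, not a method the slides prescribe; the training emails, the `train` and `classify` helpers, and the labels are all hypothetical, and a real filter would need far more data.

```python
import math
from collections import Counter

# Hypothetical toy training set; the texts and labels are made up for illustration.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
]

def train(examples):
    """Count words per label and collect the vocabulary."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocab = model
    total_emails = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior: how common this label is in the training set.
        score = math.log(label_counts[label] / total_emails)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify("free prize now", model))                 # spam
print(classify("agenda for the monday meeting", model))  # ham
```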
Example-2: Medical Diagnosis
• Input: Symptoms (fever, cough, nausea, pain)
• Output: Diagnosis (Covid-19, common cold, pneumonia)
• Assume that these are the only possible diseases.
• An example of multiclass classification
• There is some uncertainty, e.g. 20% sure of Covid-19 and 80% sure of a common cold.
• Probabilistic or soft classification (soft computing)
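The difference between a hard label and a soft (probabilistic) output can be sketched as follows. The example assumes some upstream model has already produced one score per disease; the scores are invented, and softmax is just one common way to turn scores into probabilities, not necessarily what a real diagnostic system would use.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical scores for one patient; the numbers are made up.
scores = {"Covid-19": 1.0, "Common cold": 2.4, "Pneumonia": -1.0}

# Soft classification: one probability per possible diagnosis.
probabilities = softmax(scores)

# Hard classification: keep only the most probable diagnosis.
hard_label = max(probabilities, key=probabilities.get)

for label, p in probabilities.items():
    print(f"{label}: {p:.0%}")
print("Most likely:", hard_label)
```

The soft output keeps the uncertainty visible, which matters when the second most likely diagnosis is still plausible.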
Example-3: Predicting a stock price
• Input: History of stock prices
• Output: Price of the stock in the near future
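A minimal sketch of this input/output framing: fit a straight line to the price history by least squares and extend it one step ahead. The `forecast_next` helper and the prices are hypothetical, and a linear trend is a deliberately naive stand-in; real stock prices are far harder to predict.

```python
def forecast_next(prices):
    """Fit price = intercept + slope * day by least squares and predict the next day."""
    n = len(prices)
    days = range(n)
    mean_day = sum(days) / n
    mean_price = sum(prices) / n
    # Slope = covariance(day, price) / variance(day).
    slope = sum((d - mean_day) * (p - mean_price) for d, p in zip(days, prices)) / sum(
        (d - mean_day) ** 2 for d in days
    )
    intercept = mean_price - slope * mean_day
    return intercept + slope * n  # day index n is the first unseen day

# Hypothetical closing prices for five consecutive days.
history = [100.0, 102.0, 101.0, 105.0, 107.0]
print(f"Forecast for the next day: {forecast_next(history):.2f}")
```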
Example-4: Self-driving car
• Input: Road conditions, traffic signals, crowds
• Sensors: Camera, IR, radar, etc.
• Output: Direction of the vehicle, speed, acceleration, etc.
Essence of Data Science
1. Exploratory analysis: Discover the structure within the data. E.g.: Experience (in years at a company) and salary are correlated.
   1. Unsupervised learning
2. Predictive analysis: This is sometimes described as “learn from the past to predict the future”.
   1. Supervised learning
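The exploratory example above (experience and salary being correlated) can be checked with a Pearson correlation coefficient. The sketch below uses invented numbers and a hand-written `pearson` helper so that it needs no external libraries.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical employees: years at the company and salary in thousands.
experience_years = [1, 3, 5, 7, 10]
salary_thousands = [30, 42, 50, 65, 80]

r = pearson(experience_years, salary_thousands)
print(f"Correlation between experience and salary: {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship; it describes structure in the data and does not by itself show that experience causes higher pay.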
Theoretical Aspects of Data Science
Practical Aspects of Data Science
Some buzzwords
• Data Science
• Machine Learning
• Statistics
• Big Data
• Supervised Machine Learning
• Unsupervised Machine Learning
• Data Mining
• Soft Computing
• Artificial Intelligence (AI)
• Deep Learning
• Artificial Neural Network (ANN)
• Explainable AI (XAI)
• Speech Recognition
Example 1: Data Science and Statistics
Example 2: Data Science and Software development
Example 3: Data Science and Soft Computing
Example 4: Data Science and Theoretical Computer Science
A word of caution: Don’t use ML everywhere
Not all problems are machine learning problems. It is important to know when (not) to use machine learning.
1. If you have deterministic logic that solves the problem with 100% accuracy, then that is cheaper, easier, and more accurate than any ML model you could build.
2. If you have stable heuristic rules that solve the problem most of the time, the extra work and complexity of ML might not be worth it.
3. If your heuristics do not reach the desired accuracy and require constant updates, then ML can be a good bet.
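As an illustration of the first point, some problems have an exact rule that no model can beat. The leap-year check below is an example chosen for this sketch (it does not come from the slides): the Gregorian calendar rule is fully deterministic, so training a classifier on past years would only add cost and the risk of errors.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# The rule is exact for every input, so there is nothing left for ML to learn.
for year in (1900, 2000, 2023, 2024):
    print(year, is_leap_year(year))
```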
Applications
