Lesson 1: Introduction to the data science process and the value of learning data science
This is a course for the passionately curious who want to work with data to:
1. Help businesses leverage data for innovation and success
2. Innovate and predict future trends in business and other industries
3. Learn how to analyze data and provide data-driven insights to make
decisions
Learning Objectives
o Data science scope - data science deals with everything from
analyzing complex data and creating new analytics algorithms and tools for
data processing and purification to building powerful, useful
visualizations.
Customer Prediction - A system can be trained on customer
behavior patterns to predict the likelihood of a customer buying a product.
Service Planning - Restaurants can predict how many customers will
visit on the weekend and plan their food inventory to handle the demand.
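The two use cases above can be sketched in a few lines of Python. This is only a minimal illustration, not a production model: the sample data, the behavior pattern names, and the function names are all hypothetical, and the "training" is a simple historical purchase rate and moving average rather than a real machine learning algorithm.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical history: (behavior pattern, did the customer buy?)
history = [
    ("viewed_only", False),
    ("viewed_only", False),
    ("viewed_only", True),
    ("added_to_cart", True),
    ("added_to_cart", True),
    ("added_to_cart", False),
    ("added_to_cart", True),
]

def train(records):
    """Count purchases and visits for each behavior pattern."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [purchases, visits]
    for pattern, bought in records:
        counts[pattern][0] += int(bought)
        counts[pattern][1] += 1
    return dict(counts)

def purchase_likelihood(model, pattern):
    """Estimate P(buy | pattern); add-one smoothing gives unseen patterns 0.5."""
    purchases, visits = model.get(pattern, (0, 0))
    return (purchases + 1) / (visits + 2)

# Hypothetical visitor counts for the last four weekends.
weekend_visitors = [120, 135, 128, 141]

def forecast_visitors(past_counts, window=3):
    """Forecast the next weekend as the mean of the last `window` weekends."""
    return mean(past_counts[-window:])

def portions_to_stock(expected_visitors, safety_margin=0.10):
    """Round up the forecast plus a safety margin to whole portions."""
    return math.ceil(expected_visitors * (1 + safety_margin))

model = train(history)
print(purchase_likelihood(model, "viewed_only"))  # 0.4
print(purchase_likelihood(model, "added_to_cart") > 0.5)  # True
print(portions_to_stock(forecast_visitors(weekend_visitors)))  # 149
```

In practice the counting step would be replaced by a trained model (for example a classifier), but the flow is the same: learn from past behavior, then score a new case.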
Data Terminologies:
Big data - refers to any large and complex collection of data.
Data analytics - is the process of extracting meaningful information from data.
Data science - is a multidisciplinary field that aims to produce broader insights. It draws on:
Scientific methods
Maths and statistics
Programming
Advanced analytics
ML and AI
Deep learning
Data analytics
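As a small illustration of data analytics as defined above (extracting meaningful information from data), the sketch below summarizes a handful of raw sales records. The records and the function name are made up for the example, and only the Python standard library is assumed.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw sales records: (product, units sold in one transaction)
sales = [("coffee", 30), ("tea", 12), ("coffee", 25), ("juice", 8), ("tea", 15)]

def units_by_product(records):
    """Add up the units sold for each product."""
    totals = Counter()
    for product, units in records:
        totals[product] += units
    return totals

totals = units_by_product(sales)
best_seller, best_units = totals.most_common(1)[0]
average_units = mean(units for _, units in sales)

print(best_seller, best_units)  # coffee 55
print(average_units)            # 18
```

The raw list says little on its own; the totals and the average are the kind of "meaningful information" an analyst would report.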
Data science tools include programming languages such as R, Python, and Julia, which can be
used to create new algorithms, ML models, and AI processes for big data platforms such as
Apache Spark and Apache Hadoop.
Data processing and purification tools, such as
o WinPure, Data Ladder.
Data visualization tools, such as
o Microsoft Power Platform, Google Data Studio, Tableau.
Visualization frameworks like matplotlib and plotly can also be considered
data science tools.
Because data science covers everything related to data, any tool or technology
used in big data and data analytics can be applied somewhere in the data
science process.
There are several other reasons why Python is one of the most used
programming languages for data science, including:
3. Data Engineer - Responsible for designing, building, and maintaining data
pipelines.
They need to test the ecosystem for the business and prepare it for
data scientists to run their algorithms.
4. Data Storyteller - find the narrative that best describes
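To give a feel for the "data pipeline" that the Data Engineer role above builds and maintains, here is a minimal extract, transform, load sketch. The CSV content, the field names, and the dictionary used as a stand-in "warehouse" are all hypothetical; real pipelines typically read from and write to databases or big data platforms.

```python
import csv
import io

# Hypothetical raw export; the second order is missing its amount.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,5.00
"""

def extract(text):
    """Read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop incomplete rows and convert fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records with a missing amount
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].title(),
            "amount": float(row["amount"]),
        })
    return cleaned

def load(rows, store):
    """Write the cleaned rows into the target store, keyed by order id."""
    for row in rows:
        store[row["order_id"]] = row
    return store

warehouse = load(transform(extract(RAW_CSV)), {})
print(sorted(warehouse))  # [1, 3]
```

Once the data is cleaned and loaded like this, a data scientist can run algorithms on it without handling missing or badly typed values first.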
How to install Anaconda (video link):
https://www.youtube.com/watch?v=T8wK5loXkXg