1 Introduction To Data Science
Introduction
Curiosity: Asking questions is how you gain a better understanding of the
business problem.
Common Sense: To identify new ways to solve a business problem and to detect
priority problems.
Communication Skills: A Data Scientist needs to communicate their findings to
business teams so that they can act on the insights.
Skills required for a Data Scientist
Domain Knowledge:
• To get useful information out of raw data that
benefits a company’s business.
• Know about the business model of the
company.
• Ask the right questions to produce valuable
results.
Math Skills:
• Linear Algebra, Calculus, and other concepts
of mathematics help us to understand the
complex behavior of Machine Learning
algorithms.
• Probability and statistics are mainly used in
predictive modeling and clustering.
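As a small illustration of statistics in predictive modeling, the sketch below fits a straight line by ordinary least squares using only the Python standard library. The technique, the function name fit_line, and the hours/scores data are assumptions made up for this example; they are not taken from the slides.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up data for illustration only: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
# Use the fitted line to predict the score for an unseen input (6 hours).
predicted = slope * 6 + intercept
print(slope, intercept, predicted)
```

In practice a library such as scikit-learn would usually handle the fitting; the point here is only that the model reduces to means and sums of deviations.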
Computer Science:
• To implement Data Science techniques using programming
languages like Python, R, SQL, Scala, Julia, JavaScript, etc.
• To deal with varied databases and cloud networks to process the
data.
• Knowledge about algorithms, relational and non-relational
databases, Distributed Computing, and Machine Learning.
Communication Skills:
• To communicate well when working in a team.
• To draw conclusions from the data analysis and present
them.
Data Science: Three skill tracks
Engineering
• Involves building the data pipeline infrastructure.
• It involves the software and the hardware used to store the data and perform
data ETL (i.e., extract, transform, and load).
• Store and compute data on the cloud.
• The fundamental building block for automation is maintaining the data
pipeline through modular, well-commented code and version control.
• Key tasks involved are:
1. Data Environment: Designing and setting up the entire environment to
support the data science workflow is a prerequisite for data science
projects. It may include setting up storage in the cloud, a Kafka
platform, Hadoop and Spark clusters, etc.
2. Data Management: Automated data collection, which includes parsing
logs (depending on the stage of the company and the type of
industry you are in), web scraping, API queries, and interrogating data
streams. Determine and construct the data schema to support analytical
and modeling needs. Use tools, processes, and guidelines to ensure data is
correct, standardized, and documented.
3. Production: Involves the whole pipeline, from data access and
preprocessing through modeling to final deployment. The system must
work smoothly with all existing software stacks.
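The ETL and data management tasks above can be sketched as a small modular pipeline. This is a minimal illustration, not a production design: the log format, the table name events, and the function names extract, transform, load, and run_pipeline are all hypothetical, and an in-memory SQLite database stands in for real storage.

```python
import re
import sqlite3
from datetime import datetime

# Assumed log format (hypothetical): "<ISO timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

def extract(lines):
    """Extract: parse raw log lines into dicts, skipping lines that do not match."""
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            yield match.groupdict()

def transform(records):
    """Transform: validate the timestamp and standardize each record."""
    for rec in records:
        try:
            ts = datetime.fromisoformat(rec["ts"])
        except ValueError:
            continue  # drop records whose timestamp is not a valid date
        yield (ts.isoformat(), rec["level"], rec["message"].strip())

def load(rows, conn):
    """Load: write rows into a table with an explicit schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "ts TEXT NOT NULL, level TEXT NOT NULL, message TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()

def run_pipeline(lines, conn):
    """Chain the three stages and return the number of rows stored."""
    load(transform(extract(lines)), conn)
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# Made-up sample input: two valid lines, one non-log line, one bad timestamp.
raw_lines = [
    "2024-01-05T10:00:00 INFO service started",
    "not a log line",
    "2024-13-99T10:00:00 ERROR bad timestamp",
    "2024-01-05T10:00:02 ERROR disk full",
]

conn = sqlite3.connect(":memory:")
loaded = run_pipeline(raw_lines, conn)
print(loaded)
```

Keeping each stage in its own function is one way to get the modular, well-commented code the slides call for: a stage can be tested or replaced (for example, swapping SQLite for cloud storage) without touching the others.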
Analysis
• Analysis turns raw information into insights in a fast and often exploratory
way.
• In general, an analyst needs to have decent domain knowledge, do
exploratory analysis efficiently, and present the results using storytelling.
• Key points include: