Data Science Process and Machine Learning
2. Retrieving data
3. Data preparation
4. Data exploration
5. Data modeling
This step involves acquiring the data needed to answer the business question from
all the identified internal and external sources.
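As a rough illustration of combining an internal and an external source, the sketch below joins a CSV export with a JSON payload on a shared key. The source contents, the `customer_id` key and the `retrieve_data` helper are all hypothetical; in practice the sources would be databases, files or APIs.

```python
import csv
import io
import json

# Hypothetical internal source: a CSV export from a company database.
INTERNAL_CSV = """customer_id,total_spent
1,250.0
2,80.5
3,120.0
"""

# Hypothetical external source: JSON returned by a third-party service.
EXTERNAL_JSON = '[{"customer_id": 1, "region": "North"}, {"customer_id": 3, "region": "South"}]'


def retrieve_data(internal_csv: str, external_json: str) -> list[dict]:
    """Load both sources and left-join the external fields onto the internal rows."""
    internal_rows = list(csv.DictReader(io.StringIO(internal_csv)))
    external_by_id = {rec["customer_id"]: rec for rec in json.loads(external_json)}

    combined = []
    for row in internal_rows:
        cid = int(row["customer_id"])  # CSV values arrive as strings
        extra = external_by_id.get(cid, {})
        combined.append({
            "customer_id": cid,
            "total_spent": float(row["total_spent"]),
            "region": extra.get("region"),  # None when the external source has no match
        })
    return combined


if __name__ == "__main__":
    for record in retrieve_data(INTERNAL_CSV, EXTERNAL_JSON):
        print(record)
```

Rows without an external match are kept with `region` set to `None`, so gaps surface later in data preparation instead of silently disappearing.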
Data can have many inconsistencies, such as missing values, blank columns and
incorrect data formats, which need to be cleaned. We need to process, explore
and condition the data before modeling. Clean data leads to better predictions.
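The three problems named above can be handled with a few small rules. This is a minimal sketch on hypothetical records: it assumes that a column blank in every row can be dropped, that missing ages may be filled with the median of known ages, and that dates appear only in the two formats listed in `DATE_FORMATS`.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records: a missing value (None), a column that is blank
# in every row ("notes"), and dates written in inconsistent formats.
RAW = [
    {"age": "34", "signup": "2023-01-15", "notes": ""},
    {"age": None, "signup": "15/02/2023", "notes": ""},
    {"age": "29", "signup": "2023-03-01", "notes": ""},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed set of formats present in the data


def parse_date(text: str) -> str:
    """Normalise a date string to ISO format (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def is_blank(value) -> bool:
    return value in (None, "")


def clean(rows: list[dict]) -> list[dict]:
    # 1. Drop columns that are blank in every row.
    columns = [c for c in rows[0] if any(not is_blank(r[c]) for r in rows)]
    # 2. Fill missing ages with the median of the known ages.
    fill = median(int(r["age"]) for r in rows if not is_blank(r["age"]))
    cleaned = []
    for r in rows:
        out = {c: r[c] for c in columns}
        out["age"] = fill if is_blank(r["age"]) else int(r["age"])
        # 3. Convert every date to one consistent format.
        out["signup"] = parse_date(r["signup"])
        cleaned.append(out)
    return cleaned
```

Median imputation is only one option; dropping incomplete rows or using a domain-specific default may suit other datasets better.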
In this step, the actual model building process starts. Here, the data scientist
splits the dataset into a training set and a testing set. Techniques like
association, classification and clustering are applied to the training dataset.
Once prepared, the model is tested against the "testing" dataset.
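The split, train, test cycle can be sketched without any libraries. This example uses a nearest-centroid classifier as a stand-in for "classification" and a hypothetical toy dataset of two well-separated point clusters; the helper names are illustrative, not a particular library's API.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split them into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """Nearest-centroid classifier: average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, test):
    """Fraction of held-out examples the model labels correctly."""
    correct = sum(predict(centroids, x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    # Hypothetical toy dataset: two clusters of 2-D points.
    data = [((float(i), float(i)), "low") for i in range(10)] + \
           [((float(i) + 20, float(i) + 20), "high") for i in range(10)]
    train, test = train_test_split(data)
    model = fit_centroids(train)
    print(f"train={len(train)} test={len(test)} accuracy={accuracy(model, test):.2f}")
```

The key design point is that the model never sees the testing rows while it is fitted, so the accuracy reported on them approximates performance on new data.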
In this stage, the final baselined model is delivered along with reports, code and
technical documents. After thorough testing, the model is deployed into a
real-time production environment. The key findings are also communicated to all
stakeholders, which helps them decide, based on the model's results, whether the
project is a success or a failure.
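One common way to hand over a baselined model is to serialise it so the production service loads exactly the artefact that was tested. A minimal sketch, assuming a hypothetical threshold-rule model stored as a dict and Python's standard `pickle` module:

```python
import pickle

# Hypothetical "baselined" model: a simple threshold rule learned earlier.
model = {"feature": "total_spent", "threshold": 100.0, "version": "1.0"}


def predict(model, record):
    """Apply the threshold rule to one input record."""
    return "high_value" if record[model["feature"]] >= model["threshold"] else "standard"


# Serialise the model so it can be shipped alongside the code and reports.
blob = pickle.dumps(model)

# In production, the service loads the same artefact and serves predictions.
# Note: only unpickle data from trusted sources.
deployed = pickle.loads(blob)
print(predict(deployed, {"total_spent": 250.0}))
```

Versioning the artefact (the `version` field here) makes it easier to trace which model produced the results reported to stakeholders.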
What is Machine Learning?
Machine learning algorithms can be trained in many ways, each with its own pros
and cons. Based on these methods of learning, machine learning is broadly
categorized into four main types:
1. Supervised learning
For example, consider an input dataset of labeled parrot and crow images.
Initially, the machine is trained to recognize the pictures, including each bird's
color, eyes, shape and size. After training, a picture of a parrot is provided as
input, and the machine is expected to identify the object and predict the output.
The trained machine checks the various features of the object in the input
picture, such as color, eyes and shape, to make a final prediction. This is the
process of object identification in supervised machine learning.
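The parrot-and-crow example can be mimicked with a k-nearest-neighbour vote over labeled examples. This sketch assumes each picture has already been reduced to three hypothetical numeric features scaled to 0-1 (greenness, beak curvature, relative size); real image classifiers learn features from the pixels instead.

```python
import math

# Hypothetical labeled training examples:
# (greenness, beak curvature, relative size), all scaled to 0-1.
TRAINING = [
    ((0.90, 0.80, 0.40), "parrot"),
    ((0.85, 0.90, 0.35), "parrot"),
    ((0.05, 0.20, 0.60), "crow"),
    ((0.10, 0.15, 0.65), "crow"),
]


def classify(features, training=TRAINING, k=3):
    """k-nearest-neighbour: take the majority label among the k closest examples."""
    nearest = sorted(training, key=lambda ex: math.dist(features, ex[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


if __name__ == "__main__":
    print(classify((0.88, 0.85, 0.38)))  # features of a new, unlabeled picture
```

Scaling the features to a common range matters here: without it, a feature measured in large units (such as size in centimetres) would dominate the distance.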
The primary objective of the supervised learning technique is to map the input
variable (a) to the output variable (b). Supervised machine learning is further
classified into two broad categories:
3. Semi-supervised learning
4. Reinforcement learning
Unlike supervised learning, reinforcement learning lacks labeled data, and the
agent learns from experience alone. Consider a video game. Here, the game
defines the environment, and each move the agent makes changes its state. The
agent receives feedback in the form of rewards and punishments, which affect
the overall game score. The agent's ultimate goal is to achieve a high score.
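The reward-and-punishment loop can be illustrated with tabular Q-learning, one standard reinforcement learning algorithm. The "game" below is a hypothetical five-cell corridor: reaching the last cell earns +1, and every other move costs a small punishment of -0.01. No labels are given; the agent discovers from experience that moving right is best.

```python
import random

N_STATES = 5          # cells 0..4 of a hypothetical one-dimensional game board
GOAL = N_STATES - 1   # reaching the last cell ends the episode
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True     # reward for reaching the goal
    return next_state, -0.01, False      # small punishment for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The discount factor `gamma` makes shorter paths to the goal worth more, which is why the learned policy heads straight for the reward instead of wandering.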