Machine learning assignment (3)
1, What is meant by input and output variables in ML? Which is the dependent and which is the
independent variable?
Input Variables
Output Variable
The input variable is the independent variable: it is treated as given and is not determined by
the output variable.
The output variable is the dependent variable: it depends on the independent variables. In
ML, the goal is to find a relationship or function that maps the independent variables to
the dependent variable.
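As a small illustration of this split, the sketch below separates a toy dataset into inputs and an output. The column meanings (hours studied, hours slept, exam score) and the values are hypothetical and chosen only for the example.

```python
# Hypothetical toy dataset: each row is (hours_studied, hours_slept, exam_score).
rows = [
    (2.0, 7.0, 55.0),
    (4.0, 6.5, 68.0),
    (6.0, 8.0, 81.0),
]

# Independent (input) variables: every column except the last one.
X = [list(row[:-1]) for row in rows]

# Dependent (output) variable: the last column, which the model should predict.
y = [row[-1] for row in rows]

print("X =", X)
print("y =", y)
```

A model trained on this data would try to learn a function f such that f(X[i]) is close to y[i].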
Model
Algorithm
Parametric ML algorithm
Non-Parametric ML Algorithms
These algorithms do not make strong assumptions about the data and have no fixed
number of parameters. The complexity of the model can grow with the amount of data.
Non-parametric algorithms use a flexible number of parameters, and the number of
parameters often grows as the model learns from more data.
Model Structure: The model complexity can grow with the amount of data (e.g., the
number of training samples).
Training Speed: Typically slower to train, especially with large datasets, since they may
involve storing the entire dataset or a large portion of it.
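The difference in model size can be sketched in a few lines. This is a minimal illustration, not a full implementation: it assumes a straight-line fit as the parametric example and a 1-nearest-neighbor lookup as the non-parametric example, and the function names are made up for this sketch.

```python
def fit_line(xs, ys):
    """Parametric: always returns exactly two parameters (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_nearest_neighbor(xs, ys):
    """Non-parametric: the 'model' is the stored training set itself."""
    return list(zip(xs, ys))


def predict_nearest_neighbor(model, x):
    # Return the target value of the closest stored training point.
    return min(model, key=lambda point: abs(point[0] - x))[1]


small_x, small_y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
large_x = [float(i) for i in range(1, 101)]
large_y = [2.0 * x for x in large_x]

line_small = fit_line(small_x, small_y)
line_large = fit_line(large_x, large_y)
knn_small = fit_nearest_neighbor(small_x, small_y)
knn_large = fit_nearest_neighbor(large_x, large_y)

# The line keeps 2 numbers regardless of data size; the neighbor model grows.
print(len(line_small), len(line_large))
print(len(knn_small), len(knn_large))
```

The stored-data design is also why non-parametric methods tend to be slower on large datasets: every prediction here scans all stored points.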
4, Write the difference between overfitting and underfitting. Explain the causes of
overfitting and underfitting.
Overfitting
Overfitting refers to a model that learns the training data too well but does not generalize well to new
data.
High accuracy on the training dataset but poor performance on the validation/test dataset.
The model is overly complex, often having too many parameters relative to the amount of
training data.
It reflects a situation where the model memorizes the training data instead of learning
general patterns.
Overfitting occurs when a model learns the details and noise in the training data to the
extent that it negatively impacts its performance on new data. In essence, the model
becomes too complex and captures patterns that do not generalize.
Underfitting
Underfitting refers to a model that can neither fit the training data well nor generalize to new
data. It fails to learn the problem from the training data sufficiently.
Caused by Too-Simple Models: Using overly simplistic models (e.g., a linear model
for a non-linear relationship) which cannot capture the data's complexity.
Caused by Insufficient Training: Not training the model long enough for it to learn
from the training data effectively.
Caused by Excessive Feature Reduction: Removing too many features can lead to
loss of important information necessary for making accurate predictions.
Underfitting occurs when a model is too simple to capture the underlying structure of the data.
It fails to learn the relationships in the data, leading to poor performance on both the training
and test datasets.
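The contrast between the two failure modes can be made concrete by comparing training and test error for models of different complexity. The sketch below is illustrative only: it assumes synthetic data generated from y = 3x + 1 with Gaussian noise, uses a constant predictor as the underfitting model, a least-squares line as the well-matched model, and a nearest-neighbor memorizer as the overfitting model.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_data(n):
    # Assumed ground truth for illustration: y = 3x + 1 plus Gaussian noise.
    xs = [random.uniform(0.0, 10.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 + random.gauss(0.0, 2.0) for x in xs]
    return xs, ys


def mse(predict, xs, ys):
    # Mean squared error of a prediction function on a dataset.
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(30)
test_x, test_y = make_data(30)

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    # Underfitting: ignores x and always predicts the training mean.
    return mean_y


# Well-matched complexity: least-squares straight line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x


def linear_model(x):
    return slope * x + intercept


def memorizing_model(x):
    # Overfitting: returns the (noisy) target of the nearest training point.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [("constant", constant_model),
                    ("linear", linear_model),
                    ("memorizer", memorizing_model)]:
    print(f"{name:10s} train MSE = {mse(model, train_x, train_y):8.2f}  "
          f"test MSE = {mse(model, test_x, test_y):8.2f}")
```

The memorizer reproduces every training target exactly, so its training error is zero, yet it also reproduces the noise; the constant model is poor on both sets. A large gap between training and test error is the usual sign of overfitting, while high error on both suggests underfitting.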
Using data effectively in machine learning (ML) is crucial for building models that generalize well
to unseen data. The process involves several steps:
1, Data Collection:
Gather data from various sources, which can include databases, external APIs, web
scraping, or existing datasets.
Ensure that the data is relevant to the problem you're trying to solve.
2, Data Understanding:
Explore and analyze the data to understand its structure and characteristics.
3, Data Cleaning:
4, Feature Selection and Engineering:
Select relevant features that contribute to the prediction of the target variable.
Create new features from existing ones (feature engineering) to improve model
performance. This might involve combining, transforming, or encoding variables.
5, Data Splitting:
Divide the dataset into training, validation, and test sets. Common splits are 70% for
training, 15% for validation, and 15% for testing.
6, Model Training:
Choose an appropriate algorithm and train the model using the training data.
Adjust the model's parameters to minimize prediction errors.
7, Model Evaluation:
8, Model Tuning:
9, Deployment:
Once the model is trained and evaluated, it can be deployed to make predictions on new
data in a production environment.
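The data splitting step above (70% training, 15% validation, 15% testing) can be sketched as follows. The function name `split_dataset`, its parameters, and the seed value are assumptions made for this illustration; in practice a library helper would often be used instead.

```python
import random


def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle a copy of the rows and cut it into train/validation/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle so each part is representative

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


data = list(range(100))  # stand-in for 100 data records
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

The model is fitted on `train`, tuned against `val`, and scored once on `test`, so the test score remains an estimate of performance on unseen data.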
Attributes or features are individual measurable properties or characteristics of the data being used in
the machine learning model. They are the input variables that the model uses to make predictions.
Types of Features:
Numerical Features: Numeric values (e.g., age, temperature, salary) that
can be further categorized into:
o Continuous: Values can take on any real number (e.g., height, price).
o Discrete: Countable values (e.g., number of children, number of cars).
Categorical Features: Represent discrete categories or groups (e.g., gender, color, city).
They can be further divided into:
o Nominal: No inherent order (e.g., red, blue, green).
o Ordinal: There is an order or ranking (e.g., ratings from 1 to 5).
Binary Features: A specific type of categorical feature that has only two values (e.g.,
yes/no, true/false).
etc
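Because most models expect numbers, each feature type above is usually converted in a different way. The sketch below is one possible encoding under stated assumptions: the record, the category lists, and the `encode` function are hypothetical. Nominal values are one-hot encoded, ordinal values are mapped to their rank, and binary values become 0 or 1.

```python
# Hypothetical record containing one feature of each type.
record = {"age": 34, "color": "blue", "rating": "medium", "subscribed": "yes"}

COLORS = ["red", "blue", "green"]                  # nominal: no inherent order
RATING_ORDER = {"low": 0, "medium": 1, "high": 2}  # ordinal: order matters


def encode(rec):
    features = [float(rec["age"])]                              # numerical: used as-is
    features += [1.0 if rec["color"] == c else 0.0 for c in COLORS]  # one-hot
    features.append(float(RATING_ORDER[rec["rating"]]))         # ordinal rank
    features.append(1.0 if rec["subscribed"] == "yes" else 0.0)  # binary flag
    return features


encoded = encode(record)
print(encoded)
```

One-hot encoding is used for the nominal feature so the model does not read a false ordering into the colors.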
Traditional ML algorithms require carefully handcrafted features, a process called feature engineering. They rely on
external feature extraction algorithms, and the extracted features depend on which algorithms are used.
Feature Engineering is a crucial step in the machine learning (ML) process that involves creating,
selecting, and transforming features (attributes) from raw data to improve the performance of machine
learning models. The goal of feature engineering is to provide the models with the most informative and
relevant data, enabling them to make better predictions or classifications.
Feature engineering is an iterative and creative process that requires domain knowledge, analytical
skills, and a deep understanding of the data. It plays an essential role in building effective machine
learning models and is often what distinguishes successful models from those that fail to perform well.
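As a minimal sketch of creating and transforming features, the example below derives new features from a raw record. The field names (`height_cm`, `weight_kg`, `signup_date`), the values, and the function `engineer_features` are hypothetical and used only for illustration.

```python
from datetime import date


def engineer_features(raw):
    """Derive new, more informative features from raw fields."""
    height_m = raw["height_cm"] / 100.0
    signup = date.fromisoformat(raw["signup_date"])
    return {
        # Combining two raw features into one (body mass index).
        "bmi": raw["weight_kg"] / height_m ** 2,
        # Transforming a date string into model-friendly parts.
        "signup_month": signup.month,
        "signup_is_weekend": signup.weekday() >= 5,  # 5 = Saturday, 6 = Sunday
    }


raw = {"height_cm": 180.0, "weight_kg": 81.0, "signup_date": "2024-03-16"}
features = engineer_features(raw)
print(features)
```

Which derived features actually help depends on the problem, which is why domain knowledge matters in this step.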
TO INSTRUCTOR SIMON H.
DUE DATE DECEMBER 20