
MU -- MIT

MACHINE LEARNING - INDIVIDUAL ASSIGNMENT


NAME - SOLOMON ABRHA
ID - MIT/UR/122/12

1. What is meant by input and output variables in ML? Which is the dependent and which is the
independent variable?

Input Variables

 Also known as features, predictors, or independent variables.
 These are the variables or attributes used to predict the output variable.
 They are the inputs to the model and represent the information or data you provide to the
algorithm for analysis or prediction.
 Example: In a house price prediction problem, input variables could be:

o Number of bedrooms
o Square footage
o Location of the house

Output Variable

 Also known as the target variable, label, or dependent variable.
 This is the variable the model is trying to predict or explain.
 It is dependent on the input variables since its value is determined by them.
 Example: In the same house price prediction problem, the output variable is:
o Price of the house

 The input variable is the independent variable: it stands alone and is not affected by other
variables in the dataset.
 The output variable is the dependent variable: it depends on the independent variables. In
ML, the goal is to find a relationship or function that maps the independent variables to
the dependent variable.
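
As a concrete illustration of the house price example above, here is a minimal sketch in Python (assuming scikit-learn is installed; the feature names and numbers are invented for illustration):

from sklearn.linear_model import LinearRegression

# Input variables (features / independent variables):
# [number of bedrooms, square footage]
X = [
    [2, 800],
    [3, 1200],
    [4, 1500],
    [3, 1000],
]

# Output variable (target / dependent variable): the price of each house
y = [150_000, 220_000, 280_000, 190_000]

# The model learns a mapping from the independent variables X
# to the dependent variable y.
model = LinearRegression()
model.fit(X, y)

# Predict the price of a new, unseen house (3 bedrooms, 1100 square feet).
print(model.predict([[3, 1100]]))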

2. Write the difference between a model and an algorithm.

Model

 A model is a mathematical representation of a real-world process. It is the result of applying an
algorithm to data and is used to make predictions or decisions based on new data.
 A model can make predictions on unseen data based on the patterns it has learned.
 A model is the final output of an ML algorithm after it has been trained on data. It represents
the learned patterns, relationships, or rules that the algorithm discovered in the training data.
 Examples include linear regression models, decision trees, neural networks, etc.

Algorithm

 An algorithm is a set of rules or instructions for solving a problem or performing a
task. In the context of machine learning, it refers to the procedure used to learn the
model from data.
 An algorithm is a set of mathematical instructions or a procedure that is used to train a model
by finding patterns in data.
 It is used to process data, optimize the model parameters, and derive the model.
 Examples include gradient descent, random forest, support vector machines, etc.
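
To make the distinction concrete, here is a small sketch in plain Python with NumPy (the data is invented), in which gradient descent is the algorithm and the learned parameters are the model:

import numpy as np

# The ALGORITHM: gradient descent, a procedure for finding good parameters.
def gradient_descent(X, y, lr=0.05, steps=5000):
    w, b = 0.0, 0.0
    n = len(y)
    for _ in range(steps):
        pred = w * X + b
        # Gradients of the mean squared error with respect to w and b
        dw = (2 / n) * np.sum((pred - y) * X)
        db = (2 / n) * np.sum(pred - y)
        w -= lr * dw
        b -= lr * db
    return w, b

# Training data (invented numbers, roughly y = 2x + 1)
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# The MODEL: the learned parameters (w, b) produced by running the algorithm on the data.
w, b = gradient_descent(X, y)

print(f"learned model: y = {w:.2f}x + {b:.2f}")
print("prediction for x = 5:", w * 5 + b)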

3. Write the difference between parametric and non-parametric ML algorithms.

Parametric ML Algorithms

 A parametric algorithm has a fixed number of parameters.
 Parametric methods make strong assumptions about the mapping of the input variables to the
output variable.
 Parametric machine learning algorithms simplify the mapping to a known functional form.
 Model structure: defined by a fixed number of parameters (e.g., coefficients in
linear regression).
 Training speed: generally faster to train because they require estimating a limited
number of parameters.

Non-Parametric ML Algorithms

 These algorithms do not make strong assumptions about the data and have no fixed
number of parameters. The complexity of the model can grow with the amount of data.
 Non-parametric algorithms use a flexible number of parameters, and the number of
parameters often grows as the algorithm learns from more data.
 Model structure: the model complexity can grow with the amount of data (e.g., the
number of training samples).
 Training speed: typically slower to train, especially with large datasets, since they may
involve storing the entire dataset or a large portion of it.
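
A brief sketch of the contrast, assuming scikit-learn is available and using invented data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Parametric: linear regression compresses the data into a FIXED number of
# parameters (one coefficient and one intercept), no matter how many samples we add.
linear = LinearRegression().fit(X, y)
print("fixed parameters:", linear.coef_, linear.intercept_)

# Non-parametric: k-nearest neighbours keeps the training samples themselves;
# its effective complexity grows with the size of the training set.
knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)
print("prediction from stored neighbours:", knn.predict([[2.5]]))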

4. Write the difference between overfitting and underfitting. Explain the causes of
overfitting and underfitting.

Overfitting

 Overfitting refers to a model that learns the training data too well but does not generalize well to
new data.
 High accuracy on the training dataset but poor performance on the validation/test dataset.
 The model is overly complex, often having too many parameters relative to the amount of
training data.
 It reflects a situation where the model memorizes the training data instead of learning
general patterns.
 Overfitting occurs when a model learns the details and noise in the training data to the
extent that it negatively impacts its performance on new data. In essence, the model
becomes too complex and captures patterns that do not generalize.

Underfitting

 Underfitting refers to a model that can neither fit the training data well nor generalize to new
data. It fails to learn the problem from the training data sufficiently.
 Caused by overly simple models: using models that are too simplistic (e.g., a linear model
for a non-linear relationship), which cannot capture the data's complexity.
 Caused by insufficient training: not training the model long enough for it to learn
from the training data effectively.
 Caused by excessive feature reduction: removing too many features can lead to the
loss of important information necessary for making accurate predictions.
 Underfitting occurs when a model is too simple to capture the underlying structure of the data.
It fails to learn the relationships in the data, leading to poor performance on both the training
and test datasets.
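
The following sketch (assuming scikit-learn; the synthetic data is purely illustrative) shows underfitting and overfitting by varying the degree of a polynomial regression model:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 4, 15):   # too simple, reasonable, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}: "
          f"train score = {model.score(X_train, y_train):.2f}, "
          f"test score = {model.score(X_test, y_test):.2f}")

# Degree 1 tends to underfit (both scores low); degree 15 tends to overfit
# (train score much higher than test score).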

5. Explain the way data is used in ML. Describe attributes or features.

 Using data effectively in machine learning (ML) is crucial for building models that generalize well
to unseen data. The process involves several steps:

1. Problem Understanding and Data Collection:

 Gather data from various sources, which can include databases, external APIs, web
scraping, or existing datasets.
 Ensure that the data is relevant to the problem you're trying to solve.

2. Data Understanding:

 Explore and analyze the data to understand its structure and characteristics.

3. Data Cleaning:

 Handle missing values, duplicates, and outliers.


 Correct inconsistencies and format the data properly to ensure quality inputs for the
model.
4. Feature Selection and Engineering:

 Select relevant features that contribute to the prediction of the target variable.
 Create new features from existing ones (feature engineering) to improve model
performance. This might involve combining, transforming, or encoding variables.

5. Data Splitting:

 Divide the dataset into training, validation, and test sets. Common splits are 70% for
training, 15% for validation, and 15% for testing (a short splitting sketch is shown after step 10 below).

6. Model Training:

 Choose an appropriate algorithm and train the model using the training data.
 Adjust the model's parameters to minimize prediction errors.

7. Model Evaluation:

 Test the trained model using the validation/test dataset.


 Use performance metrics (like accuracy, precision, recall, F1-score, etc.) to evaluate how
well the model predicts on unseen data.

8. Model Tuning:

 Fine-tune the model's hyperparameters, structure, or features based on evaluation results.


 This may involve techniques such as cross-validation.

9. Deployment:

 Once the model is trained and evaluated, it can be deployed to make predictions on new
data in a production environment.

10. Monitoring and Maintenance:

 Continuously monitor the model's performance over time.


 Update the model and data as necessary to ensure it remains relevant and accurate.
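
As a concrete sketch of step 5 (data splitting), assuming scikit-learn and using placeholder data, a 70/15/15 split could be obtained like this:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 3)   # 100 samples, 3 features (placeholder data)
y = np.random.rand(100)      # target variable (placeholder data)

# First carve off 70% for training.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=42)
# Then split the remaining 30% evenly into validation (15%) and test (15%).
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # roughly 70 / 15 / 15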

Attributes or features are individual measurable properties or characteristics of the data being used in
the machine learning model. They are the input variables that the model uses to make predictions.

Types of Features:
 Numerical Features: Continuous numerical values (e.g., age, temperature, salary) that
can be further categorized into:
o Continuous: Values can take on any real number (e.g., height, price).
o Discrete: Countable values (e.g., number of children, number of cars).
 Categorical Features: Represent discrete categories or groups (e.g., gender, color, city).
They can be further divided into:
o Nominal: No inherent order (e.g., red, blue, green).
o Ordinal: There is an order or ranking (e.g., ratings from 1 to 5).
 Binary Features: A specific type of categorical feature that has only two values (e.g.,
yes/no, true/false).
 Other types of features also exist.
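
A small sketch of how these feature types might appear in a dataset and how a nominal feature could be encoded (assuming pandas; the values are invented):

import pandas as pd

df = pd.DataFrame({
    "age": [25, 40, 31],               # numerical, continuous
    "num_children": [0, 2, 1],         # numerical, discrete
    "color": ["red", "blue", "green"], # categorical, nominal (no order)
    "rating": [1, 5, 3],               # categorical, ordinal (ranked 1 to 5)
    "is_member": [True, False, True],  # binary
})

# Nominal categories have no inherent order, so one-hot encoding is a common choice.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded)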

6. What is feature engineering?

Traditional ML algorithms require carefully handcrafted features; this process is called feature engineering.
It uses external feature extraction algorithms, and the extracted features depend on the algorithms used.

Feature Engineering is a crucial step in the machine learning (ML) process that involves creating,
selecting, and transforming features (attributes) from raw data to improve the performance of machine
learning models. The goal of feature engineering is to provide the models with the most informative and
relevant data, enabling them to make better predictions or classifications.

Feature engineering is an iterative and creative process that requires domain knowledge, analytical
skills, and a deep understanding of the data. It plays an essential role in building effective machine
learning models and is often what distinguishes successful models from those that fail to perform well.
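
As a small illustration of feature engineering (assuming pandas and NumPy; the column names and values are invented), new features can be derived from raw columns like this:

import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "price": [150_000, 220_000, 280_000],
    "square_feet": [900, 1400, 1800],
    "sale_date": pd.to_datetime(["2021-03-01", "2021-07-15", "2021-12-20"]),
})

features = raw.copy()
# Combining existing columns into a new ratio feature.
features["price_per_sqft"] = features["price"] / features["square_feet"]
# Extracting part of a datetime column into a separate feature.
features["sale_month"] = features["sale_date"].dt.month
# Transforming a skewed numeric feature with a log transform.
features["log_price"] = np.log(features["price"])

print(features)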

TO INSTRUCTOR SIMON H.
DUE DATE DECEMBER 20
