
Machine Learning Notes

This document provides an overview of machine learning lifecycles and basic terminology. It discusses the 11 steps of a typical machine learning lifecycle including problem definition, data selection, modeling, evaluation and deployment. It also defines common terms like features, datasets, dependent and independent variables. Additionally, it covers topics like data preprocessing, transformation, univariate and multivariate analysis, and model selection.

Uploaded by

Nikhita Nair
Copyright
© All Rights Reserved

Machine Learning Lifecycle:
1. Problem Definition: Defining the project requirements and business requirements.
Defining data requirements and modules.

2. Data Selection: Collect and prepare all of the relevant data from the dataset for use in
machine learning.

3. Descriptive Statistics: Descriptive statistics are used to describe or summarize the
characteristics of a sample or data set.

4. Exploratory Data Analysis: Analyzing the data to find hidden patterns in the dataset.

5. Data Preprocessing: Data cleaning, imputing (filling in missing data) and getting
more useful and relevant data.

6. Data Transformation: Transforming the relevant data into an appropriate form,
using encoding techniques (e.g. one-hot), scaling, etc.

7. Feature Selection: Selecting useful and informative features (attributes) and
eliminating irrelevant features, so that only the best subset of features is used by the model.

8. Model Selection: Selecting the right algorithm (model) based on the variables.

9. Model Training: Split the data using the 80-20 rule (80% training data, 20% test data) and
work on getting the maximum accuracy in the training stage.

10. Model Evaluation: Model evaluation aims to estimate the generalization accuracy
of a model on future (unseen/out-of-sample) data.

11. Model Deployment: The process of taking a trained ML model and making its
predictions available to users or other systems is known as deployment.
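
Steps 9 and 10 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the helper names (train_test_split, fit_centroids, predict, accuracy) and the nearest-centroid "model" are all hypothetical and are not taken from the notes.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """'Training' step: store the mean feature value of each label."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for x, label in rows if predict(centroids, x) == label)
    return correct / len(rows)

# Hypothetical toy data: (height_cm, label)
data = [(150 + i, "short") for i in range(10)] + [(180 + i, "tall") for i in range(10)]

train, test = train_test_split(data)   # 80-20 rule
model = fit_centroids(train)           # step 9: model training
test_accuracy = accuracy(model, test)  # step 10: evaluation on unseen data
print(len(train), len(test), test_accuracy)
```

The accuracy is measured on the 20% that the model never saw during training, which is what makes it an estimate of generalization rather than of memorization.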

Basic Terminologies:
Feature matrix/Data Matrix:

Matrix of all features

Features/Attributes:

Columns in a dataset

N-dimensional array/Data points:

Rows in a dataset

Dataset:

Set of data used for training a model

Dependent/Output (y-axis) variable:

Variable which is output or predicted by a trained model

Independent/Input (x-axis) variable:

Variable which is used as input to a model

Target:

The variable to be predicted (another name for the dependent variable)
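
As a small illustration of these terms, the sketch below splits a toy dataset into a feature matrix X and a target y. The column names and values are hypothetical and chosen only for the example.

```python
# Hypothetical dataset: each dict is one row (data point).
dataset = [
    {"height": 170, "weight": 65, "gender": "F"},
    {"height": 182, "weight": 80, "gender": "M"},
    {"height": 158, "weight": 52, "gender": "F"},
]

feature_names = ["height", "weight"]  # independent/input variables (columns)
target_name = "gender"                # dependent/output variable (target)

# Feature matrix: one row per data point, one column per feature.
X = [[row[name] for name in feature_names] for row in dataset]
# Target: the value to be predicted for each row.
y = [row[target_name] for row in dataset]

print(X)
print(y)
```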

Types of Data:
Continuous variables- Always numeric; can take any value within a range (infinitely many possible values), eg: height, score

Discrete variables- Numeric or categorical, countable and finite, eg: number of
fruits, gender, pincode, etc.

VLOOKUP() in Excel:
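VLOOKUP() looks up a value in the first column of a table and returns a value from the same row. It is used for merging tables together and fetching data from multiple tables.

VLOOKUP(search criterion; array; index; sort)

index: number of the column in the array whose value is returned
sort: whether the first column of the array is sorted in ascending order; if it is not, only an exact match is returned

eg: VLOOKUP(State_ID; userState.A2:An; index; sort)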
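
The same kind of exact-match lookup can be sketched in Python. The vlookup helper and the two toy tables below are hypothetical; the sketch returns None when nothing matches, whereas a spreadsheet would show an error value.

```python
def vlookup(search_value, table, index):
    """Return column `index` (1-based) of the first row whose first
    column equals search_value, or None if there is no such row."""
    for row in table:
        if row[0] == search_value:
            return row[index - 1]
    return None

# Hypothetical lookup table: (State_ID, state name)
user_state = [(1, "Kerala"), (2, "Goa"), (3, "Punjab")]
# Hypothetical main table: (name, State_ID)
users = [("Asha", 2), ("Ravi", 1), ("Meera", 3)]

# "Merge" the two tables by fetching the state name for every user.
merged = [(name, vlookup(state_id, user_state, 2)) for name, state_id in users]
print(merged)
```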

Types of Data Analysis:


UNIVARIATE ANALYSIS:

only using one feature

BIVARIATE ANALYSIS:

numeric vs numeric

categoric vs categoric

numeric vs categoric

MULTIVARIATE ANALYSIS:

using multiple features for doing analysis

min()- returns the minimum value in a particular dataset


An outlier is any data point that lies outside the expected range of your dataset. Anything below the
lower limit or above the upper limit is an outlier.

Upper limit = Q3 + 1.5*IQR
Lower limit = Q1 - 1.5*IQR
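
A minimal sketch of these limits in Python, assuming quartiles are computed with statistics.quantiles and its "inclusive" method (other quartile conventions give slightly different limits). The function names and sample values are hypothetical.

```python
import statistics

def iqr_limits(data):
    """Return (lower limit, upper limit) using the 1.5*IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """Return every value below the lower limit or above the upper limit."""
    lower, upper = iqr_limits(data)
    return [x for x in data if x < lower or x > upper]

values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
lower_limit, upper_limit = iqr_limits(values)
outliers = find_outliers(values)
print(lower_limit, upper_limit, outliers)
```

Here Q1 = 3 and Q3 = 7, so IQR = 4 and the limits are -3 and 13; only 100 falls outside them.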

avg() is used for calculating the mean

median() is used for calculating the median

Coefficient of dispersion based on range: (max-min)/(max+min)


Coefficient of dispersion based on mean deviation: mean deviation/mean

Coefficient of dispersion based on quartile deviation: (Q3-Q1)/(Q3+Q1)
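
The three coefficients can be computed as in the sketch below. It assumes the "inclusive" quartile method of statistics.quantiles; the function names and the sample scores are hypothetical.

```python
import statistics

def coefficient_of_range(data):
    """(max - min) / (max + min)"""
    return (max(data) - min(data)) / (max(data) + min(data))

def coefficient_of_mean_deviation(data):
    """Mean absolute deviation from the mean, divided by the mean."""
    mean = statistics.mean(data)
    mean_deviation = sum(abs(x - mean) for x in data) / len(data)
    return mean_deviation / mean

def coefficient_of_quartile_deviation(data):
    """(Q3 - Q1) / (Q3 + Q1)"""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)

scores = [10, 20, 30, 40, 50]
print(coefficient_of_range(scores))               # (50-10)/(50+10)
print(coefficient_of_mean_deviation(scores))      # 12/30
print(coefficient_of_quartile_deviation(scores))  # (40-20)/(40+20)
```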

Quartiles divide the sorted data into 4 equal parts:
Q2=median

Q1=25%, Q2=50%, Q3=75%, Q4=100%


QUARTILE()
IQR (interquartile range)

Q3-Q1=IQR

Quartile deviation=IQR/2

Frequency table
-Divides the data into particular ranges (classes)

-FREQUENCY(data; classes)

-returns an array of counts
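
A frequency table of this kind can be sketched as follows. The frequency helper is a hypothetical stand-in that treats each class value as an inclusive upper bound and adds one extra count for values above the last class; the ages are made-up sample data.

```python
def frequency(data, classes):
    """Count how many values fall into each class.

    Each class value is an inclusive upper bound. The returned list has one
    extra entry at the end for values greater than the last bound.
    """
    bounds = sorted(classes)
    counts = [0] * (len(bounds) + 1)
    for value in data:
        for i, upper in enumerate(bounds):
            if value <= upper:
                counts[i] += 1
                break
        else:
            # No bound was large enough: the value goes in the overflow bin.
            counts[-1] += 1
    return counts

ages = [12, 25, 31, 47, 52, 68, 70]
table = frequency(ages, [20, 40, 60])
print(table)  # counts for <=20, 21-40, 41-60, >60
```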

Pivot table for univariate categorical

A pie chart is used to show how parts make up the whole (100%) of the data

Bivariate Numeric vs Numeric

Correlation is how two variables are related

Correlation ranges from -1 to 1

1=the two variables are highly positively correlated

-1=highly negatively correlated (inversely)

0=no correlation
R-square is the square of the correlation

Trendline is the line of best fit

f(x) is the equation of the trendline (y=mx+c) shown on the graph
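
The correlation, R-square and trendline can be computed by hand as in this sketch. It uses the Pearson correlation and an ordinary least-squares line; the function names and the sample points are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def trendline(xs, ys):
    """Least-squares line of best fit: returns (m, c) for y = m*x + c."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # lies exactly on y = 2x + 1

r = pearson_r(xs, ys)
r_squared = r ** 2
m, c = trendline(xs, ys)
print(r, r_squared, m, c)
```

Because the sample points lie exactly on a rising line, the correlation is 1 and the trendline recovers m = 2 and c = 1.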

Bivariate categorical vs categorical

eg: gender and state

Bivariate numeric vs categorical

eg: weight and gender

Multivariate: analysis on multiple variables

eg: the average height and weight for each state and each gender
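
A grouped average like this one can be sketched with a dictionary keyed by (state, gender). The rows and field names are hypothetical sample data.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: one dict per person.
people = [
    {"state": "Kerala", "gender": "F", "height": 160, "weight": 55},
    {"state": "Kerala", "gender": "F", "height": 170, "weight": 65},
    {"state": "Kerala", "gender": "M", "height": 178, "weight": 74},
    {"state": "Goa", "gender": "M", "height": 175, "weight": 70},
]

# Group the rows by the two categorical features.
groups = defaultdict(list)
for person in people:
    groups[(person["state"], person["gender"])].append(person)

# Average the two numeric features inside every group.
averages = {
    key: {
        "height": statistics.mean(p["height"] for p in rows),
        "weight": statistics.mean(p["weight"] for p in rows),
    }
    for key, rows in groups.items()
}

for key, value in averages.items():
    print(key, value)
```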

CONCATENATE(col1;" ";col2;...;coln)-concatenating columns, like names having more
than 1 word

removing inconsistencies from tables: PROPER(TRIM(text))-making the text proper case and
removing extra spaces

UPPER()-uppercase and LOWER()-lowercase

combine TRIM with other functions for removing extra spaces
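
Rough Python counterparts of these text functions are sketched below. The helper names are hypothetical, and str.title() is only an approximation of proper case (it can differ for words containing apostrophes or digits).

```python
def trim(text):
    """Remove leading/trailing spaces and collapse repeated inner spaces."""
    return " ".join(text.split())

def proper(text):
    """Trim the text, then capitalize the first letter of every word."""
    return trim(text).title()

def concatenate(*parts):
    """Join the given parts into one string, in order."""
    return "".join(parts)

first, last = "  niKHita ", "  nair"
full_name = proper(concatenate(first, " ", last))
print(full_name)
print(trim("  extra   spaces  ").upper())
print(trim("  Extra   Spaces  ").lower())
```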

Removing duplicates: use advanced filters with the "no duplication" option checked

Imputation: filling in missing data, using the average (mean), median or mode of the column;
if 70 to 80% of a column is NA, don't use that column for the model, since almost all of its values would be filled in rather than observed
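
A minimal sketch of imputation, assuming missing values are represented by None. The function names and the sample column are hypothetical.

```python
import statistics

def missing_fraction(column):
    """Fraction of the column that is missing (None)."""
    return sum(1 for v in column if v is None) / len(column)

def impute(column, strategy="mean"):
    """Fill missing values with the mean, median or mode of the observed ones."""
    observed = [v for v in column if v is not None]
    fill_functions = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    fill_value = fill_functions[strategy](observed)
    return [fill_value if v is None else v for v in column]

heights = [160, None, 170, 180, None]
print(missing_fraction(heights))   # share of NA values in the column
print(impute(heights, "mean"))     # NA replaced by the column mean
print(impute(heights, "median"))   # NA replaced by the column median
```

Checking missing_fraction first makes it easy to decide whether a column is worth imputing at all or should be dropped.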
Outliers: Anything below the lower limit or above the upper limit; UL=Q3+1.5*IQR, LL=Q1-1.5*IQR

Normalization: rescaling the data to a common format in the range of 0 to 1

(X-min)/(max-min)

X: value to be normalized
min: minimum of X's column

max: maximum of X's column

max-min>=X-min, so the result never exceeds 1
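
Min-max normalization of one column can be sketched as below. The function name and sample values are hypothetical, and returning 0.0 for a constant column is an assumption made here to avoid dividing by zero.

```python
def min_max_normalize(column):
    """Rescale every value to the range 0 to 1 using (X - min) / (max - min)."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # Assumption: a constant column has no spread, so map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lowest) / (highest - lowest) for x in column]

values = [10, 20, 30, 50]
normalized = min_max_normalize(values)
print(normalized)
```

The smallest value maps to 0 and the largest to 1, with everything else in between.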

Standardization:
Regression, linear regression, correlation
