Artificial Intelligence (AI)

Machine Learning
1. Definition of Machine Learning

Machine learning is a branch of artificial intelligence (AI) and computer science which
focuses on the use of data and algorithms to imitate the way that humans learn,
gradually improving its accuracy.
2. Types of Machine Learning with definition and example
Classical machine learning is often categorized by how an algorithm learns to become
more accurate in its predictions. There are four basic approaches:
supervised learning, unsupervised learning, semi-supervised learning and
reinforcement learning. The type of algorithm data scientists choose to use depends on
what type of data they want to predict.
Supervised learning: In this type of machine learning, data scientists supply
algorithms with labeled training data and define the variables they want the algorithm
to assess for correlations. Both the input and the output of the algorithm are specified.
Unsupervised learning: This type of machine learning involves algorithms that train on
unlabeled data. The algorithm scans through data sets looking for any meaningful
connection. The data that algorithms train on as well as the predictions or
recommendations they output are predetermined.
Semi-supervised learning: This approach to machine learning involves a mix of the two
preceding types. Data scientists typically feed an algorithm a small amount of labeled
training data together with a larger amount of unlabeled data, and the model is free to
explore the data on its own and develop its own understanding of the data set.
Reinforcement learning: Data scientists typically use reinforcement learning to teach
a machine to complete a multi-step process for which there are clearly defined rules.
Data scientists program an algorithm to complete a task and give it positive or negative
cues as it works out how to complete a task. But for the most part, the algorithm decides
on its own what steps to take along the way.
3. How does supervised machine learning work?
Supervised machine learning requires the data scientist to train the algorithm with both
labeled inputs and desired outputs. Supervised learning algorithms are good for the
following tasks:
Binary classification: Dividing data into two categories.
Multi-class classification: Choosing between more than two types of answers.
Regression modeling: Predicting continuous values.
Ensembling: Combining the predictions of multiple machine learning models to
produce an accurate prediction.
4. How does unsupervised machine learning work?
Unsupervised machine learning algorithms do not require data to be labeled. They sift
through unlabeled data to look for patterns that can be used to group data points into
subsets. Some types of deep learning, such as autoencoders, are unsupervised
algorithms, although many neural networks are trained with labeled data.
Unsupervised learning algorithms are good for the following tasks:
Clustering: Splitting the dataset into groups based on similarity.
Anomaly detection: Identifying unusual data points in a data set.
Association mining: Identifying sets of items in a data set that frequently occur
together.
Dimensionality reduction: Reducing the number of variables in a data set.
5. What do you mean labeled data
In machine learning, data labeling is the process of identifying raw data (images, text
files, videos, etc.) and adding one or more meaningful and informative labels to provide
context so that a machine learning model can learn from it. For example, labels might
indicate whether a photo contains a bird or car, which words were uttered in an audio
recording, or if an x-ray contains a tumor. Data labeling is required for a variety of use
cases including computer vision, natural language processing, and speech recognition.
6. Why need Machine Learning
Machine learning is important because it gives enterprises a view of trends in customer
behavior and business operational patterns, as well as supports the development of new
products. Many of today's leading companies, such as Facebook, Google and Uber, make
machine learning a central part of their operations. Machine learning has become a
significant competitive differentiator for many companies.
In addition to recommendation engines, other uses for machine learning include the
following:
Customer relationship management. CRM software can use machine learning models
to analyze email and prompt sales team members to respond to the most important
messages first. More advanced systems can even recommend potentially effective
responses.
Business intelligence. BI and analytics vendors use machine learning in their software
to identify potentially important data points, patterns of data points and anomalies.
Human resource information systems. HRIS systems can use machine learning
models to filter through applications and identify the best candidates for an open
position.
Self-driving cars. Machine learning algorithms can even make it possible for a semi-
autonomous car to recognize a partially visible object and alert the driver.
Virtual assistants. Smart assistants typically combine supervised and unsupervised
machine learning models to interpret natural speech and supply context.
7. How works Machine Learning
The early stages of machine learning (ML) saw experiments involving theories of
computers recognizing patterns in data and learning from them. Today, after building
upon those foundational experiments, machine learning is more complex.
While machine learning algorithms have been around for a long time, the ability to apply
complex algorithms to big data applications more rapidly and effectively is a more recent
development. Being able to do these things with some degree of sophistication can set a
company ahead of its competitors.
Machine learning is a form of artificial intelligence (AI) that teaches computers to think
in a similar way to how humans do: Learning and improving upon past experiences. It
works by exploring data and identifying patterns, and involves minimal human
intervention.
Almost any task that can be completed with a data-defined pattern or set of rules can
be automated with machine learning. This allows companies to transform processes
that were previously only possible for humans to perform—think responding
to customer service calls, bookkeeping, and reviewing resumes.
8. What do you mean linear model
There are a large number of ML models available. Amazon ML learns one type of ML
model: linear models. The term linear model implies that the model is specified as a
linear combination of features. Based on training data, the learning process computes
one weight for each feature to form a model that can predict or estimate the target value.
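For example, if your target is the amount of insurance a customer will purchase and
your variables are age and income, a linear model estimates the target as a weight
times age plus a weight times income, plus a constant (bias) term.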
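To make the idea of a linear combination of features concrete, here is a minimal sketch. The function name and all weight values are hypothetical and chosen only for illustration; a real model would learn its weights from training data.

```python
def predict_insurance(age, income, w_age, w_income, bias):
    """Linear model: a weighted sum of the features plus a bias term."""
    return bias + w_age * age + w_income * income

# Hypothetical weights, not learned from any real data.
estimate = predict_insurance(age=40, income=50_000, w_age=5.0, w_income=0.001, bias=10.0)
print(estimate)
```

Each feature contributes its value times its weight, so changing one weight changes how strongly that feature influences the estimate.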
9. What is regression
Regression analysis is a statistical method to model the relationship between a
dependent (target) variable and one or more independent (predictor)
variables. More specifically, regression analysis helps us to understand how the value
of the dependent variable is changing corresponding to an independent variable when
other independent variables are held fixed. It predicts continuous/real values such
as temperature, age, salary, price, etc.
Some examples of regression can be as:
Prediction of rain using temperature and other factors
Determining Market trends
Prediction of road accidents due to rash driving.
Terminologies Related to the Regression Analysis:
Dependent Variable: The main factor in Regression analysis which we want to predict
or understand is called the dependent variable. It is also called target variable.
Independent Variable: The factors which affect the dependent variables or which are
used to predict the values of the dependent variables are called independent variable,
also called as a predictor.
Outliers: Outlier is an observation which contains either very low value or very high
value in comparison to other observed values. An outlier may hamper the result, so it
should be avoided.
Multicollinearity: If the independent variables are highly correlated with each other,
the condition is called multicollinearity. It should not be present in the dataset,
because it creates problems when ranking the most influential variable.
Underfitting and Overfitting: If our algorithm works well with the training dataset but
not well with test dataset, then such problem is called Overfitting. And if our algorithm
does not perform well even with training dataset, then such problem is
called underfitting.
10. Types of regression with definition and example
There are various types of regressions which are used in data science and machine
learning. Each type has its own importance on different scenarios, but at the core, all
the regression methods analyze the effect of the independent variable on dependent
variables. Here we are discussing some important types of regression which are given
below:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
11. Define a Machine Learning model
A machine learning model is a file that has been trained to recognize certain types of
patterns. You train a model over a set of data, providing it an algorithm that it can use
to reason over and learn from those data.
12. What do you mean loss function and how to calculate?
In supervised machine learning algorithms, we want to minimize the error for each
training example during the learning process. This is done using some optimization
strategies like gradient descent. And this error comes from the loss function.
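As one possible illustration of how a loss is calculated, the sketch below computes the mean squared error, a common loss for regression. The sample targets and predictions are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Squared errors are 1, 0 and 4, so the loss is 5 / 3.
loss = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(loss)
```

A perfect model gives a loss of zero; the further the predictions are from the targets, the larger the loss that the optimization strategy tries to reduce.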
13. Types of loss function with explanation
https://iq.opengenus.org/types-of-loss-function/
14. What do you mean hypothesis or forward function
A hypothesis is an explanation for something.
It is a provisional idea, an educated guess that requires some evaluation.
A good hypothesis is testable; it can be either true or false.
In science, a hypothesis must be falsifiable, meaning that there exists a test whose
outcome could mean that the hypothesis is not true. The hypothesis must also be
framed before the outcome of the test is known.
A good hypothesis fits the evidence and can be used to make predictions about new
observations or new situations.
The hypothesis that best fits the evidence and can be used to make predictions is called
a theory, or is part of a theory.
Hypothesis in Science: Provisional explanation that fits the evidence and can be
confirmed or disproved.
15. What do you mean prediction
“Prediction” refers to the output of an algorithm after it has been trained on a historical
dataset and applied to new data when forecasting the likelihood of a particular outcome,
such as whether or not a customer will churn in 30 days. The algorithm will generate
probable values for an unknown variable for each record in the new data, allowing the
model builder to identify what that value will most likely be.
16. Explain Gradient descent briefly
Gradient Descent is an optimization algorithm for finding a local minimum of a
differentiable function. Gradient descent in machine learning is simply used to find the
values of a function's parameters (coefficients) that minimize a cost function as far as
possible.
Instead of climbing up a hill, think of gradient descent as hiking down to the bottom of
a valley. This is a better analogy because it is a minimization algorithm that minimizes
a given function.
The equation below describes what the gradient descent algorithm does: b is the next
position of our climber, while a represents his current position. The minus sign refers
to the minimization part of the gradient descent algorithm. The gamma in the middle is
a weighting factor (the learning rate), and the gradient term ( ∇f(a) ) points in the
direction of the steepest ascent, so subtracting it moves in the direction of the steepest
descent.

    b = a - γ ∇f(a)
So this formula basically tells us the next position we need to go, which is the direction
of the steepest descent. Let’s look at another example to really drive the concept home.
Imagine you have a machine learning problem and want to train your algorithm with
gradient descent to minimize your cost-function J(w, b) and reach its local minimum by
tweaking its parameters (w and b). The image below shows the horizontal axes
representing the parameters (w and b), while the cost function J(w, b) is represented on
the vertical axis. In this illustration the cost function is convex.
We know we want to find the values of w and b that correspond to the minimum of the
cost function (marked with the red arrow). To start finding the right values we
initialize w and b with some random numbers. Gradient descent then starts at that
point (somewhere around the top of our illustration), and it takes one step after another
in the steepest downward direction (i.e., from the top to the bottom of the illustration)
until it reaches the point where the cost function is as small as possible.
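The update rule described above can be sketched in a few lines. This is a minimal example under stated assumptions: the function f(x) = (x - 3)^2, the starting point, the learning rate and the number of steps are all chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly apply the update: next = current - learning_rate * grad(current)."""
    a = start
    for _ in range(steps):
        a = a - learning_rate * grad(a)
    return a

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(grad=lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))
```

Starting from 0, every step moves the estimate a fixed fraction of the remaining distance towards 3, which is the "hiking down the valley" behaviour in the analogy.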
17. Calculate gradient descent
18. Why need gradient descent
The main reason we use gradient descent to optimize our models over analytical
optimization is that it’s generally just faster! Analytical solutions typically require
complex linear algebra operations, such as matrix inversion, which are very
computationally expensive to compute at large scales and can be numerically unstable.
As an example to demonstrate this, we’ll use OLS linear regression for its simplicity. In
this case, given a matrix of input observations, X, and target vector y, the linear OLS
solution is:
    β = (XᵀX)⁻¹ Xᵀ y

The key thing to note here though is the square matrix inversion, which has a time
complexity of O(n³), where n is the number of features. Therefore, this doesn’t really
scale well to larger problems often encountered in machine learning.
Meanwhile, calculating just the gradient for use in gradient descent is much simpler
and a computer has no issue running these calculations thousands of times very
quickly, even at larger scales.
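The sketch below compares the two approaches on a tiny problem. To stay self-contained it uses the single-feature special case of OLS, where the closed form needs no matrix inversion; the data points, learning rate and step count are made up for illustration.

```python
def ols_closed_form(xs, ys):
    """Closed-form least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

def ols_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the mean squared error over slope w and intercept b."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]  # made-up data, roughly y = 2x + 1
print(ols_closed_form(xs, ys))
print(ols_gradient_descent(xs, ys))
```

Both methods arrive at essentially the same slope and intercept here. On a problem this small the closed form is the cheaper option; the cost argument above applies when the number of features is large.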
19. What is the learning rate, how it works, how to achieve good learning
rate, why need learning rate.
Learning rate: The learning rate is a hyperparameter that controls how much to change
the model in response to the estimated error each time the model weights are updated.
Choosing the learning rate is challenging as a value too small may result in a long
training process that could get stuck, whereas a value too large may result in learning
a sub-optimal set of weights too fast or an unstable training process.
Learning rate is used to scale the magnitude of parameter updates during gradient
descent. The choice of the value for learning rate can impact two things: 1) how fast the
algorithm learns and 2) whether the cost function is minimized or not. Figure 2 shows
the variation in cost function with a number of iterations/epochs for different learning
rates.
It can be seen that for an optimal value of the learning rate, the cost function value is
minimized in a few iterations (smaller time). This is represented by the blue line in the
figure. If the learning rate used is lower than the optimal value, the number of
iterations/epochs required to minimize the cost function is high (takes longer time). This
is represented by the green line in the figure. If the learning rate is high, the cost function
could saturate at a value higher than the minimum value. This is represented by the
red line in the figure. If the learning rate selected is very high, the cost function could
continue to increase with iterations/epochs. An optimal learning rate is not easy to find
for a given problem. Though getting the right learning rate is always a challenge, there
are some well-researched methods documented to figure out optimal learning rates. Some
of these techniques are discussed in the following sections. In all these techniques the
fundamental idea is to vary the learning rate dynamically.
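A small experiment can illustrate the three regimes described above. This sketch runs gradient descent on f(x) = x^2 for a fixed number of steps; the function and the three learning rates are assumptions picked to show a too-small, a reasonable and a too-large value.

```python
def final_cost(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(x) = x^2 (gradient 2x) and return the final cost."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2 * x
    return x ** 2

# too small, reasonable, too large (illustrative values)
for lr in (0.01, 0.4, 1.1):
    print(lr, final_cost(lr))
```

With the small rate the cost is still far from zero after 20 steps, with the middle rate it is practically zero, and with the large rate each step overshoots the minimum so the cost grows instead of shrinking.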
Good Learning Rate, Bad Learning Rate
If there is a “good learning rate” that we are looking for, does “bad learning” also exist?
And what does it mean? Let me stick to the concept of supervised learning and discuss
a trivial example:
we are playing “guess the number” game,
with each incorrect guess, you get a response: “too low” or “too high”.
While this is not really a neural network example, let’s imagine how one could play that
game and let’s assume that it’s not purely random guessing this time.
Each time you get feedback, would you rather take a small step towards the right
answer or maybe a big leap? While this example is not really about neural networks or
machine learning, this is essentially how learning rate works.
Now, imagine taking only tiny steps, each bringing you closer to the correct number.
Will this work? Of course. However, it can really take some time until you get there. This
is the case of small learning rate. In context of machine learning, a model with too-small
LR would be a slow-learner and it would need more iterations to solve the problem.
Sometimes you may already decide to stop the training (or playing Guess the Number
game) before it is finished.
20. Neural network briefly
A neural network is a series of algorithms that endeavors to recognize underlying
relationships in a set of data through a process that mimics the way the human brain
operates. In this sense, neural networks refer to systems of neurons, either organic or
artificial in nature.
21. Types of Neural Networks
Feed-Forward Neural Networks
Feed-forward neural networks are one of the simpler types of neural networks. They
convey information in one direction through input nodes; this information continues
to be processed in this single direction until it reaches the output node. Feed-forward
neural networks may have hidden layers for functionality, and this type is most often
used for facial recognition technologies.
Recurrent Neural Networks
A more complex type of neural network, recurrent neural networks take the output of a
processing node and transmit the information back into the network. This results in
theoretical "learning" and improvement of the network. Each node stores historical
processes, and these historical processes are reused in the future during processing.
This becomes especially critical for networks in which the prediction is incorrect; the
system will attempt to learn why the correct outcome occurred and adjust accordingly.
This type of neural network is often used in text-to-speech applications.
Convolutional Neural Networks
Convolutional neural networks, also called ConvNets or CNNs, have several layers in
which data is sorted into categories. These networks have an input layer, an output
layer, and a hidden multitude of convolutional layers in between. The layers create
feature maps that record areas of an image that are broken down further until they
generate valuable outputs. These layers can be pooled or entirely connected, and these
networks are especially beneficial for image recognition applications.
Deconvolutional Neural Networks
Deconvolutional neural networks simply work in reverse of convolutional neural
networks. The application of the network is to detect items that might have been
recognized as important under a convolutional neural network. These items would likely
have been discarded during the convolutional neural network execution process. This
type of neural network is also widely used for image analysis or processing.
Modular Neural Networks
Modular neural networks contain several networks that work independently from one
another. These networks do not interact with each other during an analysis process.
Instead, these processes are done to allow complex, elaborate computing processes to
be done more efficiently. Similar to other modular industries such as modular real
estate, the goal of the network independence is to have each module responsible for a
particular part of an overall bigger picture.
22. Design a Computational graph
Computational graphs are a type of graph that can be used to represent mathematical
expressions. This is similar to descriptive language in the case of deep learning models,
providing a functional description of the required computation.
In general, the computational graph is a directed graph that is used for expressing and
evaluating mathematical expressions.
These can be used for two different types of calculations:
• Forward computation
• Backward computation
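As a minimal sketch of both kinds of computation, the example below hand-codes a small graph for the expression e = (a + b) * (b + 1). The expression and node names are chosen for illustration; the backward pass applies the chain rule from the output node back to the inputs.

```python
def forward(a, b):
    """Forward computation: evaluate each node in order."""
    c = a + b  # node c
    d = b + 1  # node d
    e = c * d  # output node e
    return c, d, e

def backward(a, b):
    """Backward computation: chain rule from the output back to the inputs."""
    c, d, _ = forward(a, b)
    de_dc, de_dd = d, c            # local derivatives of e = c * d
    de_da = de_dc * 1              # c depends on a with dc/da = 1
    de_db = de_dc * 1 + de_dd * 1  # b feeds both c and d, so the two paths add
    return de_da, de_db

print(forward(2, 1))   # (3, 2, 6)
print(backward(2, 1))  # (2, 5)
```

The forward pass stores the intermediate values c and d, and the backward pass reuses them, which is why training frameworks keep the graph around between the two passes.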
23. Chain rule math
24. Forward propagation
25. Backward propagation
26. What do you mean optimizer?
It is very important to tweak the weights of the model during the training process, to
make our predictions as correct and optimized as possible. But how exactly do you do
that? How do you change the parameters of your model, by how much, and when?
The best answer to all of the above questions is optimizers. They tie together the loss function and
model parameters by updating the model in response to the output of the loss function.
In simpler terms, optimizers shape and mold your model into its most accurate possible
form by futzing with the weights. The loss function is the guide to the terrain, telling the
optimizer when it’s moving in the right or wrong direction.
27. What is Logistic Regression, how perform it, what is Logistic or
sigmoid function
Logistic regression models a relationship between predictor variables and a categorical
response variable. For example, we could use logistic regression to model the
relationship between various measurements of a manufactured specimen (such as
dimensions and chemical composition) to predict if a crack greater than 10 mils will
occur (a binary variable: either yes or no). Logistic regression helps us estimate a
probability of falling into a certain level of the categorical response given a set of
predictors. We can choose from three types of logistic regression, depending on the
nature of the categorical response variable:
• Binary Logistic Regression
• Nominal Logistic Regression
• Ordinal Logistic Regression
The logistic function in logistic regression is a type of sigmoid, a class of functions with
the same specific properties.
Sigmoid is a mathematical function that takes any real number and maps it to a
probability between 0 and 1.
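The sigmoid and its use in logistic regression can be sketched as follows. The feature values, weights and bias are hypothetical; a fitted model would estimate the weights from data.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def predict_probability(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

print(sigmoid(0.0))  # 0.5
print(predict_probability([2.0, -1.0], weights=[0.5, 1.5], bias=0.25))
```

A binary decision is then usually made by comparing the probability with a threshold such as 0.5.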
28. Difference between linear regression and logistic regression

• Linear regression is used to handle regression problems, whereas logistic
regression is used to handle classification problems.
• Linear regression provides a continuous output, but logistic regression provides
a discrete output.
• The purpose of linear regression is to find the best-fitted line, while logistic
regression goes one step further and fits the line values to the sigmoid curve.
• Linear regression is typically fitted by minimizing the mean squared
error, whereas logistic regression is fitted by maximum likelihood estimation.
29. What is activation function, why need activation function, types of
activation function
The activation function decides whether a neuron should be activated or not by
calculating the weighted sum and further adding bias to it. The purpose of the activation
function is to introduce non-linearity into the output of a neuron.
Explanation: We know, the neural network has neurons that work in correspondence
with weight, bias, and their respective activation function. In a neural network, we
would update the weights and biases of the neurons on the basis of the error at the
output. This process is known as back-propagation. Activation functions make the
back-propagation possible since the gradients are supplied along with the error to
update the weights and biases.
Types: Binary Step, Linear, Sigmoid, Tanh, ReLU, Leaky ReLU, Parameterised
ReLU, Exponential Linear Unit, Swish, Softmax
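A few of the activation functions listed above can be written directly from their usual definitions. This is a minimal sketch; the 0.01 slope for Leaky ReLU is a commonly used default and is an assumption here.

```python
import math

def binary_step(x):
    return 1.0 if x >= 0 else 0.0

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Small non-zero slope for negative inputs (value assumed for illustration).
    return x if x > 0 else slope * x

def tanh(x):
    return math.tanh(x)

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))
print(softmax([1.0, 2.0, 3.0]))
```

Except for the linear function, each of these bends or clips its input, which is the non-linearity the section refers to.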
30. What do you mean binary problem
31. Discrete value vs continuous value:
32. What is a batch, how to choose batch size?

Ans: Batch learning represents the training of machine learning models in a batch
manner. In other words, batch learning represents the training of the models at regular
intervals such as weekly, bi-weekly, monthly, quarterly, etc.
Batch size is a term used in machine learning and refers to the number of training
examples utilized in one iteration. The batch size can be one of three options:
batch mode: where the batch size is equal to the total dataset thus making the iteration
and epoch values equivalent
mini-batch mode: where the batch size is greater than one but less than the total dataset
size. Usually, a number that can be divided into the total dataset size.
stochastic mode: where the batch size is equal to one. Therefore the gradient and the
neural network parameters are updated after each sample.
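The relationship between batch size and the number of iterations per epoch in the three modes can be checked with a short calculation. The dataset size of 1000 and the mini-batch size of 50 are hypothetical.

```python
import math

def iterations_per_epoch(dataset_size, batch_size):
    """Number of parameter updates needed to see every example once."""
    return math.ceil(dataset_size / batch_size)

dataset_size = 1000  # hypothetical dataset size
print(iterations_per_epoch(dataset_size, batch_size=dataset_size))  # batch mode: 1
print(iterations_per_epoch(dataset_size, batch_size=50))            # mini-batch mode: 20
print(iterations_per_epoch(dataset_size, batch_size=1))             # stochastic mode: 1000
```

When the batch size does not divide the dataset size evenly, the last batch of the epoch is simply smaller, which is why the function rounds up.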
33. What do you mean data loader
Ans: The job of a data loader is to sample minibatches from a dataset, giving us the
flexibility to choose from different sampling strategies. A very common strategy is
uniform sampling after shuffling the data at each epoch. Figure 7.14 shows the data
loader shuffling the indices it gets from the Dataset.
34. Why do we need DataLoader?
The DataLoader creates batches for us to be able to iterate through them. We no longer
have to care about slicing the data to retrieve batches. Shuffle: this allows our data to
be shuffled, but more importantly, it shuffles our data every epoch. This trick allows our
batches to be a random set of 64 records each time (assuming a batch size of 64).
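To show what batching and per-epoch shuffling involve, here is a simplified stand-in written in plain Python. The class name `SimpleDataLoader` and its parameters are hypothetical; this is not the API of any particular library, only a sketch of the idea.

```python
import random

class SimpleDataLoader:
    """Hypothetical, simplified data loader: shuffles indices and yields batches."""

    def __init__(self, dataset, batch_size, shuffle=True, seed=None):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)  # a fresh order every epoch
        for start in range(0, len(indices), self.batch_size):
            yield [self.dataset[i] for i in indices[start:start + self.batch_size]]

loader = SimpleDataLoader(list(range(10)), batch_size=4, seed=0)
for epoch in range(2):
    print(list(loader))
```

Each pass over `loader` is one epoch: every record appears exactly once, but the grouping into batches changes from epoch to epoch.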
35. What is CNN
Ans: Within Deep Learning, a Convolutional Neural Network or CNN is a type of artificial
neural network, which is widely used for image/object recognition and classification.
Deep Learning thus recognizes objects in an image by using a CNN.
36. Why needs CNN
Ans: A CNN is a kind of network architecture for deep learning algorithms and is
specifically used for image recognition and tasks that involve the processing of pixel
data. There are other types of neural networks in deep learning, but for identifying and
recognizing objects, CNNs are the network architecture of choice.
37. What do you mean Convolution
Ans: Convolution is a mathematical way of combining two signals to form a third signal.
It is the single most important technique in Digital Signal Processing. Using the strategy
of impulse decomposition, systems are described by a signal called the impulse
response.
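The idea of combining two signals into a third can be sketched for the discrete, one-dimensional case. The input signal and impulse response below are illustrative values only.

```python
def convolve(signal, kernel):
    """Full discrete convolution: y[n] = sum over k of signal[k] * kernel[n - k]."""
    output = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            output[i + j] += s * k
    return output

# Input signal convolved with a short impulse response.
print(convolve([1.0, 2.0, 3.0], [1.0, 0.5]))  # [1.0, 2.5, 4.0, 1.5]
```

Each input sample contributes a scaled, shifted copy of the impulse response to the output, which is the impulse decomposition view mentioned above.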
38. Design a basic CNN model
39. Describe padding, filter, pooling
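Padding adds extra values (typically zeros) around the border of an input image so that
the filter can also be applied at the edges and the output size can be controlled.
A filter (or kernel) is a small matrix of weights that slides over the image; convolving
the image with it modifies or enhances the image and highlights a particular feature.
Pooling downsamples the feature map produced by a convolution layer, keeping a
summary (such as the maximum) of each small region.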
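In the CNN setting, padding and pooling can be sketched on a small 2D list. This assumes zero padding and non-overlapping max pooling, which are common choices but not the only ones; the 2x2 image is made up for the example.

```python
def zero_pad(image, pad):
    """Surround a 2D list with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * width for _ in range(pad)]
    return padded

def max_pool(image, size=2):
    """Non-overlapping max pooling; assumes both dimensions are divisible by `size`."""
    return [
        [
            max(image[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, len(image[0]), size)
        ]
        for r in range(0, len(image), size)
    ]

image = [[1, 2], [3, 4]]
padded = zero_pad(image, 1)
print(padded)            # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(max_pool(padded))  # [[1, 2], [3, 4]]
```

Padding grows the 2x2 image to 4x4, and 2x2 max pooling halves each dimension again by keeping only the largest value in each block.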
40. What do you mean Features
Ans: In computer vision and image processing, a feature is a piece of information about
the content of an image; typically about whether a certain region of the image has certain
properties. Features may be specific structures in the image such as points, edges or
objects.
41. Machine learning vs deep learning
42. What do you mean Vanishing gradient

Ans: In machine learning, the vanishing gradient problem is encountered while
training neural networks with gradient-based methods (for example, backpropagation).
This problem makes it hard to learn and tune the parameters of the earlier layers in the
network.
The vanishing gradients problem is one example of unstable behaviour that you may
encounter when training a deep neural network.
It describes the situation where a deep multilayer feed-forward network or a recurrent
neural network is unable to propagate useful gradient information from the output end
of the model back to the layers near the input end of the model.
The result is the general inability of models with many layers to learn on a given dataset,
or for models with many layers to prematurely converge to a poor solution.
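A rough numerical illustration of why gradients vanish: with sigmoid activations, each layer multiplies the backpropagated gradient by a derivative that is at most 0.25. The sketch below ignores the weights entirely, which is a deliberate simplification, and evaluates the derivative at z = 0 where it is largest.

```python
import math

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # never larger than 0.25

def gradient_scale(num_layers, z=0.0):
    """Product of one sigmoid derivative per layer, ignoring the weights."""
    return sigmoid_derivative(z) ** num_layers

for layers in (1, 5, 20):
    print(layers, gradient_scale(layers))
```

Even in this best case the factor shrinks geometrically with depth, so the layers near the input receive almost no gradient signal.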
