
ML Questions


1. What is XGBoost?

  
 Basic ML

2. How do you deploy a model to the cloud?


  
 Intermediate ML
The workflow can be broken down into the following basic steps (a minimal sketch of step 2 follows this list):
- Training a machine learning model on a local system
- Wrapping the inference logic into a Flask application
- Using Docker to containerize the Flask application
- Hosting the Docker container on an AWS EC2 instance and consuming the web service
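A minimal sketch of step 2 (the model file name "model.pkl", the /predict route, and the feature layout are illustrative assumptions, not a prescribed implementation):

import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
model = pickle.load(open("model.pkl", "rb"))  # hypothetical serialized model

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [5.1, 3.5, 1.4, 0.2]
    prediction = model.predict([features])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)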

3. How will you make models out of the tweets for the pharma company?
  
 Advanced ML

4. Make 4 segments (product category, competitors, etc.) and identify which medicine a doctor is likely to recommend
  
 Intermediate ML

5. Working of ensemble methods such as bagging, boosting, random forest.


  
 Intermediate ML

6. What is clustering and KNN?


  
 Basic ML
k-Means clustering is an unsupervised learning algorithm used for clustering, whereas
KNN is a supervised learning algorithm used for classification.
The “k” in k-means denotes the number of clusters you want to end up with: if k = 5, you will
have 5 clusters in the data set. The “k” in k-Nearest Neighbors is the number of neighbours it
checks. KNN is supervised because you are trying to classify a point based on the known
classification of other points.
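A minimal sketch contrasting the two with scikit-learn (synthetic data; the cluster and neighbour counts are illustrative assumptions):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=100, centers=3, random_state=42)

# k-means: unsupervised; k = number of clusters, labels are discovered
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:5])

# KNN: supervised; k = number of neighbours, labels y are given
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:5]))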

7. What is bagging and boosting?


  
 Basic ML
Bagging and boosting decrease the variance of a single estimate because they combine several
estimates from different models, so the result can be a model with higher stability.
Bagging is used when the goal is to reduce the variance of a decision tree classifier. The
objective is to create several subsets of data from the training sample, chosen randomly with
replacement, and to train a decision tree on each subset. As a result, we get an ensemble of
different models, and the average of all the predictions from the different trees is used, which is
more robust than a single decision tree classifier.
Boosting is used to create a collection of predictors. In this technique, learners are trained
sequentially, with early learners fitting simple models to the data, and the data then being
analysed for errors. Consecutive trees are fit and, at every step, the goal is to improve on the
accuracy of the prior tree. When an input is misclassified by a hypothesis, its weight is
increased so that the next hypothesis is more likely to classify it correctly. This process converts
weak learners into a better-performing model. A minimal bagging sketch follows.
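A minimal bagging sketch with scikit-learn (the dataset and estimator count are illustrative assumptions; the default base learner of BaggingClassifier is a decision tree):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
# 100 decision trees, each trained on a bootstrap sample of the training data
bagging = BaggingClassifier(n_estimators=100, random_state=42)
print(cross_val_score(bagging, X, y, cv=5).mean())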

8. What is AdaBoost?


  
 Basic ML
AdaBoost is an ensemble classifier. It combines multiple weak classifiers to form a strong
classifier. A single weak classifier may classify the objects poorly. But if we combine multiple
classifiers, with a selection of the training set at every iteration and the right amount of
weight assigned in the final voting, we can achieve a good accuracy score for the overall
classifier. A minimal sketch follows.
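A minimal sketch with scikit-learn (the dataset and estimator count are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
# the default weak learner is a decision stump; misclassified points
# receive higher weight at each iteration
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
print(cross_val_score(ada, X, y, cv=5).mean())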

9. Explain Gradient boosting and Extreme Gradient Boosting?


  
 Basic ML
XGBoost stands for Extreme Gradient Boosting; it is a specific implementation of the Gradient
Boosting method which uses more accurate approximations to find the best tree model. It
employs a number of nifty tricks that make it exceptionally successful, particularly with structured
data.
The most important are:
1) Computing second-order gradients, i.e. second partial derivatives of the loss function (similar
to Newton's method), which provide more information about the direction of gradients and how
to get to the minimum of the loss function. While regular gradient boosting uses the loss function
of the base model (e.g. a decision tree) as a proxy for minimizing the error of the overall model,
XGBoost uses the second-order derivative as an approximation.
2) Advanced regularization (L1 & L2), which improves model generalization.
XGBoost has additional advantages: training is very fast and can be parallelized/distributed
across clusters.
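A minimal sketch using the xgboost library (the dataset and hyperparameter values are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# reg_alpha / reg_lambda are the L1 / L2 regularization terms mentioned above
model = XGBClassifier(n_estimators=200, max_depth=3, reg_alpha=0.1, reg_lambda=1.0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))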

10. What is Bootstrap sampling?


  
 Basic ML
Bootstrap sampling is a method that involves repeatedly drawing samples, with replacement,
from a data source in order to estimate a population parameter.
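A minimal sketch with NumPy (the sample size and distribution parameters are illustrative):

import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=5, size=200)  # the observed sample

# draw 1,000 bootstrap samples (with replacement) and estimate the mean
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]
print(np.mean(boot_means), np.std(boot_means))  # estimate and its standard error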

11. What should be done to the dataset if the model's assumptions are not met?
  
 Intermediate ML
1. If you create a scatter plot of values for x and y and see that there is no linear relationship
between the two variables, then one can do the following:
- Apply a nonlinear transformation to the independent and/or dependent variable, e.g. the log,
square root, or reciprocal of the independent and/or dependent variable.
- Add another independent variable to the model.
2. If residuals are not independent, then one can do the following:
- For positive serial correlation, consider adding lags of the dependent and/or independent
variable to the model.
- For negative serial correlation, check to make sure that none of your variables is
overdifferenced.
- For seasonal correlation, consider adding seasonal dummy variables to the model.
3. If residuals do not have constant variance, then one can do the following:
- Transform the dependent variable.
- Use weighted regression.
4. If residuals are not normally distributed, then one can do the following:
- First, verify that any outliers aren't having a huge impact on the distribution. If there are outliers
present, make sure that they are real values and not data entry errors.
- Next, you can apply a nonlinear transformation to the independent and/or dependent variable,
e.g. the log, square root, or reciprocal of the independent and/or dependent variable.

12. How do you apply ML algorithms in a manufacturing/production environment?


  
 Intermediate ML
1. Specify Performance Requirements (this may be accuracy, false positives, or whatever
metrics are important to the business)
2. Separate Prediction Algorithm From Model Coefficients
2a. Select or Implement The Prediction Algorithm
2b. Serialize Your Model Coefficients
3. Develop Automated Tests For Your Model
4. Develop Back-Testing and Now-Testing Infrastructure
5. Challenge Then Trial Model Updates (For example, perhaps you set up a grid or random
search of model hyperparameters that runs every night and spits out new candidate models)

13. Difference between Classification and Linear Regression?


  
 Basic ML
1. Fundamentally, classification is about predicting a label and regression is about predicting a
quantity.
i.e. Classification is the task of predicting a discrete class label, while regression is the task of
predicting a continuous quantity.
2. Classification predictions can be evaluated using accuracy, whereas regression predictions
cannot.
Regression predictions can be evaluated using root mean squared error, whereas classification
predictions cannot.
3. A regression algorithm may predict a discrete value, but only in the form of an integer
quantity. A classification algorithm may predict a continuous value, but only in the form of a
class label probability.

14. Which model to use to check whether a patient is diabetic or not?


  
 Basic ML
A classification algorithm such as logistic regression, random forest, etc.

15. Explain missing values and outlier treatment


  
 Basic ML

16. What is logistic regression? The output for logistic regression?


  
 Basic ML
a. Logistic regression models the probabilities for classification problems with two possible
outcomes. It's an extension of the linear regression model for classification problems.
b. Log likelihood – the log likelihood of the final model
c. Number of obs – the number of observations that were used in the analysis
d. LR chi2(3) – the likelihood ratio (LR) chi-square test. The number in parentheses
indicates the number of degrees of freedom
e. Prob > chi2 – the probability of obtaining the chi-square statistic given that the null
hypothesis is true. The model is statistically significant when this p-value is below the chosen
significance level (e.g. 0.05)
f. Pseudo R2 – the pseudo R-squared

17. What are ensemble techniques and how do they work? Name some models.


  
 Basic ML
A group of weak learners coming together to form a strong learner, thus increasing the accuracy
of a machine learning model, is called an ensemble model.
Simple ensemble techniques: hard voting classifier, averaging, weighted averaging.
Advanced ensemble techniques: stacking, bagging (e.g. Random Forest), pasting, and
boosting (AdaBoost, XGBoost, etc.).

18. What is Decision tree and Random forest?


  
 Basic ML
- A decision tree is a supervised machine learning algorithm that can be used for both
classification and regression problems. A decision tree is simply a series of sequential decisions
made to reach a specific result
- Random Forest is a tree-based machine learning algorithm that leverages the power of multiple
(randomly created) decision trees for making decisions. i.e. The Random Forest Algorithm
combines the output of multiple (randomly created) Decision Trees to generate the final output.
- Random Forest is suitable for situations when we have a large dataset, and interpretability is
not a major concern. Decision trees are much easier to interpret and understand. Since a
random forest combines multiple decision trees, it becomes more difficult to interpret.
- The decision tree model gives high importance to a particular set of features. But the random
forest chooses features randomly during the training process.

19. How do you deal with underfitting and overfitting?


  
 Basic ML
Handling overfitting:
- Cross-validation: split your dataset into 'train' and 'test' data, build the model using the
'train' set, and use the 'test' set for validation.
- Regularization: a form of regression that regularizes or shrinks the coefficient estimates
towards zero. This technique discourages learning a more complex model.
- Early stopping: when training a learner with an iterative method, stop the training process
before the final iteration. This prevents the model from memorizing the dataset.
- Pruning (applies to decision trees). Pre-pruning: stop 'growing' the tree before it perfectly
classifies the training set. Post-pruning: allow the tree to 'grow' and perfectly classify the
training set, then prune it back.
- Dropout: randomly selected neurons are ignored during training.
- Regularize the weights.
Handling underfitting:
- Get more training data.
- Increase the size or number of parameters in the model.
- Increase the complexity of the model.
- Increase the training time, until the cost function is minimised.

20. What is the bias-variance tradeoff?
  
 Basic ML
The goal of any supervised machine learning algorithm is to achieve low bias(the difference
between the average prediction of our model and the correct value which we are trying to predict)
and low variance(variability of model prediction for a given data point or a value which tells us
spread of our data).
If our model is too simple and has very few parameters then it may have high bias and low
variance. On the other hand, if our model has a large number of parameters then it’s going to
have high variance and low bias.
Increasing the bias will decrease the variance. Increasing the variance will decrease bias.
So we need to find the right/good balance without overfitting and underfitting the data.
This tradeoff in complexity is why there is a tradeoff between bias and variance.

21. How will you explain machine learning to a 5-year-old?


  
 Intermediate ML
Just like a human, a computer can learn from three sources.
One is observing what others did in similar situations. The second is observing a situation and
trying to come up with the best possible logic on the spot to decide or conclude. The third is
learning from previous mistakes and successes. These three methods correspond to the three
branches of machine learning: supervised, unsupervised and reinforcement learning respectively.
- In supervised learning, a computer can tell which word in a sentence is the name of a city,
given that it is shown example sentences which may or may not contain names of cities, with
every occurrence of a city name tagged in these examples.
- Unsupervised learning is where we ask the computer to make decisions based on raw data
attributes and a set of measurable quantities. One example would be asking a computer to come
up with localities in a dataset where the lat-long of each house is given. It would use lat-long to
find distances and form localities of houses.
- The third type of learning is reinforcement learning. This is a method in which the computer
starts by making random decisions, and then learns from the errors it makes and the successes it
encounters as it goes. A recent discovery was an algorithm which could play many different
arcade games after learning the correct and wrong moves. These algorithms start by making
many mistakes in the beginning and then get better as they go.

22. What do you do in data exploration?


  
 Basic ML

23. You are given a data set on cancer detection. You’ve built a classification
model and achieved an accuracy of 96%. Why shouldn’t you be happy with
your model performance? What can you do about it?
  
 Intermediate ML
24. You are working on a time series data set. Your manager has asked you to
build a high accuracy model. You start with the decision tree algorithm, since
you know it works fairly well on all kinds of data. Later, you tried a time series
regression model and got higher accuracy than the decision tree model. Can this
happen? Why?
  
 Advanced ML

25. You came to know that your model is suffering from low bias and high
variance. Which algorithm should you use to tackle it? Why?
  
 Intermediate ML

26. How is kNN different from kmeans clustering?


  
 Basic ML

27. After analyzing the model, your manager has informed you that your regression
model is suffering from multicollinearity. How would you check whether he’s right?
Without losing any information, can you still build a better model?
  
 Intermediate ML

28. When is Ridge regression favorable over Lasso regression?


  
 Basic ML

29. While working on a data set, how do you select important variables?
Explain your methods.
  
 Basic ML

30. What is the difference between covariance and correlation?


  
 Intermediate ML

31. Both being tree-based algorithms, how is random forest different from the
gradient boosting algorithm (GBM)?
  
 Basic ML

32. You’ve got a data set to work with having p (no. of variables) > n (no. of
observations). Why is Ordinary Least Squares (OLS) a bad option to work with?
Which techniques would be best to use? Why?
  
 Advanced ML

33. We know that one hot encoding increases the dimensionality of a data set,
but label encoding doesn’t. How?
  
 Intermediate ML

34. You are given a data set consisting of variables having more than 30%
missing values. Let’s say, out of 50 variables, 8 variables have missing values
higher than 30%. How will you deal with them?
  
 Basic ML

35. ‘People who bought this, also bought…’ recommendations seen on Amazon
are a result of which algorithm?
  
 Intermediate ML

36. What do you understand by Type I vs Type II error ?


  
 Basic ML

37. You have been asked to evaluate a regression model based on R², adjusted
R² and tolerance. What will be your criteria?
  
 Basic ML

38. Considering the long list of machine learning algorithm, given a data set,
how do you decide which one to use?
  
 Basic ML

39. When does regularization become necessary in Machine Learning?


  
 Basic ML

40. What do you understand by Bias Variance trade off?


  
 Basic ML

41. How can you prove that one improvement you've brought to an algorithm
is really an improvement over not doing anything?
  
 Basic ML

42. Explain what resampling methods are and why they are useful. Also
explain their limitations.
  
 Basic ML
- Repeatedly drawing samples from a training set and refitting a model of interest on each
sample in order to obtain additional information about the fitted model.
- Example: repeatedly draw different samples from training data, fit a linear regression to each
new sample, and then examine the extent to which the resulting fits differ.
- The most common methods are cross-validation and the bootstrap.
Cross-validation: random sampling with no replacement; bootstrap: random sampling with
replacement.
- Cross-validation: evaluating model performance, model selection (selecting the appropriate
level of flexibility).
- Bootstrap: mostly used to quantify the uncertainty associated with a given estimator or
statistical learning method.

43. Is it better to have too many false positives, or too many false negatives?
Explain.
  
 Basic ML
False positives and false negatives are two problems we have to deal with while evaluating a
model.
In medicine, a false positive can lead to unnecessary treatment, while a false negative can lead
to a missed diagnosis, which is very serious since the disease has been ignored.
However, we can minimize the errors by collecting more information, considering other variables,
adjusting the sensitivity (true positive rate) and specificity (true negative rate) of the test, or
conducting the test multiple times.
Even so, it is still hard, since reducing one type of error usually means increasing the other type.
Sometimes one type of error is preferable to the other, so data scientists have to evaluate the
consequences of the errors and make a decision.

44. What is selection bias, why is it important and how can you avoid it?
  
 Basic ML
Selection bias occurs if a data set's examples are chosen in a way that is not reflective of their
real-world distribution.
How to avoid selection biases
Mechanisms for avoiding selection biases include:
- Using random methods when selecting subgroups from populations.
- Ensuring that the subgroups selected are equivalent to the population at large in terms of their
key characteristics (this method is less of a protection than the first since typically the key
characteristics are not known).

45. Differentiate between univariate, bivariate and multivariate analysis.


  
 Basic ML
Univariate statistics summarize only one variable at a time.
Bivariate statistics compare two variables.
Multivariate statistics compare more than two variables.

46. What is the difference between Cluster and Systematic Sampling?


  
 Basic ML
Systematic sampling and cluster sampling are both statistical methods used by researchers,
analysts, and marketers to study samples of a population.
Systematic sampling selects members from the larger population at a fixed interval to create the
sample.
Cluster sampling divides the population into groups (clusters) and then randomly selects whole
clusters, whose members form the sample.

47. Can you cite some examples where both false positive and false negatives
are equally important?
  
 Intermediate ML
Let us take an example from the medical field, where:
A false positive = a person is considered sick but is actually healthy
A false negative = a person is considered healthy but is actually sick
What does it mean?
False-positive cases lead to overspending due to unnecessary care, and damage the health of
an otherwise healthy person through the unnecessary side effects of the therapy.
A false-negative case means that the patient gets sicker or dies.
In this case, both false positives and false negatives are equally important, since a person’s life
is at stake.

48. Explain Lasso regression


  
 Basic ML
Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data
values are shrunk towards a central point, like the mean. The lasso procedure encourages
simple, sparse models (i.e. models with fewer parameters).
Lasso regression performs L1 regularization, which adds a penalty equal to the absolute value of
the magnitude of the coefficients. This type of regularization can result in sparse models with few
coefficients; some coefficients can become zero and be eliminated from the model. Larger
penalties result in coefficient values closer to zero, which is ideal for producing simpler models.
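A minimal sketch with scikit-learn (the dataset and alpha value are illustrative assumptions):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
lasso = Lasso(alpha=1.0)  # a larger alpha means a stronger L1 penalty
lasso.fit(X, y)
print(lasso.coef_)  # several coefficients shrink to exactly zero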

49. Explain Gradient Descent Algorithm


  
 Intermediate ML
Gradient descent is an optimization algorithm used when training a machine learning model.
It is based on a convex function and tweaks its parameters iteratively to minimize a given cost
function to its local minimum.
You start by defining the initial parameter values, and from there gradient descent uses calculus
to iteratively adjust the values so that they minimize the given cost function (where a gradient
measures how much the output of a function changes if you change the inputs a little bit).
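A minimal sketch (the function, learning rate, and iteration count are illustrative assumptions): minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).

def gradient_descent(gradient, start, learning_rate=0.1, n_iter=100):
    x = start
    for _ in range(n_iter):
        x -= learning_rate * gradient(x)  # step against the gradient
    return x

print(gradient_descent(lambda x: 2 * (x - 3), start=0.0))  # converges to ~3.0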

50. How is machine learning deployed in real-world scenarios?


  
 Advanced ML
AWS or Azure instances with Python jobs that run either on manual schedules or automatically,
triggered, say, on receiving new data. A suite of services usually constitutes the deployment
environment of such models:
Storage - the model needs to be stored somewhere (pickle, joblib, or a model-specific object):
S3 on AWS or Blob storage on Azure.
Computing instance - a computing environment that contains Python and can communicate
with every platform relevant to the deployment context.
Job scheduler - DevOps is the norm now: automated pipelines that procure data, process it, and
load/retrain/predict with the packaged model.
Final layer - either BI tools like Tableau or QlikView, SQL/NoSQL databases, or Excel reports.

51. What is cosine similarity?


  
 Intermediate ML
Cosine similarity is a metric used to measure how similar the documents are irrespective of their
size. Mathematically, it measures the cosine of the angle between two vectors projected in a
multi-dimensional space. The cosine similarity is advantageous because even if the two similar
documents are far apart by the Euclidean distance (due to the size of the document), chances
are they may still be oriented closer together. The smaller the angle, the higher the cosine
similarity.
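A minimal sketch with NumPy (the example vectors are illustrative):

import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# e.g. term-count vectors of two documents of different lengths
print(cosine_similarity([1, 2, 0, 3], [2, 4, 0, 6]))  # 1.0: same orientation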

52. How do you implement TensorFlow?


  
 Intermediate ML
The usual workflow of running a program in TensorFlow (1.x) is as follows:
- Build a computational graph; this can be any mathematical operation TensorFlow supports.
- Initialize variables, to compile the variables defined previously.
- Create a session; this is where the magic starts!
- Run the graph in the session; the compiled graph is passed to the session, which starts its
execution.
- Close the session to shut it down.

53. What is part of speech (POS) tagging? What is the simplest approach to
building a POS tagger that you can imagine?
  
 Basic NLP
POS tagging is the process of marking up a word in a corpus to a corresponding part of a speech
tag, based on its context and definition. The most common approach is to use the lexicon-based
approach, using a lexicon to assign a tag for each word. The lexicon is constructed from a gold
standard annotated corpus, where each word type is coupled with its most frequent associated
tag in the gold standard corpus.

54. How would you build a part of speech (POS) tagger from scratch given a
corpus of annotated sentences? How would you deal with unknown words?
  
 Basic NLP
First, we will create features from words (like last 2,3 letters, the previous word, next word, etc.).
Then we will train a classifier to find the POS tag. HMM, CRF and RNNs can be used to train the
model. Unknown words can also be predicted by generating the features (position of the word,
suffix, etc) from them.

55. How would you train a model that identifies whether the word “Apple” in a
sentence belongs to the fruit or the company?
  
 Basic NLP
This particular task is known as NER (Named Entity Recognition) tagging. HMM, CRF and RNNs
can be used to train a model for NER

56. How would you find all the occurrences of quoted text in a news article?
  
 Basic NLP
Train a classifier model to look at the constituent parts of a news article and assign a probability
that, taken together, composes a valid quoted text.

57. How would you build a system that auto-corrects text that has been
generated by a speech recognition system?
  
 Basic NLP
It can be done in multiple ways, but the simplest way would be to take the unknown words and
compare them with similar words from our dictionary. Distances can be calculated using
algorithms like Levenshtein and, if the result is satisfactory, the words can be exchanged.

58. Which are some popular models other than word2vec?
  
 Basic NLP
Some popular models other than word2vec are GloVe, Adagram, FastText, etc

59. What is latent semantic indexing and where can it be applied?


  
 Basic NLP
Latent semantic indexing (LSI) is a concept used by search engines to discover how a term and
content work together to mean the same thing, even if they do not share keywords or synonyms.
Search engines use LSI to judge the quality of the content on a page by checking for words that
should appear alongside a given search term or keyword

60. Explain some metrics to test out a Named Entity recognition model.
  
 Basic NLP
When you train a NER system the most typical evaluation method is to measure precision, recall,
f1-score, and confusion matrix at a token level.

61. List out some popular Python libraries that are used for NLP.
  
 Basic NLP
Some popular libraries for NLP are, NLTK, Gensim, spaCy, TextBlob, etc.

62. What are some popular applications of NLP?


  
 Basic NLP
Some popular applications are Text summarization, Machine translation, Sentiment Analysis,
chatbots, etc.

63. What is the difference between search function and match function?
  
 Basic NLP
The re.search() method finds a match anywhere in the string and returns a match object,
whereas the re.match() method finds a match only at the beginning of the string and returns a
match object.
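For example:

import re

text = "learning python"
print(re.match(r"python", text))   # None: "python" is not at the beginning
print(re.search(r"python", text))  # match object: found at position 9
print(re.match(r"learning", text)) # match object: pattern is at the beginning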

64. What is tokenization, chinking, chunking?


  
 Basic NLP
Tokenization is a way of separating a piece of text into smaller units called tokens. Here, tokens
can be words, characters, or subwords. Chunking means grouping words/tokens into
chunks; chunking can break sentences into phrases that are more useful than individual words
and yield meaningful results. Chinking is a lot like chunking; it is basically a way for you to
remove a chunk from a chunk.

65. What is the skip-gram model?


  
 Basic NLP
Skip-gram is an unsupervised algorithm to find word embeddings. It tries to predict the source
context words (surrounding words) given a target word (the center word)

66. What is a CBOW model?


  
 Basic NLP
CBOW is an unsupervised algorithm to find word embeddings. It tries to predict the target word
(the center word) given the source context words (surrounding words).

67. How can you create your own word embeddings?


  
 Basic NLP
You can use the gensim library to implement a word2vec model: train the word2vec model on
your text corpus and then generate word embeddings.
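A minimal sketch with gensim 4.x (the toy corpus and parameter values are illustrative assumptions; a real corpus would be much larger):

from gensim.models import Word2Vec

# toy corpus: a list of tokenized sentences
sentences = [["machine", "learning", "is", "fun"],
             ["deep", "learning", "is", "part", "of", "machine", "learning"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv["learning"])             # the 50-dimensional embedding
print(model.wv.most_similar("machine"))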

68. What is the difference between stemming and lemmatization?


  
 Basic NLP
Stemming and lemmatization, both are used to derive root (base) word from their inflected form.
A stem might not be an actual word whereas a lemma will be an actual word.

69. How would you build a system to translate English text to Greek and vice-
versa?
  
 Basic NLP
One can use Neural Machine Translation to translate English text to Greek and vice-versa. A
sequence to sequence model can be created using RNNs.

70. How would you build a system that automatically groups news articles by
subject?
  
 Basic NLP
There can be different ways to do this task, if you have annotated data, you can train a classifier
model to classify different articles

71. What are stop words? Describe an application in which stop words should
be removed.
  
 Basic NLP
Stop words are frequently used words that do not add much meaning to a sentence or do not
help in prediction. For example, we will need to remove the stop words while performing
sentiment analysis.

72. How would you design a model to predict whether a movie review was
positive or negative?
  
 Basic NLP
We will need to perform sentiment analysis on the reviews, It can be done in multiple ways, one
simple way to do this is by training a classifier using ML algorithms or RNNs (LSTM or GRU).

73. What is entropy? How would you estimate the entropy of the English
language?
  
 Basic NLP
Entropy is a measure of randomness in the information. One possible way of calculating the
entropy of English uses N-grams. One can statistically calculate the entropy of the next letter
when the previous N - 1 letters are known.

74. What is the TF-IDF score of a word and in what context is this useful?
  
 Basic NLP
TF-IDF is a statistical measure that evaluates how relevant a word is to a document in a
collection of documents. This is done by multiplying two metrics: how many times a word
appears in a document, and the inverse document frequency of the word across a set of
documents. TF-IDF is used to convert text corpus into a matrix on which Machine learning
algorithms can be implemented
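A minimal sketch with scikit-learn (the toy documents are illustrative; get_feature_names_out assumes a recent scikit-learn version):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(docs)  # documents x vocabulary matrix
print(vectorizer.get_feature_names_out())
print(tfidf_matrix.toarray().round(2))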

75. What is dependency parsing?


  
 Basic NLP
Dependency parsing is the process of analyzing the grammatical structure of a sentence based
on the dependencies between the words in that sentence.

76. What are the difficulties in building and using an annotated corpus of text
such as the Brown Corpus and what can be done to mitigate them?
  
 Basic NLP

77. What tools for training NLP models (NLTK, Apache OpenNLP, GATE,
MALLET etc…) have you used?
  
 Basic NLP
To train NLP models, I have used NLTK, Gensim, spaCy and a few others.

78. Are you familiar with WordNet or other related linguistic resources?
  
 Basic NLP
WordNet is a lexical database, i.e. a dictionary, for the English language, specifically designed
for NLP. A Synset is a simple interface present in NLTK to look up words in WordNet.

79. Problems faced in NLP and how you tackled them?


  
 Basic NLP
Most of the challenges I faced in NLP are due to data complexity (characteristics such as
sparsity, diversity, and dimensionality) and the dynamic nature of the datasets. With a special
focus on addressing NLP challenges, one can build accelerators and robust, scalable, domain-
specific knowledge bases and dictionaries that bridge the gap between user vocabulary and
domain nomenclature.

80. What are some of the common problems using fixed window neural
models?
  
 Advanced NLP
The main problem faced while using a fixed-window neural model is that the window size may
be too small for long sentences, making the model unable to process the complete information.

81. What are some common examples of sequential data?


  
 Advanced NLP
Some common examples of sequential data are text corpora, DNA sequences, and time-series
data.

82. What are some problems with N-gram language models?


  
 Advanced NLP
An issue when using n-gram language models is out-of-vocabulary (OOV) words. They are
encountered in computational linguistics and natural language processing when the input
includes words which were not present in a system's dictionary or database during its
preparation.

83. What are some limitations of RNNs?


  
 Advanced NLP
RNNs are prone to the exploding and vanishing gradient problems. RNNs also fail to keep track
of long-term dependencies.

84. What is the vanishing gradient problem?
  
 Advanced NLP
As more layers using certain activation functions are added to neural networks, the gradients of
the loss function approach zero, making the network hard to train.

85. What is exploding gradients in RNN?


  
 Advanced NLP
Exploding gradients are a problem where large error gradients accumulate and result in very
large updates to neural network model weights during training.

86. Can you give me an example of many-to-one architecture in sequence models?
  
 Advanced NLP
An example of many-to-one architecture in a sequence model would be sentiment analysis,
where the inputs are words and the output is a sentiment.

87. What activation layer is used in the hidden units of an RNN?


  
 Advanced NLP
The tanh activation function is used in the hidden units of an RNN.

88. What is the use of the Forget Gate in LSTMs?


  
 Advanced NLP
In an LSTM, the forget gate controls the extent to which a value remains in the cell state.

89. Why is there a specific need for an architecture like GRU or LSTM?
  
 Advanced NLP
RNNs suffer from short-term memory: if a sequence is long enough, they have a hard time
carrying information from earlier timesteps to later ones. This is called the vanishing gradient
problem. To solve this issue, GRUs and LSTMs are used.

90. What problems of RNNs do LSTMs address?


  
 Advanced NLP
RNNs suffer from short-term memory and struggle to carry information from earlier timesteps to
later ones in long sequences (the vanishing gradient problem). LSTMs address this by using
gates and a cell state that let the network preserve information over long ranges.
91. What is the primary difference between an LSTM and GRU?
  
 Advanced NLP
The main difference between a GRU and an LSTM is that a GRU has 2 gates whereas an LSTM
has 3 gates; thus a GRU is faster than an LSTM. But LSTMs generally perform better at
remembering longer sequences than GRUs.

92. What kind of datasets are RNNs known best to work on?
  
 Advanced NLP
RNNs are good at making predictions when the data is sequential.

93. What are the different possible architectures in RNNs and give examples of
the same?
  
 Advanced NLP
Different possible architectures for RNNs are the following:

1. One-to-many: e.g. auto image captioning
2. Many-to-many: e.g. neural machine translation
3. Many-to-one: e.g. sentiment analysis

94. What are some of the ways to address the exploding gradients problem in
RNNs?
  
 Advanced NLP
Some of the ways to address the exploding gradient problem are:

1. Gradient clipping: limit the size of gradients during the training of your network (see the
sketch below).
2. Weight regularization: apply a penalty to the network's loss function for large weight
values.
3. Using LSTMs or GRUs.
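A minimal sketch of option 1 in Keras (assuming TensorFlow is installed; the threshold values are illustrative):

from tensorflow import keras

# clipnorm rescales any gradient whose L2 norm exceeds 1.0
optimizer = keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)
# alternatively, clip each gradient element to [-0.5, 0.5]
optimizer = keras.optimizers.SGD(learning_rate=0.01, clipvalue=0.5)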

95. Explain encoder-decoder architecture?


  
 Advanced NLP
In an encoder-decoder architecture, an input sequence is read in its entirety and encoded to a
fixed-length internal representation. A decoder network then uses this internal representation to
output words. This architecture is generally used in machine translation.

96. What are the drawbacks of attention mechanisms?


  
 Advanced NLP
The main disadvantage of the attention mechanism is that it adds more weights to train, thus
increasing the training time of the model.

97. What is BERT? What are the applications of it?


  
 Advanced NLP
BERT stands for Bidirectional Encoder Representations from Transformers. BERT is pre-trained
on a large corpus of unlabelled text. It is bidirectional meaning it learns information from both the
left and the right side of a token’s context during the training phase. BERT is used for text
summarization, knowledge extraction, chatbots etc.

98. What is XLNet?


  
 Advanced NLP
XLNet is an auto-regressive language model which outputs the joint probability of a sequence of
tokens based on the transformer architecture with recurrence.

99. What are the Transformers?


  
 Advanced NLP
The Transformer in NLP is a novel architecture that aims to solve sequence-to-sequence tasks
while handling long-range dependencies with ease.

100. What is the time complexity of LSTM?


  
 Advanced NLP

101. Why do we need attention mechanisms?


  
 Advanced NLP
The standard seq2seq model is generally unable to accurately process long input sequences.
The attention mechanism allows the model to focus and place more “attention” on the relevant
parts of the input sequence as needed.

102. What are the different types of attention mechanisms?


  
 Advanced NLP
There are 2 different types of attention mechanism

1. Bahdanau Attention
2. Luong Attention

103. What are the advantages of BERT?


  
 Advanced NLP
Since the BERT model is deeply bidirectional, it is able to generate more accurate word
representations. And since BERT uses transformers, it allows parallelization and is thus faster to
train on large datasets.

104. What information is stored in the hidden and cell state of an LSTM?
  
 Advanced NLP
The cell state ( also called long-term memory) contains the information from the past. Hidden
State (also called working memory) contains the information from the current state that needs to
be taken to the next state

105. Why is the transformer better than LSTMs?


  
 Advanced NLP
Transformers are better than the other architectures because they avoid recurrence entirely,
processing sentences as a whole and learning relationships between words using multi-head
attention mechanisms and positional embeddings.

106. What are the differences between BERT and ALBERT v2?
  
 Advanced NLP
BERT is an expensive model in terms of memory and the time consumed on computations, even
with a GPU. ALBERT v2 is lighter and faster than BERT. Cross-layer parameter sharing is the
most significant change to the BERT architecture that produced ALBERT.

107. What are the different variants of BERT?


  
 Advanced NLP
There are 2 different variants of BERT

1. BERT Base: 12 layers (transformer blocks), 12 attention heads, and 110 million
parameters
2. BERT Large: 24 layers (transformer blocks), 16 attention heads, and 340 million
parameters

108. What is the state of the art model currently in NLP?


  
 Advanced NLP
Following are some state-of-the-art models currently in NLP:

1. BERT
2. GPT-3
3. XLNet
109. What are the most challenging NLP problems that researchers/industries
are working on currently?
  
 Advanced NLP
Following are the challenges currently faced in NLP:

1. Extraction of meaning from a variety of complex, multi-format documents
2. Support for multiple languages
3. Integration of pre-existing, text-based knowledge

110. What are built-in functions in Python?


  
 Basic Python

111. Differentiate between Call by value and Call by reference


  
 Basic Python

112. How do you read any file (without using Pandas)?


  
 Intermediate Python

113. What is NaN in Python?


  
 Basic Python

114. What is the use of the id() function in Python?


  
 Basic Python

115. How will you import multiple Excel sheets into a data frame?
  
 Basic Python

116. What are the different data types in Python?


  
 Basic Python
117. Difference between lists, tuples and dictionaries?
  
 Basic Python

118. How would you check whether a number is prime or not using Python?


  
 Basic Python
# taking input from the user
number = int(input("Enter any number: "))
# a prime number is always greater than 1
if number > 1:
    for i in range(2, number):
        if (number % i) == 0:
            print(number, "is not a prime number")
            break
    else:
        # the for-else branch runs only if no divisor was found
        print(number, "is a prime number")
# if the entered number is less than or equal to 1, it is not a prime number
else:
    print(number, "is not a prime number")

119. How would you check whether a number is an Armstrong number using Python?


  
 Basic Python
# Python program to check if the number is an Armstrong number or not
# take input from the user
num = int(input("Enter a number: "))
# initialize sum
sum = 0
# find the sum of the cube of each digit
temp = num
while temp > 0:
    digit = temp % 10
    sum += digit ** 3
    temp //= 10
# display the result
if num == sum:
    print(num, "is an Armstrong number")
else:
    print(num, "is not an Armstrong number")

120. What is an Append Function?


  
 Basic Python
The append() method adds an item to the end of the list.
The syntax of the append() method is:
list.append(item)

121. What is the Beautiful Soup library used for?


  
 Basic Python

122. Which function is most useful to convert a multidimensional array into a one-dimensional one?
  
 Basic Python

123. Python or R – Which one would you prefer for text analytics?
  
 Intermediate Python

124. What is the lambda function in Python?


  
 Intermediate Python
In Python, anonymous functions are defined using the lambda keyword.
Syntax of a lambda function in Python:
lambda arguments: expression
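For example:

square = lambda x: x ** 2
print(square(4))                               # 16
print(list(map(lambda x: x * 2, [1, 2, 3])))   # [2, 4, 6]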

125. How are negative indices used in Python?


  
 Intermediate Python
The Python programming language supports negative indexing of sequences, something which
is not available in arrays in most other programming languages. Negative indexing starts from
where the sequence ends: an index value of -1 gives the last element, and -2 gives the
second-last element.
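For example:

letters = ["a", "b", "c", "d"]
print(letters[-1])   # 'd' (last element)
print(letters[-2])   # 'c' (second-last element)
print(letters[-3:])  # ['b', 'c', 'd'] (negative indices work in slices too)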

126. How is a pandas Series different from a single-column DataFrame?
  
 Intermediate Python
A pandas Series is the data structure for a single column of a DataFrame, not only conceptually
but literally, i.e. the data in a DataFrame is actually stored in memory as a collection of Series.
A Series is a one-dimensional object that can hold any data type such as integers, floats and
strings, and it does not have a column name/header, whereas a DataFrame has column names.

127. Which libraries in SciPy have you worked with in your project?
  
 Intermediate Python
SciPy contains modules for optimization, linear algebra, integration, interpolation, special
functions, FFT, signal and image processing, ODE solvers, etc.
Subpackages include:
scipy.cluster
scipy.constants
scipy.fftpack
scipy.integrate
scipy.interpolate
scipy.linalg
scipy.io
scipy.ndimage
scipy.odr
scipy.optimize
scipy.signal
scipy.sparse
scipy.spatial
scipy.special
scipy.stats

128. How does the groupby function work in Python?


  
 Intermediate Python
The Pandas DataFrame.groupby() function is used to split the data into groups based on some
criteria. Pandas objects can be split on any of their axes.
Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True,
group_keys=True, squeeze=False, **kwargs)
Parameters :
by: mapping, function, str, or iterable
axis: int, default 0
level: If the axis is a MultiIndex (hierarchical), group by a particular level or levels
as_index: For aggregated output, return object with group labels as the index. Only relevant for
DataFrame input. as_index=False is effectively “SQL-style” grouped output
sort: Sort group keys. Get better performance by turning this off. Note this does not influence the
order of observations within each group. groupby preserves the order of rows within each group.
group_keys: When calling apply, add group keys to index to identify pieces
squeeze: Reduce the dimensionality of the return type if possible, otherwise return a consistent
type
Returns: GroupBy object
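A minimal sketch (the toy DataFrame is illustrative):

import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B", "B"],
                   "points": [10, 15, 7, 9]})

# split the rows by team, then aggregate each group
print(df.groupby("team")["points"].mean())
# team
# A    12.5
# B     8.0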

129. What does [::-1] do in Python?


  
 Intermediate Python
[::] just produces a copy of all the elements in order
[::-1] produces a copy of all the elements in reverse order
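For example:

nums = [1, 2, 3, 4]
print(nums[::-1])      # [4, 3, 2, 1]
print(nums[::])        # [1, 2, 3, 4] (a shallow copy)
print("python"[::-1])  # 'nohtyp' (works on strings too)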

130. What are python packages?


  
 Basic Python
Packages are namespaces which can contain multiple packages and modules themselves. They
are simply directories.
Each regular package in Python is a directory which contains a special file called __init__.py.
This file can be empty, and it indicates that the directory it is in is a Python package, so it can be
imported the same way a module can be imported.
If we create a directory called foo, which marks the package name, we can then create a module
inside that package called bar. We also must not forget to add the __init__.py file inside the foo
directory.

131. How do you check missing values in a DataFrame using Python?


  
 Intermediate Python
The Pandas isnull() function detects missing values in the given object. It returns a boolean
same-sized object indicating whether the values are NA: missing values get mapped to True and
non-missing values get mapped to False.
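For example (the toy DataFrame is illustrative):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, 5, 6]})
print(df.isnull())        # boolean mask: True where values are missing
print(df.isnull().sum())  # missing-value count per column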

132. How do you get the frequency of a categorical column of a DataFrame using Python?
  
 Basic Python
Using Series.value_counts()

133. Can you write a function using python to impute outliers?


  
 Basic Python
import numpy as np

def removeOutliers(x, outlierConstant):
    a = np.array(x)
    upper_quartile = np.percentile(a, 75)
    lower_quartile = np.percentile(a, 25)
    IQR = (upper_quartile - lower_quartile) * outlierConstant
    quartileSet = (lower_quartile - IQR, upper_quartile + IQR)
    resultList = []
    for y in a.tolist():
        # keep only values inside the IQR-based fences
        if y >= quartileSet[0] and y <= quartileSet[1]:
            resultList.append(y)
    return resultList

134. How can we convert a pandas Series object into a DataFrame?


  
 Basic Python
Series.to_frame(name=None)

135. How can you change the index of a DataFrame in Python?


  
 Basic Python
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
keys: label or array-like or list of labels/arrays
This parameter can be either a single column key, a single array of the same length as the
calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here,
“array” encompasses Series, Index, np.ndarray, and instances of Iterator.

136. Is Python case sensitive?


  
 Basic Python
Yes
137. What ways have you used to convert categorical columns into
numerical data using Python?
  
 Intermediate Python
Two of the most used and popular ones are LabelEncoder and OneHotEncoder,
both provided as part of the sklearn library.
LabelEncoder can be used to transform categorical data into integers:

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
x = ['Apple', 'Orange', 'Apple', 'Pear']
y = label_encoder.fit_transform(x)
print(y)  # array([0, 1, 0, 2])

OneHotEncoder can be used to transform categorical data into a one-hot encoded array:

from sklearn.preprocessing import OneHotEncoder

# note: the sparse argument was renamed sparse_output in newer scikit-learn versions
onehot_encoder = OneHotEncoder(sparse=False)
y = y.reshape(len(y), 1)
onehot_encoded = onehot_encoder.fit_transform(y)
print(onehot_encoded)

138. How is get_dummies() different from OneHotEncoder?


  
 Intermediate Python
OneHotEncoder cannot process string values directly (in older versions of sklearn); if your
nominal features are strings, you first need to map them into integers.
pandas.get_dummies is kind of the opposite: by default, it only converts string columns into a
one-hot representation, unless specific columns are given.
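For example (the toy DataFrame is illustrative):

import pandas as pd

df = pd.DataFrame({"fruit": ["Apple", "Orange", "Apple", "Pear"]})
print(pd.get_dummies(df, columns=["fruit"]))  # one indicator column per category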

139. How do you check the distribution of data in Python?


  
 Intermediate Python
A simple and commonly used plot to quickly check the distribution of a sample of data is the
histogram:

from matplotlib import pyplot
pyplot.hist(data)
pyplot.show()

140. What is the difference between iloc and loc?


  
 Basic Python
loc gets rows (or columns) with particular labels from the index.
iloc gets rows (or columns) at particular positions in the index (so it only takes integers).

141. Difference between univariate and bivariate analysis? What different
functions can be used in Python?
  
 Basic Python
Univariate statistics summarize only one variable at a time.
Bivariate statistics compare two variables.
 
Below are a few functions which can be used in the univariate and bivariate analysis:
1. To find the population proportions with different types of blood disorders.
df.Thal.value_counts()
2. To make a plot of the distribution :
sns.distplot(df.Variable.dropna())
3. Find the minimum, maximum, average, and standard deviation of data.
There is a function called ‘describe’
4. Find the mean of the Variable
df.Variable.dropna().mean()
5. Boxplot to observe outliers
sns.boxplot(x = "", y = "", hue = "", data=df)
6. Correlation plot:
data.corr()

142. What all different methods can be used to standardize the data using
python?
  
 Intermediate Python
Min Max Scaler.
Standard Scaler.
Max Abs Scaler.
Robust Scaler.
Quantile Transformer Scaler.
Power Transformer Scaler.
Unit Vector Scaler.

143. What is the apply function in Python? How does it work?


  
 Basic Python
Pandas .apply() allows the user to pass a function and apply it to every single value of a Pandas
Series.
Syntax:
s.apply(func, convert_dtype=True, args=())

144. How do you do upsampling of data? Name a Python function or explain the code.
  
 Intermediate Python
Up-sampling is the process of randomly duplicating observations from the minority class in order
to reinforce its signal.
There are several heuristics for doing so, but the most common way is to simply resample with
replacement.
Module for resampling in Python:
from sklearn.utils import resample
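A minimal sketch (the toy imbalanced DataFrame is illustrative):

import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"feature": range(10),
                   "label": [0] * 8 + [1] * 2})  # imbalanced toy data

majority = df[df.label == 0]
minority = df[df.label == 1]

# duplicate minority rows (sampling with replacement) up to the majority size
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])
print(balanced.label.value_counts())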

145. Can you plot 3D plots using matplotlib? Name the function.
  
 Intermediate Python
Yes
Function:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection ='3d')

146. How can you drop a column in Python?


  
 Basic Python
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False,
errors='raise')

147. What is the use of ‘inplace’ in Python functions?


  
 Basic Python
An in-place operation is an operation that directly changes the contents of a given object (e.g. a
vector or matrix/tensor) without making a copy.
When inplace=True is used, the operation is performed on the data and nothing is returned.
When inplace=False is used, the operation is performed on the data and a new copy of the data
is returned.
148. How do you select a sample of a DataFrame?
  
 Intermediate Python
1. Randomly select a single row: df = df.sample()
2. Randomly select a specified n number of rows: df = df.sample(n=3)
3. Allow a random selection of the same row more than once: df = df.sample(n=3,replace=True)
4. Randomly select a specified fraction of the total number of rows: df = df.sample(frac=0.50)

149. How would you define a block in Python?


  
 Intermediate Python
A block is a group of statements in a program or script. Usually, it consists of at least one
statement and declarations for the block, depending on the programming or scripting language.
A language which allows grouping with blocks is called a block-structured language

150. How will you remove duplicate data from a dataframe?


  
 Intermediate Python
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
subset: takes a column or list of column labels; its default value is None. After passing
columns, only those are considered for duplicates.
keep: controls how duplicate values are considered. It has only three distinct values, and the
default is 'first'.

151. Can you convert a string into an int? When and how?
  
 Basic Python
Python offers the int() method, which takes an object such as a string as an argument and
returns an integer. This can be done when the string contains an integer value.
But keep this special case in mind: int() can also take a floating-point number (a number with a
fractional part) as an argument, in which case it returns the float truncated towards zero.

152. What does the function zip() do?


  
 Intermediate Python
The zip() function takes iterables (zero or more), aggregates them into tuples, and returns an
iterator of those tuples.
The syntax of the zip() function is:
zip(*iterables)
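For example:

names = ["a", "b", "c"]
scores = [1, 2, 3]
print(list(zip(names, scores)))  # [('a', 1), ('b', 2), ('c', 3)]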

153. How many arguments can the range() function take?


  
 Basic Python
It can take up to three arguments:
start: the integer starting from which the sequence of integers is to be returned
stop: the integer before which the sequence of integers is to be returned
(the range of integers ends at stop - 1)
step: the integer value which determines the increment between each integer in the sequence
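For example:

print(list(range(5)))          # stop only: [0, 1, 2, 3, 4]
print(list(range(2, 8)))       # start, stop: [2, 3, 4, 5, 6, 7]
print(list(range(10, 0, -2)))  # start, stop, step: [10, 8, 6, 4, 2]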

154. What is the difference between list, array and tuple in Python?
  
 Basic Python
List:
A list is an ordered collection of data types.
A list is mutable.
Lists are dynamic and can contain objects of different data types.
List elements can be accessed by index number.
 
Array:
An array is an ordered collection of similar data types.
An array is mutable.
An array can be accessed by using its index number.
 
Tuple:
Tuples are immutable and can store any data type.
A tuple is defined using ().
It cannot be changed or replaced, as it is an immutable data type.
