ML Unit 1 MCQ

UNIT I
1. What is classification?
a) When the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) When the output variable is a real value, such as “dollars” or “weight”.
Ans: Solution A
2. What is regression?
a) When the output variable is a category, such as “red” or “blue” or “disease” and “no
disease”.
b) When the output variable is a real value, such as “dollars” or “weight”.
Ans: Solution B
3. What is supervised learning?

a) All data is unlabelled and the algorithms learn the inherent structure from the input data
b) All data is labelled and the algorithms learn to predict the output from the input data
c) It is a framework for learning where an agent interacts with an environment and receives
a reward for each interaction
d) Some data is labelled but most of it is unlabelled and a mixture of supervised and
unsupervised techniques can be used.
Ans: Solution B
4. What is Unsupervised learning? (Options as in Question 3.)
Ans: Solution A
5. What is Semi-Supervised learning? (Options as in Question 3.)
Ans: Solution D
6. What is Reinforcement learning? (Options as in Question 3.)
Ans: Solution C
7. Sentiment Analysis is an example of:
(1) Regression
(2) Classification
(3) Clustering
(4) Reinforcement Learning
Options:
A. 1 Only
B. 1 and 2
C. 1 and 3
D. 1, 2 and 4
Ans : Solution D
8. The process of forming general concept definitions from examples of concepts to be learned.
a) Deduction
b) abduction
c) induction
d) conjunction
Ans : Solution C
9. Computers are best at learning

a) facts.
b) concepts.
c) procedures.
d) principles.
Ans : Solution A
10. Data used to build a data mining model.

a) validation data
b) training data
c) test data
d) hidden data
Ans : Solution B
11. Supervised learning and unsupervised clustering both require at least one
a) hidden attribute.
b) output attribute.
c) input attribute.
d) categorical attribute.
Ans : Solution C
12. Supervised learning differs from unsupervised clustering in that supervised learning requires
a) at least one input attribute.
b) input attributes to be categorical.
c) at least one output attribute.
d) output attributes to be categorical.
Ans : Solution C
13. A regression model in which more than one independent variable is used to predict the
dependent variable is called
a) a simple linear regression model
b) a multiple regression model
c) an independent model
d) none of the above
Ans : Solution B
14. A term used to describe the case when the independent variables in a multiple regression model
are correlated is
a) Regression
b) correlation
c) multicollinearity
Ans : Solution C
15. A multiple regression model has the form: y = 2 + 3x1 + 4x2. As x1 increases by 1 unit (holding x2
constant), y will
a) increase by 3 units
b) decrease by 3 units
c) increase by 4 units
d) decrease by 4 units
Ans : Solution A
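The effect asked about in Question 15 can be checked numerically. The following is a minimal sketch; the function name `predict` and the starting values of x1 and x2 are arbitrary choices for illustration, not part of the question.

```python
def predict(x1, x2):
    """Evaluate the model from Question 15: y = 2 + 3*x1 + 4*x2."""
    return 2 + 3 * x1 + 4 * x2

# Increase x1 by one unit while holding x2 fixed (starting values are arbitrary).
x1, x2 = 5, 7
change_in_y = predict(x1 + 1, x2) - predict(x1, x2)
print(change_in_y)  # 3
```

The change equals the coefficient of x1, whatever starting values are chosen.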
16. A multiple regression model has

a) only one independent variable
b) more than one dependent variable
c) more than one independent variable
Ans : Solution C
17. A measure of goodness of fit for the estimated regression equation is the
a) multiple coefficient of determination
b) mean square due to error
c) mean square due to regression
Ans : Solution A
18. The adjusted multiple coefficient of determination accounts for

a) the number of dependent variables in the model
b) the number of independent variables in the model
c) unusually large predictors
Ans : Solution B
19. The multiple coefficient of determination is computed by

a) dividing SSR by SST
b) dividing SST by SSR
c) dividing SST by SSE
Ans : Solution A
20. For a multiple regression model, SST = 200 and SSE = 50. The multiple coefficient of
determination is
a) 0.25
b) 4.00
c) 0.75
Ans : Solution C
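Questions 18 to 20 rest on the identity SST = SSR + SSE, so the multiple coefficient of determination is SSR / SST. The sketch below works through the numbers from Question 20. The function names, and the sample size n = 20 and predictor count p = 2 used for the adjusted value, are assumptions made only for illustration.

```python
def r_squared(sst, sse):
    """Multiple coefficient of determination: SSR / SST, with SSR = SST - SSE."""
    ssr = sst - sse
    return ssr / sst

def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = r_squared(sst=200, sse=50)
print(r2)  # 0.75

# n and p are hypothetical; the question does not state them.
print(adjusted_r_squared(r2, n=20, p=2))
```

The adjusted value is never larger than the unadjusted one, because it penalizes each additional independent variable.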
21. A nearest neighbor approach is best used

a) with large-sized datasets.
b) when irrelevant attributes have been removed from the data.
c) when a generalized model of the data is desirable.
d) when an explanation of what has been found is of primary importance.
Ans : Solution B
22. Another name for an output attribute.

a) predictive variable
b) independent variable
c) estimated variable
d) dependent variable
Ans : Solution D
23. Classification problems are distinguished from estimation problems in that

a) classification problems require the output attribute to be numeric.
b) classification problems require the output attribute to be categorical.
c) classification problems do not allow an output attribute.
d) classification problems are designed to predict future outcome.
Ans : Solution B
24. Which statement is true about prediction problems?

a) The output attribute must be categorical.
b) The output attribute must be numeric.
c) The resultant model is designed to determine future outcomes.
d) The resultant model is designed to classify current behavior.
Ans : Solution C
25. Which statement about outliers is true?

a) Outliers should be identified and removed from a dataset.
b) Outliers should be part of the training dataset but should not be present in the test
data.
c) Outliers should be part of the test dataset but should not be present in the training
data.
d) The nature of the problem determines how outliers are used.
Ans : Solution D
26. Which statement is true about neural network and linear regression models?
a) Both models require input attributes to be numeric.
b) Both models require numeric attributes to range between 0 and 1.
c) The output of both models is a categorical attribute value.
d) Both techniques build models whose output is determined by a linear sum of weighted
input attribute values.
Ans : Solution A
27. Which of the following is a common use of unsupervised clustering?

a) detect outliers
b) determine a best set of input attributes for supervised learning
c) evaluate the likely performance of a supervised learner model
d) determine if meaningful relationships can be found in a dataset
Ans : Solution A
28. The average positive difference between computed and desired outcome values.
a) root mean squared error
b) mean squared error
c) mean absolute error
d) mean positive error
Ans : Solution C
29. Selecting data so as to assure that each class is properly represented in both the training and
test set.
a) cross validation
b) stratification
c) verification
d) bootstrapping
Ans : Solution B
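Stratification, as described in Question 29, can be sketched as a split that is performed separately inside each class. This is a simplified illustration under stated assumptions: the function name `stratified_split`, the labels, and the 25% test fraction are hypothetical, and the instances are not shuffled first, which a real split would normally do.

```python
from collections import defaultdict

def stratified_split(labels, test_fraction):
    """Return (train_indices, test_indices) with each class split in the same ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])   # first n_test instances of this class
        train.extend(indices[n_test:])  # remaining instances of this class
    return train, test

# Hypothetical labels: 8 instances of class "a" and 4 of class "b".
labels = ["a"] * 8 + ["b"] * 4
train, test = stratified_split(labels, test_fraction=0.25)
print(sorted(test))  # [0, 1, 8]
```

Both classes keep their 2:1 ratio in the training set and in the test set.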
30. The standard error is defined as the square root of this computation.
a) The sample variance divided by the total number of sample instances.
b) The population variance divided by the total number of sample instances.
c) The sample variance divided by the sample mean.
d) The population variance divided by the sample mean.
Ans : Solution A
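The definition in Question 30 translates directly into code. The sketch below uses Python's standard `statistics.variance`, which computes the sample variance; the function name `standard_error` and the data values are made up for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Square root of (sample variance / number of sample instances)."""
    return math.sqrt(statistics.variance(sample) / len(sample))

# Hypothetical sample values.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error(sample))
```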
31. Data used to optimize the parameter settings of a supervised learner model.
a) Training
b) Test
c) Verification
d) Validation
Ans : Solution D
32. Bootstrapping allows us to

a) choose the same training instance several times.
b) choose the same test set instance several times.
c) build models with alternative subsets of the training data several times.
d) test a model with alternative subsets of the test data several times.
Ans : Solution A
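Bootstrapping, as in Question 32, samples the training data with replacement, so the same instance may be drawn more than once. A minimal sketch follows; the function name `bootstrap_sample`, the seed, and the ten-instance dataset are assumptions for illustration.

```python
import random

def bootstrap_sample(instances, rng):
    """Draw len(instances) items with replacement from instances."""
    return [rng.choice(instances) for _ in instances]

rng = random.Random(0)       # fixed seed so the run is repeatable
training = list(range(10))   # hypothetical training instances, identified by index
sample = bootstrap_sample(training, rng)
print(sample)
```

Instances that are never drawn can serve as test data for that round.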
33. The correlation between the number of years an employee has worked for a company and the
salary of the employee is 0.75. What can be said about employee salary and years worked?
a) There is no relationship between salary and years worked.
b) Individuals that have worked for the company the longest have higher salaries.
c) Individuals that have worked for the company the longest have lower salaries.
d) The majority of employees have been with the company a long time.
e) The majority of employees have been with the company a short period of time.
Ans : Solution B
34. The correlation coefficient for two real-valued attributes is –0.85. What does this value tell you?
a) The attributes are not linearly related.
b) As the value of one attribute increases the value of the second attribute also increases.
c) As the value of one attribute decreases the value of the second attribute increases.
d) The attributes show a curvilinear relationship.
Ans : Solution C
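The sign of the correlation coefficient in Questions 33 and 34 indicates the direction of the linear relationship. The sketch below computes Pearson's coefficient from its definition; the function name `pearson_correlation` and the data are hypothetical and chosen to be perfectly linear.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

years = [1, 2, 3, 4, 5]
rising = [10, 20, 30, 40, 50]    # increases with years: coefficient close to +1
falling = [50, 40, 30, 20, 10]   # decreases with years: coefficient close to -1
print(pearson_correlation(years, rising))
print(pearson_correlation(years, falling))
```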
35. The average squared difference between classifier predicted output and actual output.
a) mean squared error
b) root mean squared error
c) mean absolute error
d) mean relative error
Ans : Solution A
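The error measures named in Questions 28 and 35 differ only in how each individual error is treated before averaging. A minimal sketch, with made-up actual and predicted values:

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical values; the individual errors are 1, 0, -2 and 3.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 4.0]
print(mean_absolute_error(actual, predicted))  # 1.5
print(mean_squared_error(actual, predicted))   # 3.5
print(root_mean_squared_error(actual, predicted))
```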
36. Simple regression assumes a __________ relationship between the input attribute and output
attribute.
a) Linear
b) Quadratic
c) reciprocal
d) inverse
Ans : Solution A
37. Regression trees are often used to model _______ data.

a) Linear
b) Nonlinear
c) Categorical
d) Symmetrical
Ans : Solution B
38. The leaf nodes of a model tree are

a) averages of numeric output attribute values.
b) nonlinear regression equations.
c) linear regression equations.
d) sums of numeric output attribute values.
Ans : Solution C
39. Logistic regression is a ________ regression technique that is used to model data having a
_____outcome.
a) linear, numeric
b) linear, binary
c) nonlinear, numeric
d) nonlinear, binary
Ans : Solution D
40. This technique associates a conditional probability value with each data instance.
a) linear regression
b) logistic regression
c) simple regression
d) multiple linear regression
Ans : Solution B
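Questions 39 and 40 describe logistic regression as a nonlinear model that attaches a conditional probability to each instance. The sketch below shows only the prediction step; the weights, bias, and input values are hypothetical, and no fitting is performed.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, weights, bias):
    """Estimated probability of the positive outcome for one instance x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return logistic(z)

# Hypothetical, hand-picked parameters (not fitted to any data).
weights = [0.8, -0.5]
bias = 0.1
probability = predict_probability([2.0, 1.0], weights, bias)
print(probability)
print(logistic(0.0))  # 0.5
```

A binary prediction is usually obtained by comparing the probability with a threshold such as 0.5.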
41. This supervised learning technique can process both numeric and categorical input attributes.
a) linear regression
b) Bayes classifier
c) logistic regression
d) backpropagation learning
Ans : Solution B
42. With Bayes classifier, missing data items are

a) treated as equal compares.
b) treated as unequal compares.
c) replaced with a default value.
d) ignored.
Ans : Solution D
43. This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.
a) agglomerative clustering
b) expectation maximization
c) conceptual clustering
d) K-Means clustering
Ans : Solution C
44. This clustering algorithm initially assumes that each data instance represents a single cluster.
a) agglomerative clustering
b) conceptual clustering
c) K-Means clustering
d) expectation maximization
Ans : Solution A
45. This unsupervised clustering algorithm terminates when mean values computed for the current
iteration of the algorithm are identical to the computed mean values for the previous iteration.
a) agglomerative clustering
b) conceptual clustering
c) K-Means clustering
d) expectation maximization
Ans : Solution C
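The termination rule in Question 45 can be seen in a small one-dimensional K-Means sketch. The function name `k_means_1d`, the data, and the initial means are assumptions for illustration; practical implementations work on vectors and often stop on a tolerance rather than exact equality.

```python
def k_means_1d(values, means):
    """Repeat the assign and update steps until the means stop changing."""
    while True:
        # Assignment step: put each value in the cluster with the nearest mean.
        clusters = [[] for _ in means]
        for v in values:
            nearest = min(range(len(means)), key=lambda i: abs(v - means[i]))
            clusters[nearest].append(v)
        # Update step: recompute each mean (an empty cluster keeps its old mean).
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        # Terminate when the means are identical to those of the previous iteration.
        if new_means == means:
            return means, clusters
        means = new_means

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
means, clusters = k_means_1d(values, [1.0, 2.0])
print(means)     # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```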
46. Machine learning techniques differ from statistical techniques in that machine learning methods
a) typically assume an underlying distribution for the data.
b) are better able to deal with missing and noisy data.
c) are not able to explain their behavior.
d) have trouble with large-sized datasets.
Ans : Solution B
