Data Mining

The document provides a list of 40 multiple choice questions about data mining objectives and concepts from the years 2020, 2017, 2016 and 2015. The questions cover topics such as the definition of knowledge discovery in databases, examples of supervised and unsupervised learning, outputs of data mining processes, types of data mining algorithms and applications.

Uploaded by

Sujeet Kumar
Copyright
© All Rights Reserved
Data Mining Objectives


2020
1. The full form of KDD is

(a) Knowledge Database

(b) Knowledge discovery in databases

(c) Knowledge data division

(d) Knowledge data definition

Ans- b

2. You are given data about seismic activity in Japan, and you want to predict the magnitude of the next
earthquake. This is an example of

(a) Supervised Learning

(b) Unsupervised Learning

(c) Serration

(d) dimensionality reduction

Ans- a

3. Which of the following is not involved in data mining?

(a) Knowledge extraction

(b) Data archaeology

(c) Data exploration

(d) Data transformation

Ans- d

4. ____ is a comparison of the general features of the target class data objects against the general
features of objects from one or multiple contrasting classes

(a) Data Characterization

(b) Data classification

(c) Data discrimination

(d) Data selection

Ans- c

5. A Bayesian classifier is

(a) a class of learning algorithm that tries to find an optimum classification of a set of examples using
the probabilistic theory

(b) any mechanism employed by a learning system to constrain the search space of a hypothesis

(c) an approach to the design of learning algorithms that is inspired by the fact that when people
encounter new situations, they often explain them by reference to familiar experiences, adapting
the explanations to fit the new situation

(d) None of the above

Ans- a

6. The output of KDD is

(a) Data

(b) Information

(c) Query

(d) Useful information

Ans- d

7. Cluster is

(a) Group of similar objects that differ significantly from other objects

(b) Operations on a database to transform or simplify data in order to prepare it for a machine
learning algorithm

(c) Symbolic representation of facts or ideas from which information can potentially be extracted

(d) None of the above

Ans- a

8. Background knowledge refers to

(a) Additional acquaintance used by a learning algorithm to facilitate the learning process

(b) a neural network that makes use of a hidden layer

(c) it is a form of automatic learning

(d) None of the above

Ans- a

9. Case-based learning is

(a) A class of learning algorithm that tries to find an optimum classification of a set of examples using
the probabilistic theory

(b) Any mechanism employed by a learning system to constrain the search space of a hypothesis

(c) An approach to the design of learning algorithms that is inspired by the fact that when people
encounter new situations, they often explain them by reference to the familiar experiences,
adapting the explanations to fit the new situation

(d) None of the above

Ans- c

10. Some telecommunication companies want to segment their customers into distinct groups in
order to send appropriate subscription offers. This is an example of

(a) Supervised Learning

(b) Data extraction

(c) Serration

(d) Unsupervised Learning

Ans- d

2017
11. An ……………… system is market-oriented and is used for data analysis by knowledge workers, including
managers, executives, and analysts.
(a) OLAP
(b) OLTP
(c) Both of the above
(d) None of the above

Ans- a

12. Which of the following is not a kind of data warehouse application?

(a) Information Processing

(b) Analytical Processing

(c) Data Mining

(d) Transaction Processing

Ans- d

13. Data can be updated frequently in a …………. environment

(a) Data Warehouse

(b) Data Mining

(c) Operational

(d) Informational

Ans- c

14. ……… is data about data.

(a) Metadata

(b) Microdata

(c) Minidata

(d) Multidata

Ans- a

15. Which of the following is required by K-means clustering?

(a) Defined distance metric

(b) Number of clusters



(c) Initial guess as to cluster centroids

(d) All of the above

Ans- d
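
The three inputs named in question 15 can be seen together in a small sketch. This is a minimal, illustrative version of Lloyd-style k-means, not a reference implementation: the function name `kmeans`, the sample points, and the starting centroids are all assumptions made for the example. The number of clusters is implied by how many initial centroids are passed in, and the distance metric is a parameter that defaults to Euclidean distance.

```python
import math

def kmeans(points, initial_centroids, distance=math.dist, max_iter=100):
    """Illustrative k-means: k is the number of initial centroids supplied."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so stop
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
final_centroids, final_clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids)
```

Changing the starting centroids or swapping in another distance function can change the resulting clusters, which is why all three inputs matter.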

16. The sigmoid function is also known as the ……….. function.

(a) Regression

(b) Logistic

(c) Probability

(d) Neural

Ans- b
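
For reference, the logistic (sigmoid) function is 1 / (1 + e^(-x)). The sketch below is one possible way to compute it; the function name `sigmoid` and the branch on the sign of x (used to avoid overflow for large negative inputs) are choices made for this example.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)); maps any real x into the range 0..1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so that
    # math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)

print(sigmoid(0))  # midpoint of the curve
```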

17. An FP-tree growth algorithm can be implemented in ………... phases.

(a) One

(b) Two

(c) Three

(d) Five

Ans- b

18. DENCLUE clustering method is of type

(a) Partitioning

(b) Hierarchical

(c) Density Based

(d) Grid Based

Ans- c

19. STING in grid-based multi-resolution clustering stands for

(a) Statistical Information Grid

(b) Statistics in Geometric Data

(c) Standardization of Geometric

(d) None of the above

Ans- a

20. A tree structure called a dendrogram is commonly used in

(a) Partitioning-based clustering

(b) Hierarchical clustering

(c) Model-based clustering

(d) All of the above

Ans- b

2016
21. An operational system is which of the following?

(a) A system that is used to run the business in real time and is based on historical data

(b) A system that is used to run the business in real time and is based on current data

(c) A system that is used to support decision making and is based on current data

(d) A system that is used to support decision making and is based on historical data

Ans- b

22. A star schema has what type of relationship between a dimension and fact table?

(a) Many to Many

(b) One to One

(c) One to many

(d) All of the above

Ans- c

23. A data warehouse is which of the following?

(a) Can be updated by end users

(b) Contains numerous naming conventions and formats

(c) Organized around important subject areas

(d) Contains only current data

Ans- c

24. Which of the following schema contains multiple fact tables?

(a) Star schema

(b) Snowflake schema

(c) Fact constellation schema

(d) All of the above

Ans- c

25. The ……. operation performs a selection on one dimension of the given cube, resulting in a subcube

(a) pivot

(b) slice

(c) roll-up

(d) drill down

Ans- b
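
A slice can be illustrated with a toy cube. In this sketch the cube is a plain dictionary keyed by one value per dimension; the dimension names, the sales figures, and the helper `slice_cube` are all hypothetical and chosen only for illustration. Fixing one dimension to a single value leaves a subcube over the remaining dimensions.

```python
# Hypothetical 3-D sales cube: (quarter, item, city) -> units sold.
DIMENSIONS = ("quarter", "item", "city")
cube = {
    ("Q1", "phone", "Delhi"): 120,
    ("Q1", "laptop", "Delhi"): 80,
    ("Q2", "phone", "Delhi"): 150,
    ("Q1", "phone", "Patna"): 60,
}

def slice_cube(cube, dimension, value):
    """Select one value on one dimension and drop that dimension from the keys."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }

# Slice on quarter = "Q1": the result is a 2-D subcube over (item, city).
q1_subcube = slice_cube(cube, "quarter", "Q1")
print(q1_subcube)
```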

26. The process of partitioning the ranges of quantitative attributes into intervals, is called

(a) Splitting

(b) grouping

(c) binning

(d) None of the above

Ans- c
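
Binning comes in several variants. The sketch below shows only one of them, equal-width binning, where the range of the attribute is cut into intervals of the same width. The function name `equal_width_bins` and the sample values are assumptions for this example.

```python
def equal_width_bins(values, n_bins):
    """Partition numeric values into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything falls in the first bin
        else:
            # The maximum value would land just past the last bin, so clamp it.
            index = min(int((v - lo) / width), n_bins - 1)
        bins[index].append(v)
    return bins

# Hypothetical attribute values; range 4..34 is cut into three width-10 intervals.
binned = equal_width_bins([4, 8, 15, 21, 24, 25, 28, 34], 3)
print(binned)
```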

27. OPTICS clustering method is

(a) Partitioning method

(b) grid-based method

(c) hierarchical method

(d) density-based method

Ans- d

28. The Apriori algorithm forms frequent k-itemset candidates based on the

(a) frequent (k-5) itemsets

(b) frequent (k-3) itemsets

(c) frequent (k-2) itemsets

(d) frequent (k-1) itemsets

Ans- d
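
The idea behind question 28 can be sketched as a candidate-generation step: candidates of size k are built by combining frequent itemsets of size k-1, and any candidate with an infrequent (k-1)-subset is pruned. The version below is a simplified illustration; it joins by set union rather than the prefix-based join usually described for Apriori, and the name `apriori_gen` and the sample itemsets are assumptions.

```python
from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = {frozenset(s) for s in frequent_prev}
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue  # the pair does not combine into a k-itemset
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Hypothetical frequent 2-itemsets.
frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
candidates_3 = apriori_gen(frequent_2, 3)
print(candidates_3)
```

Here {A, B, D} and {B, C, D} are discarded because {A, D} and {C, D} are not among the frequent 2-itemsets, leaving {A, B, C} as the only candidate.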

29. ID3, C4.5 and CART are used in

(a) Association rules generation

(b) Decision trees

(c) clustering

(d) Web mining

Ans- b

30. CLARANS stands for

Ans- Clustering Large Applications based on RANdomized Search

2015
31. Data mining is also referred to as

(a) Knowledge discovery in databases

(b) data cleaning

(c) data extraction

(d) data management



Ans- a

32. Data about data is called

(a) table

(b) database

(c) metadata

(d) integration

Ans- c

33. To represent any n-dimensional data, we need a series of ………. dimensional cubes.

(a) (n-1)

(b) n

(c) n+1

(d) n+2

Ans- a

34. The ……... operation performs a selection on one dimension of the given cube, resulting in a subcube.

(a) pivot

(b) slice

(c) roll-up

(d) drill-down

Ans- b

35. ________ servers support multidimensional views of data through array-based multidimensional storage
engines.

(a) ROLAP

(b) MOLAP

(c) Data warehouse

(d) database

Ans- b

36. The ______ software gives the user the opportunity to look at the data from a variety of different
dimensions.

(a) query tools

(b) multidimensional analysis

(c) data mining tools

(d) None of the above

Ans- b

37. _____ techniques can be used to reduce the number of values for a given continuous attribute by
dividing the range of the attribute into intervals

(a) Discretization

(b) Transformation

(c) Smoothing

(d) Generalization

Ans- a

38. FP tree growth algorithm can be implemented in

(a) one phase

(b) two phases

(c) Three phases

(d) four phases

Ans- b

39. Consider a scenario where a bin contains the values 4, 8 and 15. If smoothing by the bin-means method is
applied to clean the data, then each of the original values in the bin will be replaced by

(a) 8

(b) 9

(c) 15

(d) 4

Ans- b
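
The arithmetic behind question 39 is (4 + 8 + 15) / 3 = 9. A minimal sketch of smoothing by bin means follows; the function name `smooth_by_bin_means` is an assumption made for this example.

```python
def smooth_by_bin_means(bin_values):
    """Replace every value in a bin with the mean of that bin."""
    mean = sum(bin_values) / len(bin_values)
    return [mean] * len(bin_values)

# The bin from question 39: (4 + 8 + 15) / 3 = 9.
smoothed = smooth_by_bin_means([4, 8, 15])
print(smoothed)
```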

40. ____ are simple text files that are automatically generated every time someone accesses a web site.

(a) server session

(b) Log file

(c) User session

(d) None of the above

Ans- b
