Data Analytics 2marks PDF
QUESTION BANK
What are the factors to be considered while selecting the sample in statistics?
The sample should be
• Large enough to be representative of the population.
• Small enough to be manageable.
• Accessible to the sampler.
• Free of bias.
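One common way to keep selection free of bias is simple random sampling, in which every unit has the same chance of being chosen. The sketch below is only an illustration: the population of 1,000 numbered units, the sample size of 50, and the function name are assumptions, not part of the question bank.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: units numbered 1..1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, sample_size=50, seed=42)
```

Whether 50 units is "large enough to be representative" depends on the variability of the population; the code does not decide that.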
2. To discover new things or structure that is unknown to human beings.
3. To fill in skeletal or computer specifications about a domain.
Give the components of a learning system.
1. Critic
2. Sensors
3. Learning Element
4. Performance Element
5. Effectors
6. Problem generators.
Data transformation
Data mining
Pattern evaluation
Knowledge representation
What is Visualization?
Visualization is the depiction of data in order to gain intuition about the data being
observed. It assists analysts in selecting display formats, viewer perspectives and data
representation schemas.
What is Descriptive and predictive data mining?
Descriptive data mining describes the data set in a concise, summarized manner and
presents interesting general properties of the data. Predictive data mining analyzes the
data in order to construct one model or a set of models, and attempts to predict the
behavior of new data sets.
What is bootstrap?
An interpretation of the jackknife is that the construction of pseudo-values is based on
repeatedly and systematically sampling without replacement from the data at hand. This
optimizes a certain scoring function (e.g. Least nodes, most robust, least assumptions)
What is clustering?
Clustering is the process of grouping the data into classes or clusters so that objects
within a cluster have high similarity in comparison to one another, but are very dissimilar
to objects in other clusters.
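The definition above can be made concrete with k-means, one widely used clustering method (the question bank does not name a specific algorithm, so this choice is an assumption). The sketch works on one-dimensional data; the values and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]   # hypothetical data
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```

Values inside each resulting cluster lie close together (high similarity), while the two clusters are far apart (high dissimilarity), which is the property the definition asks for.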
What are the requirements of clustering?
• Scalability
• Ability to deal with different types of attributes
• Ability to deal with noisy data
• Minimal requirements for domain knowledge to determine input parameters
• Constraint based clustering
• Interpretability and usability
Write the preprocessing steps that may be applied to the data for classification and
prediction.
a. Data Cleaning
b. Relevance Analysis
c. Data Transformation
Define Data Classification.
It is a two-step process. In the first step, a model is built describing a pre-determined set
of data classes or concepts. The model is constructed by analyzing database tuples
described by attributes. In the second step, the model is used for classification.
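The two steps can be sketched with a nearest-centroid classifier. This is only one possible model, chosen here for brevity; the attribute names, class labels and tuples are hypothetical.

```python
def build_model(training_tuples):
    """Step 1: learn one mean vector (centroid) per class from labeled tuples."""
    sums, counts = {}, {}
    for attributes, label in training_tuples:
        total = sums.setdefault(label, [0.0] * len(attributes))
        for i, value in enumerate(attributes):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(model, attributes):
    """Step 2: assign the class whose centroid is closest to the new tuple."""
    def squared_distance(centroid):
        return sum((a - c) ** 2 for a, c in zip(attributes, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical tuples: (age, income in thousands) with a credit-risk class label.
training = [((25, 20), "risky"), ((30, 25), "risky"),
            ((45, 80), "safe"), ((50, 90), "safe")]
model = build_model(training)            # step 1
prediction = classify(model, (48, 85))   # step 2
```

In practice the model's accuracy would be estimated on held-out test tuples before it is used on new data.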
set of data items, which is used for decision-making processes. Association rules analyze
buying patterns that are frequently associated or purchased together.
Define support.
Support is the ratio of the number of transactions that include all items in the antecedent
and consequent parts of the rule to the total number of transactions. Support is an
association rule interestingness measure.
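The ratio in this definition can be computed directly. The following is a minimal sketch; the function name and the market-basket transactions are hypothetical.

```python
def support(transactions, antecedent, consequent):
    """Fraction of transactions containing every item of the rule A => B."""
    items = set(antecedent) | set(consequent)
    matching = sum(1 for t in transactions if items <= set(t))
    return matching / len(transactions)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
s = support(transactions, antecedent={"bread"}, consequent={"milk"})
```

Here two of the four transactions contain both bread and milk, so the rule bread => milk has support 2/4.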
a. Boolean Association rule.
b. Quantitative Association rule.
2. Based on the dimensions of data involved in the rule.
a. Single Dimensional Association rule.
b. Multi Dimensional Association rule.
3. Based on the levels of abstractions involved in the rule.
a. Single level Association rule.
b. Multi level Association rule.
4. Based on various extensions to association mining.
a. Max-patterns.
b. Frequent closed itemsets.
What are the advantages of Dimensional modeling?
Ease of use.
High performance
Dimensional modeling is a logical design technique that seeks to present the data in a
standard framework that is intuitive and allows for high-performance access. It is
inherently dimensional and adheres to a discipline that uses the relational model with
some important restrictions.
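A dimensional model is commonly realized as a star schema: a fact table of numeric measures joined to dimension tables of descriptive attributes. The sketch below uses Python's built-in sqlite3 module; the table names, columns and rows are hypothetical and are not taken from the question bank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes used to group and filter.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    -- Fact table: numeric measures keyed by dimension identifiers.
    CREATE TABLE fact_sales (product_id INTEGER REFERENCES dim_product, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Dairy'), (2, 'Bakery');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# A typical dimensional query: join the fact table to a dimension and aggregate.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
conn.close()
```

The "important restrictions" show up in the layout: measures live only in the fact table, and every query follows the same join-then-aggregate shape.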
What is data warehouse performance issue?
The performance of a data warehouse is largely a function of the quantity and type of
data stored within a database and the query/data loading workload placed upon the
system.
are brought together in the right order and at the right time.
Development
• Technology track: Technical Architecture design, Product Selection & Installation
• Application track: End user Application Specification, End user Application
Development
• Deployment
• Maintenance & Growth
• Project Management
• Not Dynamic
• Consistency
• Iterative Development
List some of the Data Warehouse tools.
• OLAP (Online Analytic Processing)
• ROLAP (Relational OLAP)
• End User Data Access tool
• Ad Hoc Query tool
• Data Transformation services
• Replication
Explain OLAP.
The general activity of querying and presenting text and number data from Data
Explain ROLAP.
ROLAP is a set of user interfaces and applications that give a relational database a
dimensional flavour. ROLAP stands for Relational Online Analytic Processing.
Data mining for the Telecommunication industry
• Data mining for Biomedical and DNA data analysis
• Data mining for Financial data analysis
• Data mining for the Retail industry
• Data mining for the Telecommunication industry
What is the difference between “supervised” and “unsupervised” learning schemes?
In data mining, during classification, the class label of each training sample is provided.
This type of training is called supervised learning, i.e. the learning of the model is
supervised in that it is told to which class each training sample belongs. E.g. Classification
In unsupervised learning the class label of each training sample is not known, and the
number or set of classes to be learned may not be known in advance. E.g. Clustering
• Performance
• Diversity in data types
Explain the data mining functionalities.
The data mining functionalities are:
• Concept/class description
• Association analysis
• Classification and prediction
• Cluster Analysis
• Outlier Analysis
Explain the different types of data repositories on which mining can be performed.
The different types of data repositories on which mining can be performed are:
• Relational Databases
• Data Warehouses
• Transactional Databases
• Advanced Databases
• Flat files
Top-down view
Data source view
Data warehouse view
Business query view
3tier DW architecture
The data preprocessing techniques are:
Data Cleaning
Data integration
Data transformation
Data reduction
• Normalization
• Attribute Construction
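Normalization rescales an attribute so that its values fall in a small common range. The question bank does not say which variant is meant; the sketch below assumes min-max normalization to [0, 1], and the income values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no range to rescale.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

incomes = [20, 40, 60, 100]   # hypothetical attribute values
normalized = min_max_normalize(incomes)
```

Min-max scaling preserves the relative spacing of the original values, but a single extreme value compresses all the others toward one end of the range.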
Parametric Methods:
• Regression Model
• Log linear Model
Non-Parametric Methods
Sampling
Histogram
Clustering
Segmentation by natural partitioning
Binning
Histogram Analysis
Cluster Analysis
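Of the techniques listed, binning is the simplest to show in code. The sketch below assumes equal-width bins (equal-frequency binning is another common choice); the price values and the function name are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Partition values into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        if width == 0:
            index = 0  # all values identical: everything goes in the first bin
        else:
            # The maximum value is placed in the last bin rather than a new one.
            index = min(int((v - lo) / width), n_bins - 1)
        bins[index].append(v)
    return bins

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]   # hypothetical data
bins = equal_width_bins(prices, n_bins=3)
```

Each bin can then be replaced by a label or by its mean, which turns a continuous attribute into a small number of discrete levels.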
Explain Data mining Primitives.
There are 5 Data mining Primitives. They are:
• Task-relevant data
• Kinds of knowledge to be mined
• Concept hierarchies
• Interestingness measures
• Knowledge presentation and visualization techniques to be used for displaying the
discovered patterns
Explain Attribute Oriented Induction.
Explain the Back Propagation technique.
• Definition
• Back Propagation Algorithm & diagram
• Example
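For the example part of such an answer, the following is a minimal sketch of back propagation for a network with two inputs, two sigmoid hidden units and one sigmoid output, trained on the logical OR function. The network size, initial weights, learning rate and epoch count are all assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Forward pass; the last weight in each row is the bias."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    output = sigmoid(w_out[0] * hidden[0] + w_out[1] * hidden[1] + w_out[2])
    return hidden, output

def mean_squared_error(data, w_hidden, w_out):
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data) / len(data)

def train_epoch(data, w_hidden, w_out, rate):
    """One pass over the data, updating the weights after each sample."""
    for x, target in data:
        hidden, output = forward(x, w_hidden, w_out)
        # Output error term: derivative of 0.5*(output-target)^2 through the sigmoid.
        delta_out = (output - target) * output * (1.0 - output)
        # Hidden error terms: delta_out propagated backward through w_out.
        delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(2)]
        # Gradient-descent updates for the output layer, then the hidden layer.
        for j in range(2):
            w_out[j] -= rate * delta_out * hidden[j]
        w_out[2] -= rate * delta_out
        for j in range(2):
            w_hidden[j][0] -= rate * delta_hidden[j] * x[0]
            w_hidden[j][1] -= rate * delta_hidden[j] * x[1]
            w_hidden[j][2] -= rate * delta_hidden[j]

# Logical OR as training data; the initial weights are arbitrary small values.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_hidden = [[0.1, -0.2, 0.0], [0.3, 0.4, 0.0]]
w_out = [0.2, -0.1, 0.0]

initial_loss = mean_squared_error(data, w_hidden, w_out)
for _ in range(1000):
    train_epoch(data, w_hidden, w_out, rate=0.5)
final_loss = mean_squared_error(data, w_hidden, w_out)
```

The hidden error terms are computed before any weight is changed, so the backward pass uses the same weights as the forward pass that produced the error.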