Viva Questions
(4 keyword definitions)
6. What is dimension?
7. What is Fact?
9. Briefly state the difference between a data warehouse and a data mart.
10. What are a largely changing dimension and a slowly changing dimension, and how do they differ?
12. What is the difference between a dependent data warehouse and an independent data
warehouse?
o PPT
o A data mining extension can be used to slice the data in the source cube in the order
discovered by data mining. When a cube is mined, the case table is a dimension.
16. Explain how to use DMQL, the Data Mining Query Language.
17. Define Rollup and cube.
o Custom rollup operators provide a simple way of controlling how a member is rolled
up into its parent's value. The rollup uses the contents of a column as the
custom rollup operator for each member, and that operator is used to evaluate the
value of the member's parent.
If a cube has multiple custom rollup formulas and custom rollup members, the
formulas are resolved in the order in which the dimensions were added
to the cube.
o Data warehousing is the process of extracting data from different sources, cleaning the
data, and storing it in the warehouse, whereas data mining aims to examine or
explore that data using queries. These queries can be run against the data warehouse.
Exploring the data through data mining helps in reporting, planning strategies, finding
meaningful patterns, etc.
E.g. a company's data warehouse stores all the relevant information about projects
and employees. Using data mining, one can use this data to generate different
reports, such as profits generated.
o Discrete data can be considered as defined or finite data, e.g. mobile numbers,
gender. Continuous data can be considered as data that changes continuously
and in an ordered fashion, e.g. age, height, weight, temperature.
o A clustering algorithm is used to group sets of data with similar characteristics;
the groups are called clusters. These clusters help in making faster decisions and in exploring
data. The algorithm first identifies relationships in a dataset and then
generates a series of clusters based on those relationships. The process of creating
clusters is iterative: the algorithm redefines the groupings to create clusters that
better represent the data.
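The iterative regrouping described above can be sketched with k-means, which is only one of several clustering algorithms and is chosen here purely for illustration. The sample points and the starting centroids are made-up assumptions, not data from these notes.

```python
import math

def kmeans(points, initial_centroids, max_iters=100):
    """Plain k-means on 2-D points (illustrative sketch, not tuned for real data)."""
    centroids = list(initial_centroids)
    k = len(centroids)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # the groupings stopped changing, so the clusters are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments, centroids

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)   # cluster index assigned to each point
print(centers)  # final centroid of each cluster
```

Each pass reassigns points and then recomputes the centroids, which is the "redefine the groupings" step; the loop stops once a pass changes nothing.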
o Agriculture, biological data analysis, call record analysis, DSS, business intelligence
systems, etc.
o A distributed data warehouse shares data across multiple data repositories for the
purpose of OLAP operations.
o Data marts
o Middle cuboids
o Star schema-example
o A model that fits the training data well can still have a high generalization error. This situation is
called model overfitting.
o It is a lazy learner algorithm used in classification. It finds the k nearest
neighbors of the point of interest.
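A minimal sketch of k-nearest-neighbor classification follows, assuming Euclidean distance and a simple majority vote among the k neighbors. The training points, the labels "A" and "B", and the function name knn_classify are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # "Lazy" learner: no model is built in advance; all work happens at query time.
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: (point, class label).
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_classify(training, (1.8, 1.6)))
print(knn_classify(training, (6.4, 6.4)))
```

An odd k is a common choice for two-class problems because it avoids tied votes.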
Spatial Data Mining = Mining Spatial Data Sets (i.e. Data Mining + Geographic Information Systems)
o Multimedia data mining is a subfield of data mining that deals with the extraction
of implicit knowledge, multimedia data relationships, or other patterns not
explicitly stored in multimedia databases.
o patent analysis
o Information dissemination
o Web content mining refers to the discovery of useful information from Web
contents, including text, images, audio, video, etc.
o Web structure mining studies the model underlying the link structures of the Web.
It has been used for search engine result ranking and other Web applications.
Web usage mining focuses on using data mining techniques to analyze search logs to find interesting
patterns. One of the main applications of Web usage mining is its use to learn user profiles.
o Data discrimination is the comparison of the general features of the target class
objects against the general features of objects from one or more contrasting classes.
63. What can business analysts gain from having a data warehouse?
Second, a data warehouse can enhance business productivity because it is able to quickly and efficiently
gather information that accurately describes the organization.
Third, a data warehouse facilitates customer relationship management because it provides a consistent
view of customers and items across all lines of business, all departments, and all markets.
Finally, a data warehouse may bring about cost reduction by tracking trends, patterns, and exceptions
over long periods in a consistent and reliable manner.
o Descriptive task
o Predictive task
o Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts.
o A database may contain data objects that do not comply with the general behavior
or model of the data. These data objects are called outliers.
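One simple way to flag such objects in a single numeric column is a z-score test, sketched below. This is only one of many outlier detection techniques; the threshold of 2 standard deviations and the sample readings are assumptions made for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical readings: one value clearly breaks the general pattern.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
print(outliers)
```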
o Data evolution analysis describes and models regularities or trends for objects
whose behavior changes over time.
Although this may include characterization, discrimination, association and correlation analysis,
classification, prediction, or clustering of time-related data,
distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern
matching, and similarity-based data analysis.
o Knowledge base
o User interface
o Generate data extract, transform, and load procedures for import jobs
o Association rules
o Clustering
o Deviation detection
o Similarity search
o Sequence Mining
o Dashboards
o A subsequence, such as buying first a PC, then a digital camera, and then a memory
card, if it occurs frequently in a shopping history database, is a (frequent)
sequential pattern.
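To make "occurs frequently" concrete, the sketch below counts how many shopping histories contain a pattern as an ordered (not necessarily contiguous) subsequence. The histories are invented, and the threshold that makes a pattern "frequent" is left to the analyst; real sequence mining algorithms search for such patterns rather than testing one given pattern.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)

def sequence_support(pattern, histories):
    """Fraction of histories that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, h) for h in histories) / len(histories)

# Hypothetical shopping histories, one list of purchases per customer.
histories = [
    ["PC", "digital camera", "memory card"],
    ["PC", "printer", "digital camera", "memory card"],
    ["digital camera", "PC"],
    ["PC", "memory card"],
]
pattern = ["PC", "digital camera", "memory card"]
print(sequence_support(pattern, histories))
```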
o Prediction
80. List the typical OLAP operations.
o Roll-up
o Drill-down
o Rotate (pivot)
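As a rough illustration of the first two operations, the sketch below rolls city-level sales up to the country level and drills back down to the cities behind one country. The fact rows and the two-level location hierarchy are hypothetical, and rotate is not shown.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, sales amount).
sales = [
    ("India", "Mumbai", 120),
    ("India", "Pune", 80),
    ("USA", "Chicago", 200),
    ("USA", "Boston", 50),
]

def roll_up(rows):
    """Roll-up: aggregate city-level sales to the country level."""
    totals = defaultdict(int)
    for country, _city, amount in rows:
        totals[country] += amount
    return dict(totals)

def drill_down(rows, country):
    """Drill-down: return the city-level detail behind one country total."""
    return {city: amount for c, city, amount in rows if c == country}

print(roll_up(sales))
print(drill_down(sales, "India"))
```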
81. If there are 3 dimensions, how many cuboids are there in cube?
o 2^3 = 8 cuboids
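The count can be checked by listing every subset of the dimensions, since each subset corresponds to one cuboid. This sketch assumes no concept hierarchies on the dimensions, and the dimension names are made up.

```python
from itertools import combinations

def cuboids(dimensions):
    """List every cuboid (subset of dimensions), from the base cuboid to the apex."""
    result = []
    for size in range(len(dimensions), -1, -1):
        result.extend(combinations(dimensions, size))
    return result

# Hypothetical dimension names for a 3-D cube.
dims = ["time", "item", "location"]
all_cuboids = cuboids(dims)
for cuboid in all_cuboids:
    print(cuboid)
print(len(all_cuboids))  # 2 ** 3 = 8
```

The first entry is the base cuboid (all three dimensions) and the last, the empty tuple, is the apex cuboid that aggregates over everything.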
• A star schema is a multidimensional model in which each disjoint dimension is represented in a
single table.
• Both star and snowflake schemas are dimensional models; the difference is in their physical
implementations.
• Snowflake schemas support ease of dimension maintenance because they are more normalized.
• Star schemas are easier for direct user access and often support simpler and more efficient queries.
• It may be better to create a star version of the snowflaked dimension for presentation to the users.
• A star schema is very easy to understand, even for a non-technical business manager.
• A star schema is easily extensible and will handle future changes easily.
84. What are the characteristics of data warehouse?
o Integrated
o Non-volatile
o Subject oriented
o Time-variant
o The support of a rule X->Y is the ratio of the number of transactions that contain both
X and Y to the total number of transactions.
The confidence of a rule X->Y is the ratio of the number of transactions that contain both X and Y
to the number of transactions that contain X.
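A small worked sketch of both measures, using the usual transaction-based definitions: support is the fraction of all transactions containing both X and Y, and confidence is the fraction of transactions containing X that also contain Y. The market-basket transactions are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of X and Y together, divided by the support of X alone."""
    # Raises ZeroDivisionError if the antecedent never occurs in the data.
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))       # rule bread -> milk
print(confidence(transactions, {"bread"}, {"milk"}))
```

Here bread and milk appear together in 2 of 4 transactions, and bread appears in 3, so the rule bread -> milk has support 2/4 and confidence 2/3.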
86. What are the criteria on the basis of which classification and prediction can be compared?
o The process of cleaning junk data is termed data purging. Purging data
means getting rid of unnecessary NULL values in columns. This is usually done
when the size of the database gets too large.