Data Generalization
Advantages of the data cube approach:
Many aggregate functions must be computed repeatedly in data
analysis, so storing pre-computed results in a multidimensional data
cube ensures fast response time.
It offers flexible views of the data from different angles and at different levels
of abstraction.
It is an efficient implementation of data generalization.
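The pre-computation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales facts, the dimension names (city, year, item), and the helper build_cube are all hypothetical, and SUM is chosen as the aggregate for simplicity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, year, item, amount). Not from the notes.
FACTS = [
    ("Vancouver", 2023, "TV", 400),
    ("Vancouver", 2023, "Phone", 250),
    ("Toronto", 2023, "TV", 300),
    ("Toronto", 2024, "TV", 500),
]
DIMENSIONS = ("city", "year", "item")


def build_cube(facts):
    """Pre-compute SUM(amount) for every subset of the dimensions."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), k):
            cuboid = defaultdict(int)
            for row in facts:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # last field is the measure
            names = tuple(DIMENSIONS[i] for i in dims)
            cube[names] = dict(cuboid)
    return cube


cube = build_cube(FACTS)

# Later queries become dictionary lookups instead of scans over FACTS.
print(cube[("city",)])   # {('Vancouver',): 650, ('Toronto',): 800}
print(cube[()])          # {(): 1450}  (grand total, the apex cuboid)
```

Each key of cube names one view of the data (one combination of dimensions), which is what gives the "different angles" mentioned above. The cost is storage: with n dimensions this sketch materializes 2^n views.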
Disadvantages:
It cannot answer some important questions that concept
description can, such as which dimensions should be used in the
description and what level the generalization process should
reach.
It lacks intelligent analysis.
The Attribute-Oriented Induction (AOI) approach:
I. Attribute removal
II. Attribute generalization (also known as concept hierarchy ascension)
Aggregation is performed by merging identical generalized tuples and
accumulating their respective counts. The resulting generalized relation can
be mapped into different forms for presentation to the user, such as charts or
rules.
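The aggregation step described above can be sketched as follows. The tuples (major, birth region, age range) and the function name merge_tuples are made up for illustration; the only assumption is that the tuples have already been generalized.

```python
from collections import Counter

# Hypothetical, already-generalized tuples: (major, birth_region, age_range).
generalized = [
    ("Science", "Canada", "20-25"),
    ("Science", "Canada", "20-25"),
    ("Engineering", "Foreign", "25-30"),
    ("Science", "Canada", "20-25"),
    ("Engineering", "Foreign", "25-30"),
]


def merge_tuples(tuples):
    """Merge identical tuples and append the accumulated count to each."""
    counts = Counter(tuples)
    return [row + (n,) for row, n in counts.items()]


relation = merge_tuples(generalized)
for row in relation:
    print(row)
# ('Science', 'Canada', '20-25', 3)
# ('Engineering', 'Foreign', '25-30', 2)
```

The count column is what allows the generalized relation to be presented later as a chart or as a rule with a support figure.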
Attribute Removal:
Attribute removal is based on the following rules:
1. If there is a large set of distinct values for an attribute of the initial working
relation but there is no generalization operator on the attribute, then that attribute
should be removed, because it cannot be generalized and preserving it would imply
keeping a large number of disjuncts, which contradicts the goal of generating concise
rules…
This rule corresponds to the generalization rule known as dropping
conditions in the machine learning literature on learning from examples.
2. If the higher-level concepts of an attribute are expressed in terms of other attributes, then the
attribute should be removed from the working relation.
For example, suppose that the attribute in question is street, whose higher-level
concepts are represented by the attributes (city, province or state, country).
The removal of street is equivalent to the application of a generalization
operator.
This corresponds to the generalization rule known as climbing
generalization.
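The two removal rules can be sketched as a small check over the working relation. Everything named here is an assumption made for illustration: the sample rows, the threshold that defines "large", the set has_operator (attributes with a generalization operator), and the mapping covered_by (attributes whose higher-level concepts are held by other attributes).

```python
def attributes_to_remove(rows, attributes, has_operator, covered_by, threshold):
    """Return {attribute: reason} for attributes the two rules would drop."""
    removed = {}
    for idx, name in enumerate(attributes):
        # Rule 2: higher-level concepts are expressed by other attributes.
        if any(other in attributes for other in covered_by.get(name, ())):
            removed[name] = "higher-level concepts held by other attributes"
            continue
        # Rule 1: many distinct values and no generalization operator.
        distinct = {row[idx] for row in rows}
        if len(distinct) > threshold and name not in has_operator:
            removed[name] = "many distinct values, no generalization operator"
    return removed


# Hypothetical working relation.
attributes = ("name", "street", "city", "major")
rows = [
    ("Ann", "12 Oak St", "Vancouver", "Physics"),
    ("Bob", "9 Elm St", "Vancouver", "Chemistry"),
    ("Cho", "3 Pine Rd", "Toronto", "Physics"),
    ("Dee", "77 Main St", "Toronto", "Biology"),
]
has_operator = {"city", "major"}      # assumed: concept hierarchies exist
covered_by = {"street": ("city",)}    # assumed: city generalizes street

removed = attributes_to_remove(rows, attributes, has_operator, covered_by, threshold=2)
for name, reason in removed.items():
    print(name, "->", reason)
# name -> many distinct values, no generalization operator
# street -> higher-level concepts held by other attributes
```

Here major also exceeds the threshold, but it has a generalization operator, so it is kept and left for attribute generalization.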
Attribute Generalization:
If there is a large set of distinct values for an attribute in the initial working
relation, and there exists a set of generalization operators on the attribute, then a
generalization operator should be selected and applied to the attribute.
This rule is based on the following reasoning.
Using a generalization operator to generalize an attribute value within a tuple,
or rule, in the working relation makes the rule cover more of the original
data tuples, thus generalizing the concept it represents.
This corresponds to the generalization rule known as climbing generalization
trees in learning from examples.
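A minimal sketch of one step of concept hierarchy ascension is shown here. The hierarchy MAJOR_HIERARCHY, the sample rows, and the function generalize are hypothetical; the hierarchy is modeled as a simple child-to-parent dictionary, which is one possible representation among several.

```python
# Hypothetical concept hierarchy for the attribute "major" (child -> parent).
MAJOR_HIERARCHY = {
    "Physics": "Science",
    "Chemistry": "Science",
    "Biology": "Science",
    "Civil Eng.": "Engineering",
}


def generalize(rows, idx, hierarchy):
    """Replace the value at position idx with its parent concept, if any."""
    return [
        row[:idx] + (hierarchy.get(row[idx], row[idx]),) + row[idx + 1:]
        for row in rows
    ]


# Hypothetical working relation: (major, birth_country).
rows = [
    ("Physics", "Canada"),
    ("Chemistry", "Canada"),
    ("Biology", "Canada"),
    ("Civil Eng.", "Canada"),
]

generalized = generalize(rows, 0, MAJOR_HIERARCHY)
print(len(set(rows)), "distinct tuples before")        # 4 distinct tuples before
print(len(set(generalized)), "distinct tuples after")  # 2 distinct tuples after
```

After the step, the single tuple ("Science", "Canada") stands for three of the original tuples, which is the sense in which a generalized rule covers more of the original data.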