
An Optimization of Association Rule Mining Algorithm using Weighted Quantum behaved PSO

Deepa.S
PG Student, Department of Information Technology
SNS College of Technology, Coimbatore, India
deepasengg@gmail.com

Kalimuthu.M
Associate Professor, Department of Information Technology
SNS College of Technology, Coimbatore, India
mkmuthu73@gmail.com

Abstract— In this paper we propose the Weighted Quantum-behaved Particle Swarm Optimization (WQPSO) algorithm for improving the performance of the Apriori association rule mining algorithm. WQPSO is a global-convergence-guaranteed algorithm that outperforms the original PSO and has fewer parameters controlling its search ability. The minimum support and minimum confidence values chosen for mining association rules seriously affect the quality of the mined rules, and in conventional association rule mining these threshold values are always supplied by the user. In this paper, the WQPSO algorithm is used to determine suitable threshold values automatically and also to improve the computational efficiency of the Apriori algorithm. First, WQPSO is run to find the minimum threshold values: the support and confidence of the particle with the highest fitness value are taken as the minimum thresholds for the association rule algorithm. These minimum support and minimum confidence values are then given as input to the Apriori algorithm for mining association rules. The proposed algorithm is verified on the FoodMart2000 database under Microsoft SQL Server 2000. The experimental results show that the proposed method gives better performance and less computational time than the existing algorithms.

Index Terms—Apriori algorithm, Association rule mining, Data mining, PSO algorithm.

I. INTRODUCTION

Data mining is "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data." It is an inter-disciplinary field whose core lies at the intersection of machine learning, statistics and databases. Data mining can be categorized into several models, including association rules, clustering and classification. Among these models, association rule mining is the most widely applied. In this area, most previous research has focused on improving computational efficiency; the Apriori algorithm is the most representative algorithm, and many modified algorithms have been built on it to improve its efficiency and accuracy. However, its two parameters, minimal support and minimal confidence, are always determined by the decision-makers themselves or through trial and error, which seriously affects the quality of association rule mining and is still under investigation.

Particle swarm optimization (PSO), first introduced by Kennedy and Eberhart [3], is a population-based optimization technique in which the population is called a swarm. A simple explanation of the PSO's operation is as follows. Each particle represents a possible solution to the optimization task at hand. During each iteration, the accelerating direction of a particle is determined by its own best solution found so far and by the global best position discovered so far by any particle in the swarm. This means that if a particle discovers a promising new solution, all the other particles move closer to it, exploring the region more thoroughly in the process. As far as PSO itself is concerned, however, it is not a global optimization algorithm, as has been demonstrated by Van den Bergh [4]. In [5,6], Sun et al. introduce quantum theory into PSO and propose a quantum-behaved PSO (QPSO) algorithm, which can be theoretically guaranteed to find the optimal solution in the search space. Experimental results on some widely used benchmark functions show that QPSO works better than standard PSO and is a promising algorithm. In this paper, in order to balance the global and local searching abilities, we introduce a weight parameter into the calculation of the mean best position in QPSO to reflect the importance of particles in the population as they evolve, and thus propose an improved quantum-behaved particle swarm optimization algorithm, weighted QPSO (WQPSO).

The most representative association rule algorithm is the Apriori algorithm, proposed by Agrawal et al. in 1993. The Apriori algorithm repeatedly generates candidate itemsets and uses minimal support and minimal confidence to filter these candidate itemsets and find high-frequency itemsets. Association rules can then be derived from the high-frequency itemsets [2].

The rest of the paper is organized as follows. Section 2 gives a brief introduction to association rule mining. QPSO and its related work are introduced in Section 3. In Section 4, we propose the improved WQPSO and show how the searching abilities are balanced to guarantee better convergence speed of the particles. Experimental results and discussions are presented in Section 5. Finally, the paper is concluded in Section 6.
generated. As long as the calculated confidence of a frequent
II. ASSOCIATION RULE MINING

This section briefly presents the general algorithms of association rule mining. Section A gives the basic definition of association rule mining, Section B describes the Apriori association rule mining algorithm, and Section C describes other algorithms that have been used for mining association rules.

A. Definition

Association rule mining assumes that hidden relationships exist between purchased items in transactional databases; mining results can therefore help decision-makers understand customers' purchasing behavior. An association rule is of the form X→Y, where X and Y are itemsets (sets of products) and the itemset I includes all possible items. A mined association rule must satisfy two parameters at the same time:

(1) Minimal support: find frequent itemsets whose supports, computed as in Eq. (1), are above the minimal support threshold.

Support(X→Y) = |{T in D : (X ∪ Y) ⊆ T}| / |D|    (1)

where D is the set of transactions.

(2) Minimal confidence: use the frequent itemsets found with Eq. (1) to generate association rules whose confidence levels, computed as in Eq. (2), are above the minimal confidence threshold.

Confidence(X→Y) = Support(X ∪ Y) / Support(X)    (2)
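As a concrete illustration of Eqs. (1) and (2), the following sketch computes the support and confidence of a candidate rule X→Y over a small transaction list. It is an illustrative sketch only: the toy data and function names are ours, and the paper's own implementation (Section 5) uses Java and SQL Server rather than Python.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset` (Eq. 1)."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Support of X∪Y divided by support of X (Eq. 2)."""
    return support(x | y, transactions) / support(x, transactions)

# Toy transaction database: each transaction is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]

x, y = {"bread"}, {"milk"}
print(support(x | y, transactions))   # support of the rule {bread} -> {milk}
print(confidence(x, y, transactions)) # confidence of the rule
```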


B. Apriori Algorithm
The most representative association rule algorithm is the Apriori algorithm, which was proposed by Agrawal et al. in 1993. The Apriori algorithm repeatedly generates candidate itemsets and uses minimal support and minimal confidence to filter these candidate itemsets and find high-frequency itemsets. Association rules can then be derived from the high-frequency itemsets. The process of finding high-frequency itemsets from candidate itemsets is shown in Fig. 1 [1].

In Fig. 1, Step 1.1 finds the frequent 1-itemsets, represented as L1. In Steps 1.2 through 1.10, L1 is used to generate the candidate itemsets Ck and to find Lk. The procedure "Apriori gen" generates candidate itemsets through join and prune operations. The join procedure, Steps 2.1–2.4, combines pairs of itemsets from Lk−1 into candidate itemsets. The prune procedure, Steps 2.5–2.7, deletes infrequent candidate itemsets; candidates are discarded by the test "has infrequent subset." After the Apriori algorithm has generated the frequent itemsets, association rules can be generated: as long as the calculated confidence of a frequent itemset is larger than the predefined minimal confidence, the corresponding association rule is accepted. Since the processing of the Apriori algorithm requires plenty of time, its computational efficiency is a very important issue. In order to improve the efficiency of Apriori, the WQPSO algorithm is proposed.

Fig. 1. The Apriori algorithm
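The join and prune steps performed by "Apriori gen" can be sketched as follows. This is a minimal illustration of the candidate-generation idea under the usual Apriori conventions (itemsets kept as sorted tuples); it is not the authors' implementation.

```python
from itertools import combinations

def apriori_gen(frequent_prev):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets (join + prune)."""
    prev = sorted(frequent_prev)          # each itemset is a sorted tuple
    prev_set = set(prev)
    k = len(prev[0]) + 1
    candidates = []
    # Join step: merge two (k-1)-itemsets that agree on their first k-2 items.
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # Prune step ("has infrequent subset"): drop the candidate if any
                # (k-1)-subset is not frequent.
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.append(cand)
    return candidates

# Example: frequent 2-itemsets over items A, B, C, D.
L2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(apriori_gen(L2))   # [('A', 'B', 'C')] -- ('B','C','D') is pruned: ('C','D') not in L2
```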
C. Other Association rule mining algorithms
This section describes other algorithms that have been used for the optimization of association rule mining.

The first is HMFS, an efficient hash-based method for discovering the maximal frequent set, proposed by Yang et al. in 2001. The HMFS method combines the advantages of both the DHP and the Pincer-Search algorithms, and this combination leads to two benefits. First, the HMFS method can in general reduce the number of database scans. Second, HMFS can filter out infrequent candidate itemsets and use the filtered itemsets to find the maximal frequent itemsets. These two advantages reduce the overall computing time of finding the maximal frequent itemsets. In addition, the HMFS method provides an efficient mechanism for constructing the maximal frequent candidate itemsets so as to reduce the search space [7].

Genetic algorithms have also been applied to association rule mining [8]. That study uses weighted items to represent the importance of individual items; the weighted items are applied to the fitness function of heuristic genetic algorithms to estimate the value of different rules, and such genetic algorithms can generate suitable threshold values for association rule mining. In addition, Saggar et al. proposed an approach concentrating on optimizing the rules generated using genetic algorithms; the most important aspect of their approach is that it can predict rules that contain negative attributes [9]. According to reported test results, the conclusion drawn was that the genetic algorithm had considerably higher efficiency [10].

Finally, Kennedy and Eberhart proposed the particle swarm optimization (PSO) algorithm in 1995. The PSO algorithm has become an evolutionary computation technique and an important heuristic algorithm in recent years. The main concept of PSO originates from the study of fauna behavior [12].
III. RELATED WORK

This section describes the optimization algorithms relevant to this work, namely PSO and our proposed WQPSO. Section A describes the particle swarm optimization algorithm and its velocity and position updates. Section B describes the quantum-behaved particle swarm optimization algorithm. Finally, Section C presents our proposed weighted quantum-behaved PSO.

A. Particle Swarm Optimization (PSO) algorithm

PSO is initialized with a group of random particles (solutions) and then searches for optima by updating generations. In every iteration, each particle is updated by following two "best" values. The first one is the best solution (fitness) the particle has achieved so far; this value is stored and is called "pbest." The other "best" value tracked by the particle swarm optimizer is the best value obtained so far by any particle in the population; this value is a global best and is called "gbest." After finding the two best values, each particle updates its velocity and position with Eqs. (3) and (4), as follows [11]:

The velocity update is defined as

vid(t) = w × vid(t−1) + c1 × rand() × (Pid − xid(t−1)) + c2 × Rand() × (Pgd − xid(t−1))    (3)

and the position update is defined as

xid(t) = xid(t−1) + vid(t)    (4)

In the above equations, xid is the current value of dimension d of individual i, vid is the current velocity of dimension d of individual i, Pid is the best value of dimension d found so far by individual i, Pgd is the best value of dimension d found so far by the swarm, c1 and c2 are acceleration coefficients, and w is the inertia weight factor. The PSO algorithm is not a globally convergent algorithm and offers little control over its search ability, so the WQPSO algorithm is proposed to provide global convergence for real-time applications.
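For concreteness, a minimal sketch of one particle update according to Eqs. (3) and (4) is given below. The parameter values (w, c1, c2) and the two-dimensional example are illustrative assumptions, not values taken from the paper.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0):
    """One PSO update of a single particle (Eqs. 3 and 4), dimension by dimension."""
    new_v, new_x = [], []
    for d in range(len(x)):
        vd = (w * v[d]
              + c1 * random.random() * (pbest[d] - x[d])   # pull toward personal best
              + c2 * random.random() * (gbest[d] - x[d]))  # pull toward global best
        new_v.append(vd)
        new_x.append(x[d] + vd)                            # Eq. (4): position update
    return new_x, new_v

# Example: a 2-dimensional particle.
x, v = [0.5, -1.0], [0.0, 0.0]
pbest, gbest = [1.0, 0.0], [2.0, 1.0]
x, v = pso_step(x, v, pbest, gbest)
print(x, v)
```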
B. Quantum behaved PSO

In practice, the evolution of human thinking is uncertain to a great extent, somewhat like a particle having quantum behavior. In [5, 6], Sun et al. introduce quantum theory into PSO and propose the quantum-behaved PSO (QPSO) algorithm; their experimental results indicate that QPSO works better than standard PSO on several benchmark functions and is a promising algorithm. In this section a novel parameter control method for QPSO is described: the revised QPSO introduces a global reference point called "Mainstream Thought" to evaluate the search scope of a particle, and is more efficient in global search than the approach in [3].

In quantum-behaved PSO, the search space and the solution space of the problem are two spaces of different quality. The wave function, or probability function of the position, describes the state of the particle in the quantized search space, but it does not give any definite information about the position of the particle, which is vital for evaluating its fitness. Therefore, a state transformation between the two spaces is absolutely necessary. In terms of quantum mechanics, the transformation from the quantum state to the classical state is called collapse, which in nature is the measurement of the particle's position. In the quantized search space, the wave function is defined for finding the global search solution. The difference in state transformation between QPSO and traditional PSO is shown in Fig. 2.

Fig. 2. The QPSO search space


C. Weighted quantum behaved PSO

In this section, we propose an improved quantum-behaved particle swarm optimization with a weighted mean best position computed according to the fitness values of the particles. It is shown that the improved QPSO has faster local convergence speed, resulting in a better balance between the global and local searching of the algorithm, and thus in good performance. In summary, the main contributions of this paper are as follows:

1. We propose a specific algorithm to determine the minimum threshold values using the Weighted Quantum behaved Particle Swarm Optimization algorithm, which gives more reliable and efficient values.

2. We then apply these minimum threshold values to the Apriori association rule mining algorithm to improve the performance and efficiency of association rules for real-time transactions.
IV. WEIGHTED QUANTUM BEHAVED PARTICLE SWARM OPTIMIZATION ALGORITHM

In this paper, in order to balance the global and local searching abilities, we introduce a weight parameter into the calculation of the mean best position in QPSO to reflect the importance of particles in the population as they evolve, and thus propose an improved quantum-behaved particle swarm optimization algorithm, weighted QPSO (WQPSO). The proposed WQPSO algorithm comprises two parts, preprocessing and mining. The first part provides the procedures related to calculating the fitness values of the particle swarm: the data are transformed and stored in a binary format, and the search range of the particle swarm is then set using the IR (itemset range) value. In the second part of the algorithm, which is the main contribution of this study, the WQPSO algorithm is employed to mine the association rules.

A. Binary transformation

This study adopts the approach proposed by Wur and Leu in 1998 [13] to transform transaction data into binary data, with each record stored as 0s and 1s. This approach accelerates the database scanning operation and makes it easier and quicker to calculate support and confidence. The transformation is explained by an example in Fig. 3. In Fig. 3, there are five records, T1 to T5, in the original data, and each of these records is transformed and stored in binary form. There are a total of only four different products in the database, so four cells exist for each transaction. Take B4 as an example: this transaction purchased only products 2 and 3, so the values of cells 2 and 3 are both "1," whereas cells 1 and 4 are both "0."

Fig. 3. Binary transformation of the transaction data (example)
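The binary transformation described above can be sketched as follows. The five toy transactions stand in for T1 to T5 of Fig. 3 (the figure's exact records are not reproduced here); only the fourth record, which purchases products 2 and 3, mirrors the B4 example from the text.

```python
def to_binary(transactions, num_products):
    """Encode each transaction as a 0/1 vector with one cell per product."""
    table = []
    for items in transactions:
        row = [1 if p in items else 0 for p in range(1, num_products + 1)]
        table.append(row)
    return table

# Five transactions (T1..T5) over a catalogue of four products, as in the Fig. 3 example;
# the concrete item numbers below are illustrative, except the fourth record,
# which buys products 2 and 3 (the B4 case from the text).
raw = [{1, 2}, {1, 3, 4}, {2, 4}, {2, 3}, {1, 2, 3}]
for label, row in zip(("T1", "T2", "T3", "T4", "T5"), to_binary(raw, 4)):
    print(label, row)   # T4 -> [0, 1, 1, 0]: cells 2 and 3 are 1, cells 1 and 4 are 0
```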
B. IR Value calculation

This study applies the WQPSO algorithm to association rule discovery, as well as to the calculation of the IR value, which is included in the chromosome encoding. The purpose of this inclusion is to produce more meaningful association rules. Moreover, search efficiency is increased when IR analysis is used to decide the rule length generated by the chromosomes in particle swarm evolution; IR analysis avoids searching for too many association rules that are meaningless itemsets during particle swarm evolution. This method addresses the front and back partition points of each chromosome, and the range decided by these two points is called the IR, which is given in Eq. (5):

(5)

In Eq. (5), m ≠ n and m < n, where m and n denote itemset lengths. TransNum(m) is the number of transaction records containing m products and TransNum(n) is the number of transaction records containing n products. Trans(m, n) is the number of transaction records purchasing m to n products, and TotalTrans is the total number of transactions.
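Eq. (5) itself is not restated above, but the quantities it combines are defined in the text. The sketch below computes those counts from the binary table of Section IV.A; how they are combined into the final IR value is deliberately left out, and reading TransNum(m) as the number of records containing exactly m products is our assumption.

```python
def ir_counts(binary_table, m, n):
    """Counts used by the IR value of Eq. (5): TransNum(m), TransNum(n),
    Trans(m, n) and TotalTrans, with m < n (itemset lengths)."""
    sizes = [sum(row) for row in binary_table]        # items bought per transaction
    trans_num_m = sum(1 for s in sizes if s == m)     # records containing m products
    trans_num_n = sum(1 for s in sizes if s == n)     # records containing n products
    trans_m_n = sum(1 for s in sizes if m <= s <= n)  # records buying m to n products
    total_trans = len(binary_table)                   # total number of transactions
    return trans_num_m, trans_num_n, trans_m_n, total_trans

# Example with the binary table from the previous sketch (4 products, 5 transactions).
table = [[1, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 0]]
print(ir_counts(table, m=2, n=3))   # these counts are then combined by Eq. (5)
```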
The WQPSO algorithmic process is quite similar to that of PSO algorithms, but the proposed procedure includes only the main quantum update function and its mean best position. Each of the steps in the WQPSO algorithm, and the process of generating association rules, is explained as follows.

C. Encoding

According to the definition of association rule mining, the intersection of itemset X and itemset Y in an association rule X→Y must be empty: items that appear in itemset X do not appear in itemset Y, and vice versa. Hence, both front and back partition points must be given for the purpose of chromosome encoding. The itemset before the front partition point is called "itemset X," while that between the front and back partition points is called "itemset Y." The chromosome encoding approach used in this study is "string encoding."
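A small sketch of the string encoding with front and back partition points: items before the front point form itemset X and items between the two points form itemset Y, so the two itemsets never overlap. The concrete list layout is an illustrative assumption.

```python
def decode(encoded_items, front, back):
    """Split an encoded item string into itemset X and itemset Y using the
    front and back partition points (front < back <= len(encoded_items))."""
    x = set(encoded_items[:front])       # items before the front partition point
    y = set(encoded_items[front:back])   # items between the front and back points
    return x, y                          # X and Y are disjoint by construction

# Example: items 1..6 encoded as a string, front point after 2 items, back after 5.
x, y = decode([1, 2, 3, 4, 5, 6], front=2, back=5)
print(x, y)   # {1, 2} -> itemset X, {3, 4, 5} -> itemset Y, i.e. the rule X -> Y
```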
D. Fitness value calculation

The fitness value in this study is used to evaluate the importance of each particle. The fitness value of each particle comes from the fitness function; here, we employ the target function proposed by Kung [14] as the fitness function. The improved algorithm, called Weighted QPSO, is outlined as follows.

(6)

In Eq. (6), Fitness(k) is the fitness value of association rule type k, Confidence(k) is the confidence of association rule type k, Support(k) is the actual support of association rule type k, and Length(k) is the length of association rule type k. The objective of this fitness function is maximization: the larger the support and confidence of a particle, the greater the strength of the association, meaning that it is an important association rule.

Fig. 4. The proposed system architecture
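Because Eq. (6) is not restated above, the sketch below only shows the shape of the fitness computation: the support, confidence and length of a candidate rule are measured and then combined by a pluggable function standing in for Eq. (6). The default combination used here is purely an illustrative placeholder, not the paper's formula.

```python
def rule_stats(x, y, transactions):
    """Support, confidence and length of a candidate rule X -> Y."""
    sup_x  = sum(1 for t in transactions if x <= t) / len(transactions)
    sup_xy = sum(1 for t in transactions if (x | y) <= t) / len(transactions)
    conf = sup_xy / sup_x if sup_x else 0.0
    return sup_xy, conf, len(x | y)

def fitness(x, y, transactions, combine=None):
    """Fitness of a rule from its confidence, support and length (role of Eq. 6).
    `combine` stands in for Eq. (6); the default below is an illustrative guess."""
    sup, conf, length = rule_stats(x, y, transactions)
    if combine is None:
        combine = lambda c, s, l: c * s * l   # placeholder, NOT the paper's Eq. (6)
    return combine(conf, sup, length)

transactions = [{"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "beer"}]
print(fitness({"bread"}, {"milk"}, transactions))   # larger is better (maximization)
```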

E. Population generation

In order to apply the evolution process of the WQPSO algorithm, it is first necessary to generate the initial population. In this study, we select the particles with larger fitness values as the population; the particles in this population are called the initial particles.

F. Search for the best particle

First, the particle with the maximum fitness value in the population is selected as the "gbest." In the original QPSO, the mean best position m is defined as

m(t) = (1/M) × [P1(t) + P2(t) + ... + PM(t)]    (7)

where M is the population size and Pi(t) is the personal best position of particle i. From Eq. (7), we can see that the mean best position is simply the average of the personal best positions of all particles, which means that each particle is considered equal and exerts the same influence on the value of m. The philosophy of this method is that the Mainstream Thought, that is, the mean best position m, determines the search scope or creativity of the particle [14]. Defining the Mainstream Thought as the mean of the personal best positions is somewhat reasonable.

The equally weighted mean position, however, is something of a paradox compared with the evolution of social culture in the real world. Although the whole social organism determines the Mainstream Thought, it is not proper to consider each member equal; in fact, the elitists play a more important role in cultural development. With this in mind, when we design a new control method for QPSO in this paper, m in the QPSO equation is replaced by a weighted mean best position. The most important problem is to determine whether a particle is an elitist or not, or, more exactly, how to evaluate its importance when calculating the value of m. It is natural, as in other evolutionary algorithms, to associate elitism with the particles' fitness values: the greater the fitness, the more important the particle. Formally, we first rank the particles in descending order according to their fitness values. Each particle i is then assigned a weight coefficient ai that decreases linearly with the particle's rank, so that the nearer a particle is to the best solution, the larger its weight coefficient. The mean best position m is therefore calculated as

m(t) = a1 × P1(t) + a2 × P2(t) + ... + aM × PM(t)    (8)

where the particles are taken in descending order of fitness.

G. Termination condition

To complete the particle evolution, a termination condition is necessary. In this study, the evolution terminates when the fitness values of all particles are the same, in other words, when the positions of all particles are fixed. Another termination condition occurs after 100 iterations, at which point the evolution of the particle swarm is completed. Finally, after the best particle is found, its support and confidence are recommended as the values of minimal support and minimal confidence. These parameters are then employed for association rule mining to extract valuable information.

The WQPSO differs from the PSO in that the update equation of QPSO allows the particle to appear anywhere in the whole n-dimensional search space at each iteration, while a particle in PSO can only fly within a bounded space at each iteration. Employing the global convergence criterion, we can conclude that QPSO and WQPSO are globally convergent algorithms, while PSO is not.
V. EXPERIMENTAL STUDIES

Our proposed methodology is evaluated on the FoodMart2000 database using Microsoft SQL Server and Java. The experiment also compares the performance of the existing PSO algorithm and the proposed WQPSO algorithm. The results show that the WQPSO algorithm minimizes the quantization error and gives improved fitness values for mining association rules. By using this optimization algorithm, the performance of the Apriori association rule mining algorithm is improved.

The performance of the algorithm is measured using the following parameters: number of iterations, quantization error, fitness value, and accuracy of the association rules. The quantization error is minimized using our proposed algorithm, and the accuracy and fitness values are also improved.

A. Results and Discussion

Our proposed algorithm minimizes the quantization error and gives a larger fitness value at every iteration. The results are discussed and compared with the particle swarm optimization algorithm.

Fig. 5. Iterations vs. fitness value

Figure 5 shows the fitness value against the number of iterations: the fitness value obtained by WQPSO is higher than that of the PSO algorithm, and thus the accuracy of the mined association rules is also improved.

The average running times for different population sizes are also illustrated. Furthermore, with regard to the selection of the threshold values, this study can provide the most feasible minimal support and confidence, which dramatically decreases the time consumed by trial and error. The algorithm is mainly intended to improve the performance of the Apriori algorithm.

Figure 6 shows the quantization error against the number of iterations: the quantization error of WQPSO is minimized by our proposed algorithm.

Fig. 6. Iterations vs. quantization error

Thus, the proposed WQPSO algorithm is better than the traditional Apriori algorithm since it does not require the threshold values for minimal support and confidence to be set subjectively. This also saves computation time and enhances performance.

VI. CONCLUSION

In the field of association rule mining, the minimum threshold values are always given by the user. This study instead determines the minimum support and minimum confidence values for mining association rules using the WQPSO optimization algorithm. The algorithm is mainly defined to improve the performance of the Apriori algorithm; it also minimizes the quantization error and improves the fitness value. From the experiments we can conclude that the WQPSO algorithm is a globally convergent algorithm while PSO is not, and that optimizing association rule mining algorithms gives better results than the plain association rule mining algorithms.

REFERENCES

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, New York, 2000.
[2] A. Savasere, E. Omiecinski, and S. Navathe, "An efficient algorithm for mining association rules in large databases," in Proceedings of the 21st VLDB Conference, 1995, pp. 432–444.
[3] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.
[4] F. Van den Bergh, An Analysis of Particle Swarm Optimizers, Ph.D. Thesis, University of Pretoria, November 2001.
[5] J. Sun, B. Feng, and W.B. Xu, "Particle swarm optimization with particles having quantum behavior," in Proceedings of the IEEE Congress on Evolutionary Computation, 2004, pp. 325–331.
[6] J. Sun, W.B. Xu, and B. Feng, "A global search strategy of quantum-behaved particle swarm optimization," in Proceedings of the 2004 IEEE Conference on Cybernetics and Intelligent Systems, 2004, pp. 111–116.
[7] D.L. Yang, C.T. Pan, and Y.C. Chung, "An efficient hash-based method for discovering the maximal frequent set," in Proceedings of the 25th Annual International Computer Software and Applications Conference, 2001, pp. 516–551.
[8] S.S. Gun, Application of Genetic Algorithm and Weighted Itemset for Association Rule Mining, Master Thesis, Department of Industrial Engineering and Management, Yuan-Chi University, 2002.
[9] M. Saggar, A.K. Agrawal, and A. Lad, "Optimization of association rule mining using improved genetic algorithms," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 4, 2004, pp. 3725–3729.
[10] C. Li and M. Yang, "Association rule data mining in manufacturing information system based on genetic algorithms," in Proceedings of the 3rd International Conference on Computational Electromagnetics and Its Applications, 2004, pp. 153–156.
[11] Particle Swarm Optimization: Tutorial, http://www.swarmintelligence.org/tutorials.php
[13] M.P. Song and G.C. Gu, "Research on particle swarm optimization: a review," in Proceedings of the IEEE International Conference on Machine Learning and Cybernetics, 2004, pp. 2236–2241.
[14] C.Y. Chen and F. Ye, "Particle swarm optimization algorithm and its application to clustering analysis," in Proceedings of the IEEE International Conference on Networking, Sensing and Control, Taipei, Taiwan, 2004, pp. 21–23.
[15] R.J. Kuo, M.J. Wang, and T.W. Huang, "Application of clustering analysis to reduce SMT setup time—a case study on an industrial PC manufacturer in Taiwan," in Proceedings of the International Conference on Enterprise Information Systems, Milan, Italy, 2008.
[16] R.J. Kuo and F.J. Lin, "Application of particle swarm optimization to reduce SMT setup time for industrial PC manufacturer in Taiwan," International Journal of Innovative Computing, Information, and Control, in press.
[17] B. Zhao, C.X. Guo, B.R. Bai, and Y.J. Cao, "An improved particle swarm optimization algorithm for unit commitment," International Journal of Electrical Power & Energy Systems, vol. 28, no. 7, pp. 482–490, 2006.
[18] S.H. Kung, Applying Genetic Algorithm and Weight Item to Association Rule, Master Thesis, Department of Industrial Engineering and Management, Yuan Ze University, Taiwan, 2002.
[19] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," ACM SIGMOD Record, vol. 22, no. 2, pp. 207–216, 1993.
