Abstract
The Internet serves as a source of information. Clustering web pages helps identify the topics they cover. Dynamism, however, is one of the main challenges of web clustering: pages change frequently, and pages are continually added and removed. Processing a new page should not require repeating the whole clustering. For these reasons, incremental algorithms are an appropriate alternative for web page clustering.
In this paper we propose a new hybrid technique, which we call Incremental K Ant Colony Clustering (IKACC). It is based on Ant Colony Optimization and the k-means algorithm. We adapt this approach to classify new pages in an online manner, and we compare it with the incremental k-means algorithm. The results show that this approach is more efficient and produces better results than incremental k-means.
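To make the incremental idea concrete, the following is a minimal sketch of a generic incremental k-means step, the kind of baseline the abstract compares against. It is not the authors' IKACC method, and it includes no ant colony component. The function names `nearest_centroid` and `add_point`, the Euclidean distance, and the representation of pages as numeric feature vectors are all assumptions made for illustration.

```python
import math


def nearest_centroid(centroids, x):
    """Return the index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))


def add_point(centroids, counts, x):
    """Assign a new vector x to its nearest cluster without re-clustering.

    Only the chosen centroid is updated, using the running-mean rule
    c <- c + (x - c) / n, where n is the cluster size after adding x.
    Both `centroids` and `counts` are modified in place (hypothetical API).
    """
    j = nearest_centroid(centroids, x)
    counts[j] += 1
    centroids[j] = [c + (xi - c) / counts[j] for c, xi in zip(centroids[j], x)]
    return j


# Two existing clusters of two pages each; one new page arrives.
centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [2, 2]
cluster = add_point(centroids, counts, [3.0, 0.0])
print(cluster, centroids[cluster], counts)
```

The cost of handling a new page in this sketch depends on the number of clusters rather than on the number of pages already clustered, which is the property that motivates incremental algorithms here.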
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Boughachiche, Y., Kamel, N. (2014). A New Algorithm for Incremental Web Page Clustering Based on k-Means and Ant Colony Optimization. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_33
DOI: https://doi.org/10.1007/978-3-319-07692-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07691-1
Online ISBN: 978-3-319-07692-8
eBook Packages: Engineering, Engineering (R0)