DOI: 10.1007/978-3-642-54903-8_17
Article

Website Community Mining from Query Logs with Two-Phase Clustering

Published: 06 April 2014

Abstract

A website community is a set of websites that focus on the same or similar topics. Website community mining faces two major challenges. First, websites on the same topic may not link to each other directly because of competition concerns. Second, a single website may contain information about several topics, so a community mining method should capture this and be able to assign such a website to more than one community. In this paper, we propose a method that automatically mines website communities by exploiting query log data from Web search. Query log data can be regarded as a comprehensive summary of the real Web: the queries that lead users to click on a particular website summarize that website's content, and websites on the same topic are indirectly connected through the queries that express information needs on that topic. This observation helps us overcome the first challenge. The proposed two-phase clustering method tackles the second. In the first phase, we cluster the queries of each host to obtain the different content aspects of that host. In the second phase, we cluster the content aspects obtained from all hosts. Because of this two-phase design, one host may appear in more than one website community.
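The abstract does not specify the clustering algorithm, the query representation, or the similarity measure the authors use, so the following is only a minimal Python sketch of the two-phase idea under stated assumptions. Each query is treated as a set of lowercase terms, similarity is the Jaccard overlap of term sets, and both phases use single-link threshold clustering (connected components of the graph that links pairs whose similarity is at least a threshold) as a stand-in technique. The function names, the thresholds `t_aspect` and `t_community`, and the example hosts and queries are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two term sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def threshold_clusters(items: list, threshold: float) -> list:
    """Single-link clustering (assumed stand-in, not the paper's method):
    link every pair with Jaccard >= threshold and return the connected
    components as lists of item indices, using union-find."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if jaccard(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(items)):
        groups[find(i)].append(i)
    return list(groups.values())


def mine_communities(host_queries: dict, t_aspect: float = 0.2,
                     t_community: float = 0.2) -> list:
    """Hypothetical two-phase pipeline: host -> aspects -> communities."""
    # Phase 1: cluster each host's clicked queries into content aspects.
    aspects = []  # (host, term set covering one aspect of that host)
    for host, queries in host_queries.items():
        # Assumption: a query is a set of lowercase whitespace-separated terms.
        term_sets = [frozenset(q.lower().split()) for q in queries]
        for group in threshold_clusters(term_sets, t_aspect):
            terms = frozenset().union(*(term_sets[i] for i in group))
            aspects.append((host, terms))

    # Phase 2: cluster the aspects of all hosts together.
    groups = threshold_clusters([terms for _, terms in aspects], t_community)
    # A community is the set of hosts whose aspects share a cluster, so a
    # host with several aspects can appear in several communities.
    return [sorted({aspects[i][0] for i in group}) for group in groups]


if __name__ == "__main__":
    # Made-up click data: host -> queries that led to clicks on that host.
    example = {
        "bank-a.example": ["mortgage rates", "mortgage calculator",
                           "credit card offers"],
        "bank-b.example": ["mortgage rates today", "home loan rates"],
        "cards.example": ["credit card offers", "credit card rewards"],
    }
    for community in mine_communities(example):
        print(community)
```

With this made-up data, `bank-a.example` has a mortgage aspect and a credit-card aspect, so it lands in two communities: one shared with `bank-b.example` and one shared with `cards.example`. Single-link clustering is used here only because it needs no preset number of clusters; it tends to chain loosely related items together, so a practical system would likely need a more robust clustering method and a richer query representation.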


Cited By

  • (2015) Constructing Complex Search Tasks with Coherent Subtask Search Goals. ACM Transactions on Asian and Low-Resource Language Information Processing 15:2 (1-29). DOI: 10.1145/2742547. Online publication date: 11-Dec-2015
  • (2015) Web Query Reformulation via Joint Modeling of Latent Topic Dependency and Term Context. ACM Transactions on Information Systems 33:2 (1-38). DOI: 10.1145/2699666. Online publication date: 17-Feb-2015


Information

Published In

Guide Proceedings
CICLing 2014: Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8404
April 2014
577 pages
ISBN:9783642549021
  • Editor: Alexander Gelbukh

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Query Logs
  2. Two-phase Clustering
  3. Website Community

