DOI: 10.1145/1871437.1871583
research-article

Temporal query log profiling to improve web search ranking

Published: 26 October 2010

Abstract

Temporal information can be leveraged to improve web search ranking. In this work, we propose a method to improve the ranking of search results by identifying fundamental properties of the temporal behavior of low-quality hosts and spam-prone queries in search logs and modeling those properties as quantifiable features. In particular, we introduce two concepts: host churn, a measure of changes in host visibility for user queries, and query volatility, a measure of the semantic instability of query results. We also propose methods for constructing temporal profiles from search query logs, from which a set of features based on these concepts can be estimated. We experimentally demonstrate the utility of the proposed concepts on two language-independent search tasks: regression-based ranking of search results, and a novel classification problem, introduced in this work, of detecting spam-prone queries.
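The abstract defines host churn only informally, as a measure of changes in host visibility for user queries. The sketch below illustrates one way such a feature could be computed from a query log; it is not the paper's definition. It assumes a hypothetical log format of `(query, period, host)` records, one per host shown in a query's results during a time period. It builds a per-query temporal profile (the set of visible hosts in each observed period) and scores churn as the mean Jaccard distance between the host sets of consecutive observed periods. The names `build_temporal_profiles`, `jaccard_distance`, and `host_churn` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical log record (an assumption, not the paper's format):
# (query, period index, host shown in the results for that query in that period).
LogRecord = Tuple[str, int, str]


def build_temporal_profiles(log: Iterable[LogRecord]) -> Dict[str, Dict[int, Set[str]]]:
    """Group a query log into per-query profiles mapping period -> set of visible hosts."""
    profiles: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for query, period, host in log:
        profiles[query][period].add(host)
    return profiles


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def host_churn(profile: Dict[int, Set[str]]) -> float:
    """Illustrative churn score: mean Jaccard distance between consecutive observed periods.

    Periods in which the query was not observed are skipped rather than treated as empty.
    Returns 0.0 when the query was observed in fewer than two periods.
    """
    periods: List[int] = sorted(profile)
    if len(periods) < 2:
        return 0.0
    distances = [
        jaccard_distance(profile[p], profile[q])
        for p, q in zip(periods, periods[1:])
    ]
    return sum(distances) / len(distances)


# Toy log: "cheap flights" shows shifting hosts across periods; "weather" does not.
log: List[LogRecord] = [
    ("cheap flights", 0, "a.com"), ("cheap flights", 0, "b.com"),
    ("cheap flights", 1, "a.com"), ("cheap flights", 1, "c.com"),
    ("cheap flights", 2, "d.com"), ("cheap flights", 2, "e.com"),
    ("weather", 0, "w.com"), ("weather", 1, "w.com"),
]
profiles = build_temporal_profiles(log)
print(round(host_churn(profiles["cheap flights"]), 4))  # 0.8333
print(host_churn(profiles["weather"]))  # 0.0
```

Jaccard distance over host sets ignores rank positions. A rank-sensitive variant could instead use a top-k list distance such as those studied by Fagin et al. [12].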

References

[1]
E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of ACM SIGIR, pages 19--26, 2006.
[2]
I. Bíró, J. Szabó, and A. A. Benczúr. Latent Dirichlet allocation in web spam filtering. In Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'08), 2008.
[3]
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984.
[4]
C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of ICML, pages 89--96, 2005.
[5]
C. Castillo, C. Corsi, D. Donato, P. Ferragina, and A. Gionis. Query-log mining for detecting spam. In Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'08), 2008.
[6]
C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri. Know your neighbors: Web spam detection using the web topology. In Proceedings of ACM SIGIR, pages 423--430, 2008.
[7]
O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. In Proceedings of WWW, pages 1--10, 2009.
[8]
N. Dai, B. D. Davison, and X. Qi. Looking into the past to better classify web spam. In Fetterly and Gyöngyi [13], pages 1--8.
[9]
F. Diaz. Integration of news content into web results. In Proceedings of WSDM, pages 182--191, 2009.
[10]
F. Diaz and R. Jones. Using temporal profiles of queries for precision prediction. In Proceedings of ACM SIGIR, pages 18--24, New York, NY, USA, 2004. ACM.
[11]
G. Dupret and C. Liao. A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine. In Proceedings of WSDM, pages 181--190, 2010.
[12]
R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. SIAM Journal on Discrete Mathematics, 17(1):134--160, 2003.
[13]
D. Fetterly and Z. Gyöngyi, editors. AIRWeb 2009, Fifth International Workshop on Adversarial Information Retrieval on the Web, Madrid, Spain, April 21, 2009, ACM International Conference Proceeding Series, 2009.
[14]
D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages. In Proceedings of the 7th International Workshop on the Web and Databases (WebDB'04), pages 1--6, 2004.
[15]
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189--1232, 2001.
[16]
N. S. Glance, M. Hurst, and T. Tomokiyo. BlogPulse: Automated trend discovery for weblogs. In WWW 2004 Workshop on the Weblogging Ecosystem, New York, NY, USA, May 2004. ACM.
[17]
J. Guiver and E. Snelson. Learning to rank with SoftRank and Gaussian processes. In Proceedings of ACM SIGIR, pages 259--266, 2008.
[18]
F. Guo, C. Liu, A. Kannan, T. Minka, M. Taylor, Y.-M. Wang, and C. Faloutsos. Click chain model in web search. In Proceedings of WWW, pages 11--20, 2009.
[19]
Z. Gyöngyi and H. Garcia-Molina. Web spam taxonomy. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'05), 2005.
[20]
M. R. Henzinger, R. Motwani, and C. Silverstein. Challenges in web search engines. SIGIR Forum, 36(2):11--22, 2002.
[21]
K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of ACM SIGIR, pages 41--48, New York, NY, USA, 2000. ACM.
[22]
Y.-J. Chung, M. Toyoda, and M. Kitsuregawa. A study of link farm distribution and evolution using a time series of web snapshots. In Fetterly and Gyöngyi [13], pages 9--16.
[23]
Y.-R. Lin, H. Sundaram, Y. Chi, J. Tatemura, and B. L. Tseng. Splog detection using self-similarity analysis on blog temporal dynamics. In Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'07), pages 1--8, New York, NY, USA, 2007. ACM.
[24]
G. Mishne, D. Carmel, and R. Lempel. Blocking blog spam with language model disagreement. In Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web (AIRWeb'05), 2005.
[25]
G. A. Mishne. Applied Text Analytics for Blogs. PhD thesis, University of Amsterdam, Amsterdam, 2007.
[26]
A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly. Detecting spam web pages through content analysis. In Proceedings of WWW, pages 83--92, 2006.
[27]
F. Radlinski and T. Joachims. Active exploration for learning rankings from clickthrough data. In Proceedings of SIGKDD, pages 570--579, 2007.
[28]
G. Shen, B. Gao, T.-Y. Liu, G. Feng, S. Song, and H. Li. Detecting link spam using temporal information. In Proceedings of ICDM, pages 1049--1053, 2006.
[29]
B. Wu and B. D. Davison. Identifying link farm spam pages. In Proceedings of WWW, pages 820--829, 2005.
[30]
J. Xu and H. Li. AdaRank: a boosting algorithm for information retrieval. In Proceedings of ACM SIGIR, pages 391--398, 2007.
[31]
R. Zhang, Y. Chang, Z. Zheng, D. Metzler, and J.-Y. Nie. Search engine adaptation by feedback control adjustment for time-sensitive query. In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference (NAACL-HLT), 2009.
[32]
Z. Zheng, K. Chen, G. Sun, and H. Zha. A regression framework for learning ranking functions using relative relevance judgments. In Proceedings of ACM SIGIR, pages 287--294, 2007.

Cited By

  • (2017) Exploring Scalability and Time-Sensitiveness in Reliable Social Sensing With Accuracy Assessment. IEEE Access 5, 14405-14418. DOI: 10.1109/ACCESS.2017.2707480. Online publication date: 2017.
  • (2013) Intercepting temporal constraints for searching images over the web. 2013 International Conference on Information Systems and Computer Networks, 129-132. DOI: 10.1109/ICISCON.2013.6524188. Online publication date: Mar-2013.

Information

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN:9781450300995
DOI:10.1145/1871437
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. search logs analysis
  2. search spam
  3. temporal data mining

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 09 Feb 2025