Research article, DOI: 10.1145/1367497.1367617

Characterizing typical and atypical user sessions in clickstreams

Published: 21 April 2008

Abstract

Millions of users retrieve information from the Internet using search engines. Mining these user sessions can provide valuable information about the quality of the user experience and the perceived quality of search results. Search engines often rely on accurate estimates of click-through rate (CTR) to evaluate the quality of the user experience. The vast heterogeneity of the user population and the presence of automated software programs (bots) can result in high variance in CTR estimates. To improve the estimation accuracy of user experience metrics such as CTR, we argue that it is important to identify typical and atypical user sessions in clickstreams. Our approach to identifying these sessions is based on detecting outliers using the Mahalanobis distance in the user session space. Our user session model incorporates several key clickstream characteristics, including a novel conformance score obtained by Markov chain analysis. Editorial results show that our approach has a precision of about 89%. Filtering out the atypical sessions reduces the uncertainty (95% confidence interval) of the mean CTR by about 40%. These results demonstrate that identifying typical and atypical user sessions is valuable for cleaning "noisy" user session data and thereby increasing the accuracy of user experience evaluation.
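
The abstract does not give the session features, the definition of the conformance score, or the outlier threshold, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a first-order Markov chain over hypothetical event labels ("Q" for query, "C" for click, "N" for next page), defines conformance as the mean log transition probability of a session (an illustrative choice, not the paper's definition), and uses just two features per session: length and conformance. The chain is fitted on a reference sample assumed to be mostly human traffic.

```python
import math
from collections import defaultdict

def fit_transitions(sessions):
    """Estimate first-order Markov transition probabilities from event sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for events in sessions:
        for a, b in zip(events, events[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conformance(events, probs, floor=1e-6):
    """Mean log-probability per transition (assumed definition, for illustration).
    Transitions never seen during fitting get the small probability `floor`."""
    steps = list(zip(events, events[1:]))
    if not steps:
        return 0.0
    return sum(math.log(probs.get(a, {}).get(b, floor)) for a, b in steps) / len(steps)

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes A is non-singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X from the sample mean,
    using the sample covariance (divisor n - 1)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return [math.sqrt(sum(a * z for a, z in zip(r, solve(cov, r))))
            for r in centered]

# Hypothetical toy data: five human-like sessions and one bot-like session.
reference = [
    ["Q", "C"],
    ["Q", "C", "Q", "C"],
    ["Q", "N", "C"],
    ["Q", "C", "C"],
    ["Q", "N", "N", "C"],
]
sessions = reference + [["Q"] * 12]  # repeated queries with no clicks

probs = fit_transitions(reference)
features = [[float(len(s)), conformance(s, probs)] for s in sessions]
distances = mahalanobis_distances(features)

# Rank sessions from most to least atypical.
for i in sorted(range(len(sessions)), key=lambda k: -distances[k]):
    print(i, round(distances[i], 3))
```

In this sketch the sessions are ranked by distance rather than cut at a fixed threshold, because with a sample this small the largest attainable Mahalanobis distance is bounded by the sample size. On real data one would typically pick a cutoff from a chi-square quantile or from editorial judgments, which is again an assumption and not something the abstract specifies.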


Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN:9781605580852
DOI:10.1145/1367497
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. clickstream analysis
  2. outlier detection
  3. web search

Qualifiers

  • Research-article

Conference

WWW '08

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) A Generic Framework for Finding Special Quadratic Elements in Data Streams. IEEE/ACM Transactions on Networking 32:4 (3269-3284). DOI: 10.1109/TNET.2024.3392029. Online publication date: Aug-2024
  • (2023) Efficient Fingerprinting Attack on Web Applications: An Adaptive Symbolization Approach. Electronics 12:13 (2948). DOI: 10.3390/electronics12132948. Online publication date: 4-Jul-2023
  • (2023) Sync Ratio and Cluster Heat Map for Visualizing Student Engagement. Educational Data Science: Essentials, Approaches, and Tendencies (255-289). DOI: 10.1007/978-981-99-0026-8_7. Online publication date: 30-Apr-2023
  • (2021) Detecting user-perceived failure in mobile applications via mining user traces. Proceedings of the 43rd International Conference on Software Engineering: Companion Proceedings (123-125). DOI: 10.1109/ICSE-Companion52605.2021.00054. Online publication date: 25-May-2021
  • (2021) Exploiting the Community Structure of Fraudulent Keywords for Fraud Detection in Web Search. Journal of Computer Science and Technology 36:5 (1167-1183). DOI: 10.1007/s11390-021-0218-2. Online publication date: 30-Sep-2021
  • (2021) Mining sequences with exceptional transition behaviour of varying order using quality measures based on information-theoretic scoring functions. Data Mining and Knowledge Discovery. DOI: 10.1007/s10618-021-00808-x. Online publication date: 24-Nov-2021
  • (2020) Early Detection of User Exits from Clickstream Data: A Markov Modulated Marked Point Process Model. Proceedings of The Web Conference 2020 (1671-1681). DOI: 10.1145/3366423.3380238. Online publication date: 20-Apr-2020
  • (2020) Traceback: Learning to Identify Website’s Landing URLs via Noisy Web Traces Passively. 2020 27th International Conference on Telecommunications (ICT) (1-6). DOI: 10.1109/ICT49546.2020.9239491. Online publication date: 5-Oct-2020
  • (2020) Representation of Click-Stream Data Sequences for Learning User Navigational Behavior by Using Embeddings. 2020 IEEE International Conference on Big Data (Big Data) (3173-3179). DOI: 10.1109/BigData50022.2020.9378437. Online publication date: 10-Dec-2020
  • (2019) Co-learning Multiple Browsing Tendencies of a User by Matrix Factorization-based Multitask Learning. IEEE/WIC/ACM International Conference on Web Intelligence (253-257). DOI: 10.1145/3350546.3352526. Online publication date: 14-Oct-2019
