research-article

Demographic differences in search engine use with implications for cohort selection

Published: 01 January 2019

Abstract

The correlation between the demographics of users and the text they write has been investigated through literary texts and, more recently, social media. However, differences in language use in search engines have not been thoroughly analyzed, especially with respect to age and gender. Such differences matter because of the growing use of search engine data in the study of human health, where queries are used to identify patient populations. Using three datasets comprising queries from multiple general-purpose Internet search engines, we investigate the correlation between demography (age, gender, and income) and the text of queries submitted to search engines. Our results show that females and younger people use longer queries: females make approximately 25% more queries with 10 or more words. In the case of queries which identify users as having specific medical conditions, we find that females make 53% more queries than expected, and that this results in patient cohorts which are highly skewed in gender and age compared to known patient populations. We show that methods for cohort selection which use additional information, beyond queries where users indicate their condition, yield less skewed cohorts. Finally, we show that biased training cohorts can lead to differential performance of models designed to detect disease from search engine queries. Our results indicate that in studies where demographic representation is important, such as studies of users' health or evaluations of search engines for fairness, care should be taken in selecting search engine data so as to create a representative dataset.
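The abstract does not say how "more queries than expected" was computed or how a skewed cohort might be corrected. As an illustration only, the sketch below assumes that the expected count is the cohort size multiplied by a group's share of a reference population, and it uses simple post-stratification weights as one possible correction. The function names, the group labels, and all numbers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def excess_over_expected(observed_count, total_count, population_share):
    """Relative excess of a group's observed count over its expected count.

    Assumption: expected = total_count * population_share.
    A return value of 0.53 would read as "53% more than expected".
    """
    expected = total_count * population_share
    return (observed_count - expected) / expected


def poststratification_weights(cohort_groups, target_shares):
    """Per-group weights that make the weighted cohort match target_shares."""
    counts = Counter(cohort_groups)
    n = len(cohort_groups)
    return {g: target_shares[g] / (counts[g] / n) for g in counts}


# Hypothetical cohort: 70 users labelled "F" and 30 labelled "M",
# compared against a reference population split 50/50.
cohort = ["F"] * 70 + ["M"] * 30
target = {"F": 0.5, "M": 0.5}

excess_f = excess_over_expected(70, len(cohort), target["F"])
weights = poststratification_weights(cohort, target)

# Weighted share of "F" after applying the weights.
weighted_total = sum(weights[g] for g in cohort)
weighted_share_f = sum(weights[g] for g in cohort if g == "F") / weighted_total

print(f"excess for F: {excess_f:.2f}")
print(f"weights: {weights}")
print(f"weighted share of F: {weighted_share_f:.2f}")
```

Reweighting only corrects the demographic attributes that are known for each user. It does not address differences in how groups phrase their queries, which is the kind of skew the abstract describes.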



      Published In

      Information Retrieval, Volume 22, Issue 6
      Dec 2019
      95 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Publication History

      Published: 01 January 2019
      Accepted: 18 December 2018
      Received: 07 August 2018

      Author Tags

      1. Search engines
      2. Age
      3. Gender
      4. Demographics

      Qualifiers

      • Research-article
