DOI: 10.1145/3025453.3026015
research-article
Open access

The Effect of Population and "Structural" Biases on Social Media-based Algorithms: A Case Study in Geolocation Inference Across the Urban-Rural Spectrum

Published: 02 May 2017

Abstract

Much research has shown that social media platforms have substantial population biases. However, very little is known about how these population biases affect the many algorithms that rely on social media data. Focusing on the case study of geolocation inference algorithms and their performance across the urban-rural spectrum, we establish that these algorithms perform significantly worse for underrepresented populations (i.e., rural users). We further establish that this finding is robust across both text- and network-based algorithm designs. However, we also show that some of this bias can be attributed to the design of the algorithms themselves rather than to population biases in the underlying data sources. For instance, in some cases, algorithms perform poorly for rural users even when we substantially overcorrect for population biases by training exclusively on rural data. We discuss the implications of our findings for the design and study of social media-based algorithms.
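To make the urban-rural comparison concrete, one way to stratify geolocation error by population group is sketched below. It is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes each evaluated user already has a known home location, a predicted home location, and a group label (for example, an urban-rural class). The function names `haversine_km` and `error_by_group`, the tuple layout of the input, and the 100 km accuracy threshold are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from statistics import median

# Mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def error_by_group(predictions, threshold_km=100.0):
    """Summarise geolocation error per group (hypothetical helper).

    predictions: iterable of (group, (true_lat, true_lon), (pred_lat, pred_lon)).
    Returns {group: (n_users, median_error_km, share_within_threshold)}.
    The default 100 km threshold is an illustrative assumption.
    """
    errors = defaultdict(list)
    for group, (tlat, tlon), (plat, plon) in predictions:
        errors[group].append(haversine_km(tlat, tlon, plat, plon))

    summary = {}
    for group, errs in errors.items():
        within = sum(e <= threshold_km for e in errs) / len(errs)
        summary[group] = (len(errs), median(errs), within)
    return summary


if __name__ == "__main__":
    # Toy, made-up data: one exact prediction and one prediction one degree of latitude off.
    toy = [
        ("urban", (44.98, -93.27), (44.98, -93.27)),
        ("rural", (44.98, -93.27), (45.98, -93.27)),
    ]
    for group, (n, med, share) in sorted(error_by_group(toy).items()):
        print(f"{group}: n={n}, median error={med:.1f} km, within threshold={share:.0%}")
```

Reporting both a median error and a within-threshold share per group can expose a gap between urban and rural users that an aggregate accuracy figure would hide; the threshold and the choice of summary statistics are tunable assumptions rather than part of the original study's design.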

Supplementary Material

MP4 File (p1167-johnson.mp4)




    Published In

    CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
    May 2017
    7138 pages
    ISBN:9781450346559
    DOI:10.1145/3025453
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. algorithmic accountability
    2. geolocation inference
    3. population bias
    4. social media



    Conference

    CHI '17

    Acceptance Rates

    CHI '17 Paper Acceptance Rate 600 of 2,400 submissions, 25%;
    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%


    Article Metrics

    • Downloads (last 12 months): 394
    • Downloads (last 6 weeks): 33
    Reflects downloads up to 06 Feb 2025.

    Cited By
    • (2024) New Digital Divide Shaped by Algorithm? Evidence from Agent-Based Testing on Douyin’s Health-Related Video Recommendation. Communication Research. DOI: 10.1177/00936502241262056. Online publication date: 30-Jul-2024.
    • (2024) The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3613904.3642669. Online publication date: 11-May-2024.
    • (2024) Networks and identity drive the spatial diffusion of linguistic innovation in urban and rural areas. npj Complexity 1:1. DOI: 10.1038/s44260-024-00009-9. Online publication date: 2-Sep-2024.
    • (2024) Remote illness detection faces a trust barrier. The Lancet Digital Health 6:8, e537-e538. DOI: 10.1016/S2589-7500(24)00145-6. Online publication date: Aug-2024.
    • (2024) Geospectra: leveraging quantum-SAR and deep learning for enhanced geolocation in urban environments. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06619-3. Online publication date: 30-Nov-2024.
    • (2024) Some Observations on Social Media Mining tools for Health Applications. Data Science and Applications, 97-109. DOI: 10.1007/978-981-99-7817-5_8. Online publication date: 18-Jan-2024.
    • (2024) Factors influencing the design and implementation of accessible e‐Government services in South Africa. The Electronic Journal of Information Systems in Developing Countries 90:4. DOI: 10.1002/isd2.12317. Online publication date: 31-Jan-2024.
    • (2023) When Biased Humans Meet Debiased AI: A Case Study in College Major Recommendation. ACM Transactions on Interactive Intelligent Systems 13:3, 1-28. DOI: 10.1145/3611313. Online publication date: 11-Sep-2023.
    • (2023) Mobilizing Social Media Data: Reflections of a Researcher Mediating between Data and Organization. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3544548.3580916. Online publication date: 19-Apr-2023.
    • (2023) The Ethics of Computational Social Science. Handbook of Computational Social Science for Policy, 57-104. DOI: 10.1007/978-3-031-16624-2_4. Online publication date: 24-Jan-2023.
