DOI: 10.1145/3055601.3055603
research-article

A Machine Learning Approach to Demographic Prediction using Geohashes

Published: 18 April 2017

Abstract

With the rapid proliferation of smartphones, human beings act as social sensors by carrying GPS-enabled devices that share location data. This has resulted in an abundance of sensor data gathered over long periods of time. Gaining meaningful insights from such massive amounts of spatio-temporal data accumulated from several disparate sources is often a challenge for organizations. Identifying the demographics of mobile phone users by telecommunication providers is one such example. Demographic information plays a significant role in targeting online advertisements to focused user groups by providing insights into users' mobility patterns. However, in practice, demographic information such as age and gender is mostly unavailable to app developers for open access due to privacy concerns. In this paper, we address the gap of how to enrich location data with demographics, which could be valuable for app developers. In our study, we use a machine learning approach to predict the gender and age of mobile phone users from a set of 3,252,950 anonymised GPS trajectories from 60,865 unique devices, using a predictive model based on the concept of Geohashes. We study to what extent users' demographics can be inferred from their frequently visited locations by formulating a multi-level classification algorithm that finds the most frequently visited Geohashes and associates them with the nearest points of interest, which enables predicting the age group and gender of users who prefer to visit a specific location in a sequential manner. Experiments are conducted on a real dataset of mobile phone users collected and shared by a telecommunication provider. The experimental results show that the proposed algorithm achieves mean prediction accuracy scores of 71.62% and 96.75% for predicting the gender and age groups of the users, respectively.
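
The abstract does not show how locations are turned into Geohashes or how the most frequently visited ones are found, so the following is only a minimal sketch of that first step. It uses the standard public Geohash encoding, not code from the paper. The function names `geohash_encode` and `top_geohashes`, the `(device_id, lat, lon)` input layout, and the default precision of 6 characters are all assumptions made for illustration.

```python
from collections import Counter

# Standard Geohash base-32 alphabet (digits plus letters, omitting a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a Geohash string.

    Bits alternate between longitude and latitude, starting with
    longitude; each bit halves the current interval. Every 5 bits
    become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def top_geohashes(points, precision=6, k=3):
    """Return the k most frequently visited Geohash cells per device.

    `points` is an iterable of (device_id, lat, lon) tuples; this
    layout is hypothetical, not the dataset's actual schema.
    """
    counts = {}
    for device_id, lat, lon in points:
        cell = geohash_encode(lat, lon, precision)
        counts.setdefault(device_id, Counter())[cell] += 1
    return {
        device_id: [cell for cell, _ in counter.most_common(k)]
        for device_id, counter in counts.items()
    }
```

Calling `top_geohashes(points, precision=6, k=3)` on a list of GPS fixes gives, for each device, its most visited cells. A later stage could then match those cells to nearby points of interest and feed them to a classifier. The precision controls the cell size, so the right value depends on how fine-grained the location features need to be.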

Published In

SocialSens'17: Proceedings of the 2nd International Workshop on Social Sensing
April 2017
97 pages
ISBN:9781450349772
DOI:10.1145/3055601

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Decision-Tree Classifier
  2. Demographic Prediction
  3. Frequent Itemset Mining
  4. Geohash
  5. Linear Discriminant Analysis

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CPS Week '17: Cyber Physical Systems Week 2017
April 18 - 21, 2017
Pittsburgh, PA, USA

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 2
Reflects downloads up to 15 Oct 2024.

Cited By

  • (2023) National-Level Multimodal Origin–Destination Estimation Based on Passively Collected Location Data and Machine Learning Methods. Transportation Research Record: Journal of the Transportation Research Board 2678:5 (525-541). DOI: 10.1177/03611981231189732. Online publication date: 19-Aug-2023
  • (2023) IIoT Based Trustworthy Demographic Dynamics Tracking With Advanced Bayesian Learning. IEEE Transactions on Network Science and Engineering 10:5 (2745-2754). DOI: 10.1109/TNSE.2022.3145572. Online publication date: 1-Sep-2023
  • (2023) OCEANS-6: Character Trait, Age Demographic Analysis and Sexist Speech Detection of Dialogues. 2023 IEEE 8th International Conference on Recent Advances and Innovations in Engineering (ICRAIE) (1-6). DOI: 10.1109/ICRAIE59459.2023.10468349. Online publication date: 2-Dec-2023
  • (2021) Development of Music Theory Analytic Platform in Mobile Environment based on Mobile Computing. 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC) (1058-1061). DOI: 10.1109/ICESC51422.2021.9532639. Online publication date: 4-Aug-2021
  • (2020) High-performance implementation of a two-bit geohash coding technique for nearest neighbor search. Concurrency and Computation: Practice and Experience 33:5. DOI: 10.1002/cpe.6029. Online publication date: 5-Oct-2020
  • (2019) DIRAC: A Hybrid Approach to Customer Demographics Analysis for Advertising Campaigns. 2019 6th NAFOSTED Conference on Information and Computer Science (NICS) (256-261). DOI: 10.1109/NICS48868.2019.9023806. Online publication date: Dec-2019
