DOI: 10.1145/3318464.3389761
research-article

SLIM: Scalable Linkage of Mobility Data

Published: 31 May 2020

    Abstract

    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding the privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces the number of candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings a speedup of two to four orders of magnitude.
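The abstract describes three stages: represent each entity by its mobility, prune candidate pairs with LSH, and match the surviving pairs one-to-one with a point at which linkage stops. The following is a minimal sketch of that pipeline shape under explicit assumptions. It is not SLIM's actual representation, similarity, or stopping rule. Here entities are reduced to hypothetical spatio-temporal tokens (a grid cell plus a time bin, with the illustrative parameters `CELL_DEG` and `WINDOW_SEC`), similarity is plain Jaccard, the LSH family is MinHash with banding, matching is greedy, and the stop is a fixed threshold rather than SLIM's automated mechanism. All function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import product

# Hypothetical parameters for this sketch; they are not SLIM's settings.
CELL_DEG = 0.01          # spatial grid cell size in degrees
WINDOW_SEC = 3600        # temporal bin width in seconds
NUM_HASHES = 32          # MinHash signature length
BANDS, ROWS = 8, 4       # LSH banding; BANDS * ROWS == NUM_HASHES


def tokens(records):
    """Map each entity to the set of (cell_x, cell_y, time_bin) tokens it visited."""
    profile = defaultdict(set)
    for entity, lat, lon, ts in records:
        profile[entity].add(
            (int(lat // CELL_DEG), int(lon // CELL_DEG), int(ts // WINDOW_SEC))
        )
    return profile


def _hash(seed, token):
    """Deterministic 64-bit hash; the seed selects one of NUM_HASHES hash functions."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(token_set):
    """MinHash signature: the minimum hash of the set under each hash function."""
    return [min(_hash(i, t) for t in token_set) for i in range(NUM_HASHES)]


def lsh_candidates(sig_a, sig_b):
    """Cross-dataset pairs whose signatures agree on every row of at least one band."""
    buckets = defaultdict(lambda: ([], []))
    for side, sigs in ((0, sig_a), (1, sig_b)):
        for entity, sig in sigs.items():
            for band in range(BANDS):
                key = (band, tuple(sig[band * ROWS:(band + 1) * ROWS]))
                buckets[key][side].append(entity)
    pairs = set()
    for left, right in buckets.values():
        pairs.update(product(left, right))
    return pairs


def jaccard(s, t):
    """Stand-in similarity: overlap of the two token sets."""
    return len(s & t) / len(s | t)


def link(records_a, records_b, threshold=0.3):
    """Greedy one-to-one matching of LSH candidates, stopping below a fixed threshold."""
    prof_a, prof_b = tokens(records_a), tokens(records_b)
    sig_a = {e: minhash(s) for e, s in prof_a.items()}
    sig_b = {e: minhash(s) for e, s in prof_b.items()}
    scored = sorted(
        ((jaccard(prof_a[a], prof_b[b]), a, b) for a, b in lsh_candidates(sig_a, sig_b)),
        reverse=True,
    )
    used_a, used_b, links = set(), set(), []
    for score, a, b in scored:
        if score < threshold:
            break  # fixed cut-off; SLIM instead decides automatically when to stop
        if a not in used_a and b not in used_b:
            links.append((a, b, score))
            used_a.add(a)
            used_b.add(b)
    return links


if __name__ == "__main__":
    A = [("alice", 40.7128, -74.0060, 0), ("alice", 40.7306, -73.9866, 7200),
         ("bob", 51.5074, -0.1278, 0), ("bob", 51.5155, -0.0922, 3600)]
    B = [("u1", 40.7128, -74.0060, 0), ("u1", 40.7306, -73.9866, 7200),
         ("u2", 51.5074, -0.1278, 0), ("u2", 51.5155, -0.0922, 3600),
         ("u3", 35.6762, 139.6503, 0)]
    for a, b, score in link(A, B):
        print(a, b, round(score, 2))
```

The banding step is where the savings come from. With b bands of r rows, two entities whose token sets have Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b. Only those pairs reach the more expensive similarity and matching steps, so r and b trade recall against the number of candidate pairs.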

    Supplementary Material

    MP4 File (3318464.3389761.mp4)
    Presentation Video


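    Cited By

    • (2024) EgoMUIL: Enhancing Spatio-Temporal User Identity Linkage in Location-Based Social Networks With Ego-Mo Hypergraph. IEEE Transactions on Mobile Computing 23:8, 8341-8354. DOI: 10.1109/TMC.2023.3345312. Online publication date: Aug-2024.
    • (2024) A Trajectory-oriented Locality-sensitive Hashing Method for User Identification. IEEE Transactions on Knowledge and Data Engineering, 1-14. DOI: 10.1109/TKDE.2023.3324427. Online publication date: 2024.
    • (2023) A Trajectory-Based User Movement Pattern Similarity Measure for User Identification. IEEE Transactions on Network Science and Engineering, 1-12. DOI: 10.1109/TNSE.2023.3274516. Online publication date: 2023.
    • (2021) Spatial-Temporal Similarity for Trajectories with Location Noise and Sporadic Sampling. 2021 IEEE 37th International Conference on Data Engineering (ICDE), 1224-1235. DOI: 10.1109/ICDE51399.2021.00110. Online publication date: Apr-2021.
    • (2020) User Identity Linkage across Location-Based Social Networks with Spatio-Temporal Check-in Patterns. 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 1278-1285. DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00189. Online publication date: Dec-2020.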

    Information

    Published In

    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN:9781450367356
    DOI:10.1145/3318464
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 May 2020

    Author Tags

    1. data integration
    2. entity linkage
    3. mobility data
    4. scalability

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '20

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months)13
    • Downloads (Last 6 weeks)2
    Reflects downloads up to
