research-article
DOI: 10.1145/3469830.3470893

Privacy-Preserving Synthetic Location Data in the Real World

Published: 23 August 2021

    Abstract

    Sharing sensitive data is vital in enabling many modern data analysis and machine learning tasks. However, current methods for data release are insufficiently accurate or granular to provide meaningful utility, and they carry a high risk of deanonymization or membership inference attacks. In this paper, we propose a differentially private synthetic data generation solution with a focus on the compelling domain of location data. We present two methods with high practical utility for generating synthetic location data from real locations, both of which protect the existence and true location of each individual in the original dataset. Our first, partitioning-based approach introduces a novel method for privately generating point data using kernel density estimation, in addition to employing private adaptations of classic statistical techniques, such as clustering, for private partitioning. Our second, network-based approach incorporates public geographic information, such as the road network of a city, to constrain the bounds of synthetic data points and hence improve the accuracy of the synthetic data. Both methods satisfy the requirements of differential privacy, while also enabling accurate generation of synthetic data that aims to preserve the distribution of the real locations. We conduct experiments using three large-scale location datasets to show that the proposed solutions generate synthetic location data with high utility and strong similarity to the real datasets. We highlight some practical applications for our work by applying our synthetic data to a range of location analytics queries, and we demonstrate that our synthetic data produces near-identical answers to the same queries compared to when real data is used. Our results show that the proposed approaches are practical solutions for sharing and analyzing sensitive location data privately.
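The general idea of releasing synthetic points under differential privacy can be illustrated with a much simpler baseline than either method described in the abstract. The sketch below is not the paper's partitioning-based or network-based approach: it swaps in a plain uniform grid with Laplace-noised cell counts, then samples synthetic points uniformly inside cells in proportion to the noisy counts. The function names, the grid size, and the assumptions that each individual contributes exactly one point and that the number of output points `n_out` is chosen independently of the data are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthetic_points(points, bounds, grid_size, epsilon, n_out, seed=None):
    """Illustrative baseline (hypothetical helper, not the paper's method):
    noisy grid histogram followed by uniform sampling within cells."""
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = bounds  # assumed public, data-independent
    cell_w = (max_x - min_x) / grid_size
    cell_h = (max_y - min_y) / grid_size

    # 1. Count real points per grid cell (out-of-range points are clamped).
    counts = [[0] * grid_size for _ in range(grid_size)]
    for x, y in points:
        col = min(max(int((x - min_x) / cell_w), 0), grid_size - 1)
        row = min(max(int((y - min_y) / cell_h), 0), grid_size - 1)
        counts[row][col] += 1

    # 2. Perturb each count with Laplace noise of scale 1/epsilon.
    #    Assumption: adding or removing one individual changes exactly one
    #    cell count by 1, so the histogram has L1 sensitivity 1.
    cells, weights = [], []
    for row in range(grid_size):
        for col in range(grid_size):
            noisy = counts[row][col] + laplace_noise(1.0 / epsilon, rng)
            cells.append((row, col))
            weights.append(max(noisy, 0.0))  # post-processing: clip negatives

    if sum(weights) <= 0:
        weights = [1.0] * len(cells)  # fall back to a uniform distribution

    # 3. Sample cells proportionally to the noisy counts, then place each
    #    synthetic point uniformly at random inside its cell.
    synthetic = []
    for row, col in rng.choices(cells, weights=weights, k=n_out):
        x = min_x + (col + rng.random()) * cell_w
        y = min_y + (row + rng.random()) * cell_h
        synthetic.append((x, y))
    return synthetic


# Usage with made-up coordinates in a unit square.
real = [(0.1, 0.1), (0.12, 0.15), (0.8, 0.9), (0.82, 0.88), (0.5, 0.5)]
fake = dp_synthetic_points(real, (0.0, 0.0, 1.0, 1.0),
                           grid_size=4, epsilon=1.0, n_out=10, seed=0)
```

Steps 1 and 2 are the only ones that touch the real data; clipping and sampling operate on the noisy counts alone, so they do not consume additional privacy budget. A fixed grid like this tends to lose accuracy where data is sparse or unevenly distributed, which is the kind of limitation that adaptive partitioning or road-network constraints aim to reduce.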



    Published In

    SSTD '21: Proceedings of the 17th International Symposium on Spatial and Temporal Databases
    August 2021
    173 pages
    ISBN:9781450384254
    DOI:10.1145/3469830
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Differential Privacy
    2. Location Data Sharing
    3. Synthetic Data

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SSTD '21

    Article Metrics

    • Downloads (last 12 months): 67
    • Downloads (last 6 weeks): 3

    Cited By

    • (2024) Shapes and frictions of synthetic data. Big Data & Society 11:2. DOI: 10.1177/20539517241249390. Online publication date: 30-Apr-2024
    • (2024) Real-Time Trajectory Synthesis with Local Differential Privacy. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1685-1698. DOI: 10.1109/ICDE60146.2024.00137. Online publication date: 13-May-2024
    • (2024) Privacy Mechanisms and Evaluation Metrics for Synthetic Data Generation: A Systematic Review. IEEE Access 12, 88048-88074. DOI: 10.1109/ACCESS.2024.3417608. Online publication date: 2024
    • (2024) Trading Off Scalability, Privacy, and Performance in Data Synthesis. IEEE Access 12, 26642-26654. DOI: 10.1109/ACCESS.2024.3366556. Online publication date: 2024
    • (2023) Fast private kernel density estimation via locality sensitive quantization. Proceedings of the 40th International Conference on Machine Learning, 35339-35367. DOI: 10.5555/3618408.3619879. Online publication date: 23-Jul-2023
    • (2023) LDPTrace: Locally Differentially Private Trajectory Synthesis. Proceedings of the VLDB Endowment 16:8, 1897-1909. DOI: 10.14778/3594512.3594520. Online publication date: 1-Apr-2023
    • (2023) PUTS: Privacy-Preserving and Utility-Enhancing Framework for Trajectory Synthesization. IEEE Transactions on Knowledge and Data Engineering 36:1, 296-310. DOI: 10.1109/TKDE.2023.3288154. Online publication date: 21-Jun-2023
    • (2023) Synthesizing Realistic Trajectory Data With Differential Privacy. IEEE Transactions on Intelligent Transportation Systems 24:5, 5502-5515. DOI: 10.1109/TITS.2023.3241290. Online publication date: 1-May-2023
    • (2023) Acceptable Margin of Error: Quantifying Location Privacy in BLE Localization. 2023 International Conference on Localization and GNSS (ICL-GNSS), 1-7. DOI: 10.1109/ICL-GNSS57829.2023.10148925. Online publication date: 6-Jun-2023
    • (2022) Privacy-Preserving Synthetic Data Generation for Recommendation Systems. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1379-1389. DOI: 10.1145/3477495.3532044. Online publication date: 6-Jul-2022
