DOI: 10.1145/3448016.3452777
research-article

EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data

Published: 18 June 2021

Abstract

Neural methods are state-of-the-art for urban prediction problems such as transportation resource demand, accident risk, crowd mobility, and public safety. Model performance can be improved by integrating exogenous features from open data repositories (e.g., weather, housing prices, traffic, etc.), but these uncurated sources are often too noisy, incomplete, and biased to use directly. We propose to learn integrated representations, called EquiTensors, from heterogeneous datasets that can be reused across a variety of tasks. We align datasets to a consistent spatio-temporal domain, then describe an unsupervised model based on convolutional denoising autoencoders to learn shared representations. We extend this core integrative model with adaptive weighting to prevent certain datasets from dominating the signal. To combat discriminatory bias, we use adversarial learning to remove correlations with a sensitive attribute (e.g., race or income). Experiments with 23 input datasets and 4 real applications show that EquiTensors could help mitigate the effects of the sensitive information embodied in the biased data. Meanwhile, applications using EquiTensors outperform models that ignore exogenous features and are competitive with "oracle" models that use hand-selected datasets.
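
The abstract names two steps that a small example can make concrete: aligning raw data to a shared spatio-temporal grid, and training an encoder against both a reconstruction loss and an adversary. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names (`rasterize_events`, `encoder_objective`), the fixed per-dataset weights, the mean-squared-error losses, and the trade-off parameter `lam` are all hypothetical; the abstract does not specify the loss, the adaptive weighting rule, or the grid resolution.

```python
import numpy as np

def rasterize_events(t, y, x, t_edges, y_edges, x_edges):
    """Count point events per (time, row, col) cell of a shared grid."""
    sample = np.column_stack([t, y, x])
    counts, _ = np.histogramdd(sample, bins=[t_edges, y_edges, x_edges])
    return counts

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def encoder_objective(inputs, recons, weights, adv_pred, sensitive, lam=1.0):
    """Weighted reconstruction error minus lam times the adversary's error.

    Assumption: the encoder minimises this value, so it is rewarded when the
    adversary predicts the sensitive attribute poorly. The weights are fixed
    here; the paper adapts them, by a rule the abstract does not describe.
    """
    recon = sum(w * mse(x_in, r) for w, x_in, r in zip(weights, inputs, recons))
    return recon - lam * mse(adv_pred, sensitive)

# Hypothetical shared domain: 3 time steps over a 2 x 2 grid of cells.
t_edges = np.array([0.0, 1.0, 2.0, 3.0])
y_edges = np.array([0.0, 0.5, 1.0])
x_edges = np.array([0.0, 0.5, 1.0])

# Toy events: (time, normalised y, normalised x).
t = np.array([0.2, 0.7, 1.5, 2.9])
y = np.array([0.1, 0.1, 0.8, 0.6])
x = np.array([0.1, 0.2, 0.9, 0.3])
grid = rasterize_events(t, y, x, t_edges, y_edges, x_edges)

# Toy objective: two datasets, all-zero reconstructions, a constant adversary.
inputs = [np.ones((2, 2)), np.zeros((2, 2))]
recons = [np.zeros((2, 2)), np.zeros((2, 2))]
objective = encoder_objective(
    inputs, recons, weights=[0.5, 0.5],
    adv_pred=np.array([0.0, 0.0]), sensitive=np.array([1.0, 1.0]), lam=0.25,
)
```

In a denoising setup, each reconstruction would be computed from a noise-corrupted copy of its input and compared against the clean input. The sketch leaves that out to keep the objective itself visible.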

Supplementary Material

MP4 File (3448016.3452777.mp4)
Neural methods are state-of-the-art for urban prediction problems such as transportation resource demand, accident risk, crowd mobility, and public safety. Model performance can be improved by integrating exogenous features from open data repositories (e.g., weather, housing prices, traffic, etc.), but these uncurated sources are often too noisy, incomplete, and biased to use directly. We propose to learn integrated features from heterogeneous datasets that can be reused across a variety of tasks. We align datasets to a consistent spatio-temporal domain, then describe an unsupervised model based on denoising autoencoders to learn shared features. We extend this core integrative model with adaptive weighting to prevent certain datasets from dominating the signal. To combat discriminatory bias, we use adversarial training to remove correlations with a sensitive attribute (e.g., race or income). Experiments with 23 input datasets and 4 real applications show models using pre-trained features outperform those that ignore exogenous features and are competitive with supervised "oracle" models that use hand-selected datasets while significantly improving fairness. We conclude that our pre-trained integrated features can improve model performance and reduce discriminatory effects for complex prediction applications, broadening the utility of uncurated civic open data repositories.

    Published In

    SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
    June 2021
    2969 pages
    ISBN:9781450383431
    DOI:10.1145/3448016
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data integration
    2. fairness in machine learning
    3. neural networks
    4. open data
    5. spatial-temporal predictions

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    SIGMOD/PODS '21

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 88
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 17 Feb 2025

    Cited By

    • (2024) Travel Demand Forecasting: A Fair AI Approach. IEEE Transactions on Intelligent Transportation Systems 25:10 (14611-14627). DOI: 10.1109/TITS.2024.3395061. Online publication date: Oct-2024
    • (2023) Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet. Proceedings of the ACM on Management of Data 1:4 (1-29). DOI: 10.1145/3626723. Online publication date: 12-Dec-2023
    • (2023) Model Debiasing via Gradient-based Explanation on Representation. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (193-204). DOI: 10.1145/3600211.3604668. Online publication date: 8-Aug-2023
    • (2023) Fairness-Enhancing Deep Learning for Ride-Hailing Demand Prediction. IEEE Open Journal of Intelligent Transportation Systems 4 (551-569). DOI: 10.1109/OJITS.2023.3297517. Online publication date: 2023
    • (2023) The Current State and Challenges of Fairness in Federated Learning. IEEE Access 11 (80903-80914). DOI: 10.1109/ACCESS.2023.3295412. Online publication date: 2023
    • (2022) A Coverage-based Approach to Nondiscrimination-aware Data Transformation. Journal of Data and Information Quality 14:4 (1-26). DOI: 10.1145/3546913. Online publication date: 23-Nov-2022
    • (2022) Integrative urban AI to expand coverage, access, and equity of urban data. The European Physical Journal Special Topics 231:9 (1741-1752). DOI: 10.1140/epjs/s11734-022-00475-z. Online publication date: 9-Apr-2022
    • (2022) Fairness-Aware Range Queries for Selecting Unbiased Data. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (1423-1436). DOI: 10.1109/ICDE53745.2022.00111. Online publication date: May-2022
    • (2022) An Auditing Framework for Analyzing Fairness of Spatial-Temporal Federated Learning Applications. 2022 IEEE World AI IoT Congress (AIIoT) (699-707). DOI: 10.1109/AIIoT54504.2022.9817283. Online publication date: 6-Jun-2022
    • (2022) Fairness & friends in the data science era. AI & Society 38:2 (721-731). DOI: 10.1007/s00146-022-01472-5. Online publication date: 9-Jun-2022
