
EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data

Authors: An Yan, Bill Howe

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 2338 - 2347

https://doi.org/10.1145/3448016.3452777

Published: 18 June 2021


Abstract

Neural methods are state-of-the-art for urban prediction problems such as transportation resource demand, accident risk, crowd mobility, and public safety. Model performance can be improved by integrating exogenous features from open data repositories (e.g., weather, housing prices, and traffic), but these uncurated sources are often too noisy, incomplete, and biased to use directly. We propose to learn integrated representations, called EquiTensors, from heterogeneous datasets that can be reused across a variety of tasks. We align datasets to a consistent spatio-temporal domain, then describe an unsupervised model based on convolutional denoising autoencoders to learn shared representations. We extend this core integrative model with adaptive weighting to prevent certain datasets from dominating the signal. To combat discriminatory bias, we use adversarial learning to remove correlations with a sensitive attribute (e.g., race or income). Experiments with 23 input datasets and 4 real applications show that EquiTensors can help mitigate the effects of the sensitive information embodied in the biased data. Meanwhile, applications using EquiTensors outperform models that ignore exogenous features and are competitive with "oracle" models that use hand-selected datasets.
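The alignment step can be pictured with a minimal sketch, assuming each dataset arrives as point records (latitude, longitude, timestamp, value) that are binned onto a regular grid and summed per cell. The grid origin, the 0.01-degree cell size, the one-hour time step, and the function names `to_cell` and `align` are hypothetical choices for illustration; the paper's actual resolution and aggregation rules may differ.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical grid parameters (assumptions, not taken from the paper).
LAT_MIN, LON_MIN = 47.50, -122.45  # south-west corner of the grid
CELL_DEG = 0.01                    # assumed cell size in degrees
HOURS_PER_STEP = 1                 # assumed temporal resolution in hours


def to_cell(lat, lon, ts, t0):
    """Map one point record to (time_step, row, col) indices on a regular grid."""
    row = int((lat - LAT_MIN) // CELL_DEG)
    col = int((lon - LON_MIN) // CELL_DEG)
    step = int((ts - t0).total_seconds() // (3600 * HOURS_PER_STEP))
    return step, row, col


def align(records, t0):
    """Sum (lat, lon, timestamp, value) records into per-cell totals."""
    grid = defaultdict(float)
    for lat, lon, ts, value in records:
        grid[to_cell(lat, lon, ts, t0)] += value
    return dict(grid)
```

Each dataset aligned this way becomes a sparse tensor indexed by (time step, row, column), which is one way to give heterogeneous sources a common domain that a convolutional model can consume.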

Supplementary Material

MP4 File (3448016.3452777.mp4)

Neural methods are state-of-the-art for urban prediction problems such as transportation resource demand, accident risk, crowd mobility, and public safety. Model performance can be improved by integrating exogenous features from open data repositories (e.g., weather, housing prices, and traffic), but these uncurated sources are often too noisy, incomplete, and biased to use directly. We propose to learn integrated features from heterogeneous datasets that can be reused across a variety of tasks. We align datasets to a consistent spatio-temporal domain, then describe an unsupervised model based on denoising autoencoders to learn shared features. We extend this core integrative model with adaptive weighting to prevent certain datasets from dominating the signal. To combat discriminatory bias, we use adversarial training to remove correlations with a sensitive attribute (e.g., race or income). Experiments with 23 input datasets and 4 real applications show that models using pre-trained features outperform those that ignore exogenous features and are competitive with supervised "oracle" models that use hand-selected datasets, while significantly improving fairness. We conclude that our pre-trained integrated features can improve model performance and reduce discriminatory effects for complex prediction applications, broadening the utility of uncurated civic open data repositories.
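The training objective described above can be illustrated with a toy, framework-free sketch that combines three ingredients: masking noise on the denoising autoencoder's input, an adaptive weighting of per-dataset reconstruction losses, and an adversarial penalty on a sensitive attribute. The inverse-loss weighting rule, the trade-off weight `lam`, and all function names are assumptions for illustration, not the paper's published formulation; a real implementation would compute these losses on tensors inside an autodiff framework.

```python
import random


def mask_noise(x, drop_prob, rng):
    # Denoising corruption: zero each entry independently with probability drop_prob.
    return [0.0 if rng.random() < drop_prob else v for v in x]


def mse(a, b):
    # Mean squared reconstruction error between two equal-length vectors.
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def adaptive_weights(losses, eps=1e-8):
    # Hypothetical rule: weight each dataset inversely to its current loss and
    # normalize to sum to 1, so every weighted term contributes roughly equally
    # and no single dataset dominates. In an autodiff framework these weights
    # would usually be treated as constants (no gradient through them).
    inv = [1.0 / (loss + eps) for loss in losses]
    total = sum(inv)
    return [w / total for w in inv]


def encoder_objective(recon_losses, adversary_loss, lam):
    # The encoder minimizes the weighted reconstruction error while maximizing
    # the adversary's loss on the sensitive attribute (hence the minus sign).
    weights = adaptive_weights(recon_losses)
    weighted = sum(w * loss for w, loss in zip(weights, recon_losses))
    return weighted - lam * adversary_loss
```

For example, with reconstruction losses of 1.0 and 4.0 the weights come out near 0.8 and 0.2, so both datasets contribute about 0.8 to the weighted sum. Because the encoder's objective subtracts `lam * adversary_loss`, lowering it pushes the representation toward states where the adversary predicts the sensitive attribute poorly, while the adversary itself would be trained separately to minimize its own loss.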


