DOI: 10.1145/3627673.3679593
research-article
Open access

Combining Incomplete Observational and Randomized Data for Heterogeneous Treatment Effects

Published: 21 October 2024

Abstract

Data from observational studies (OSs) is widely available and easy to obtain, yet it frequently contains confounding bias. Data from randomized controlled trials (RCTs), by contrast, helps to reduce this bias but is expensive to gather, so randomized samples tend to be small. For this reason, fusing observational and randomized data to better estimate heterogeneous treatment effects (HTEs) has gained increasing attention. However, existing methods for integrating observational data with randomized data require complete observational data, meaning that both treated and untreated subjects must be included in the OS. This prerequisite confines such methods to very specific situations, because including both treated and untreated subjects in an observational study is not always achievable. In this paper, we propose a resilient approach to Combine Incomplete Observational data and randomized data for HTE estimation, which we abbreviate as CIO. CIO can estimate HTEs efficiently whether the observational data is complete or partial. Concretely, a confounding bias function is first derived via an effect estimation procedure, using the pseudo-experimental group from the OS together with the pseudo-control group from the RCT. This function is then used as a corrective residual to rectify the observed outcomes of the observational data during HTE estimation, which combines the available observational data with all of the randomized data. To validate our approach, we conducted experiments on one synthetic dataset and two semi-synthetic datasets.
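
The two-step idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the estimators, so the linear outcome models, the T-learner-style differencing, the way the bias function is formed, and the synthetic data-generating functions (`mu0`, `tau`, `bias`) are all assumptions made here for illustration. The observational data is taken to contain treated subjects only, which is one possible form of incompleteness.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of y ~ a + b*x; returns a prediction function."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda z: coef[0] + coef[1] * z

rng = np.random.default_rng(0)

# Hypothetical ground truth, chosen only for this illustration.
mu0 = lambda x: x                 # control outcome
tau = lambda x: 1.0 + 2.0 * x     # true heterogeneous treatment effect
bias = lambda x: 0.5 + 0.3 * x    # confounding bias in the observational data

# Incomplete observational study: large, but treated subjects only, and biased.
x_os = rng.uniform(-1, 1, 2000)
y_os = mu0(x_os) + tau(x_os) + bias(x_os) + rng.normal(0, 0.1, 2000)

# Small randomized trial with both arms and no confounding.
x_rct = rng.uniform(-1, 1, 200)
t_rct = rng.integers(0, 2, 200)
y_rct = mu0(x_rct) + t_rct * tau(x_rct) + rng.normal(0, 0.1, 200)
treated = t_rct == 1

m_os1 = fit_linear(x_os, y_os)                         # OS treated outcomes
m_rct1 = fit_linear(x_rct[treated], y_rct[treated])    # RCT treated outcomes
m_rct0 = fit_linear(x_rct[~treated], y_rct[~treated])  # RCT control outcomes

# Step 1 (assumed form): the "pseudo" effect contrasts OS treated subjects
# with RCT controls; its gap to the RCT-only effect estimates the bias.
pseudo_effect = lambda x: m_os1(x) - m_rct0(x)
rct_effect = lambda x: m_rct1(x) - m_rct0(x)
bias_hat = lambda x: pseudo_effect(x) - rct_effect(x)

# Step 2 (assumed form): subtract the estimated bias from the observed OS
# outcomes, pool them with the RCT treated arm, and re-estimate the effect.
y_os_corrected = y_os - bias_hat(x_os)
x_pool = np.concatenate([x_os, x_rct[treated]])
y_pool = np.concatenate([y_os_corrected, y_rct[treated]])
m_pool1 = fit_linear(x_pool, y_pool)
tau_hat = lambda x: m_pool1(x) - m_rct0(x)

grid = np.linspace(-1, 1, 21)
naive_mae = float(np.mean(np.abs(pseudo_effect(grid) - tau(grid))))
corrected_mae = float(np.mean(np.abs(tau_hat(grid) - tau(grid))))
print(f"uncorrected MAE: {naive_mae:.3f}, corrected MAE: {corrected_mae:.3f}")
```

In this toy setting the uncorrected contrast absorbs the confounding bias, while the corrected estimate does not. With purely linear models the pooled fit gains little over the RCT alone; the sketch is meant only to show where the bias function enters the pipeline.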

Published In

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
October 2024
5705 pages
ISBN:9798400704369
DOI:10.1145/3627673
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. causal inference
  2. heterogeneous treatment effects
  3. observational data
  4. random control trial data

Qualifiers

  • Research-article

Conference

CIKM '24

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
