DOI: 10.5555/3692070.3692739

Collaborative heterogeneous causal inference beyond meta-analysis

Published: 21 July 2024

Abstract

Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method still relies on the concept of traditional meta-analysis after adjusting for the distribution shift. This work proposes a collaborative inverse propensity score weighting estimator for causal inference with heterogeneous data. Instead of adjusting the distribution shift separately, we use weighted propensity score models to collaboratively adjust for the distribution shift. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases. By incorporating outcome regression models, we prove the asymptotic normality when the covariates have dimension d < 8. Our methods preserve privacy at individual sites by implementing federated learning protocols.
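As a point of reference for the terms used in the abstract, the following is a minimal sketch of the two baseline ingredients it names: a per-site inverse propensity score weighting (IPW) estimate of the average treatment effect, and a fixed-effect meta-analysis that pools site estimates by inverse variance. It is not the collaborative estimator proposed in the paper. The data-generating process, the function names (`ipw_ate`, `inverse_variance_pool`, `simulate_site`), the constant treatment effect of 2.0, and the use of the true propensity score instead of a fitted model are all assumptions made for illustration.

```python
import math
import random


def ipw_ate(rows):
    """Horvitz-Thompson style IPW estimate of the ATE and its standard error.

    Each row is (treated, outcome, propensity), with propensity = P(treated=1 | x).
    """
    n = len(rows)
    # Per-unit contribution: T*Y/e - (1-T)*Y/(1-e)
    terms = [t * y / e - (1 - t) * y / (1 - e) for t, y, e in rows]
    est = sum(terms) / n
    var = sum((v - est) ** 2 for v in terms) / (n - 1)
    return est, math.sqrt(var / n)


def inverse_variance_pool(estimates):
    """Fixed-effect meta-analysis: weight each site estimate by 1 / se^2."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def simulate_site(n, x_mean, rng, effect=2.0):
    """Hypothetical site: covariate x ~ N(x_mean, 1), logistic propensity, constant effect."""
    rows = []
    for _ in range(n):
        x = rng.gauss(x_mean, 1.0)
        e = 1.0 / (1.0 + math.exp(-0.5 * x))  # true propensity, assumed known here
        t = 1 if rng.random() < e else 0
        y = x + effect * t + rng.gauss(0.0, 1.0)
        rows.append((t, y, e))
    return rows


rng = random.Random(0)
# Three sites whose covariate distributions differ only in their mean.
sites = [simulate_site(5000, m, rng) for m in (-1.0, 0.0, 1.0)]
site_estimates = [ipw_ate(rows) for rows in sites]
pooled_est, pooled_se = inverse_variance_pool(site_estimates)
```

In this toy setup the treatment effect is constant, so every site targets the same quantity and pooling is unproblematic. If the effect varied with the covariate, each site's IPW estimate would target its own population, which is the kind of heterogeneity the abstract is concerned with.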

Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
