DOI: 10.1145/3626772.3657885
research-article

OEHR: An Orthopedic Electronic Health Record Dataset

Published: 11 July 2024

Abstract

Over the past decades, healthcare institutions have continually amassed clinical data that was not originally intended to support research. Despite the increasing number of publicly available electronic health record (EHR) datasets, it remains difficult to find publicly available orthopedic datasets that can be used to compare and evaluate downstream tasks. This paper presents OEHR, a healthcare benchmark dataset in orthopedics, sourced from the EHRs of real hospitals. The available information includes patient measurements, diagnoses, treatments, clinical notes, and medical images. OEHR is intended to support clinical research. To evaluate the quality of OEHR, we conduct extensive experiments, implementing state-of-the-art methods on downstream tasks. The results show that OEHR serves as a valuable extension to existing publicly available EHR datasets. The dataset is available at http://47.94.174.82/.

Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2024
3164 pages
ISBN:9798400704314
DOI:10.1145/3626772

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. benchmark dataset
  2. electronic health record
  3. orthopedic

Qualifiers

  • Research-article

Funding Sources

  • Central Guidance on Local Science and Technology
  • Natural Science Foundation of Fujian Province of China

Conference

SIGIR 2024

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
