Abstract
While systematic reviews (SRs) are an essential element of modern evidence-based medical practice, creating and updating them is resource intensive. In this research, we propose leveraging advanced analytics techniques to automatically classify articles for inclusion in or exclusion from systematic reviews. Specifically, we used a soft-margin polynomial Support Vector Machine (SVM) as the classifier, exploited the Unified Medical Language System (UMLS) for medical term extraction, and examined various techniques for resolving the class imbalance issue. Through an empirical study, we demonstrated that the soft-margin polynomial SVM achieves better classification performance than the algorithms used in existing research, and that the classifier's performance can be further improved by using UMLS to identify medical terms in articles and by applying re-sampling methods to resolve the class imbalance issue.
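The abstract names re-sampling as a remedy for class imbalance but does not say which method was used. As a minimal sketch, assuming plain random oversampling of the minority class (one common re-sampling technique, not necessarily the authors' choice), the idea can be illustrated with the Python standard library alone. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen records of smaller classes until every
    class has as many records as the largest class."""
    rng = random.Random(seed)

    # Group records by their class label.
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_records, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for record in group + extra:
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels


# Hypothetical toy data: 2 included citations versus 8 excluded ones.
citations = [f"abstract {i}" for i in range(10)]
labels = ["include"] * 2 + ["exclude"] * 8

balanced_citations, balanced_labels = random_oversample(citations, labels)
print(Counter(balanced_labels))
```

In a real screening pipeline, re-sampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original class distribution.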
Cite this article
Timsina, P., Liu, J. & El-Gayar, O. Advanced analytics for the automation of medical systematic reviews. Inf Syst Front 18, 237–252 (2016). https://doi.org/10.1007/s10796-015-9589-7
DOI: https://doi.org/10.1007/s10796-015-9589-7