Abstract
Text classification is widely used in the medical domain to classify free-text clinical reports. In this study, text classification techniques were used to determine the cause of death (CoD) from free-text forensic autopsy reports using proposed term-based and SNOMED CT concept-based features. Detailed term-based and concept-based features were extracted from a set of 1500 forensic autopsy reports covering four manners of death (MoD) and 16 different causes of death. These features were used to train a text classifier, which was deployed in a cascade architecture: the first level predicts the MoD and the second level predicts the CoD using the proposed term-based and SNOMED CT concept-based features. To show the significance of the proposed approach, we compared its results with those of four state-of-the-art feature extraction approaches. We also compared one-level classification with two-level classification. The experimental results showed that the proposed approach improved accuracy by 8% compared with the four baselines. Moreover, two-level classification determined the CoD more accurately than one-level classification.
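The cascade architecture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the classifier here is a toy term-overlap scorer standing in for whatever learner the paper uses, the names `TermOverlapClassifier` and `CascadeClassifier` are hypothetical, SNOMED CT concept features are omitted, and the four training reports are invented purely for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real term extraction."""
    return text.lower().split()


class TermOverlapClassifier:
    """Toy classifier (assumption, not the paper's model): scores each label
    by how often the report's terms occurred in that label's training text."""

    def fit(self, texts, labels):
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(tokenize(text))
        return self

    def predict(self, text):
        tokens = tokenize(text)
        # Iterate labels in sorted order so ties break deterministically.
        return max(
            sorted(self.term_counts),
            key=lambda label: sum(self.term_counts[label][t] for t in tokens),
        )


class CascadeClassifier:
    """Two-level cascade: level 1 predicts the manner of death (MoD),
    level 2 predicts the cause of death (CoD) within the predicted MoD."""

    def fit(self, texts, mods, cods):
        self.level1 = TermOverlapClassifier().fit(texts, mods)
        # Group the training reports by MoD so each MoD gets its own CoD model.
        grouped = defaultdict(lambda: ([], []))
        for text, mod, cod in zip(texts, mods, cods):
            grouped[mod][0].append(text)
            grouped[mod][1].append(cod)
        self.level2 = {
            mod: TermOverlapClassifier().fit(mod_texts, mod_cods)
            for mod, (mod_texts, mod_cods) in grouped.items()
        }
        return self

    def predict(self, text):
        mod = self.level1.predict(text)
        cod = self.level2[mod].predict(text)
        return mod, cod


# Invented example reports and labels (not taken from the study's data).
reports = [
    "gunshot wound to the chest with haemorrhage",
    "stab wound to the abdomen with haemorrhage",
    "road traffic collision with head injury",
    "fall from height with head injury",
]
mods = ["homicide", "homicide", "accident", "accident"]
cods = ["firearm injury", "sharp force injury", "traffic injury", "fall"]

cascade = CascadeClassifier().fit(reports, mods, cods)
print(cascade.predict("gunshot wound to the chest"))
```

One consequence of this design is that the second-level model only has to separate the causes of death that belong to a single manner of death, which is one plausible reading of why the abstract reports better CoD accuracy for two-level classification. The flip side is that a level-1 error cannot be recovered at level 2.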
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Mujtaba, G., Shuib, L., Raj, R.G., Al-Garadi, M.A., Rajandram, R., Shaikh, K. (2017). Hierarchical Text Classification of Autopsy Reports to Determine MoD and CoD Through Term-Based and Concepts-Based Features. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_16
DOI: https://doi.org/10.1007/978-3-319-62701-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62700-7
Online ISBN: 978-3-319-62701-4
eBook Packages: Computer Science (R0)