DOI: 10.1007/978-3-031-08473-7_30
Article

A Framework for False Negative Detection in NER/NEL

Published: 15 June 2022

Abstract

Finding the false negatives of a NER/NEL system is fundamental to improving it, and is usually done by manually annotating texts. However, in an environment with a huge volume of unannotated texts (e.g., a hospital) and a low frequency of positives (e.g., mentions of a particular disease in clinical notes), the task becomes very inefficient. This paper presents a framework to tackle this problem: given an existing NER/NEL system, we propose using text similarity search to rank texts by their probability of containing false negatives of a given concept, using as queries the texts in which the existing system has found positives of that concept. We formulate text similarity as a function of the medical entities shared between texts, and we repurpose an existing public dataset (CodiEsp) to propose an evaluation strategy.
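
The ranking idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the similarity function, so Jaccard overlap between entity sets is assumed here purely for illustration. The function names (`jaccard`, `rank_candidates`), the concept identifiers, and the choice to drop the target concept from the query texts before comparison are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(
    doc_entities: Dict[str, Set[str]], concept: str
) -> List[Tuple[str, float]]:
    """Rank texts with no detected mention of `concept` by their best
    similarity to any text in which the existing system did detect it."""
    # Query set: texts where the existing NER/NEL system found the concept.
    # Assumption: the concept itself is removed so that only the
    # surrounding entities drive the similarity score.
    queries = [
        ents - {concept} for ents in doc_entities.values() if concept in ents
    ]
    scored = []
    for doc_id, ents in doc_entities.items():
        if concept in ents:
            continue  # already a positive, so not a false-negative candidate
        score = max((jaccard(ents, q) for q in queries), default=0.0)
        scored.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical entity sets, as if produced by an existing NER/NEL system.
docs = {
    "note_a": {"C_DIABETES", "C_INSULIN", "C_NEUROPATHY"},
    "note_b": {"C_INSULIN", "C_NEUROPATHY"},
    "note_c": {"C_FRACTURE", "C_CAST"},
    "note_d": {"C_INSULIN", "C_FRACTURE"},
}
ranking = rank_candidates(docs, "C_DIABETES")
```

In this toy data, `note_b` shares all of its entities with the one positive text and therefore ranks first as the most likely place to look for a missed mention. An annotator would then review texts from the top of the ranking rather than sampling at random.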



Published In

Natural Language Processing and Information Systems: 27th International Conference on Applications of Natural Language to Information Systems, NLDB 2022, Valencia, Spain, June 15–17, 2022, Proceedings
Jun 2022
529 pages
ISBN: 978-3-031-08472-0
DOI: 10.1007/978-3-031-08473-7

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. Natural language processing
2. NLP
3. Clinical NLP
4. False negatives
5. Document representation
6. Text similarity search
7. Named Entity Recognition
8. NER
9. Named Entity Linking
10. NEL
