An Ontology for Generalized Disease Incidence Detection on Twitter

Magumba, Mark Abraham; Nabende, Peter

doi:10.1007/978-3-319-59650-1_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10334)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

In this paper, we present an ontology of disease-related concepts designed for detecting disease incidence in tweets. Unlike previous keyword-based systems and topic-modeling approaches, our ontological approach lets us apply more stringent criteria for deciding which messages are relevant, such as spatial and temporal characteristics, while giving a stronger guarantee that the resulting models will perform well on new data that may be lexically divergent. We achieve this by training supervised learners on concepts rather than individual words. Effectively, we map every possible word to a fixed-length lexicon, thereby eliminating lexical divergence between training data and new data. For training, we use a dataset containing mentions of influenza, common cold and Listeria, and we use the learned models to classify datasets containing mentions of an arbitrary selection of other diseases. We show that our ontological approach yields models whose performance is not only good but also stable on lexically divergent data, compared with a word-level lookup, unigram bag-of-words baseline. We also show that word vectors can be learned directly from our concepts to achieve even better results.
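
The idea of mapping every word to a fixed-length concept lexicon can be illustrated with a minimal sketch. The concept names, the toy word-to-concept table and the helper functions below are all hypothetical and are not taken from the paper's ontology. The sketch only shows how two tweets with different disease vocabulary can end up with the same concept-level feature vector.

```python
from collections import Counter

# Hypothetical toy concept lexicon (an assumption for illustration only;
# the ontology described in the paper is not reproduced here).
CONCEPTS = ["DISEASE", "SYMPTOM", "SELF_REFERENCE", "TEMPORAL", "OTHER"]

# Hypothetical word-to-concept lookup table.
WORD_TO_CONCEPT = {
    "flu": "DISEASE", "influenza": "DISEASE", "listeria": "DISEASE",
    "cholera": "DISEASE", "malaria": "DISEASE",
    "fever": "SYMPTOM", "cough": "SYMPTOM", "diarrhea": "SYMPTOM",
    "i": "SELF_REFERENCE", "my": "SELF_REFERENCE", "me": "SELF_REFERENCE",
    "today": "TEMPORAL", "yesterday": "TEMPORAL", "now": "TEMPORAL",
}


def to_concepts(text):
    """Replace each token with its concept; unknown words become OTHER."""
    tokens = text.lower().split()
    return [WORD_TO_CONCEPT.get(tok.strip(".,!?#@"), "OTHER") for tok in tokens]


def concept_vector(text):
    """Count concepts in a fixed order, giving a fixed-length feature vector."""
    counts = Counter(to_concepts(text))
    return [counts[c] for c in CONCEPTS]


# A tweet resembling the training data and a lexically divergent new tweet.
train_tweet = "I have the flu and a fever today"
new_tweet = "My cholera and the bad diarrhea started yesterday"

print(concept_vector(train_tweet))
print(concept_vector(new_tweet))
```

Because every token falls into one of a fixed set of concepts, a vector of this kind has the same length for any input, and a supervised learner trained on such vectors never encounters an out-of-vocabulary feature at prediction time. Which learner to use, and how finely to divide the concepts, are left open in this sketch.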

Notes

1. https://github.com/MarkMagumba/Twitter-Disease-incidence-Description-Language-Ontology
2. GATE: General Architecture for Text Engineering.

Author information

Authors and Affiliations

Department of Information Systems, College of Computing and Information Sciences, Makerere University, P.O. Box 7062, Kampala, Uganda
Mark Abraham Magumba & Peter Nabende

Corresponding author

Correspondence to Mark Abraham Magumba.

Editor information

Editors and Affiliations

University of La Rioja, Logroño, La Rioja, Spain
Francisco Javier Martínez de Pisón
University of La Rioja, Logroño, La Rioja, Spain
Rubén Urraca
University of A Coruña, Ferrol, La Coruña, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Magumba, M.A., Nabende, P. (2017). An Ontology for Generalized Disease Incidence Detection on Twitter. In: Martínez de Pisón, F., Urraca, R., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2017. Lecture Notes in Computer Science, vol 10334. Springer, Cham. https://doi.org/10.1007/978-3-319-59650-1_4

DOI: https://doi.org/10.1007/978-3-319-59650-1_4
Published: 02 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59649-5
Online ISBN: 978-3-319-59650-1
eBook Packages: Computer Science, Computer Science (R0)
