Abstract
Health maintenance is one of the foremost pillars of human society and needs up-to-date solutions to medical problems. Advances in the biomedical field have intensified the information load, which exists in the form of clinical reports, research papers, lab tests, and so on. Extracting meaningful insights from this corpus is as important as the progress itself, because it is what makes the corpus valuable for modern medicine. In biomedical text mining, the areas explored so far include protein–protein interactions, entity-relationship detection, and so on. The biomedical effects of drugs are significant when the drugs are administered to a living organism. Gene-drug relations, however, have not been widely explored in the biomedical literature and hence need investigation. Indexing methods can be used for ranking gene-drug relations. In scientific literature, Hirsch's h-index is commonly used to quantify the impact of an individual author. Likewise, in this research, we propose the Drug-Index, a quantifiable measure that can be used to detect gene-drug relations. It is useful in drug discovery, diagnosis, and personalized treatment using suitable drugs for relevant genes. For strong and reliable gene-drug relationship discovery, drugs are extracted from a subset of MEDLINE, a bibliographic medical database. The detected drugs are verified against the PharmacoGenomics KnowledgeBase (PharmGKB), a publicly available medical knowledge base from Stanford University.
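To make the h-index analogy concrete, the sketch below computes Hirsch's h-index in its standard form: the largest h such that at least h items each have a count of at least h. The abstract does not give the formula of the Drug-Index, so the second call is only an assumed illustration of how an h-style measure might be applied to one drug's co-mention counts with different genes. The gene names and all counts are hypothetical, and this is not presented as the authors' definition.

```python
from typing import Iterable


def h_index(counts: Iterable[int]) -> int:
    """Return the largest h such that at least h items have a count >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Classic use: citation counts of one author's papers (hypothetical values).
author_citations = [10, 8, 5, 4, 3]
print(h_index(author_citations))

# Assumed analogy only: how often one drug is co-mentioned with each gene
# in a set of abstracts (hypothetical genes and counts).
drug_gene_mentions = {"CYP2D6": 6, "CYP2C19": 3, "ABCB1": 3, "VKORC1": 1}
print(h_index(drug_gene_mentions.values()))
```

Under this assumed reading, a drug scores highly only when it is repeatedly linked to several genes, in the same way that an author's h-index rewards several well-cited papers rather than a single one.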
Acknowledgements
This work was funded by the University of Jeddah, Saudi Arabia, under Grant No. DSR-UJ-20-047-DR. It was also supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019S1A5C2A03083499). The authors therefore acknowledge with thanks the university's technical and financial support. The main idea of the work was proposed and supervised by Ali Daud.
Cite this article
Alharbey, R., Kim, J.I., Daud, A. et al. Indexing important drugs from medical literature. Scientometrics 127, 2661–2681 (2022). https://doi.org/10.1007/s11192-022-04340-7
DOI: https://doi.org/10.1007/s11192-022-04340-7