Rezarta Dogan

Papers by Rezarta Dogan

DNorm: A New Method and Online Tool for Disease Name Normalization

AMIA, 2013

DNorm: disease name normalization with pairwise learning to rank

Bioinformatics, Aug 21, 2013

Information retrieval and knowledge discovery in biomedical text: papers from the AAAI Fall Symposium

AAAI Press eBooks, 2012

Feature generation and analysis applied to sequence classification for splice-site prediction

Linking chemical mentions to medical subject headings in full text

F1000Research, Jun 10, 2019

Comprehensively identifying Long Covid articles with human-in-the-loop machine learning

Patterns, 2023

Author keywords in biomedical journal articles

PubMed, Nov 13, 2010

As an information retrieval system, PubMed® aims at providing efficient access to documents cited in MEDLINE®. For this purpose, it relies on matching representations of documents, as provided by authors and indexers, to user queries. In this paper, we describe the growth of author keywords in biomedical journal articles and present a comparative study of author keywords and MeSH® indexing terms assigned by MEDLINE indexers to PubMed Central Open Access articles. A similarity metric is used to assess automatically the relatedness between pairs of author keywords and indexing terms. A set of 300 pairs is manually reviewed to evaluate the metric and characterize the relationships between author keywords and indexing terms. Results show that author keywords are increasingly available in biomedical articles and that over 60% of author keywords can be linked to a closely related indexing term. Finally, we discuss the potential impact of this work on indexing and terminology development.

PMC text mining subset in BioC: about three million full-text articles and growing

Bioinformatics, Jan 31, 2019

NLM-Chem-BC7: manually annotated full-text resources for chemical entity annotation and indexing in biomedical articles

Database, 2022

An Inference Method for Disease Name Normalization

National Conference on Artificial Intelligence, 2012

A context-blocks model for identifying clinical relationships in patient records

BMC Bioinformatics, Jun 9, 2011

NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition

Journal of Biomedical Informatics, Jun 1, 2021

The automatic recognition of gene names and their corresponding database identifiers in biomedical text is an important first step for many downstream text-mining applications. While current methods for tagging gene entities have been developed for biomedical literature, their performance on species other than human is substantially lower due to the lack of annotation data. We therefore present the NLM-Gene corpus, a high-quality manually annotated corpus for genes developed at the US National Library of Medicine (NLM), covering ambiguous gene names, with an average of 29 gene mentions (10 unique identifiers) per document, and a broader representation of different species (including Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, etc.) when compared to previous gene annotation corpora. NLM-Gene consists of 550 PubMed abstracts from 156 biomedical journals, doubly annotated by six experienced NLM indexers, randomly paired for each document to control for bias. The annotators worked in three annotation rounds until they reached complete agreement. This gold-standard corpus can serve as a benchmark to develop and test new gene text mining algorithms. Using this new resource, we have developed a new gene finding algorithm based on deep learning, which improved on both precision and recall relative to existing tools. The NLM-Gene annotated corpus is freely available at ftp://ftp.ncbi.nlm.nih.gov/pub/lu/NLMGene. We have also applied this tool to all of PubMed/PMC, with the results freely accessible through our web-based tool PubTator (www.ncbi.nlm.nih.gov/research/pubtator).