It is our great pleasure to welcome you to the 3rd ACM International Workshop on Data and Text Mining in Bioinformatics (DTMBIO'09), held in conjunction with the 18th ACM International Conference on Information and Knowledge Management (CIKM'09). This year's workshop continues its tradition of bringing together researchers who work in the fields of data mining, text mining, and computational biology, and of providing a forum to present and discuss frontier research topics at the interface of these fields. The mission of DTMBIO is to promote a tighter connection between literature search and data analysis for bioinformatics solutions. In particular, this year we focus on two themes: (1) data and text mining solutions for bioinformatics that identify relevant background knowledge in textual documents, such as scientific publications, or in database annotations; and (2) ambitious knowledge discovery solutions that process heterogeneous biomedical data collected from electronic bulletin boards, scientific publications, and any type of experiment.
The call for papers attracted 18 submissions from Asia, Europe, and the United States. The program committee accepted eight as full papers. These papers cover a variety of topics, including environmental genomic information analysis, a knowledge emergence model, drug-drug interaction extraction from the literature, breast cancer diagnosis based on metabolome profiles, page ranking in biomedical literature search, sequence alignment, cancer gene discovery, and word sense disambiguation. Moreover, the program committee accepted six poster papers. This year we have initiated a new technology track for researchers to present the most recent industry applications of data and text mining in bioinformatics research. We accepted two papers in the technology track. We hope that these proceedings will serve as a valuable reference for the latest progress in applying data mining and text mining techniques to bioinformatics research.
Proceeding Downloads
Data mining in bioinformatics: challenges and opportunities
In this talk I will discuss some data mining techniques and methods in the bioinformatics domain, the main challenges, and the opportunities. I will cover some of the issues related to biomedical literature mining, bioinformatics data ...
Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers
With the rapid development of genome sequencing techniques, the traditional research methods for microorganisms based on isolation and cultivation are gradually being replaced by metagenomics, also known as environmental genomics. The first step, which is ...
Multivariate classification of urine metabolome profiles for breast cancer diagnosis
Diagnosis techniques using urine are non-invasive, inexpensive, and easy to perform in clinical settings. The metabolites in urine, as the end products of cellular processes, are closely linked to phenotypes. Although research using urine metabolome has ...
DrugNerAR: linguistic rule-based anaphora resolver for drug-drug interaction extraction in pharmacological documents
DrugNerAR, a drug anaphora resolution system, is presented to address the problem of co-referring expressions in pharmacological literature. This development is part of a larger and innovative study about automatic drug-drug interaction extraction. ...
Linear predictive coding representation of correlated mutation for protein sequence alignment
Although both conservation and correlated mutation (CM) are important information reflecting different sorts of context in multiple sequence alignment, most alignment methods use sequence profiles that represent only conservation. There is no ...
Mining cancer genes with running-sum statistics
In this paper, we propose a new method to detect candidate cancer genes for developing molecular biomarkers or therapeutic targets from cancer microarray datasets. To resolve problems resulting from the molecular heterogeneity of cancers on gene ...
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS
Background: Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied ...
MKEM: a multi-level knowledge emergence model for mining undiscovered public knowledge
Since Swanson proposed the Undiscovered Public Knowledge (UPK) model, there have been many approaches to uncover UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of ...
Learning word sense disambiguation in biomedical text with difference between training and test distributions
Word sense disambiguation (WSD) is a crucial issue in biomedical text mining since the performance of diverse biomedical text mining techniques strongly depends on the senses of lexicons. Thus, it is natural to consider lexicons as the most crucial ...
Efficient computation of impact degrees for multiple reactions in metabolic networks with cycles
Analysis of the robustness of a metabolic network against the deletion of single or multiple reaction(s) is useful for mining important enzymes/genes. For that purpose, the impact degree was proposed by Jiang et al. In this short paper, we extend the impact degree ...
A web-based comparative visualization system for human endogenous RetroVirus (HERV) on whole genomes
Human Endogenous RetroViruses (HERVs) are thought to regulate the activity of human genes and may produce proteins under some conditions. It is therefore crucial to examine the physical layout relationship between HERVs and genes in ...
Incremental non-gaussian analysis of microarray gene expression data
The microarray is gaining popularity in biomedical research due to its ability to analyze hundreds to thousands of genes simultaneously in one experiment. However, the unique nature of microarray data, with a large number of features but relatively small ...
A graph-based approach for biomedical thesaurus expansion
The addition of new terms to biomedical thesauri is important for keeping pace with new research. In the context of a thesaurus expansion task, we investigate the property of Laplacian diffusion kernel matrices that depreciate pivotal vertices having ...
LITSEEK: public health literature search by metadata enhancement with external knowledge bases
- Priyanka Sharad Prabhu,
- Shamkant B. Navathe,
- Stephen Tyler,
- Venu Dasigi,
- Neha Narkhede,
- Balaji Palanisamy
Biomedical literature is an important source of information in any researcher's investigation of genes, risk factors, diseases and drugs. Often the information searched by public health researchers is distributed across multiple disparate sources that ...
The challenge of high recall in biomedical systematic search
Clinical systematic reviews are based on expert, laborious search of well-annotated literature. Boolean search on bibliographic databases, such as MEDLINE, continues to be the preferred discovery method, but the size of these databases, now approaching ...
A large-scale gene network inference system for systems biology on supercomputing resources
Motivation: Although gene expression data has been continuously accumulated and meta-analysis approaches have been developed to integrate independent expression profiles into larger datasets, the amount of information is still insufficient to infer ...
An outcome discovery system to determine mortality factors in primary care facilities
This project assembles a virtual team consisting of personnel from the New Jersey Institute of Technology with expertise in the data mining domain and the Saint Barnabas Health Care System with expertise in the medical domain. We apply proven techniques ...
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
DTMBIO '14 | 211 | 22 | 10% |
DTMBIO '13 | 18 | 11 | 61% |
DTMBIO '09 | 18 | 8 | 44% |
Overall | 247 | 41 | 17% |