This volume contains the papers presented at the Seventh Annual International Conference on Computational Biology, also known as RECOMB 2003, which was held in Berlin, Germany on April 10-13, 2003. The RECOMB conference series was started by Sorin Istrail, Pavel Pevzner and Michael Waterman in 1997. It is a tradition that polished versions of many of the papers from the Proceedings appear in the Journal of Computational Biology, which is closely affiliated with the Conference.
This year, 175 papers were submitted, of which the Program Committee selected 35 for presentation at the meeting and inclusion in this Proceedings. Each submission was refereed by at least three Program Committee members.
RECOMB 2003 had nine invited speakers: Edward N. Trifonov (University of Haifa; The Stanislav Ulam Memorial Computational Biology Address), Christiane Nüsslein-Volhard (Max-Planck-Institut für Entwicklungsbiologie; Distinguished Biology Lecture), Árpád Furka (Eötvös Loránd University; Distinguished New Technologies Lecture), Andrew G. Clark (Cornell University), David Haussler (University of California, Santa Cruz), Arthur Lesk (University of Cambridge), Dieter Oesterhelt (Max Planck Institute for Biochemistry), Terry Speed (University of California at Berkeley), and Kari Stefansson (deCode Genetics).
The RECOMB 2003 program had three components: the keynote speakers, the contributed talks, and the poster session. The lively poster session with 183 accepted abstracts was an important ingredient of the conference program. Rainer Spang and Patricia Béziat organized the poster session and coordinated the publication of a separate volume containing the poster abstracts.
RECOMB 2003 took place under the patronage of Edelgard Bulmahn, the German Federal Minister for Education and Research, who also opened the conference. The Steering Committee wishes to thank the Minister for her efforts in supporting RECOMB 2003 and furthering the field of computational biology.
Efficient extraction of mapping rules of atoms from enzymatic reaction data
Extraction of mapping rules of atoms from enzymatic reaction data is useful for drug design, simulation of tracer experiments and consistency checking of pathway databases. Most previous methods for this problem are based on maximal common subgraph ...
On de novo interpretation of tandem mass spectra for peptide identification
The correct interpretation of tandem mass spectra is a difficult problem, even when it is limited to scoring peptides against a database. De novo sequencing is considerably harder, but critical when sequence databases are incomplete or not available. In ...
Haplotypes and informative SNP selection algorithms: don't block out information
It is widely hoped that variation in the human genome will provide a means of predicting risk of a variety of complex, chronic diseases. A major stumbling block to the successful identification of association between human DNA polymorphisms (SNPs) and ...
Modeling dependencies in protein-DNA binding sites
The availability of whole genome sequences and high-throughput genomic assays opens the door for in silico analysis of transcription regulation. This includes methods for discovering and characterizing the binding sites of DNA-binding proteins, such as ...
Efficient exact value computation and applications to biosequence analysis
Like other fields of life sciences, bioinformatics has turned to capture biological phenomena through probabilistic models, and to analyse these models using statistical methodology. A central computational problem in applying useful statistical ...
Towards optimally multiplexed applications of universal DNA tag systems
We study a design and optimization problem that occurs, for example, when single nucleotide polymorphisms (SNPs) are to be genotyped using a universal DNA tag array. The problem of optimizing the universal array to avoid disruptive cross-hybridization ...
A comparative analysis method for detecting binding sites in coding regions
While the problem of predicting transcription factor binding sites in a gene's promoter region has been extensively studied, binding sites located in coding regions are also crucial for regulating gene expression but are more difficult to detect. Coding ...
Designing seeds for similarity search in genomic DNA
Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST [3, 2] and other widely used tools use seeded alignment, which compares only sequences that ...
Maximum likelihood on four taxa phylogenetic trees: analytic solutions
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees (Felsenstein, 1981), but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques ...
Phylogenetically and spatially conserved word pairs associated with gene expression changes in yeasts
Background. Transcriptional regulation in eukaryotes is often multifactorial, involving multiple transcription factors binding to the same transcription control region (e.g., upstream activating sequences and enhancers), and to understand the regulatory ...
Haplotype phase inference
Most of the information being collected on DNA variation among people does not identify which of the two parents transmitted which of the two copies of each gene. Even worse, the parent of origin is often scrambled for each single nucleotide ...
An integrated probabilistic model for functional prediction of proteins
We develop an integrated probabilistic model to combine protein physical interactions, genetic interactions, highly correlated gene expression network, protein complex data, and domain structures of individual proteins to predict protein functions. The ...
Large scale reconstruction of haplotypes from genotype data
Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which are mutations at a single nucleotide position. To ...
Optimizing exact genetic linkage computations
Genetic linkage analysis is a challenging application which requires Bayesian networks consisting of thousands of vertices. Consequently, computing the likelihood of data, which is needed for learning linkage parameters, using exact inference procedures ...
Combinatorial synthesis on macroscopic solid support units
The quality of our every day life is strongly affected by the new compounds and new materials introduced as a result of scientific research. In the last decade of the last century the traditional approach of research, that is, making and testing one new ...
Finding recurrent sources in sequences
Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or assembled from a small number of sources, each of which might ...
Model-based inference of haplotype block variation
The uneven recombination structure of human DNA has been highlighted by several recent studies. Knowledge of the haplotype blocks generated by this phenomenon can be applied to dramatically increase the statistical power of genetic mapping. Several ...
Computational analysis of the human and other mammalian genomes
Working drafts are now available for the human, mouse and rat genomes, and other mammalian genome sequences are on the way. We discuss some of the key bioinformatic analysis problems presented by this data, including the problems of assembling the ...
Accurate detection of very sparse sequence motifs
Protein sequence alignments are more reliable the shorter the evolutionary distance. Here, we align distantly related proteins using many closely spaced intermediate sequences as stepping stones. Such transitive alignments can be generated between any ...
Engineering a scalable placement heuristic for DNA probe arrays
Design of DNA arrays for very large-scale immobilized polymer synthesis (VLSIPS) [8] seeks to minimize effects of unintended illumination during mask exposure steps. [9, 14] formulate this requirement as the Border Minimization Problem and give methods ...
Whole-genome comparative annotation and regulatory motif discovery in multiple yeast species
In [13] we reported the genome sequences of S. paradoxus, S. mikatae and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genome-wide comparative analysis allowed the identification of functionally important ...
Joint classifier and feature optimization for cancer diagnosis using gene expression data
Recent research has demonstrated quite convincingly that accurate cancer diagnosis can be achieved by constructing classifiers that are designed to compare the gene expression profile of a tissue of unknown cancer status to a database of stored ...
A polynomial-time nuclear vector replacement algorithm for automated NMR resonance assignments
High-throughput NMR structural biology can play an important role in structural genomics. We report an automated procedure for high-throughput NMR resonance assignment for a protein of known structure, or of a homologous structure. These assignments ...
A complete and effective move set for simplified protein folding
We present new lowest energy configurations for several large benchmark problems for the two-dimensional hydrophobic-hydrophilic model. We found these solutions with a generic implementation of tabu search using an apparently novel set of ...
Invited: Prediction of protein function
A genome sequence embodies the potential life of an organism, but implementation of genetic information depends on the functions of the proteins that it encodes. Many proteins of known sequence and even of known structure present challenges to ...
Efficient rule-based haplotyping algorithms for pedigree data
We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is ...
Haplotype reconstruction from SNP alignment
In this paper we describe a method for statistical reconstruction of haplotypes from a set of aligned SNP fragments. We consider the case of a pair of homologous human chromosomes, one from the mother and the other from the father. After fragment ...
Gene selection criterion for discriminant microarray data analysis based on extreme value distributions
An important issue commonly encountered in the analysis of microarray data is to decide which and how many genes should be selected for further studies. For discriminant microarray data analyses based on statistical models, such as the logistic ...
A multi-expert system for the automatic detection of protein domains from sequence information
We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify ...
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
RECOMB '03 | 175 | 35 | 20% |
RECOMB '02 | 118 | 35 | 30% |
RECOMB '01 | 128 | 35 | 27% |
RECOMB '97 | 117 | 43 | 37% |
Overall | 538 | 148 | 28% |
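The Rate column and the Overall row can be reproduced from the submitted and accepted counts. The following is a minimal sketch, assuming rates are rounded to the nearest whole percent and that the Overall row pools only the four years listed; the names `ROWS`, `acceptance_rate`, and `overall` are illustrative and not part of the source.

```python
# Rows copied from the Acceptance Rates table: (label, submitted, accepted).
ROWS = [
    ("RECOMB '03", 175, 35),
    ("RECOMB '02", 118, 35),
    ("RECOMB '01", 128, 35),
    ("RECOMB '97", 117, 43),
]


def acceptance_rate(submitted, accepted):
    """Return accepted/submitted as a percentage rounded to a whole number.

    Rounding to the nearest percent is an assumption about how the
    table's Rate column was produced.
    """
    return round(100 * accepted / submitted)


def overall(rows):
    """Pool the counts across all rows and return (submitted, accepted, rate)."""
    total_submitted = sum(submitted for _, submitted, _ in rows)
    total_accepted = sum(accepted for _, _, accepted in rows)
    return total_submitted, total_accepted, acceptance_rate(total_submitted, total_accepted)


if __name__ == "__main__":
    for label, submitted, accepted in ROWS:
        print(f"{label} | {submitted} | {accepted} | {acceptance_rate(submitted, accepted)}% |")
    total_submitted, total_accepted, rate = overall(ROWS)
    print(f"Overall | {total_submitted} | {total_accepted} | {rate}% |")
```

The overall rate is computed from the pooled counts rather than by averaging the per-year percentages, so years with more submissions weigh more heavily.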