Abstract Genome mapping, or the experimental determination of the ordering of DNA markers on a chromosome, is an important step in genome sequencing and in the ultimate assembly of sequenced genomes. The presented research addresses the problem of identifying markers that cannot be placed reliably. If such markers are included in standard mapping procedures, they can result in a poor overall mapping.
Abstract In data mining applications, it is common to have more than one data source available to describe the same record. For example, in the biological sciences, the same genes may be characterized through many types of experiments. Which of the data sources proves most reliable for prediction may depend on the record in question.
Abstract We propose a frequent pattern-based algorithm for predicting the functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between physicochemically similar amino acids. Frequent substrings are mined from the training sequences transformed into different alphabets and are used as features to train an ensemble of SVMs.
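The feature-extraction idea above (recode sequences in a reduced alphabet, mine frequent substrings, use them as features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the six-group alphabet in GROUPS, the substring length range, the support threshold, and the binary presence encoding are all hypothetical choices, and the SVM ensemble itself is not shown.

```python
from collections import Counter

# Hypothetical six-group reduced alphabet, for illustration only
# (NOT the alphabets used in the paper).
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic ring
    "p": "STNQ",    # polar, uncharged
    "b": "KR",      # positively charged (basic)
    "c": "DE",      # negatively charged (acidic)
    "s": "GP",      # special backbone conformations
}
REDUCE = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group symbol; unknown residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def frequent_substrings(seqs, min_len=2, max_len=3, min_support=2):
    """Return substrings occurring in at least `min_support` of the sequences."""
    support = Counter()
    for s in seqs:
        # Count each distinct substring at most once per sequence.
        seen = {s[i:i + k] for k in range(min_len, max_len + 1)
                for i in range(len(s) - k + 1)}
        support.update(seen)
    return sorted(p for p, c in support.items() if c >= min_support)


def presence_features(seq, patterns):
    """Binary feature vector: 1 if the pattern occurs in the sequence, else 0."""
    return [1 if p in seq else 0 for p in patterns]
```

In use, each training sequence would be passed through reduce_sequence, the patterns mined with frequent_substrings, and every sequence encoded with presence_features. Training one SVM per alphabet and combining their votes is one plausible reading of "an ensemble of SVMs", assumed here rather than taken from the abstract.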
In the course of evolution, the genomes of grasses have maintained an observable degree of gene order conservation. The information available for already sequenced genomes can be used to predict the gene order of nonsequenced species by means of comparative colinearity studies. The "Wheat Zapper" application presented here performs on-demand colinearity analysis between wheat, rice, Sorghum, and Brachypodium in a simple, time-efficient, and flexible manner.
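As a rough illustration of how colinearity can be used to predict gene order, the sketch below orders markers of a nonsequenced species by the positions of their orthologs in a sequenced reference genome. It is an assumption-laden simplification, not the Wheat Zapper implementation: the function name predict_order, the marker-to-ortholog dictionary (which would in practice come from a similarity search), and the (chromosome, start) position format are all hypothetical.

```python
def predict_order(marker_to_ortholog, reference_positions):
    """Order markers by the reference position of their orthologous genes.

    marker_to_ortholog: marker name -> reference gene ID (hypothetical input)
    reference_positions: reference gene ID -> (chromosome, start coordinate)
    Returns (ordered placed markers, sorted list of unplaced markers).
    """
    placed, unplaced = [], []
    for marker, gene in marker_to_ortholog.items():
        pos = reference_positions.get(gene)
        if pos is None:
            # No positioned ortholog: colinearity gives no information.
            unplaced.append(marker)
        else:
            placed.append((pos, marker))
    # Sort by (chromosome, start); ties are broken by marker name.
    placed.sort()
    return [m for _, m in placed], sorted(unplaced)
```

For example, with markers wm1 and wm2 matching reference genes at positions 500 and 100 on the same chromosome, the predicted order is wm2 then wm1. A real analysis would also have to handle multiple reference genomes, rearrangements, and chromosome correspondences, which this sketch ignores.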
Abstract An algorithm is presented for clustering sequential data in which each unit is a collection of vectors. An example of such data is speaker data in a speaker clustering problem. The algorithm first constructs affinity matrices between each pair of units using a modified version of the Point Distribution algorithm, which was originally developed for mining patterns between vector and item data. The subsequent clustering procedure is based on fitting a Gaussian mixture model on multiple random projection matrices. The ...
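The two-stage pipeline described above (pairwise affinities between sets of vectors, then Gaussian mixture fitting on several random projections) can be sketched with NumPy as below. This is a hedged illustration only: the set-to-set measure is a swapped-in symmetric mean nearest-neighbour distance, NOT the modified Point Distribution algorithm; the diagonal-covariance EM, the projection dimension, and the rule for combining runs (co-association with a majority threshold) are all assumptions made for the sketch.

```python
import numpy as np


def set_distance(a, b):
    """Symmetric mean nearest-neighbour distance between two sets of vectors.

    Stand-in measure, NOT the modified Point Distribution algorithm.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (len(a), len(b))
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


def affinity_matrix(units, sigma=1.0):
    """Gaussian-kernel affinity between every pair of units (sets of vectors)."""
    n = len(units)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = set_distance(units[i], units[j])
    return np.exp(-dist ** 2 / (2 * sigma ** 2))


def fit_diag_gmm(x, k, iters=100, reg=1e-6):
    """EM for a diagonal-covariance Gaussian mixture; returns hard labels."""
    n, d = x.shape
    # Farthest-point initialisation keeps the sketch deterministic.
    idx = [0]
    for _ in range(1, k):
        gaps = np.min(np.linalg.norm(x[:, None] - x[idx][None], axis=-1), axis=1)
        idx.append(int(np.argmax(gaps)))
    means = x[idx].copy()
    var = np.full((k, d), x.var(axis=0).mean() + reg)
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: log weight_c + log N(x | mean_c, diag(var_c)), in log space.
        sq = ((x[:, None, :] - means[None]) ** 2) / var[None]
        log_p = -0.5 * (sq + np.log(2 * np.pi * var[None])).sum(axis=-1) + np.log(weights)[None]
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        var = (resp.T @ x ** 2) / nk[:, None] - means ** 2 + reg
    return resp.argmax(axis=1)


def cluster_units(units, k, n_projections=10, dim=2, seed=0):
    """Cluster units by fitting a GMM on several random projections of the affinities."""
    rng = np.random.default_rng(seed)
    aff = affinity_matrix(units)
    n = len(units)
    co = np.zeros((n, n))  # fraction of runs in which two units share a cluster
    for _ in range(n_projections):
        r = rng.standard_normal((n, dim)) / np.sqrt(dim)
        labels = fit_diag_gmm(aff @ r, k)
        co += labels[:, None] == labels[None, :]
    co /= n_projections
    # Final labels: connected components of "co-assigned in more than half the runs".
    final = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if final[start] >= 0:
            continue
        final[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(co[i] > 0.5):
                if final[j] < 0:
                    final[j] = current
                    stack.append(j)
        current += 1
    return final
```

Each row of the affinity matrix describes one unit by its similarity to all others, so projecting the rows to a low dimension gives a compact per-unit representation for the mixture model. Repeating over several projections and keeping only consistent co-assignments reduces sensitivity to any single unlucky projection.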
Papers by Loai Alnemer