Fuxin Li
  • Portland, OR, USA
The Continuous-Time Hidden Markov Model (CT-HMM) is an attractive approach to modeling disease progression due to its ability to describe noisy observations arriving irregularly in time. However, the lack of an efficient parameter learning algorithm for CT-HMM restricts its use to very small models or requires unrealistic constraints on the state transitions. In this paper, we present the first complete characterization of efficient EM-based learning methods for CT-HMMs. We demonstrate that the learning problem consists of two challenges: the estimation of posterior state probabilities and the computation of end-state conditioned statistics. We solve the first challenge by reformulating the estimation problem in terms of an equivalent discrete time-inhomogeneous hidden Markov model. The second challenge is addressed by adapting three approaches from the continuous-time Markov chain literature to the CT-HMM domain. We demonstrate the use of CT-HMMs with more than 100 states to visualize and predict disease progression using a glaucoma dataset and an Alzheimer's disease dataset.
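A minimal NumPy sketch of the two ingredients named above, under assumptions of our own: a generator matrix Q whose rows sum to zero, per-visit emission likelihoods supplied by the caller, and a hand-rolled scaling-and-squaring matrix exponential in place of a library routine. Each gap Δt between visits gets its own transition matrix P(Δt) = exp(QΔt), which is what makes the model behave like a discrete time-inhomogeneous HMM. For the end-state conditioned statistics, Van Loan's block-matrix exponential is shown as one standard continuous-time Markov chain technique (here for the expected time spent in a state). The function names, rates, and visit times are hypothetical and not taken from the paper.

```python
import numpy as np


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    norm = np.abs(A).sum(axis=1).max()  # infinity norm
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0**s  # scaled so that ||B|| <= 0.5
    result, term = np.eye(n), np.eye(n)
    for k in range(1, terms + 1):
        term = term @ B / k
        result = result + term
    for _ in range(s):  # undo the scaling: exp(A) = exp(B)^(2^s)
        result = result @ result
    return result


def forward_log_likelihood(Q, pi, emission_lik, times):
    """Scaled forward pass for a CT-HMM observed at irregular times.

    emission_lik[k, i] = p(observation k | hidden state i).
    Each gap dt gets its own transition matrix P(dt) = expm(Q * dt).
    """
    alpha = pi * emission_lik[0]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for k in range(1, len(times)):
        P = expm(Q * (times[k] - times[k - 1]))
        alpha = (alpha @ P) * emission_lik[k]
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik


def expected_dwell_time(Q, t, i, a, b):
    """E[time in state i during [0, t] | X(0) = a, X(t) = b] via Van Loan's method."""
    n = Q.shape[0]
    E = np.zeros((n, n))
    E[i, i] = 1.0
    block = np.block([[Q, E], [np.zeros((n, n)), Q]])
    M = expm(block * t)
    P = M[:n, :n]         # transition probabilities P(t)
    integral = M[:n, n:]  # int_0^t expm(Q s) E expm(Q (t - s)) ds
    return integral[a, b] / P[a, b]


# Usage: a 3-state progressive model with made-up rates and visit times.
Q = np.array([[-0.3, 0.3, 0.0], [0.0, -0.2, 0.2], [0.0, 0.0, 0.0]])
pi = np.array([1.0, 0.0, 0.0])
times = np.array([0.0, 0.7, 2.5, 6.0])
emission_lik = np.array([[0.9, 0.1, 0.0], [0.6, 0.4, 0.1],
                         [0.2, 0.7, 0.2], [0.1, 0.3, 0.8]])
print(forward_log_likelihood(Q, pi, emission_lik, times))
print(expected_dwell_time(Q, 2.0, 1, 0, 2))
```

Summing the expected dwell times over all states recovers the interval length t, which is a quick sanity check on the block-exponential construction.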
The One-Class Support Vector Machine (OC-SVM) is an unsupervised learning algorithm that identifies unusual or outlying points (outliers) in a given dataset. OC-SVM requires setting a regularization hyperparameter and a kernel hyperparameter to obtain a good estimate. Cross-validation is commonly used for this, but it requires multiple runs with different hyperparameters, which makes it very slow. Recently, solution path algorithms have become popular: they obtain the solutions for all hyperparameter values in a single run rather than re-solving the optimization problem multiple times. Generalizing previous solution path algorithms for SVMs, this paper proposes a complete set of solution path algorithms for OC-SVM, including a ν-path algorithm and a kernel-path algorithm. In the kernel-path algorithm, a new method is proposed to avoid failure of the algorithm due to an indefinite matrix. Using these algorithms, we can obtain the optimal hyperparameters by computing an entire solution path at a computational cost of O(n² + cnm³) for the ν-path algorithm or O(cn³ + cnm³) for the kernel-path algorithm (c: a constant, n: the number of samples, m: the number of samples on the margin).
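To make concrete what the path algorithms avoid, here is a hedged NumPy sketch of the per-hyperparameter baseline: solving the standard ν-formulation OC-SVM dual, min ½ αᵀKα subject to 0 ≤ αᵢ ≤ 1/(νn) and Σᵢ αᵢ = 1, from scratch for every ν on a grid. The solver is a simple SMO-style pairwise update chosen for brevity; it is not the ν-path or kernel-path algorithm of the paper, and the RBF kernel, γ = 0.5, the grid, and the helper names are assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian kernel matrix exp(-gamma * ||x - y||^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def ocsvm_dual(K, nu, tol=1e-6, max_iter=10000):
    """Solve min 0.5 a'Ka s.t. 0 <= a_i <= 1/(nu n), sum(a) = 1 by SMO pair updates."""
    n = K.shape[0]
    C = 1.0 / (nu * n)
    alpha = np.full(n, 1.0 / n)  # feasible start for 0 < nu <= 1
    g = K @ alpha                # gradient of the dual objective
    for _ in range(max_iter):
        up = alpha < C - 1e-12   # coordinates that can still increase
        down = alpha > 1e-12     # coordinates that can still decrease
        if not up.any() or not down.any():
            break
        i = np.flatnonzero(up)[np.argmin(g[up])]
        j = np.flatnonzero(down)[np.argmax(g[down])]
        if g[j] - g[i] < tol:    # maximal KKT violation below tolerance
            break
        eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
        step = min(C - alpha[i], alpha[j])
        if eta > 1e-12:
            step = min(step, (g[j] - g[i]) / eta)
        alpha[i] += step
        alpha[j] -= step
        g += step * (K[:, i] - K[:, j])
    at_upper = alpha >= C - 1e-8
    at_zero = alpha <= 1e-8
    free = ~at_upper & ~at_zero
    if free.any():               # free support vectors lie exactly on the margin
        rho = g[free].mean()
    else:                        # otherwise rho lies between the two bound sets
        lo = g[at_upper].max()
        hi = g[at_zero].min() if at_zero.any() else lo
        rho = 0.5 * (lo + hi)
    return alpha, rho


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
K = rbf_kernel(X, X, gamma=0.5)
for nu in (0.1, 0.3, 0.5):       # the grid a path algorithm would replace
    alpha, rho = ocsvm_dual(K, nu)
    decision = K @ alpha - rho   # negative values flag outliers
    print(f"nu={nu}: support-vector fraction {np.mean(alpha > 1e-8):.2f}, "
          f"outlier fraction {np.mean(decision < 0):.2f}")
```

Because the box constraint caps each αᵢ at 1/(νn) while the αᵢ sum to 1, at least νn points must be support vectors and at most νn can sit at the upper bound, which is why ν acts as the natural knob for the outlier fraction.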
Human urinary proteome analysis is a convenient and efficient approach for understanding disease processes affecting the kidney and urogenital tract. Many potential biomarkers have been identified in previous differential analyses; however, dynamic variations of the urinary proteome have not been intensively studied, and it is difficult to conclude that potential biomarkers are genuinely associated with disease rather than simply being physiological proteome variations. In this paper, pooled and individual urine samples were used to analyze dynamic variations in the urinary proteome. Five types of pooled samples (first morning void, second morning void, excessive water-drinking void, random void, and 24 h void) collected in 1 day from six volunteers were used to analyze intra-day variations. Six pairs of first morning voids collected a week apart were used to study inter-day, inter-individual, and inter-gender variations. The intra-day, inter-day, inter-individual, and inter-gender variation analyses showed that many proteins were constantly present with relatively stable abundances, and some of these had earlier been reported as potential disease biomarkers. In terms of sensitivity, the main components of the five intra-day urinary proteomes were similar, and the second morning void is recommended for clinical proteome analysis. The advantages and disadvantages of pooling samples are also discussed. The data presented describe a pool of stable urinary proteins seen under different physiological conditions. Any significant qualitative or quantitative changes in these stable proteins may mean that such proteins could serve as potential urinary biomarkers.
The urinary proteome is known to be a valuable field of study related to organ function. There have been several extensive urine proteome studies; however, the overlap rate among different studies is relatively low. Whether this low overlap rate is caused by differences in sample sources, preparation, separation, and identification methods is unknown. Moreover, low molecular mass (<10 kDa) proteins have not been studied extensively. In this report, male and female pooled urine samples were collected from healthy volunteers. The urinary proteins were acetone-precipitated, then separated and identified by three approaches: 1-DE plus 1-D LC/MS/MS, direct 1-D LC/MS/MS, and 2-D LC/MS/MS. 1-D tricine gels were used to separate low molecular mass proteins. The tandem mass spectra of positive identifications were quality-controlled both by manual validation and with advanced mass spectrum scanner software. A total of 226 urinary proteins were identified; 171 of them, including 4 male-specific proteins, were identified by a proteomics approach for the first time. Twelve low molecular mass proteins were identified. Most urinary proteins had a molecular mass between 30 and 60 kDa and a pI between 4 and 10. The apparent molecular masses of many proteins differed from the theoretical ones, which indicates post-translational modification and degradation. The effects of sample preparation, separation, and identification methods on the overlap rate of different experiments are discussed.
Deep networks are often not scale-invariant; hence, their performance can vary wildly if recognizable objects appear at test time at a scale not seen during training. In this paper, we propose ScaleNet, which recursively predicts object scale in a deep learning framework. With an explicit objective to predict the scale of objects in images, ScaleNet enables pretrained deep learning models to identify objects at scales that are not present in their training sets. By recursively calling ScaleNet, one can generalize to very large scale changes unseen in the training set. To demonstrate the robustness of our proposed framework, we conduct experiments with pretrained as well as fine-tuned classification and detection frameworks on the MNIST, CIFAR-10, and MS COCO datasets, and the results show that our proposed framework significantly boosts the performance of deep networks.
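The recursive use of a scale predictor can be sketched as a loop: predict the object's scale relative to the training distribution, resize the image to undo it, and repeat until the prediction is close to 1. In this NumPy sketch the learned ScaleNet is replaced by a hypothetical stand-in, `toy_predict_scale`, which measures a bright square and, like a predictor trained on a limited range, clips its output to [0.5, 2]. The nearest-neighbour resize, the 8-pixel reference size, and the tolerance are also assumptions.

```python
import numpy as np


def resize_nearest(img, factor):
    """Nearest-neighbour resize of a 2-D image by a scalar factor."""
    h, w = img.shape
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    rows = np.minimum((np.arange(nh) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / factor).astype(int), w - 1)
    return img[rows[:, None], cols[None, :]]


def recursive_rescale(img, predict_scale, max_steps=10, tol=0.05):
    """Call the scale predictor repeatedly, undoing each predicted scale."""
    total_scale = 1.0
    for _ in range(max_steps):
        s = predict_scale(img)
        if abs(np.log(s)) < tol:  # object is already near the training scale
            break
        img = resize_nearest(img, 1.0 / s)
        total_scale *= s
    return img, total_scale


def toy_predict_scale(img, reference_side=8.0):
    """Hypothetical stand-in for ScaleNet: bright-square side / reference, clipped."""
    side = np.sqrt((img > 0.5).sum())
    return float(np.clip(side / reference_side, 0.5, 2.0))


# A 32x32 object is 4x larger than the assumed 8x8 "training" scale.
image = np.zeros((128, 128))
image[48:80, 48:80] = 1.0
rescaled, scale = recursive_rescale(image, toy_predict_scale)
print(scale, rescaled.shape)  # two clipped 2x steps recover the 4x scale
```

A single call can only undo a factor of 2 here, so the 4x change is recovered only through recursion, which mirrors the motivation given in the abstract.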
Tandem mass spectrometry (MS/MS) has been widely used in proteomics studies. Multiple algorithms have been developed for assessing matches between MS/MS spectra and peptide sequences in databases. However, it is still a challenge to reduce false negative rates without compromising the high confidence of peptide identification. In this study, we developed a score, Oscore, by logistic regression using SEQUEST and AMASS variables to identify fully tryptic peptides. Since these variables showed complicated associations with each other, combining them, rather than applying each to a threshold model, improved the classification of correct and incorrect peptide identifications. Oscore achieved both a lower false negative rate and a lower false positive rate than PeptideProphet on datasets from 18 known protein mixtures and several proteome-scale samples of different complexity, database size, and separation methods. Through a three-way comparison among Oscore, PeptideProphet, and another logistic regression model that used PeptideProphet's variables, the main contributor to the improvement made by Oscore is discussed. Copyright © 2008 John Wiley & Sons, Ltd.
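The general idea of scoring identifications with a logistic regression over several correlated search-engine variables, instead of thresholding each variable separately, can be sketched with plain NumPy gradient descent. The two feature columns below are synthetic placeholders standing in for SEQUEST and AMASS variables; the actual variables, coefficients, fitting procedure, and cutoff of Oscore are not reproduced, and `fit_logistic`, `logistic_loss`, and `score` are hypothetical names.

```python
import numpy as np


def add_bias(X):
    """Prepend an intercept column."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, n_iter=2000):
    """Logistic regression by gradient descent with step 1/L (L = smoothness constant)."""
    Xb = add_bias(X)
    n = Xb.shape[0]
    L = 0.25 * np.linalg.eigvalsh(Xb.T @ Xb / n).max()
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= (Xb.T @ (p - y) / n) / L
    return w


def logistic_loss(X, y, w):
    """Mean cross-entropy, written as log(1 + e^z) - y z for numerical stability."""
    z = add_bias(X) @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)


def score(X, w):
    """Estimated probability that each peptide identification is correct."""
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ w))


# Synthetic example: two correlated, hypothetical search scores per spectrum.
rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, n).astype(float)  # 1 = correct identification
s1 = rng.normal(loc=1.5 * y, scale=1.0)
s2 = 0.6 * s1 + rng.normal(loc=1.0 * y, scale=0.8)
X = np.column_stack([s1, s2])
w = fit_logistic(X, y)
accept = score(X, w) >= 0.5  # hypothetical decision cutoff
print(w, accept.mean())
```

The step size 1/L guarantees that each gradient step does not increase the convex logistic loss, so no learning-rate tuning is needed for this small example.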
We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object-independent process. Decisions are based on continuous estimates of the spatial overlap between image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in framing recognition as a regression problem. Instead of focusing on a one-vs-all winning margin that can scramble the ordering inside the non-maximum (non-winning) set, learning produces a globally consistent ranking with close ties to segment quality, and hence to the extent to which entire object or part hypotheses spatially overlap with the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection, and semantic segmentation on a number of challenging datasets, including Caltech-101, ETHZ-Shape, and PASCAL VOC 2009.
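Framing recognition as regression means each segment hypothesis is paired with a continuous target, its spatial overlap with the ground-truth object, rather than a binary label. The NumPy sketch below assumes intersection-over-union as the overlap measure, uses toy masks, and fits a ridge regressor on two hypothetical per-segment features (fraction of the segment inside a central prior region, and relative area) so that hypotheses can be ranked by predicted overlap. The features, learner, and helper names are illustrative assumptions, not the paper's.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean masks (0 if both are empty)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0


def fit_overlap_regressor(features, targets, ridge=1e-6):
    """Ridge regression from segment features to overlap targets (with bias)."""
    F = np.hstack([features, np.ones((features.shape[0], 1))])
    A = F.T @ F + ridge * np.eye(F.shape[1])
    return np.linalg.solve(A, F.T @ targets)


def rank_segments(features, w):
    """Indices of segments sorted by predicted overlap, best first."""
    F = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.argsort(-(F @ w))


# Toy 10x10 scene: ground-truth object plus three segment hypotheses.
gt = np.zeros((10, 10), dtype=bool)
gt[2:8, 2:8] = True
segs = [np.zeros((10, 10), dtype=bool) for _ in range(3)]
segs[0][2:8, 2:8] = True   # the whole object
segs[1][2:8, 2:5] = True   # half of the object (a part hypothesis)
segs[2][0:2, 0:10] = True  # a background strip
targets = np.array([iou(s, gt) for s in segs])

center = np.zeros((10, 10), dtype=bool)
center[1:9, 1:9] = True    # hypothetical central prior region
features = np.array([[np.logical_and(s, center).sum() / s.sum(), s.sum() / 100.0]
                     for s in segs])
w = fit_overlap_regressor(features, targets)
print(targets, rank_segments(features, w))
```

Because the targets are graded (1.0, 0.5, 0.0) rather than binary, the part hypothesis is ranked between the full object and the background, which is the kind of consistent ordering the abstract contrasts with a one-vs-all margin.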
RScore, a new criterion of randomicity for evaluating tandem mass (MS/MS) spectra, is described. RScore is defined as the relative quality, in cross-correlation and matched intensity percentage, of a potentially positive peptide compared with the other possible candidates for the same spectrum. By using RScore combined with less stringent SEQUEST score filters, the number of true positive peptides can be increased and the number of false positives in datasets from a known protein mixture can be reduced, compared with using current SEQUEST parameters alone. This algorithm is simple and adds little overhead to SEQUEST computation. Copyright © 2004 John Wiley & Sons, Ltd.
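The abstract defines RScore only qualitatively, as the quality of the top candidate relative to the other candidates for the same spectrum. The sketch below illustrates that idea with an explicitly hypothetical formula (the mean of the top candidate's z-scores for cross-correlation and matched-intensity percentage, computed against the remaining candidates) combined with relaxed SEQUEST-style filters. The formula, the names `relative_score` and `accept`, and all thresholds are assumptions for illustration only, not the published RScore.

```python
import numpy as np


def relative_score(xcorr, matched_pct):
    """Hypothetical relative-quality score of the top-ranked candidate.

    xcorr, matched_pct: values for all candidate peptides of one spectrum,
    top-ranked candidate at index 0; at least two candidates are required.
    """
    scores = []
    for values in (np.asarray(xcorr, float), np.asarray(matched_pct, float)):
        others = values[1:]
        spread = others.std() or 1.0  # guard against identical competitors
        scores.append((values[0] - others.mean()) / spread)
    return float(np.mean(scores))


def accept(xcorr, delta_cn, matched_pct,
           min_xcorr=1.5, min_delta_cn=0.05, min_relative=3.0):
    """Relaxed score filters plus the relative criterion (all thresholds hypothetical)."""
    return (xcorr[0] >= min_xcorr and delta_cn >= min_delta_cn
            and relative_score(xcorr, matched_pct) >= min_relative)


# A top candidate that clearly stands out from its competitors.
xcorr = [3.1, 1.2, 1.1, 1.0, 0.9]
matched = [62.0, 30.0, 28.0, 25.0, 27.0]
print(relative_score(xcorr, matched), accept(xcorr, 0.4, matched))
```

A candidate that passes the relaxed absolute filters but barely differs from the other candidates is rejected by the relative criterion, which is the behaviour the abstract attributes to combining RScore with less stringent filters.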
Approximations based on random Fourier features have recently emerged as an efficient and elegant methodology for designing large-scale kernel machines [4]. By expressing the kernel as a Fourier expansion, features are generated from a finite set of random basis projections whose inner products are Monte Carlo approximations to the original kernel. However, the original Fourier features are only applicable to translation-invariant kernels and are not suitable for histograms, which are always non-negative. This paper extends the concept of translation invariance and the random Fourier feature methodology to arbitrary, locally compact Abelian groups. Based on empirical observations drawn from the exponentiated χ² kernel, the state of the art for histogram descriptors, we propose a new group called the skewed-multiplicative group and design translation-invariant kernels on it. Experiments show that the proposed kernels outperform other kernels that can be similarly approximated. In a semantic segmentation experiment on the PASCAL VOC 2009 dataset, the approximation allows us to train large-scale learning machines more than two orders of magnitude faster than previous nonlinear SVMs.
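A minimal NumPy sketch of the idea, under stated assumptions. A kernel that depends only on the ratio (x + c)/(y + c) becomes an ordinary translation-invariant kernel of u = log(x + c), so random Fourier features can be drawn in log space. The example kernel k(x, y) = ∏ᵢ sech((log(xᵢ + c) − log(yᵢ + c))/2), which equals ∏ᵢ 2√((xᵢ + c)(yᵢ + c)) / (xᵢ + yᵢ + 2c), and the offset c = 0.01 are assumptions chosen for illustration. Because ∫ sech(πω) cos(ωΔ) dω = sech(Δ/2), the frequencies can be sampled from the density sech(πω) by inverting its CDF F(ω) = (2/π) arctan(e^{πω}). The names `exact_kernel`, `sample_frequencies`, and `feature_map` are hypothetical.

```python
import numpy as np

C = 0.01  # assumed offset c > 0 so that log(x + c) is defined for empty bins


def exact_kernel(x, y, c=C):
    """k(x, y) = prod_i sech((log(x_i + c) - log(y_i + c)) / 2)."""
    d = np.log(x + c) - np.log(y + c)
    return float(np.prod(1.0 / np.cosh(d / 2.0)))


def sample_frequencies(dim, n_features, rng):
    """Draw W with i.i.d. entries of density sech(pi w) (inverse CDF) and b ~ U[0, 2 pi)."""
    u = np.clip(rng.random((n_features, dim)), 1e-12, 1.0 - 1e-12)
    W = np.log(np.tan(np.pi * u / 2.0)) / np.pi
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return W, b


def feature_map(X, W, b, c=C):
    """z(x) = sqrt(2/D) cos(W log(x + c) + b); z(x) . z(y) approximates k(x, y)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(np.log(X + c) @ W.T + b)


rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=4)  # four toy 5-bin histograms
W, b = sample_frequencies(dim=5, n_features=20000, rng=rng)
Z = feature_map(X, W, b)
approx = Z @ Z.T
exact = np.array([[exact_kernel(x, y) for y in X] for x in X])
print(np.abs(approx - exact).max())
```

Once the features are computed, a linear model trained on Z stands in for the nonlinear kernel machine, which is where the speed-up described above comes from; the Monte Carlo error shrinks roughly as 1/√D.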