We consider the problem of minimizing the sum of a smooth function and a separable convex function. This problem includes as special cases bound-constrained optimization and smooth optimization with ℓ1-regularization. We propose a (block) coordinate gradient descent method for solving this class of nonsmooth separable problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method. The local Lipschitzian error bound holds under assumptions analogous to those for constrained smooth optimization, e.g., the convex function is polyhedral and the smooth function is (nonconvex) quadratic or is the composition of a strongly convex function with a linear mapping. We report numerical experience with solving ℓ1-regularized versions of unconstrained optimization problems from Moré et al. (ACM Trans. Math. Softw. 7, 17–41, 1981) and from the CUTEr set (Gould and Orban, ACM Trans. Math. Softw. 29, 373–394, 2003). We also report a comparison with L-BFGS-B and MINOS, applied to a reformulation of the ℓ1-regularized problem as a bound-constrained optimization problem.
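For the ℓ1 case, the inner step of such a coordinate gradient descent method reduces to a soft-thresholding update of one coordinate against a quadratic model of the smooth part. The sketch below illustrates that structure under simplifying assumptions (a fixed diagonal Hessian surrogate `H` and cyclic coordinate order); the function names are ours, not the paper's.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cgd_l1(grad_f, x0, lam, H=1.0, sweeps=100):
    """Cyclic coordinate gradient descent for min f(x) + lam*||x||_1.

    grad_f : callable returning the full gradient of the smooth part f
    H      : per-coordinate curvature estimate (diagonal Hessian surrogate)
    """
    x = x0.astype(float)
    for _ in range(sweeps):
        for j in range(x.size):
            g = grad_f(x)[j]  # full gradient recomputed per coordinate: simple, not fast
            # Minimize the model g*d + (H/2)*d^2 + lam*|x_j + d| over the step d;
            # the minimizer is a soft-thresholding update of coordinate j.
            x[j] = soft_threshold(x[j] - g / H, lam / H)
    return x
```

For example, with f(x) = ½‖Ax − b‖² and `grad_f = lambda x: A.T @ (A @ x - b)`, this behaves as a lasso-style solver; H should upper-bound the per-coordinate curvature for guaranteed descent.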
We describe a general methodology for the design of large-scale recursive neural network architectures (DAG-RNNs) which comprises three fundamental steps: (1) representation of a given domain using suitable directed acyclic graphs (DAGs) to connect visible and ...
Wave-front distortion compensation using direct system performance metric optimization is studied both theoretically and experimentally. It is shown how different requirements for wave-front control can be incorporated, and how information from different wave-front sensor types can be fused, within a generalized gradient descent optimization paradigm. In our experiments a very-large-scale integration (VLSI) system implementing a simultaneous perturbation stochastic approximation optimization algorithm was applied for real-time adaptive control of multielement wave-front correctors. The custom-chip controller is used in two adaptive laser beam focusing systems, one with a 127-element liquid-crystal phase modulator and the other with beam steering and 37-control channel micromachined deformable mirrors. The submillisecond response time of the micromachined deformable mirror and the parallel nature of the analog VLSI control architecture provide for high-speed adaptive compensation ...
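The control algorithm referenced here, simultaneous perturbation stochastic approximation (SPSA), estimates the full gradient of the performance metric from just two metric evaluations per step, regardless of the number of control channels, which is what makes a massively parallel analog implementation attractive. A minimal software sketch (our own naming; fixed gain and perturbation sizes rather than the usual decaying schedules):

```python
import numpy as np

def spsa_minimize(metric, u0, a=0.1, c=0.1, iters=500, rng=None):
    """Simultaneous perturbation stochastic approximation (SPSA).

    metric : scalar system performance metric J(u) to minimize
    u0     : initial control vector (e.g., corrector voltages)
    """
    rng = rng or np.random.default_rng(0)
    u = u0.astype(float)
    for _ in range(iters):
        delta = rng.choice([-1.0, 1.0], size=u.size)  # Bernoulli +/-1 perturbation
        j_plus = metric(u + c * delta)
        j_minus = metric(u - c * delta)
        # Gradient estimate: (J+ - J-) / (2c * delta_i); for +/-1, 1/delta_i = delta_i
        g_hat = (j_plus - j_minus) / (2.0 * c) * delta
        u -= a * g_hat
    return u
```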
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbor (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to the ability of the feature to distinguish pattern classes. In this paper, a novel hybrid approach is proposed for simultaneous feature selection and feature weighting of the K-NN rule based on the Tabu Search (TS) heuristic. The proposed TS heuristic in combination with the K-NN classifier is compared with several classifiers on various available data sets. The results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. The experiments revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
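For concreteness, here is a minimal sketch of a Tabu Search over discretized feature weights of the kind described above, scored by leave-one-out K-NN accuracy; weight 0 deselects a feature, so selection and weighting are handled in one search. The weight levels, tenure, and all names are illustrative assumptions, and integer class labels are assumed:

```python
import numpy as np

def knn_accuracy(X, y, w, k=3):
    """Leave-one-out K-NN accuracy with feature weights w (w_j = 0 drops feature j)."""
    Xw = X * w
    d = np.linalg.norm(Xw[:, None, :] - Xw[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # exclude each point from its own vote
    idx = np.argsort(d, axis=1)[:, :k]
    pred = np.array([np.bincount(v).argmax() for v in y[idx]])
    return np.mean(pred == y)

def tabu_feature_weighting(X, y, levels=(0.0, 0.25, 0.5, 0.75, 1.0),
                           iters=50, tenure=5):
    """Tabu Search over discretized feature weights for the K-NN rule."""
    n_feat = X.shape[1]
    w = np.ones(n_feat)
    best_w, best_acc = w.copy(), knn_accuracy(X, y, w)
    tabu = {}  # feature index -> iteration until which touching it is tabu
    for t in range(iters):
        # Evaluate all non-tabu single-feature weight changes
        candidates = []
        for j in range(n_feat):
            if tabu.get(j, -1) >= t:
                continue
            for lvl in levels:
                if lvl != w[j]:
                    w_try = w.copy(); w_try[j] = lvl
                    candidates.append((knn_accuracy(X, y, w_try), j, w_try))
        if not candidates:                   # all features tabu: reset and retry
            tabu.clear()
            continue
        acc, j, w = max(candidates, key=lambda c: c[0])
        tabu[j] = t + tenure                 # forbid re-touching feature j for a while
        if acc > best_acc:
            best_acc, best_w = acc, w.copy()
    return best_w, best_acc
```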
Large-scale microarray gene expression data provide the possibility of constructing genetic networks or biological pathways. Gaussian graphical models have been suggested to provide an effective method for constructing such genetic networks. However, most of the available methods for constructing Gaussian graphs do not account for the sparsity of the networks and are computationally demanding or infeasible, especially in the setting of high dimension and low sample size. We introduce a threshold gradient descent (TGD) regularization procedure for estimating the sparse precision matrix in the setting of Gaussian graphical models and demonstrate its application to identifying genetic networks. Such a procedure is computationally feasible and can easily incorporate prior biological knowledge about the network structure. Simulation results indicate that the proposed method yields a better estimate of the precision matrix than the procedures that fail to account for the sparsity of the network.
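A minimal sketch of the threshold gradient descent idea applied to the Gaussian negative log-likelihood: at each step, only the entries whose gradient magnitude is within a fraction τ of the largest are updated, which is what induces sparsity. The step size and iteration count are placeholder assumptions, and a small step is assumed to keep the iterate invertible; the paper's procedure is more careful:

```python
import numpy as np

def tgd_precision(S, tau=0.9, step=1e-3, iters=2000):
    """Threshold gradient descent sketch for a sparse precision matrix.

    Minimizes L(Omega) = -log det(Omega) + trace(S @ Omega), updating at
    each step only the entries with near-maximal gradient magnitude.
    S : sample covariance matrix.
    """
    omega = np.eye(S.shape[0])
    for _ in range(iters):
        grad = S - np.linalg.inv(omega)        # dL/dOmega for the Gaussian likelihood
        mask = np.abs(grad) >= tau * np.abs(grad).max()
        omega -= step * grad * mask            # move only the thresholded entries
        omega = (omega + omega.T) / 2.0        # keep the iterate symmetric
    return omega
```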
In this paper we examine ensemble methods for regression that leverage or "boost" base regressors by iteratively calling them on modified samples. The most successful leveraging algorithm for classification is AdaBoost, an algorithm that requires only modest assumptions on the base learning method for its strong theoretical guarantees. We present several gradient descent leveraging algorithms for regression and prove AdaBoost-style bounds on their sample errors using intuitive assumptions on the base learners. We bound the complexity of the regression functions produced in order to derive PAC-style bounds on their generalization errors. Experiments validate our theoretical results.
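The common skeleton of such leveraging algorithms, for squared error, is to fit the base regressor to the current residuals (the negative functional gradient) and add a damped copy to the ensemble. A minimal sketch, with `fit_base` standing in for any base learning method:

```python
import numpy as np

def leverage_regressors(X, y, fit_base, rounds=50, step=0.5):
    """Gradient-descent leveraging of a base regressor (boosting for regression).

    fit_base(X, r) must return a callable h with h(X) -> predictions.
    Each round fits the base learner to the residuals, i.e., the negative
    gradient of 0.5*||y - f||^2, and adds a damped copy to the ensemble.
    """
    ensemble, coeffs = [], []
    f = np.zeros(len(y))
    for _ in range(rounds):
        r = y - f                    # residuals = negative gradient of squared error
        h = fit_base(X, r)
        ensemble.append(h)
        coeffs.append(step)
        f = f + step * h(X)
    def predict(Xq):
        return sum(c * h(Xq) for c, h in zip(coeffs, ensemble))
    return predict
```

Here `fit_base` could fit a decision stump, a regression tree, or any other weak regressor to the pair (X, r).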
Evolving gradient-learning artificial neural networks (ANNs) using an evolutionary algorithm (EA) is a popular approach to address the local optima and design problems of ANNs. The typical approach is to combine the strength of backpropagation (BP) in weight learning with the EA's capability of searching the architecture space. However, BP's gradient descent is computationally intensive, which restricts the search coverage of the EA by compelling it to use a small population size. To address this problem, we utilize a mutation-based genetic neural network (MGNN) to replace BP, using the mutation strategy of local adaptation of evolutionary programming (EP) to effect weight learning. The MGNN's mutation enables the network to dynamically evolve its structure and adapt its weights at the same time. Moreover, MGNN's EP-based encoding scheme allows for a flexible and less restricted formulation of the fitness function and makes ...
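A minimal sketch of the self-adaptive EP mutation that replaces backpropagation here: each weight carries its own step size, step sizes mutate log-normally, weights mutate Gaussianly, and survivors are chosen by fitness. The names and the (mu + mu) survivor rule are our illustrative choices, not MGNN's exact scheme:

```python
import numpy as np

def ep_mutate_generation(pop, sigmas, fitness, rng=None):
    """One generation of self-adaptive EP mutation over network weight vectors.

    pop     : (mu, n) array of weight vectors
    sigmas  : (mu, n) per-weight mutation step sizes (self-adapted)
    fitness : callable mapping a weight vector to a score to maximize
    """
    rng = rng or np.random.default_rng(0)
    mu, n = pop.shape
    tau, tau_p = 1 / np.sqrt(2 * np.sqrt(n)), 1 / np.sqrt(2 * n)
    # Log-normal self-adaptation of step sizes, then Gaussian weight mutation
    child_sig = sigmas * np.exp(tau_p * rng.standard_normal((mu, 1))
                                + tau * rng.standard_normal((mu, n)))
    child = pop + child_sig * rng.standard_normal((mu, n))
    all_pop = np.vstack([pop, child])
    all_sig = np.vstack([sigmas, child_sig])
    scores = np.array([fitness(w) for w in all_pop])
    keep = np.argsort(scores)[-mu:]          # survivors: top mu by fitness
    return all_pop[keep], all_sig[keep]
```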
The authors propose a general fuzzy classification scheme with learning ability using an adaptive network. System parameters, such as the membership functions defined for each feature and the parameterized t-norms used to combine conjunctive conditions, are calibrated with backpropagation. To explain this approach, the concept of adaptive networks is introduced and a supervised learning procedure based on a gradient descent algorithm is derived to update the parameters in an adaptive network. The proposed architecture is applied to two problems: two-spiral classification and Iris categorization. From the experimental results, it is concluded that the adaptively adjusted classifier performs well on the Iris classification problem. The results are discussed from the viewpoint of feature selection.
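The gradient descent update in such an adaptive network is the chain rule pushed through each parameterized node. As a minimal sketch, here is that update for a single Gaussian membership function with adjustable center c and width s (a one-node stand-in for the full network; float arrays assumed):

```python
import numpy as np

def train_membership(x, target, c=0.0, s=1.0, lr=0.1, epochs=200):
    """Gradient-descent calibration of one Gaussian membership function.

    mu(x) = exp(-(x - c)^2 / (2 s^2)); c and s are the adaptive-network
    parameters, updated by backpropagating a squared-error loss.
    """
    for _ in range(epochs):
        mu = np.exp(-(x - c) ** 2 / (2 * s ** 2))
        err = mu - target                       # dE/dmu for E = 0.5 * sum(err^2)
        dmu_dc = mu * (x - c) / s ** 2          # chain rule through the Gaussian
        dmu_ds = mu * (x - c) ** 2 / s ** 3
        c -= lr * np.sum(err * dmu_dc)
        s -= lr * np.sum(err * dmu_ds)
    return c, s
```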
Software estimation is a tedious and daunting task in project management and software development. Software estimators are notoriously inaccurate in predicting software effort and have struggled for decades to provide new models to enhance software estimation. The most critical part of software estimation is when estimation is required in the early stages of the software life cycle, where the problem to be solved has not yet been completely revealed. This paper presents a novel log-linear regression model based on the use case point model (UCP) to calculate the software effort based on use case diagrams. A fuzzy logic approach is used to calibrate the productivity factor in the regression model. Moreover, a multilayer perceptron (MLP) neural network model was developed to predict software effort based on the software size and team productivity. Experiments show that the proposed approach outperforms the original UCP model. Furthermore, a comparison between the MLP and log-linear regression models was conducted based on the size of the projects. Results demonstrate that the MLP model can surpass the regression model when small projects are used, but the log-linear regression model gives better results when estimating larger projects.

► This paper focuses on creating a log-linear regression model for software effort estimation from use case diagrams.
► A multilayer perceptron (MLP) neural network model was also developed to predict software effort.
► The proposed approach can be used in the early stages of the software life cycle.
► The MLP model can be used as an alternative to regression models when small projects are used (<3000 person-hours).
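The log-linear core of the proposed model is an ordinary least-squares fit in log space, effort = exp(a) · size^b. A minimal sketch, without the fuzzy-logic calibration of the productivity factor that the paper adds on top (all names are ours):

```python
import numpy as np

def fit_log_linear(size, effort):
    """Fit effort = exp(a) * size**b by least squares in log space.

    size   : software size (e.g., adjusted use case points)
    effort : observed effort (e.g., person-hours)
    Returns (a, b) and a predictor callable.
    """
    size = np.asarray(size, dtype=float)
    effort = np.asarray(effort, dtype=float)
    # Taking logs turns the power law into a line: log(effort) = a + b*log(size)
    X = np.column_stack([np.ones_like(size), np.log(size)])
    a, b = np.linalg.lstsq(X, np.log(effort), rcond=None)[0]
    predict = lambda s: np.exp(a) * np.asarray(s, dtype=float) ** b
    return a, b, predict
```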
This paper proposes a framework for dealing with several problems related to the analysis of shapes. Two such related problems are the definition of the relevant set of shapes and that of defining a metric on it. Following a recent research monograph by Delfour and Zolésio [11], we consider the characteristic functions of the subsets of ℝ2 and their distance functions. The L2 norm of the difference of characteristic functions, and the L∞ and W1,2 norms of the difference of distance functions, define interesting topologies, in particular the well-known Hausdorff distance. Because of practical considerations arising from the fact that we deal with image shapes defined on finite grids of pixels, we restrict our attention to subsets of ℝ2 of positive reach in the sense of Federer [16], with smooth boundaries of bounded curvature. For this particular set of shapes we show that the three previous topologies are equivalent. The next problem we consider is that of warping a shape onto another by infinitesimal gradient descent, minimizing the corresponding distance. Because the distance function involves an inf, it is not differentiable with respect to the shape. We propose a family of smooth approximations of the distance function which are continuous with respect to the Hausdorff topology, and hence with respect to the other two topologies. We compute the corresponding Gâteaux derivatives. They define deformation flows that can be used to warp a shape onto another by solving an initial value problem. We show several examples of this warping and prove properties of our approximations that relate to the existence of local minima. We then use this tool to produce computational definitions of the empirical mean and covariance of a set of shape examples. They yield an analog of the notion of principal modes of variation. We illustrate them on a variety of examples.
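A minimal illustration of the smoothing idea: replace the inf in the distance function by a soft-min, which is differentiable and converges to the true distance as the temperature goes to zero, then descend its gradient to move points toward the target shape. This particular soft-min is our stand-in, not the paper's approximation family, and the gradient is taken numerically for brevity:

```python
import numpy as np

def smooth_distance(x, boundary_pts, eps=0.05):
    """Smooth stand-in for the distance function d(x) = inf_i ||x - p_i||.

    The hard infimum is replaced by the soft-min
        d_eps(x) = -eps * log(sum_i exp(-||x - p_i|| / eps)),
    which is differentiable in x and tends to the true distance as eps -> 0.
    Computed in numerically stable log-sum-exp form.
    """
    d = np.linalg.norm(boundary_pts - x, axis=1)
    dmin = d.min()
    return dmin - eps * np.log(np.sum(np.exp(-(d - dmin) / eps)))

def warp_step(x, boundary_pts, eps=0.05, lr=0.1, h=1e-5):
    """One gradient-descent step moving a point toward the target shape,
    with the gradient of the smooth distance taken by central differences."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (smooth_distance(x + e, boundary_pts, eps)
                - smooth_distance(x - e, boundary_pts, eps)) / (2 * h)
    return x - lr * g
```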