Serge Belongie
  • Cornell Tech
    111 8th Ave #302
    New York, NY  10011-5204
  • (858) 633-7487


  • Serge Belongie received the B.S. degree (with honor) in Electrical Engineering from the California Institute of Techn...
Abstract Segmentation of medical images has become an indispensable process to perform quantitative analysis of images of human organs and their functions. Normalized Cuts (NCut) is a spectral graph theoretic method that readily admits combinations of different features for image segmentation. The computational demand imposed by NCut has been successfully alleviated with the Nyström approximation method for applications other than medical imaging.
Fowlkes et al. [7] recently introduced an approximation to the Normalized Cut (NCut) grouping algorithm [18] based on random subsampling and the Nyström extension. As presented, their method is restricted to the case where W, the weighted adjacency matrix, is positive definite. Although many common measures of image similarity (i.e., kernels) are positive definite, a popular example being Gaussian-weighted distance, there are important cases that are not.
Abstract We present a framework for motion segmentation that combines the concepts of layer-based methods and feature-based motion estimation. We estimate the initial correspondences by comparing vectors of filter outputs at interest points, from which we compute candidate scene relations via random sampling of minimal subsets of correspondences.
Abstract Spectral graph theoretic methods have recently shown great promise for the problem of image segmentation, but due to the computational demands, applications of such methods to spatiotemporal data have been slow to appear. For even a short video sequence, the set of all pairwise voxel similarities is a huge quantity of data: one second of a 256 × 384 sequence captured at 30 Hz entails on the order of 10^13 pairwise similarities.
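The figure quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are assumptions, and it counts ordered voxel pairs (the entries of a dense similarity matrix), which may differ from the counting convention the authors had in mind.

```python
# Rough check of the pairwise-similarity count for one second of video.
# All names here are illustrative; they do not come from the paper.
width, height = 384, 256      # frame size in pixels
frame_rate_hz = 30            # frames per second
duration_s = 1                # length of the clip in seconds

# Every pixel in every frame is one voxel of the spatiotemporal volume.
n_voxels = width * height * frame_rate_hz * duration_s

# A dense similarity matrix has one entry per ordered pair of voxels.
n_ordered_pairs = n_voxels ** 2

# Counting each unordered pair once (and skipping self-pairs) roughly halves this.
n_unordered_pairs = n_voxels * (n_voxels - 1) // 2

print(f"voxels:          {n_voxels:,}")
print(f"ordered pairs:   {n_ordered_pairs:.3e}")
print(f"unordered pairs: {n_unordered_pairs:.3e}")
```

Under either convention the count lands within a factor of a few of 10^13, which is consistent with the abstract's point that the full similarity matrix is too large to compute or store directly.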
Abstract We design and implement a comprehensive study of the perception of gloss. This is the largest study of its kind to date, and the first to use real material measurements. In addition, we develop a novel multi-dimensional scaling (MDS) algorithm for analyzing pairwise comparisons. The data from the psychophysics study and the MDS algorithm are used to construct a low dimensional perceptual embedding of these bidirectional reflectance distribution functions (BRDFs).
Abstract We address the problem of tracking multiple, identical, nonrigid moving targets through occlusion for purposes of rodent surveillance from a side view. Automated behavior analysis of individual mice promises to improve animal care and data collection in medical research. In our experiments, we consider the case of three brown mice that repeatedly occlude one another and have no stable trackable features.
Abstract We consider the non-metric multidimensional scaling problem: given a set of dissimilarities Δ, find an embedding whose inter-point Euclidean distances have the same ordering as Δ. In this paper, we look at a generalization of this problem in which only a set of order relations of the form d_ij < d_kl are provided. Unlike the original problem, these order relations can be contradictory and need not be specified for all pairs of dissimilarities.
Object detection is one of the key problems in computer vision. In the last decade, discriminative learning approaches have proven effective in detecting rigid objects, achieving very low false positive rates. The field has also seen a resurgence of part-based recognition methods, with impressive results on highly articulated, diverse object categories. In this paper we propose a discriminative learning approach for detection that is inspired by part-based recognition approaches.
Abstract The increased interest in region-based image coding has given rise to partition encoding methods based on graph coloring. These methods are based on the four color theorem for planar graphs, and assume that a coloring for a graph with the minimum possible number of colors will result in the most compressible representation. We show that this assumption is wrong.
Abstract We propose to combine computer vision techniques implemented on our low-cost mobile vision platform with a haptic feedback interface to provide direction towards certain objects to a visually impaired user. This solution is a novel approach to the problem of autonomous navigation in that it enables the user to avail of visual cues existing in the real world (street signs, crosswalk signs, bus stop signs, for example) without requiring any modifications to the urban infrastructure (such as implanted RFID tags).
Retrieving images from very large collections, using image content as a key, is becoming an important problem. Users prefer to ask for pictures using notions of content that are strongly oriented to the presence of abstractly defined objects. Computer programs that implement these queries automatically are desirable, but are hard to build because conventional object recognition techniques from computer vision cannot recognize very general objects in very general contexts.
Abstract We present a practical, stratified autocalibration algorithm with theoretical guarantees of global optimality. Given a projective reconstruction, the first stage of the algorithm upgrades it to affine by estimating the position of the plane at infinity. The plane at infinity is computed by globally minimizing a least squares formulation of the modulus constraints.
Abstract A rotation-invariant texture recognition system is presented. A steerable oriented pyramid is used to extract representative features for the input textures. The steerability of the filter set allows a shift to an invariant representation via a DFT-encoding step. Supervised classification follows. State-of-the-art recognition results are presented on a 30-texture database, with a comparison of the performance of the k-NN, backpropagation and rule-based classifiers.
Abstract Digital libraries can contain hundreds of thousands of pictures and video sequences. Typically, users of digital libraries wish to recover pictures and videos from collections based on the objects and actions depicted: this is object recognition, in a form that emphasizes large, general modelbases, where new classes of object or action can be added easily.
Abstract In the course of modern medical research, it is common for a research facility to house thousands of caged mice, rats, rabbits, and other mammals in rooms known as vivaria. In any experiment involving a group of animals it is necessary to perform environmental and physiological monitoring to determine the effects of the procedure and the health of the animals involved.
Abstract We describe an algorithm for segmenting a novel image based on the available segmentation of another image. The algorithm consists of two stages. In the first stage we construct a locally connected graph to represent the novel image. This graph is obtained by inheriting local connectivity between pixels based on the similarity of small neighborhoods in the two images. In the second stage a graph partitioning algorithm is used to partition the graph and recover the resulting segments in the image.
Abstract Automatic evaluation of human facial attractiveness is a challenging problem that has received relatively little attention from the computer vision community. Previous work in this area has posed attractiveness as a classification problem. However, for applications that require fine-grained relationships between objects, learning to rank has been shown to be superior to the direct interpretation of classifier scores as ranks [27]. In this paper, we propose and implement a personalized relative beauty ranking system.
Abstract The goal of object category discovery is to automatically identify groups of image regions which belong to some new, previously unseen category. This task is typically performed in a purely unsupervised setting, and as a result, performance depends critically upon accurate assessments of similarity between unlabeled image regions.
Abstract. We present a general approach and analytical method for determining a search region for use in guided matching under projective mappings. Our approach is based on the propagation of covariance through a first-order approximation of the error model to define the boundary of the search region for a specified probability and we provide an analytical expression for the Jacobian matrix used in the covariance propagation calculation.
Multiple Instance Learning (MIL) provides a framework for training a discriminative classifier from data with ambiguous labels. This framework is well suited for the task of learning object classifiers from weakly labeled image data, where only the presence of an object in an image is known, but not its location. Some recent work has explored the application of MIL algorithms to the tasks of image categorization and natural scene classification.
Abstract Contextual models play a very important role in the task of object recognition. Over the years, two kinds of contextual models have emerged: models with contextual inference based on the statistical summary of the scene (we will refer to these as scene based context models, or SBC), and models representing the context in terms of relationships among objects in the image (object based context, or OBC).
Abstract In this work we present a novel application of non-rigid surface detection to enable gestural interaction with applicable surfaces. This method can add interactivity to traditionally passive media such as books, newspapers, restaurant menus, or anything else printed on paper. We allow a user to interact with these surfaces in a natural manner and present basic gestures based on pointing and touching.
Blobworld is a system for image retrieval based on finding coherent image regions which roughly correspond to objects. Each image is automatically segmented into regions (“blobs”) with associated color and texture descriptors. Querying is based on the attributes of one or two regions of interest, rather than a description of the entire image. In order to make large-scale retrieval feasible, we index the blob descriptions using a tree.
Abstract Edge detection is one of the most studied problems in computer vision, yet it remains a very challenging task. It is difficult because the decision for an edge often cannot be made purely on the basis of low level cues such as gradient; instead, we need to engage all levels of information (low, middle, and high) in order to decide where to put edges. In this paper we propose a novel supervised learning algorithm for edge and object boundary detection which we refer to as Boosted Edge Learning, or BEL for short.
Abstract This paper proposes a method to recover the embedding of the possible shapes assumed by a deforming nonrigid object by comparing triplets of frames from an orthographic video sequence. We assume that we are given features tracked with no occlusions and no outliers but possible noise, an orthographic camera and that any 3D shape of a deforming object is a linear combination of several canonical shapes.
Abstract Retrieving images from large and varied collections using image content as a key is a challenging and important problem. We present a new image representation that provides a transformation from the raw pixel data to a small set of image regions that are coherent in color and texture. This "Blobworld" representation is created by clustering pixels in a joint color-texture-position feature space. The segmentation algorithm is fully automatic and has been run on a collection of 10,000 natural images.
Abstract The task of automatic lipreading requires two major preprocessing steps: localisation of the speaker's mouth and tracking of the salient lip movements. For the first step, we employ a technique which we have termed Orientation Template Correlation (OTC), which searches for facial features based on their characteristic orientation maps. In the spirit of Mase and Pentland [MP91], we have chosen to focus on the estimation of two measures in particular: vertical lip motion and mouth elongation.
Abstract In many machine learning applications, precisely labeled data is either burdensome or impossible to collect. Multiple Instance Learning (MIL), in which training data is provided in the form of labeled bags rather than labeled instances, is one approach for dealing with ambiguously labeled data. In this paper we argue that in many applications of MIL (e.g., image, audio, text, bioinformatics) a single bag actually consists of a large or infinite number of instances, such as all points on a low dimensional manifold.
Abstract The problem of using pictures of objects captured under ideal imaging conditions (here referred to as in vitro) to recognize objects in natural environments (in situ) is an emerging area of interest in computer vision and pattern recognition. Examples of tasks in this vein include assistive vision systems for the blind and object recognition for mobile robots; the proliferation of image databases on the web is bound to lead to more examples in the near future.
Abstract Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper, we present a new image representation which provides a transformation from the raw pixel data to a small set of localized coherent regions in color and texture space. This so-called “blobworld” representation is based on segmentation using the expectation-maximization algorithm on combined color and texture features.
Abstract Accurate spectral decomposition is essential for the analysis and diagnosis of histologically stained tissue sections. In this paper we present the first automated system for performing this decomposition. We compare the performance of our system with ground truth data and report favorable results.
Abstract The efficiency and robustness of a vision system is often largely determined by the quality of the image features available to it. In data mining, one typically works with immense volumes of raw data, which demands effective algorithms to explore the data space. In analogy to data mining, the space of meaningful features for image analysis is also quite vast.
We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by (1) solving for correspondences between points on the two shapes, and (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point.
Abstract The paper makes two contributions: it provides (1) an operational definition of textons, the putative elementary units of texture perception, and (2) an algorithm for partitioning the image into disjoint regions of coherent brightness and texture, where boundaries of regions are defined by peaks in contour orientation energy and differences in texton densities across the contour.
Caltech-UCSD Birds 200 (CUB-200) is a challenging image dataset annotated with 200 bird species. It was created to enable the study of subordinate categorization, which is not possible with other popular datasets that focus on basic level categories (such as PASCAL VOC, Caltech-101, etc.). The images were downloaded from the website Flickr and filtered by workers on Amazon Mechanical Turk. Each image is annotated with a bounding box, a rough bird segmentation, and a set of attribute labels.
Abstract Retrieving images from very large collections using image content as a key is becoming an important problem. Classifying images into visual categories and finding objects in image databases are two major challenges in the field. This paper describes our approach toward the first of the two tasks, the generalization of which we believe will assist in the second task as well.
Abstract Recent work in object localization has shown that the use of contextual cues can greatly improve accuracy over models that use appearance features alone. Although many of these models have successfully explored different types of contextual sources, they only consider one type of contextual interaction (e.g., pixel, region or object level interactions), leaving open questions about the true potential contribution of context. Furthermore, contributions across object classes and over appearance features still remain unknown.
This document is meant to serve as an addendum to [1], published at BMVC 2009. The purpose of this addendum is twofold: (1) to respond to feedback we've received since publication and (2) to describe a number of changes, especially to the non-maximal suppression, that further improve performance.
Abstract We propose a framework for large scale learning and annotation of structured models. The system interleaves interactive labeling (where the current model is used to semi-automate the labeling of a new example) and online learning (where a newly labeled example is used to update the current model parameters).
The thin plate spline (TPS) is an effective tool for modeling coordinate transformations that has been applied successfully in several computer vision applications. Unfortunately the solution requires the inversion of a p × p matrix, where p is the number of points in the data set, thus making it impractical for large scale applications. As it turns out, a surprisingly good approximate solution is often possible using only a small subset of corresponding points.
Abstract Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper we present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space. This so-called “blobworld” representation is based on segmentation using the expectation-maximization algorithm on combined color and texture features.
Abstract In this work we introduce a novel approach to object categorization that incorporates two types of context (co-occurrence and relative location) with local appearance-based features. Our approach, named CoLA (for co-occurrence, location and appearance), uses a conditional random field (CRF) to maximize object label agreement according to both semantic and spatial relevance. We model relative location between objects using simple pairwise features.
Abstract We present a novel approach to non-rigid structure from motion (NRSFM) from an orthographic video sequence, based on a new interpretation of the problem. Existing approaches assume the object shape space is well-modeled by a linear subspace. Our approach only assumes that small neighborhoods of shapes are well-modeled with a linear subspace. This constrains the shapes to belong to a manifold of dimensionality equal to the number of degrees of freedom of the object.
Abstract Current image search engines on the web rely purely on the keywords around the images and the filenames, which produces a lot of garbage in the search results. Alternatively, there exist methods for content based image retrieval that require a user to submit a query image, and return images that are similar in content. We propose a novel approach named ReSPEC (Re-ranking Sets of Pictures by Exploiting Consistency), that is a hybrid of the two methods.
We present an interactive, hybrid human-computer method for object classification. The method applies to classes of objects that are recognizable by people with appropriate expertise (e.g., animal species or airplane model), but not (in general) by people without such expertise. It can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively.
Abstract We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by (1) solving for correspondences between points on the two shapes, and (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point.
Abstract A given (overcomplete) discrete oriented pyramid may be converted into a steerable pyramid by interpolation. We present a technique for deriving the optimal interpolation functions (otherwise called 'steering coefficients'). The proposed scheme is demonstrated on a computationally efficient oriented pyramid, which is a variation on the Burt and Adelson (1983) pyramid. We apply the generated steerable pyramid to orientation-invariant texture analysis in order to demonstrate its excellent rotational isotropy.
Abstract This paper describes a method by which a computer can autonomously acquire training data for learning to recognize a user's face. The computer, in this method, actively seeks out opportunities to acquire informative face examples. Using the principles of co-training, it combines a face detector trained on a single input image with tracking to extract face examples for learning.
