In this paper, we propose using augmented hypotheses that consider objectness, foreground, and compactness for salient object detection. Our algorithm consists of four basic steps. First, our method generates the objectness map via objectness hypotheses. Based on the objectness map, we estimate the foreground margin and compute the corresponding foreground map, which favors foreground objects. From the objectness and foreground maps, the compactness map is formed to favor compact objects. We then derive a saliency measure that produces a pixel-accurate saliency map which uniformly covers the objects of interest and consistently separates foreground from background. Finally, we evaluate the proposed framework on two challenging datasets, MSRA-1000 and iCoSeg. Extensive experimental results show that our method outperforms state-of-the-art approaches.
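The fusion of the three hypothesis maps described above can be sketched as follows. This is a minimal illustration only: the multiplicative combination and the min-max normalization are assumptions for the sketch, not the paper's exact saliency measure.

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def combine_saliency(objectness, foreground, compactness):
    """Fuse the three hypothesis maps into one pixel-wise saliency map.

    Multiplicative fusion keeps a pixel salient only when all three
    hypotheses agree, which encourages uniform coverage of the object
    and a clean separation of foreground from background.
    """
    maps = [normalize(m) for m in (objectness, foreground, compactness)]
    return normalize(maps[0] * maps[1] * maps[2])
```

A product (rather than a sum) is one common choice here because any single map voting "background" suppresses the pixel, which tends to sharpen object boundaries.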
In this work, we propose an alternative ground truth to the eye fixation map in visual attention study, called touch saliency.
Because it can be collected directly from the recorded daily browsing behavior of users on widely used touch-screen smartphones, touch saliency data is easy to obtain. Due to the limited screen size, smartphone users usually pan and zoom images, fixing the region of interest on the screen while browsing. Our studies are two-fold. First, we collect and study the characteristics of these touch-screen fixation maps (termed touch saliency) through comprehensive comparisons with their counterpart, the eye-fixation maps (visual saliency). The comparisons show that touch saliency is highly correlated with eye fixations for the same stimuli, which indicates its utility for data collection in visual attention study. Based on the consistency between touch saliency and visual saliency, our second task is to propose a unified saliency prediction model for both visual and touch saliency detection. This model utilizes middle-level object category features, extracted from pre-segmented image superpixels, as input to the recently proposed multitask sparsity pursuit (MTSP) framework for saliency prediction. Extensive evaluations show that the proposed middle-level category features considerably improve saliency prediction performance when taking either touch saliency or visual saliency as ground truth.
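The correlation between touch saliency and eye-fixation maps can be measured with a standard linear correlation coefficient (CC) score, a common metric in saliency evaluation. The sketch below is illustrative and does not reproduce the paper's exact comparison protocol:

```python
import numpy as np

def saliency_correlation(touch_map, eye_map):
    """Pearson correlation (CC score) between two fixation density maps.

    Both maps are flattened and standardized; the small epsilon guards
    against division by zero for near-constant maps.
    """
    t = touch_map.ravel().astype(float)
    e = eye_map.ravel().astype(float)
    t = (t - t.mean()) / (t.std() + 1e-12)
    e = (e - e.mean()) / (e.std() + 1e-12)
    return float(np.mean(t * e))
```

A CC score close to 1 for the same stimulus would support the abstract's claim that touch fixations are a usable proxy for eye fixations.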
In this paper, we present an adaptive nonparametric solution to the image parsing task, namely annotating each image pixel with its corresponding category label. For a given test image, a locality-aware retrieval set is first extracted from the training data based on superpixel matching similarities, which are augmented with feature extraction for better differentiation of local superpixels. Then, the category of each superpixel is initialized by the majority vote of its k-nearest-neighbor superpixels in the retrieval set. Instead of fixing k as in traditional nonparametric approaches, we propose a novel adaptive nonparametric approach that determines a sample-specific k for each test image. In particular, k is adaptively set to the smallest number of nearest superpixels that the images in the retrieval set can use to obtain the best category prediction. Finally, the initial superpixel labels are further refined by contextual smoothing. Extensive experiments on challenging datasets demonstrate the superiority of the new solution over other state-of-the-art nonparametric solutions.
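The sample-specific choice of k can be sketched as follows. Here the selection criterion, leave-one-out voting accuracy on the retrieval set itself, is an assumption made for illustration; the candidate list `ks` and the Euclidean distance are likewise illustrative, not the paper's exact formulation:

```python
import numpy as np
from collections import Counter

def majority_vote(labels):
    """Most frequent label among a superpixel's neighbors."""
    return Counter(labels).most_common(1)[0][0]

def adaptive_k(retrieval_feats, retrieval_labels, ks=(1, 3, 5, 7, 9)):
    """Pick the smallest k that best predicts the retrieval set's own
    labels via leave-one-out k-NN majority voting.
    """
    X = np.asarray(retrieval_feats, dtype=float)
    y = list(retrieval_labels)
    # Pairwise Euclidean distances between retrieval-set superpixels.
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)          # leave-one-out: exclude self
    best_k, best_acc = ks[0], -1.0
    for k in ks:
        nn = np.argsort(d, axis=1)[:, :k]
        pred = [majority_vote([y[j] for j in row]) for row in nn]
        acc = float(np.mean([p == t for p, t in zip(pred, y)]))
        if acc > best_acc:               # strict '>' keeps the smallest k on ties
            best_k, best_acc = k, acc
    return best_k
```

The strict inequality implements the "fewest nearest superpixels" preference: among equally accurate candidates, the smallest k wins.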
Human weight estimation is useful in a variety of potential applications, e.g., targeted advertisement, entertainment scenarios, and forensic science. However, estimating weight from color cues alone is particularly challenging, since these cues are quite sensitive to lighting and imaging conditions. In this article, we propose a novel weight estimator based on a single RGB-D image, which utilizes both visual color cues and depth information. Our main contributions are three-fold. First, we construct the W8-400 dataset, comprising RGB-D images of different people with ground-truth weight. Second, we propose a novel side-view shape feature and a feature fusion model to facilitate weight estimation; additionally, we consider gender as another important factor for human weight estimation. Third, we conduct comprehensive experiments with various regression models and feature fusion models on the new weight dataset, and encouraging results are obtained with the proposed features and models.
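The feature-fusion-plus-regression pipeline can be sketched as below. Early fusion by concatenation and a plain least-squares regressor are assumptions for the sketch; the paper evaluates several regression and fusion models, and the feature names here are hypothetical placeholders:

```python
import numpy as np

def fuse_features(color_feat, depth_feat, side_shape_feat, is_male):
    """Early-fusion sketch: concatenate the cue vectors plus a gender flag."""
    return np.concatenate(
        [color_feat, depth_feat, side_shape_feat, [float(is_male)]]
    )

def fit_weight_regressor(features, weights_kg):
    """Least-squares linear regressor mapping fused features to weight (kg)."""
    X = np.asarray(features, dtype=float)
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    w, *_ = np.linalg.lstsq(X, np.asarray(weights_kg, dtype=float), rcond=None)
    return lambda f: float(np.append(np.asarray(f, dtype=float), 1.0) @ w)
```

Including gender as an extra input dimension, as the abstract suggests, lets even a linear model learn a per-gender offset in predicted weight.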