An important aspect of current computer vision research is the analysis of scenes, in particular the extraction of their 3D structure and their segmentation into objects. Closely linked to this is the interpretation of images and the recognition of image categories, either from the output of 3D reconstruction and segmentation or directly from the images themselves. As the papers in this special issue show, such problems are often solved by combining a variety of methods and image attributes.

Ladicky et al. provide a framework that unifies dense stereo reconstruction and object segmentation: both are formulated as Random Field labelling problems and optimized jointly. The evaluation uses an enhanced version of the Leuven dataset, which is publicly available.

The paper by Hwang and Grauman improves image search by incorporating aspects of high-level human scene perception, such as the order of the keywords associated with an image, into the visual representation.

Sun et al. present a system that iteratively combines object detection, 3D scene layout estimation, and segmentation of the objects' support regions. As the layout estimation provides knowledge about the scene, this knowledge is used to refine the confidence of the object detections and thus to remove false detections. Results are demonstrated on the authors' own dataset as well as on two publicly available datasets.

Direkoglu et al. propose an anisotropic heat diffusion approach for extracting skeletons from either binary or grey-level images. To this end, the image is diffused mainly in the direction normal to the feature boundaries, while tangential diffusion is allowed to make a small contribution. Finally, the diffusion results are subjected to non-maximal suppression and hysteresis thresholding.

Tsai et al. propose an off-line method for tracking and segmenting objects in video sequences. The task is formulated as a minimization problem on a Markov Random Field that takes into account both motion coherence and attribute coherence. The objective function is optimized using the existing Fast-PD method.

Solving these problems, particularly in the context of classification, can be computationally expensive, so reducing the cost of these methods is an important aspect of current research. Kim et al. describe how to convert a boosting classifier into a well-balanced super tree, using a novel Boolean optimization method that maximizes the region information gain. In this way they improve the efficiency of this widely used approach while preserving its decision regions.

These six papers cover a wide range of approaches related to extracting and classifying objects from images and should appeal to a wide audience.