
    Yong Rui

    ... The bottom architecture is an example of how the model is used to describe an image object. ... Various features, such as color, texture, shape, layout, motion parameters, etc., are extracted to make the MIR system flexible enough to support different information needs of different ...
    Introduction: With the advances in storage technology and the advent of the World Wide Web, there has been an explosion in the amount and complexity of digital information being generated, analyzed, stored, accessed and transmitted. Most of this data is multimedia in nature, including digital images, video, audio and simple text data. To make use of this vast amount of multimedia data, we need
    Person-based indices and timelines can enable fast and non-linear access to recorded meetings. This paper focuses on how to automatically construct those indices and timelines by using face recognition techniques. While there exists extensive research in generic face recognition, recognizing faces in recorded meetings is still an understudied area. Real-world meeting videos impose several interesting and unique challenges, including complex lighting, low imaging quality, and large variations in head pose and size. In this paper, a promising approach based on MRC-Boosting is presented to address these challenges; it achieves encouraging performance on real-world meeting videos and shows superior accuracy and robustness compared to two popular existing approaches.
    ... free paper. Printed in the United States of America. Page 7. To my parents, Wei, and Michael. -Sean To my parents, Dongqin, and Olivia. -Yong To Margaret; Caroline, Marjorie, Thomas, Gregory. -Tom Page 8. Page 9. Contents 1 ...
    Machine-aided retrieval of multimedia information (image [44], video [170], audio [195], etc.) is achieved based on representations in the form of descriptors (or feature vectors). Two issues arise. One is the effectiveness of the representation, i.e., to what extent can the meaningful contents of the media be represented in these vectors? The other is the selection of a similarity metric during the retrieval process. This is an important issue because the similarity metric dynamically depends upon the user and the user-defined query class, ...
    ABSTRACT Unlike existing work that focuses on emotion type detection, the approach proposed in this paper lets users pick their favorite affective content by choosing either emotion intensity levels or emotion types. Specifically, we propose a hierarchical structure for movie emotions and analyze emotion intensity and emotion type hierarchically, using arousal-related and valence-related features. First, three emotion intensity levels are detected by using fuzzy c-means clustering on arousal features. Fuzzy clustering provides a mathematical model to represent vagueness, which is close to human perception. Then, valence-related features are used to detect five emotion types. Because video is continuous time-series data and the occurrence of a certain emotion is affected by recent emotional history, conditional random fields (CRFs) are used to capture the context information. The CRF outperforms the Hidden Markov Model (HMM): it relaxes the independence assumption for states that the HMM requires and avoids the bias problem. Experimental results show that the CRF-based hierarchical method outperforms the one-step method on emotion type detection. A user study shows that the majority of viewers prefer to have the option of accessing movie content by emotion intensity levels. The majority of users are satisfied with the proposed emotion detection.
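The intensity-detection step in the abstract above can be illustrated with a minimal fuzzy c-means sketch. This is not the authors' implementation: the function name `fuzzy_c_means`, the 1-D `arousal` scores, the fuzzifier `m = 2.0`, and the stopping tolerance are all assumptions chosen for illustration, and the paper's actual arousal features are not specified in this excerpt.

```python
import random

def fuzzy_c_means(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means on 1-D values; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in values:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])

    centers = [0.0] * n_clusters
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Center update: mean of the data weighted by membership ** m.
        new_centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(len(values))]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        # Membership update: closer centers receive higher membership.
        for i, x in enumerate(values):
            d = [abs(x - c) + 1e-12 for c in new_centers]  # avoid division by zero
            for k in range(n_clusters):
                u[i][k] = 1.0 / sum((d[k] / d[j]) ** exponent
                                    for j in range(n_clusters))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Hypothetical per-shot arousal scores in [0, 1] (illustrative values only).
arousal = [0.05, 0.10, 0.12, 0.50, 0.55, 0.52, 0.90, 0.95, 0.92]
centers, memberships = fuzzy_c_means(arousal)
# Each shot gets a degree of membership in every intensity level, not a hard label.
hard_labels = [row.index(max(row)) for row in memberships]
```

Unlike k-means, every sample keeps a graded membership in all three clusters, which is the "vagueness" the abstract refers to; a hard intensity level can still be read off by taking the largest membership.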
    ... Furthermore, the HSV, CIE-LAB, and Munsell color spaces also attempt to make the color space perceptually uniform. ... We chose the HSV (hue, saturation, and value) color space for simplicity. Spatial space is just the 2-D Cartesian space spanned ...
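As a small illustration of working in the HSV space mentioned above, the sketch below converts RGB pixels to HSV with Python's standard `colorsys` module and builds a normalized hue histogram. The function name `hue_histogram`, the 8-bin quantization, and the sample pixels are assumptions for illustration; the excerpt does not say how the authors quantize or use the color space.

```python
import colorsys

def hue_histogram(pixels, bins=8):
    """Normalized hue histogram of (r, g, b) pixels with channels in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        # colorsys expects channels in [0, 1] and returns h, s, v in [0, 1].
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Achromatic pixels (r == g == b) are reported with hue 0 by colorsys.
        index = min(int(h * bins), bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Pure red, green, and blue fall at hues 0, 1/3, and 2/3 respectively.
sample = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
hist = hue_histogram(sample)
```

Separating hue from saturation and value is one reason HSV is convenient for color features: a histogram over hue alone is less sensitive to brightness changes than one over raw RGB.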
    Combining learning with vision techniques in interactive image retrieval has been an active research topic during the past few years. However, existing learning techniques either are based on heuristics or fail to analyze the working conditions. Furthermore, there is almost no in-depth study ...
    Supporting multimedia search has emerged as an important research topic. There are three paradigms on the research spectrum, which ranges from the least automatic to the most automatic. On the far left end, there is the pure manual labeling paradigm that labels multimedia content, e.g., images and video clips, manually with text labels and then uses text search to search
    Abstract: Content-Based Image Retrieval (CBIR) has become one of the most active research areas in the past few years. Many visual feature representations have been explored and many systems built. While these research efforts establish the basis of CBIR, the usefulness of ...
    We propose a new multiple instance learning (MIL) algorithm to learn image categories. Unlike existing MIL algorithms, in which the individual instances in a bag are assumed to be independent of each other, we develop concurrent tensors to explicitly model the inter-dependency ...
    This paper provides a comprehensive survey of the technical achievements in the research area of image retrieval, especially content-based image retrieval, an area that has been so active and prosperous in the past few years. The survey includes 100+ papers covering the ...
    Decrypting the secret of beauty or attractiveness has been the pursuit of artists and philosophers for centuries. To date, computational models for attractiveness estimation have been actively explored in the computer vision and multimedia communities, yet with the focus mainly on facial features. In this article, we conduct a comprehensive study on female attractiveness conveyed by single/multiple modalities of cues, that is, face, dressing and/or voice, and aim to discover how different modalities individually and collectively affect the human sense of beauty. To extensively investigate the problem, we collect the Multi-Modality Beauty (M2B) dataset, which is annotated with attractiveness levels converted from manual k-wise ratings and semantic attributes of different modalities. Inspired by the common consensus that middle-level attribute prediction can assist higher-level computer vision tasks, we manually labeled many attributes for each modality. Next, a tri-layer Dual-supervised Feature-Attribute-Task (DFAT) network is proposed to jointly learn the attribute model and attractiveness model of single/multiple modalities. To remedy possible loss of information caused by incomplete manual attributes, we also propose a novel Latent Dual-supervised Feature-Attribute-Task (LDFAT) network, where latent attributes are combined with manual attributes to contribute to the final attractiveness estimation. Extensive experimental evaluations on the collected M2B dataset demonstrate the effectiveness of the proposed DFAT and LDFAT networks for female attractiveness prediction.