
Generating Descriptive Visual Words and Visual Phrases for Large-Scale Image Applications

Published: 01 September 2011

Abstract

The bag-of-visual-words (BoW) representation has been applied to various problems in multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to text words. Notwithstanding its great success and wide adoption, a visual vocabulary created from single-image local descriptors is often shown to be less effective than desired. In this paper, descriptive visual words (DVWs) and descriptive visual phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a descriptive visual element set can be composed of the visual words and their combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs for image applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive of certain objects or scenes are identified and collected as the DVWs and DVPs. Experiments show that the DVWs and DVPs are informative and descriptive and, thus, are more comparable with text words than the classic visual words. We apply the identified DVWs and DVPs in several applications, including large-scale near-duplicate image retrieval, image search re-ranking, and object recognition. The combination of DVWs and DVPs performs better than the state of the art in large-scale near-duplicate image retrieval in terms of accuracy, efficiency, and memory consumption. The proposed image search re-ranking algorithm, DWPRank, outperforms the state-of-the-art algorithm by 12.4% in mean average precision and is about 11 times faster.
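
To make the terms in the abstract concrete, the following minimal Python sketch builds a bag-of-visual-words histogram for one image and collects visual word pairs that co-occur in several images. It is an illustration under stated assumptions, not the paper's DVW/DVP selection procedure: the keypoint format `(word_id, x, y)`, the function names, the pixel `radius` used to decide co-occurrence, and the `min_images` threshold are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations
from math import hypot


def bow_histogram(keypoints):
    """Count how often each visual word id occurs in one image."""
    return Counter(word for word, _x, _y in keypoints)


def cooccurring_pairs(keypoints, radius):
    """Return the unordered pairs of distinct visual words whose
    keypoints lie within `radius` pixels of each other (assumed rule)."""
    pairs = set()
    for (w1, x1, y1), (w2, x2, y2) in combinations(keypoints, 2):
        if w1 != w2 and hypot(x1 - x2, y1 - y2) <= radius:
            pairs.add((min(w1, w2), max(w1, w2)))
    return pairs


def frequent_pairs(images, radius, min_images):
    """Keep the word pairs that co-occur in at least `min_images` images."""
    counts = Counter()
    for keypoints in images:
        # Each image contributes at most one count per pair.
        counts.update(cooccurring_pairs(keypoints, radius))
    return {pair: n for pair, n in counts.items() if n >= min_images}


# Toy data: each image is a list of (visual_word_id, x, y) keypoints.
images = [
    [(3, 10.0, 10.0), (7, 12.0, 11.0), (9, 200.0, 200.0)],
    [(3, 50.0, 50.0), (7, 52.0, 49.0), (3, 300.0, 10.0)],
    [(7, 5.0, 5.0), (9, 6.0, 6.0)],
]

histogram = bow_histogram(images[1])
candidates = frequent_pairs(images, radius=5.0, min_images=2)
```

On this toy data, words 3 and 7 appear close together in two of the three images, so `(3, 7)` is the only pair that survives the threshold. Raw co-occurrence counting is only a stand-in here for whatever criterion decides that a pair is descriptive of an object or scene.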

Published In

IEEE Transactions on Image Processing, Volume 20, Issue 9
September 2011
300 pages

Publisher

IEEE Press

Qualifiers

  • Research-article

Cited By

  • (2022) Discovering informative features in large-scale landmark image collection. Journal of Information Science 48:2 (237-250). DOI: 10.1177/0165551520950653. Online publication date: 1-Apr-2022
  • (2020) BoVW model based on adaptive local and global visual words modeling and log-based relevance feedback for semantic retrieval of the images. Journal on Image and Video Processing 2020:1. DOI: 10.1186/s13640-020-00516-4. Online publication date: 6-Jul-2020
  • (2020) Personality Trait Classification Based on Co-occurrence Pattern Modeling with Convolutional Neural Network. HCI International 2020 – Late Breaking Papers: Interaction, Knowledge and Social Media (359-370). DOI: 10.1007/978-3-030-60152-2_27. Online publication date: 19-Jul-2020
  • (2019) Modeling Dyadic and Group Impressions with Intermodal and Interperson Features. ACM Transactions on Multimedia Computing, Communications, and Applications 15:1s (1-30). DOI: 10.1145/3265754. Online publication date: 24-Jan-2019
  • (2019) Square texton histogram features for image retrieval. Multimedia Tools and Applications 78:3 (2719-2746). DOI: 10.1007/s11042-018-5795-x. Online publication date: 1-Feb-2019
  • (2018) Improving Bag-of-Visual-Words model using visual n-grams for human action classification. Expert Systems with Applications: An International Journal 92:C (182-191). DOI: 10.1016/j.eswa.2017.09.016. Online publication date: 1-Feb-2018
  • (2017) Local Pattern Collocations Using Regional Co-occurrence Factorization. IEEE Transactions on Multimedia 19:3 (492-505). DOI: 10.1109/TMM.2016.2619912. Online publication date: 1-Mar-2017
  • (2016) ShapeLearner. Proceedings of the Twenty-second European Conference on Artificial Intelligence (435-443). DOI: 10.3233/978-1-61499-672-9-435. Online publication date: 29-Aug-2016
  • (2016) Extended Discriminative Spatial Pyramid. Proceedings of the 8th International Conference on Signal Processing Systems (51-55). DOI: 10.1145/3015166.3015182. Online publication date: 21-Nov-2016
  • (2016) Fast beta wavelet network-based feature extraction for image copy detection. Neurocomputing 173:P2 (306-316). DOI: 10.1016/j.neucom.2015.04.113. Online publication date: 15-Jan-2016