Learning hierarchical semantic description via mixed-norm regularization for image understanding

L Li, S Jiang, Q Huang - IEEE Transactions on Multimedia, 2012 - ieeexplore.ieee.org
IEEE Transactions on Multimedia, 2012ieeexplore.ieee.org
This paper proposes a new perspective-Vicept representation to solve the problem of visual
polysemia and concept polymorphism in the large-scale semantic image understanding.
Vicept characterizes the membership probability distribution between visual appearances
and semantic concepts, and forms a hierarchical representation of image semantic from
local to global. In the implementation, incorporating group sparse coding, visual appearance
is encoded as a weighted sum of dictionary elements, which could obtain more accurate …
This paper proposes a new perspective-Vicept representation to solve the problem of visual polysemia and concept polymorphism in the large-scale semantic image understanding. Vicept characterizes the membership probability distribution between visual appearances and semantic concepts, and forms a hierarchical representation of image semantic from local to global. In the implementation, incorporating group sparse coding, visual appearance is encoded as a weighted sum of dictionary elements, which could obtain more accurate image representation with sparsity at the image level. To obtain discriminative Vicept descriptions with structural sparsity, mixed-norm regularization is adopted in the optimization problem for learning the concept membership distribution of visual appearance. Furthermore, we introduce a novel image distance measurement based on the hierarchical Vicept description, where different levels of Vicept distance are fused together by multi-level separability analysis. Finally, the wide applications of Vicept description are validated in our experiments, including large-scale semantic image search, image annotation, and semantic image re-ranking.
ieeexplore.ieee.org