Abstract
Large-scale video data on the web carry rich semantics and are an important part of the semantic web. Video descriptors, such as the scale-invariant feature transform (SIFT) feature, can represent these semantics to some extent and therefore play an important role in web multimedia content analysis. In this paper, we propose a new video descriptor, the temporal-compress and shorter SIFT (TC-S-SIFT), which represents the semantics of web videos efficiently and effectively. By omitting the least discriminative orientation in three stages of standard SIFT on every representative frame, the shorter SIFT reduces the descriptor from 128 to 96 dimensions, which saves storage space. The SIFT features are then compressed by tracing them along the temporal domain of the video, which greatly reduces the number of local features, and hence the visual redundancy, while largely preserving robustness and discriminative power. Experimental results show that our method achieves comparable accuracy with a compact storage size.
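The two reductions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard SIFT layout of 4×4 spatial cells with 8 orientation bins each (128 = 16 × 8), stored with the orientation bin as the fastest-varying index, so that dropping two bins per cell leaves 16 × 6 = 96 values. Which bins are the least discriminative is not stated in the abstract, so the `dropped_bins` default is purely hypothetical, as are the function names, the greedy nearest-neighbour tracking rule, and the `max_dist` threshold.

```python
import numpy as np

N_CELLS, N_BINS = 16, 8  # assumed standard SIFT layout: 4x4 cells, 8 orientation bins


def shorten_descriptors(desc, dropped_bins=(1, 5)):
    """Drop orientation bins from (n, 128) descriptors, giving (n, 96).

    dropped_bins is a hypothetical choice; the abstract does not say
    which orientation bins are omitted.
    """
    desc = np.asarray(desc, dtype=np.float32).reshape(-1, N_CELLS, N_BINS)
    keep = [b for b in range(N_BINS) if b not in dropped_bins]
    short = desc[:, :, keep].reshape(len(desc), N_CELLS * len(keep))
    # Re-normalise to unit length so distances stay comparable.
    norms = np.linalg.norm(short, axis=1, keepdims=True)
    return short / np.maximum(norms, 1e-12)


def compress_tracks(frames, max_dist=0.4):
    """Greedy temporal tracking that keeps one descriptor per track.

    frames: list of (n_i, d) arrays, one per representative frame.
    A descriptor joins a track seen in the previous frame if it lies
    within max_dist (an assumed threshold) of that track's stored
    descriptor; otherwise it starts a new track.
    """
    tracks = []   # stored descriptor of each track (its first observation)
    active = []   # indices of tracks observed in the previous frame
    for desc in frames:
        next_active, used = [], set()
        for d in desc:
            best, best_dist = None, max_dist
            for t in active:
                if t in used:
                    continue  # at most one feature per track per frame
                dist = float(np.linalg.norm(tracks[t] - d))
                if dist < best_dist:
                    best, best_dist = t, dist
            if best is None:
                tracks.append(d)
                best = len(tracks) - 1
            else:
                used.add(best)
            next_active.append(best)
        active = next_active
    return np.stack(tracks) if tracks else np.empty((0, 0), dtype=np.float32)
```

In this sketch, a feature that reappears almost unchanged in the next frame is stored once rather than twice, which is the sense in which temporal tracing reduces the number of local features. Matching against the first observation of a track is a simplification; a fuller tracker would also use keypoint position and scale.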
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhu, Y., Jiang, C., Huang, X., Xiao, Z., Zhong, S. (2015). A Temporal-Compress and Shorter SIFT Research on Web Videos. In: Zhang, S., Wirsing, M., Zhang, Z. (eds) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_78
DOI: https://doi.org/10.1007/978-3-319-25159-2_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25158-5
Online ISBN: 978-3-319-25159-2
eBook Packages: Computer Science, Computer Science (R0)