DOI: 10.1145/3011077.3011138
research-article

Searching a specific person in a specific location using deep features

Published: 08 December 2016

Abstract

Video instance search, also known as object retrieval, is a fundamental task in computer vision with many applications. Most state-of-the-art systems are based on the Bag-of-Words (BOW) model for representing video frames and the target object. For nearly planar, richly textured objects such as buildings and book covers, BOW has proven to be a suitable model with very high performance. However, for harder but more common query objects such as a specific person, BOW still performs poorly. In this paper, we consider a new type of query that covers many popular topics: searching for a specific person in a specific location. Inspired by recent successes of deep learning, we propose a new framework that leverages the strengths of both the BOW model and deep features for instance search. In particular, we use a linear-kernel classifier instead of the L2 distance to compute the similarity between two deep features. For further improvement, scene tracking is employed to handle cases in which the query person's face is not detected. To evaluate the proposed methods, we conduct experiments on a standard benchmark dataset (TRECVID Instance Search 2016) comprising more than 300 GB of data and 464 hours of video. The results show that our proposed methods significantly improve on the baseline system.
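The abstract replaces L2-distance matching of deep face features with a linear-kernel classifier, and uses scene tracking for shots in which the query person's face is not detected. The Python sketch below illustrates both ideas on toy data. It is not the authors' implementation, and every name in it (`rank_by_l2`, `train_linear_svm`, `svm_score`, `propagate_scene_scores`) is hypothetical. It assumes that (a) each face is already described by a deep feature vector, for example from a pretrained face-recognition CNN; (b) the classifier is a linear SVM trained with Pegasos-style stochastic subgradient descent, using the query's example faces as positives and faces of other people as negatives; and (c) "scene tracking" can be approximated by letting a shot with no detected face inherit the best face score among shots of the same scene. The paper's actual training data, classifier settings, tracking method, and fusion with the BOW location score may all differ.

```python
# Hypothetical sketch, not the paper's code: L2 ranking vs. a linear-kernel
# classifier over deep face features, plus a crude scene-level score propagation.
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_l2(query_faces: List[Vector], shot_faces: Dict[str, Vector]) -> List[str]:
    """Baseline: rank shots by the smallest L2 distance to any query face."""
    dist = {s: min(l2_distance(q, f) for q in query_faces) for s, f in shot_faces.items()}
    return sorted(dist, key=dist.get)


def train_linear_svm(pos: List[Vector], neg: List[Vector],
                     lam: float = 0.01, epochs: int = 100, seed: int = 0) -> List[float]:
    """Linear-kernel SVM (hinge loss + L2 penalty) trained with Pegasos-style SGD.

    Assumption: a constant 1.0 is appended to every vector so the last weight acts as a bias.
    """
    data = [(list(x) + [1.0], 1.0) for x in pos] + [(list(x) + [1.0], -1.0) for x in neg]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # hinge loss is active: move w towards y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w: List[float], face: Vector) -> float:
    """Decision value w . [face, 1]; larger means 'more likely the query person'."""
    return sum(wi * xi for wi, xi in zip(w, list(face) + [1.0]))


def propagate_scene_scores(shot_scores: Dict[str, Optional[float]],
                           scene_of: Dict[str, str]) -> Dict[str, float]:
    """Crude stand-in for scene tracking: a shot with no detected face (score None)
    inherits the best face score in its scene, or -inf if the scene has none."""
    best: Dict[str, float] = {}
    for shot, score in shot_scores.items():
        if score is not None:
            scene = scene_of[shot]
            best[scene] = max(best.get(scene, score), score)
    return {shot: score if score is not None else best.get(scene_of[shot], float("-inf"))
            for shot, score in shot_scores.items()}


# Toy 2-D "face descriptors"; real deep features would have hundreds of dimensions.
query = [[1.0, 0.1], [0.9, 0.2]]               # faces of the query person (positives)
others = [[0.1, 1.0], [0.2, 0.8], [0.0, 0.9]]  # faces of other people (negatives)
shots = {"s1": [0.95, 0.15], "s2": [0.1, 0.95]}

l2_ranking = rank_by_l2(query, shots)  # ["s1", "s2"]
w = train_linear_svm(query, others)
svm_scores = {s: svm_score(w, f) for s, f in shots.items()}

# s3 has a detected face; s4 shares its scene but its face was not detected.
tracked = propagate_scene_scores({"s3": 1.5, "s4": None, "s5": None},
                                 {"s3": "A", "s4": "A", "s5": "B"})
print(l2_ranking, svm_scores, tracked)
```

Once the weight vector is trained, scoring a shot costs a single dot product, whereas the L2 baseline compares each shot against every query face. The toy data here are deliberately easy, so both rankings agree; the sketch makes no claim about which scoring rule is more accurate on real data.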

    Information

    Published In

    ACM Other conferences
    SoICT '16: Proceedings of the 7th Symposium on Information and Communication Technology
    December 2016
    442 pages
    ISBN:9781450348157
    DOI:10.1145/3011077
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep neural network
    2. location search
    3. person search
    4. scene tracking
    5. video instance search

    Qualifiers

    • Research-article

    Funding Sources

    • Vietnam National University HoChiMinh City (VNU-HCM)

    Conference

    SoICT '16

    Acceptance Rates

    SoICT '16 Paper Acceptance Rate 58 of 132 submissions, 44%;
    Overall Acceptance Rate 147 of 318 submissions, 46%

    Article Metrics

    • Total citations: 0
    • Total downloads: 54
    • Downloads (last 12 months): 1
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 13 Sep 2024
