research-article

Searching a specific person in a specific location using deep features

Authors:

Vinh-Tiep Nguyen,

Minh-Triet Tran,

Duc Anh Duong

SoICT '16: Proceedings of the 7th Symposium on Information and Communication Technology

Pages 79 - 86

https://doi.org/10.1145/3011077.3011138

Published: 08 December 2016

Abstract

Video instance search, also known as object retrieval, is a fundamental task in computer vision with many applications. Most state-of-the-art systems are based on the Bag-of-Words (BOW) model for representing video frames and target objects. When searching for nearly planar, richly textured objects such as buildings and book covers, BOW is considered a suitable model and achieves very high performance. However, when searching for harder but more common objects such as a specific person, the BOW model still performs considerably worse. In this paper, we consider a new type of query that covers most popular topics: searching for a specific person in a specific location. Inspired by recent successes of deep learning techniques, we propose a new framework that leverages the strengths of both the BOW model and deep features for instance search. In particular, we use a linear kernel classifier instead of the L₂ distance to compute the similarity between two deep features. For further improvement, scene tracking is employed to handle cases in which the face of the query person is not detected. To evaluate the proposed methods, we conduct experiments on a standard benchmark dataset (TRECVID Instance Search 2016) containing more than 300 GB of data and 464 hours of video. The results show that our proposed methods significantly improve on the baseline system.
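The abstract's key change is to score a candidate's deep feature with a linear kernel classifier rather than by its L₂ distance to the query feature. The page does not give the classifier's training details, so the sketch below assumes a linear SVM fitted by plain stochastic subgradient descent on the hinge loss, using features of the query person as positives and features of other people as negatives. All function names, hyperparameters (`lr`, `lam`, `epochs`), and the toy 2-D vectors are hypothetical; real deep face descriptors have far more dimensions.

```python
# Hypothetical sketch: linear-kernel SVM scoring vs. L2-distance ranking of deep features.
# Not the authors' implementation; trainer, names, and data are illustrative assumptions.
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, i.e. the linear kernel k(a, b) = a . b."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Vector, b: Vector) -> float:
    """Baseline similarity: Euclidean distance between two features (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_linear_svm(pos: List[Vector], neg: List[Vector], lr: float = 0.1,
                     lam: float = 0.01, epochs: int = 100) -> Tuple[List[float], float]:
    """Fit weights w and bias b by SGD on the L2-regularized hinge loss.

    pos: deep features of the query person (e.g. faces from the query images).
    neg: deep features of other people.
    """
    data = [(x, 1.0) for x in pos] + [(x, -1.0) for x in neg]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (dot(w, x) + b)
            w = [(1.0 - lr * lam) * wi for wi in w]  # weight decay from the regularizer
            if margin < 1.0:  # hinge loss is active: step toward the correct side
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def rank_by_l2(query: Vector, database: List[Vector]) -> List[int]:
    """Database indices sorted by ascending L2 distance to a single query feature."""
    return sorted(range(len(database)), key=lambda i: l2_distance(query, database[i]))


def rank_by_classifier(w: Vector, b: float, database: List[Vector]) -> List[int]:
    """Database indices sorted by descending classifier score w . x + b."""
    return sorted(range(len(database)), key=lambda i: -(dot(w, database[i]) + b))


if __name__ == "__main__":
    # Toy 2-D "features" standing in for high-dimensional deep descriptors.
    pos = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
    neg = [[0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
    w, b = train_linear_svm(pos, neg)
    shots = [[0.05, 0.95], [0.95, 0.15]]
    print(rank_by_classifier(w, b, shots))  # [1, 0]
    print(rank_by_l2(pos[0], shots))        # [1, 0]
```

On this easy toy data both rules agree. One plausible motivation for the classifier is that L₂ ranking compares each shot against a single query feature, whereas w · x + b pools all query examples and uses the negatives to learn which feature directions separate the query person from others.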


Index Terms

Searching a specific person in a specific location using deep features
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Visual content-based indexing and retrieval


Published In

ACM Other conferences

SoICT '16: Proceedings of the 7th Symposium on Information and Communication Technology

December 2016

442 pages

ISBN:9781450348157

DOI:10.1145/3011077

General Chairs: Nguyen Manh Hung (NTT University, Vietnam), Huynh Quyet Thang (HUST, Vietnam)

Program Chairs: Luc De Raedt (KULeuven, Belgium), Yves Deville (UCLouvain, Belgium), Marc Bui (EPHE, France), Truong Thi Dieu Linh (HUST, Vietnam)

Publications Chairs: Dinh Viet Sang (HUST, Vietnam), Nguyen Hong Phuong (HUST, Vietnam), Nguyen Thi Oanh (HUST, Vietnam)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Vietnam National University HoChiMinh City (VNU-HCM)

Conference

SoICT '16

SoICT '16: Seventh International Symposium on Information and Communication Technology

December 8 - 9, 2016

Ho Chi Minh City, Vietnam

Acceptance Rates

SoICT '16 Paper Acceptance Rate 58 of 132 submissions, 44%;

Overall Acceptance Rate 147 of 318 submissions, 46%
