Abstract
The analysis of customer behavior from surveillance cameras is one of the most important open topics in marketing. Traditionally, retailers use cash register or credit card records to analyze customers' buying behavior. However, this information cannot reveal the behavior of a customer who shows interest in front of the merchandise shelf but does not buy. Such behavior can be recorded by a surveillance camera and then analyzed. We propose a system that classifies customer behavior in front of the shelf into six classes: no interest, viewing, turning the body to the shelf, touching, picking and returning to the shelf, and picking and putting into the basket. These classes reflect the customer's increasing interest in the products. The proposed system integrates multiple cues for customer behavior recognition: head orientation, body orientation, and arm action. It discretizes the customer's head and body orientation into 8 directions to estimate whether the customer is looking at or turning toward the merchandise shelf. A semi-supervised learning method is applied to optimize the training dataset and to generate an accurate classifier. In addition, a temporal constraint and a human physical model constraint are considered in the joint estimation of body and head orientation. For arm action recognition, a novel Combined Hand Feature (CHF), which includes the hand trajectory, the tracking status, and the relative position between the hand and the shopping basket, is proposed to classify different arm actions. Hand tracking is performed by an improved particle filter. The CHF is classified by a Dynamic Bayesian Network (DBN) to output the type of arm action. A series of experiments demonstrates the effectiveness of the proposed techniques and the performance of the developed system.
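The discretization of orientation into 8 directions can be illustrated with a minimal sketch. It assumes a continuous angle in degrees is already available, which is a simplification: the abstract describes estimating the direction with a learned classifier, not from a known angle. The names `discretize_orientation`, `is_facing_shelf`, `shelf_bin`, and `tolerance_bins` are hypothetical and chosen only for illustration.

```python
NUM_DIRECTIONS = 8
BIN_WIDTH = 360.0 / NUM_DIRECTIONS  # 45 degrees per direction bin


def discretize_orientation(angle_deg: float) -> int:
    """Map an angle in degrees to one of 8 direction bins (0..7).

    Assumed convention: bin k is centered on k * 45 degrees,
    so bin 0 covers the range [-22.5, 22.5).
    """
    # Shift by half a bin so that bin centers fall on multiples of 45.
    shifted = (angle_deg + BIN_WIDTH / 2.0) % 360.0
    # The final modulo guards against a floating-point result of exactly 360.0.
    return int(shifted // BIN_WIDTH) % NUM_DIRECTIONS


def is_facing_shelf(angle_deg: float, shelf_bin: int, tolerance_bins: int = 1) -> bool:
    """Return True if the discretized orientation is within
    `tolerance_bins` of the (assumed known) shelf direction bin."""
    bin_idx = discretize_orientation(angle_deg)
    diff = abs(bin_idx - shelf_bin) % NUM_DIRECTIONS
    # Directions wrap around, so measure the shorter way around the circle.
    circular_diff = min(diff, NUM_DIRECTIONS - diff)
    return circular_diff <= tolerance_bins


if __name__ == "__main__":
    for angle in (0.0, 90.0, 180.0, 350.0):
        print(angle, discretize_orientation(angle), is_facing_shelf(angle, shelf_bin=0))
```

The circular difference matters because bins 0 and 7 are neighbors; a plain absolute difference would treat them as far apart.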
Acknowledgments
The authors thank Envirosell Japan, Inc. for providing the test video. The faces of customers are blurred to protect their privacy. This research was permitted by the Compliance Committee of The University of Tokyo.
About this article
Cite this article
Liu, J., Gu, Y. & Kamijo, S. Customer behavior classification using surveillance camera for marketing. Multimed Tools Appl 76, 6595–6622 (2017). https://doi.org/10.1007/s11042-016-3342-1
DOI: https://doi.org/10.1007/s11042-016-3342-1