
Using context from inside‐out vision for improved activity recognition

Published: 03 January 2018


The authors propose a method that improves activity recognition by adding contextual information from first‐person vision (FPV). The context, i.e. the objects seen while performing an activity, increases activity recognition precision because, in goal‐oriented tasks, human gaze precedes the action and tends to focus on the relevant objects. Object information is extracted from FPV images and combined with activity information from external or FPV videos to train an artificial neural network (ANN). Four configurations, formed from combinations of a gaze/eye‐tracker, head‐mounted cameras and externally mounted cameras, were evaluated on three standard cooking datasets: Georgia Tech Egocentric Activities Gaze, the Technische Universität München kitchen dataset and the CMU multi‐modal activity database. Adding object information when training the ANN raised the average precision of activity recognition from 58.02% to 74.03% and the average accuracy from 89.78% to 93.42%. The experiments also showed that an external camera is necessary when objects are not considered. When objects are considered, however, the combination of internal and external cameras is optimal because of their complementary advantages in observing the hands and the objects. Adding object information also reduced the number of ANN training cycles from 513.25 to 139, which indicates that it provides critical information that speeds up training.
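The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the object vocabulary, the function names, the binary presence encoding and the macro-averaged definition of precision are all assumptions made for illustration, since the abstract does not specify them. It only shows the general idea of appending object context to an activity feature vector before training a classifier, and how accuracy and precision could be scored afterwards.

```python
from typing import List, Sequence

# Hypothetical object vocabulary (assumption); the abstract does not list the objects used.
OBJECT_VOCAB: List[str] = ["knife", "bowl", "pan", "cup"]


def object_presence(seen: Sequence[str],
                    vocab: Sequence[str] = OBJECT_VOCAB) -> List[float]:
    """Encode the objects seen in FPV frames as a binary presence vector."""
    seen_set = set(seen)
    return [1.0 if name in seen_set else 0.0 for name in vocab]


def fuse(activity_features: Sequence[float], seen: Sequence[str]) -> List[float]:
    """Concatenate activity features with the object-context vector."""
    return list(activity_features) + object_presence(seen)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class precision averaged over classes (assumed definition)."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        # True labels of all samples that were predicted as class c.
        hits = [t for t, p in zip(y_true, y_pred) if p == c]
        per_class.append(sum(t == c for t in hits) / len(hits) if hits else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Two made-up activity features plus the object context for one sample.
    print(fuse([0.2, 0.7], ["knife"]))
    truth = ["stir", "cut", "stir", "pour"]
    guess = ["stir", "cut", "cut", "pour"]
    print(accuracy(truth, guess), macro_precision(truth, guess))
```

The fused vector would then be the input to whichever classifier is trained; the sketch deliberately leaves the ANN itself out because its architecture is not described in the abstract.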




Information & Contributors


Published In

IET Computer Vision  Volume 12, Issue 3
April 2018
131 pages


John Wiley & Sons, Inc.

United States


Author Tags

  1. gaze tracking
  2. neural nets
  3. cameras
  4. gesture recognition

Author Tags

  1. inside-out vision
  2. improved activity recognition
  3. contextual information
  4. first-person vision
  5. activity recognition precision
  6. human gaze
  7. object information
  8. FPV images
  9. FPV videos
  10. artificial neural network
  11. gaze-eye-tracker
  12. head-mounted cameras
  13. externally mounted cameras
  14. standard cooking datasets
  15. Georgia Tech Egocentric Activities Gaze
  16. Technische Universitat Munchen kitchen
  17. CMU multimodal activity database
  18. ANN training cycles


  • Research-article

