research-article

M2P3: multimodal multi-pedestrian path prediction by self-driving cars with egocentric vision

Authors:

Atanas Poibrenski,

Matthias Klusch,

Christian Müller

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

Pages 190 - 197

https://doi.org/10.1145/3341105.3373877

Published: 30 March 2020

Abstract

Accurate prediction of the future positions of pedestrians in traffic scenarios is required for the safe navigation of an autonomous vehicle, but it remains a challenge. This concerns, in particular, the effective and efficient multimodal prediction of the most likely trajectories of tracked pedestrians from the egocentric view of a self-driving car. In this paper, we present a novel solution, named M2P3, which combines a conditional variational autoencoder (CVAE) with a recurrent neural network (RNN) encoder-decoder architecture to predict a set of possible future locations for each pedestrian in a traffic scene. The M2P3 system uses a sequence of RGB images delivered by an internal vehicle-mounted camera for egocentric vision. It takes only two input modes, the past trajectories and the scales of the pedestrians, and outputs the three most likely paths for each tracked pedestrian. Experimental evaluation of the proposed architecture on the JAAD and ETH/UCY datasets reveals that the M2P3 system significantly outperforms selected state-of-the-art solutions.
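The abstract names three ingredients: an RNN encoder over each pedestrian's past observations, a CVAE latent variable that makes the decoder stochastic, and a decoder that rolls out several candidate futures. The following is a minimal, dependency-free Python sketch of that inference path under stated assumptions; it is not the authors' implementation. The per-frame (x, y, scale) input layout, the fixed toy weights, the standard normal prior N(0, I), the constant-velocity term inside the decoder, and all function names (`encode`, `sample_latent`, `decode`, `predict_paths`) are hypothetical. Training (the recognition network q(z | past, future) and the KL term) is omitted, and the sketch simply draws three latent samples rather than ranking paths by likelihood, because the abstract does not say how M2P3 selects its three most likely paths.

```python
import math
import random
from typing import List, Tuple

# Hypothetical input layout: image-plane position (x, y) plus the pedestrian's scale.
Step = Tuple[float, float, float]
Point = Tuple[float, float]


def encode(past: List[Step], hidden_size: int = 4) -> List[float]:
    """Toy RNN encoder: a tanh recurrence with fixed, hypothetical weights.

    A trained model would learn these weights (e.g. in an LSTM or GRU cell).
    """
    h = [0.0] * hidden_size
    for x, y, s in past:
        h = [math.tanh(0.5 * h[i] + 0.01 * (i + 1) * x - 0.01 * y + 0.1 * s)
             for i in range(hidden_size)]
    return h


def sample_latent(mu: List[float], log_var: List[float],
                  rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def decode(past: List[Step], h: List[float], z: List[float],
           horizon: int) -> List[Point]:
    """Toy decoder conditioned on the encoding h and the latent sample z.

    It extrapolates the last observed velocity and adds a z-driven offset,
    so different latent samples yield different future paths.
    Needs at least two past observations to estimate a velocity.
    """
    (x0, y0, _), (x1, y1, _) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    x, y = x1, y1
    path: List[Point] = []
    for _ in range(horizon):
        x += vx + 0.5 * z[0] * (1.0 + h[0])
        y += vy + 0.5 * z[1] * (1.0 + h[1])
        path.append((x, y))
    return path


def predict_paths(past: List[Step], k: int = 3, horizon: int = 5,
                  latent_dim: int = 2, seed: int = 0) -> List[List[Point]]:
    """Encode the track once, then decode k futures from k latent samples."""
    rng = random.Random(seed)
    h = encode(past)
    # Assumed prior N(0, I); a CVAE may instead use a learned conditional prior.
    mu, log_var = [0.0] * latent_dim, [0.0] * latent_dim
    return [decode(past, h, sample_latent(mu, log_var, rng), horizon)
            for _ in range(k)]


if __name__ == "__main__":
    past = [(100.0, 200.0, 1.00), (102.0, 200.5, 1.02), (104.0, 201.0, 1.04)]
    for i, path in enumerate(predict_paths(past)):
        print(f"mode {i}: final position {path[-1]}")
```

With the sample track above, the three printed end points differ because each path comes from a different latent draw. Setting z to zero reduces the toy decoder to plain constant-velocity extrapolation, which is a convenient sanity check when experimenting with such a sketch.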

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems

Information

Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

March 2020

2348 pages

ISBN: 9781450368667

DOI: 10.1145/3341105

Conference Chairs:
Chih-Cheng Hung, Kennesaw State University
Tomas Cerny, Baylor University

Program Chairs:
Dongwan Shin, New Mexico Tech
Alessio Bechini, University of Pisa, Italy

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Bundesministerium für Bildung und Forschung (BMBF)

Conference

SAC '20

Sponsor:

SIGAPP

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing

March 30 - April 3, 2020

Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Total Citations: 18

Total Downloads: 399

Downloads (last 12 months): 49

Downloads (last 6 weeks): 4

Reflects downloads up to 30 Aug 2024

Cited By

He Z, Zhang T, Wang W, Li J (2024). A deep pedestrian trajectory generator for complex indoor environments. Transactions in GIS 28:2, 411-432. Online publication date: 15-Feb-2024. https://doi.org/10.1111/tgis.13143

Stefani A, Bisagno N, Conci N (2024). MapFlow: Multi-Agent Pedestrian Trajectory Prediction Using Normalizing Flow. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3295-3299. Online publication date: 14-Apr-2024. https://doi.org/10.1109/ICASSP48485.2024.10448062

Capy S, Venture G, Raksincharoensak P (2023). Pedestrians and Cyclists’ Intention Estimation for the Purpose of Autonomous Driving. International Journal of Automotive Engineering 14:1, 10-19. Online publication date: 2023. https://doi.org/10.20485/jsaeijae.14.1_10

Vozniak I, Muller P, Hell L, Lipp N, Abouelazm A, Muller C (2023). Context-empowered Visual Attention Prediction in Pedestrian Scenarios. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 950-960. Online publication date: Jan-2023. https://doi.org/10.1109/WACV56688.2023.00101

Petzold J, Wahby M, Ziad Y, ElSheikh M, Dawood A, Berekovic M, Hamann H (2023). Protecting Vulnerable Road Users: Semantic Video Analysis for Accident Prediction. 2023 IEEE Symposium Series on Computational Intelligence (SSCI), 463-469. Online publication date: 5-Dec-2023. https://doi.org/10.1109/SSCI52147.2023.10371809

Poibrenski A, Nozarian F, Rezaeianaran F, Müller C (2023). Uncertainty-Aware Pseudo Labels for Domain Adaptation in Pedestrian Trajectory Prediction. 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), 5771-5777. Online publication date: 24-Sep-2023. https://doi.org/10.1109/ITSC57777.2023.10421945

Zaier M, Wannous H, Drira H, Boonaert J (2023). A Dual Perspective of Human Motion Analysis - 3D Pose Estimation and 2D Trajectory Prediction. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2181-2191. Online publication date: 2-Oct-2023. https://doi.org/10.1109/ICCVW60793.2023.00233

Korbmacher R, Dang H, Tordeux A (2023). Predicting pedestrian trajectories at different densities: A multi-criteria empirical analysis. Physica A: Statistical Mechanics and its Applications, 129440. Online publication date: Dec-2023. https://doi.org/10.1016/j.physa.2023.129440

Jo S, Lee J, Kang J (2022). Future Object Localization in Autonomous Driving Using Ego-Centric Images and Motions. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 1035-1039. Online publication date: 7-Nov-2022. https://doi.org/10.23919/APSIPAASC55919.2022.9980234

Korbmacher R, Tordeux A (2022). Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-Based Approaches. IEEE Transactions on Intelligent Transportation Systems 23:12, 24126-24144. Online publication date: Dec-2022. https://doi.org/10.1109/TITS.2022.3205676
