DOI: 10.1145/3341105.3373877
research-article

M2P3: multimodal multi-pedestrian path prediction by self-driving cars with egocentric vision

Published: 30 March 2020

Abstract

Accurate prediction of the future positions of pedestrians in traffic scenarios is required for the safe navigation of an autonomous vehicle, but it remains a challenge. This concerns, in particular, the effective and efficient multimodal prediction of the most likely trajectories of tracked pedestrians from the egocentric view of a self-driving car. In this paper, we present a novel solution, named M2P3, which combines a conditional variational autoencoder with a recurrent neural network encoder-decoder architecture in order to predict a set of possible future locations of each pedestrian in a traffic scene. The M2P3 system uses a sequence of RGB images delivered by an internal vehicle-mounted camera for egocentric vision. It takes only two modes as input, namely the past trajectories and the scales of pedestrians, and outputs the three most likely paths for each tracked pedestrian. Experimental evaluation of the proposed architecture on the JAAD and ETH/UCY datasets reveals that the M2P3 system significantly outperforms selected state-of-the-art solutions.
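The input/output contract described in the abstract can be illustrated with a small sketch. This is not the M2P3 implementation: the bounding-box encoding of "trajectory" and "scale" as centre and width/height, the function names `boxes_to_features` and `sample_futures`, and the toy decoder (a constant-velocity rollout perturbed by a latent sample) are all assumptions made for illustration. What it shows is only the general idea behind conditional variational autoencoder inference: drawing several latent vectors from the prior and decoding each against the same observed past yields several candidate futures.

```python
import numpy as np

def boxes_to_features(boxes):
    """Convert (T, 4) boxes [x1, y1, x2, y2] to (T, 4) rows [cx, cy, w, h].

    Assumed encoding: centre = trajectory mode, width/height = scale mode.
    """
    boxes = np.asarray(boxes, dtype=float)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return np.stack([cx, cy, w, h], axis=1)

def sample_futures(past, horizon, k=3, seed=0):
    """Toy stand-in for CVAE inference: one future path per latent sample."""
    rng = np.random.default_rng(seed)
    centers = past[:, :2]
    velocity = centers[-1] - centers[-2]         # last observed displacement
    steps = np.arange(1, horizon + 1)[:, None]   # shape (horizon, 1)
    futures = []
    for _ in range(k):
        z = rng.standard_normal(2)               # z ~ N(0, I), the usual CVAE prior
        # Hypothetical decoder: z perturbs a constant-velocity rollout.
        futures.append(centers[-1] + steps * (velocity + z))
    return np.stack(futures)                     # shape (k, horizon, 2)

# Three observed frames of one tracked pedestrian (pixel coordinates, made up).
boxes = [[100, 50, 120, 110], [104, 50, 124, 112], [108, 50, 128, 114]]
past = boxes_to_features(boxes)
futures = sample_futures(past, horizon=5)
print(futures.shape)  # (3, 5, 2)
```

In a trained model the decoder would be a recurrent network conditioned on an encoding of the past sequence; the random perturbation here merely stands in for that learned mapping.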




        Published In

        SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing
        March 2020
        2348 pages
        ISBN:9781450368667
        DOI:10.1145/3341105
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. autonomous driving
        2. multi-pedestrian path prediction

        Qualifiers

        • Research-article

        Funding Sources

        • Bundesministerium für Bildung und Forschung (BMBF)

        Conference

        SAC '20
        Sponsor:
        SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing
        March 30 - April 3, 2020
        Brno, Czech Republic

        Acceptance Rates

        Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

        Article Metrics

        • Downloads (last 12 months): 49
        • Downloads (last 6 weeks): 4
        Reflects downloads up to 30 Aug 2024

        Cited By

        • (2024) A deep pedestrian trajectory generator for complex indoor environments. Transactions in GIS 28:2 (411-432). DOI: 10.1111/tgis.13143. Online publication date: 15-Feb-2024
        • (2024) MapFlow: Multi-Agent Pedestrian Trajectory Prediction Using Normalizing Flow. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3295-3299). DOI: 10.1109/ICASSP48485.2024.10448062. Online publication date: 14-Apr-2024
        • (2023) Pedestrians and Cyclists’ Intention Estimation for the Purpose of Autonomous Driving. International Journal of Automotive Engineering 14:1 (10-19). DOI: 10.20485/jsaeijae.14.1_10. Online publication date: 2023
        • (2023) Context-empowered Visual Attention Prediction in Pedestrian Scenarios. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (950-960). DOI: 10.1109/WACV56688.2023.00101. Online publication date: Jan-2023
        • (2023) Protecting Vulnerable Road Users: Semantic Video Analysis for Accident Prediction. 2023 IEEE Symposium Series on Computational Intelligence (SSCI) (463-469). DOI: 10.1109/SSCI52147.2023.10371809. Online publication date: 5-Dec-2023
        • (2023) Uncertainty-Aware Pseudo Labels for Domain Adaptation in Pedestrian Trajectory Prediction. 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC) (5771-5777). DOI: 10.1109/ITSC57777.2023.10421945. Online publication date: 24-Sep-2023
        • (2023) A Dual Perspective of Human Motion Analysis - 3D Pose Estimation and 2D Trajectory Prediction. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (2181-2191). DOI: 10.1109/ICCVW60793.2023.00233. Online publication date: 2-Oct-2023
        • (2023) Predicting pedestrian trajectories at different densities: A multi-criteria empirical analysis. Physica A: Statistical Mechanics and its Applications (129440). DOI: 10.1016/j.physa.2023.129440. Online publication date: Dec-2023
        • (2022) Future Object Localization in Autonomous Driving Using Ego-Centric Images and Motions. 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (1035-1039). DOI: 10.23919/APSIPAASC55919.2022.9980234. Online publication date: 7-Nov-2022
        • (2022) Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-Based Approaches. IEEE Transactions on Intelligent Transportation Systems 23:12 (24126-24144). DOI: 10.1109/TITS.2022.3205676. Online publication date: Dec-2022
