Aniket Bera

Papers by Aniket Bera

ARC: Alignment-based Redirection Controller for Redirected Walking in Complex Environments

We present a novel redirected walking controller based on alignment that allows the user to explore large and complex virtual environments, while minimizing the number of collisions with obstacles in the physical environment. Our alignment-based redirection controller, ARC, steers the user such that their proximity to obstacles in the physical environment matches the proximity to obstacles in the virtual environment as closely as possible. To quantify a controller's performance in complex environments, we introduce a new metric, Complexity Ratio (CR), to measure the relative environment complexity and characterize the difference in navigational complexity between the physical and virtual environments. Through extensive simulation-based experiments, we show that ARC significantly outperforms current state-of-the-art controllers in its ability to steer the user on a collision-free path. We also show through quantitative and qualitative measures of performance that our controller i...
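The alignment idea can be illustrated with a small sketch. It assumes, purely for illustration, that each environment is a set of circular obstacles and that the user's surroundings are summarized by the distance to the nearest obstacle along a few evenly spaced rays; the ray count, the 10 m sensing range, and all function names are hypothetical and not taken from the ARC paper. Misalignment is the summed absolute difference between physical and virtual ray distances, and a controller in this spirit would prefer the physical heading that minimizes it.

```python
import math


def ray_distance(origin, angle, obstacles, max_range=10.0):
    """Distance from origin along angle to the nearest circular obstacle (x, y, r).

    Returns max_range when no obstacle is hit within range.
    """
    dx, dy = math.cos(angle), math.sin(angle)
    best = max_range
    for ox, oy, r in obstacles:
        # Solve |origin + t*(dx, dy) - center|^2 = r^2 for the smallest t >= 0.
        fx, fy = origin[0] - ox, origin[1] - oy
        b = fx * dx + fy * dy
        c = fx * fx + fy * fy - r * r
        disc = b * b - c
        if disc < 0:
            continue  # the ray misses this circle
        t = -b - math.sqrt(disc)
        if t < 0:
            t = -b + math.sqrt(disc)  # origin is inside the circle
        if 0 <= t < best:
            best = t
    return best


def misalignment(phys_pos, phys_heading, phys_obs,
                 virt_pos, virt_heading, virt_obs, n_rays=8):
    """Sum of |physical - virtual| obstacle distances over rays relative to each heading."""
    total = 0.0
    for k in range(n_rays):
        offset = 2 * math.pi * k / n_rays
        d_phys = ray_distance(phys_pos, phys_heading + offset, phys_obs)
        d_virt = ray_distance(virt_pos, virt_heading + offset, virt_obs)
        total += abs(d_phys - d_virt)
    return total


def best_heading(candidates, phys_pos, phys_obs, virt_pos, virt_heading, virt_obs):
    """Pick the candidate physical heading whose surroundings best match the virtual ones."""
    return min(candidates, key=lambda h: misalignment(
        phys_pos, h, phys_obs, virt_pos, virt_heading, virt_obs))
```

For example, with one obstacle 3 m ahead physically and one 3 m to the left virtually, `best_heading` picks the physical heading that rotates the two layouts into agreement.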

The Socially Invisible Robot: Navigation in the Social World using Robot Entitativity

We present a real-time, data-driven algorithm to enhance the social invisibility of robots within crowds. Our approach is based on prior psychological research, which reveals that people notice, and importantly react negatively to, groups of social actors when they have high entitativity, moving in a tight group with similar appearances and trajectories. In order to evaluate that behavior, we performed a user study to develop navigational algorithms that minimize entitativity. This study establishes a mapping between emotional reactions and multi-robot trajectories and appearances and further generalizes the finding across various environmental conditions. We demonstrate the applicability of our entitativity modeling for trajectory computation for active surveillance and dynamic intervention in simulated robot-human interaction scenarios. Our approach empirically shows that various levels of entitative robots can be used to both avoid and influence pedestrians while not eliciting st...

Realtime Pedestrian Tracking and Prediction in Dense Crowds

Group and Crowd Behavior for Computer Vision

The Emotionally Intelligent Robot: Improving Social Navigation in Crowded Environments

ArXiv, 2019

We present a real-time algorithm for emotion-aware navigation of a robot among pedestrians. Our approach estimates time-varying emotional behaviors of pedestrians from their faces and trajectories using a combination of Bayesian inference, CNN-based learning, and the PAD (Pleasure-Arousal-Dominance) model from psychology. These PAD characteristics are used for long-term path prediction and generating proxemic constraints for each pedestrian. We use a multi-channel model to classify pedestrian characteristics into four emotion categories (happy, sad, angry, neutral). In our validation results, we observe an emotion detection accuracy of 85.33%. We formulate emotion-based proxemic constraints to perform socially-aware robot navigation in low- to medium-density environments. We demonstrate the benefits of our algorithm in simulated environments with tens of pedestrians as well as in a real-world setting with Pepper, a social humanoid robot.

Using Graph-Theoretic Machine Learning to Predict Human Driver Behavior

IEEE Transactions on Intelligent Transportation Systems

CMetric: A Driving Behavior Measure using Centrality Functions

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

We present a new measure, CMetric, to classify driver behaviors using centrality functions. Our formulation combines concepts from computational graph theory and social traffic psychology to quantify and classify the behavior of human drivers. CMetric is used to compute the probability of a vehicle executing a driving style, as well as the intensity used to execute the style. Our approach is designed for realtime autonomous driving applications, where the trajectory of each vehicle or road-agent is extracted from a video. We compute a dynamic geometric graph (DGG) based on the positions and proximity of the road-agents and centrality functions corresponding to closeness and degree. These functions are used to compute the CMetric based on style likelihood and style intensity estimates. Our approach is general and makes no assumption about traffic density, heterogeneity, or how driving behaviors change over time. We present an algorithm to compute CMetric and demonstrate its performan...
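The dynamic geometric graph and the two centrality functions named above can be sketched in a few lines. This is a minimal illustration under assumptions: agents are linked when they are within a hypothetical distance threshold `radius`, degree is the raw neighbor count, and closeness uses unweighted hop distances over the reachable agents. How CMetric turns these per-frame values into style likelihood and intensity is not described in the abstract and is not attempted here.

```python
import math
from collections import deque


def build_dgg(positions, radius):
    """Adjacency sets for one frame: agents i and j are linked if within radius."""
    n = len(positions)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_centrality(adj):
    """Raw degree: number of neighbors of each agent."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-agent count divided by the sum of BFS hop distances (0 if isolated)."""
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[src] = reachable / total if total > 0 else 0.0
    return result


def centrality_series(frames, radius):
    """Per-frame (degree, closeness) dictionaries for a sequence of position lists."""
    series = []
    for positions in frames:
        adj = build_dgg(positions, radius)
        series.append((degree_centrality(adj), closeness_centrality(adj)))
    return series
```

Rebuilding the graph every frame is what makes it "dynamic": each agent gets a time series of centrality values that a downstream behavior measure could consume.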

Socially Invisible Navigation for Intelligent Vehicles

We present a real-time, data-driven algorithm to enhance the social invisibility of autonomous vehicles within crowds. Our approach is based on prior psychological research, which reveals that people notice, and importantly react negatively to, groups of social actors when they have high entitativity, moving in a tight group with similar appearances and trajectories. In order to evaluate that behavior, we performed a user study to develop navigational algorithms that minimize entitativity. This study establishes a mapping between emotional reactions and multi-robot trajectories and appearances, and further generalizes the finding across various environmental conditions. We demonstrate the applicability of our entitativity modeling for trajectory computation for active surveillance and dynamic intervention in simulated robot-human interaction scenarios. Our approach empirically shows that various levels of entitative robots can be used to both avoid and influence pedestrians while not el...

Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders

ArXiv, 2020

We present a novel generalized zero-shot algorithm to recognize perceived emotions from gestures. Our task is to map gestures to novel emotion categories not encountered in training. We introduce an adversarial, autoencoder-based representation learning approach that correlates 3D motion-captured gesture sequences with the vectorized representation of the natural-language perceived emotion terms using word2vec embeddings. The language-semantic embedding provides a representation of the emotion label space, and we leverage this underlying distribution to map the gesture sequences to the appropriate categorical emotion labels. We train our method using a combination of gestures annotated with known emotion terms and gestures not annotated with any emotions. We evaluate our method on the MPI Emotional Body Expressions Database (EBEDB) and obtain an accuracy of 58.43%. This improves the performance of current state-of-the-art algorithms for generalized zero-shot learning by 25-27% on the...
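The final step of generalized zero-shot classification, assigning a gesture to whichever emotion term lies closest in the shared semantic space, can be sketched as a nearest-neighbor search under cosine similarity. The three-dimensional label vectors below are toy stand-ins, not real word2vec embeddings, and `projected` stands for the output of a hypothetical trained gesture encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def zero_shot_label(projected, label_embeddings):
    """Return the emotion term whose embedding is most similar to the projected gesture.

    label_embeddings may include terms never seen during training, which is what
    makes the classification zero-shot.
    """
    return max(label_embeddings,
               key=lambda name: cosine(projected, label_embeddings[name]))
```

Because labels are compared only through their embeddings, adding a new emotion term requires just its vector, not retraining the gesture encoder.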

Identifying Emotions from Walking using Affective and Deep Features

ArXiv, 2019

We present a new data-driven model and algorithm to identify the perceived emotions of individuals based on their walking styles. Given an RGB video of an individual walking, we extract his/her walking gait in the form of a series of 3D poses. Our goal is to exploit the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral. Our perceived emotion recognition approach uses deep features learned via LSTM on labeled emotion datasets. Furthermore, we combine these features with affective features computed from gaits using posture and movement cues. These features are classified using a Random Forest Classifier. We show that our mapping between the combined feature space and the perceived emotional state provides 80.07% accuracy in identifying the perceived emotions. In addition to classifying discrete categories of emotions, our algorithm also predicts the values of perceived valence and arousal from gaits. We also present an ...

PedLearn: Realtime Pedestrian Tracking, Behavior Learning, and Navigation for Autonomous Vehicles

We present a real-time tracking algorithm for extracting the trajectory of each pedestrian in a crowd video using a combination of non-linear motion models and learning methods. These motion models are based on new collision-avoidance and local navigation algorithms that provide improved accuracy in dense settings. The resulting tracking algorithm can handle dense crowds with tens of pedestrians at realtime rates (25-30 fps). We also give an overview of techniques that combine these motion models with global movement patterns and Bayesian inference to predict the future position of each pedestrian over a long time horizon. The combination of local and global features enables us to accurately predict the trajectory of each pedestrian in a dense crowd at realtime rates. We highlight the performance of the algorithm in real-world crowd videos with medium crowd density.

Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents

This work has been supported in part by ARO Grants W911NF1910069 and W911NF1910315, and Intel. Code and additional materials available at: https://gamma.umd.edu/t2g

2021 IEEE Virtual Reality and 3D User Interfaces (VR), 2021

We present Text2Gestures, a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs. Our method generates emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. We also consider the intended task corresponding to the text and the target virtual agents' intended gender and handedness in our generation pipeline. We train and evaluate our network on the MPI Emotional Body Expressions Database and observe that our network produces state-of-the-art performance in generating gestures for virtual agents aligned with the text for narration or conversation. Our network can generate these gestures at interactive rates on a commodity GPU. We conduct a web-based user study and observe that around 91% of participants indicated our generated gestures to be at least plausible on a five-point Likert Scale. The emotions perc...

Efficient trajectory extraction and parameter learning for data-driven crowd simulation

We present a trajectory extraction and behavior-learning algorithm for data-driven crowd simulation. Our formulation is based on incrementally learning pedestrian motion models and behaviors from crowd videos. We combine this learned crowd-simulation model with an online tracker based on particle filtering to compute accurate, smooth pedestrian trajectories. We refine this motion model using an optimization technique to estimate the agents' simulation parameters. We highlight the benefits of our approach for improved data-driven crowd simulation, including crowd replication from videos and merging the behavior of pedestrians from multiple videos. We highlight our algorithm's performance in various test scenarios containing tens of human-like agents.
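A single predict, weight, and resample cycle of a particle-filter tracker looks roughly like the sketch below. It assumes a constant-velocity motion model with Gaussian noise in place of the learned crowd-simulation model the abstract describes, and a Gaussian likelihood around one detector observation; the noise parameters and function name are hypothetical.

```python
import math
import random


def particle_filter_step(particles, velocity, observation,
                         motion_noise, obs_noise, rng):
    """One predict-weight-resample cycle for a single pedestrian's 2D position."""
    # Predict: propagate each particle with an assumed constant-velocity model plus noise.
    predicted = [(x + velocity[0] + rng.gauss(0.0, motion_noise),
                  y + velocity[1] + rng.gauss(0.0, motion_noise))
                 for x, y in particles]
    # Weight: Gaussian likelihood of the detector's observed position.
    weights = [math.exp(-((x - observation[0]) ** 2 + (y - observation[1]) ** 2)
                        / (2.0 * obs_noise ** 2))
               for x, y in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # all weights underflowed: fall back to uniform
    # Resample with replacement in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    # State estimate: mean of the resampled particles.
    n = len(resampled)
    estimate = (sum(p[0] for p in resampled) / n,
                sum(p[1] for p in resampled) / n)
    return resampled, estimate


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [(0.0, 0.0)] * 200
    for t in range(1, 6):
        particles, est = particle_filter_step(
            particles, (1.0, 0.0), (float(t), 0.0), 0.3, 0.5, rng)
        print(t, est)
```

Swapping the constant-velocity line for a call into a crowd simulator is where a data-driven motion model would plug in.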

STD-PD: Generating Synthetic Training Data for Pedestrian Detection in Unannotated Videos

ArXiv, 2017

We present a new method for training pedestrian detectors on an unannotated image set, which is captured by a moving camera with a fixed height and angle from the ground. Our approach is general and robust, and makes no other assumptions about the image dataset or the number of pedestrians. We automatically extract the vanishing point and the pedestrians’ scale to calibrate the virtual camera and generate a probability map for the pedestrians to spawn. Using these features, we overlay synthetic human-like agents in proper locations on the images from the unannotated dataset. We also present novel techniques to increase the realism of these synthetic agents and use the augmented images to train a Faster R-CNN detector. Our approach improves the accuracy by 12-13% over prior methods for unannotated image datasets.
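One standard way to recover a vanishing point is to intersect image lines in homogeneous coordinates, where both the line through two points and the intersection of two lines are cross products. The sketch below assumes two line segments are already available (for example, from edges that are parallel in the scene); the abstract does not say how STD-PD detects them, and a robust version would fit many lines by least squares.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg1, seg2, eps=1e-9):
    """Intersection of two image lines, each given as a pair of points.

    Returns None when the lines are parallel in the image (the point is at infinity).
    """
    x, y, w = cross(line_through(*seg1), line_through(*seg2))
    if abs(w) < eps:
        return None
    return (x / w, y / w)
```

For instance, segments from (0, 0) to (100, 50) and from (0, 100) to (100, 50) meet at the image point (100, 50).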

Classifying Driver Behaviors for Autonomous Vehicle Navigation

We present a novel approach to automatically identify driver behaviors from vehicle trajectories and use them for safe navigation of autonomous vehicles. We propose a novel set of features that can be easily extracted from car trajectories. We derive a data-driven mapping between these features and six driver behaviors using an elaborate web-based user study. We also compute a summarized score indicating a level of awareness that is needed while driving next to other vehicles. We also incorporate our algorithm into a vehicle navigation simulation system and demonstrate its benefits in terms of safer realtime navigation, while driving next to aggressive or dangerous drivers.
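The abstract does not list the trajectory features, so the following is only a hypothetical illustration of the kind of quantities that are easy to extract from a sampled car trajectory: mean speed, peak acceleration, and peak jerk, computed by finite differences with a fixed time step `dt`. The feature names are invented for this sketch.

```python
import math


def trajectory_features(positions, dt):
    """Hypothetical summary features from (x, y) samples taken every dt seconds.

    Needs at least four samples so that jerk (third difference) is defined.
    """
    def diff(seq):
        # Finite difference between consecutive vectors, divided by the time step.
        return [((b[0] - a[0]) / dt, (b[1] - a[1]) / dt) for a, b in zip(seq, seq[1:])]

    def norm(v):
        return math.hypot(v[0], v[1])

    vel = diff(positions)   # velocity vectors
    acc = diff(vel)         # acceleration vectors
    jerk = diff(acc)        # jerk vectors
    return {
        "mean_speed": sum(norm(v) for v in vel) / len(vel),
        "max_accel": max(norm(a) for a in acc),
        "max_jerk": max(norm(j) for j in jerk),
    }
```

A behavior classifier would then map such feature vectors to labels; in the paper that mapping comes from a user study, which this sketch does not reproduce.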

Autonomous Driving among Many Pedestrians: Models and Algorithms

Driving among a dense crowd of pedestrians is a major challenge for autonomous vehicles. This paper presents a planning system for autonomous driving among many pedestrians. A key ingredient of our approach is a pedestrian motion prediction model that accounts for both a pedestrian's global navigation intention and local interactions with the vehicle and other pedestrians. Unfortunately, the autonomous vehicle does not know the pedestrian's intention a priori and requires a planning algorithm that hedges against the uncertainty in pedestrian intentions. Our planning system combines a POMDP algorithm with the pedestrian motion model and runs in near real time. Experiments show that it enables a robot vehicle to drive safely, efficiently, and smoothly among a crowd with a density of nearly one person per square meter.
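The intention uncertainty mentioned above is typically tracked as a belief, a probability distribution over candidate goals, that is updated with Bayes' rule after each observed step. The sketch below assumes a hypothetical likelihood that favors goals in the direction the pedestrian just moved, `exp(kappa * cos(angle))`; the goal list, `kappa`, and function name are illustrative. A full POMDP planner would then plan over this belief, which is beyond a short example.

```python
import math


def update_goal_belief(belief, goals, prev_pos, new_pos, kappa=2.0):
    """Bayesian update of P(goal) after a pedestrian moves from prev_pos to new_pos.

    belief: prior probabilities, one per goal (must sum to a positive value).
    goals:  candidate (x, y) destinations.
    """
    step = (new_pos[0] - prev_pos[0], new_pos[1] - prev_pos[1])
    step_len = math.hypot(*step)
    if step_len == 0:
        return list(belief)  # no motion, no information about the goal
    posterior = []
    for prior, goal in zip(belief, goals):
        to_goal = (goal[0] - prev_pos[0], goal[1] - prev_pos[1])
        goal_len = math.hypot(*to_goal)
        if goal_len == 0:
            cos_angle = 0.0
        else:
            cos_angle = (step[0] * to_goal[0] + step[1] * to_goal[1]) / (step_len * goal_len)
        # Assumed likelihood: steps pointing toward a goal make that goal more probable.
        posterior.append(prior * math.exp(kappa * cos_angle))
    total = sum(posterior)
    return [p / total for p in posterior]
```

Starting from a uniform prior over a goal straight ahead and a goal to the left, one step forward moves most of the probability mass onto the goal ahead.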

EWareNet: Emotion Aware Human Intent Prediction and Adaptive Spatial Profile Fusion for Social Robot Navigation

We present EWareNet, a novel intent-aware social robot navigation algorithm among pedestrians. Our approach predicts the trajectory-based pedestrian intent from historical gaits, which is then used for intent-guided navigation taking into account social and proxemic constraints. To predict pedestrian intent, we propose a transformer-based model that works on a commodity RGB-D camera mounted onto a moving robot. Our intent prediction routine is integrated into a mapless navigation scheme and makes no assumptions about the environment or pedestrian motion. Our navigation scheme consists of a novel obstacle profile representation methodology that is dynamically adjusted based on the pedestrian pose, intent, and emotion. The navigation scheme is based on a reinforcement learning algorithm that takes into consideration human intent and the robot's impact on human intent, in addition to the environmental configuration. We outperform current state-of-the-art algorithms for intent prediction fr...

RoadTrack: Tracking Road Agents in Dense and Heterogeneous Environments

We present an algorithm to track traffic agents in dense videos. Our approach is designed for heterogeneous traffic scenarios that consist of different agents, such as pedestrians, two-wheelers, cars, and buses, sharing the road. We present a novel Heterogeneous Traffic Motion and Interaction model (HTMI) to predict the motion of agents by modeling collision avoidance and interactions between the agents. We implement HTMI within the tracking-by-detection paradigm and use background-subtracted representations of traffic agents to extract binary tensors for accurate tracking. We highlight the performance on dense traffic videos and observe an accuracy of 75.8%. We observe a speedup of up to 4 times over prior tracking algorithms on standard traffic datasets.
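Extracting a binary representation of an agent by background subtraction can be sketched as thresholding the per-pixel difference between the current frame and a background image inside the agent's detection box. The grayscale nested-list images, the `threshold` value, and the Jaccard overlap used here to compare masks across frames are illustrative assumptions, not details from the RoadTrack paper.

```python
def binary_mask(frame, background, box, threshold=30):
    """Foreground mask inside box = (x0, y0, x1, y1), upper bounds exclusive.

    frame and background are grayscale images stored as lists of rows.
    """
    x0, y0, x1, y1 = box
    return [[1 if abs(frame[y][x] - background[y][x]) > threshold else 0
             for x in range(x0, x1)]
            for y in range(y0, y1)]


def mask_jaccard(a, b):
    """Intersection-over-union of two equally sized binary masks (0.0 if both empty)."""
    pairs = [(p, q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    inter = sum(1 for p, q in pairs if p and q)
    union = sum(1 for p, q in pairs if p or q)
    return inter / union if union else 0.0
```

In a tracking-by-detection loop, a high overlap between a new detection's mask and a track's previous mask would be one cue for associating the two.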

Automatically Learning Driver Behaviors for Safe Autonomous Vehicle Navigation

We present an autonomous driving planning algorithm that takes into account neighboring drivers’ behaviors and achieves safer and more efficient navigation. Our approach leverages the advantages of a data-driven mapping that is used to characterize the behavior of other drivers on the road. Our formulation also takes into account pedestrians and cyclists and uses psychology-based models to perform safe navigation. We demonstrate our benefits over previous methods: safer behavior in avoiding dangerous neighboring drivers, pedestrians and cyclists, and efficient navigation around careful drivers.

RAIST: Learning Risk Aware Traffic Interactions via Spatio-Temporal Graph Convolutional Networks

A key aspect of driving a road vehicle is to interact with the other road users, assess their intentions, and make risk-aware tactical decisions. An intuitive approach to enabling an intelligent automated driving system would be to incorporate some aspects of human driving behavior. To this end, we propose a novel driving framework for egocentric views, which is based on spatio-temporal traffic graphs. The traffic graphs model not only the spatial interactions among the road users but also their individual intentions through temporally associated message passing. We leverage a spatio-temporal graph convolutional network (ST-GCN) to train the graph edges. These edges are formulated using parameterized functions of 3D positions and scene-aware appearance features of road agents. Along with tactical behavior prediction, it is crucial to evaluate the risk-assessing ability of the proposed framework. We claim that our framework learns risk aware representations by improving on the ta...

Classifying Group Emotions for Socially-Aware Autonomous Vehicle Navigation

We present a real-time, data-driven algorithm to enhance the social invisibility of autonomous robot navigation within crowds. Our approach is based on prior psychological research, which reveals that people notice, and importantly react negatively to, groups of social actors when they have negative group emotions or entitativity, moving in a tight group with similar appearances and trajectories. In order to evaluate that behavior, we performed a user study to develop navigational algorithms that minimize emotional reactions. This study establishes a mapping between emotional reactions and multi-robot trajectories and appearances and further generalizes the finding across various environmental conditions. We demonstrate the applicability of our approach for trajectory computation for active navigation and dynamic intervention in simulated autonomous robot-human interaction scenarios. Our approach empirically shows that various levels of emotional autonomous robots can be used to both ...

ARC: Alignment-based Redirection Controller for Redirected Walking in Complex Environments

We present a novel redirected walking controller based on alignment that allows the user to explo... more We present a novel redirected walking controller based on alignment that allows the user to explore large and complex virtual environments, while minimizing the number of collisions with obstacles in the physical environment. Our alignment-based redirection controller, ARC, steers the user such that their proximity to obstacles in the physical environment matches the proximity to obstacles in the virtual environment as closely as possible. To quantify a controller's performance in complex environments, we introduce a new metric, Complexity Ratio (CR), to measure the relative environment complexity and characterize the difference in navigational complexity between the physical and virtual environments. Through extensive simulation-based experiments, we show that ARC significantly outperforms current state-of-the-art controllers in its ability to steer the user on a collision-free path. We also show through quantitative and qualitative measures of performance that our controller i...

The Socially Invisible Robot: Navigation in the Social World using Robot Entitativity

We present a real-time, data-driven algorithm to enhance the social-invisibility of robots within... more We present a real-time, data-driven algorithm to enhance the social-invisibility of robots within crowds. Our approach is based on prior psychological research, which reveals that people notice and--importantly--react negatively to groups of social actors when they have high entitativity, moving in a tight group with similar appearances and trajectories. In order to evaluate that behavior, we performed a user study to develop navigational algorithms that minimize entitativity. This study establishes a mapping between emotional reactions and multi-robot trajectories and appearances and further generalizes the finding across various environmental conditions. We demonstrate the applicability of our entitativity modeling for trajectory computation for active surveillance and dynamic intervention in simulated robot-human interaction scenarios. Our approach empirically shows that various levels of entitative robots can be used to both avoid and influence pedestrians while not eliciting st...

Realtime Pedestrian Tracking and Prediction in Dense Crowds

Group and Crowd Behavior for Computer Vision

The Emotionally Intelligent Robot: Improving Social Navigation in Crowded Environments

ArXiv, 2019

We present a real-time algorithm for emotion-aware navigation of a robot among pedestrians. Our a... more We present a real-time algorithm for emotion-aware navigation of a robot among pedestrians. Our approach estimates time-varying emotional behaviors of pedestrians from their faces and trajectories using a combination of Bayesian-inference, CNN-based learning, and the PAD (Pleasure-Arousal-Dominance) model from psychology. These PAD characteristics are used for long-term path prediction and generating proxemic constraints for each pedestrian. We use a multi-channel model to classify pedestrian characteristics into four emotion categories (happy, sad, angry, neutral). In our validation results, we observe an emotion detection accuracy of 85.33%. We formulate emotion-based proxemic constraints to perform socially-aware robot navigation in low- to medium-density environments. We demonstrate the benefits of our algorithm in simulated environments with tens of pedestrians as well as in a real-world setting with Pepper, a social humanoid robot.

Using Graph-Theoretic Machine Learning to Predict Human Driver Behavior

IEEE Transactions on Intelligent Transportation Systems

CMetric: A Driving Behavior Measure using Centrality Functions

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

We present a new measure, CMetric, to classify driver behaviors using centrality functions. Our f... more We present a new measure, CMetric, to classify driver behaviors using centrality functions. Our formulation combines concepts from computational graph theory and social traffic psychology to quantify and classify the behavior of human drivers. CMetric is used to compute the probability of a vehicle executing a driving style, as well as the intensity used to execute the style. Our approach is designed for realtime autonomous driving applications, where the trajectory of each vehicle or road-agent is extracted from a video. We compute a dynamic geometric graph (DGG) based on the positions and proximity of the road-agents and centrality functions corresponding to closeness and degree. These functions are used to compute the CMetric based on style likelihood and style intensity estimates. Our approach is general and makes no assumption about traffic density, heterogeneity, or how driving behaviors change over time. We present an algorithm to compute CMetric and demonstrate its performan...

Socially Invisible Navigation for Intelligent Vehicles

We present a real-time, data-driven algorithm to enhance the social-invisibility of autonomous ve... more We present a real-time, data-driven algorithm to enhance the social-invisibility of autonomous vehicles within crowds. Our approach is based on prior psychological research, which reveals that people notice and–importantly–react negatively to groups of social actors when they have high entitativity, moving in a tight group with similar appearances and trajectories. In order to evaluate that behavior, we performed a user study to develop navigational algorithms that minimize entitativity. This study establishes mapping between emotional reactions and multi-robot trajectories and appearances, and further generalizes the finding across various environmental conditions. We demonstrate the applicability of our entitativity modeling for trajectory computation for active surveillance and dynamic intervention in simulated robot-human interaction scenarios. Our approach empirically shows that various levels of entitative robots can be used to both avoid and influence pedestrians while not el...

Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders

ArXiv, 2020

We present a novel generalized zero-shot algorithm to recognize perceived emotions from gestures.... more We present a novel generalized zero-shot algorithm to recognize perceived emotions from gestures. Our task is to map gestures to novel emotion categories not encountered in training. We introduce an adversarial, autoencoder-based representation learning that correlates 3D motion-captured gesture sequence with the vectorized representation of the natural-language perceived emotion terms using word2vec embeddings. The language-semantic embedding provides a representation of the emotion label space, and we leverage this underlying distribution to map the gesture-sequences to the appropriate categorical emotion labels. We train our method using a combination of gestures annotated with known emotion terms and gestures not annotated with any emotions. We evaluate our method on the MPI Emotional Body Expressions Database (EBEDB) and obtain an accuracy of $58.43\%$. This improves the performance of current state-of-the-art algorithms for generalized zero-shot learning by $25$--$27\%$ on the...

Identifying Emotions from Walking using Affective and Deep Features

ArXiv, 2019

We present a new data-driven model and algorithm to identify the perceived emotions of individual... more We present a new data-driven model and algorithm to identify the perceived emotions of individuals based on their walking styles. Given an RGB video of an individual walking, we extract his/her walking gait in the form of a series of 3D poses. Our goal is to exploit the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral. Our perceived emotion recognition approach uses deep features learned via LSTM on labeled emotion datasets. Furthermore, we combine these features with affective features computed from gaits using posture and movement cues. These features are classified using a Random Forest Classifier. We show that our mapping between the combined feature space and the perceived emotional state provides 80.07% accuracy in identifying the perceived emotions. In addition to classifying discrete categories of emotions, our algorithm also predicts the values of perceived valence and arousal from gaits. We also present an ...

PedLearn : Realtime Pedestrian Tracking , Behavior Learning , and Navigation for Autonomous Vehicles

We present a real-time tracking algorithm for extracting the trajectory of each pedestrian in a c... more We present a real-time tracking algorithm for extracting the trajectory of each pedestrian in a crowd video using a combination of non-linear motion models and learning methods. These motion models are based on new collisionavoidance and local navigation algorithms that provide improved accuracy in dense settings. The resulting tracking algorithm can handle dense crowds with tens of pedestrians at realtime rates (25-30fps). We also give an overview of techniques that combine these motion models with global movement patterns and Bayesian inference to predict the future position of each pedestrian over a long time horizon. The combination of local and global features enables us to accurately predict the trajectory of each pedestrian in a dense crowd at realtime rates. We highlight the performance of the algorithm in real-world crowd videos with medium crowd density.

Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents**This work has been supported in part by ARO Grants W911NF1910069 and W911NF1910315, and Intel. Code and additional materials available at: https://gamma.umd.edu/t2g

2021 IEEE Virtual Reality and 3D User Interfaces (VR), 2021

We present Text2Gestures, a transformer-based learning method to interactively generate emotive f... more We present Text2Gestures, a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs. Our method generates emotionally expressive gestures by utilizing the relevant biomechanical features for body expressions, also known as affective features. We also consider the intended task corresponding to the text and the target virtual agents' intended gender and handedness in our generation pipeline. We train and evaluate our network on the MPI Emotional Body Expressions Database and observe that our network produces state-of-the-art performance in generating gestures for virtual agents aligned with the text for narration or conversation. Our network can generate these gestures at interactive rates on a commodity GPU. We conduct a web-based user study and observe that around 91% of participants indicated our generated gestures to be at least plausible on a five-point Likert Scale. The emotions perc...

Efficient trajectory extraction and parameter learning for data-driven crowd simulation

We present a trajectory extraction and behavior-learning algorithm for data-driven crowd simulati... more We present a trajectory extraction and behavior-learning algorithm for data-driven crowd simulation. Our formulation is based on incrementally learning pedestrian motion models and behaviors from crowd videos. We combine this learned crowd-simulation model with an online tracker based on particle filtering to compute accurate, smooth pedestrian trajectories. We refine this motion model using an optimization technique to estimate the agents' simulation parameters. We highlight the benefits of our approach for improved data-driven crowd simulation, including crowd replication from videos and merging the behavior of pedestrians from multiple videos. We highlight our algorithm's performance in various test scenarios containing tens of human-like agents.

STD-PD: Generating Synthetic Training Data for Pedestrian Detection in Unannotated Videos

arXiv, 2017

We present a new method for training pedestrian detectors on an unannotated image set captured by a moving camera at a fixed height and angle from the ground. Our approach is general and robust, and it makes no other assumptions about the image dataset or the number of pedestrians. We automatically extract the vanishing point and the pedestrians' scale to calibrate the virtual camera and to generate a probability map of where pedestrians should spawn. Using these features, we overlay synthetic human-like agents at proper locations on the images from the unannotated dataset. We also present novel techniques to increase the realism of these synthetic agents, and we use the augmented images to train a Faster R-CNN detector. Our approach improves accuracy by 12–13% over prior methods on unannotated image datasets.
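One common way to estimate a vanishing point is to take the least-squares intersection of image line segments that are parallel in the scene. The abstract does not say how the vanishing point is extracted, so both this technique and the function name vanishing_point are illustrative assumptions. The sketch also assumes the segments have already been detected.

```python
import math


def vanishing_point(segments):
    """Least-squares intersection of 2D line segments.

    segments: iterable of ((x1, y1), (x2, y2)) pixel coordinates, assumed to be
    images of lines that are parallel in the 3D scene.
    Minimizes the sum of squared perpendicular distances to all lines.
    """
    # Accumulate the 2x2 normal equations for lines written as a*x + b*y = c.
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (x1, y1), (x2, y2) in segments:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip degenerate zero-length segments
        # Unit normal (a, b) to the segment direction.
        a, b = -dy / length, dx / length
        c = a * x1 + b * y1
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c

    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("segments are (nearly) parallel in the image; no finite vanishing point")

    # Solve [[s_aa, s_ab], [s_ab, s_bb]] @ [x, y] = [s_ac, s_bc] by Cramer's rule.
    x = (s_bb * s_ac - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y
```

For example, three segments lying on lines through (100, 50) recover that point. With noisy detections, the least-squares form averages out small angular errors instead of relying on any single pair of lines.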

Classifying Driver Behaviors for Autonomous Vehicle Navigation

We present a novel approach to automatically identify driver behaviors from vehicle trajectories and use them for safe navigation of autonomous vehicles. We propose a novel set of features that can be easily extracted from car trajectories. We derive a data-driven mapping between these features and six driver behaviors using an extensive web-based user study. We also compute a summary score indicating the level of awareness needed when driving next to other vehicles. Finally, we incorporate our algorithm into a vehicle navigation simulation system and demonstrate its benefits in terms of safer real-time navigation when driving next to aggressive or dangerous drivers.
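The abstract does not list the proposed trajectory features. The following sketch therefore computes a few hypothetical kinematic features (mean speed, speed standard deviation, maximum absolute acceleration) from a uniformly sampled (x, y) trajectory, only to show what "easily extracted" features can look like. The feature set and the name trajectory_features are assumptions. The learned mapping to the six driver behaviors is not reproduced.

```python
import math
from statistics import mean, pstdev


def trajectory_features(positions, dt):
    """Hypothetical per-vehicle features from (x, y) samples taken every dt seconds."""
    if len(positions) < 3:
        raise ValueError("need at least three samples to estimate acceleration")

    # Finite-difference speed between consecutive samples.
    speeds = [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]
    # Finite-difference (longitudinal) acceleration between consecutive speeds.
    accels = [(v2 - v1) / dt for v1, v2 in zip(speeds, speeds[1:])]

    return {
        "mean_speed": mean(speeds),
        "speed_std": pstdev(speeds),
        "max_abs_accel": max(abs(a) for a in accels),
    }
```

For a vehicle at positions (0, 0), (1, 0), (3, 0), (6, 0) sampled once per second, the speeds are 1, 2, and 3 m/s, so the mean speed is 2 m/s and the maximum absolute acceleration is 1 m/s².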

Autonomous Driving among Many Pedestrians: Models and Algorithms

Driving among a dense crowd of pedestrians is a major challenge for autonomous vehicles. This paper presents a planning system for autonomous driving among many pedestrians. A key ingredient of our approach is a pedestrian motion prediction model that accounts for both a pedestrian's global navigation intention and local interactions with the vehicle and other pedestrians. However, the autonomous vehicle does not know a pedestrian's intention a priori, so it requires a planning algorithm that hedges against the uncertainty in pedestrian intentions. Our planning system combines a POMDP algorithm with the pedestrian motion model and runs in near real time. Experiments show that it enables a robot vehicle to drive safely, efficiently, and smoothly among a crowd with a density of nearly one person per square meter.
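A POMDP planner that hedges against unknown intentions maintains a belief, that is, a probability distribution over those intentions. The sketch below shows a Bayesian update of a discrete belief over candidate pedestrian goals after one observed step. The goal set, the Gaussian likelihood on heading error, and the name update_intention_belief are hypothetical. The sketch omits both the planner and the local-interaction part of the motion model.

```python
import math


def update_intention_belief(belief, goals, prev_pos, new_pos, sigma=0.5):
    """Bayes update of P(goal) after observing one pedestrian step.

    belief: dict goal_name -> prior probability
    goals: dict goal_name -> (x, y) goal location (hypothetical goal set)
    sigma: std. dev. (radians) of the assumed heading-error likelihood
    """
    dx, dy = new_pos[0] - prev_pos[0], new_pos[1] - prev_pos[1]
    if dx == 0 and dy == 0:
        # No motion observed, so there is no evidence about the goal.
        return dict(belief)
    step_angle = math.atan2(dy, dx)

    posterior = {}
    for name, (gx, gy) in goals.items():
        goal_angle = math.atan2(gy - prev_pos[1], gx - prev_pos[0])
        # Wrap the heading error into [-pi, pi].
        err = math.atan2(math.sin(step_angle - goal_angle), math.cos(step_angle - goal_angle))
        likelihood = math.exp(-err ** 2 / (2 * sigma ** 2))
        posterior[name] = belief[name] * likelihood

    total = sum(posterior.values())
    if total == 0:
        return dict(belief)
    # Normalize so the posterior sums to one.
    return {name: p / total for name, p in posterior.items()}
```

Starting from a 50/50 belief between a goal to the left and a goal to the right, a single step to the right moves nearly all probability mass onto the right-hand goal. A planner would then sample pedestrian intentions from this belief when simulating future steps.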

EWareNet: Emotion Aware Human Intent Prediction and Adaptive Spatial Profile Fusion for Social Robot Navigation

We present EWareNet, a novel intent-aware algorithm for social robot navigation among pedestrians. Our approach predicts trajectory-based pedestrian intent from historical gaits, which is then used for intent-guided navigation that takes social and proxemic constraints into account. To predict pedestrian intent, we propose a transformer-based model that works with a commodity RGB-D camera mounted on a moving robot. Our intent prediction routine is integrated into a mapless navigation scheme and makes no assumptions about the environment of pedestrian motion. Our navigation scheme uses a novel obstacle profile representation that is dynamically adjusted based on pedestrian pose, intent, and emotion. The navigation scheme is based on a reinforcement learning algorithm that considers human intent and the robot's impact on human intent, in addition to the environmental configuration. We outperform current state-of-the-art algorithms for intent prediction fr...

RoadTrack: Tracking Road Agents in Dense and Heterogeneous Environments

We present an algorithm to track traffic agents in dense videos. Our approach is designed for heterogeneous traffic scenarios in which different agents, such as pedestrians, two-wheelers, cars, and buses, share the road. We present a novel Heterogeneous Traffic Motion and Interaction model (HTMI) that predicts the motion of agents by modeling collision avoidance and interactions between agents. We implement HTMI within the tracking-by-detection paradigm and use background-subtracted representations of traffic agents to extract binary tensors for accurate tracking. We evaluate performance on dense traffic videos and observe an accuracy of 75.8%. We also observe a speedup of up to 4 times over prior tracking algorithms on standard traffic datasets.

Automatically Learning Driver Behaviors for Safe Autonomous Vehicle Navigation

We present an autonomous driving planning algorithm that takes into account neighboring drivers' behaviors and achieves safer and more efficient navigation. Our approach uses a data-driven mapping to characterize the behavior of other drivers on the road. Our formulation also takes pedestrians and cyclists into account and uses psychology-based models to perform safe navigation. We demonstrate benefits over previous methods: safer behavior in avoiding dangerous neighboring drivers, pedestrians, and cyclists, and efficient navigation around careful drivers.

RAIST: Learning Risk Aware Traffic Interactions via Spatio-Temporal Graph Convolutional Networks

A key aspect of driving a road vehicle is interacting with other road users, assessing their intentions, and making risk-aware tactical decisions. An intuitive approach to enabling an intelligent automated driving system would be to incorporate some aspects of human driving behavior. To this end, we propose a novel driving framework for egocentric views based on spatio-temporal traffic graphs. The traffic graphs model not only the spatial interactions among road users but also their individual intentions, through temporally associated message passing. We use a spatio-temporal graph convolutional network (ST-GCN) to train the graph edges. These edges are formulated as parameterized functions of the 3D positions and scene-aware appearance features of road agents. Along with tactical behavior prediction, it is crucial to evaluate the risk-assessment ability of the proposed framework. We claim that our framework learns risk-aware representations by improving on the ta...
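To illustrate what edges formulated as functions of 3D positions can look like, the sketch below performs one spatial aggregation step in which edge weights decay with the distance between road agents. It is a deliberately simplified assumption and not ST-GCN. It has no learned weights, no appearance features in the edges, and no temporal convolution. The Gaussian kernel, the bandwidth sigma, and the name spatial_message_pass are hypothetical.

```python
import math


def spatial_message_pass(positions, features, sigma=5.0):
    """One hypothetical message-passing step over a fully connected traffic graph.

    positions: list of (x, y, z) road-agent positions in meters
    features: list of equal-length feature vectors, one per agent
    Returns each agent's distance-weighted average of all agents' features.
    """
    n = len(positions)
    out = []
    for i in range(n):
        # Edge weights decay with 3D distance; the self-loop has weight 1.
        weights = [
            math.exp(-math.dist(positions[i], positions[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        dim = len(features[i])
        # Row-normalized aggregation of neighbor features.
        agg = [sum(weights[j] * features[j][k] for j in range(n)) / total for k in range(dim)]
        out.append(agg)
    return out
```

Co-located agents end up averaging their features, while agents far apart (relative to sigma) keep essentially their own features. A graph convolutional layer would additionally apply a learned linear map and nonlinearity after this aggregation.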

Classifying Group Emotions for Socially-Aware Autonomous Vehicle Navigation

We present a real-time, data-driven algorithm to enhance the social invisibility of autonomous robot navigation within crowds. Our approach is based on prior psychological research, which reveals that people notice and, importantly, react negatively to groups of social actors when they have negative group emotions or entitativity, moving in a tight group with similar appearances and trajectories. To evaluate this behavior, we performed a user study to develop navigational algorithms that minimize emotional reactions. This study establishes a mapping between emotional reactions and multi-robot trajectories and appearances, and further generalizes the finding across various environmental conditions. We demonstrate the applicability of our approach for trajectory computation for active navigation and dynamic intervention in simulated autonomous robot-human interaction scenarios. Our approach empirically shows that various levels of emotional autonomous robots can be used to both ...