Sanjay Haresh

I am an AI Researcher at Qualcomm AI Research, working in Roland Memisevic's group at the intersection of large vision-language models and robotics. Before that, I completed my MSc (Thesis) at Simon Fraser University (SFU), where I was advised by Prof. Manolis Savva.

Prior to SFU, I worked at Retrocausal for 2 years as a Research Engineer (Computer Vision) under the supervision of Dr. Quoc Huy Tran and Dr. Zeeshan Zia.

In 2019, I completed my undergraduate degree in Computer Science at FAST-NUCES Karachi, Pakistan, where I worked on class imbalance under the guidance of Prof. Tahir Syed.

Email  /  CV  /  Google Scholar  /  Twitter

Publications and Preprints

Papers are in reverse chronological order. '*' denotes equal contribution.

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, ... Sanjay Haresh, Yongsen Mao*, Manolis Savva, ...
CVPR, 2024
project page / arXiv

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge to push the frontier of first-person video understanding of skilled human activity.

Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna*, Yongsen Mao*, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel Chang, Manolis Savva
CVPR, 2024
project page / arXiv

We present the Habitat Synthetic Scenes Dataset, a dataset of 211 high-quality 3D scenes, and use it to investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects.

Articulated 3D Human-Object Interactions from RGB Videos: An Empirical Analysis of Approaches and Challenges
Sanjay Haresh, Xiaohao Sun, Hanxiao Jiang, Angel Chang, Manolis Savva
3DV, 2022
project page / arXiv

We canonicalize the task of reconstructing articulated 3D human-object interactions from RGB videos and benchmark five families of methods on it.

Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
Hamza Khan, Sanjay Haresh, Awais Ahmed, Shakeeb Siddiqui, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
IROS, 2022
project page / arXiv

We leverage graph convolutional networks to propagate timestamp labels to the whole video, resulting in a 97% reduction in required labels.

Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sanjay Haresh*, Sateesh Kumar*, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
CVPR, 2022
project page / arXiv

We propose temporal optimal transport for jointly learning representations and performing online clustering in an unsupervised manner.

Learning by Aligning Video in Time
Sanjay Haresh*, Sateesh Kumar*, Huseyin Coskun, Shahram N. Syed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
CVPR, 2021
project page / arXiv

Good frame representations can be learned by globally aligning pairs of videos in time via differentiable dynamic time warping.

Towards Anomaly Detection in Dashcam Videos
Sanjay Haresh*, Sateesh Kumar*, M. Zeeshan Zia, Quoc-Huy Tran
IV, 2020
talk / arXiv

We curate a large dataset of dashcam videos for understanding road anomalies and propose an object-object interaction reasoning approach for detecting anomalies without additional supervision.


Website layout is from Jon Barron