Showing 1–13 of 13 results for author: Duarte, K

Searching in archive cs.
  1. arXiv:2406.01954  [pdf, other]

    cs.CV

    Plug-and-Play Diffusion Distillation

    Authors: Yi-Ting Hsiao, Siavash Khodadadeh, Kevin Duarte, Wei-An Lin, Hui Qu, Mingi Kwon, Ratheesh Kalarot

    Abstract: Diffusion models have shown tremendous results in image generation. However, due to the iterative nature of the diffusion process and its reliance on classifier-free guidance, inference times are slow. In this paper, we propose a new distillation approach for guided diffusion models in which an external lightweight guide model is trained while the original text-to-image model remains frozen. We sh…

    Submitted 14 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024. Project page: https://5410tiffany.github.io/plug-and-play-diffusion-distillation.github.io/

  2. arXiv:2206.02664  [pdf, other]

    cs.CV cs.LG

    Learning with Capsules: A Survey

    Authors: Fabio De Sousa Ribeiro, Kevin Duarte, Miles Everett, Georgios Leontidis, Mubarak Shah

    Abstract: Capsule networks were proposed as an alternative approach to Convolutional Neural Networks (CNNs) for learning object-centric representations, which can be leveraged for improved generalization and sample complexity. Unlike CNNs, capsule networks are designed to explicitly model part-whole hierarchical relationships by using groups of neurons to encode visual entities, and learn the relationships…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 29 pages, 43 figures

  3. arXiv:2112.00775  [pdf, other]

    cs.CV

    Routing with Self-Attention for Multimodal Capsule Networks

    Authors: Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

    Abstract: The task of multimodal learning has seen a growing interest recently as it allows for training neural architectures based on different modalities such as vision, text, and audio. One challenge in training such models is that they need to jointly learn semantic concepts and their relationships across different input representations. Capsule networks have been shown to perform well in context of cap…

    Submitted 1 December, 2021; originally announced December 2021.

  4. arXiv:2105.10782  [pdf, other]

    cs.CV

    PLM: Partial Label Masking for Imbalanced Multi-label Classification

    Authors: Kevin Duarte, Yogesh S. Rawat, Mubarak Shah

    Abstract: Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes. The imbalance in the ratio of positive and negative samples for each class skews network output probabilities further from ground-truth distributions. We propose a method, Partial Label Masking (PLM), which utilizes this ratio during trai…

    Submitted 22 May, 2021; originally announced May 2021.

    Comments: Accepted to the CVPR 2021 Learning from Limited or Imperfect Data (L2ID) Workshop

  5. arXiv:2105.04836  [pdf, other]

    cs.CV

    Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

    Authors: Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

    Abstract: The problem of grounding VQA tasks has seen an increased attention in the research community recently, with most attempts usually focusing on solving this task by using pretrained object detectors. However, pre-trained object detectors require bounding box annotations for detecting relevant objects in the vocabulary, which may not always be feasible for real-life large-scale applications. In this…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  6. arXiv:2104.12671  [pdf, other]

    cs.CV

    Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

    Authors: Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

    Abstract: Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities. In this context, this paper proposes a self-supervised training framework that learns a common multimodal embedding space that, in addition to sharing representations across different modalitie…

    Submitted 3 September, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: To be presented at ICCV 2021

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 8012-8021

  7. arXiv:2103.03027  [pdf, other]

    cs.CV

    Modeling Multi-Label Action Dependencies for Temporal Action Localization

    Authors: Praveen Tirupattur, Kevin Duarte, Yogesh Rawat, Mubarak Shah

    Abstract: Real-world videos contain many complex actions with inherent relationships between action classes. In this work, we propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos. As opposed to previous works that leverage video-level co-occurrence of actions, we distinguish the relationships between actions that occu…

    Submitted 29 May, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  8. arXiv:2101.06329  [pdf, other]

    cs.LG cs.CV

    In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning

    Authors: Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah

    Abstract: The recent research in semi-supervised learning (SSL) is mostly dominated by consistency regularization based methods which achieve strong performance. However, they heavily rely on domain-specific data augmentations, which are not easy to generate for all data modalities. Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its origin…

    Submitted 19 April, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: ICLR 2021

  9. arXiv:2004.11475  [pdf, other]

    cs.CV eess.IV

    Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

    Authors: Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah

    Abstract: Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity detection is mainly focused on datasets, such as UCF-101, JHMDB, THUMOS, and AVA, which partially address these issues. The requirement of processing the security…

    Submitted 19 May, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: 9 pages

  10. arXiv:1910.00132  [pdf, other]

    cs.CV eess.IV

    CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing

    Authors: Kevin Duarte, Yogesh S Rawat, Mubarak Shah

    Abstract: In this work we propose a capsule-based approach for semi-supervised video object segmentation. Current video object segmentation methods are frame-based and often require optical flow to capture temporal consistency across frames which can be difficult to compute. To this end, we propose a video based capsule network, CapsuleVOS, which can segment several frames at once conditioned on a reference…

    Submitted 30 September, 2019; originally announced October 2019.

    Comments: 8 pages, 6 figures, ICCV 2019

  11. arXiv:1812.00303  [pdf, other]

    cs.CV

    Multi-modal Capsule Routing for Actor and Action Video Segmentation Conditioned on Natural Language Queries

    Authors: Bruce McIntosh, Kevin Duarte, Yogesh S Rawat, Mubarak Shah

    Abstract: In this paper, we propose an end-to-end capsule network for pixel level localization of actors and actions present in a video. The localization is performed based on a natural language query through which an actor and action are specified. We propose to encode both the video as well as textual input in the form of capsules, which provide more effective representation in comparison with standard co…

    Submitted 1 December, 2018; originally announced December 2018.

  12. arXiv:1805.08162  [pdf, other]

    cs.CV

    VideoCapsuleNet: A Simplified Network for Action Detection

    Authors: Kevin Duarte, Yogesh S Rawat, Mubarak Shah

    Abstract: The recent advances in Deep Convolutional Neural Networks (DCNNs) have shown extremely good results for video human action classification, however, action detection is still a challenging problem. The current action detection approaches follow a complex pipeline which involves multiple tasks such as tube proposals, optical flow, and tube classification. In this work, we present a more elegant solu…

    Submitted 21 May, 2018; originally announced May 2018.

  13. Toward a Motor Theory of Sign Language Perception

    Authors: Sylvie Gibet, Pierre-François Marteau, Kyle Duarte

    Abstract: Researches on signed languages still strongly dissociate linguistic issues related on phonological and phonetic aspects, and gesture studies for recognition and synthesis purposes. This paper focuses on the imbrication of motion and meaning for the analysis, synthesis and evaluation of sign language gestures. We discuss the relevance and interest of a motor theory of perception in sign language…

    Submitted 8 January, 2012; originally announced January 2012.

    Comments: 12 pages. Partially funded by the ANR SignCom project

    Journal ref: Gesture and Sign Language in Human-Computer Interaction and Embodied Communication (2012) Vol. 7206, 161-172