
Showing 1–17 of 17 results for author: Andonian, A

Searching in archive cs.
  1. arXiv:2303.13518  [pdf, other]

    cs.CV cs.AI cs.LG

    Three ways to improve feature alignment for open vocabulary detection

    Authors: Relja Arandjelović, Alex Andonian, Arthur Mensch, Olivier J. Hénaff, Jean-Baptiste Alayrac, Andrew Zisserman

    Abstract: The core problem in zero-shot open vocabulary detection is how to align visual and text features, so that the detector performs well on unseen classes. Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining, and struggles to prevent the language model from forgetting unseen classes. We propose t…

    Submitted 23 March, 2023; originally announced March 2023.

  2. arXiv:2210.07229  [pdf, other]

    cs.CL cs.LG

    Mass-Editing Memory in a Transformer

    Authors: Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, David Bau

    Abstract: Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of ass…

    Submitted 1 August, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 18 pages, 11 figures. Code and data at https://memit.baulab.info

  3. arXiv:2206.00535  [pdf, other]

    cs.CV cs.HC cs.SI

    Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines

    Authors: Camilo Fosco, Emilie Josephs, Alex Andonian, Allen Lee, Xi Wang, Aude Oliva

    Abstract: Deepfakes pose a serious threat to digital well-being by fueling misinformation. As deepfakes get harder to recognize with the naked eye, human users become increasingly reliant on deepfake detection models to decide if a video is real or fake. Currently, models yield a prediction for a video's authenticity, but do not integrate a method for alerting a human user. We introduce a framework for ampl…

    Submitted 10 April, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: 9 pages, 5 figures, 4 tables

  4. arXiv:2204.04588  [pdf, other]

    cs.CV cs.LG

    Robust Cross-Modal Representation Learning with Progressive Self-Distillation

    Authors: Alex Andonian, Shixing Chen, Raffay Hamid

    Abstract: The learning objective of vision-language approach of CLIP does not effectively account for the noisy many-to-many correspondences found in web-harvested image captioning datasets, which contributes to its compute and data inefficiency. To address this challenge, we introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image…

    Submitted 9 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  5. arXiv:2202.05262  [pdf, other]

    cs.CL cs.LG

    Locating and Editing Factual Associations in GPT

    Authors: Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov

    Abstract: We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modul…

    Submitted 13 January, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022. 35 pages, 30 figures. Code and data at https://rome.baulab.info/

    ACM Class: I.2.7

  6. arXiv:2111.06934  [pdf, other]

    cs.CV cs.LG

    Contrastive Feature Loss for Image Prediction

    Authors: Alex Andonian, Taesung Park, Bryan Russell, Phillip Isola, Jun-Yan Zhu, Richard Zhang

    Abstract: Training supervised image synthesis models requires a critic to compare two images: the ground truth to the result. Yet, this basic functionality remains an open problem. A popular line of approaches uses the L1 (mean absolute error) loss, either in the pixel or the feature space of pretrained deep networks. However, we observe that these losses tend to produce overly blurry and grey images, and o…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Appeared in Advances in Image Manipulation Workshop at ICCV 2021. GitHub: https://github.com/alexandonian/contrastive-feature-loss

  7. arXiv:2104.13714  [pdf]

    cs.CV q-bio.NC

    The Algonauts Project 2021 Challenge: How the Human Brain Makes Sense of a World in Motion

    Authors: R. M. Cichy, K. Dwivedi, B. Lahner, A. Lascelles, P. Iamshchinina, M. Graumann, A. Andonian, N. A. R. Murty, K. Kay, G. Roig, A. Oliva

    Abstract: The sciences of natural and artificial intelligence are fundamentally connected. Brain-inspired human-engineered AI are now the standard for predicting human brain responses during vision, and conversely, the brain continues to inspire invention in AI. To promote even deeper connections between these fields, we here release the 2021 edition of the Algonauts Project Challenge: How the Human Brain M…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures

  8. arXiv:2103.10951  [pdf, other]

    cs.CV cs.AI cs.GR

    Paint by Word

    Authors: Alex Andonian, Sabrina Osmany, Audrey Cui, YeonHwan Park, Ali Jahanian, Antonio Torralba, David Bau

    Abstract: We investigate the problem of zero-shot semantic image painting. Instead of painting modifications into an image using only concrete colors or a finite set of semantic concepts, we ask how to create semantic paint based on open full-text descriptions: our goal is to be able to point to a location in a synthesized image and apply an arbitrary new concept such as "rustic" or "opulent" or "happy dog.…

    Submitted 23 March, 2023; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: 10 pages, 9 figures

    ACM Class: I.2.10; I.4; I.3

  9. arXiv:2102.07887  [pdf, other]

    cs.CV

    VA-RED$^2$: Video Adaptive Redundancy Reduction

    Authors: Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

    Abstract: Performing inference on deep learning models for videos remains a challenge due to the large amount of computational resources required to achieve robust recognition. An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both. The type of redundant features depe…

    Submitted 4 October, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: Accepted in ICLR 2021

  10. arXiv:2008.05596  [pdf, other]

    cs.CV

    We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

    Authors: Alex Andonian, Camilo Fosco, Mathew Monfort, Allen Lee, Rogerio Feris, Carl Vondrick, Aude Oliva

    Abstract: Identifying common patterns among events is a key ability in human and machine perception, as it underlies intelligent decision making. We propose an approach for learning semantic relational set abstractions on videos, inspired by human learning. We combine visual features with natural language supervision to generate high-level representations of similarities across a set of videos. This allows…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: European Conference on Computer Vision (ECCV) 2020, accepted

  11. arXiv:1911.00232  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

    Authors: Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva

    Abstract: Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video only provide a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not…

    Submitted 27 September, 2021; v1 submitted 1 November, 2019; originally announced November 2019.

  12. arXiv:1906.10112  [pdf, other]

    cs.CV

    GANalyze: Toward Visual Definitions of Cognitive Image Properties

    Authors: Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola

    Abstract: We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence. These attributes are of interest because we do not have a concrete visual definition of what they entail. What does it look like for a dog to be more or less memorable? GANs allow us to generate a manifold of natural-looking images with fine-…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: 17 pages, 15 figures

  13. Cross-view Semantic Segmentation for Sensing Surroundings

    Authors: Bowen Pan, Jiankai Sun, Ho Yin Tiga Leung, Alex Andonian, Bolei Zhou

    Abstract: Sensing surroundings plays a crucial role in human spatial perception, as it extracts the spatial configuration of objects as well as the free space from the observations. To facilitate the robot perception with such a surrounding sensing capability, we introduce a novel visual task called Cross-view Semantic Segmentation as well as a framework named View Parsing Network (VPN) to address it. In th…

    Submitted 18 June, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: IEEE Robotics and Automation Letters (Volume: 5, Issue: 3, July 2020)

  14. arXiv:1905.11954  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Learning from Video with Deep Neural Embeddings

    Authors: Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

    Abstract: Because of the rich dynamical structure of videos and their ubiquity in everyday life, it is a natural idea that video data could serve as a powerful unsupervised learning signal for training visual representations in deep neural networks. However, instantiating this idea, especially at large scale, has remained a significant artificial intelligence challenge. Here we present the Video Instance Em…

    Submitted 10 March, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR 2020

  15. arXiv:1905.05675  [pdf, other]

    cs.CV cs.AI q-bio.NC

    The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence

    Authors: Radoslaw Martin Cichy, Gemma Roig, Alex Andonian, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, Yalda Mohsenzadeh, Kandan Ramakrishnan, Aude Oliva

    Abstract: In the last decade, artificial intelligence (AI) models inspired by the brain have made unprecedented progress in performing real-world perceptual tasks like object classification and speech recognition. Recently, researchers of natural intelligence have begun using those AI models to explore how the brain performs such tasks. These developments suggest that future progress will benefit from incre…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: 4 pages, 2 figures

  16. arXiv:1801.03150  [pdf, other]

    cs.CV cs.AI

    Moments in Time Dataset: one million videos for event understanding

    Authors: Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva

    Abstract: We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3 second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and audito…

    Submitted 16 February, 2019; v1 submitted 9 January, 2018; originally announced January 2018.

  17. arXiv:1711.08496  [pdf, other]

    cs.CV

    Temporal Relational Reasoning in Videos

    Authors: Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

    Abstract: Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equippe…

    Submitted 24 July, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

    Comments: camera-ready version for ECCV'18