Showing 1–31 of 31 results for author: Ullman, S

Searching in archive cs.
  1. arXiv:2403.12736  [pdf, other]

    cs.CV

    Towards Multimodal In-Context Learning for Vision & Language Models

    Authors: Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Wei Lin, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky

    Abstract: State-of-the-art Vision-Language Models (VLMs) ground the vision and the language modality primarily via projecting the vision tokens from the encoder to language-like tokens, which are directly fed to the Large Language Model (LLM) decoder. While these models have shown unprecedented performance in many downstream zero-shot tasks (e.g., image captioning, question answering, etc.), still little emphasis…

    Submitted 17 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  2. arXiv:2311.15276  [pdf, other]

    cs.CV cs.LG

    Efficient Rehearsal Free Zero Forgetting Continual Learning using Adaptive Weight Modulation

    Authors: Yonatan Sverdlov, Shimon Ullman

    Abstract: Artificial neural networks encounter a notable challenge known as continual learning, which involves acquiring knowledge of multiple tasks over an extended period. This challenge arises due to the tendency of previously learned weights to be adjusted to suit the objectives of new tasks, resulting in a phenomenon called catastrophic forgetting. Most approaches to this problem seek a balance between…

    Submitted 26 November, 2023; originally announced November 2023.

  3. arXiv:2306.02415  [pdf, other]

    cs.AI

    Biologically-Motivated Learning Model for Instructed Visual Processing

    Authors: Roy Abel, Shimon Ullman

    Abstract: As part of understanding how the brain learns, ongoing work seeks to combine biological knowledge and current artificial intelligence (AI) modeling in an attempt to find an efficient biologically plausible learning scheme. Current models of biologically plausible learning often use a cortical-like combination of bottom-up (BU) and top-down (TD) processing, where the TD part carries feedback signal…

    Submitted 16 June, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

  4. arXiv:2305.19595  [pdf, other]

    cs.CV

    Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models

    Authors: Sivan Doveh, Assaf Arbelle, Sivan Harary, Roei Herzig, Donghyun Kim, Paola Cascante-bonilla, Amit Alfassy, Rameswar Panda, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

    Abstract: Vision and Language (VL) models offer an effective method for aligning representation spaces of images and text, leading to numerous applications such as cross-modal retrieval, visual question answering, captioning, and more. However, the aligned image-text spaces learned by all the popular VL models are still suffering from the so-called 'object bias' - their representations behave as 'bags of no…

    Submitted 1 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  5. arXiv:2211.11733  [pdf, other]

    cs.CV

    Teaching Structured Vision&Language Concepts to Vision&Language Models

    Authors: Sivan Doveh, Assaf Arbelle, Sivan Harary, Rameswar Panda, Roei Herzig, Eli Schwartz, Donghyun Kim, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

    Abstract: Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks. However, some aspects of complex language understanding still remain a challenge. We introduce the collective notion of Structured Vision&Language Concepts (SVLC) which includes object attributes, relations, and states which are present in the text and visible in the image. Recent studies have…

    Submitted 30 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Journal ref: CVPR 2023

  6. arXiv:2110.08744  [pdf]

    cs.AI q-bio.NC

    A model for full local image interpretation

    Authors: Guy Ben-Yosef, Liav Assif, Daniel Harari, Shimon Ullman

    Abstract: We describe a computational model of humans' ability to provide a detailed interpretation of components in a scene. Humans can identify in an image meaningful components almost everywhere, and identifying these components is an essential part of the visual process, and of understanding the surrounding scene and its potential meaning to the viewer. Detailed interpretation is beyond the scope of cur…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: Published in the Proceedings of the 37th Annual Meeting of the Cognitive Science Society (CogSci), 2015

    Journal ref: https://cogsci.mindmodeling.org/2015/papers/0048/

  7. arXiv:2108.01696  [pdf]

    cs.CR

    Linking Common Vulnerabilities and Exposures to the MITRE ATT&CK Framework: A Self-Distillation Approach

    Authors: Benjamin Ampel, Sagar Samtani, Steven Ullman, Hsinchun Chen

    Abstract: Due to the ever-increasing threat of cyber-attacks to critical cyber infrastructure, organizations are focusing on building their cybersecurity knowledge base. A salient list of cybersecurity knowledge is the Common Vulnerabilities and Exposures (CVE) list, which details vulnerabilities found in a wide range of software and hardware. However, these vulnerabilities often do not have a mitigation st…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: Proceedings of 2021 ACM Conference Knowledge Discovery and Data Mining (KDD' 21) Workshop on AI-enabled Cybersecurity Analytics

  8. arXiv:2105.05592  [pdf]

    cs.CV q-bio.NC

    Image interpretation by iterative bottom-up top-down processing

    Authors: Shimon Ullman, Liav Assif, Alona Strugatski, Ben-Zion Vatashsky, Hila Levy, Aviv Netanyahu, Adam Yaari

    Abstract: Scene understanding requires the extraction and representation of scene components together with their properties and inter-relations. We describe a model in which meaningful scene structures are extracted from the image by an iterative process, combining bottom-up (BU) and top-down (TD) networks, interacting through a symmetric bi-directional communication between them (counter-streams structure)…

    Submitted 12 May, 2021; originally announced May 2021.

  9. arXiv:2104.09829  [pdf, other]

    cs.CV

    Detector-Free Weakly Supervised Grounding by Separation

    Authors: Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

    Abstract: Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object de…

    Submitted 20 April, 2021; originally announced April 2021.

  10. What can human minimal videos tell us about dynamic recognition models?

    Authors: Guy Ben-Yosef, Gabriel Kreiman, Shimon Ullman

    Abstract: In human vision, objects and their parts can be visually recognized from purely spatial or purely temporal information, but the mechanisms integrating space and time are poorly understood. Here we show that human visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues in configurations where each source on its own is insufficient for recognition. Th…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: Published as a workshop paper at Bridging AI and Cognitive Science (ICLR 2020). Extended paper was published at Cognition

  11. arXiv:2006.05249  [pdf]

    q-bio.NC cs.AI cs.CV

    What takes the brain so long: Object recognition at the level of minimal images develops for up to seconds of presentation time

    Authors: Hanna Benoni, Daniel Harari, Shimon Ullman

    Abstract: Rich empirical evidence has shown that visual object recognition in the brain is fast and effortless, with relevant brain signals reported to start as early as 80 ms. Here we study the time trajectory of the recognition process at the level of minimal recognizable images (termed MIRC). These are images that can be recognized reliably, but in which a minute change of the image (reduction by either…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: 7 pages, 2 figures, 1 table

  12. arXiv:2002.03335  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-Task Learning by a Top-Down Control Network

    Authors: Hila Levi, Shimon Ullman

    Abstract: As the range of tasks performed by a general vision system expands, executing multiple tasks accurately and efficiently in a single network has become an important and still open problem. Recent computer vision approaches address this problem by branching networks, or by a channel-wise modulation of the network feature-maps with task specific vectors. We present a novel architecture that uses a de…

    Submitted 29 October, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

  13. arXiv:1811.12152  [pdf, other]

    cs.CV

    Efficient Coarse-to-Fine Non-Local Module for the Detection of Small Objects

    Authors: Hila Levi, Shimon Ullman

    Abstract: An image is not just a collection of objects, but rather a graph where each object is related to other objects through spatial and semantic relations. Using relational reasoning modules, such as the non-local module \cite{wang2017non}, can therefore improve object detection. Current schemes apply such dedicated modules either to a specific layer of the bottom-up stream, or between already-detected…

    Submitted 20 May, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

  14. arXiv:1811.08481  [pdf, other]

    cs.CV

    VQA with no questions-answers training

    Authors: Ben-Zion Vatashsky, Shimon Ullman

    Abstract: Methods for teaching machines to answer visual questions have made significant progress in recent years, but current methods still lack important human capabilities, including integrating new visual classes and concepts in a modular manner, providing explanations for the answers and handling new domains without explicit examples. We propose a novel method that consists of two main parts: generatin…

    Submitted 26 May, 2020; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: Accepted to CVPR 2020

  15. arXiv:1810.10656  [pdf, other]

    cs.CV

    Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures

    Authors: Ben Zion Vatashsky, Shimon Ullman

    Abstract: An image related question defines a specific visual task that is required in order to produce an appropriate answer. The answer may depend on a minor detail in the image and require complex reasoning and use of prior knowledge. When humans perform this task, they are able to do it in a flexible and robust manner, integrating modularly any novel visual capability with diverse options for various el…

    Submitted 24 October, 2018; originally announced October 2018.

  16. arXiv:1804.04604  [pdf, other]

    cs.CV cs.AI cs.RO q-bio.NC

    Discovery and usage of joint attention in images

    Authors: Daniel Harari, Joshua B. Tenenbaum, Shimon Ullman

    Abstract: Joint visual attention is characterized by two or more individuals looking at a common target at the same time. The ability to identify joint attention in scenes, the people involved, and their common target, is fundamental to the understanding of social interactions, including others' intentions and goals. In this work we deal with the extraction of joint attention events, and the use of such eve…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: 6 pages, 3 figures

  17. arXiv:1804.03576  [pdf, other]

    cs.CV

    Large Field and High Resolution: Detecting Needle in Haystack

    Authors: Hadar Gorodissky, Daniel Harari, Shimon Ullman

    Abstract: The growing use of convolutional neural networks (CNN) for a broad range of visual tasks, including tasks involving fine details, raises the problem of applying such networks to a large field of view, since the amount of computations increases significantly with the number of pixels. To deal effectively with this difficulty, we develop and compare methods of using CNNs for the task of small target…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: 15 pages, 7 figures

  18. arXiv:1802.09030  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Cakewalk Sampling

    Authors: Uri Patish, Shimon Ullman

    Abstract: We study the task of finding good local optima in combinatorial optimization problems. Although combinatorial optimization is NP-hard in general, locally optimal solutions are frequently used in practice. Local search methods however typically converge to a limited set of optima that depend on their initialization. Sampling methods on the other hand can access any valid solution, and thus can be u…

    Submitted 1 January, 2020; v1 submitted 25 February, 2018; originally announced February 2018.

    Comments: Accepted as a conference paper by AAAI-2020 (oral presentation)

  19. arXiv:1712.09299  [pdf]

    cs.CV

    A model for interpreting social interactions in local image regions

    Authors: Guy Ben-Yosef, Alon Yachin, Shimon Ullman

    Abstract: Understanding social interactions (such as 'hug' or 'fight') is a basic and important capacity of the human visual system, but a challenging and still open problem for modeling. In this work we study visual recognition of social interactions, based on small but recognizable local regions. The approach is based on two novel key components: (i) A given social interaction can be recognized reliably f…

    Submitted 26 December, 2017; originally announced December 2017.

    Comments: In AAAI spring symposium on Science of Intelligence: Computational Principles of Natural and Artificial Intelligence, Palo Alto, 2017

  20. arXiv:1711.11151  [pdf]

    cs.CV

    Structured learning and detailed interpretation of minimal object images

    Authors: Guy Ben-Yosef, Liav Assif, Shimon Ullman

    Abstract: We model the process of human full interpretation of object images, namely the ability to identify and localize all semantic features and parts that are recognized by human observers. The task is approached by dividing the interpretation of the complete object to the interpretation of multiple reduced but interpretable local regions. We model interpretation by a structured learning framework, in w…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: Accepted to Workshop on Mutual Benefits of Cognitive and Computer Vision, at the International Conference on Computer Vision. Venice, Italy, 2017

  21. arXiv:1611.09819  [pdf]

    q-bio.NC cs.AI cs.CV cs.LG

    Measuring and modeling the perception of natural and unconstrained gaze in humans and machines

    Authors: Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman

    Abstract: Humans are remarkably adept at interpreting the gaze direction of other individuals in their surroundings. This skill is at the core of the ability to engage in joint visual attention, which is essential for establishing social interactions. How accurate are humans in determining the gaze direction of others in lifelike scenes, when they can move their heads and eyes freely, and what are the sourc…

    Submitted 29 November, 2016; originally announced November 2016.

    Comments: Daniel Harari and Tao Gao contributed equally to this work

    Report number: Center for Brains, Minds and Machines Memo No. 059

  22. arXiv:1610.09625  [pdf]

    q-bio.NC cs.CV cs.LG

    Discovering containment: from infants to machines

    Authors: Shimon Ullman, Nimrod Dorfman, Daniel Harari

    Abstract: Current artificial learning systems can recognize thousands of visual categories, or play Go at a champion's level, but cannot explain infants' learning, in particular the ability to learn complex concepts without guidance, in a specific order. A notable example is the category of 'containers' and the notion of containment, one of the earliest spatial relations to be learned, starting already at 2.…

    Submitted 30 October, 2016; originally announced October 2016.

    Journal ref: Cognition 183 (2019) 67-81

  23. arXiv:1605.07824  [pdf, other]

    cs.CV cs.LG

    Action Classification via Concepts and Attributes

    Authors: Amir Rosenfeld, Shimon Ullman

    Abstract: Classes in natural images tend to follow long tail distributions. This is problematic when there are insufficient training examples for rare classes. This effect is emphasized in compound classes, involving the conjunction of several concepts, such as those appearing in action-recognition datasets. In this paper, we propose to address this issue by learning how to utilize common visual concepts wh…

    Submitted 6 March, 2018; v1 submitted 25 May, 2016; originally announced May 2016.

  24. arXiv:1603.08212  [pdf, other]

    cs.CV cs.LG

    Human Pose Estimation using Deep Consensus Voting

    Authors: Ita Lifshitz, Ethan Fetaya, Shimon Ullman

    Abstract: In this paper we consider the problem of human pose estimation from a single still image. We propose a novel approach where each location in the image votes for the position of each keypoint using a convolutional neural net. The voting scheme allows us to utilize information from the whole image, rather than rely on a sparse set of keypoint locations. Using dense, multi-target votes, not only prod…

    Submitted 27 March, 2016; originally announced March 2016.

  25. arXiv:1603.08079  [pdf, other]

    cs.CV cs.AI cs.CL

    Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

    Authors: Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, Shimon Ullman

    Abstract: Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, r…

    Submitted 26 March, 2016; originally announced March 2016.

    Comments: EMNLP 2015

    Journal ref: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015, pages 1477--1487

  26. arXiv:1603.04186  [pdf, other]

    cs.CV cs.LG

    Visual Concept Recognition and Localization via Iterative Introspection

    Authors: Amir Rosenfeld, Shimon Ullman

    Abstract: Convolutional neural networks have been shown to develop internal representations, which correspond closely to semantically meaningful objects and parts, although trained solely on class labels. Class Activation Mapping (CAM) is a recent method that makes it possible to easily highlight the image regions contributing to a network's classification decision. We build upon these two developments to e…

    Submitted 25 May, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

  27. arXiv:1601.04293  [pdf, other]

    cs.CV

    Face-space Action Recognition by Face-Object Interactions

    Authors: Amir Rosenfeld, Shimon Ullman

    Abstract: Action recognition in still images has seen major improvement in recent years due to advances in human pose estimation, object recognition and stronger feature representations. However, there are still many cases in which performance remains far from that of humans. In this paper, we approach the problem by learning explicitly, and then integrating three components of transitive actions: (1) the h…

    Submitted 17 January, 2016; originally announced January 2016.

    Comments: Our more recent work on a related topic is described in a separate paper: http://arxiv.org/abs/1511.03814

  28. arXiv:1511.03814  [pdf, other]

    cs.CV

    Hand-Object Interaction and Precise Localization in Transitive Action Recognition

    Authors: Amir Rosenfeld, Shimon Ullman

    Abstract: Action recognition in still images has seen major improvement in recent years due to advances in human pose estimation, object recognition and stronger feature representations produced by deep neural networks. However, there are still many cases in which performance remains far from that of humans. A major difficulty arises in distinguishing between transitive actions in which the overall actor po…

    Submitted 24 February, 2016; v1 submitted 12 November, 2015; originally announced November 2015.

    Comments: Minor changes: title and abstract

  29. arXiv:1502.01176  [pdf, other]

    cs.LG stat.ML

    Learning Local Invariant Mahalanobis Distances

    Authors: Ethan Fetaya, Shimon Ullman

    Abstract: For many tasks and data types, there are natural transformations to which the data should be invariant or insensitive. For instance, in visual recognition, natural images should be insensitive to rotation and translation. This requirement and its implications have been important in many machine learning applications, and tolerance for image transformations was primarily achieved by using robust fe…

    Submitted 4 February, 2015; originally announced February 2015.

  30. arXiv:1412.2672  [pdf]

    cs.AI cs.CV

    When Computer Vision Gazes at Cognition

    Authors: Tao Gao, Daniel Harari, Joshua Tenenbaum, Shimon Ullman

    Abstract: Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third party objects that other people are looking at. While it has been shown that people can accurately determine whether another person is looking directly at them versus away, little is known about human ability to discriminate a third person gaze directed towards objects that…

    Submitted 8 December, 2014; originally announced December 2014.

    Comments: Tao Gao and Daniel Harari contributed equally to this work

    Report number: CBMM Memo No. 025, MIT

  31. arXiv:1406.2602  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG

    Graph Approximation and Clustering on a Budget

    Authors: Ethan Fetaya, Ohad Shamir, Shimon Ullman

    Abstract: We consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding), when computing pairwise similarities is costly, and only a limited number of entries can be observed. We provide a theoretical analysis using standard notions of graph approximation, significantly generalizing previous results (which focused on spectral clustering with two clu…

    Submitted 10 June, 2014; originally announced June 2014.