Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–20 of 20 results for author: Arbelle, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.15334  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

    Authors: Brandon Huang, Chancharik Mitra, Assaf Arbelle, Leonid Karlinsky, Trevor Darrell, Roei Herzig

    Abstract: The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial problem: it is fundamentally limited by the model's context length set at pretraining. The problem is especially prominent in the multimodal domain, wh… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.08164  [pdf, other

    cs.CV

    ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

    Authors: Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuhene, Trevor Darrel, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky

    Abstract: Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in such reasoning tasks. This prompts a crucial question: have VLMs effectively tackled the CR challenge? We conjecture that existing CR benchmark… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: The first three authors contributed equally

  3. arXiv:2404.00459  [pdf, other

    cs.CL

    NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

    Authors: Eli Schwartz, Leshem Choshen, Joseph Shtok, Sivan Doveh, Leonid Karlinsky, Assaf Arbelle

    Abstract: Language models struggle with handling numerical data and performing arithmetic operations. We hypothesize that this limitation can be partially attributed to non-intuitive textual numbers representation. When a digit is read or generated by a causal language model it does not know its place value (e.g. thousands vs. hundreds) until the entire number is processed. To address this issue, we propose… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  4. arXiv:2403.12736  [pdf, other

    cs.CV

    Towards Multimodal In-Context Learning for Vision & Language Models

    Authors: Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky

    Abstract: Inspired by the emergence of Large Language Models (LLMs) that can truly understand human language, significant progress has been made in aligning other, non-language, modalities to be `understandable' by an LLM, primarily via converting their samples into a sequence of embedded language-like tokens directly fed into the LLM (decoder) input stream. However, so far limited attention has been given… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  5. arXiv:2305.19595  [pdf, other

    cs.CV

    Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models

    Authors: Sivan Doveh, Assaf Arbelle, Sivan Harary, Roei Herzig, Donghyun Kim, Paola Cascante-bonilla, Amit Alfassy, Rameswar Panda, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

    Abstract: Vision and Language (VL) models offer an effective method for aligning representation spaces of images and text, leading to numerous applications such as cross-modal retrieval, visual question answering, captioning, and more. However, the aligned image-text spaces learned by all the popular VL models are still suffering from the so-called `object bias' - their representations behave as `bags of no… ▽ More

    Submitted 1 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

  6. arXiv:2305.06343  [pdf, other

    cs.CV

    Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs

    Authors: Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson

    Abstract: Vision and language models (VLMs) have demonstrated remarkable zero-shot (ZS) performance in a variety of tasks. However, recent works have shown that even the best VLMs struggle to capture aspects of compositional scene understanding, such as object attributes, relations, and action states. In contrast, obtaining structured annotations, such as scene graphs (SGs), that could improve these models… ▽ More

    Submitted 24 October, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  7. arXiv:2212.04821  [pdf, other

    cs.CV

    PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data

    Authors: Roei Herzig, Ofir Abramovich, Elad Ben-Avraham, Assaf Arbelle, Leonid Karlinsky, Ariel Shamir, Trevor Darrell, Amir Globerson

    Abstract: Action recognition models have achieved impressive results by incorporating scene-level annotations, such as objects, their relations, 3D structure, and more. However, obtaining annotations of scene structure for videos requires a significant amount of effort to gather and annotate, making these methods expensive to train. In contrast, synthetic datasets generated by graphics engines provide power… ▽ More

    Submitted 5 December, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: WACV 2024

  8. arXiv:2211.14307  [pdf, other

    cs.CV

    MAEDAY: MAE for few and zero shot AnomalY-Detection

    Authors: Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes

    Abstract: We propose using Masked Auto-Encoder (MAE), a transformer model self-supervisedly trained on image inpainting, for anomaly detection (AD). Assuming anomalous regions are harder to reconstruct compared with normal regions. MAEDAY is the first image-reconstruction-based anomaly detection method that utilizes a pre-trained model, enabling its use for Few-Shot Anomaly Detection (FSAD). We also show th… ▽ More

    Submitted 15 February, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Computer Vision and Image Understanding, 2024

  9. arXiv:2211.13218  [pdf, other

    cs.CV cs.AI cs.LG

    CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning

    Authors: James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira

    Abstract: Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data. Typical solutions for this continual learning problem require extensive rehearsal of previously seen data, which increases memory costs and may violate data privacy. Recently, the emergence of large-scale pre-trained vision transformer models has e… ▽ More

    Submitted 30 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  10. arXiv:2211.11733  [pdf, other

    cs.CV

    Teaching Structured Vision&Language Concepts to Vision&Language Models

    Authors: Sivan Doveh, Assaf Arbelle, Sivan Harary, Rameswar Panda, Roei Herzig, Eli Schwartz, Donghyun Kim, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

    Abstract: Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks. However, some aspects of complex language understanding still remain a challenge. We introduce the collective notion of Structured Vision&Language Concepts (SVLC) which includes object attributes, relations, and states which are present in the text and visible in the image. Recent studies have… ▽ More

    Submitted 30 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Journal ref: CVPR 2023

  11. arXiv:2211.09790  [pdf, other

    cs.LG cs.AI cs.CV

    ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

    Authors: James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

    Abstract: Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object… ▽ More

    Submitted 30 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  12. arXiv:2209.03648  [pdf, other

    cs.CV

    FETA: Towards Specializing Foundation Models for Expert Task Applications

    Authors: Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, PeterW. J. Staar, Rogerio Feris, Leonid Karlinsky

    Abstract: Foundation Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high fidelity data synthesis, and out of domain generalization. However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g. retrieval of car manuals technical illustrations from language queries), data for which is either unseen or belonging to a long-tail… ▽ More

    Submitted 19 December, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

  13. arXiv:2112.02300  [pdf, other

    cs.CV

    Unsupervised Domain Generalization by Learning a Bridge Across Domains

    Authors: Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

    Abstract: The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system. In this paper, different from most cross-domain works that utilize some (or full) source domain supervision, we approach a relatively new and very practical Unsupervised Domain Generaliz… ▽ More

    Submitted 17 May, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

  14. arXiv:2111.14103  [pdf, other

    cs.CV

    CHARTER: heatmap-based multi-type chart data extraction

    Authors: Joseph Shtok, Sivan Harary, Ophir Azulai, Adi Raz Goldfarb, Assaf Arbelle, Leonid Karlinsky

    Abstract: The digital conversion of information stored in documents is a great source of knowledge. In contrast to the documents text, the conversion of the embedded documents graphics, such as charts and plots, has been much less explored. We present a method and a system for end-to-end conversion of document charts into machine readable tabular data format, which can be easily stored and analyzed in the d… ▽ More

    Submitted 28 November, 2021; originally announced November 2021.

    Comments: Joseph Shtok, Sivan Harary and Leonid Karlinsky had equal contribution

    Journal ref: Document Intelligence workshop at KDD 2021 conference

  15. arXiv:2104.09829  [pdf, other

    cs.CV

    Detector-Free Weakly Supervised Grounding by Separation

    Authors: Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

    Abstract: Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object de… ▽ More

    Submitted 20 April, 2021; originally announced April 2021.

  16. arXiv:2005.03995  [pdf, other

    eess.IV cs.CV

    DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation

    Authors: Mor Avi-Aharon, Assaf Arbelle, Tammy Riklin Raviv

    Abstract: We present the DeepHist - a novel Deep Learning framework for augmenting a network by histogram layers and demonstrate its strength by addressing image-to-image translation problems. Specifically, given an input image and a reference color distribution we aim to generate an output image with the structural appearance (content) of the input (source) yet with the colors of the reference. The key ide… ▽ More

    Submitted 6 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: text overlap with arXiv:1912.06044

  17. arXiv:1912.06044  [pdf, other

    cs.CV cs.LG

    Hue-Net: Intensity-based Image-to-Image Translation with Differentiable Histogram Loss Functions

    Authors: Mor Avi-Aharon, Assaf Arbelle, Tammy Riklin Raviv

    Abstract: We present the Hue-Net - a novel Deep Learning framework for Intensity-based Image-to-Image Translation. The key idea is a new technique termed network augmentation which allows a differentiable construction of intensity histograms from images. We further introduce differentiable representations of (1D) cyclic and joint (2D) histograms and use them for defining loss functions based on cyclic Earth… ▽ More

    Submitted 12 December, 2019; originally announced December 2019.

  18. arXiv:1904.08503  [pdf, other

    cs.CV cs.AI cs.LG stat.ML

    QANet -- Quality Assurance Network for Image Segmentation

    Authors: Assaf Arbelle, Eliav Elul, Tammy Riklin Raviv

    Abstract: We introduce a novel Deep Learning framework, which quantitatively estimates image segmentation quality without the need for human inspection or labeling. We refer to this method as a Quality Assurance Network -- QANet. Specifically, given an image and a `proposed' corresponding segmentation, obtained by any method including manual annotation, the QANet solves a regression problem in order to esti… ▽ More

    Submitted 5 November, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

  19. arXiv:1805.11247  [pdf, other

    cs.CV

    Microscopy Cell Segmentation via Convolutional LSTM Networks

    Authors: Assaf Arbelle, Tammy Riklin Raviv

    Abstract: Live cell microscopy sequences exhibit complex spatial structures and complicated temporal behaviour, making their analysis a challenging task. Considering cell segmentation problem, which plays a significant role in the analysis, the spatial properties of the data can be captured using Convolutional Neural Networks (CNNs). Recent approaches show promising segmentation results using convolutional… ▽ More

    Submitted 6 January, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: Accepted to ISBI 2019

  20. arXiv:1709.05860  [pdf, other

    cs.CV

    Microscopy Cell Segmentation via Adversarial Neural Networks

    Authors: Assaf Arbelle, Tammy Riklin Raviv

    Abstract: We present a novel method for cell segmentation in microscopy images which is inspired by the Generative Adversarial Neural Network (GAN) approach. Our framework is built on a pair of two competitive artificial neural networks, with a unique architecture, termed Rib Cage, which are trained simultaneously and together define a min-max game resulting in an accurate segmentation of a given image. Our… ▽ More

    Submitted 13 September, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2018