-
NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning
Authors:
Eli Schwartz,
Leshem Choshen,
Joseph Shtok,
Sivan Doveh,
Leonid Karlinsky,
Assaf Arbelle
Abstract:
Language models struggle with handling numerical data and performing arithmetic operations. We hypothesize that this limitation can be partially attributed to non-intuitive textual numbers representation. When a digit is read or generated by a causal language model it does not know its place value (e.g. thousands vs. hundreds) until the entire number is processed. To address this issue, we propose…
▽ More
Language models struggle with handling numerical data and performing arithmetic operations. We hypothesize that this limitation can be partially attributed to non-intuitive textual numbers representation. When a digit is read or generated by a causal language model it does not know its place value (e.g. thousands vs. hundreds) until the entire number is processed. To address this issue, we propose a simple adjustment to how numbers are represented by including the count of digits before each number. For instance, instead of "42", we suggest using "{2:42}" as the new format. This approach, which we term NumeroLogic, offers an added advantage in number generation by serving as a Chain of Thought (CoT). By requiring the model to consider the number of digits first, it enhances the reasoning process before generating the actual number. We use arithmetic tasks to demonstrate the effectiveness of the NumeroLogic formatting. We further demonstrate NumeroLogic applicability to general natural language modeling, improving language understanding performance in the MMLU benchmark.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
CHARTER: heatmap-based multi-type chart data extraction
Authors:
Joseph Shtok,
Sivan Harary,
Ophir Azulai,
Adi Raz Goldfarb,
Assaf Arbelle,
Leonid Karlinsky
Abstract:
The digital conversion of information stored in documents is a great source of knowledge. In contrast to the documents text, the conversion of the embedded documents graphics, such as charts and plots, has been much less explored. We present a method and a system for end-to-end conversion of document charts into machine readable tabular data format, which can be easily stored and analyzed in the d…
▽ More
The digital conversion of information stored in documents is a great source of knowledge. In contrast to the documents text, the conversion of the embedded documents graphics, such as charts and plots, has been much less explored. We present a method and a system for end-to-end conversion of document charts into machine readable tabular data format, which can be easily stored and analyzed in the digital domain. Our approach extracts and analyses charts along with their graphical elements and supporting structures such as legends, axes, titles, and captions. Our detection system is based on neural networks, trained solely on synthetic data, eliminating the limiting factor of data collection. As opposed to previous methods, which detect graphical elements using bounding-boxes, our networks feature auxiliary domain specific heatmaps prediction enabling the precise detection of pie charts, line and scatter plots which do not fit the rectangular bounding-box presumption. Qualitative and quantitative results show high robustness and precision, improving upon previous works on popular benchmarks
△ Less
Submitted 28 November, 2021;
originally announced November 2021.
-
Detector-Free Weakly Supervised Grounding by Separation
Authors:
Assaf Arbelle,
Sivan Doveh,
Amit Alfassy,
Joseph Shtok,
Guy Lev,
Eli Schwartz,
Hilde Kuehne,
Hila Barak Levi,
Prasanna Sattigeri,
Rameswar Panda,
Chun-Fu Chen,
Alex Bronstein,
Kate Saenko,
Shimon Ullman,
Raja Giryes,
Rogerio Feris,
Leonid Karlinsky
Abstract:
Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object de…
▽ More
Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing `text to image-regions' associations by random alpha-blending of arbitrary image pairs and using the corresponding texts of the pair as conditions to recover the alpha map from the blended image via a segmentation network. At test time, this allows using the query phrase as a condition for a non-blended query image, thus interpreting the test image as a composition of a region corresponding to the phrase and the complement region. Using this approach we demonstrate a significant accuracy improvement, of up to $8.5\%$ over previous DF-WSG SotA, for a range of benchmarks including Flickr30K, Visual Genome, and ReferIt, as well as a significant complementary improvement (above $7\%$) over the detector-based approaches for WSG.
△ Less
Submitted 20 April, 2021;
originally announced April 2021.
-
StarNet: towards Weakly Supervised Few-Shot Object Detection
Authors:
Leonid Karlinsky,
Joseph Shtok,
Amit Alfassy,
Moshe Lichtenstein,
Sivan Harary,
Eli Schwartz,
Sivan Doveh,
Prasanna Sattigeri,
Rogerio Feris,
Alexander Bronstein,
Raja Giryes
Abstract:
Few-shot detection and classification have advanced significantly in recent years. Yet, detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, and classification approaches rarely provide localization of objects in the scene. In this paper, we introduce StarNet - a few-shot model featuring an end-to-end differentiable non-parametr…
▽ More
Few-shot detection and classification have advanced significantly in recent years. Yet, detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, and classification approaches rarely provide localization of objects in the scene. In this paper, we introduce StarNet - a few-shot model featuring an end-to-end differentiable non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels to produce good features for jointly localizing and classifying previously unseen categories of few-shot test tasks using a star-model that geometrically matches between the query and support images (to find corresponding object instances). Being a few-shot detector, StarNet does not require any bounding box annotations, neither during pre-training nor for novel classes adaptation. It can thus be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks that are less cropped around the objects (where object localization is key).
△ Less
Submitted 17 September, 2020; v1 submitted 15 March, 2020;
originally announced March 2020.
-
LaSO: Label-Set Operations networks for multi-label few-shot learning
Authors:
Amit Alfassy,
Leonid Karlinsky,
Amit Aides,
Joseph Shtok,
Sivan Harary,
Rogerio Feris,
Raja Giryes,
Alex M. Bronstein
Abstract:
Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classifica…
▽ More
Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classification scenario. We propose to combine pairs of given examples in feature space, so that the resulting synthesized feature vectors will correspond to examples whose label sets are obtained through certain set operations on the label sets of the corresponding input pairs. Thus, our method is capable of producing a sample containing the intersection, union or set-difference of labels present in two input samples. As we show, these set operations generalize to labels unseen during training. This enables performing augmentation on examples of novel categories, thus, facilitating multi-label few-shot classifier learning. We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines.
△ Less
Submitted 26 February, 2019;
originally announced February 2019.
-
Delta-encoder: an effective sample synthesis method for few-shot object recognition
Authors:
Eli Schwartz,
Leonid Karlinsky,
Joseph Shtok,
Sivan Harary,
Mattias Marder,
Rogerio Feris,
Abhishek Kumar,
Raja Giryes,
Alex M. Bronstein
Abstract:
Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we proposes a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category just by seeing few examples from i…
▽ More
Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we proposes a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category just by seeing few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns to both extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state-of-the-art in one-shot object-recognition and compares favorably in the few-shot case. Upon acceptance code will be made available.
△ Less
Submitted 29 November, 2018; v1 submitted 12 June, 2018;
originally announced June 2018.
-
RepMet: Representative-based metric learning for classification and one-shot object detection
Authors:
Leonid Karlinsky,
Joseph Shtok,
Sivan Harary,
Eli Schwartz,
Amit Aides,
Rogerio Feris,
Raja Giryes,
Alex M. Bronstein
Abstract:
Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the…
▽ More
Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. Our approach outperforms state-of-the-art methods for DML-based object classification on a variety of standard fine-grained datasets. Furthermore, we demonstrate the effectiveness of our approach on the problem of few-shot object detection, by incorporating the proposed DML architecture as a classification head into a standard object detection model. We achieve the best results on the ImageNet-LOC dataset compared to strong baselines, when only a few training examples are available. We also offer the community a new episodic benchmark based on the ImageNet dataset for the few-shot object detection task.
△ Less
Submitted 18 November, 2018; v1 submitted 12 June, 2018;
originally announced June 2018.
-
Spatially-Adaptive Reconstruction in Computed Tomography using Neural Networks
Authors:
Joseph Shtok,
Michael Zibulevsky,
Michael Elad
Abstract:
We propose a supervised machine learning approach for boosting existing signal and image recovery methods and demonstrate its efficacy on example of image reconstruction in computed tomography. Our technique is based on a local nonlinear fusion of several image estimates, all obtained by applying a chosen reconstruction algorithm with different values of its control parameters. Usually such output…
▽ More
We propose a supervised machine learning approach for boosting existing signal and image recovery methods and demonstrate its efficacy on example of image reconstruction in computed tomography. Our technique is based on a local nonlinear fusion of several image estimates, all obtained by applying a chosen reconstruction algorithm with different values of its control parameters. Usually such output images have different bias/variance trade-off. The fusion of the images is performed by feed-forward neural network trained on a set of known examples. Numerical experiments show an improvement in reconstruction quality relatively to existing direct and iterative reconstruction methods.
△ Less
Submitted 28 November, 2013;
originally announced November 2013.
-
Spatially-Adaptive Reconstruction in Computed Tomography Based on Statistical Learning
Authors:
Joseph Shtok,
Michael Zibulevsky,
Michael Elad
Abstract:
We propose a direct reconstruction algorithm for Computed Tomography, based on a local fusion of a few preliminary image estimates by means of a non-linear fusion rule. One such rule is based on a signal denoising technique which is spatially adaptive to the unknown local smoothness. Another, more powerful fusion rule, is based on a neural network trained off-line with a high-quality training set…
▽ More
We propose a direct reconstruction algorithm for Computed Tomography, based on a local fusion of a few preliminary image estimates by means of a non-linear fusion rule. One such rule is based on a signal denoising technique which is spatially adaptive to the unknown local smoothness. Another, more powerful fusion rule, is based on a neural network trained off-line with a high-quality training set of images. Two types of linear reconstruction algorithms for the preliminary images are employed for two different reconstruction tasks. For an entire image reconstruction from full projection data, the proposed scheme uses a sequence of Filtered Back-Projection algorithms with a gradually growing cut-off frequency. To recover a Region Of Interest only from local projections, statistically-trained linear reconstruction algorithms are employed. Numerical experiments display the improvement in reconstruction quality when compared to linear reconstruction algorithms.
△ Less
Submitted 25 April, 2010;
originally announced April 2010.
-
Analysis of Basis Pursuit Via Capacity Sets
Authors:
Joseph Shtok,
Michael Elad
Abstract:
Finding the sparsest solution $α$ for an under-determined linear system of equations $Dα=s$ is of interest in many applications. This problem is known to be NP-hard. Recent work studied conditions on the support size of $α$ that allow its recovery using L1-minimization, via the Basis Pursuit algorithm. These conditions are often relying on a scalar property of $D$ called the mutual-coherence. In t…
▽ More
Finding the sparsest solution $α$ for an under-determined linear system of equations $Dα=s$ is of interest in many applications. This problem is known to be NP-hard. Recent work studied conditions on the support size of $α$ that allow its recovery using L1-minimization, via the Basis Pursuit algorithm. These conditions are often relying on a scalar property of $D$ called the mutual-coherence. In this work we introduce an alternative set of features of an arbitrarily given $D$, called the "capacity sets". We show how those could be used to analyze the performance of the basis pursuit, leading to improved bounds and predictions of performance. Both theoretical and numerical methods are presented, all using the capacity values, and shown to lead to improved assessments of the basis pursuit success in finding the sparest solution of $Dα=s$.
△ Less
Submitted 25 April, 2010;
originally announced April 2010.