-
SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow
Authors:
Dean Light,
Ahmad Aiashy,
Mahmoud Diab,
Daniel Nachmias,
Stijn Vansummeren,
Benny Kimelfeld
Abstract:
Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from the industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with the Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs, with increasing levels of complexity, within Jupyter Notebook.
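The abstract does not show SpannerLib's rule syntax or Python API, so what follows is only a plain-Python sketch of the underlying idea, not SpannerLib code. A regex-based primitive spanner emits (start, end) spans, and a Datalog-style rule keeps the spans that also satisfy a user-defined Python function (here a trivial capitalization check standing in for an ML-based NLP call). The names `rgx`, `is_capitalized`, and `extract`, and the rule notation in the docstring, are hypothetical.

```python
import re
from typing import Callable, Iterator

Span = tuple[int, int]  # half-open [start, end) character offsets into the document


def rgx(pattern: str, doc: str) -> Iterator[Span]:
    """Primitive regex spanner: yield the span of every match of `pattern` in `doc`."""
    for match in re.finditer(pattern, doc):
        yield (match.start(), match.end())


def is_capitalized(doc: str, span: Span) -> bool:
    """Stand-in for a user-defined function, e.g. a call to an NER model."""
    start, end = span
    return doc[start:end][:1].isupper()


def extract(doc: str, pattern: str,
            udf: Callable[[str, Span], bool]) -> list[tuple[Span, str]]:
    """Evaluate the Datalog-style rule  Match(s) <- rgx(pattern, doc, s), udf(doc, s)."""
    return [(s, doc[s[0]:s[1]]) for s in rgx(pattern, doc) if udf(doc, s)]


doc = "Alice met bob and Carol in Haifa."
print(extract(doc, r"\b\w+\b", is_capitalized))
# → [((0, 5), 'Alice'), ((18, 23), 'Carol'), ((27, 32), 'Haifa')]
```

Keeping spans as offsets rather than substrings mirrors the spanner view of extraction as a relation over positions, which is what lets such results be joined with other relations.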
Submitted 3 September, 2024;
originally announced September 2024.
-
Performance estimation of photonic integrated wavefront corrector for single-mode fiber coupling
Authors:
Dhwanil Patel,
Momen Diab,
Ross Cheriton,
Jacob Taylor,
Libertad Rojas,
Suresh Sivanandam
Abstract:
Many modern astronomical instruments rely on the optimal coupling of starlight into single-mode fibers (SMFs). For ground-based telescopes, this coupling is limited by atmospheric turbulence. We propose an integrated wavefront corrector based on silicon-on-insulator (SOI) photonics, which samples the aberrated wavefront via a microlens array (MLA). The MLA focuses the sampled wavefront onto an array of grating couplers that inject the beamlets into the single-mode waveguides of the corrector. The beams in each waveguide are then shifted in phase using thermo-optic phase shifters before combining the co-phased beams into one single-mode waveguide. In this work, we analyze the external factors that we anticipate will impact the performance of the corrector. Specifically, we study the effects of the telescope pupil function with obscuration, determine whether the corrector requires tip/tilt pre-correction, and analyze the impact of scintillation on the correction quality.
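The co-phasing and combining step can be illustrated with an idealized model, which is an assumption here and not the authors' simulation: an ideal, lossless, equal-weight N-to-1 combiner whose output power fraction is |Σ a_k exp(i(φ_k + θ_k))|² / (N Σ a_k²), where a_k are the beamlet amplitudes, φ_k the turbulence-induced phases, and θ_k the phase-shifter settings. Setting θ_k = -φ_k recovers all the power only when the amplitudes are equal; unequal amplitudes, such as those produced by scintillation, lower the ceiling to (Σ a_k)² / (N Σ a_k²). The lognormal amplitude model below is purely illustrative.

```python
import numpy as np


def combiner_efficiency(amplitudes, phases, corrections):
    """Fraction of input power reaching the single output of an ideal,
    lossless, equal-weight N-to-1 combiner (idealized model)."""
    field = np.sum(amplitudes * np.exp(1j * (phases + corrections)))
    n = len(amplitudes)
    return np.abs(field) ** 2 / (n * np.sum(amplitudes ** 2))


rng = np.random.default_rng(seed=0)
n = 16
phases = rng.uniform(-np.pi, np.pi, n)     # piston errors across subapertures
flat = np.ones(n)                          # equal amplitudes: no scintillation
scintillated = rng.lognormal(0.0, 0.3, n)  # illustrative amplitude fluctuations

print("uncorrected:", combiner_efficiency(flat, phases, np.zeros(n)))
print("co-phased:", combiner_efficiency(flat, phases, -phases))
print("co-phased, scintillated:", combiner_efficiency(scintillated, phases, -phases))
```

By the Cauchy-Schwarz inequality the co-phased efficiency equals 1 only for equal amplitudes, which is one simple way to see why scintillation limits correction quality even with perfect phase control.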
Submitted 22 August, 2024;
originally announced August 2024.
-
Experimental demonstration of photonic phase correctors based on grating coupler arrays and thermo-optic shifters
Authors:
Momen Diab,
Ross Cheriton,
Jacob Taylor,
Dhwanil Patel,
Libertad Rojas,
Mark Barnet,
Polina Zavyalova,
Dan-Xia Xu,
Pavel Cheben,
Siegfried Janz,
Jens H. Schmid,
Suresh Sivanandam
Abstract:
In ground-based astronomy, the ability to couple light into single-mode fibers (SMFs) is limited by atmospheric turbulence, which prohibits the use of many astrophotonic instruments. We propose a silicon-on-insulator photonic chip capable of coherently coupling the out-of-phase beamlets from the subapertures of a telescope pupil into an SMF. The photonic integrated circuit (PIC) consists of an array of grating couplers that are used to inject light from free space into single-mode waveguides on the chip. Metallic heaters modulate the refractive index of a coiled section of the waveguides, facilitating the co-phasing of the propagating modes. The phased beamlets can then be coherently combined to efficiently deliver the light to an output SMF. In an adaptive optics (AO) system, the phase corrector acts as a deformable mirror (DM) commanded by a controller that takes phase measurements from a wavefront sensor (WFS). We present experimental results for the PIC tested on an AO testbed and compare the performance to simulations.
Submitted 14 August, 2024;
originally announced August 2024.
-
The FIGNEWS Shared Task on News Media Narratives
Authors:
Wajdi Zaghouani,
Mustafa Jarrar,
Nizar Habash,
Houda Bouamor,
Imed Zitouni,
Mona Diab,
Samhaa R. El-Beltagy,
Muhammed AbuOdeh
Abstract:
We present an overview of the FIGNEWS shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. The shared task addresses bias and propaganda annotation in multilingual news posts. We focus on the early days of the Israel War on Gaza as a case study. The task aims to foster collaboration in developing annotation guidelines for subjective tasks by creating frameworks for analyzing diverse narratives highlighting potential bias and propaganda. In the spirit of fostering and encouraging diversity, we address the problem from a multilingual perspective, namely within five languages: English, French, Arabic, Hebrew, and Hindi. A total of 17 teams participated in two annotation subtasks: bias (16 teams) and propaganda (6 teams). The teams competed in four evaluation tracks: guidelines development, annotation quality, annotation quantity, and consistency. Collectively, the teams produced 129,800 data points. Key findings and implications for the field are discussed.
Submitted 25 July, 2024;
originally announced July 2024.
-
End-to-end simulations of photonic phase correctors for adaptive optics systems
Authors:
Dhwanil Patel,
Momen Diab,
Ross Cheriton,
Jacob Taylor,
Libertad Rojas,
Martin Vachon,
Dan-Xia Xu,
Jens H. Schmid,
Pavel Cheben,
Siegfried Janz,
Suresh Sivanandam
Abstract:
Optical beams and starlight distorted by atmospheric turbulence can be corrected with adaptive optics systems to enable efficient coupling into single-mode fibers. Deformable mirrors, used to flatten the wavefront in astronomical telescopes, are costly, sensitive, and complex mechanical components that require careful calibration to enable high-quality imaging in astronomy, microscopy, and vision science. They are also impractical to deploy in large numbers for non-imaging applications like free-space optical communication. Here, we propose a photonic integrated circuit capable of spatially sampling the wavefront collected by the telescope and co-phasing the subapertures to maximize the flux delivered to an output single-mode fiber as the integrated photonic implementation of a deformable mirror. We present the results of end-to-end simulations to quantify the performance of the proposed photonic solution under varying atmospheric conditions toward realizing an adaptive optics system without a deformable mirror for free-space optical receivers.
Submitted 15 July, 2024;
originally announced July 2024.
-
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Authors:
Aashiq Muhamed,
Oscar Li,
David Woodruff,
Mona Diab,
Virginia Smith
Abstract:
Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves competitive performance to full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU--a feat infeasible for previous methods--and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at https://github.com/aashiqmuhamed/GRASS .
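As a rough illustration of why a structured sparse projection saves memory, here is a minimal NumPy sketch, assuming a row-selection projection (keep k of the m gradient rows, chosen here by largest L2 norm), Adam state stored only for those rows, and a subspace that is never refreshed. This is a simplification for exposition; Grass's actual selection rules, subspace updates, and implementation are in the paper and the linked repository, and `select_rows` and `RowSparseAdam` are hypothetical names.

```python
import numpy as np


def select_rows(grad: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows with the largest L2 norm (one possible selection rule)."""
    norms = np.linalg.norm(grad, axis=1)
    return np.sort(np.argsort(norms)[-k:])


class RowSparseAdam:
    """Adam whose moment buffers are k x n (selected rows) instead of m x n."""

    def __init__(self, shape, k, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        _, n = shape
        self.k, self.lr, self.eps = k, lr, eps
        self.beta1, self.beta2 = betas
        self.m = np.zeros((k, n))  # compressed first moment
        self.v = np.zeros((k, n))  # compressed second moment
        self.t = 0
        self.rows = None           # fixed row subspace, chosen on the first step

    def step(self, weight: np.ndarray, grad: np.ndarray) -> np.ndarray:
        if self.rows is None:
            self.rows = select_rows(grad, self.k)
        # Equivalent to P.T @ grad for a one-hot selection matrix P, without a dense matmul.
        g = grad[self.rows]
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Scatter the update back: only the k selected rows of the weight change.
        weight[self.rows] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
        return weight


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
opt = RowSparseAdam(W.shape, k=2)
W_new = opt.step(W.copy(), rng.normal(size=(8, 4)))
print("rows updated:", int(np.any(W_new != W, axis=1).sum()))
print("optimizer state entries:", opt.m.size + opt.v.size, "vs dense", 2 * W.size)
```

Because the projection is a row selection, compressing the gradient is an indexing operation rather than a dense matrix product, and the update touches only k rows; these are the kinds of savings the abstract attributes to structured sparsity.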
Submitted 25 June, 2024;
originally announced June 2024.
-
Evaluating Large Language Model Biases in Persona-Steered Generation
Authors:
Andy Liu,
Mona Diab,
Daniel Fried
Abstract:
The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g., political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with the persona's demographic rather than the target stance. Evaluated models that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas. We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluation. Our results show the importance of evaluating models in open-ended text generation, as it can surface new LLM opinion biases. Moreover, such a setup can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.
Submitted 30 May, 2024;
originally announced May 2024.
-
Automatic Generation of Model and Data Cards: A Step Towards Responsible AI
Authors:
Jiarui Liu,
Wenkai Li,
Zhijing Jin,
Mona Diab
Abstract:
In an era of model and data proliferation in machine learning and AI, marked especially by the rapid advancement of open-source technologies, there is a critical need for standardized, consistent documentation. Our work addresses the information incompleteness in current human-generated model and data cards. We propose an automated generation approach using Large Language Models (LLMs). Our key contributions include the establishment of CardBench, a comprehensive dataset aggregated from over 4.8k model cards and 1.4k data cards, coupled with the development of the CardGen pipeline comprising a two-step retrieval process. Our approach exhibits enhanced completeness, objectivity, and faithfulness in generated model and data cards, a significant step toward responsible AI documentation practices that ensure better accountability and traceability.
Submitted 18 June, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Analyzing the Role of Semantic Representations in the Era of Large Language Models
Authors:
Zhijing Jin,
Yuen Chen,
Fernando Gonzalez,
Jiarui Liu,
Jiayi Zhang,
Julian Michael,
Bernhard Schölkopf,
Mona Diab
Abstract:
Traditionally, natural language processing (NLP) models often use a rich set of features created by linguistic expertise, such as semantic representations. However, in the era of large language models (LLMs), more and more tasks are turned into generic, end-to-end sequence generation problems. In this paper, we investigate the question: what is the role of semantic representations in the era of LLMs? Specifically, we investigate the effect of Abstract Meaning Representation (AMR) across five diverse NLP tasks. We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT, and find that it generally hurts performance more than it helps. To investigate what AMR may have to offer on these tasks, we conduct a series of analysis experiments. We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction. We recommend focusing on these areas for future work in semantic representations for LLMs. Our code: https://github.com/causalNLP/amr_llm.
Submitted 2 May, 2024;
originally announced May 2024.
-
Splitting Techniques for DAEs with port-Hamiltonian Applications
Authors:
Andreas Bartel,
Malak Diab,
Andreas Frommer,
Michael Günther,
Nicole Marheineke
Abstract:
In the simulation of differential-algebraic equations (DAEs), it is essential to employ numerical schemes that take into account the inherent structure and maintain explicit or hidden algebraic constraints without altering them. This paper focuses on operator-splitting techniques for coupled systems and aims at preserving the structure in the port-Hamiltonian framework. The study explores two decomposition strategies: one considering the underlying coupled subsystem structure and the other addressing energy-associated properties such as conservation and dissipation. We show that for coupled index-$1$ DAEs with and without private index-2 variables, the splitting schemes on top of a dimension-reducing decomposition achieve the same convergence rate as in the case of ordinary differential equations. Additionally, we discuss an energy-associated decomposition for index-1 pH-DAEs and introduce generalized Cayley transforms to uphold energy conservation. The effectiveness of both strategies is evaluated using port-Hamiltonian benchmark examples from electric circuits.
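For background only, and not the generalized transforms introduced in the paper: the classical Cayley transform (I - hJ/2)^{-1}(I + hJ/2) of a skew-symmetric J is orthogonal, so using it as a time step for the conservative linear system x' = Jx keeps the energy H(x) = |x|²/2 exactly constant (it coincides with the implicit midpoint rule), whereas explicit Euler does not. The minimal NumPy sketch below checks this numerically; the test system and step size are arbitrary choices.

```python
import numpy as np


def cayley_step(J: np.ndarray, x: np.ndarray, h: float) -> np.ndarray:
    """x_{k+1} = (I - h/2 J)^{-1} (I + h/2 J) x_k, the implicit midpoint rule for x' = Jx."""
    eye = np.eye(len(x))
    return np.linalg.solve(eye - 0.5 * h * J, (eye + 0.5 * h * J) @ x)


def euler_step(J: np.ndarray, x: np.ndarray, h: float) -> np.ndarray:
    """Explicit Euler, for comparison."""
    return x + h * (J @ x)


def energy(x: np.ndarray) -> float:
    """Hamiltonian H(x) = |x|^2 / 2."""
    return 0.5 * float(x @ x)


J = np.array([[0.0, 1.0], [-1.0, 0.0]])  # skew-symmetric: J.T == -J, so dH/dt = 0
x_cayley = x_euler = np.array([1.0, 0.0])
for _ in range(1000):
    x_cayley = cayley_step(J, x_cayley, 0.1)
    x_euler = euler_step(J, x_euler, 0.1)

print("Cayley energy:", energy(x_cayley))  # stays at 0.5 up to round-off
print("Euler energy:", energy(x_euler))    # grows by a factor (1 + h^2) per step
```

For this particular J, explicit Euler multiplies |x|² by exactly (1 + h²) each step, which is why its energy drifts upward while the Cayley step does not.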
Submitted 21 April, 2024;
originally announced April 2024.
-
Non-extensive Effects on the QCD Equation of State and Fluctuations of Conserved Charges within Polyakov Quark Meson Model
Authors:
Abdel Magied Diab
Abstract:
The influence of non-extensive Tsallis statistics on the hadron phase structure has been investigated using the Polyakov-quark-meson (PQM) model. The analysis examines the non-extensive effects on the temperature dependence of PQM order parameters, thermodynamic quantities related to the QCD equation of state, and fluctuations of conserved charges at varying chemical potentials. The results show that the deviations caused by non-extensive effects are largest near the crossover region. The pseudo-critical temperature $T_χ(μ_B)$ is not a universal constant and decreases with increasing non-extensive $q$ parameter. Correspondingly, the chiral phase boundary of the PQM model in the ($T_χ-μ_B$) plane shifts to lower temperatures with increasing non-extensive $q$ parameter. The PQM model exhibits good qualitative agreement with lattice QCD calculations. Moreover, these findings suggest the existence of a Tsallis limit, which serves as an alternative to the Stefan-Boltzmann (SB) limit for the massless ideal gas. The critical endpoint (CEP) moves to a lower temperature but a higher chemical potential with increasing non-extensive $q$ parameter. Overall, this study highlights the importance of non-extensive Tsallis statistics in characterizing the quark-hadron phase structure of the PQM model and contributes to a deeper understanding of non-extensive effects in the quark-hadron phase transition.
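To make the role of the $q$ parameter concrete, here is a minimal sketch of a non-extensive Fermi-Dirac occupation built from the Tsallis q-exponential e_q(x) = [1 + (q - 1)x]^{1/(q - 1)}, which reduces to exp(x) as q → 1. This is one common convention in the Tsallis literature, assumed here for q ≥ 1 with the usual cutoff where 1 + (q - 1)x ≤ 0; the PQM paper's exact definitions (for example, its treatment of negative arguments and antiparticles) may differ, and the energies, chemical potential, and temperature below are purely illustrative.

```python
import numpy as np


def q_exp(x, q):
    """Tsallis q-exponential for q >= 1; reduces to exp(x) at q = 1.

    Where 1 + (q - 1) x <= 0 the base is clipped to 0 (Tsallis cutoff).
    """
    if q < 1.0:
        raise ValueError("this sketch only covers q >= 1")
    if q == 1.0:
        return np.exp(x)
    base = np.maximum(1.0 + (q - 1.0) * np.asarray(x, dtype=float), 0.0)
    return base ** (1.0 / (q - 1.0))


def q_fermi_dirac(energy, mu, temperature, q):
    """Occupation number 1 / (e_q((E - mu) / T) + 1)."""
    return 1.0 / (q_exp((energy - mu) / temperature, q) + 1.0)


energies = np.linspace(0.0, 1.0, 6)  # GeV, illustrative grid
for q in (1.0, 1.05, 1.1):
    print(q, np.round(q_fermi_dirac(energies, mu=0.3, temperature=0.15, q=q), 4))
```

Because ln(1 + y) ≤ y, e_q(x) ≤ exp(x) for q > 1, so in this convention the occupation is never below the ordinary Fermi-Dirac value, with the largest relative change in the high-energy tail.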
Submitted 9 April, 2024;
originally announced April 2024.
-
Recover: A Neuro-Symbolic Framework for Failure Detection and Recovery
Authors:
Cristina Cornelio,
Mohammed Diab
Abstract:
Recognizing failures during task execution and implementing recovery procedures is challenging in robotics. Traditional approaches rely on the availability of extensive data or a tight set of constraints, while more recent approaches leverage large language models (LLMs) to verify task steps and replan accordingly. However, these methods often operate offline, necessitating scene resets and incurring high costs. This paper introduces Recover, a neuro-symbolic framework for online failure identification and recovery. By integrating ontologies, logical rules, and LLM-based planners, Recover exploits symbolic information to enhance the ability of LLMs to generate recovery plans and also to decrease the associated costs. In order to demonstrate the capabilities of our method in a simulated kitchen environment, we introduce OntoThor, an ontology describing the AI2Thor simulator setting. Empirical evaluation shows that OntoThor's logical rules accurately detect all failures in the analyzed tasks, and that Recover considerably outperforms, for both failure detection and recovery, a baseline method reliant solely on LLMs.
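The abstract does not give OntoThor's rules, so the following is only a hedged illustration of the general idea of symbolic failure detection: each action is paired with the facts it should make true, and a failure is flagged when any expected fact is missing from the observed scene state; the missing facts could then be handed to a planner for recovery. The action names, facts, and the `EXPECTED_EFFECTS` table are hypothetical.

```python
from typing import Callable

Fact = tuple[str, ...]

# Hypothetical action models: the facts each action should make true.
EXPECTED_EFFECTS: dict[str, Callable[..., set[Fact]]] = {
    "pickup": lambda obj: {("holding", obj)},
    "open": lambda obj: {("is_open", obj)},
    "put_in": lambda obj, container: {("inside", obj, container)},
}


def detect_failure(action: str, args: tuple[str, ...], observed: set[Fact]) -> set[Fact]:
    """Return the expected facts missing from the observed state (empty set means success)."""
    expected = EXPECTED_EFFECTS[action](*args)
    return expected - observed


# The robot tried to pick up the apple, but the scene shows it still on the table.
state_after = {("on", "apple", "table"), ("is_open", "fridge")}
missing = detect_failure("pickup", ("apple",), state_after)
if missing:
    print("failure detected, unmet facts:", missing)
```

Checking explicit postconditions is cheap and runs online after every step, which is the kind of symbolic signal the abstract describes using to reduce reliance on LLM calls.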
Submitted 31 March, 2024;
originally announced April 2024.
-
Emotion Classification in Low and Moderate Resource Languages
Authors:
Shabnam Tafreshi,
Shubham Vatsal,
Mona Diab
Abstract:
It is important to be able to analyze the emotional state of people around the globe. There are more than 7,100 active languages spoken around the world, and building emotion classification for each language is labor intensive. Particularly for low-resource and endangered languages, building emotion classification can be quite challenging. We present a cross-lingual emotion classifier, where we train an emotion classifier with resource-rich languages (English, in our work) and transfer the learning to low and moderate resource languages. We compare and contrast two approaches of transfer learning from a high-resource language to a low or moderate-resource language. One approach projects the annotation from a high-resource language to low and moderate-resource languages in parallel corpora, and the other uses direct transfer from the high-resource language to the other languages. We show the efficacy of our approaches on 6 languages: Farsi, Arabic, Spanish, Ilocano, Odia, and Azerbaijani. Our results indicate that our approaches outperform random baselines and transfer emotions across languages successfully. For all languages, the direct cross-lingual transfer of emotion yields better results. We also create annotated emotion-labeled resources for four languages: Farsi, Azerbaijani, Ilocano and Odia.
Submitted 28 February, 2024;
originally announced February 2024.
-
Investigating Cultural Alignment of Large Language Models
Authors:
Badr AlKhamissi,
Muhammad ElNokrashy,
Mai AlKhamissi,
Mona Diab
Abstract:
The intricate relationship between language and culture has long been a subject of exploration within the realm of linguistic anthropology. Large Language Models (LLMs), promoted as repositories of collective human knowledge, raise a pivotal question: do these models genuinely encapsulate the diverse knowledge adopted by different cultures? Our study reveals that these models demonstrate greater cultural alignment along two dimensions -- firstly, when prompted with the dominant language of a specific culture, and secondly, when pretrained with a refined mixture of languages employed by that culture. We quantify cultural alignment by simulating sociological surveys, comparing model responses to those of actual survey participants as references. Specifically, we replicate a survey conducted in various regions of Egypt and the United States through prompting LLMs with different pretraining data mixtures in both Arabic and English with the personas of the real respondents and the survey questions. Further analysis reveals that misalignment becomes more pronounced for underrepresented personas and for culturally sensitive topics, such as those probing social values. Finally, we introduce Anthropological Prompting, a novel method leveraging anthropological reasoning to enhance cultural alignment. Our study emphasizes the necessity for a more balanced multilingual pretraining dataset to better represent the diversity of human experience and the plurality of different cultures, with many implications for the topic of cross-lingual transfer.
Submitted 6 July, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
A Note on Bias to Complete
Authors:
Jia Xu,
Mona Diab
Abstract:
Minimizing social bias strengthens societal bonds, promoting shared understanding and better decision-making. We revisit the definition of bias by discovering new bias types (e.g., societal status) in dynamic environments and describe them relative to context, such as culture, region, time, and personal background. Our framework includes eight hypotheses about bias and a bias-minimization strategy for each hypothesis, as well as five methods proposed as solutions for LLMs. The realization of the framework is yet to be completed.
Submitted 18 February, 2024;
originally announced February 2024.
-
2023 Astrophotonics Roadmap: pathways to realizing multi-functional integrated astrophotonic instruments
Authors:
Nemanja Jovanovic,
Pradip Gatkine,
Narsireddy Anugu,
Rodrigo Amezcua-Correa,
Ritoban Basu Thakur,
Charles Beichman,
Chad Bender,
Jean-Philippe Berger,
Azzurra Bigioli,
Joss Bland-Hawthorn,
Guillaume Bourdarot,
Charles M. Bradford,
Ronald Broeke,
Julia Bryant,
Kevin Bundy,
Ross Cheriton,
Nick Cvetojevic,
Momen Diab,
Scott A. Diddams,
Aline N. Dinkelaker,
Jeroen Duis,
Stephen Eikenberry,
Simon Ellis,
Akira Endo,
Donald F. Figer
, et al. (55 additional authors not shown)
Abstract:
Photonics offer numerous functionalities that can be used to realize astrophotonic instruments. The most spectacular example to date is the ESO Gravity instrument at the Very Large Telescope in Chile. Integrated astrophotonic devices stand to offer critical advantages for instrument development, including extreme miniaturization, as well as integration, superior thermal and mechanical stabilization owing to the small footprint, and high replicability offering cost savings. Numerous astrophotonic technologies have been developed to address shortcomings of conventional instruments to date, including for example the development of photonic lanterns, complex aperiodic fiber Bragg gratings, complex beam combiners to enable long baseline interferometry, and laser frequency combs for high precision spectral calibration of spectrometers. Despite these successes, the facility implementation of photonic solutions in astronomical instrumentation is currently limited because of (1) low throughputs from coupling to fibers, coupling fibers to chips, propagation and bend losses, device losses, etc.; (2) difficulties with scaling to large channel count devices needed for large bandwidths and high resolutions; and (3) efficient integration of photonics with detectors, to name a few. In this roadmap, we identify 24 areas that need further development. We outline the challenges and advances needed across those areas covering design tools, simulation capabilities, fabrication processes, the need for entirely new components, integration and hybridization and the characterization of devices. To realize these advances the astrophotonics community will have to work cooperatively with industrial partners who have more advanced manufacturing capabilities. With the advances described herein, multi-functional instruments will be realized leading to novel observing capabilities for both ground and space platforms.
Submitted 1 November, 2023;
originally announced November 2023.
-
Operator splitting for semi-explicit differential-algebraic equations and port-Hamiltonian DAEs
Authors:
Andreas Bartel,
Malak Diab,
Andreas Frommer,
Michael Günther
Abstract:
Operator splitting methods allow one to split the operator describing a complex dynamical system into a sequence of simpler subsystems and treat each part independently. In the modeling of dynamical problems, systems of (possibly coupled) differential-algebraic equations (DAEs) arise. This motivates the application of operator splittings which are aware of the various structural forms of DAEs. Here, we present an approach for the splitting of coupled index-1 DAEs as well as for the splitting of port-Hamiltonian DAEs, taking advantage of the energy-conservative and energy-dissipative parts. We provide numerical examples illustrating our second-order convergence results.
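The paper's schemes are for DAEs and are not reproduced here. For orientation, the minimal NumPy sketch below shows the ordinary-differential-equation baseline that such convergence results are usually compared against: Lie splitting (first order) and Strang splitting (second order) for x' = (A + B)x, with A skew-symmetric (energy-conserving) and B negative definite (dissipative), loosely mirroring an energy-associated decomposition. The matrices, final time, and step counts are arbitrary choices, and the observed order is estimated as log2 of the error ratio when the step size is halved.

```python
import numpy as np


def expm_eig(M: np.ndarray, t: float) -> np.ndarray:
    """exp(tM) via eigendecomposition; valid here because every M used is diagonalizable."""
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(t * w)) @ np.linalg.inv(V)).real


A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # conservative part (skew-symmetric)
B = np.array([[-0.5, 0.0], [0.0, -0.1]])  # dissipative part; does not commute with A
x0 = np.array([1.0, 0.0])
T_END = 1.0
exact = expm_eig(A + B, T_END) @ x0


def lie(h: float, steps: int) -> np.ndarray:
    step = expm_eig(B, h) @ expm_eig(A, h)  # flow of A for h, then flow of B for h
    return np.linalg.matrix_power(step, steps) @ x0


def strang(h: float, steps: int) -> np.ndarray:
    half_a = expm_eig(A, 0.5 * h)
    step = half_a @ expm_eig(B, h) @ half_a  # A for h/2, B for h, A for h/2
    return np.linalg.matrix_power(step, steps) @ x0


def observed_orders(method, step_counts=(20, 40, 80)):
    """log2 of successive error ratios as the step size is halved."""
    errors = [np.linalg.norm(method(T_END / n, n) - exact) for n in step_counts]
    return [float(np.log2(errors[i] / errors[i + 1])) for i in range(len(errors) - 1)]


print("Lie:", [round(p, 2) for p in observed_orders(lie)])
print("Strang:", [round(p, 2) for p in observed_orders(strang)])
```

The symmetric A/2, B, A/2 composition cancels the leading commutator error term, which is why Strang splitting reaches second order while Lie splitting stays first order for non-commuting A and B.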
Submitted 31 August, 2023;
originally announced August 2023.
-
Can Large Language Models Infer Causation from Correlation?
Authors:
Zhijing Jin,
Jiarui Liu,
Zhiheng Lyu,
Spencer Poff,
Mrinmaya Sachan,
Rada Mihalcea,
Mona Diab,
Bernhard Schölkopf
Abstract:
Causal inference is one of the hallmarks of human intelligence. While the field of CausalNLP has attracted much interest in recent years, existing causal inference datasets in NLP primarily rely on discovering causality from empirical knowledge (e.g., commonsense knowledge). In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs). Specifically, we formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables. We curate a large-scale dataset of more than 200K samples, on which we evaluate seventeen existing LLMs. Through our experiments, we identify a key shortcoming of LLMs in terms of their causal inference skills, and show that these models achieve close to random performance on the task. This shortcoming is somewhat mitigated when we try to re-purpose LLMs for this skill via finetuning, but we find that these models still fail to generalize -- they can only perform causal inference in in-distribution settings when variable names and textual expressions used in the queries are similar to those in the training set, but fail in out-of-distribution settings generated by perturbing these queries. Corr2Cause is a challenging task for LLMs, and would be helpful in guiding future research on improving LLMs' pure reasoning skills and generalizability. Our data is at https://huggingface.co/datasets/causalnlp/corr2cause. Our code is at https://github.com/causalNLP/corr2cause.
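To make "pure causal inference" from correlational statements concrete, here is a brute-force sketch of the kind of reasoning the task targets, assuming three variables, no hidden confounders, and faithfulness (every observed independence is implied by the causal graph and vice versa). It enumerates all DAGs over the variables, keeps those whose d-separation pattern matches the given (in)dependence statements, and reports which causal edges hold in every surviving graph. It is not the Corr2Cause generation code, which is in the linked repository, and all function names are hypothetical.

```python
from itertools import combinations, product

NODES = ("A", "B", "C")
PAIRS = list(combinations(NODES, 2))


def is_acyclic(edges):
    """Repeatedly strip nodes with no incoming edges; a cycle leaves nodes behind."""
    remaining = set(NODES)
    while remaining:
        sources = {n for n in remaining
                   if not any(v == n and u in remaining for u, v in edges)}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags():
    """Every DAG on NODES, as a frozenset of directed (parent, child) edges."""
    for choice in product((None, "fwd", "back"), repeat=len(PAIRS)):
        edges = frozenset((u, v) if c == "fwd" else (v, u)
                          for (u, v), c in zip(PAIRS, choice) if c is not None)
        if is_acyclic(edges):
            yield edges


def d_separated(x, y, given, edges):
    """x independent of y given `given`, via the moralized ancestral graph criterion."""
    keep = {x, y} | set(given)
    while True:  # close `keep` under taking parents (ancestral set)
        parents = {u for u, v in edges if v in keep} - keep
        if not parents:
            break
        keep |= parents
    sub = [(u, v) for u, v in edges if u in keep and v in keep]
    links = {frozenset(e) for e in sub}
    for child in keep:  # moralize: connect every pair of co-parents
        co_parents = [u for u, v in sub if v == child]
        links |= {frozenset(p) for p in combinations(co_parents, 2)}
    frontier, seen = [x], {x}
    while frontier:  # undirected search from x that may not pass through `given`
        node = frontier.pop()
        if node == y:
            return False
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen and other not in given:
                    seen.add(other)
                    frontier.append(other)
    return True


def independence_pattern(edges):
    """All (in)dependence statements a faithful distribution over this DAG would show."""
    pattern = {}
    for x, y in PAIRS:
        (w,) = set(NODES) - {x, y}
        for given in ((), (w,)):
            pattern[(x, y, given)] = d_separated(x, y, given, edges)
    return pattern


def consistent_dags(observed):
    return [g for g in all_dags() if independence_pattern(g) == observed]


collider = frozenset({("A", "C"), ("B", "C")})  # A -> C <- B
chain = frozenset({("A", "B"), ("B", "C")})     # A -> B -> C
for name, truth in (("collider", collider), ("chain", chain)):
    matches = consistent_dags(independence_pattern(truth))
    entailed = frozenset.intersection(*matches)  # edges present in every match
    print(name, "consistent DAGs:", len(matches), "entailed edges:", sorted(entailed))
```

A collider pattern (A and B independent, but dependent given C) pins down A → C ← B uniquely, while the chain's pattern is shared by three DAGs, so no causal direction is entailed; this is the sense in which some correlational statements do and others do not license a causal conclusion.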
△ Less
Submitted 17 April, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
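The sketch below is a toy illustration of how correlational statements can constrain causal structure. It applies the standard collider (v-structure) rule from constraint-based causal discovery to three variables A, B, C. It assumes that A-B and B-C are dependent under every conditioning set, so both edges are present. The function name and output strings are hypothetical. This is not a claim about how the Corr2Cause data was generated.

```python
def orient_triple(a_indep_c: bool, a_indep_c_given_b: bool) -> str:
    """Toy causal inference over A, B, C from (in)dependence statements.

    Hypothetical setup: A and B, and B and C, are dependent under every
    conditioning set, so the skeleton contains A - B - C.
    """
    if a_indep_c and not a_indep_c_given_b:
        # The empty set separates A and C (B is not in it), and conditioning
        # on B makes them dependent: B is a collider.
        return "A -> B <- C"
    if a_indep_c_given_b and not a_indep_c:
        # B separates A and C: a chain or a fork. These are Markov
        # equivalent, so edge directions cannot be identified.
        return "A - B - C (unoriented)"
    if not a_indep_c and not a_indep_c_given_b:
        # No conditioning set separates A and C, so they are adjacent.
        return "A - B - C with edge A - C"
    # Both independences together contradict faithfulness for this skeleton.
    return "unfaithful"
```

For instance, the statements "A is independent of C, but A and C are dependent given B" correspond to orient_triple(True, False). These statements imply that A and C both cause B.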
-
OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models
Authors:
Badr AlKhamissi,
Siddharth Verma,
Ping Yu,
Zhijing Jin,
Asli Celikyilmaz,
Mona Diab
Abstract:
In this paper, we conduct a thorough investigation into the reasoning capabilities of Large Language Models (LLMs), focusing specifically on the Open Pretrained Transformers (OPT) models as representatives of such models. Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations. We then evaluate all models on 57 out-of-domain tasks drawn from the SUPER-NATURALINSTRUCTIONS benchmark, covering 26 distinct reasoning skills, using three prompting techniques. Through a comprehensive grid of 27 configurations and 6,156 test evaluations, we investigate the dimensions of finetuning, prompting, and scale to understand the role of explanations in different reasoning skills. Our findings reveal that having explanations in the few-shot exemplars has no significant impact on the model's performance when the model is finetuned, while it positively affects the non-finetuned counterpart. Moreover, we observe a slight yet consistent increase in classification accuracy as we incorporate explanations during prompting and finetuning, respectively. Finally, we offer insights on which skills benefit the most from incorporating explanations during finetuning and prompting, such as Numerical (+20.4%) and Analogical (+13.9%) reasoning, as well as skills that exhibit negligible or negative effects.
Submitted 24 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
A flexible short recurrence Krylov subspace method for matrices arising in the time integration of port Hamiltonian systems and ODEs/DAEs with a dissipative Hamiltonian
Authors:
Malak Diab,
Andreas Frommer,
Karsten Kahl
Abstract:
For several classes of mathematical models that yield linear systems, the splitting of the matrix into its Hermitian and skew-Hermitian parts is naturally related to properties of the underlying model. This is particularly so for discretizations of dissipative Hamiltonian ODEs, DAEs and port Hamiltonian systems where, in addition, the Hermitian part is positive definite or semi-definite. It is then possible to develop short recurrence optimal Krylov subspace methods in which the Hermitian part is used as a preconditioner. In this paper we develop new, right-preconditioned variants of this approach whose crucial new feature is that the systems with the Hermitian part need to be solved only approximately in each iteration, while the short recurrences are kept. This new class of methods is particularly efficient as it allows, for example, using a few steps of a multigrid solver or a (preconditioned) CG method for the Hermitian part in each iteration. We illustrate this with several numerical experiments for large-scale systems.
Submitted 29 December, 2022;
originally announced December 2022.
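The sketch below illustrates the splitting the abstract builds on, using a small real matrix. A is split into its symmetric (Hermitian) part H = (A + A^T)/2 and skew-symmetric part S = (A - A^T)/2. When H is positive definite, H^{-1}A = I + H^{-1}S, and H^{-1}S is skew-adjoint in the H-inner product. This classical property is what makes short recurrences possible. The example uses exact `Fraction` arithmetic and a 2x2 matrix chosen for illustration. It does not implement the paper's right-preconditioned, inexact-solve variants.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def split_hermitian(a):
    """Split a real square matrix A into H = (A + A^T)/2 and S = (A - A^T)/2."""
    at = transpose(a)
    n = len(a)
    h = [[(a[i][j] + at[i][j]) / 2 for j in range(n)] for i in range(n)]
    s = [[(a[i][j] - at[i][j]) / 2 for j in range(n)] for i in range(n)]
    return h, s

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def matmul(p, q):
    qt = transpose(q)
    return [[sum(x * y for x, y in zip(row, col)) for col in qt] for row in p]

def inv2(m):
    # Explicit inverse of a 2x2 matrix (adequate only for this toy size).
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def h_inner(u, v, h):
    # H-inner product <u, v>_H = v^T H u, for H symmetric positive definite.
    return sum(vi * hu for vi, hu in zip(v, matvec(h, u)))

# Toy nonsymmetric matrix whose symmetric part is positive definite.
A = [[Fraction(4), Fraction(1)], [Fraction(-3), Fraction(2)]]
H, S = split_hermitian(A)
M = matmul(inv2(H), S)  # H^{-1} S, so that H^{-1} A = I + H^{-1} S
```

Here H = [[4, -1], [-1, 2]] and S = [[0, 2], [-2, 0]]. For any vectors x and y, h_inner(matvec(M, x), y, H) equals -h_inner(x, matvec(M, y), H).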
-
ALERT: Adapting Language Models to Reasoning Tasks
Authors:
Ping Yu,
Tianlu Wang,
Olga Golovneva,
Badr AlKhamissi,
Siddharth Verma,
Zhijing Jin,
Gargi Ghosh,
Mona Diab,
Asli Celikyilmaz
Abstract:
Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills they have learnt during pretraining to reason outside of their training context, or are they simply memorizing their training corpus at finer granularity and have learnt to better understand their context? To tease apart these possibilities, we introduce ALERT, a benchmark and suite of analyses for assessing language models' reasoning ability, comparing pretrained and finetuned models on complex tasks that require reasoning skills to solve. ALERT provides a test bed to assess any language model on fine-grained reasoning skills; it spans over 20 datasets and covers 10 different reasoning skills. We leverage ALERT to further investigate the role of finetuning. With extensive empirical analysis, we find that language models learn more reasoning skills, such as textual entailment, abductive reasoning, and analogical reasoning, during the finetuning stage than during the pretraining stage. We also find that when language models are finetuned they tend to overfit to the prompt template, which hurts the robustness of the models and causes generalization problems.
Submitted 7 July, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Artificial Potential Field-Based Path Planning for Cluttered Environments
Authors:
Mosab Diab,
Mostafa Mohammadkarimi,
Raj Thilak Rajan
Abstract:
In this paper, we study path planning algorithms for resource-constrained mobile agents in unknown cluttered environments, which include but are not limited to various terrestrial missions, e.g., search and rescue missions by drones in jungles, and space missions, e.g., navigation of rovers on the Moon. In particular, we focus our attention on artificial potential field (APF) based methods, in which the target is attractive while the obstacles are repulsive to the mobile agent. We propose two major updates to the classical APF algorithm which significantly improve the performance of path planning using APF. First, we propose to improve an existing classical method that replaces the gradient descent optimization of the potential field cost function on a continuous domain with a combinatorial optimization on a set of predefined points (called bacteria points) around the agent's current location. Our proposition includes an adaptive hyperparameter that changes the value of the potential function associated with each bacteria point based on the current environmental measurements. Our proposed solution improves the navigation performance in terms of convergence to the target at the expense of a minimal increase in computational complexity. Second, we propose an improved potential field cost function of the bacteria points by introducing a new branching cost function which further improves the navigation performance. The algorithms were tested on a set of Monte Carlo simulation trials where the environment changes for each trial. Our simulation results show 25% lower navigation time and around 300% higher success rate compared to the conventional potential field method, and we present future directions for research.
Submitted 29 August, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
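The sketch below shows the classical bacteria-point step that the abstract describes improving. Instead of following the gradient of the potential, the agent evaluates the potential at a fixed ring of candidate points and moves to the cheapest one. The sketch makes these assumptions:

- a quadratic attractive potential;
- the textbook repulsive potential with influence radius d0;
- fixed gains;
- 16 evenly spaced candidates.

All function names and parameter values are hypothetical. The paper's adaptive hyperparameter and branching cost are not reproduced.

```python
import math

def attractive(p, goal, k_att=1.0):
    # Quadratic attractive potential: grows with squared distance to the target.
    return 0.5 * k_att * math.dist(p, goal) ** 2

def repulsive(p, obstacles, k_rep=1.0, d0=2.0):
    # Classical repulsive potential: active only within influence radius d0.
    total = 0.0
    for obs in obstacles:
        d = math.dist(p, obs)
        if d == 0.0:
            return math.inf  # candidate sits exactly on an obstacle
        if d < d0:
            total += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return total

def bacteria_step(pos, goal, obstacles, radius=0.5, n_points=16):
    """Move to the lowest-potential point among n_points candidates
    ("bacteria points") evenly spaced on a circle around pos."""
    candidates = [
        (pos[0] + radius * math.cos(2 * math.pi * i / n_points),
         pos[1] + radius * math.sin(2 * math.pi * i / n_points))
        for i in range(n_points)
    ]
    # Combinatorial optimization: pick the candidate minimizing total potential.
    return min(candidates,
               key=lambda c: attractive(c, goal) + repulsive(c, obstacles))
```

Consider an agent starting at (0, 0) with a goal at (5, 0) and no obstacles. Calling bacteria_step repeatedly advances it 0.5 per step along the x-axis. If a candidate coincides with an obstacle, that candidate is never chosen.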
-
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Authors:
Yejin Bang,
Tiezheng Yu,
Andrea Madotto,
Zhaojiang Lin,
Mona Diab,
Pascale Fung
Abstract:
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet, human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command. Along with the task, we propose a practical approach that distills value-aligned knowledge from large-scale language models (LLMs) to construct value-aligned classifiers in two steps. First, we generate value-aligned training data from LLMs by prompt-based few-shot learning. Next, we fine-tune smaller classification models with the generated data for the task. Empirical results show that our VA-Models surpass multiple baselines, including few-shot learning with OPT-175B and existing text augmentation methods, by at least 15.56% on the F1-score. We suggest that using classifiers with explicit human value input improves both inclusivity and explainability in AI.
Submitted 14 October, 2022;
originally announced October 2022.
-
Text Characterization Toolkit
Authors:
Daniel Simig,
Tianlu Wang,
Verna Dankers,
Peter Henderson,
Khuyagbaatar Batsuren,
Dieuwke Hupkes,
Mona Diab
Abstract:
In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis. Here, we argue that, especially given the well-known fact that benchmarks often contain biases, artefacts, and spurious correlations, deeper results analysis should become the de facto standard when presenting new models or benchmarks. We present a tool that researchers can use to study properties of the dataset and the influence of those properties on their models' behaviour. Our Text Characterization Toolkit includes both an easy-to-use annotation tool and off-the-shelf scripts that can be used for specific analyses. We also present use cases from three different domains: we use the tool to predict which examples are difficult for given well-known trained models and to identify (potentially harmful) biases and heuristics that are present in a dataset.
Submitted 4 October, 2022;
originally announced October 2022.
-
Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
Authors:
Muhammad ElNokrashy,
Badr AlKhamissi,
Mona Diab
Abstract:
Language Models pretrained on large textual data have been shown to encode different types of knowledge simultaneously. Traditionally, only the features from the last layer are used when adapting to new tasks or data. We put forward that, when using or finetuning deep pretrained models, intermediate layer features that may be relevant to the downstream task are buried too deep to be used efficiently in terms of needed samples or steps. To test this, we propose a new layer fusion method, Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers. We compare DWAtt to a basic concatenation-based layer fusion method (Concat), and compare both to a deeper model baseline, all kept within a similar parameter budget. Our findings show that DWAtt and Concat are more step- and sample-efficient than the baseline, especially in the few-shot setting. DWAtt outperforms Concat on larger data sizes. On CoNLL-03 NER, layer fusion shows 3.68-9.73% F1 gain at different few-shot sizes. The layer fusion models presented significantly outperform the baseline in various training scenarios with different data sizes, architectures, and training constraints.
Submitted 7 May, 2024; v1 submitted 29 September, 2022;
originally announced September 2022.
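The sketch below shows the general idea of attending over depth rather than over tokens. For a single token, the last layer's vector serves as the query. Every layer's vector serves as key and value, and the result is a softmax-weighted mix of layers. It also includes the Concat baseline for comparison. Identity projections stand in for learned ones. This is an assumption made for brevity; the paper's DWAtt has its own parameterization, which this sketch does not reproduce.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def depthwise_attention(layer_states):
    """Fuse one token's hidden states across depth.

    layer_states: per-layer vectors of equal dimension d, ordered from the
    first to the last layer. The last layer is the query; every layer is
    both key and value (identity projections, a simplifying assumption).
    """
    query = layer_states[-1]
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, state)) / math.sqrt(d)
              for state in layer_states]
    weights = softmax(scores)  # one attention weight per layer
    fused = [sum(w * state[i] for w, state in zip(weights, layer_states))
             for i in range(d)]
    return fused, weights

def concat_fusion(layer_states):
    # Concat baseline: stack every layer's features into one long vector.
    return [x for state in layer_states for x in state]
```

The fused output keeps the model dimension d. Concat, by contrast, grows to (number of layers) x d, which is one reason a parameter budget has to be matched when comparing the two.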
-
ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection
Authors:
Badr AlKhamissi,
Faisal Ladhak,
Srini Iyer,
Ves Stoyanov,
Zornitsa Kozareva,
Xian Li,
Pascale Fung,
Lambert Mathias,
Asli Celikyilmaz,
Mona Diab
Abstract:
Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next. It is also difficult to collect a large-scale hate speech annotated dataset. In this work, we frame this problem as a few-shot learning task, and show significant gains from decomposing the task into its "constituent" parts. In addition, we see that infusing knowledge from reasoning datasets (e.g., Atomic2020) improves the performance even further. Moreover, we observe that the trained models generalize to out-of-distribution datasets, showing the superiority of task decomposition and knowledge infusion compared to previously used methods. Concretely, our method outperforms the baseline by an absolute gain of 17.83% in the 16-shot case.
Submitted 20 May, 2023; v1 submitted 25 May, 2022;
originally announced May 2022.
-
GisPy: A Tool for Measuring Gist Inference Score in Text
Authors:
Pedram Hosseini,
Christopher R. Wolfe,
Mona Diab,
David A. Broniatowski
Abstract:
Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions. In this work, we delineate the process of developing GisPy, an open-source tool in Python for measuring the Gist Inference Score (GIS) in text. Evaluation of GisPy on documents in three benchmarks from the news and scientific text domains demonstrates that scores generated by our tool significantly distinguish low vs. high gist documents. Our tool is publicly available to use at: https://github.com/phosseini/GisPy.
Submitted 25 May, 2022;
originally announced May 2022.
-
Consistent Human Evaluation of Machine Translation across Language Pairs
Authors:
Daniel Licht,
Cynthia Gao,
Janice Lam,
Francisco Guzman,
Mona Diab,
Philipp Koehn
Abstract:
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs. We propose a new metric called XSTS that is more focused on semantic equivalence, and a cross-lingual calibration method that enables more consistent assessment. We demonstrate the effectiveness of these novel contributions in large-scale evaluation studies across up to 14 language pairs, with translation both into and out of English.
Submitted 17 May, 2022;
originally announced May 2022.
-
Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification
Authors:
Badr AlKhamissi,
Mona Diab
Abstract:
In this paper, we tackle the Arabic Fine-Grained Hate Speech Detection shared task and demonstrate significant improvements over reported baselines for its three subtasks. The tasks are to predict whether a tweet (1) contains offensive language and (2) is considered hate speech, and, if so, (3) to predict its fine-grained hate speech label from one of six categories. Our final solution is an ensemble of models that employs multitask learning and a self-consistency correction method, yielding 82.7% on the hate speech subtask, a 3.4% relative improvement compared to previous work.
Submitted 16 May, 2022;
originally announced May 2022.
-
OPT: Open Pre-trained Transformer Language Models
Authors:
Susan Zhang,
Stephen Roller,
Naman Goyal,
Mikel Artetxe,
Moya Chen,
Shuohui Chen,
Christopher Dewan,
Mona Diab,
Xian Li,
Xi Victoria Lin,
Todor Mihaylov,
Myle Ott,
Sam Shleifer,
Kurt Shuster,
Daniel Simig,
Punit Singh Koura,
Anjali Sridhar,
Tianlu Wang,
Luke Zettlemoyer
Abstract:
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
Submitted 21 June, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
A Review on Language Models as Knowledge Bases
Authors:
Badr AlKhamissi,
Millicent Li,
Asli Celikyilmaz,
Mona Diab,
Marjan Ghazvininejad
Abstract:
Recently, there has been a surge of interest in the NLP community in the use of pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus will encode a significant amount of knowledge implicitly in their parameters. The resulting LM can be probed for different kinds of knowledge and can thus act as a KB. This has a major advantage over traditional KBs in that this method requires no human supervision. In this paper, we present a set of aspects that we deem an LM should have to fully act as a KB, and review the recent literature with respect to those aspects.
Submitted 12 April, 2022;
originally announced April 2022.
-
Towards Responsible Natural Language Annotation for the Varieties of Arabic
Authors:
A. Stevie Bergman,
Mona T. Diab
Abstract:
When building NLP models, there is a tendency to aim for broader coverage, often overlooking cultural and (socio)linguistic nuance. In this position paper, we make the case for care and attention to such nuances, particularly in dataset annotation, as well as the inclusion of cultural and linguistic expertise in the process. We present a playbook for responsible dataset creation for polyglossic, multidialectal languages. This work is informed by a study on Arabic annotation of social media content.
Submitted 17 March, 2022;
originally announced March 2022.
-
CALCS 2021 Shared Task: Machine Translation for Code-Switched Data
Authors:
Shuguang Chen,
Gustavo Aguilar,
Anirudh Srinivasan,
Mona Diab,
Thamar Solorio
Abstract:
To date, efforts in the code-switching literature have focused for the most part on language identification, POS, NER, and syntactic parsing. In this paper, we address machine translation for code-switched social media data. We create a community shared task. We provide two modalities for participation: supervised and unsupervised. For the supervised setting, participants are challenged to translate English into Hindi-English (Eng-Hinglish) in a single direction. For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions. We share insights and challenges in curating the "into" code-switching language evaluation data. Further, we provide baselines for all language pairs in the shared task. The leaderboard for the shared task comprises 12 individual system submissions corresponding to 5 different teams. The best performance achieved is 12.67% BLEU score for English to Hinglish and 25.72% BLEU score for MSAEA to English.
Submitted 19 February, 2022;
originally announced February 2022.
-
A Quantitative and Qualitative Analysis of Schizophrenia Language
Authors:
Amal Alqahtani,
Efsun Sarioglu Kay,
Sardar Hamidian,
Michael Compton,
Mona Diab
Abstract:
Schizophrenia is one of the most disabling mental health conditions to live with. Approximately one percent of the population has schizophrenia, which makes it fairly common, and it affects many people and their families. Patients with schizophrenia suffer from different symptoms: formal thought disorder (FTD), delusions, and emotional flatness. In this paper, we quantitatively and qualitatively analyze the language of patients with schizophrenia, measuring various linguistic features in two modalities: speech and written text. We examine the following features: coherence and cohesion of thoughts, emotions, specificity, level of committed belief (LCB), and personality traits. Our results show that patients with schizophrenia score high in fear and neuroticism compared to healthy controls. In addition, they are more committed to their beliefs, and their writing lacks details. They score lower in most of the linguistic features of cohesion, with significant p-values.
Submitted 25 January, 2022;
originally announced January 2022.
-
Efficient Large Scale Language Modeling with Mixtures of Experts
Authors:
Mikel Artetxe,
Shruti Bhosale,
Naman Goyal,
Todor Mihaylov,
Myle Ott,
Sam Shleifer,
Xi Victoria Lin,
Jingfei Du,
Srinivasan Iyer,
Ramakanth Pasunuru,
Giri Anantharaman,
Xian Li,
Shuohui Chen,
Halil Akin,
Mandeep Baines,
Louis Martin,
Xing Zhou,
Punit Singh Koura,
Brian O'Horo,
Jeff Wang,
Luke Zettlemoyer,
Mona Diab,
Zornitsa Kozareva,
Ves Stoyanov
Abstract:
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using ~4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
Submitted 26 October, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
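The sketch below illustrates the "conditional computation" behind MoE layers with a generic top-k gated layer for one token. A linear router scores every expert, only the k highest-scoring experts are evaluated, and their outputs are mixed with renormalized router probabilities. The router form, the renormalization, and the default k are illustrative assumptions rather than the paper's implementation. Production MoE layers also add load-balancing losses and expert capacity limits, which are omitted here.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate_weights, experts, k=2):
    """Top-k gated mixture of experts for a single token vector x.

    gate_weights: one weight vector per expert (a linear router).
    experts: callables mapping a vector to a vector of the same size.
    Returns the combined output and the indices of the experts that ran.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Conditional computation: keep only the k highest-probability experts.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + (probs[i] / norm) * y_j for o, y_j in zip(out, y)]
    return out, chosen
```

The per-token compute depends on k rather than on the total number of experts. This is the property that lets MoE models add parameters without a proportional increase in compute.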
-
Few-shot Learning with Multilingual Language Models
Authors:
Xi Victoria Lin,
Todor Mihaylov,
Mikel Artetxe,
Tianlu Wang,
Shuohui Chen,
Daniel Simig,
Myle Ott,
Naman Goyal,
Shruti Bhosale,
Jingfei Du,
Ramakanth Pasunuru,
Sam Shleifer,
Punit Singh Koura,
Vishrav Chaudhary,
Brian O'Horo,
Jeff Wang,
Luke Zettlemoyer,
Zornitsa Kozareva,
Mona Diab,
Veselin Stoyanov,
Xian Li
Abstract:
Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model, with 7.5 billion parameters, sets a new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of 0-shot and 4-shot settings). On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 directions with 32 training examples, while surpassing the official supervised baseline in 45 directions. We conduct an in-depth analysis of different multilingual prompting approaches, showing in particular that strong few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples. Finally, we evaluate our models in social value tasks such as hate speech detection in five languages and find that they have limitations similar to comparably sized GPT-3 models.
Submitted 10 November, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Knowledge-Augmented Language Models for Cause-Effect Relation Classification
Authors:
Pedram Hosseini,
David A. Broniatowski,
Mona Diab
Abstract:
Previous studies have shown the efficacy of knowledge augmentation methods in pretrained language models. However, these methods behave differently across domains and downstream tasks. In this work, we investigate the augmentation of pretrained language models with commonsense knowledge in the cause-effect relation classification and commonsense causal reasoning tasks. After automatically verbalizing ATOMIC2020, a wide coverage commonsense reasoning knowledge graph, and GLUCOSE, a dataset of implicit commonsense causal knowledge, we continually pretrain BERT and RoBERTa with the verbalized data. Then we evaluate the resulting models on cause-effect pair classification and answering commonsense causal reasoning questions. Our results show that continually pretrained language models augmented with commonsense knowledge outperform our baselines on two commonsense causal reasoning benchmarks, COPA and BCOPA-CE, and the Temporal and Causal Reasoning (TCR) dataset, without additional improvement in model architecture or using quality-enhanced data for fine-tuning.
Submitted 1 June, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs
Authors:
Peter Hase,
Mona Diab,
Asli Celikyilmaz,
Xian Li,
Zornitsa Kozareva,
Veselin Stoyanov,
Mohit Bansal,
Srinivasan Iyer
Abstract:
Do language models have beliefs about the world? Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state. In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks. Our main contributions include: (1) new metrics for evaluating belief-updating methods that focus on the logical consistency of beliefs, (2) a training objective for Sequential, Local, and Generalizing model updates (SLAG) that improves the performance of learned optimizers, and (3) the introduction of the belief graph, which is a new form of interface with language models that shows the interdependencies between model beliefs. Our experiments suggest that models possess belief-like qualities to only a limited extent, but update methods can both fix incorrect model beliefs and greatly improve their consistency. Although off-the-shelf optimizers are surprisingly strong belief-updating baselines, our learned optimizers can outperform them in more difficult settings than have been considered in past work. Code is available at https://github.com/peterbhase/SLAG-Belief-Updating
Submitted 26 November, 2021;
originally announced November 2021.
-
AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization
Authors:
Alexander R. Fabbri,
Xiaojian Wu,
Srini Iyer,
Haoran Li,
Mona Diab
Abstract:
Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of community-based questions. Each question thread can receive a large number of answers with different perspectives. One goal of answer summarization is to produce a summary that reflects the range of answer perspectives. A major obstacle for this task is the absence of a dataset to provide supervision for producing such summaries. Recent works propose heuristics to create such data, but these are often noisy and do not cover all answer perspectives present. This work introduces a novel dataset of 4,631 CQA threads for answer summarization curated by professional linguists. Our pipeline gathers annotations for all subtasks of answer summarization, including relevant answer sentence selection, grouping these sentences based on perspectives, summarizing each perspective, and producing an overall summary. We analyze and benchmark state-of-the-art models on these subtasks and introduce a novel unsupervised approach for multi-perspective data augmentation that boosts summarization performance according to automatic evaluation. Finally, we propose reinforcement learning rewards to improve factual consistency and answer coverage and analyze areas for improvement.
Submitted 29 April, 2022; v1 submitted 11 November, 2021;
originally announced November 2021.
-
Chiral magnetic properties of QCD phase-diagram
Authors:
Abdel Nasser Tawfik,
Abdel Magied Diab
Abstract:
The QCD phase diagram is studied at finite magnetic field. Our calculations are based on a QCD effective model, the SU($3$) Polyakov linear sigma model (PLSM), in which chiral symmetry is integrated in the hadron phase, while in the parton phase the up-, down- and strange-quark degrees of freedom are incorporated together with Polyakov loop potentials in the pure gauge limit, which are motivated by various underlying QCD symmetries. Landau quantization and magnetic catalysis are implemented. The response of QCD matter to an external magnetic field, in terms of magnetization, magnetic susceptibility and permeability, has been estimated. We conclude that the parton phase has higher values of magnetization, magnetic susceptibility, and permeability than the hadron phase. Depending on the contributions to the Landau levels, we further conclude that the chiral magnetic field enhances the chiral quark condensates and thereby modifies the chiral QCD phase diagram, i.e., the hadron-parton phase transition likely takes place at lower critical temperatures and chemical potentials.
Submitted 8 June, 2021;
originally announced June 2021.
-
Discrete Cosine Transform as Universal Sentence Encoder
Authors:
Nada Almarwani,
Mona Diab
Abstract:
Modern sentence encoders are used to generate dense vector representations that capture the underlying linguistic characteristics of a sequence of words, including phrases, sentences, or paragraphs. These kinds of representations are ideal for training a classifier for an end task such as sentiment analysis, question answering and text classification. Different models have been proposed to efficiently generate general-purpose sentence representations to be used in pretraining protocols. While averaging is the most commonly used efficient sentence encoder, Discrete Cosine Transform (DCT) was recently proposed as an alternative that captures the underlying syntactic characteristics of a given text without compromising practical efficiency compared to averaging. However, as with most other sentence encoders, the DCT sentence encoder was only evaluated in English. In this work, we use the DCT encoder to generate universal sentence representations for different languages such as German, French, Spanish and Russian. The experimental results clearly show the superior effectiveness of DCT encoding, with consistent performance improvements over strong baselines on multiple standardized datasets.
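The abstract contrasts DCT encoding with plain averaging. As a rough illustration only (not the authors' code), the sketch below assumes the formulation commonly used for DCT sentence encoders: an orthonormal DCT-II applied along the token axis of a sentence's word-vector matrix, keeping the first K coefficients of every embedding dimension and concatenating them. The function names, the choice K = 4, and the zero-padding of sentences shorter than K are hypothetical assumptions.

```python
import numpy as np


def dct_ii_ortho(x: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II along axis 0 of an (N, d) array of word vectors."""
    n = x.shape[0]
    k = np.arange(n)[:, None]  # frequency index (rows)
    t = np.arange(n)[None, :]  # token position (columns)
    basis = np.cos(np.pi / n * (t + 0.5) * k)  # (N, N) DCT-II basis
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)  # orthonormal scaling for the DC term
    return scale[:, None] * (basis @ x)  # (N, d) coefficients


def dct_sentence_embedding(word_vectors: np.ndarray, k: int = 4) -> np.ndarray:
    """Hypothetical encoder: first k DCT coefficients per dimension, flattened to k*d."""
    n, d = word_vectors.shape
    coeffs = dct_ii_ortho(word_vectors)
    if n < k:
        # Assumption: pad missing coefficients with zeros for short sentences.
        coeffs = np.vstack([coeffs, np.zeros((k - n, d))])
    return coeffs[:k].reshape(-1)


rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 300))  # 7 tokens, 300-dim random stand-in vectors
emb = dct_sentence_embedding(sentence, k=4)
print(emb.shape)  # (1200,)
```

Under this scaling, the first coefficient equals sqrt(N) times the mean word vector, so averaging is recovered (up to a constant) as the lowest-frequency component, and the higher coefficients add information about how the vectors vary along the word order.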
Submitted 2 June, 2021;
originally announced June 2021.
-
Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation
Authors:
Adithya Renduchintala,
Denise Diaz,
Kenneth Heafield,
Xian Li,
Mona Diab
Abstract:
Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as greedy search, quantization, average attention networks (AANs) and shallow decoder models and show their effect on gendered noun translation. We construct a new gender bias test set, SimpleGEN, based on gendered noun phrases in which there is a single, unambiguous, correct answer. While we find minimal overall BLEU degradation as we apply speed optimizations, we observe that gendered noun translation performance degrades at a much faster rate.
Submitted 31 May, 2021;
originally announced June 2021.
-
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data
Authors:
Wei-Jen Ko,
Ahmed El-Kishky,
Adithya Renduchintala,
Vishrav Chaudhary,
Naman Goyal,
Francisco Guzmán,
Pascale Fung,
Philipp Koehn,
Mona Diab
Abstract:
The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language with only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on 7 languages from three different language families and show that our technique significantly improves translation into low-resource languages compared to other translation baselines.
Submitted 1 June, 2021; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Optimal SMF packing in photonic lanterns: comparing theoretical topology to practical packing arrangements
Authors:
John J. Davenport,
Momen Diab,
Kalaga Madhav,
Martin M. Roth
Abstract:
Photonic lanterns (PLs) rely on a close-packed arrangement of single-mode fibers (SMFs), which are tapered and fused into one multi-mode core. Topologically optimal circle packing arrangements have been well studied. Building on these results, we fabricate PLs with 19 and 37 SMFs showing tightly packed, ordered arrangements with packing densities of 95% and 99% of theoretically achievable values, and mean adjacent core separations of 1.03 and 1.08 fiber diameters, respectively. We demonstrate that topological circle packing data is a good predictor of optimal PL parameters.
Submitted 19 April, 2021;
originally announced April 2021.
-
Multi-Perspective Abstractive Answer Summarization
Authors:
Alexander R. Fabbri,
Xiaojian Wu,
Srini Iyer,
Mona Diab
Abstract:
Community Question Answering (CQA) forums such as Stack Overflow and Yahoo! Answers contain a rich resource of answers to a wide range of questions. Each question thread can receive a large number of answers with different perspectives. The goal of multi-perspective answer summarization is to produce a summary that includes all perspectives of the answer. A major obstacle for multi-perspective, abstractive answer summarization is the absence of a dataset to provide supervision for producing such summaries. This work introduces a novel dataset creation method to automatically create multi-perspective, bullet-point abstractive summaries from an existing CQA forum. Supervision provided by this dataset trains models to inherently produce multi-perspective summaries. Additionally, to train models to output more diverse, faithful answer summaries while retaining multiple perspectives, we propose a multi-reward optimization technique coupled with a sentence-relevance prediction multi-task loss. Our methods demonstrate improved coverage of perspectives and faithfulness as measured by automatic and human evaluations compared to a strong baseline.
Submitted 17 April, 2021;
originally announced April 2021.
-
Simulations of mode-selective photonic lanterns for efficient coupling of starlight into the single-mode regime
Authors:
Momen Diab,
Aashana Tripathi,
John Davenport,
Aline N. Dinkelaker,
Kalaga Madhav,
Martin M. Roth
Abstract:
In ground-based astronomy, starlight distorted by the atmosphere couples poorly into single-mode waveguides, but correction by adaptive optics, even if only partial, can boost coupling into the few-mode regime, allowing the use of photonic lanterns to convert the light into multiple single-mode beams. Corrected wavefronts result in focal patterns that couple mostly with the circularly symmetric waveguide modes. A mode-selective photonic lantern is hence proposed to convert the multimode light into a subset of the single-mode waveguides of the standard photonic lantern, thereby reducing the required number of outputs. Our simulations show that only two out of the six waveguides of a 1x6 photonic lantern carry >95% of the coupled light to the outputs at $D/r_0 < 10$ if the wavefront is partially corrected and the photonic lantern is made mode-selective.
Submitted 25 March, 2021;
originally announced March 2021.
-
Predicting Directionality in Causal Relations in Text
Authors:
Pedram Hosseini,
David A. Broniatowski,
Mona Diab
Abstract:
In this work, we test the performance of two bidirectional transformer-based language models, BERT and SpanBERT, on predicting the directionality of causal pairs in text. Our preliminary results show that predicting direction is more challenging for inter-sentence and implicit causal relations, and that SpanBERT performs better than BERT on causal samples with longer spans. We also introduce CREST, a framework for unifying a collection of scattered datasets of causal relations.
Submitted 25 March, 2021;
originally announced March 2021.
-
White Paper: Challenges and Considerations for the Creation of a Large Labelled Repository of Online Videos with Questionable Content
Authors:
Thamar Solorio,
Mahsa Shafaei,
Christos Smailis,
Mona Diab,
Theodore Giannakopoulos,
Heng Ji,
Yang Liu,
Rada Mihalcea,
Smaranda Muresan,
Ioannis Kakadiaris
Abstract:
This white paper presents a summary of the discussions regarding critical considerations for developing an extensive repository of online videos annotated with labels indicating questionable content. The main discussion points include: 1) the types of labels that will result in a valuable repository for the larger AI community; 2) how to design the collection and annotation process, as well as the distribution of the corpus, to maximize its potential impact; and 3) what actions we can take to reduce the risk of trauma to annotators.
Submitted 25 January, 2021;
originally announced January 2021.
-
Starlight coupling through atmospheric turbulence into few-mode fibers and photonic lanterns in the presence of partial adaptive optics correction
Authors:
Momen Diab,
Aline N. Dinkelaker,
John Davenport,
Kalaga Madhav,
Martin M. Roth
Abstract:
Starlight corrupted by atmospheric turbulence cannot couple efficiently into astronomical instruments based on integrated optics as they require light of high spatial coherence to couple into their single-mode waveguides. Low-order adaptive optics in combination with photonic lanterns offer a practical approach to achieve efficient coupling into multiplexed astrophotonic devices. We investigate, aided by simulations and an experimental testbed, the trade-off between the degrees of freedom of the adaptive optics system and those of the input waveguide of an integrated optic component leading to a cost-effective hybrid system that achieves a signal-to-noise ratio higher than a standalone device fed by a single-mode fiber.
Submitted 26 November, 2020;
originally announced November 2020.
-
A minimal Length Uncertainty Approach to Cosmological Constant Problem
Authors:
Abdel Magied Diab,
Abdel Nasser Tawfik
Abstract:
Based on a quantum mechanical framework for the minimal length uncertainty, we demonstrate, on the one hand, that the generalized uncertainty principle (GUP) parameter could be best constrained by recent gravitational-wave observations. On the other hand, this suggests modified dispersion relations (MDRs) that enable an estimate of the difference between the group velocity of gravitons and that of photons. Utilizing features of the UV/IR correspondence and the obvious similarities between GUP (including non-gravitating and gravitating impacts on the Heisenberg uncertainty principle) and the discrepancy between the theoretical and the observed cosmological constant (apparently manifesting gravitational influences on the vacuum energy density), we suggest a possible solution to the cosmological constant problem.
Submitted 12 November, 2020;
originally announced November 2020.