-
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
Authors:
Debalina Ghosh Paul,
Hong Zhu,
Ian Bayley
Abstract:
With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to evaluate such LLMs for this task is still an open problem despite of the great amount of research efforts that have been made and reported to evaluate and compare t…
▽ More
With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to evaluate such LLMs for this task is still an open problem despite of the great amount of research efforts that have been made and reported to evaluate and compare them. This paper provides a critical review of the existing work on the testing and evaluation of these tools with a focus on two key aspects: the benchmarks and the metrics used in the evaluations. Based on the review, further research directions are discussed.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
ScenEval: A Benchmark for Scenario-Based Evaluation of Code Generation
Authors:
Debalina Ghosh Paul,
Hong Zhu,
Ian Bayley
Abstract:
In the scenario-based evaluation of machine learning models, a key problem is how to construct test datasets that represent various scenarios. The methodology proposed in this paper is to construct a benchmark and attach metadata to each test case. Then a test system can be constructed with test morphisms that filter the test cases based on metadata to form a dataset.
The paper demonstrates this…
▽ More
In the scenario-based evaluation of machine learning models, a key problem is how to construct test datasets that represent various scenarios. The methodology proposed in this paper is to construct a benchmark and attach metadata to each test case. Then a test system can be constructed with test morphisms that filter the test cases based on metadata to form a dataset.
The paper demonstrates this methodology with large language models for code generation. A benchmark called ScenEval is constructed from problems in textbooks, an online tutorial website and Stack Overflow. Filtering by scenario is demonstrated and the test sets are used to evaluate ChatGPT for Java code generation.
Our experiments found that the performance of ChatGPT decreases with the complexity of the coding task. It is weakest for advanced topics like multi-threading, data structure algorithms and recursive methods. The Java code generated by ChatGPT tends to be much shorter than reference solution in terms of number of lines, while it is more likely to be more complex in both cyclomatic and cognitive complexity metrics, if the generated code is correct. However, the generated code is more likely to be less complex than the reference solution if the code is incorrect.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation
Authors:
Yanhao Jin,
Krishnakumar Balasubramanian,
Debashis Paul
Abstract:
Meta-learning involves training models on a variety of training tasks in a way that enables them to generalize well on new, unseen test tasks. In this work, we consider meta-learning within the framework of high-dimensional multivariate random-effects linear models and study generalized ridge-regression based predictions. The statistical intuition of using generalized ridge regression in this sett…
▽ More
Meta-learning involves training models on a variety of training tasks in a way that enables them to generalize well on new, unseen test tasks. In this work, we consider meta-learning within the framework of high-dimensional multivariate random-effects linear models and study generalized ridge-regression based predictions. The statistical intuition of using generalized ridge regression in this setting is that the covariance structure of the random regression coefficients could be leveraged to make better predictions on new tasks. Accordingly, we first characterize the precise asymptotic behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task. We next show that this predictive risk is optimal when the weight matrix in generalized ridge regression is chosen to be the inverse of the covariance matrix of random coefficients. Finally, we propose and analyze an estimator of the inverse covariance matrix of random regression coefficients based on data from the training tasks. As opposed to intractable MLE-type estimators, the proposed estimators could be computed efficiently as they could be obtained by solving (global) geodesically-convex optimization problems. Our analysis and methodology use tools from random matrix theory and Riemannian optimization. Simulation results demonstrate the improved generalization performance of the proposed method on new unseen test tasks within the considered framework.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
PromptSet: A Programmer's Prompting Dataset
Authors:
Kaiser Pister,
Dhruba Jyoti Paul,
Patrick Brophy,
Ishan Joshi
Abstract:
The rise of capabilities expressed by large language models has been quickly followed by the integration of the same complex systems into application level logic. Algorithms, programs, systems, and companies are built around structured prompting to black box models where the majority of the design and implementation lies in capturing and quantifying the `agent mode'. The standard way to shape a cl…
▽ More
The rise of capabilities expressed by large language models has been quickly followed by the integration of the same complex systems into application level logic. Algorithms, programs, systems, and companies are built around structured prompting to black box models where the majority of the design and implementation lies in capturing and quantifying the `agent mode'. The standard way to shape a closed language model is to prime it for a specific task with a tailored prompt, often initially handwritten by a human. The textual prompts co-evolve with the codebase, taking shape over the course of project life as artifacts which must be reviewed and maintained, just as the traditional code files might be. Unlike traditional code, we find that prompts do not receive effective static testing and linting to prevent runtime issues. In this work, we present a novel dataset called PromptSet, with more than 61,000 unique developer prompts used in open source Python programs. We perform analysis on this dataset and introduce the notion of a static linter for prompts. Released with this publication is a HuggingFace dataset and a Github repository to recreate collection and processing efforts, both under the name \texttt{pisterlabs/promptset}.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning
Authors:
Debjit Paul,
Robert West,
Antoine Bosselut,
Boi Faltings
Abstract:
Large language models (LLMs) have been shown to perform better when asked to reason step-by-step before answering a question. However, it is unclear to what degree the model's final answer is faithful to the stated reasoning steps. In this paper, we perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps generated by the LLM influence the final outcome and fi…
▽ More
Large language models (LLMs) have been shown to perform better when asked to reason step-by-step before answering a question. However, it is unclear to what degree the model's final answer is faithful to the stated reasoning steps. In this paper, we perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps generated by the LLM influence the final outcome and find that LLMs do not reliably use their intermediate reasoning steps when generating an answer. To address this issue, we introduce FRODO, a framework to tailor small-sized LMs to generate correct reasoning steps and robustly reason over these steps. FRODO consists of an inference module that learns to generate correct reasoning steps using an implicit causal reward function and a reasoning module that learns to faithfully reason over these intermediate inferences using a counterfactual and causal preference objective. Our experiments show that FRODO significantly outperforms four competitive baselines. Furthermore, FRODO improves the robustness and generalization ability of the reasoning LM, yielding higher performance on out-of-distribution test sets. Finally, we find that FRODO's rationales are more faithful to its final answer predictions than standard supervised fine-tuning.
△ Less
Submitted 23 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Convolutional Neural Networks can achieve binary bail judgement classification
Authors:
Amit Barman,
Devangan Roy,
Debapriya Paul,
Indranil Dutta,
Shouvik Kumar Guha,
Samir Karmakar,
Sudip Kumar Naskar
Abstract:
There is an evident lack of implementation of Machine Learning (ML) in the legal domain in India, and any research that does take place in this domain is usually based on data from the higher courts of law and works with English data. The lower courts and data from the different regional languages of India are often overlooked. In this paper, we deploy a Convolutional Neural Network (CNN) architec…
▽ More
There is an evident lack of implementation of Machine Learning (ML) in the legal domain in India, and any research that does take place in this domain is usually based on data from the higher courts of law and works with English data. The lower courts and data from the different regional languages of India are often overlooked. In this paper, we deploy a Convolutional Neural Network (CNN) architecture on a corpus of Hindi legal documents. We perform a bail Prediction task with the help of a CNN model and achieve an overall accuracy of 93\% which is an improvement on the benchmark accuracy, set by Kapoor et al. (2022), albeit in data from 20 districts of the Indian state of Uttar Pradesh.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
Exploring Defeasibility in Causal Reasoning
Authors:
Shaobo Cui,
Lazar Milikic,
Yiyang Feng,
Mete Ismayilzada,
Debjit Paul,
Antoine Bosselut,
Boi Faltings
Abstract:
Defeasibility in causal reasoning implies that the causal relationship between cause and effect can be strengthened or weakened. Namely, the causal strength between cause and effect should increase or decrease with the incorporation of strengthening arguments (supporters) or weakening arguments (defeaters), respectively. However, existing works ignore defeasibility in causal reasoning and fail to…
▽ More
Defeasibility in causal reasoning implies that the causal relationship between cause and effect can be strengthened or weakened. Namely, the causal strength between cause and effect should increase or decrease with the incorporation of strengthening arguments (supporters) or weakening arguments (defeaters), respectively. However, existing works ignore defeasibility in causal reasoning and fail to evaluate existing causal strength metrics in defeasible settings. In this work, we present $δ$-CAUSAL, the first benchmark dataset for studying defeasibility in causal reasoning. $δ$-CAUSAL includes around 11K events spanning ten domains, featuring defeasible causality pairs, i.e., cause-effect pairs accompanied by supporters and defeaters. We further show current causal strength metrics fail to reflect the change of causal strength with the incorporation of supporters or defeaters in $δ$-CAUSAL. To this end, we propose CESAR (Causal Embedding aSsociation with Attention Rating), a metric that measures causal strength based on token-level causal relationships. CESAR achieves a significant 69.7% relative improvement over existing metrics, increasing from 47.2% to 80.1% in capturing the causal strength change brought by supporters and defeaters. We further demonstrate even Large Language Models (LLMs) like GPT-3.5 still lag 4.5 and 10.7 points behind humans in generating supporters and defeaters, emphasizing the challenge posed by $δ$-CAUSAL.
△ Less
Submitted 27 June, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means
Authors:
Supratik Basu,
Jyotishka Ray Choudhury,
Debolina Paul,
Swagatam Das
Abstract:
Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These…
▽ More
Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These encompass a heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When confronted with data containing noisy or outlier-laden observations, the Median-of-Means (MoM) estimator emerges as a stabilizing force for any centroid-based clustering framework. On a different note, a prevalent constraint among existing clustering methodologies resides in the prerequisite knowledge of the number of clusters prior to analysis. Utilizing model-based methodologies, such as Bayesian nonparametric models, offers the advantage of infinite mixture models, thereby circumventing the need for such requirements. Motivated by these facts, in this article, we present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies that mitigates the effect of noise on the quality of clustering while ensuring that the number of clusters need not be specified in advance. Statistical guarantees on the upper bound of clustering error, and rigorous assessment through simulated and real datasets suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
CRAB: Assessing the Strength of Causal Relationships Between Real-world Events
Authors:
Angelika Romanou,
Syrielle Montariol,
Debjit Paul,
Leo Laugier,
Karl Aberer,
Antoine Bosselut
Abstract:
Understanding narratives requires reasoning about the cause-and-effect relationships between events mentioned in the text. While existing foundation models yield impressive results in many NLP tasks requiring reasoning, it is unclear whether they understand the complexity of the underlying network of causal relationships of events in narratives. In this work, we present CRAB, a new Causal Reasonin…
▽ More
Understanding narratives requires reasoning about the cause-and-effect relationships between events mentioned in the text. While existing foundation models yield impressive results in many NLP tasks requiring reasoning, it is unclear whether they understand the complexity of the underlying network of causal relationships of events in narratives. In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives. CRAB contains fine-grained, contextual causality annotations for ~2.7K pairs of real-world events that describe various newsworthy event timelines (e.g., the acquisition of Twitter by Elon Musk). Using CRAB, we measure the performance of several large language models, demonstrating that most systems achieve poor performance on the task. Motivated by classical causal principles, we also analyze the causal structures of groups of events in CRAB, and find that models perform worse on causal reasoning when events are derived from complex causal structures compared to simple linear causal chains. We make our dataset and code available to the research community.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis
Authors:
Dipanjyoti Paul,
Arpita Chowdhury,
Xinqi Xiong,
Feng-Ju Chang,
David Carlyn,
Samuel Stevens,
Kaiya L. Provost,
Anuj Karpatne,
Bryan Carstens,
Daniel Rubenstein,
Charles Stewart,
Tanya Berger-Wolf,
Yu Su,
Wei-Lun Chao
Abstract:
We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR)…
▽ More
We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head" cross-attention, INTR could identify different "attributes" of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR.
△ Less
Submitted 14 June, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023
Authors:
Srijoni Majumdar,
Soumen Paul,
Debjyoti Paul,
Ayan Bandyopadhyay,
Samiran Chattopadhyay,
Partha Pratim Das,
Paul D Clough,
Prasenjit Majumder
Abstract:
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments in a machine learning framework based on human and large language model generated labels. In this track, there is a binary classification task to classify comments as useful and not useful. The dataset consists of 9048 code comments and surrounding code snippet pairs e…
▽ More
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments in a machine learning framework based on human and large language model generated labels. In this track, there is a binary classification task to classify comments as useful and not useful. The dataset consists of 9048 code comments and surrounding code snippet pairs extracted from open source github C based projects and an additional dataset generated individually by teams using large language models. Overall 56 experiments have been submitted by 17 teams from various universities and software companies. The submissions have been evaluated quantitatively using the F1-Score and qualitatively based on the type of features developed, the supervised learning model used and their corresponding hyper-parameters. The labels generated from large language models increase the bias in the prediction model but lead to less over-fitted results.
△ Less
Submitted 27 October, 2023;
originally announced November 2023.
-
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks
Authors:
Mete Ismayilzada,
Debjit Paul,
Syrielle Montariol,
Mor Geva,
Antoine Bosselut
Abstract:
Recent efforts in natural language processing (NLP) commonsense reasoning research have yielded a considerable number of new datasets and benchmarks. However, most of these datasets formulate commonsense reasoning challenges in artificial scenarios that are not reflective of the tasks which real-world NLP systems are designed to solve. In this work, we present CRoW, a manually-curated, multi-task…
▽ More
Recent efforts in natural language processing (NLP) commonsense reasoning research have yielded a considerable number of new datasets and benchmarks. However, most of these datasets formulate commonsense reasoning challenges in artificial scenarios that are not reflective of the tasks which real-world NLP systems are designed to solve. In this work, we present CRoW, a manually-curated, multi-task benchmark that evaluates the ability of models to apply commonsense reasoning in the context of six real-world NLP tasks. CRoW is constructed using a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations. We use CRoW to study how NLP systems perform across different dimensions of commonsense knowledge, such as physical, temporal, and social reasoning. We find a significant performance gap when NLP systems are evaluated on CRoW compared to humans, showcasing that commonsense reasoning is far from being solved in real-world task settings. We make our dataset and leaderboard available to the research community at https://github.com/mismayil/crow.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
Scaling Experiments in Self-Supervised Cross-Table Representation Learning
Authors:
Maximilian Schambach,
Dominique Paul,
Johannes S. Otterbach
Abstract:
To analyze the scaling potential of deep tabular representation learning models, we introduce a novel Transformer-based architecture specifically tailored to tabular data and cross-table representation learning by utilizing table-specific tokenizers and a shared Transformer backbone. Our training approach encompasses both single-table and cross-table models, trained via missing value imputation th…
▽ More
To analyze the scaling potential of deep tabular representation learning models, we introduce a novel Transformer-based architecture specifically tailored to tabular data and cross-table representation learning by utilizing table-specific tokenizers and a shared Transformer backbone. Our training approach encompasses both single-table and cross-table models, trained via missing value imputation through a self-supervised masked cell recovery objective. To understand the scaling behavior of our method, we train models of varying sizes, ranging from approximately $10^4$ to $10^7$ parameters. These models are trained on a carefully curated pretraining dataset, consisting of 135M training tokens sourced from 76 diverse datasets. We assess the scaling of our architecture in both single-table and cross-table pretraining setups by evaluating the pretrained models using linear probing on a curated set of benchmark datasets and comparing the results with conventional baselines.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
Recovering from Privacy-Preserving Masking with Large Language Models
Authors:
Arpita Vats,
Zhe Liu,
Peng Su,
Debjyoti Paul,
Yingyi Ma,
Yutong Pang,
Zeeshan Ahmed,
Ozlem Kalinli
Abstract:
Model adaptation is crucial to handle the discrepancy between proxy training data and actual users data received. To effectively perform adaptation, textual data of users is typically stored on servers or their local devices, where downstream natural language processing (NLP) models can be directly trained using such in-domain data. However, this might raise privacy and security concerns due to th…
▽ More
Model adaptation is crucial to handle the discrepancy between proxy training data and actual users data received. To effectively perform adaptation, textual data of users is typically stored on servers or their local devices, where downstream natural language processing (NLP) models can be directly trained using such in-domain data. However, this might raise privacy and security concerns due to the extra risks of exposing user information to adversaries. Replacing identifying information in textual data with a generic marker has been recently explored. In this work, we leverage large language models (LLMs) to suggest substitutes of masked tokens and have their effectiveness evaluated on downstream language modeling tasks. Specifically, we propose multiple pre-trained and fine-tuned LLM-based approaches and perform empirical studies on various datasets for the comparison of these methods. Experimental results show that models trained on the obfuscation corpora are able to achieve comparable performance with the ones trained on the original data without privacy-preserving token masking.
△ Less
Submitted 13 December, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Flows: Building Blocks of Reasoning and Collaborating AI
Authors:
Martin Josifoski,
Lars Klein,
Maxime Peyrard,
Nicolas Baldwin,
Yifei Li,
Saibo Geng,
Julian Paul Schnitzler,
Yuxing Yao,
Jiheng Wei,
Debjit Paul,
Robert West
Abstract:
Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the…
▽ More
Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the conceptual framework Flows. Flows are self-contained building blocks of computation, with an isolated state, communicating through a standardized message-based interface. This modular design simplifies the process of creating Flows by allowing them to be recursively composed into arbitrarily nested interactions and is inherently concurrency-friendly. Crucially, any interaction can be implemented using this framework, including prior work on AI-AI and human-AI interactions, prompt engineering schemes, and tool augmentation. We demonstrate the potential of Flows on competitive coding, a challenging task on which even GPT-4 struggles. Our results suggest that structured reasoning and collaboration substantially improve generalization, with AI-only Flows adding +21 and human-AI Flows adding +54 absolute points in terms of solve rate. To support rapid and rigorous research, we introduce the aiFlows library embodying Flows. The aiFlows library is available at https://github.com/epfl-dlab/aiflows. Data and Flows for reproducing our experiments are available at https://github.com/epfl-dlab/cc_flows.
△ Less
Submitted 7 February, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Constructing and Interpreting Causal Knowledge Graphs from News
Authors:
Fiona Anting Tan,
Debdeep Paul,
Sahim Yamaura,
Miura Koji,
See-Kiong Ng
Abstract:
Many financial jobs rely on news to learn about causal events in the past and present, to make informed decisions and predictions about the future. With the ever-increasing amount of news available online, there is a need to automate the extraction of causal events from unstructured texts. In this work, we propose a methodology to construct causal knowledge graphs (KGs) from news using two steps:…
▽ More
Many financial jobs rely on news to learn about causal events in the past and present, to make informed decisions and predictions about the future. With the ever-increasing amount of news available online, there is a need to automate the extraction of causal events from unstructured texts. In this work, we propose a methodology to construct causal knowledge graphs (KGs) from news using two steps: (1) Extraction of Causal Relations, and (2) Argument Clustering and Representation into KG. We aim to build graphs that emphasize on recall, precision and interpretability. For extraction, although many earlier works already construct causal KGs from text, most adopt rudimentary pattern-based methods. We close this gap by using the latest BERT-based extraction models alongside pattern-based ones. As a result, we achieved a high recall, while still maintaining a high precision. For clustering, we utilized a topic modelling approach to cluster our arguments, so as to increase the connectivity of our graph. As a result, instead of 15,686 disconnected subgraphs, we were able to obtain 1 connected graph that enables users to infer more causal relationships from. Our final KG effectively captures and conveys causal relationships, validated through experiments, multiple use cases and user feedback.
△ Less
Submitted 30 July, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
REFINER: Reasoning Feedback on Intermediate Representations
Authors:
Debjit Paul,
Mete Ismayilzada,
Maxime Peyrard,
Beatriz Borges,
Antoine Bosselut,
Robert West,
Boi Faltings
Abstract:
Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermedi…
▽ More
Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the critic provides structured feedback that the reasoning LM uses to iteratively improve its intermediate arguments. Empirical evaluations of REFINER on three diverse reasoning tasks show significant improvements over baseline LMs of comparable scale. Furthermore, when using GPT-3.5 or ChatGPT as the reasoner, the trained critic significantly improves reasoning without finetuning the reasoner. Finally, our critic model is trained without expensive human-in-the-loop data but can be substituted with humans at inference time.
△ Less
Submitted 4 February, 2024; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Language Agnostic Data-Driven Inverse Text Normalization
Authors:
Szu-Jui Chen,
Debjyoti Paul,
Yutong Pang,
Peng Su,
Xuedong Zhang
Abstract:
With the emergence of automatic speech recognition (ASR) models, converting the spoken form text (from ASR) to the written form is in urgent need. This inverse text normalization (ITN) problem attracts the attention of researchers from various fields. Recently, several works show that data-driven ITN methods can output high-quality written form text. Due to the scarcity of labeled spoken-written d…
▽ More
With the emergence of automatic speech recognition (ASR) models, converting the spoken form text (from ASR) to the written form is in urgent need. This inverse text normalization (ITN) problem attracts the attention of researchers from various fields. Recently, several works show that data-driven ITN methods can output high-quality written form text. Due to the scarcity of labeled spoken-written datasets, the studies on non-English data-driven ITN are quite limited. In this work, we propose a language-agnostic data-driven ITN framework to fill this gap. Specifically, we leverage the data augmentation in conjunction with neural machine translated data for low resource languages. Moreover, we design an evaluation method for language agnostic ITN model when only English data is available. Our empirical evaluation shows this language agnostic modeling approach is effective for low resource languages while preserving the performance for high resource languages.
△ Less
Submitted 23 January, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Language Model Decoding as Likelihood-Utility Alignment
Authors:
Martin Josifoski,
Maxime Peyrard,
Frano Rajic,
Jiheng Wei,
Debjit Paul,
Valentin Hartmann,
Barun Patra,
Vishrav Chaudhary,
Emre Kıcıman,
Boi Faltings,
Robert West
Abstract:
A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific no…
▽ More
A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific notion of utility is the key factor to understanding the effectiveness of decoding algorithms. To structure the discussion, we introduce a taxonomy of misalignment mitigation strategies (MMSs), providing a unifying view of decoding as a tool for alignment. The MMS taxonomy groups decoding algorithms based on their implicit assumptions about likelihood--utility misalignment, yielding general statements about their applicability across tasks. Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm. Crucially, our analysis is the first to relate likelihood-based decoding algorithms with algorithms that rely on external information, such as value-guided methods and prompting, and covers the most diverse set of tasks to date. Code, data, and models are available at https://github.com/epfl-dlab/understanding-decoding.
△ Less
Submitted 16 March, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Improving Data Driven Inverse Text Normalization using Data Augmentation
Authors:
Laxmi Pandey,
Debjyoti Paul,
Pooja Chitkara,
Yutong Pang,
Xuedong Zhang,
Kjell Schubert,
Mark Chou,
Shu Liu,
Yatharth Saraf
Abstract:
Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural modeling approaches require quality large-scale spoken-written pair examples in the same or similar domain as the ASR system (in-domain data), to train. Both these…
▽ More
Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural modeling approaches require quality large-scale spoken-written pair examples in the same or similar domain as the ASR system (in-domain data), to train. Both these approaches require costly and complex annotations. In this paper, we present a data augmentation technique that effectively generates rich spoken-written numeric pairs from out-of-domain textual data with minimal human annotation. We empirically demonstrate that ITN model trained using our data augmentation technique consistently outperform ITN model trained using only in-domain data across all numeric surfaces like cardinal, currency, and fraction, by an overall accuracy of 14.44%.
△ Less
Submitted 20 July, 2022;
originally announced July 2022.
-
Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification
Authors:
Saptarshi Chakraborty,
Debolina Paul,
Swagatam Das
Abstract:
The problem of linear predictions has been extensively studied for the past century under pretty generalized frameworks. Recent advances in the robust statistics literature allow us to analyze robust versions of classical linear models through the prism of Median of Means (MoM). Combining these approaches in a piecemeal way might lead to ad-hoc procedures, and the restricted theoretical conclusion…
▽ More
The problem of linear predictions has been extensively studied for the past century under pretty generalized frameworks. Recent advances in the robust statistics literature allow us to analyze robust versions of classical linear models through the prism of Median of Means (MoM). Combining these approaches in a piecemeal way might lead to ad-hoc procedures, and the restricted theoretical conclusions that underpin each individual contribution may no longer be valid. To meet these challenges coherently, in this study, we offer a unified robust framework that includes a broad variety of linear prediction problems on a Hilbert space, coupled with a generic class of loss functions. Notably, we do not require any assumptions on the distribution of the outlying data points ($\mathcal{O}$) nor the compactness of the support of the inlying ones ($\mathcal{I}$). Under mild conditions on the dual norm, we show that for misspecification level $ε$, these estimators achieve an error rate of $O(\max\left\{|\mathcal{O}|^{1/2}n^{-1/2}, |\mathcal{I}|^{1/2}n^{-1} \right\}+ε)$, matching the best-known rates in literature. This rate is slightly slower than the classical rates of $O(n^{-1/2})$, indicating that we need to pay a price in terms of error rates to obtain robust estimates. Additionally, we show that this rate can be improved to achieve so-called "fast rates" under additional assumptions.
△ Less
Submitted 11 March, 2022; v1 submitted 6 January, 2022;
originally announced January 2022.
-
Uniform Concentration Bounds toward a Unified Framework for Robust Clustering
Authors:
Debolina Paul,
Saptarshi Chakraborty,
Swagatam Das,
Jason Xu
Abstract:
Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal m…
▽ More
Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal manner can result in ad hoc methods, and the limited theoretical results supporting each individual contribution may no longer hold. Toward addressing these issues in a principled way, this paper proposes a cohesive robust framework for center-based clustering under a general class of dissimilarity measures. In particular, we present a rigorous theoretical treatment within a Median-of-Means (MoM) estimation framework, showing that it subsumes several popular $k$-means variants. In addition to unifying existing methods, we derive uniform concentration bounds that complete their analyses, and bridge these results to the MoM framework via Dudley's chaining arguments. Importantly, we neither require any assumptions on the distribution of the outlying observations nor on the relative number of observations $n$ to features $p$. We establish strong consistency and an error rate of $O(n^{-1/2})$ under mild conditions, surpassing the best-known results in the literature. The methods are empirically validated thoroughly on real and synthetic datasets.
△ Less
Submitted 26 October, 2021;
originally announced October 2021.
-
Scaling New Peaks: A Viewership-centric Approach to Automated Content Curation
Authors:
Subhabrata Majumdar,
Deirdre Paul,
Eric Zavesky
Abstract:
Summarizing video content is important for video streaming services to engage the user in a limited time span. To this end, current methods involve manual curation or using passive interest cues to annotate potential high-interest segments to form the basis of summarized videos, and are costly and unreliable. We propose a viewership-driven, automated method that accommodates a range of segment ide…
▽ More
Summarizing video content is important for video streaming services to engage the user in a limited time span. To this end, current methods involve manual curation or using passive interest cues to annotate potential high-interest segments to form the basis of summarized videos, and are costly and unreliable. We propose a viewership-driven, automated method that accommodates a range of segment identification goals. Using satellite television viewership data as a source of ground truth for viewer interest, we apply statistical anomaly detection on a timeline of viewership metrics to identify 'seed' segments of high viewer interest. These segments are post-processed using empirical rules and several sources of content metadata, e.g. shot boundaries, adding in personalization aspects to produce the final highlights video.
To demonstrate the flexibility of our approach, we present two case studies, on the United States Democratic Presidential Debate on 19th December 2019, and Wimbledon Women's Final 2019. We perform qualitative comparisons with their publicly available highlights, as well as early vs. late viewership comparisons for insights into possible media and social influence on viewing behavior.
△ Less
Submitted 9 August, 2021;
originally announced August 2021.
-
Generating Hypothetical Events for Abductive Inference
Authors:
Debjit Paul,
Anette Frank
Abstract:
Abductive reasoning starts from some observations and aims at finding the most plausible explanation for these observations. To perform abduction, humans often make use of temporal and causal inferences, and knowledge about how some hypothetical situation can result in different outcomes. This work offers the first study of how such knowledge impacts the Abductive NLI task -- which consists in cho…
▽ More
Abductive reasoning starts from some observations and aims at finding the most plausible explanation for these observations. To perform abduction, humans often make use of temporal and causal inferences, and knowledge about how some hypothetical situation can result in different outcomes. This work offers the first study of how such knowledge impacts the Abductive NLI task -- which consists in choosing the more likely explanation for given observations. We train a specialized language model LMI that is tasked to generate what could happen next from a hypothetical scenario that evolves from a given event. We then propose a multi-task model MTL to solve the Abductive NLI task, which predicts a plausible explanation by a) considering different possible events emerging from candidate hypotheses -- events generated by LMI -- and b) selecting the one that is most similar to the observed outcome. We show that our MTL model improves over prior vanilla pre-trained LMs fine-tuned on Abductive NLI. Our manual evaluation and analysis suggest that learning about possible next events from different hypothetical scenarios supports abductive inference.
△ Less
Submitted 7 June, 2021;
originally announced June 2021.
-
COINS: Dynamically Generating COntextualized Inference Rules for Narrative Story Completion
Authors:
Debjit Paul,
Anette Frank
Abstract:
Despite recent successes of large pre-trained language models in solving reasoning tasks, their inference capabilities remain opaque. We posit that such models can be made more interpretable by explicitly generating interim inference rules, and using them to guide the generation of task-specific textual outputs. In this paper we present COINS, a recursive inference framework that i) iteratively re…
▽ More
Despite recent successes of large pre-trained language models in solving reasoning tasks, their inference capabilities remain opaque. We posit that such models can be made more interpretable by explicitly generating interim inference rules, and using them to guide the generation of task-specific textual outputs. In this paper we present COINS, a recursive inference framework that i) iteratively reads context sentences, ii) dynamically generates contextualized inference rules, encodes them, and iii) uses them to guide task-specific output generation. We apply COINS to a Narrative Story Completion task that asks a model to complete a story with missing sentences, to produce a coherent story with plausible logical connections, causal relationships, and temporal dependencies. By modularizing inference and sentence generation steps in a recurrent model, we aim to make reasoning steps and their effects on next sentence generation transparent. Our automatic and manual evaluations show that the model generates better story sentences than SOTA baselines, especially in terms of coherence. We further demonstrate improved performance over strong pre-trained LMs in generating commonsense inference rules. The recursive nature of COINS holds the potential for controlled generation of longer sequences.
△ Less
Submitted 4 June, 2021;
originally announced June 2021.
-
Database Workload Characterization with Query Plan Encoders
Authors:
Debjyoti Paul,
Jie Cao,
Feifei Li,
Vivek Srikumar
Abstract:
Smart databases are adopting artificial intelligence (AI) technologies to achieve {\em instance optimality}, and in the future, databases will come with prepackaged AI models within their core components. The reason is that every database runs on different workloads, demands specific resources, and settings to achieve optimal performance. It prompts the necessity to understand workloads running in…
▽ More
Smart databases are adopting artificial intelligence (AI) technologies to achieve {\em instance optimality}, and in the future, databases will come with prepackaged AI models within their core components. The reason is that every database runs on different workloads, demands specific resources, and settings to achieve optimal performance. It prompts the necessity to understand workloads running in the system along with their features comprehensively, which we dub as workload characterization.
To address this workload characterization problem, we propose our query plan encoders that learn essential features and their correlations from query plans. Our pretrained encoders capture the {\em structural} and the {\em computational performance} of queries independently. We show that our pretrained encoders are adaptable to workloads that expedite the transfer learning process. We performed independent assessments of structural encoder and performance encoders with multiple downstream tasks. For the overall evaluation of our query plan encoders, we architect two downstream tasks (i) query latency prediction and (ii) query classification. These tasks show the importance of feature-based workload characterization. We also performed extensive experiments on individual encoders to verify the effectiveness of representation learning and domain adaptability.
△ Less
Submitted 25 May, 2021;
originally announced May 2021.
-
CO-NNECT: A Framework for Revealing Commonsense Knowledge Paths as Explicitations of Implicit Knowledge in Texts
Authors:
Maria Becker,
Katharina Korfhage,
Debjit Paul,
Anette Frank
Abstract:
In this work we leverage commonsense knowledge in form of knowledge paths to establish connections between sentences, as a form of explicitation of implicit knowledge. Such connections can be direct (singlehop paths) or require intermediate concepts (multihop paths). To construct such paths we combine two model types in a joint framework we call Co-nnect: a relation classifier that predicts direct…
▽ More
In this work we leverage commonsense knowledge in form of knowledge paths to establish connections between sentences, as a form of explicitation of implicit knowledge. Such connections can be direct (singlehop paths) or require intermediate concepts (multihop paths). To construct such paths we combine two model types in a joint framework we call Co-nnect: a relation classifier that predicts direct connections between concepts; and a target prediction model that generates target or intermediate concepts given a source concept and a relation, which we use to construct multihop paths. Unlike prior work that relies exclusively on static knowledge sources, we leverage language models finetuned on knowledge stored in ConceptNet, to dynamically generate knowledge paths, as explanations of implicit knowledge that connects sentences in texts. As a central contribution we design manual and automatic evaluation settings for assessing the quality of the generated paths. We conduct evaluations on two argumentative datasets and show that a combination of the two model types generates meaningful, high-quality knowledge paths between sentences that reveal implicit knowledge conveyed in text.
△ Less
Submitted 7 May, 2021;
originally announced May 2021.
-
t-Entropy: A New Measure of Uncertainty with Some Applications
Authors:
Saptarshi Chakraborty,
Debolina Paul,
Swagatam Das
Abstract:
The concept of Entropy plays a key role in Information Theory, Statistics, and Machine Learning.This paper introduces a new entropy measure, called the t-entropy, which exploits the concavity of the inverse-tan function. We analytically show that the proposed t-entropy satisfies the prominent axiomatic properties of an entropy measure. We demonstrate an application of the proposed entropy measure…
▽ More
The concept of Entropy plays a key role in Information Theory, Statistics, and Machine Learning.This paper introduces a new entropy measure, called the t-entropy, which exploits the concavity of the inverse-tan function. We analytically show that the proposed t-entropy satisfies the prominent axiomatic properties of an entropy measure. We demonstrate an application of the proposed entropy measure for multi-level thresholding of images. We also propose the entropic-loss as a measure of the divergence between two probability distributions, which leads to robust estimators in the context of parametric statistical inference. The consistency and asymptotic breakdown point of the proposed estimator are mathematically analyzed. Finally, we show an application of the t-entropy to feature weighted data clustering.
△ Less
Submitted 5 May, 2021; v1 submitted 1 May, 2021;
originally announced May 2021.
-
Single Test Image-Based Automated Machine Learning System for Distinguishing between Trait and Diseased Blood Samples
Authors:
Sahar A. Nasser,
Debjani Paul,
Suyash P. Awate
Abstract:
We introduce a machine learning-based method for fully automated diagnosis of sickle cell disease of poor-quality unstained images of a mobile microscope. Our method is capable of distinguishing between diseased, trait (carrier), and normal samples unlike the previous methods that are limited to distinguishing the normal from the abnormal samples only. The novelty of this method comes from disting…
▽ More
We introduce a machine learning-based method for fully automated diagnosis of sickle cell disease of poor-quality unstained images of a mobile microscope. Our method is capable of distinguishing between diseased, trait (carrier), and normal samples unlike the previous methods that are limited to distinguishing the normal from the abnormal samples only. The novelty of this method comes from distinguishing the trait and the diseased samples from challenging images that have been captured directly in the field. The proposed approach contains two parts, the segmentation part followed by the classification part. We use a random forest algorithm to segment such challenging images acquitted through a mobile phone-based microscope. Then, we train two classifiers based on a random forest (RF) and a support vector machine (SVM) for classification. The results show superior performances of both of the classifiers not only for images which have been captured in the lab, but also for the ones which have been acquired in the field itself.
△ Less
Submitted 30 March, 2021;
originally announced March 2021.
-
Robust Principal Component Analysis: A Median of Means Approach
Authors:
Debolina Paul,
Saptarshi Chakraborty,
Swagatam Das
Abstract:
Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philoso…
▽ More
Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called the \textbf{M}edian of \textbf{M}eans \textbf{P}rincipal \textbf{C}omponent \textbf{A}nalysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution via the aid of the Rademacher complexities while granting absolutely no assumption on the outlying observations. The derived concentration results are not dependent on the dimension because the analysis is conducted in a separable Hilbert space, and the results only depend on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is also thoroughly showcased through simulations and real data applications.
△ Less
Submitted 20 July, 2023; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm
Authors:
Saptarshi Chakraborty,
Debolina Paul,
Swagatam Das
Abstract:
Mean shift is a simple interactive procedure that gradually shifts data points towards the mode which denotes the highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode seeking, and finding the number of clusters in a dataset in an automated fashion. However, the merits of mean shift quickly fade away as the data dimensions increase…
▽ More
Mean shift is a simple interactive procedure that gradually shifts data points towards the mode which denotes the highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode seeking, and finding the number of clusters in a dataset in an automated fashion. However, the merits of mean shift quickly fade away as the data dimensions increase and only a handful of features contain useful information about the cluster structure of the data. We propose a simple yet elegant feature-weighted variant of mean shift to efficiently learn the feature importance and thus, extending the merits of mean shift to high-dimensional data. The resulting algorithm not only outperforms the conventional mean shift clustering procedure but also preserves its computational simplicity. In addition, the proposed method comes with rigorous theoretical convergence guarantees and a convergence rate of at least a cubic order. The efficacy of our proposal is thoroughly assessed through experimental comparison against baseline and state-of-the-art clustering methods on synthetic as well as real-world datasets.
△ Less
Submitted 10 May, 2021; v1 submitted 20 December, 2020;
originally announced December 2020.
-
Scheduling Beyond CPUs for HPC
Authors:
Yuping Fan,
Zhiling Lan,
Paul Rich,
William E. Allcock,
Michael E. Papka,
Brian Austin,
David Paul
Abstract:
High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload chan…
▽ More
High performance computing (HPC) is undergoing significant changes. The emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload changes, forces the schedulers to consider multiple resources (e.g., burst buffers) beyond CPUs, in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based on not only their CPU requirements, but also other schedulable resources such as burst buffer. BBSched formulates the scheduling problem into a multi-objective optimization (MOO) problem and rapidly solves the problem using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enables system managers to explore potential tradeoffs among various resources, and therefore obtains better utilization of all the resources. The trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.
△ Less
Submitted 9 December, 2020;
originally announced December 2020.
-
MeLPUF: Memory-in-Logic PUF Structures for Low-Overhead IC Authentication
Authors:
Christopher Vega,
Shubhra Deb Paul,
Patanjali SLPSK,
Swarup Bhunia
Abstract:
Physically Unclonable Functions (PUFs) are used for securing electronic devices across the implementation spectrum ranging from Field Programmable Gate Array (FPGA) to system on chips (SoCs). However, existing PUF implementations often suffer from one or more significant deficiencies: (1) significant design overhead; (2) difficulty to configure and integrate based on application-specific requireme…
▽ More
Physically Unclonable Functions (PUFs) are used for securing electronic devices across the implementation spectrum ranging from Field Programmable Gate Array (FPGA) to system on chips (SoCs). However, existing PUF implementations often suffer from one or more significant deficiencies: (1) significant design overhead; (2) difficulty to configure and integrate based on application-specific requirements; (3) vulnerability to model-building attacks; and (4) spatial locality to a specific region of a chip. These factors limit their application in the authentication of designs used in diverse applications. In this work, we propose MeLPUF: Memory-in-Logic PUF; a low-overhead, distributed PUF that leverages the existing logic gates in a design to create cross-coupled inverters (i.e., memory cells) in a logic circuit as an entropy source. It exploits these memory cells' power-up states as the entropy source to generate device-specific unique fingerprints. A dedicated control signal governs these on-demand memory cells. They can be dispersed across the combinational logic of a design to achieve distributed authentication. They can also be synthesized with a standard logic synthesis tool to meet the target area, power, and performance constraints. We evaluate the quality of MeLPUF signatures with circuit-level simulations and experimental measurements using FPGA silicon (TSMC 55nm process). Our analysis shows the high quality of the PUF in terms of uniqueness, randomness, and robustness while incurring modest overhead. We further demonstrate the scalability of MeLPUF by aggregating power-up states from multiple memory cells, thus creating PUF signatures or digital identifiers of varying lengths. Additionally, we suggest optimization techniques that can be leveraged to boost the performance of MeLPUF further.
△ Less
Submitted 29 March, 2023; v1 submitted 5 December, 2020;
originally announced December 2020.
-
Kernel k-Means, By All Means: Algorithms and Strong Consistency
Authors:
Debolina Paul,
Saptarshi Chakraborty,
Swagatam Das,
Jason Xu
Abstract:
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions t…
▽ More
Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from non-convexity of the underlying objective function. In this paper, we generalize recent results leveraging a general family of means to combat sub-optimal local solutions to the kernel and multi-kernel settings. Called Kernel Power $k$-Means, our algorithm makes use of majorization-minimization (MM) to better solve this non-convex problem. We show the method implicitly performs annealing in kernel feature space while retaining efficient, closed-form updates, and we rigorously characterize its convergence properties both from computational and statistical points of view. In particular, we characterize the large sample behavior of the proposed method by establishing strong consistency guarantees. Its merits are thoroughly validated on a suite of simulated datasets and real data benchmarks that feature non-linear and multi-view separation.
△ Less
Submitted 12 November, 2020;
originally announced November 2020.
-
Social Commonsense Reasoning with Multi-Head Knowledge Attention
Authors:
Debjit Paul,
Anette Frank
Abstract:
Social Commonsense Reasoning requires understanding of text, knowledge about social events and their pragmatic implications, as well as commonsense reasoning skills. In this work we propose a novel multi-head knowledge attention model that encodes semi-structured commonsense inference rules and learns to incorporate them in a transformer-based reasoning cell. We assess the model's performance on t…
▽ More
Social Commonsense Reasoning requires understanding of text, knowledge about social events and their pragmatic implications, as well as commonsense reasoning skills. In this work we propose a novel multi-head knowledge attention model that encodes semi-structured commonsense inference rules and learns to incorporate them in a transformer-based reasoning cell. We assess the model's performance on two tasks that require different reasoning skills: Abductive Natural Language Inference and Counterfactual Invariance Prediction as a new task. We show that our proposed model improves performance over strong state-of-the-art models (i.e., RoBERTa) across both reasoning tasks. Notably we are, to the best of our knowledge, the first to demonstrate that a model that learns to perform counterfactual reasoning helps predicting the best explanation in an abductive reasoning task. We validate the robustness of the model's reasoning capabilities by perturbing the knowledge and provide qualitative analysis on the model's knowledge incorporation capabilities.
△ Less
Submitted 12 October, 2020;
originally announced October 2020.
-
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion
Authors:
Dipjyoti Paul,
Muhammed PV Shifas,
Yannis Pantazis,
Yannis Stylianou
Abstract:
The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispensable feature of modern mobile devices. It is hence desirable to build a system capable of generating highly intelligible speech in the presence of noise. Past studies have investigated style conversion in TTS synthesis, yet degraded synthesized quality often leads to worse intelligibility. To over…
▽ More
The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispensable feature of modern mobile devices. It is hence desirable to build a system capable of generating highly intelligible speech in the presence of noise. Past studies have investigated style conversion in TTS synthesis, yet degraded synthesized quality often leads to worse intelligibility. To overcome such limitations, we proposed a novel transfer learning approach using Tacotron and WaveRNN based TTS synthesis. The proposed speech system exploits two modification strategies: (a) Lombard speaking style data and (b) Spectral Shaping and Dynamic Range Compression (SSDRC) which has been shown to provide high intelligibility gains by redistributing the signal energy on the time-frequency domain. We refer to this extension as Lombard-SSDRC TTS system. Intelligibility enhancement as quantified by the Intelligibility in Bits (SIIB-Gauss) measure shows that the proposed Lombard-SSDRC TTS system shows significant relative improvement between 110% and 130% in speech-shaped noise (SSN), and 47% to 140% in competing-speaker noise (CSN) against the state-of-the-art TTS approach. Additional subjective evaluation shows that Lombard-SSDRC TTS successfully increases the speech intelligibility with relative improvement of 455% for SSN and 104% for CSN in median keyword correction rate compared to the baseline TTS method.
△ Less
Submitted 13 August, 2020;
originally announced August 2020.
-
Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions
Authors:
Dipjyoti Paul,
Yannis Pantazis,
Yannis Stylianou
Abstract:
Recent advancements in deep learning led to human-level performance in single-speaker speech synthesis. However, there are still limitations in terms of speech quality when generalizing those systems into multiple-speaker models especially for unseen speakers and unseen recording qualities. For instance, conventional neural vocoders are adjusted to the training speaker and have poor generalization…
▽ More
Recent advancements in deep learning led to human-level performance in single-speaker speech synthesis. However, there are still limitations in terms of speech quality when generalizing those systems into multiple-speaker models especially for unseen speakers and unseen recording qualities. For instance, conventional neural vocoders are adjusted to the training speaker and have poor generalization capabilities to unseen speakers. In this work, we propose a variant of WaveRNN, referred to as speaker conditional WaveRNN (SC-WaveRNN). We target towards the development of an efficient universal vocoder even for unseen speakers and recording conditions. In contrast to standard WaveRNN, SC-WaveRNN exploits additional information given in the form of speaker embeddings. Using publicly-available data for training, SC-WaveRNN achieves significantly better performance over baseline WaveRNN on both subjective and objective metrics. In MOS, SC-WaveRNN achieves an improvement of about 23% for seen speaker and seen recording condition and up to 95% for unseen speaker and unseen condition. Finally, we extend our work by implementing a multi-speaker text-to-speech (TTS) synthesis similar to zero-shot speaker adaptation. In terms of performance, our system has been preferred over the baseline TTS system by 60% over 15.5% and by 60.9% over 32.6%, for seen and unseen speakers, respectively.
△ Less
Submitted 9 August, 2020;
originally announced August 2020.
-
Cumulant GAN
Authors:
Yannis Pantazis,
Dipjyoti Paul,
Michail Fasoulakis,
Yannis Stylianou,
Markos Katsoulakis
Abstract:
In this paper, we propose a novel loss function for training Generative Adversarial Networks (GANs) aiming towards deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions giving rise to \emph{Cumulant GAN}. Relying on a recently-derived variational formula, we show that t…
▽ More
In this paper, we propose a novel loss function for training Generative Adversarial Networks (GANs) aiming towards deeper theoretical understanding as well as improved stability and performance for the underlying optimization problem. The new loss function is based on cumulant generating functions giving rise to \emph{Cumulant GAN}. Relying on a recently-derived variational formula, we show that the corresponding optimization problem is equivalent to R{é}nyi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the R{é}nyi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance and $χ^2$-divergence. Wasserstein GAN is also a member of cumulant GAN. In terms of stability, we rigorously prove the linear convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions and the standard gradient descent ascent algorithm. Finally, we experimentally demonstrate that image generation is more robust relative to Wasserstein GAN and it is substantially improved in terms of both inception score and Fréchet inception distance when both weaker and stronger discriminators are considered.
△ Less
Submitted 24 August, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
A Review of Computational Approaches for Evaluation of Rehabilitation Exercises
Authors:
Yalin Liao,
Aleksandar Vakanski,
Min Xian,
David Paul,
Russell Baker
Abstract:
Recent advances in data analytics and computer-aided diagnostics stimulate the vision of patient-centric precision healthcare, where treatment plans are customized based on the health records and needs of every patient. In physical rehabilitation, the progress in machine learning and the advent of affordable and reliable motion capture sensors have been conducive to the development of approaches f…
▽ More
Recent advances in data analytics and computer-aided diagnostics stimulate the vision of patient-centric precision healthcare, where treatment plans are customized based on the health records and needs of every patient. In physical rehabilitation, the progress in machine learning and the advent of affordable and reliable motion capture sensors have been conducive to the development of approaches for automated assessment of patient performance and progress toward functional recovery. The presented study reviews computational approaches for evaluating patient performance in rehabilitation programs using motion capture systems. Such approaches will play an important role in supplementing traditional rehabilitation assessment performed by trained clinicians, and in assisting patients participating in home-based rehabilitation. The reviewed computational methods for exercise evaluation are grouped into three main categories: discrete movement score, rule-based, and template-based approaches. The review places an emphasis on the application of machine learning methods for movement evaluation in rehabilitation. Related work in the literature on data representation, feature engineering, movement segmentation, and scoring functions is presented. The study also reviews existing sensors for capturing rehabilitation movements and provides an informative listing of pertinent benchmark datasets. The significance of this paper is in being the first to provide a comprehensive review of computational methods for evaluation of patient performance in rehabilitation programs.
△ Less
Submitted 19 March, 2020; v1 submitted 29 February, 2020;
originally announced March 2020.
-
Entropy Regularized Power k-Means Clustering
Authors:
Saptarshi Chakraborty,
Debolina Paul,
Swagatam Das,
Jason Xu
Abstract:
Despite its well-known shortcomings, $k$-means remains one of the most widely used approaches to data clustering. Current research continues to tackle its flaws while attempting to preserve its simplicity. Recently, the \textit{power $k$-means} algorithm was proposed to avoid trapping in local minima by annealing through a family of smoother surfaces. However, the approach lacks theoretical justif…
▽ More
Despite its well-known shortcomings, $k$-means remains one of the most widely used approaches to data clustering. Current research continues to tackle its flaws while attempting to preserve its simplicity. Recently, the \textit{power $k$-means} algorithm was proposed to avoid trapping in local minima by annealing through a family of smoother surfaces. However, the approach lacks theoretical justification and fails in high dimensions when many features are irrelevant. This paper addresses these issues by introducing \textit{entropy regularization} to learn feature relevance while annealing. We prove consistency of the proposed approach and derive a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees. In particular, our method retains the same computational complexity of $k$-means and power $k$-means, but yields significant improvements over both. Its merits are thoroughly assessed on a suite of real and synthetic data experiments.
△ Less
Submitted 10 January, 2020;
originally announced January 2020.
-
A BLSTM Network for Printed Bengali OCR System with High Accuracy
Authors:
Debabrata Paul,
Bidyut Baran Chaudhuri
Abstract:
This paper presents a printed Bengali and English text OCR system developed by us using a single hidden BLSTM-CTC architecture having 128 units. Here, we did not use any peephole connection and dropout in the BLSTM, which helped us in getting better accuracy. This architecture was trained by 47,720 text lines that include English words also. When tested over 20 different Bengali fonts, it has prod…
▽ More
This paper presents a printed Bengali and English text OCR system developed by us using a single hidden BLSTM-CTC architecture having 128 units. Here, we did not use any peephole connection and dropout in the BLSTM, which helped us in getting better accuracy. This architecture was trained by 47,720 text lines that include English words also. When tested over 20 different Bengali fonts, it has produced character level accuracy of 99.32% and word level accuracy of 96.65%. A good Indic multi script OCR system is also developed by Google. It sometimes recognizes a character of Bengali into the same character of a non-Bengali script, especially Assamese, which has no distinction from Bengali, except for a few characters. For example, Bengali character for 'RA' is sometimes recognized as that of Assamese, mainly in conjunct consonant forms. Our OCR is free from such errors. This OCR system is available online at https://banglaocr.nltr.org
△ Less
Submitted 23 August, 2019;
originally announced August 2019.
-
Sparse Equisigned PCA: Algorithms and Performance Bounds in the Noisy Rank-1 Setting
Authors:
Arvind Prasadan,
Raj Rao Nadakuditi,
Debashis Paul
Abstract:
Singular value decomposition (SVD) based principal component analysis (PCA) breaks down in the high-dimensional and limited sample size regime below a certain critical eigen-SNR that depends on the dimensionality of the system and the number of samples. Below this critical eigen-SNR, the estimates returned by the SVD are asymptotically uncorrelated with the latent principal components. We consider…
▽ More
Singular value decomposition (SVD) based principal component analysis (PCA) breaks down in the high-dimensional and limited sample size regime below a certain critical eigen-SNR that depends on the dimensionality of the system and the number of samples. Below this critical eigen-SNR, the estimates returned by the SVD are asymptotically uncorrelated with the latent principal components. We consider a setting where the left singular vector of the underlying rank one signal matrix is assumed to be sparse and the right singular vector is assumed to be equisigned, that is, having either only nonnegative or only nonpositive entries. We consider six different algorithms for estimating the sparse principal component based on different statistical criteria and prove that by exploiting sparsity, we recover consistent estimates in the low eigen-SNR regime where the SVD fails. Our analysis reveals conditions under which a coordinate selection scheme based on a \textit{sum-type decision statistic} outperforms schemes that utilize the $\ell_1$ and $\ell_2$ norm-based statistics. We derive lower bounds on the size of detectable coordinates of the principal left singular vector and utilize these lower bounds to derive lower bounds on the worst-case risk. Finally, we verify our findings with numerical simulations and illustrate the performance with a video data example, where the interest is in identifying objects.
△ Less
Submitted 16 December, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs
Authors:
Debjit Paul,
Anette Frank
Abstract:
To make machines better understand sentiments, research needs to move from polarity identification to understanding the reasons that underlie the expression of sentiment. Categorizing the goals or needs of humans is one way to explain the expression of sentiment in text. Humans are good at understanding situations described in natural language and can easily connect them to the character's psychol…
▽ More
To make machines better understand sentiments, research needs to move from polarity identification to understanding the reasons that underlie the expression of sentiment. Categorizing the goals or needs of humans is one way to explain the expression of sentiment in text. Humans are good at understanding situations described in natural language and can easily connect them to the character's psychological needs using commonsense knowledge. We present a novel method to extract, rank, filter and select multi-hop relation paths from a commonsense knowledge resource to interpret the expression of sentiment in terms of their underlying human needs. We efficiently integrate the acquired knowledge paths in a neural model that interfaces context representations with knowledge using a gated attention mechanism. We assess the model's performance on a recently published dataset for categorizing human needs. Selectively integrating knowledge paths boosts performance and establishes a new state-of-the-art. Our model offers interpretability through the learned attention map over commonsense knowledge paths. Human evaluation highlights the relevance of the encoded knowledge.
△ Less
Submitted 1 April, 2019;
originally announced April 2019.
-
Handling Noisy Labels for Robustly Learning from Self-Training Data for Low-Resource Sequence Labeling
Authors:
Debjit Paul,
Mittul Singh,
Michael A. Hedderich,
Dietrich Klakow
Abstract:
In this paper, we address the problem of effectively self-training neural networks in a low-resource setting. Self-training is frequently used to automatically increase the amount of training data. However, in a low-resource scenario, it is less effective due to unreliable annotations created using self-labeling of unlabeled data. We propose to combine self-training with noise handling on the self…
▽ More
In this paper, we address the problem of effectively self-training neural networks in a low-resource setting. Self-training is frequently used to automatically increase the amount of training data. However, in a low-resource scenario, it is less effective due to unreliable annotations created using self-labeling of unlabeled data. We propose to combine self-training with noise handling on the self-labeled data. Directly estimating noise on the combined clean training set and self-labeled data can lead to corruption of the clean data and hence, performs worse. Thus, we propose the Clean and Noisy Label Neural Network which trains on clean and noisy self-labeled data simultaneously by explicitly modelling clean and noisy labels separately. In our experiments on Chunking and NER, this approach performs more robustly than the baselines. Complementary to this explicit approach, noise can also be handled implicitly with the help of an auxiliary learning task. To such a complementary approach, our method is more beneficial than other baseline methods and together provides the best performance overall.
△ Less
Submitted 28 March, 2019;
originally announced March 2019.
-
Generalization of Spoofing Countermeasures: a Case Study with ASVspoof 2015 and BTAS 2016 Corpora
Authors:
Dipjyoti Paul,
Md Sahidullah,
Goutam Saha
Abstract:
Voice-based biometric systems are highly prone to spoofing attacks. Recently, various countermeasures have been developed for detecting different kinds of attacks such as replay, speech synthesis (SS) and voice conversion (VC). Most of the existing studies are conducted with a specific training set defined by the evaluation protocol. However, for realistic scenarios, selecting appropriate training…
▽ More
Voice-based biometric systems are highly prone to spoofing attacks. Recently, various countermeasures have been developed for detecting different kinds of attacks such as replay, speech synthesis (SS) and voice conversion (VC). Most of the existing studies are conducted with a specific training set defined by the evaluation protocol. However, for realistic scenarios, selecting appropriate training data is an open challenge for the system administrator. Motivated by this practical concern, this work investigates the generalization capability of spoofing countermeasures in restricted training conditions where speech from a broad attack types are left out in the training database. We demonstrate that different spoofing types have considerably different generalization capabilities. For this study, we analyze the performance using two kinds of features, mel-frequency cepstral coefficients (MFCCs) which are considered as baseline and recently proposed constant Q cepstral coefficients (CQCCs). The experiments are conducted with standard Gaussian mixture model - maximum likelihood (GMM-ML) classifier on two recently released spoofing corpora: ASVspoof 2015 and BTAS 2016 that includes cross-corpora performance analysis. Feature-level analysis suggests that static and dynamic coefficients of spectral features, both are important for detecting spoofing attacks in the real-life condition.
△ Less
Submitted 23 January, 2019;
originally announced January 2019.
-
Training Generative Adversarial Networks with Weights
Authors:
Yannis Pantazis,
Dipjyoti Paul,
Michail Fasoulakis,
Yannis Stylianou
Abstract:
The impressive success of Generative Adversarial Networks (GANs) is often overshadowed by the difficulties in their training. Despite the continuous efforts and improvements, there are still open issues regarding their convergence properties. In this paper, we propose a simple training variation where suitable weights are defined and assist the training of the Generator. We provide theoretical arg…
▽ More
The impressive success of Generative Adversarial Networks (GANs) is often overshadowed by the difficulties in their training. Despite the continuous efforts and improvements, there are still open issues regarding their convergence properties. In this paper, we propose a simple training variation where suitable weights are defined and assist the training of the Generator. We provide theoretical arguments why the proposed algorithm is better than the baseline training in the sense of speeding up the training process and of creating a stronger Generator. Performance results showed that the new algorithm is more accurate in both synthetic and image datasets resulting in improvements ranging between 5% and 50%.
△ Less
Submitted 6 November, 2018;
originally announced November 2018.
-
Robustness of Voice Conversion Techniques Under Mismatched Conditions
Authors:
Monisankha Pal,
Dipjyoti Paul,
Md Sahidullah,
Goutam Saha
Abstract:
Most of the existing studies on voice conversion (VC) are conducted in acoustically matched conditions between source and target signal. However, the robustness of VC methods in presence of mismatch remains unknown. In this paper, we report a comparative analysis of different VC techniques under mismatched conditions. The extensive experiments with five different VC techniques on CMU ARCTIC corpus…
▽ More
Most of the existing studies on voice conversion (VC) are conducted in acoustically matched conditions between source and target signal. However, the robustness of VC methods in presence of mismatch remains unknown. In this paper, we report a comparative analysis of different VC techniques under mismatched conditions. The extensive experiments with five different VC techniques on CMU ARCTIC corpus suggest that performance of VC methods substantially degrades in noisy conditions. We have found that bilinear frequency warping with amplitude scaling (BLFWAS) outperforms other methods in most of the noisy conditions. We further explore the suitability of different speech enhancement techniques for robust conversion. The objective evaluation results indicate that spectral subtraction and log minimum mean square error (logMMSE) based speech enhancement techniques can be used to improve the performance in specific noisy conditions.
△ Less
Submitted 22 December, 2016;
originally announced December 2016.
-
Using Word Embeddings for Automatic Query Expansion
Authors:
Dwaipayan Roy,
Debjyoti Paul,
Mandar Mitra,
Utpal Garain
Abstract:
In this paper a framework for Automatic Query Expansion (AQE) is proposed using distributed neural language model word2vec. Using semantic and contextual relation in a distributed and unsupervised framework, word2vec learns a low dimensional embedding for each vocabulary entry. Using such a framework, we devise a query expansion technique, where related terms to a query are obtained by K-nearest n…
▽ More
In this paper a framework for Automatic Query Expansion (AQE) is proposed using distributed neural language model word2vec. Using semantic and contextual relation in a distributed and unsupervised framework, word2vec learns a low dimensional embedding for each vocabulary entry. Using such a framework, we devise a query expansion technique, where related terms to a query are obtained by K-nearest neighbor approach. We explore the performance of the AQE methods, with and without feedback query expansion, and a variant of simple K-nearest neighbor in the proposed framework. Experiments on standard TREC ad-hoc data (Disk 4, 5 with query sets 301-450, 601-700) and web data (WT10G data with query set 451-550) shows significant improvement over standard term-overlapping based retrieval methods. However the proposed method fails to achieve comparable performance with statistical co-occurrence based feedback method such as RM3. We have also found that the word2vec based query expansion methods perform similarly with and without any feedback information.
△ Less
Submitted 24 June, 2016;
originally announced June 2016.
-
Novel Speech Features for Improved Detection of Spoofing Attacks
Authors:
Dipjyoti Paul,
Monisankha Pal,
Goutam Saha
Abstract:
Now-a-days, speech-based biometric systems such as automatic speaker verification (ASV) are highly prone to spoofing attacks by an imposture. With recent development in various voice conversion (VC) and speech synthesis (SS) algorithms, these spoofing attacks can pose a serious potential threat to the current state-of-the-art ASV systems. To impede such attacks and enhance the security of the ASV…
▽ More
Now-a-days, speech-based biometric systems such as automatic speaker verification (ASV) are highly prone to spoofing attacks by an imposture. With recent development in various voice conversion (VC) and speech synthesis (SS) algorithms, these spoofing attacks can pose a serious potential threat to the current state-of-the-art ASV systems. To impede such attacks and enhance the security of the ASV systems, the development of efficient anti-spoofing algorithms is essential that can differentiate synthetic or converted speech from natural or human speech. In this paper, we propose a set of novel speech features for detecting spoofing attacks. The proposed features are computed using alternative frequency-warping technique and formant-specific block transformation of filter bank log energies. We have evaluated existing and proposed features against several kinds of synthetic speech data from ASVspoof 2015 corpora. The results show that the proposed techniques outperform existing approaches for various spoofing attack detection task. The techniques investigated in this paper can also accurately classify natural and synthetic speech as equal error rates (EERs) of 0% have been achieved.
△ Less
Submitted 14 March, 2016;
originally announced March 2016.