Instruction Data Generation and Unsupervised Adaptation for Speech Language Models

Authors: Vahid Noroozi, Zhehuai Chen, Somshubra Majumdar, Steve Huang, Jagadeesh Balam, Boris Ginsburg

Abstract: In this paper, we propose three methods for generating synthetic samples to train and evaluate multimodal large language models capable of processing both text and speech inputs. Addressing the scarcity of samples containing both modalities, synthetic data generation emerges as a crucial strategy to enhance the performance of such systems and facilitate the modeling of cross-modal relationships between the speech and text domains. Our process employs large language models to generate textual components and text-to-speech systems to generate speech components. The proposed methods offer a practical and effective means to expand the training dataset for these models. Experimental results show progress in achieving an integrated understanding of text and speech. We also highlight the potential of using unlabeled speech data to generate synthetic samples comparable in quality to those with available transcriptions, enabling the expansion of these models to more languages.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted for Interspeech 2024

arXiv:2406.11871 [pdf, other]

Generative AI Voting: Fair Collective Choice is Resilient to LLM Biases and Inconsistencies

Authors: Srijoni Majumdar, Edith Elkind, Evangelos Pournaras

Abstract: Scaling up deliberative and voting participation is a longstanding endeavor -- a cornerstone for direct democracy and legitimate collective choice. Recent breakthroughs in generative artificial intelligence (AI) and large language models (LLMs) provide unprecedented opportunities, but also alarming risks for digital democracy. AI personal assistants can overcome cognitive bandwidth limitations of humans, providing decision support capabilities or even direct AI representation of human voters at large scale. However, the quality of this representation and what underlying biases manifest when delegating collective decision making to LLMs is an alarming and timely challenge to tackle. By rigorously emulating with high realism more than 50K LLM voting personas in 81 real-world voting elections, we show that different LLMs (GPT 3, GPT 3.5, and Llama2) come with biases and significant inconsistencies in complex preferential ballot formats, compared to simpler and more consistent majoritarian elections. Strikingly, fair voting aggregation methods, such as equal shares, prove to be a win-win: fairer voting outcomes for humans with fairer AI representation. This novel underlying relationship proves paramount for democratic resilience in progressive scenarios with low voter turnout and voter fatigue supported by AI representatives: abstained voters are mitigated by recovering highly representative voting outcomes that are fairer. These insights provide remarkable foundations for science, policymakers and citizens in explaining and mitigating AI risks in democratic innovations.

Submitted 30 May, 2024; originally announced June 2024.

Comments: 35 pages, 10 figures

arXiv:2406.11704 [pdf, other]

Nemotron-4 340B Technical Report

Authors: Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11036 [pdf, other]

garak: A Framework for Security Probing Large Language Models

Authors: Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie

Abstract: As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weakness in one context may not be an issue in a different context; one-size-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes "LLM security", and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and identify vulnerabilities in a target LLM or dialog system. garak probes an LLM in a structured fashion to discover potential vulnerabilities. The outputs of the framework describe a target model's weaknesses, contribute to an informed discussion of what composes vulnerabilities in unique contexts, and can inform alignment and policy discussions for LLM deployment.

Submitted 16 June, 2024; originally announced June 2024.

Comments: https://garak.ai

arXiv:2405.14577 [pdf, other]

Representation noising effectively prevents harmful fine-tuning on LLMs

Authors: Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, David Atanasov, Robie Gonzales, Subhabrata Majumdar, Carsten Maple, Hassan Sajjad, Frank Rudzicz

Abstract: Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs). While safety measures like preventing jailbreaks and improving safety guardrails are important, such measures can easily be reversed through fine-tuning. In this work, we propose Representation Noising (RepNoise), a defence mechanism that is effective even when attackers have access to the weights and the defender no longer has any control. RepNoise works by removing information about harmful representations such that it is difficult to recover them during fine-tuning. Importantly, our defence is also able to generalize across different subsets of harm that have not been seen during the defence process. Our method does not degrade the general capability of LLMs and retains the ability to train the model on harmless tasks. We provide empirical evidence that the effectiveness of our defence lies in its "depth": the degree to which information about harmful representations is removed across all layers of the LLM.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.05495 [pdf, other]

PARSAC: Fast, Human-quality Floorplanning for Modern SoCs with Complex Design Constraints

Authors: Hesham Mostafa, Uday Mallappa, Mikhail Galkin, Mariano Phielipp, Somdeb Majumdar

Abstract: The floorplanning of Systems-on-a-Chip (SoCs) and of chip sub-systems is a crucial step in the physical design flow as it determines the optimal shapes and locations of the blocks that make up the system. Simulated Annealing (SA) has been the method of choice for tackling classical floorplanning problems where the objective is to minimize wire-length and the total placement area. The goal in industry-relevant floorplanning problems, however, is not only to minimize area and wire-length, but to do that while respecting hard placement constraints that specify the general area and/or the specific locations for the placement of some blocks. We show that simply incorporating these constraints into the SA objective function leads to sub-optimal, and often illegal, solutions. We propose the Constraints-Aware Simulated Annealing (CA-SA) method and show that it strongly outperforms vanilla SA in floorplanning problems with hard placement constraints. We developed a new floorplanning tool on top of CA-SA: PARSAC (Parallel Simulated Annealing with Constraints). PARSAC is an efficient, easy-to-use, and massively parallel floorplanner. Unlike current SA-based or learning-based floorplanning tools that cannot effectively incorporate hard placement-constraints, PARSAC can quickly construct the Pareto-optimal legal solutions front for constrained floorplanning problems. PARSAC also outperforms traditional SA on legacy floorplanning benchmarks. PARSAC is available as an open-source repository for researchers to replicate and build on our result.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 9 pages, 7 figures

arXiv:2405.05480 [pdf, other]

FloorSet -- a VLSI Floorplanning Dataset with Design Constraints of Real-World SoCs

Authors: Uday Mallappa, Hesham Mostafa, Mikhail Galkin, Mariano Phielipp, Somdeb Majumdar

Abstract: Floorplanning for systems-on-a-chip (SoCs) and their sub-systems is a crucial and non-trivial step of the physical design flow. It represents a difficult combinatorial optimization problem. A typical large scale SoC with 120 partitions generates a search-space of nearly 10E250. As novel machine learning (ML) approaches emerge to tackle such problems, there is a growing need for a modern benchmark that comprises a large training dataset and performance metrics that better reflect real-world constraints and objectives compared to existing benchmarks. To address this need, we present FloorSet -- two comprehensive datasets of synthetic fixed-outline floorplan layouts that reflect the distribution of real SoCs. Each dataset has 1M training samples and 100 test samples where each sample is a synthetic floorplan. FloorSet-Prime comprises fully-abutted rectilinear partitions and near-optimal wire-length. A simplified dataset that reflects early design phases, FloorSet-Lite comprises rectangular partitions, with under 5 percent white-space and near-optimal wire-length. Both datasets define hard constraints seen in modern design flows such as shape constraints, edge-affinity, grouping constraints, and pre-placement constraints. FloorSet is intended to spur fundamental research on large-scale constrained optimization problems. Crucially, FloorSet alleviates the core issue of reproducibility in modern ML driven solutions to such problems. FloorSet is available as an open-source repository for the research community.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 10 pages, 11 figures

arXiv:2405.05085 [pdf, other]

Fair Voting Outcomes with Impact and Novelty Compromises? Unraveling Biases of Equal Shares in Participatory Budgeting

Authors: Sajan Maharjan, Srijoni Majumdar, Evangelos Pournaras

Abstract: Participatory budgeting, as a paradigm for democratic innovations, engages citizens in the distribution of a public budget to projects, which they propose and vote for implementation. So far, voting algorithms have been devised and studied in social choice literature to elect projects that are popular, while others prioritize a proportional representation of voters' preferences, for instance, equal shares. However, the anticipated impact and novelty in the broader society by the winning projects, as selected by different algorithms, remain totally under-explored, lacking both a universal theory of impact for voting and a rigorous framework for impact and novelty assessments. This paper tackles this grand challenge towards new axiomatic foundations for designing effective and fair voting methods. This is via new and striking insights derived from a large-scale analysis of biases over 345 real-world voting outcomes, characterized for the first time by a novel portfolio of impact and novelty metrics. We find strong causal evidence that equal shares comes with impact loss in several infrastructural projects of different cost levels that have been so far over-represented. However, it also comes with a novel, yet over-represented, impact gain in welfare, education and culture. We discuss broader implications of these results and how impact loss can be mitigated at the stage of campaign design and project ideation.

Submitted 9 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 23 pages, 9 figures

arXiv:2404.05482 [pdf, other]

WaveCatBoost for Probabilistic Forecasting of Regional Air Quality Data

Authors: Jintu Borah, Tanujit Chakraborty, Md. Shahrul Md. Nadzir, Mylene G. Cayetano, Shubhankar Majumdar

Abstract: Accurate and reliable air quality forecasting is essential for protecting public health, sustainable development, pollution control, and enhanced urban planning. This letter presents a novel WaveCatBoost architecture designed to forecast the real-time concentrations of air pollutants by combining the maximal overlapping discrete wavelet transform (MODWT) with the CatBoost model. This hybrid approach efficiently transforms time series into high-frequency and low-frequency components, thereby extracting signal from noise and improving prediction accuracy and robustness. Evaluation on two distinct regional datasets, from the Central Air Pollution Control Board (CPCB) sensor network and a low-cost air quality sensor system (LAQS), underscores the superior performance of our proposed methodology in real-time forecasting compared to the state-of-the-art statistical and deep learning architectures. Moreover, we employ a conformal prediction strategy to provide probabilistic bands with our forecasts.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2401.05947 [pdf, other]

Send Message to the Future? Blockchain-based Time Machines for Decentralized Reveal of Locked Information

Authors: Zhuolun Li, Srijoni Majumdar, Evangelos Pournaras

Abstract: Conditional information reveal systems automate the release of information upon meeting specific predefined conditions, such as time or location. This paper introduces a breakthrough in the understanding, design and application of conditional information reveal systems that are highly secure and decentralized. By designing a new practical timed-release cryptography system and a verifiable secret sharing scheme, a novel data sharing system is devised on the blockchain that 'sends messages in the future' with highly accurate decryption times. This paper provides a complete evaluation portfolio of this pioneering paradigm, including analytical results, a validation of its robustness in the Tamarin Prover and a performance evaluation of a real-world, open-source system prototype deployed across the globe. Using real-world election data, we also demonstrate the applicability of this innovative system in e-voting, illustrating its capacity to secure and ensure fair electronic voting processes.

Submitted 24 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.17279 [pdf, other]

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

Authors: Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Abstract: In this paper, we propose an efficient and accurate streaming speech recognition model based on the FastConformer architecture. We adapted the FastConformer architecture for streaming applications through: (1) constraining both the look-ahead and past contexts in the encoder, and (2) introducing an activation caching mechanism to enable the non-autoregressive encoder to operate autoregressively during inference. The proposed model is designed to eliminate the accuracy disparity between training and inference time, which is common for many streaming models. Furthermore, our proposed encoder works with various decoder configurations including Connectionist Temporal Classification (CTC) and RNN-Transducer (RNNT) decoders. Additionally, we introduced a hybrid CTC/RNNT architecture which utilizes a shared encoder with both a CTC and RNNT decoder to boost the accuracy and save computation. We evaluate the proposed model on the LibriSpeech dataset and a multi-domain large scale dataset and demonstrate that it can achieve better accuracy with lower latency and inference time compared to a conventional buffered streaming model baseline. We also showed that training a model with multiple latencies can achieve better accuracy than single latency models while enabling us to support multiple latencies with a single model. Our experiments also showed that the hybrid architecture not only speeds up the convergence of the CTC decoder but also improves the accuracy of streaming models compared to single decoder models.

Submitted 2 May, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: Shorter version accepted to ICASSP 2024

arXiv:2311.03374 [pdf, other]

Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023

Authors: Srijoni Majumdar, Soumen Paul, Debjyoti Paul, Ayan Bandyopadhyay, Samiran Chattopadhyay, Partha Pratim Das, Paul D Clough, Prasenjit Majumder

Abstract: The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments in a machine learning framework based on human and large language model generated labels. In this track, there is a binary classification task to classify comments as useful and not useful. The dataset consists of 9048 code comments and surrounding code snippet pairs extracted from open-source GitHub C-based projects and an additional dataset generated individually by teams using large language models. Overall 56 experiments have been submitted by 17 teams from various universities and software companies. The submissions have been evaluated quantitatively using the F1-Score and qualitatively based on the type of features developed, the supervised learning model used and their corresponding hyper-parameters. The labels generated from large language models increase the bias in the prediction model but lead to less over-fitted results.

Submitted 27 October, 2023; originally announced November 2023.

Comments: Overview Paper of the Information Retrieval of Software Engineering Track at the Forum for Information Retrieval, 2023

arXiv:2310.17152 [pdf]

Technical Note: Feasibility of translating 3.0T-trained Deep-Learning Segmentation Models Out-of-the-Box on Low-Field MRI 0.55T Knee-MRI of Healthy Controls

Authors: Rupsa Bhattacharjee, Zehra Akkaya, Johanna Luitjens, Pan Su, Yang Yang, Valentina Pedoia, Sharmila Majumdar

Abstract: In the current study, our purpose is to evaluate the feasibility of applying deep learning (DL) enabled algorithms to quantify bilateral knee biomarkers in healthy controls scanned at 0.55T and compared with 3.0T. The current study assesses the performance of standard in-practice bone and cartilage segmentation algorithms at 0.55T, both qualitatively and quantitatively, in terms of comparing segmentation performance, areas of improvement, and compartment-wise cartilage thickness values between 0.55T vs. 3.0T. Initial results demonstrate a usable to good technical feasibility of translating existing quantitative deep-learning-based image segmentation techniques, trained on 3.0T, out of 0.55T for knee MRI, in a multi-vendor acquisition environment. Especially in terms of segmenting cartilage compartments, the models perform almost equivalently to 3.0T in terms of Likert ranking. The 0.55T low-field sustainable and easy-to-install MRI, as demonstrated, thus, can be utilized for evaluating knee cartilage thickness and bone segmentations aided by established DL algorithms trained at higher-field strengths out-of-the-box initially. This could be utilized at the far-spread point-of-care locations with a lack of radiologists available to manually segment low-field images, at least till a decent base of low-field data pool is collated. With further fine-tuning with manual labeling of low-field data or utilizing synthesized higher SNR images from low-field images, OA biomarker quantification performance is potentially guaranteed to be further improved.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 11 Pages, 3 Figures, 2 Tables

arXiv:2309.09950 [pdf, other]

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

Authors: Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg

Abstract: This paper presents an overview and evaluation of some of the end-to-end ASR models on long-form audio. We study three categories of Automatic Speech Recognition (ASR) models based on their core architecture: (1) convolutional, (2) convolutional with squeeze-and-excitation and (3) convolutional models with attention. We selected one ASR model from each category and evaluated Word Error Rate, maximum audio length and real-time factor for each model on a variety of long audio benchmarks: Earnings-21 and 22, CORAAL, and TED-LIUM3. The model from the category of self-attention with local attention and global token has the best accuracy compared to other architectures. We also compared models with CTC and RNNT decoders and showed that CTC-based models are more robust and efficient than RNNT on long form audio.

Submitted 20 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: PrePrint. Submitted to ICASSP 2024

arXiv:2308.09138 [pdf, other]

Semantic Consistency for Assuring Reliability of Large Language Models

Authors: Harsh Raj, Vipul Gupta, Domenic Rosati, Subhabrata Majumdar

Abstract: Large Language Models (LLMs) exhibit remarkable fluency and competence across various natural language tasks. However, recent research has highlighted their sensitivity to variations in input prompts. To deploy LLMs in a safe and reliable manner, it is crucial for their outputs to be consistent when prompted with expressions that carry the same meaning or intent. While some existing work has explored how state-of-the-art LLMs address this issue, their evaluations have been confined to assessing lexical equality of single- or multi-word answers, overlooking the consistency of generative text sequences. For a more comprehensive understanding of the consistency of LLMs in open-ended text generation scenarios, we introduce a general measure of semantic consistency, and formulate multiple versions of this metric to evaluate the performance of various LLMs. Our proposal demonstrates significantly higher consistency and stronger correlation with human evaluations of output consistency than traditional metrics based on lexical consistency. Finally, we propose a novel prompting strategy, called Ask-to-Choose (A2C), to enhance semantic consistency. When evaluated for closed-book question answering based on answer variations from the TruthfulQA benchmark, A2C increases accuracy metrics for pretrained and finetuned LLMs by up to 47%, and semantic consistency metrics for instruction-tuned models by up to 7-fold.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.06653 [pdf, other]

Smart Knowledge Transfer using Google-like Search

Authors: Srijoni Majumdar, Partha Pratim Das

Abstract: To address the issue of rising software maintenance cost due to program comprehension challenges, we propose SMARTKT (Smart Knowledge Transfer), a search framework, which extracts and integrates knowledge related to various aspects of an application in the form of a semantic graph. This graph supports syntax and semantic queries and converts the process of program comprehension into a google-like search problem.

Submitted 12 August, 2023; originally announced August 2023.

Comments: 3 pages, 2 figures, accepted in the NDLI-UNESCO International Symposium on Knowledge Engineering for Digital Library Design 2019 (KEDL) as an extended abstract and poster

arXiv:2307.12915 [pdf, other]

Consensus-based Participatory Budgeting for Legitimacy: Decision Support via Multi-agent Reinforcement Learning

Authors: Srijoni Majumdar, Evangelos Pournaras

Abstract: The legitimacy of bottom-up democratic processes for the distribution of public funds by policy-makers is challenging and complex. Participatory budgeting is such a process, where voting outcomes may not always be fair or inclusive. Deliberation on which project ideas to put to a vote and choose for implementation lacks systematization and does not scale. This paper addresses these grand challenges by introducing a novel and legitimate iterative consensus-based participatory budgeting process. Consensus is designed to be a result of decision support via an innovative multi-agent reinforcement learning approach. Voters are assisted to interact with each other to make viable compromises. Extensive experimental evaluation with real-world participatory budgeting data from Poland reveals striking findings: Consensus is reachable, efficient and robust. Compromise is required, though it is comparable to that of existing voting aggregation methods, which promote fairness and inclusion without attaining consensus.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 13 Pages, 8 Figures, 3 Tables, Accepted in International Conference on Machine Learning, Optimization, and Data Science, 2023

Journal ref: International Conference on Machine Learning, Optimization, and Data Science, 2023

arXiv:2307.08412 [pdf, other]

A Privacy-Preserving Blockchain-based E-voting System

Authors: Arnab Mukherjee, Souvik Majumdar, Anup Kumar Kolya, Saborni Nandi

Abstract: Within a modern democratic nation, elections play a significant role in the nation's functioning. However, with the existing infrastructure for conducting elections using Electronic Voting Systems (EVMs), many loopholes exist, which illegitimate entities might leverage to cast false votes or even tamper with the EVMs after the voting session is complete. The need of the hour is to introduce a robust, auditable, transparent, and tamper-proof e-voting system, enabling a more reliable and fair election process. To address such concerns, we propose a novel solution for blockchain-based e-voting, focusing on the security and privacy aspects of the e-voting process. We consider the security risks and loopholes and aim to preserve the anonymity of the voters while ensuring that illegitimate votes are properly handled. Additionally, we develop a prototype as a proof of concept using the Ethereum blockchain platform. Finally, we perform experiments to demonstrate the performance of the system.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.13696 [pdf, other]

doi 10.1145/3598469.3598472

Improving City Life via Legitimate and Participatory Policy-making: A Data-driven Approach in Switzerland

Authors: Thomas Wellings, Srijoni Majumdar, Regula Hänggli Fricker, Evangelos Pournaras

Abstract: This paper introduces a novel data-driven approach to address challenges faced by city policymakers concerning the distribution of public funds. Providing budgeting processes for improving quality of life based on objective (data-driven) evidence has been so far a missing element in policy-making. This paper focuses on a case study of 1,204 citizens in the city of Aarau, Switzerland, and analyzes survey data containing insightful indicators that can impact the legitimacy of decision-making. Our approach is twofold. On the one hand, we aim to optimize the legitimacy of policymakers' decisions by identifying the level of investment in neighborhoods and projects that offer the greatest return in legitimacy. To do so, we introduce a new context-independent legitimacy metric for policymakers. This metric allows us to distinguish decisive vs. indecisive collective preferences for neighborhoods or projects on which to invest, enabling policymakers to prioritize impactful bottom-up consultations and participatory initiatives (e.g., participatory budgeting). The metric also allows policymakers to identify the optimal number of investments in various project sectors and neighborhoods (in terms of legitimacy gain). On the other hand, we aim to offer guidance to policymakers concerning which satisfaction and participation factors influence citizens' quality of life through an accurate classification model and an evaluation of relocations. By doing so, policymakers may be able to further refine their strategy, making targeted investments with significant benefits to citizens' quality of life. These findings are expected to provide transformative insights for practicing direct democracy in Switzerland and a blueprint for policy-making to adopt worldwide.

Submitted 23 June, 2023; originally announced June 2023.

Comments: 18 pages, 15 figures

Journal ref: 24th Annual International Conference on Digital Government Research (dg.o 2023)

arXiv:2306.06283 [pdf, other]

doi 10.1039/D3DD00113J

14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

Authors: Kevin Maik Jablonka, Qianxiang Ai, Alexander Al-Feghali, Shruti Badhwar, Joshua D. Bocarsly, Andres M Bran, Stefan Bringuier, L. Catherine Brinson, Kamal Choudhary, Defne Circi, Sam Cox, Wibe A. de Jong, Matthew L. Evans, Nicolas Gastellu, Jerome Genzling, María Victoria Gil, Ankur K. Gupta, Zhi Hong, Alishba Imran, Sabine Kruschwitz, Anne Labarre, Jakub Lála, Tao Liu, Steven Ma, Sauradeep Majumdar, et al. (28 additional authors not shown)

Abstract: Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.

Submitted 14 July, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2305.16993 [pdf, ps, other]

Discrete-choice Multi-agent Optimization: Decentralized Hard Constraint Satisfaction for Smart Cities

Authors: Srijoni Majumdar, Chuhao Qin, Evangelos Pournaras

Abstract: Making Smart Cities more sustainable, resilient and democratic is emerging as an endeavor of satisfying hard constraints, for instance meeting net-zero targets. Decentralized multi-agent methods for socio-technical optimization of large-scale complex infrastructures such as energy and transport networks are scalable and more privacy-preserving by design. However, they mainly focus on satisfying soft constraints to remain cost-effective. This paper introduces a new model for decentralized hard constraint satisfaction in discrete-choice combinatorial optimization problems. The model solves the cold start problem of partial information for coordination during initialization that can violate hard constraints. It also preserves a low-cost satisfaction of hard constraints in subsequent coordinated choices during which soft constraints optimization is performed. Strikingly, experimental results in real-world Smart City application scenarios demonstrate the required behavioral shift to preserve optimality when hard constraints are satisfied. These findings are significant for policymakers, system operators, designers and architects to create the missing social capital of running cities in more viable trajectories.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 8 pages, 7 figures, Accepted for MSDM@AAMAS 2023

arXiv:2305.05084 [pdf, other]

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Authors: Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Abstract: Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the Conformer architecture for efficient training and inference, we carefully redesigned Conformer with a novel downsampling schema. The proposed model, named Fast Conformer (FC), is 2.8x faster than the original Conformer, supports scaling to a billion parameters without any changes to the core architecture, and also achieves state-of-the-art accuracy on Automatic Speech Recognition benchmarks. To enable transcription of long-form speech up to 11 hours, we replaced global attention with limited context attention post-training, while also improving accuracy through fine-tuning with the addition of a global token. Fast Conformer, when combined with a Transformer decoder, also outperforms the original Conformer in accuracy and in speed for Speech Translation and Spoken Language Understanding.

Submitted 30 September, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: Accepted at ASRU 2023

arXiv:2304.07213 [pdf, other]

doi 10.1016/j.rse.2023.113888

Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on Aerial Lidar

Authors: Jamie Tolan, Hung-I Yang, Ben Nosarzewski, Guillaume Couairon, Huy Vo, John Brandt, Justine Spore, Sayantan Majumdar, Daniel Haziza, Janaki Vamaraju, Theo Moutakanni, Piotr Bojanowski, Tracy Johns, Brian White, Tobias Tiecke, Camille Couprie

Abstract: Vegetation structure mapping is critical for understanding the global carbon cycle and monitoring nature-based approaches to climate adaptation and mitigation. Repeated measurements of these data allow for the observation of deforestation or degradation of existing forests, natural forest regeneration, and the implementation of sustainable agricultural practices like agroforestry. Assessments of tree canopy height and crown projected area at a high spatial resolution are also important for monitoring carbon fluxes and assessing tree-based land uses, since forest structures can be highly spatially heterogeneous, especially in agroforestry systems. Very high resolution satellite imagery (less than one meter (1m) Ground Sample Distance) makes it possible to extract information at the tree level while allowing monitoring at a very large scale. This paper presents the first high-resolution canopy height map concurrently produced for multiple sub-national jurisdictions. Specifically, we produce very high resolution canopy height maps for the states of California and Sao Paulo, a significant improvement in resolution over the ten meter (10m) resolution of previous Sentinel / GEDI based worldwide maps of canopy height. The maps are generated by the extraction of features from a self-supervised model trained on Maxar imagery from 2017 to 2020, and the training of a dense prediction decoder against aerial lidar maps. We also introduce a post-processing step using a convolutional network trained on GEDI observations. We evaluate the proposed maps with set-aside validation lidar data as well as by comparing with other remotely sensed maps and field-collected data, and find our model produces an average Mean Absolute Error (MAE) of 2.8 meters and Mean Error (ME) of 0.6 meters.

Submitted 15 December, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Journal ref: Remote Sensing of Environment 300, 113888, 2024

arXiv:2304.06795 [pdf, other]

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

Authors: Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

Abstract: This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by jointly predicting both a token and its duration, i.e. the number of input frames covered by the emitted token. This is achieved by using a joint network with two outputs which are independently normalized to generate distributions over tokens and durations. During inference, TDT models can skip input frames guided by the predicted duration output, which makes them significantly faster than conventional Transducers which process the encoder output frame by frame. TDT models achieve both better accuracy and significantly faster inference than conventional Transducers on different sequence transduction tasks. TDT models for Speech Recognition achieve better accuracy and up to 2.82X faster inference than conventional Transducers. TDT models for Speech Translation achieve an absolute gain of over 1 BLEU on the MUST-C test compared with conventional Transducers, and its inference is 2.27X faster. In Speech Intent Classification and Slot Filling tasks, TDT models improve the intent accuracy by up to over 1% (absolute) over conventional Transducers, while running up to 1.28X faster. Our implementation of the TDT model will be open-sourced with the NeMo (https://github.com/NVIDIA/NeMo) toolkit.

Submitted 29 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

arXiv:2303.08535 [pdf, other]

doi 10.1088/1751-8121/acd829

Singular relaxation of a random walk in a box with a Metropolis Monte Carlo dynamics

Authors: Alexei D. Chepelianskii, Satya N. Majumdar, Hendrik Schawe, Emmanuel Trizac

Abstract: We study analytically the relaxation eigenmodes of a simple Monte Carlo algorithm, corresponding to a particle in a box which moves by uniform random jumps. Moves outside of the box are rejected. At long times, the system approaches the equilibrium probability density, which is uniform inside the box. We show that the relaxation towards this equilibrium is unusual: for a jump length comparable to the size of the box, the number of relaxation eigenmodes can be surprisingly small, one or two. We provide a complete analytic description of the transition between these two regimes. When only a single relaxation eigenmode is present, a suitable choice of the symmetry of the initial conditions gives a localizing decay to equilibrium. In this case, the deviation from equilibrium concentrates at the edges of the box where the rejection probability is maximal. Finally, in addition to the relaxation analysis of the master equation, we also describe the full eigen-spectrum of the master equation including its sub-leading eigen-modes.

Submitted 15 March, 2023; originally announced March 2023.
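
As a rough illustration of the algorithm the abstract above describes (a particle in a box moving by uniform random jumps, with moves outside the box rejected), here is a minimal Python sketch. It is not the authors' code: the function name `metropolis_box_walk`, the one-dimensional box `[0, box_size]`, the symmetric jump interval `[-jump_length, jump_length]`, and the default parameters are all assumptions made for the example.

```python
import random


def metropolis_box_walk(n_steps, jump_length, box_size=1.0, x0=0.5, seed=0):
    """Simulate a 1D random walk in [0, box_size] with rejected out-of-box moves.

    Hypothetical helper for illustration; all names and defaults are assumptions.
    Returns the visited positions and the fraction of rejected moves.
    """
    rng = random.Random(seed)
    x = x0
    positions = []
    rejected = 0
    for _ in range(n_steps):
        # Propose a jump drawn uniformly from [-jump_length, jump_length].
        proposal = x + rng.uniform(-jump_length, jump_length)
        if 0.0 <= proposal <= box_size:
            x = proposal  # the move stays inside the box: accept it
        else:
            rejected += 1  # the move leaves the box: reject it, stay in place
        positions.append(x)
    return positions, rejected / n_steps


if __name__ == "__main__":
    positions, rejection_rate = metropolis_box_walk(10_000, jump_length=0.8)
    print(f"rejection rate: {rejection_rate:.3f}")
```

Because the target density is uniform inside the box, the Metropolis acceptance test reduces to checking whether the proposal is inside the box, which is why no acceptance ratio appears in the sketch.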

arXiv:2211.13419 [pdf, other]

Network Security Modelling with Distributional Data

Authors: Subhabrata Majumdar, Ganesh Subramaniam

Abstract: We investigate the detection of botnet command and control (C2) hosts in massive IP traffic using machine learning methods. To this end, we use NetFlow data -- the industry standard for monitoring of IP traffic -- and ML models using two sets of features: conventional NetFlow variables and distributional features based on NetFlow variables. In addition to using static summaries of NetFlow features, we use quantiles of their IP-level distributions as input features in predictive models to predict whether an IP belongs to known botnet families. These models are used to develop intrusion detection systems to predict traffic traces identified with malicious attacks. The results are validated by matching predictions to existing denylists of published malicious IP addresses and deep packet inspection. The usage of our proposed novel distributional features, combined with techniques that enable modelling complex input feature spaces, results in highly accurate predictions by our trained models.

Submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted and presented in CAMLIS 2022, https://www.camlis.org/2022-conference. arXiv admin note: text overlap with arXiv:2108.08924
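
The abstract above mentions using quantiles of IP-level distributions of NetFlow variables as model inputs. The sketch below shows one plausible way to compute such features with the Python standard library. It is only an illustration under stated assumptions: the record layout (dictionaries with `src_ip` and `bytes` keys), the function name `ip_quantile_features`, the choice of quartiles, and the fallback for IPs with a single flow are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from statistics import quantiles


def ip_quantile_features(flows, field="bytes", n=4):
    """Return, per source IP, the n-1 quantile cut points of one flow variable.

    Hypothetical helper: `flows` is assumed to be an iterable of dicts with a
    "src_ip" key and a numeric `field` key.
    """
    per_ip = defaultdict(list)
    for flow in flows:
        per_ip[flow["src_ip"]].append(flow[field])

    features = {}
    for ip, values in per_ip.items():
        if len(values) < 2:
            # Too few points to estimate quantiles: repeat the single
            # observation (an assumption, not the paper's rule).
            features[ip] = [float(values[0])] * (n - 1)
        else:
            features[ip] = quantiles(values, n=n, method="inclusive")
    return features


if __name__ == "__main__":
    sample = [{"src_ip": "10.0.0.1", "bytes": b} for b in (100, 200, 300, 400, 500)]
    sample.append({"src_ip": "10.0.0.2", "bytes": 50})
    print(ip_quantile_features(sample))
```

Each IP then contributes a fixed-length vector regardless of how many flows it produced, which is what makes such summaries usable as inputs to a standard classifier.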

arXiv:2211.05853 [pdf, ps, other]

Measuring Reliability of Large Language Models through Semantic Consistency

Authors: Harsh Raj, Domenic Rosati, Subhabrata Majumdar

Abstract: While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to what prompts are fed into them. Even when prompts are semantically identical, language models may give very different answers. When considering safe and trustworthy deployments of PLMs we would like their outputs to be consistent under prompts that mean the same thing or convey the same intent. While some work has looked into how state-of-the-art PLMs address this need, they have been limited to only evaluating lexical equality of single- or multi-word answers and do not address consistency of generative text sequences. In order to understand consistency of PLMs under text generation settings, we develop a measure of semantic consistency that allows the comparison of open-ended text outputs. We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions in the TruthfulQA dataset. We find that our proposed metrics are considerably more consistent than traditional metrics embodying lexical consistency, and also correlate with human evaluation of output consistency to a higher degree.

Submitted 11 April, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022 ML Safety Workshop, https://neurips2022.mlsafety.org

arXiv:2211.04568 [pdf, ps, other]

Towards Algorithmic Fairness in Space-Time: Filling in Black Holes

Authors: Cheryl Flynn, Aritra Guha, Subhabrata Majumdar, Divesh Srivastava, Zhengyi Zhou

Abstract: New technologies and the availability of geospatial data have drawn attention to spatio-temporal biases present in society. For example: the COVID-19 pandemic highlighted disparities in the availability of broadband service and its role in the digital divide; the environmental justice movement in the United States has raised awareness to health implications for minority populations stemming from historical redlining practices; and studies have found varying quality and coverage in the collection and sharing of open-source geospatial data. Despite the extensive literature on machine learning (ML) fairness, few algorithmic strategies have been proposed to mitigate such biases. In this paper we highlight the unique challenges for quantifying and addressing spatio-temporal biases, through the lens of use cases presented in the scientific literature and media. We envision a roadmap of ML strategies that need to be developed or adapted to quantify and overcome these challenges -- including transfer learning, active learning, and reinforcement learning techniques. Further, we discuss the potential role of ML in providing guidance to policy makers on issues related to spatial fairness.

Submitted 8 November, 2022; originally announced November 2022.

arXiv:2211.03541 [pdf, other]

Multi-blank Transducers for Speech Recognition

Authors: Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg

Abstract: This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR). In standard RNN-T, the emission of a blank symbol consumes exactly one input frame; in our proposed method, we introduce additional blank symbols, which consume two or more input frames when emitted. We refer to the added symbols as big blanks, and the method multi-blank RNN-T. For training multi-blank RNN-Ts, we propose a novel logit under-normalization method in order to prioritize emissions of big blanks. With experiments on multiple languages and datasets, we show that multi-blank RNN-T methods could bring relative speedups of over +90%/+139% to model inference for English Librispeech and German Multilingual Librispeech datasets, respectively. The multi-blank RNN-T method also improves ASR accuracy consistently. We will release our implementation of the method in the NeMo (https://github.com/NVIDIA/NeMo) toolkit.

Submitted 11 April, 2024; v1 submitted 4 November, 2022; originally announced November 2022.

Journal ref: ICASSP 2023

arXiv:2210.16780 [pdf]

Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering

Authors: Sriparna Majumdar, Aaron Brick

Abstract: The forensic attribution of the handwriting in a digitized document to multiple scribes is a challenging problem of high dimensionality. Unique handwriting styles may be dissimilar in a blend of several factors including character size, stroke width, loops, ductus, slant angles, and cursive ligatures. Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks has provided moderate to high success. In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis. This advance demonstrates the successful deployment of unsupervised methods for writer attribution of historical documents and forensic document analysis.

Submitted 28 June, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

Comments: 26 pages in total, 5 figures and 2 tables

arXiv:2210.03255 [pdf, other]

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

Authors: Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg

Abstract: Automatic speech recognition models are often adapted to improve their accuracy in a new domain. A potential drawback of model adaptation to new domains is catastrophic forgetting, where the Word Error Rate on the original domain is significantly degraded. This paper addresses the situation when we want to simultaneously adapt automatic speech recognition models to a new domain and limit the degradation of accuracy on the original domain without access to the original training dataset. We propose several techniques such as a limited training strategy and regularized adapter modules for the Transducer encoder, prediction, and joiner network. We apply these methods to the Google Speech Commands and to the UK and Ireland English Dialect speech data set and obtain strong results on the new target domain while limiting the degradation on the original domain.

Submitted 6 October, 2022; originally announced October 2022.

Comments: To appear in Proc. SLT 2022, Jan 09-12, 2023, Doha, Qatar

arXiv:2208.00498 [pdf, other]

DNNShield: Dynamic Randomized Model Sparsification, A Defense Against Adversarial Machine Learning

Authors: Mohammad Hossein Samavatian, Saikat Majumdar, Kristin Barber, Radu Teodorescu

Abstract: DNNs are known to be vulnerable to so-called adversarial attacks that manipulate inputs to cause incorrect results that can be beneficial to an attacker or damaging to the victim. Recent works have proposed approximate computation as a defense mechanism against machine learning attacks. We show that these approaches, while successful for a range of inputs, are insufficient to address stronger, high-confidence adversarial attacks. To address this, we propose DNNSHIELD, a hardware-accelerated defense that adapts the strength of the response to the confidence of the adversarial input. Our approach relies on dynamic and random sparsification of the DNN model to achieve inference approximation efficiently and with fine-grain control over the approximation error. DNNSHIELD uses the output distribution characteristics of sparsified inference compared to a dense reference to detect adversarial inputs. We show an adversarial detection rate of 86% when applied to VGG16 and 88% when applied to ResNet50, which exceeds the detection rate of the state of the art approaches, with a much lower overhead. We demonstrate a software/hardware-accelerated FPGA prototype, which reduces the performance impact of DNNSHIELD relative to software-only CPU and GPU implementations.

Submitted 31 July, 2022; originally announced August 2022.

arXiv:2207.10488 [pdf, other]

doi 10.1088/1742-5468/ad002d

Metropolis Monte Carlo sampling: convergence, localization transition and optimality

Authors: Alexei D. Chepelianskii, Satya N. Majumdar, Hendrik Schawe, Emmanuel Trizac

Abstract: Among random sampling methods, Markov Chain Monte Carlo algorithms are foremost. Using a combination of analytical and numerical approaches, we study their convergence properties towards the steady state, within a random walk Metropolis scheme. Analysing the relaxation properties of some model algorithms sufficiently simple to enable analytic progress, we show that the deviations from the target steady-state distribution can feature a localization transition as a function of the characteristic length of the attempted jumps defining the random walk. While the iteration of the Monte Carlo algorithm converges to equilibrium for all choices of jump parameters, the localization transition changes drastically the asymptotic shape of the difference between the probability distribution reached after a finite number of steps of the algorithm and the target equilibrium distribution. We argue that the relaxation before and after the localisation transition is respectively limited by diffusion and rejection rates.

Submitted 15 April, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Journal ref: Journal of Statistical Mechanics 123205, (2023)

arXiv:2207.07783 [pdf, other]

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

Authors: Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

Abstract: Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows. In this paper, we present SPELL, a novel spatial-temporal graph learning framework that can solve complex tasks such as ASD. To this end, each person in a video frame is first encoded in a unique node for that frame. Nodes corresponding to a single person across frames are connected to encode their temporal dynamics. Nodes within a frame are also connected to encode inter-person relationships. Thus, SPELL reduces ASD to a node classification task. Importantly, SPELL is able to reason over long temporal contexts for all nodes without relying on computationally expensive fully connected graph neural networks. Through extensive experiments on the AVA-ActiveSpeaker dataset, we demonstrate that learning graph-based representations can significantly improve the active speaker detection performance owing to its explicit spatial and temporal structure. SPELL outperforms all previous state-of-the-art approaches while requiring significantly lower memory and computational resources. Our code is publicly available at https://github.com/SRA2/SPELL

Submitted 12 October, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: ECCV 2022 camera ready (Supplementary videos: on ECVA soon). This paper supersedes arXiv:2112.01479

arXiv:2206.13046 [pdf, other]

DPOAD: Differentially Private Outsourcing of Anomaly Detection through Iterative Sensitivity Learning

Authors: Meisam Mohammady, Han Wang, Lingyu Wang, Mengyuan Zhang, Yosr Jarraya, Suryadipta Majumdar, Makan Pourzandi, Mourad Debbabi, Yuan Hong

Abstract: Outsourcing anomaly detection to third parties can allow data owners to overcome resource constraints (e.g., in lightweight IoT devices), facilitate collaborative analysis (e.g., under distributed or multi-party scenarios), and benefit from lower costs and specialized expertise (e.g., of Managed Security Service Providers). Despite such benefits, a data owner may feel reluctant to outsource anomaly detection without sufficient privacy protection. To that end, most existing privacy solutions would face a novel challenge, i.e., preserving privacy usually requires the difference between data entries to be eliminated or reduced, whereas anomaly detection critically depends on that difference. Such a conflict was recently resolved under a local analysis setting with trusted analysts (where no outsourcing is involved) through moving the focus of the differential privacy (DP) guarantee from "all" to only "benign" entries. In this paper, we observe that such an approach is not directly applicable to the outsourcing setting, because data owners do not know which entries are "benign" prior to outsourcing, and hence cannot selectively apply DP on data entries. Therefore, we propose a novel iterative solution for the data owner to gradually "disentangle" the anomalous entries from the benign ones such that the third-party analyst can produce accurate anomaly results with sufficient DP guarantee. We design and implement our Differentially Private Outsourcing of Anomaly Detection (DPOAD) framework, and demonstrate its benefits over baseline Laplace and PainFree mechanisms through experiments with real data from different application domains.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.05391 [pdf, other]

Feature Selection using e-values

Authors: Subhabrata Majumdar, Snigdhansu Chatterjee

Abstract: In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e. the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. The e-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, providing consistency results. For a $p$-dimensional feature space, this procedure requires fitting only the full model and evaluating $p+1$ models, as opposed to the traditional requirement of fitting and evaluating $2^p$ models. Through experiments across several model settings and synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
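To make the stated saving concrete, the following small sketch compares the two model counts quoted in the abstract, $p+1$ versus $2^p$. It only does the arithmetic and does not implement the e-values procedure itself; the function name `models_to_evaluate` is hypothetical.

```python
def models_to_evaluate(p):
    """Return (e-values count, exhaustive count) for p features.

    The e-values procedure evaluates p + 1 models, while exhaustive
    subset search evaluates every one of the 2**p feature subsets.
    """
    evalue_procedure = p + 1
    exhaustive = 2 ** p
    return evalue_procedure, exhaustive


for p in (5, 10, 20):
    print(p, models_to_evaluate(p))
```

At $p = 20$ the gap is already 21 models against 1,048,576.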

Submitted 16 June, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: accepted in ICML-2022

Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:14753-14773, 2022, https://proceedings.mlr.press/v162/majumdar22a.html

arXiv:2204.06386 [pdf]

Efficient Deep Neural Network Accelerator Using Controlled Ferroelectric Domain Dynamics

Authors: Sayani Majumdar

Abstract: The current work reports an efficient deep neural network (DNN) accelerator where synaptic weight elements are controlled by ferroelectric domain dynamics. An integrated device-to-algorithm framework for benchmarking novel synaptic devices is used. In P(VDF-TrFE) based ferroelectric tunnel junctions, analog conductance states are measured using a custom pulsing protocol, and associated custom circuits and array architectures for DNN training are simulated. Our results show that precise control of polarization switching dynamics in multi-domain, polycrystalline ferroelectric thin films can produce considerable weight update linearity in metal-ferroelectric-semiconductor (MFS) tunnel junctions. Ultrafast switching and low junction current in these devices offer extremely energy efficient operation. Through an integrated platform of hardware development, characterization and modelling, we predict the available conductance range where linearity is expected under identical potentiating and depressing pulses for efficient DNN training and inference tasks. As an example, an analog crossbar based DNN accelerator with MFS junctions as synaptic weight elements showed ~93% training accuracy on the large MNIST handwritten digit dataset, while for cropped images a 95% accuracy is achieved. One observed challenge is the rather limited dynamic conductance range while operating under identical potentiating and depressing pulses below 1V. Investigation is underway for improving the dynamic conductance range without losing the weight update linearity.

Submitted 13 October, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

arXiv:2204.01696 [pdf, other]

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos

Authors: Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang

Abstract: We propose to forecast future hand-object interactions given an egocentric video. Instead of predicting action labels or pixels, we directly predict the hand motion trajectory and the future contact points on the next active object (i.e., interaction hotspots). This relatively low-dimensional representation provides a concrete description of future interactions. To tackle this task, we first provide an automatic way to collect trajectory and hotspot labels on large-scale data. We then use this data to train an Object-Centric Transformer (OCT) model for prediction. Our model performs hand and object interaction reasoning via the self-attention mechanism in Transformers. OCT also provides a probabilistic framework to sample the future trajectory and hotspots to handle uncertainty in prediction. We perform experiments on the Epic-Kitchens-55, Epic-Kitchens-100, and EGTEA Gaze+ datasets, and show that OCT outperforms state-of-the-art approaches by a large margin. Project page is available at https://stevenlsw.github.io/hoi-forecast .

Submitted 4 April, 2022; originally announced April 2022.

Comments: CVPR 2022, Project page: https://stevenlsw.github.io/hoi-forecast

arXiv:2112.09828 [pdf, other]

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

Authors: Shengyu Feng, Subarna Tripathi, Hesham Mostafa, Marcel Nassar, Somdeb Majumdar

Abstract: Dynamic scene graph generation from a video is challenging due to the temporal dynamics of the scene and the inherent temporal fluctuations of predictions. We hypothesize that capturing long-term temporal dependencies is the key to effective generation of dynamic scene graphs. We propose to learn the long-term dependencies in a video by capturing the object-level consistency and inter-object relationship dynamics over object-level long-term tracklets using transformers. Experimental results demonstrate that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods by a significant margin on the benchmark dataset Action Genome. Our ablation studies validate the effectiveness of each component of the proposed approach. The source code is available at https://github.com/Shengyu-Feng/DSG-DETR.

Submitted 19 October, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

Comments: WACV 2023

arXiv:2112.01479 [pdf, other]

Learning Spatial-Temporal Graphs for Active Speaker Detection

Authors: Sourya Roy, Kyle Min, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

Abstract: We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data. We cast active speaker detection as a node classification task that is aware of longer-term dependencies. We first construct a graph from a video so that each node corresponds to one person. Nodes representing the same identity share edges between them within a defined temporal window. Nodes within the same video frame are also connected to encode inter-person interactions. Through extensive experiments on the AVA-ActiveSpeaker dataset, we demonstrate that learning graph-based representation, owing to its explicit spatial and temporal structure, significantly improves the overall performance. SPELL outperforms several relevant baselines and performs on par with state-of-the-art models while requiring an order of magnitude lower computation cost.

Submitted 3 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

Comments: 10 pages

arXiv:2110.03098 [pdf, other]

doi 10.21437/interspeech.2022-10854

CTC Variations Through New WFST Topologies

Authors: Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg

Abstract: This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) the "compact-CTC", in which direct transitions between units are replaced with <epsilon> back-off transitions; (2) the "minimal-CTC", that only adds <blank> self-loops when used in WFST-composition; and (3) the "selfless-CTC" variant, which disallows self-loops for non-blank units. Compact-CTC allows for 1.5 times smaller WFST decoding graphs and reduces memory consumption by two times when training CTC models with the LF-MMI objective without hurting the recognition accuracy. Minimal-CTC reduces graph size and memory consumption by two and four times at the cost of a small accuracy drop. Using selfless-CTC can improve the accuracy for wide context window models.

Submitted 26 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Accepted to Interspeech 2022, 5 pages, 2 figures, 7 tables

arXiv:2108.04187 [pdf, other]

Scaling New Peaks: A Viewership-centric Approach to Automated Content Curation

Authors: Subhabrata Majumdar, Deirdre Paul, Eric Zavesky

Abstract: Summarizing video content is important for video streaming services to engage the user in a limited time span. To this end, current methods involve manual curation or using passive interest cues to annotate potential high-interest segments to form the basis of summarized videos, and are costly and unreliable. We propose a viewership-driven, automated method that accommodates a range of segment identification goals. Using satellite television viewership data as a source of ground truth for viewer interest, we apply statistical anomaly detection on a timeline of viewership metrics to identify 'seed' segments of high viewer interest. These segments are post-processed using empirical rules and several sources of content metadata, e.g. shot boundaries, adding in personalization aspects to produce the final highlights video. To demonstrate the flexibility of our approach, we present two case studies, on the United States Democratic Presidential Debate on 19th December 2019, and Wimbledon Women's Final 2019. We perform qualitative comparisons with their publicly available highlights, as well as early vs. late viewership comparisons for insights into possible media and social influence on viewing behavior.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2107.13268 [pdf, ps, other]

A Distributed Intelligence Architecture for B5G Network Automation

Authors: Sayantini Majumdar, Riccardo Trivisonno, Georg Carle

Abstract: The management of networks is automated by closed loops. Concurrent closed loops aiming for individual optimization cause conflicts which, left unresolved, lead to significant degradation in performance indicators, resulting in sub-optimal network performance. Centralized optimization avoids conflicts, but is impractical in large-scale networks for time-critical applications. Distributed, pervasive intelligence is therefore envisaged in the evolution to B5G networks. In this letter, we propose a Q-Learning-based distributed architecture (QLC), addressing the conflict issue by encouraging cooperation among intelligent agents. We design a realistic B5G network slice auto-scaling model and validate the performance of QLC via simulations, justifying further research in this direction.

Submitted 7 October, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

Comments: 6 pages, 4 figures. This work has been submitted to the IEEE Networking Letters for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2107.10708 [pdf, other]

CarneliNet: Neural Mixture Model for Automatic Speech Recognition

Authors: Aleksei Kalinov, Somshubra Majumdar, Jagadeesh Balam, Boris Ginsburg

Abstract: End-to-end automatic speech recognition systems have achieved great accuracy by using deeper and deeper models. However, the increased depth comes with a larger receptive field that can negatively impact model performance in streaming scenarios. We propose an alternative approach that we call Neural Mixture Model. The basic idea is to introduce a parallel mixture of shallow networks instead of a very deep network. To validate this idea we design CarneliNet -- a CTC-based neural network composed of three mega-blocks. Each mega-block consists of multiple parallel shallow sub-networks based on 1D depthwise-separable convolutions. We evaluate the model on the LibriSpeech, MLS and AISHELL-2 datasets and achieve close to state-of-the-art results for CTC-based models. Finally, we demonstrate that one can dynamically reconfigure the number of parallel sub-networks to accommodate the computational requirements without retraining.

Submitted 22 July, 2021; originally announced July 2021.

Comments: Submitted to ASRU 2021

arXiv:2107.09804 [pdf, other]

doi 10.1109/HOST49136.2021.9702287

Using Undervolting as an On-Device Defense Against Adversarial Machine Learning Attacks

Authors: Saikat Majumdar, Mohammad Hossein Samavatian, Kristin Barber, Radu Teodorescu

Abstract: Deep neural network (DNN) classifiers are powerful tools that drive a broad spectrum of important applications, from image recognition to autonomous vehicles. Unfortunately, DNNs are known to be vulnerable to adversarial attacks that affect virtually all state-of-the-art models. These attacks make small imperceptible modifications to inputs that are sufficient to induce the DNNs to produce the wrong classification. In this paper we propose a novel, lightweight adversarial correction and/or detection mechanism for image classifiers that relies on undervolting (running a chip at a voltage that is slightly below its safe margin). We propose using controlled undervolting of the chip running the inference process in order to introduce a limited number of compute errors. We show that these errors disrupt the adversarial input in a way that can be used either to correct the classification or detect the input as adversarial. We evaluate the proposed solution in an FPGA design and through software simulation. We evaluate 10 attacks and show average detection rates of 77% and 90% on two popular DNNs.

Submitted 6 August, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

Journal ref: 2021 IEEE International Symposium on Hardware Oriented Security and Trust (HOST)

arXiv:2107.07341 [pdf]

Utilizing a digital swarm intelligence platform to improve consensus among radiologists and exploring its applications

Authors: Rutwik Shah, Bruno Astuto, Tyler Gleason, Will Fletcher, Justin Banaga, Kevin Sweetwood, Allen Ye, Rina Patel, Kevin McGill, Thomas Link, Jason Crane, Valentina Pedoia, Sharmila Majumdar

Abstract: Radiologists today play a key role in making diagnostic decisions and labeling images for training A.I. algorithms. Low inter-reader reliability (IRR) can be seen between experts when interpreting challenging cases. While team-based decisions are known to outperform individual decisions, inter-personal biases often creep up in group interactions which limit non-dominant participants from expressing true opinions. To overcome the dual problems of low consensus and inter-personal bias, we explored a solution modeled on biological swarms of bees. Two separate cohorts, three radiologists and five radiology residents, collaborated on a digital swarm platform in real time and in a blinded fashion, grading meniscal lesions on knee MR exams. These consensus votes were benchmarked against clinical (arthroscopy) and radiological (senior-most radiologist) observations. The IRR of the consensus votes was compared to the IRR of the majority and most confident votes of the two cohorts. The radiologist cohort saw an improvement of 23% in IRR of swarm votes over majority vote. A similar improvement of 23% in IRR of 3-resident swarm votes over majority vote was observed. The 5-resident swarm had an even higher improvement of 32% in IRR over majority vote. Swarm consensus votes also improved specificity by up to 50%. The swarm consensus votes outperformed individual and majority vote decisions in both the radiologist and resident cohorts. The 5-resident swarm had higher IRR than the 3-resident swarm, indicating a positive effect of increased swarm size. The attending and resident swarms also outperformed predictions from a state-of-the-art A.I. algorithm. Utilizing a digital swarm platform improved agreement and allows participants to express judgement-free intent, resulting in superior clinical performance and robust A.I. training labels.

Submitted 6 September, 2021; v1 submitted 26 June, 2021; originally announced July 2021.

Comments: 29 pages, 3 tables, 7 figures

arXiv:2107.01103 [pdf, other]

Generalized Multivariate Signs for Nonparametric Hypothesis Testing in High Dimensions

Authors: Subhabrata Majumdar, Snigdhansu Chatterjee

Abstract: High-dimensional data, where the dimension of the feature space is much larger than sample size, arise in a number of statistical applications. In this context, we construct the generalized multivariate sign transformation, defined as a vector divided by its norm. For different choices of the norm function, the resulting transformed vector adapts to certain geometrical features of the data distribution. Building on this idea, we obtain one-sample and two-sample testing procedures for mean vectors of high-dimensional data using these generalized sign vectors. These tests are based on U-statistics using kernel inner products, do not require prohibitive assumptions, and are amenable to a fast randomization-based implementation. Through experiments in a number of data settings, we show that tests using generalized signs display higher power than existing tests, while maintaining nominal type-I error rates. Finally, we provide example applications on the MNIST and Minnesota Twin Studies genomic data.
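The sign transformation defined in this abstract, a vector divided by its norm, can be sketched as follows. This is an illustrative reading only: the function names are hypothetical, the two norms shown (Euclidean and L1) are examples rather than the paper's choices, and mapping the zero vector to zero is an assumed convention.

```python
import math


def l2_norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


def l1_norm(x):
    """Sum of absolute values of the entries."""
    return sum(abs(xi) for xi in x)


def generalized_sign(x, norm):
    """Divide x by norm(x); the zero vector is mapped to zero (assumption)."""
    size = norm(x)
    if size == 0:
        return [0.0 for _ in x]
    return [xi / size for xi in x]


x = [3.0, -4.0]
s2 = generalized_sign(x, l2_norm)  # lies on the Euclidean unit sphere
s1 = generalized_sign(x, l1_norm)  # absolute values of entries sum to 1
```

Swapping the `norm` argument changes which unit "sphere" the data are projected onto, which is one way to picture the abstract's remark that different norm choices adapt to different geometric features of the data.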

Submitted 2 July, 2021; originally announced July 2021.

arXiv:2106.08482 [pdf, other]

Minimizing Communication while Maximizing Performance in Multi-Agent Reinforcement Learning

Authors: Varun Kumar Vijay, Hassam Sheikh, Somdeb Majumdar, Mariano Phielipp

Abstract: Inter-agent communication can significantly increase performance in multi-agent tasks that require co-ordination to achieve a shared goal. Prior work has shown that it is possible to learn inter-agent communication protocols using multi-agent reinforcement learning and message-passing network architectures. However, these models use an unconstrained broadcast communication model, in which an agent communicates with all other agents at every step, even when the task does not require it. In real-world applications, where communication may be limited by system constraints like bandwidth, power and network capacity, one might need to reduce the number of messages that are sent. In this work, we explore a simple method of minimizing communication while maximizing performance in multi-task learning: simultaneously optimizing a task-specific objective and a communication penalty. We show that the objectives can be optimized using Reinforce and the Gumbel-Softmax reparameterization. We introduce two techniques to stabilize training: 50% training and message forwarding. Training with the communication penalty on only 50% of the episodes prevents our models from turning off their outgoing messages. Second, repeating messages received previously helps models retain information, and further improves performance. With these techniques, we show that we can reduce communication by 75% with no loss of performance.

Submitted 8 December, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.07611 [pdf]

Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision Quantization

Authors: Santiago Miret, Vui Seng Chua, Mattias Marder, Mariano Phielipp, Nilesh Jain, Somdeb Majumdar

Abstract: Mixed-precision quantization is a powerful tool to enable memory and compute savings of neural network workloads by deploying different sets of bit-width precisions on separate compute operations. In this work, we present a flexible and scalable framework for automated mixed-precision quantization that concurrently optimizes task performance, memory compression, and compute savings through multi-objective evolutionary computing. Our framework centers on Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search method, which combines established search methods with the representational power of neural networks. Within NEMO, the population is divided into structurally distinct sub-populations, or species, which jointly create the Pareto frontier of solutions for the multi-objective problem. At each generation, species perform separate mutation and crossover operations, and are re-sized in proportion to the goodness of their contribution to the Pareto frontier. In our experiments, we define a graph-based representation to describe the underlying workload, enabling us to deploy graph neural networks trained by NEMO via neuroevolution, to find Pareto optimal configurations for MobileNet-V2, ResNet50 and ResNeXt-101-32x8d. Compared to the state-of-the-art, we achieve competitive results on memory compression and superior results for compute compression. Further analysis reveals that the graph representation and the species-based approach employed by NEMO are critical to finding optimal solutions.

Submitted 1 April, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2106.05825 [pdf, other]

HASI: Hardware-Accelerated Stochastic Inference, A Defense Against Adversarial Machine Learning Attacks

Authors: Mohammad Hossein Samavatian, Saikat Majumdar, Kristin Barber, Radu Teodorescu

Abstract: Deep Neural Networks (DNNs) are employed in an increasing number of applications, some of which are safety critical. Unfortunately, DNNs are known to be vulnerable to so-called adversarial attacks that manipulate inputs to cause incorrect results that can be beneficial to an attacker or damaging to the victim. Multiple defenses have been proposed to increase the robustness of DNNs. In general, these defenses have high overhead, and some require attack-specific re-training of the model or careful tuning to adapt to different attacks. This paper presents HASI, a hardware-accelerated defense that uses a process we call stochastic inference to detect adversarial inputs. We show that by carefully injecting noise into the model at inference time, we can differentiate adversarial inputs from benign ones. HASI uses the output distribution characteristics of noisy inference compared to a non-noisy reference to detect adversarial inputs. We show an adversarial detection rate of 86% when applied to VGG16 and 93% when applied to ResNet50, which exceeds the detection rate of the state-of-the-art approaches, with a much lower overhead. We demonstrate two software/hardware-accelerated co-designs, which reduce the performance impact of stochastic inference to 1.58X-2X relative to the unprotected baseline, compared to 15X-20X overhead for a software-only GPU implementation.

Submitted 6 August, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Journal ref: Secure and Private Systems for Machine Learning Workshop 2021

Showing 1–50 of 88 results for author: Majumdar, S