-
Energy Efficient Fair STAR-RIS for Mobile Users
Authors:
Ashok S. Kumar,
Nancy Nayak,
Sheetal Kalyani,
Himal A. Suraweera
Abstract:
In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS e…
▽ More
In this work, we propose a method to improve the energy efficiency and fairness of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) for mobile users, ensuring reduced power consumption while maintaining reliable communication. To achieve this, we introduce a new parameter known as the subsurface assignment variable, which determines the number of STAR-RIS elements allocated to each user. We then formulate a novel optimization problem by concurrently optimizing the phase shifts of the STAR-RIS and subsurface assignment variable. We leverage the deep reinforcement learning (DRL) technique to address this optimization problem. The DRL model predicts the phase shifts of the STAR-RIS and efficiently allocates elements of STAR-RIS to the users. Additionally, we incorporate a penalty term in the DRL model to facilitate intelligent deactivation of STAR-RIS elements when not in use to enhance energy efficiency. Through extensive experiments, we show that the proposed method can achieve fairly high and nearly equal data rates for all users in both the transmission and reflection spaces in an energy-efficient manner.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design
Authors:
Nandeeka Nayak,
Xinrui Wu,
Toluwanimi O. Odemuyiwa,
Michael Pellauer,
Joel S. Emer,
Christopher W. Fletcher
Abstract:
Attention for transformers is a critical workload that has recently received significant "attention" as a target for custom acceleration. Yet, while prior work succeeds in reducing attention's memory-bandwidth requirements, it creates load imbalance between attention operators (resulting in severe compute under-utilization) and requires on-chip memory that scales with sequence length (which is exp…
▽ More
Attention for transformers is a critical workload that has recently received significant "attention" as a target for custom acceleration. Yet, while prior work succeeds in reducing attention's memory-bandwidth requirements, it creates load imbalance between attention operators (resulting in severe compute under-utilization) and requires on-chip memory that scales with sequence length (which is expected to grow over time).
This paper ameliorates these issues, enabling attention with nearly 100% compute utilization, no off-chip memory traffic bottlenecks, and on-chip buffer size requirements that are independent of sequence length. The main conceptual contribution is to use a recently proposed abstraction -- the cascade of Einsums -- to describe, formalize and taxonomize the space of attention algorithms that appear in the literature. In particular, we show how Einsum cascades can be used to infer non-trivial lower bounds on the number of passes a kernel must take through its input data, which has implications for either required on-chip buffer capacity or memory traffic. We show how this notion can be used to meaningfully divide the space of attention algorithms into several categories and use these categories to inform our design process.
Based on the above characterization, we propose FuseMax -- a novel mapping of attention onto a spatial array-style architecture. On attention, in an iso-area comparison, FuseMax achieves an average $6.7\times$ speedup over the prior state-of-the-art FLAT while using $79\%$ of the energy. Similarly, on the full end-to-end transformer inference, FuseMax achieves an average $5.3\times$ speedup over FLAT using $83\%$ of the energy.
△ Less
Submitted 25 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Authors:
Nihal V. Nayak,
Yiyang Nan,
Avi Trost,
Stephen H. Bach
Abstract:
We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instruction tuning. We aim to enable zero-shot task adaptation of large language models on users' specialized, private data. We train Bonito by fine-tuning a pretrained large language model on a new large-scale dataset with 1.65M examples created by remi…
▽ More
We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instruction tuning. We aim to enable zero-shot task adaptation of large language models on users' specialized, private data. We train Bonito by fine-tuning a pretrained large language model on a new large-scale dataset with 1.65M examples created by remixing existing instruction tuning datasets into meta-templates. The meta-templates for a dataset produce training examples where the input is the unannotated text and the task attribute and the output consists of the instruction and the response. We use Bonito to generate synthetic tasks for seven datasets from specialized domains with unannotated text across three task types -- yes-no question answering, extractive question answering, and natural language inference -- and adapt language models. We show that Bonito significantly improves the average performance of pretrained and instruction tuned models over the de facto self supervised baseline. For example, adapting Mistral-Instruct-v2 and instruction tuned variants of Mistral and Llama2 with Bonito improves the strong zero-shot performance by 22.1 F1 points whereas the next word prediction objective undoes some of the benefits of instruction tuning and reduces the average performance by 0.8 F1 points. We conduct additional experiments with Bonito to understand the effects of the domain, the size of the training set, and the choice of alternative synthetic task generators. Overall, we show that learning with synthetic instruction tuning datasets is an effective way to adapt language models to new domains. The model, dataset, and code are available at https://github.com/BatsResearch/bonito.
△ Less
Submitted 6 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
Authors:
Nandeeka Nayak,
Toluwanimi O. Odemuyiwa,
Shubham Ugare,
Christopher W. Fletcher,
Michael Pellauer,
Joel S. Emer
Abstract:
Over the past few years, the explosion in sparse tensor algebra workloads has led to a corresponding rise in domain-specific accelerators to service them. Due to the irregularity present in sparse tensors, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on design-flexible sparse accelerator modeling does not express this full ra…
▽ More
Over the past few years, the explosion in sparse tensor algebra workloads has led to a corresponding rise in domain-specific accelerators to service them. Due to the irregularity present in sparse tensors, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on design-flexible sparse accelerator modeling does not express this full range of design features, making it difficult to understand the impact of each design choice and compare or extend the state-of-the-art.
To address this, we propose TeAAL: a language and simulator generator for the concise and precise specification and evaluation of sparse tensor algebra accelerators. We use TeAAL to represent and evaluate four disparate state-of-the-art accelerators -- ExTensor, Gamma, OuterSPACE, and SIGMA -- and verify that it reproduces their performance with high accuracy. Finally, we demonstrate the potential of TeAAL as a tool for designing new accelerators by showing how it can be used to speed up vertex-centric programming accelerators -- achieving $1.9\times$ on BFS and $1.2\times$ on SSSP over GraphDynS.
△ Less
Submitted 11 June, 2024; v1 submitted 16 April, 2023;
originally announced April 2023.
-
A DRL Approach for RIS-Assisted Full-Duplex UL and DL Transmission: Beamforming, Phase Shift and Power Optimization
Authors:
Nancy Nayak,
Sheetal Kalyani,
Himal A. Suraweera
Abstract:
We propose a deep reinforcement learning (DRL) approach for a full-duplex (FD) transmission that predicts the phase shifts of the reconfigurable intelligent surface (RIS), base station (BS) active beamformers, and the transmit powers to maximize the weighted sum rate of uplink and downlink users. Existing methods require channel state information (CSI) and residual self-interference (SI) knowledge…
▽ More
We propose a deep reinforcement learning (DRL) approach for a full-duplex (FD) transmission that predicts the phase shifts of the reconfigurable intelligent surface (RIS), base station (BS) active beamformers, and the transmit powers to maximize the weighted sum rate of uplink and downlink users. Existing methods require channel state information (CSI) and residual self-interference (SI) knowledge to calculate exact active beamformers or the DRL rewards, which typically fail without CSI or residual SI. Especially for time-varying channels, estimating and signaling CSI to the DRL agent is required at each time step and is costly. We propose a two-stage DRL framework with minimal signaling overhead to address this. The first stage uses the least squares method to initiate learning by partially canceling the residual SI. The second stage uses DRL to achieve performance comparable to existing CSI-based methods without requiring the CSI or the exact residual SI. Further, the proposed DRL framework for quantized RIS phase shifts reduces the signaling from BS to the RISs using $32$ times fewer bits than the continuous version. The quantized methods reduce action space, resulting in faster convergence and $7.1\%$ and $22.28\%$ better UL and DL rates, respectively than the continuous method.
△ Less
Submitted 20 June, 2024; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
Authors:
Martha Lewis,
Nihal V. Nayak,
Peilin Yu,
Qinan Yu,
Jack Merullo,
Stephen H. Bach,
Ellie Pavlick
Abstract:
Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying ''red cube'' by reasoning over the constituents ''red'' and ''cube''. In this work, we focus on the ability of a large pr…
▽ More
Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying ''red cube'' by reasoning over the constituents ''red'' and ''cube''. In this work, we focus on the ability of a large pretrained vision and language model (CLIP) to encode compositional concepts and to bind variables in a structure-sensitive way (e.g., differentiating ''cube behind sphere'' from ''sphere behind cube''). In order to inspect the performance of CLIP, we compare several architectures from research on compositional distributional semantics models (CDSMs), a line of research that attempts to implement traditional compositional linguistic structures within embedding spaces. We find that CLIP can compose concepts in a single-object setting, but in situations where concept binding is needed, performance drops dramatically. At the same time, CDSMs also perform poorly, with best performance at chance level.
△ Less
Submitted 29 March, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Authors:
BigScience Workshop,
:,
Teven Le Scao,
Angela Fan,
Christopher Akiki,
Ellie Pavlick,
Suzana Ilić,
Daniel Hesslow,
Roman Castagné,
Alexandra Sasha Luccioni,
François Yvon,
Matthias Gallé,
Jonathan Tow,
Alexander M. Rush,
Stella Biderman,
Albert Webson,
Pawan Sasanka Ammanamanchi,
Thomas Wang,
Benoît Sagot,
Niklas Muennighoff,
Albert Villanova del Moral,
Olatunji Ruwase,
Rachel Bawden,
Stas Bekman,
Angelina McMillan-Major
, et al. (369 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…
▽ More
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
△ Less
Submitted 27 June, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
CEREAL: Few-Sample Clustering Evaluation
Authors:
Nihal V. Nayak,
Ethan R. Elenberg,
Clemens Rosenbaum
Abstract:
Evaluating clustering quality with reliable evaluation metrics like normalized mutual information (NMI) requires labeled data that can be expensive to annotate. We focus on the underexplored problem of estimating clustering quality with limited labels. We adapt existing approaches from the few-sample model evaluation literature to actively sub-sample, with a learned surrogate model, the most infor…
▽ More
Evaluating clustering quality with reliable evaluation metrics like normalized mutual information (NMI) requires labeled data that can be expensive to annotate. We focus on the underexplored problem of estimating clustering quality with limited labels. We adapt existing approaches from the few-sample model evaluation literature to actively sub-sample, with a learned surrogate model, the most informative data points for annotation to estimate the evaluation metric. However, we find that their estimation can be biased and only relies on the labeled data. To that end, we introduce CEREAL, a comprehensive framework for few-sample clustering evaluation that extends active sampling approaches in three key ways. First, we propose novel NMI-based acquisition functions that account for the distinctive properties of clustering and uncertainties from a learned surrogate model. Next, we use ideas from semi-supervised learning and train the surrogate model with both the labeled and unlabeled data. Finally, we pseudo-label the unlabeled data with the surrogate model. We run experiments to estimate NMI in an active sampling pipeline on three datasets across vision and language. Our results show that CEREAL reduces the area under the absolute error curve by up to 57% compared to the best sampling baseline. We perform an extensive ablation study to show that our framework is agnostic to the choice of clustering algorithm and evaluation metric. We also extend CEREAL from clusterwise annotations to pairwise annotations. Overall, CEREAL can efficiently evaluate clustering with limited human annotations.
△ Less
Submitted 30 September, 2022;
originally announced October 2022.
-
Rotate the ReLU to implicitly sparsify deep networks
Authors:
Nancy Nayak,
Sheetal Kalyani
Abstract:
In the era of Deep Neural Network based solutions for a variety of real-life tasks, having a compact and energy-efficient deployable model has become fairly important. Most of the existing deep architectures use Rectifier Linear Unit (ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture. We show that this ac…
▽ More
In the era of Deep Neural Network based solutions for a variety of real-life tasks, having a compact and energy-efficient deployable model has become fairly important. Most of the existing deep architectures use Rectifier Linear Unit (ReLU) activation. In this paper, we propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture. We show that this activation wherein the rotation is learned via training results in the elimination of those parameters/filters in the network which are not important for the task. In other words, rotated ReLU seems to be doing implicit sparsification. The slopes of the rotated ReLU activations act as coarse feature extractors and unnecessary features can be eliminated before retraining. Our studies indicate that features always choose to pass through a lesser number of filters in architectures such as ResNet and its variants. Hence, by rotating the ReLU, the weights or the filters that are not necessary are automatically identified and can be dropped thus giving rise to significant savings in memory and computation. Furthermore, in some cases, we also notice that along with saving in memory and computation we also obtain improvement over the reported performance of the corresponding baseline work in the popular datasets such as MNIST, CIFAR-10, CIFAR-100, and SVHN.
△ Less
Submitted 1 June, 2022;
originally announced June 2022.
-
Identical Image Retrieval using Deep Learning
Authors:
Sayan Nath,
Nikhil Nayak
Abstract:
In recent years, we know that the interaction with images has increased. Image similarity involves fetching similar-looking images abiding by a given reference image. The target is to find out whether the image searched as a query can result in similar pictures. We are using the BigTransfer Model, which is a state-of-art model itself. BigTransfer(BiT) is essentially a ResNet but pre-trained on a l…
▽ More
In recent years, we know that the interaction with images has increased. Image similarity involves fetching similar-looking images abiding by a given reference image. The target is to find out whether the image searched as a query can result in similar pictures. We are using the BigTransfer Model, which is a state-of-art model itself. BigTransfer(BiT) is essentially a ResNet but pre-trained on a larger dataset like ImageNet and ImageNet-21k with additional modifications. Using the fine-tuned pre-trained Convolution Neural Network Model, we extract the key features and train on the K-Nearest Neighbor model to obtain the nearest neighbor. The application of our model is to find similar images, which are hard to achieve through text queries within a low inference time. We analyse the benchmark of our model based on this application.
△ Less
Submitted 18 May, 2022; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Learning to Compose Soft Prompts for Compositional Zero-Shot Learning
Authors:
Nihal V. Nayak,
Peilin Yu,
Stephen H. Bach
Abstract:
We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) like CLIP. We develop CSP for compositional zero-shot learning, the task of predicting unseen attribute-object compositions (e.g., old cat and young tiger). VLMs have a flexible text encoder that can represent ar…
▽ More
We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) like CLIP. We develop CSP for compositional zero-shot learning, the task of predicting unseen attribute-object compositions (e.g., old cat and young tiger). VLMs have a flexible text encoder that can represent arbitrary classes as natural language prompts but they often underperform task-specific architectures on the compositional zero-shot benchmark datasets. CSP treats the attributes and objects that define classes as learnable tokens of vocabulary. During training, the vocabulary is tuned to recognize classes that compose tokens in multiple ways (e.g., old cat and white cat). At test time, we recompose the learned attribute-object vocabulary in new combinations to recognize novel classes. We show that CSP outperforms the CLIP on benchmark datasets by an average of 10.9 percentage points on AUC. CSP also outperforms CoOp, a soft prompting method that fine-tunes the prefix context tokens, by an average of 5.8 percentage points on AUC. We perform additional experiments to show that CSP improves generalization to higher-order attribute-attribute-object compositions (e.g., old white cat) and combinations of pretrained attributes and fine-tuned objects. The code is available at https://github.com/BatsResearch/csp.
△ Less
Submitted 24 April, 2023; v1 submitted 7 April, 2022;
originally announced April 2022.
-
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Authors:
Stephen H. Bach,
Victor Sanh,
Zheng-Xin Yong,
Albert Webson,
Colin Raffel,
Nihal V. Nayak,
Abheesht Sharma,
Taewoon Kim,
M Saiful Bari,
Thibault Fevry,
Zaid Alyafeai,
Manan Dey,
Andrea Santilli,
Zhiqing Sun,
Srulik Ben-David,
Canwen Xu,
Gunjan Chhablani,
Han Wang,
Jason Alan Fries,
Maged S. Al-shaibani,
Shanya Sharma,
Urmish Thakker,
Khalid Almubarak,
Xiangru Tang,
Dragomir Radev
, et al. (2 additional authors not shown)
Abstract:
PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges…
▽ More
PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges in this new setting with (1) a templating language for defining data-linked prompts, (2) an interface that lets users quickly iterate on prompt development by observing outputs of their prompts on many examples, and (3) a community-driven set of guidelines for contributing new prompts to a common pool. Over 2,000 prompts for roughly 170 datasets are already available in PromptSource. PromptSource is available at https://github.com/bigscience-workshop/promptsource.
△ Less
Submitted 29 March, 2022; v1 submitted 2 February, 2022;
originally announced February 2022.
-
TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data
Authors:
Wasu Piriyakulkij,
Cristina Menghini,
Ross Briden,
Nihal V. Nayak,
Jeffrey Zhu,
Elaheh Raisi,
Stephen H. Bach
Abstract:
Machine learning practitioners often have access to a spectrum of data: labeled data for the target task (which is often limited), unlabeled data, and auxiliary data, the many available labeled datasets for other tasks. We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data and creating high-quality, servable classifiers. The key components of…
▽ More
Machine learning practitioners often have access to a spectrum of data: labeled data for the target task (which is often limited), unlabeled data, and auxiliary data, the many available labeled datasets for other tasks. We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data and creating high-quality, servable classifiers. The key components of TAGLETS are: (1) auxiliary data organized according to a knowledge graph, (2) modules encapsulating different methods for exploiting auxiliary and unlabeled data, and (3) a distillation stage in which the ensembled modules are combined into a servable model. We compare TAGLETS with state-of-the-art transfer learning and semi-supervised learning methods on four image classification tasks. Our study covers a range of settings, varying the amount of labeled data and the semantic relatedness of the auxiliary data to the target task. We find that the intelligent incorporation of auxiliary and unlabeled data into multiple learning techniques enables TAGLETS to match-and most often significantly surpass-these alternatives. TAGLETS is available as an open-source system at github.com/BatsResearch/taglets.
△ Less
Submitted 5 May, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Binarized ResNet: Enabling Robust Automatic Modulation Classification at the resource-constrained Edge
Authors:
Deepsayan Sadhukhan,
Nitin Priyadarshini Shankar,
Nancy Nayak,
Thulasi Tholeti,
Sheetal Kalyani
Abstract:
Recently, deep neural networks (DNNs) have been used extensively for automatic modulation classification (AMC), and the results have been quite promising. However, DNNs have high memory and computation requirements making them impractical for edge networks where the devices are resource-constrained. They are also vulnerable to adversarial attacks, which is a significant security concern. This work…
▽ More
Recently, deep neural networks (DNNs) have been used extensively for automatic modulation classification (AMC), and the results have been quite promising. However, DNNs have high memory and computation requirements making them impractical for edge networks where the devices are resource-constrained. They are also vulnerable to adversarial attacks, which is a significant security concern. This work proposes a rotated binary large ResNet (RBLResNet) for AMC that can be deployed at the edge network because of low memory and computational complexity. The performance gap between the RBLResNet and existing architectures with floating-point weights and activations can be closed by two proposed ensemble methods: (i) multilevel classification (MC), and (ii) bagging multiple RBLResNets while retaining low memory and computational power. The MC method achieves an accuracy of $93.39\%$ at $10$dB over all the $24$ modulation classes of the Deepsig dataset. This performance is comparable to state-of-the-art (SOTA) performances, with $4.75$ times lower memory and $1214$ times lower computation. Furthermore, RBLResNet also has high adversarial robustness compared to existing DNN models. The proposed MC method with RBLResNets has an adversarial accuracy of $87.25\%$ over a wide range of SNRs, surpassing the robustness of all existing SOTA methods to the best of our knowledge. Properties such as low memory, low computation, and the highest adversarial robustness make it a better choice for robust AMC in low-power edge devices.
△ Less
Submitted 17 April, 2023; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Multitask Prompted Training Enables Zero-Shot Task Generalization
Authors:
Victor Sanh,
Albert Webson,
Colin Raffel,
Stephen H. Bach,
Lintang Sutawika,
Zaid Alyafeai,
Antoine Chaffin,
Arnaud Stiegler,
Teven Le Scao,
Arun Raja,
Manan Dey,
M Saiful Bari,
Canwen Xu,
Urmish Thakker,
Shanya Sharma Sharma,
Eliza Szczechla,
Taewoon Kim,
Gunjan Chhablani,
Nihal Nayak,
Debajyoti Datta,
Jonathan Chang,
Mike Tian-Jian Jiang,
Han Wang,
Matteo Manica,
Sheng Shen
, et al. (16 additional authors not shown)
Abstract:
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale,…
▽ More
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All trained models are available at https://github.com/bigscience-workshop/t-zero and all prompts are available at https://github.com/bigscience-workshop/promptsource.
△ Less
Submitted 17 March, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
BayesAoA: A Bayesian method for Computation Efficient Angle of Arrival Estimation
Authors:
Akshay Sharma,
Nancy Nayak,
Sheetal Kalyani
Abstract:
The angle of Arrival (AoA) estimation is of great interest in modern communication systems. Traditional maximum likelihood-based iterative algorithms are sensitive to initialization and cannot be used online. We propose a Bayesian method to find AoA that is insensitive towards initialization. The proposed method is less complex and needs fewer computing resources than traditional deep learning-bas…
▽ More
The angle of Arrival (AoA) estimation is of great interest in modern communication systems. Traditional maximum likelihood-based iterative algorithms are sensitive to initialization and cannot be used online. We propose a Bayesian method to find AoA that is insensitive towards initialization. The proposed method is less complex and needs fewer computing resources than traditional deep learning-based methods. It has a faster convergence than the brute-force methods. Further, a Hedge type solution is proposed that helps to deploy the method online to handle the situations where the channel noise and antenna configuration in the receiver change over time. The proposed method achieves $92\%$ accuracy in a channel of noise variance $10^{-6}$ with $19.3\%$ of the brute-force method's computation.
△ Less
Submitted 15 October, 2021;
originally announced October 2021.
-
5G Traffic Prediction with Time Series Analysis
Authors:
Nikhil Nayak,
Rujula Singh R
Abstract:
In todays day and age, a mobile phone has become a basic requirement needed for anyone to thrive. With the cellular traffic demand increasing so dramatically, it is now necessary to accurately predict the user traffic in cellular networks, so as to improve the performance in terms of resource allocation and utilisation. By leveraging the power of machine learning and identifying its usefulness in…
▽ More
In todays day and age, a mobile phone has become a basic requirement needed for anyone to thrive. With the cellular traffic demand increasing so dramatically, it is now necessary to accurately predict the user traffic in cellular networks, so as to improve the performance in terms of resource allocation and utilisation. By leveraging the power of machine learning and identifying its usefulness in the field of cellular networks we try to achieve three main objectives classification of the application generating the traffic, prediction of packet arrival intensity and burst occurrence. The design of the prediction and classification system is done using Long Short Term Memory model. The LSTM predictor developed in this experiment would return the number of uplink packets and also estimate the probability of burst occurrence in the specified future time interval. For the purpose of classification, the regression layer in our LSTM prediction model is replaced by a softmax classifier which is used to classify the application generating the cellular traffic into one of the four applications including surfing, video calling, voice calling, and video streaming.
△ Less
Submitted 7 October, 2021;
originally announced October 2021.
-
Realizing Neural Decoder at the Edge with Ensembled BNN
Authors:
Devannagari Vikas,
Nancy Nayak,
Sheetal Kalyani
Abstract:
In this work, we propose extreme compression techniques like binarization, ternarization for Neural Decoders such as TurboAE. These methods reduce memory and computation by a factor of 64 with a performance better than the quantized (with 1-bit or 2-bits) Neural Decoders. However, because of the limited representation capability of the Binary and Ternary networks, the performance is not as good as…
▽ More
In this work, we propose extreme compression techniques like binarization, ternarization for Neural Decoders such as TurboAE. These methods reduce memory and computation by a factor of 64 with a performance better than the quantized (with 1-bit or 2-bits) Neural Decoders. However, because of the limited representation capability of the Binary and Ternary networks, the performance is not as good as the real-valued decoder. To fill this gap, we further propose to ensemble 4 such weak performers to deploy in the edge to achieve a performance similar to the real-valued network. These ensemble decoders give 16 and 64 times saving in memory and computation respectively and help to achieve performance similar to real-valued TurboAE.
△ Less
Submitted 18 June, 2021;
originally announced June 2021.
-
Zero-Shot Learning with Common Sense Knowledge Graphs
Authors:
Nihal V. Nayak,
Stephen H. Bach
Abstract:
Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort…
▽ More
Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations. Our proposed TrGCN architecture computes non-linear combinations of node neighbourhoods. Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.
△ Less
Submitted 25 August, 2022; v1 submitted 18 June, 2020;
originally announced June 2020.
-
Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck
Authors:
Vishnu Raj,
Nancy Nayak,
Sheetal Kalyani
Abstract:
Compact neural networks are essential for affordable and power efficient deep learning solutions. Binary Neural Networks (BNNs) take compactification to the extreme by constraining both weights and activations to two levels, $\{+1, -1\}$. However, training BNNs are not easy due to the discontinuity in activation functions, and the training dynamics of BNNs is not well understood. In this paper, we…
▽ More
Compact neural networks are essential for affordable and power efficient deep learning solutions. Binary Neural Networks (BNNs) take compactification to the extreme by constraining both weights and activations to two levels, $\{+1, -1\}$. However, training BNNs are not easy due to the discontinuity in activation functions, and the training dynamics of BNNs is not well understood. In this paper, we present an information-theoretic perspective of BNN training. We analyze BNNs through the Information Bottleneck principle and observe that the training dynamics of BNNs is considerably different from that of Deep Neural Networks (DNNs). While DNNs have a separate empirical risk minimization and representation compression phases, our numerical experiments show that in BNNs, both these phases are simultaneous. Since BNNs have a less expressive capacity, they tend to find efficient hidden representations concurrently with label fitting. Experiments in multiple datasets support these observations, and we see a consistent behavior across different activation functions in BNNs.
△ Less
Submitted 12 June, 2020;
originally announced June 2020.
-
Green DetNet: Computation and Memory efficient DetNet using Smart Compression and Training
Authors:
Nancy Nayak,
Thulasi Tholeti,
Muralikrishnan Srinivasan,
Sheetal Kalyani
Abstract:
This paper introduces an incremental training framework for compressing popular Deep Neural Network (DNN) based unfolded multiple-input-multiple-output (MIMO) detection algorithms like DetNet. The idea of incremental training is explored to select the optimal depth while training. To reduce the computation requirements or the number of FLoating point OPerations (FLOPs) and enforce sparsity in weig…
▽ More
This paper introduces an incremental training framework for compressing popular Deep Neural Network (DNN) based unfolded multiple-input-multiple-output (MIMO) detection algorithms like DetNet. The idea of incremental training is explored to select the optimal depth while training. To reduce the computation requirements or the number of FLoating point OPerations (FLOPs) and enforce sparsity in weights, the concept of structured regularization is explored using group LASSO and sparse group LASSO. Our methods lead to an astounding $98.9\%$ reduction in memory requirement and $81.63\%$ reduction in FLOPs when compared with DetNet without compromising on BER performance.
△ Less
Submitted 16 April, 2021; v1 submitted 20 March, 2020;
originally announced March 2020.
-
QnAMaker: Data to Bot in 2 Minutes
Authors:
Parag Agrawal,
Tulasi Menon,
Aya Kamel,
Michel Naim,
Chaikesh Chouragade,
Gurvinder Singh,
Rohan Kulkarni,
Anshuman Suri,
Sahithi Katakam,
Vineet Pratik,
Prakul Bansal,
Simerpreet Kaur,
Neha Rajput,
Anand Duggal,
Achraf Chalabi,
Prashant Choudhari,
Reddy Satti,
Niranjan Nayak
Abstract:
Having a bot for seamless conversations is a much-desired feature that products and services today seek for their websites and mobile apps. These bots help reduce traffic received by human support significantly by handling frequent and directly answerable known questions. Many such services have huge reference documents such as FAQ pages, which makes it hard for users to browse through this data.…
▽ More
Having a bot for seamless conversations is a much-desired feature that products and services today seek for their websites and mobile apps. These bots help reduce traffic received by human support significantly by handling frequent and directly answerable known questions. Many such services have huge reference documents such as FAQ pages, which makes it hard for users to browse through this data. A conversation layer over such raw data can lower traffic to human support by a great margin. We demonstrate QnAMaker, a service that creates a conversational layer over semi-structured data such as FAQ pages, product manuals, and support documents. QnAMaker is the popular choice for Extraction and Question-Answering as a service and is used by over 15,000 bots in production. It is also used by search interfaces and not just bots.
△ Less
Submitted 18 March, 2020;
originally announced March 2020.
-
Deep Reinforcement Learning based Blind mmWave MIMO Beam Alignment
Authors:
Vishnu Raj,
Nancy Nayak,
Sheetal Kalyani
Abstract:
Directional beamforming is a crucial component for realizing robust wireless communication systems using millimeter wave (mmWave) technology. Beam alignment using brute-force search of the space introduces time overhead while location aided blind beam alignment adds additional hardware requirements to the system. In this paper, we introduce a method for blind beam alignment based on the RF fingerp…
▽ More
Directional beamforming is a crucial component for realizing robust wireless communication systems using millimeter wave (mmWave) technology. Beam alignment using brute-force search of the space introduces time overhead while location aided blind beam alignment adds additional hardware requirements to the system. In this paper, we introduce a method for blind beam alignment based on the RF fingerprints of user equipment obtained by the base stations. The proposed system performs blind beam alignment on a multiple base station cellular environment with multiple mobile users using deep reinforcement learning. We present a novel neural network architecture that can handle a mix of both continuous and discrete actions and use policy gradient methods to train the model. Our results show that the proposed method can achieve a data rate of up to four times the traditional method without any overheads.
△ Less
Submitted 20 February, 2021; v1 submitted 24 January, 2020;
originally announced January 2020.
-
Leveraging online learning for CSS in frugal IoT network
Authors:
Nancy Nayak,
Vishnu Raj,
Sheetal Kalyani
Abstract:
We present a novel method for centralized collaborative spectrum sensing for IoT network leveraging cognitive radio network. Based on an online learning framework, we propose an algorithm to efficiently combine the individual sensing results based on the past performance of each detector. Additionally, we show how to utilize the learned normalized weights as a proxy metric of detection accuracy an…
▽ More
We present a novel method for centralized collaborative spectrum sensing for IoT network leveraging cognitive radio network. Based on an online learning framework, we propose an algorithm to efficiently combine the individual sensing results based on the past performance of each detector. Additionally, we show how to utilize the learned normalized weights as a proxy metric of detection accuracy and selectively enable the sensing at detectors. Our results show improved performance in terms of inter-user collision and misdetection. Further, by selectively enabling some of the devices in the network, we propose a strategy to extend the field life of devices without compromising on detection accuracy.
△ Less
Submitted 24 January, 2020; v1 submitted 16 July, 2019;
originally announced July 2019.
-
Building a Conversational Agent Overnight with Dialogue Self-Play
Authors:
Pararth Shah,
Dilek Hakkani-Tür,
Gokhan Tür,
Abhinav Rastogi,
Ankur Bapna,
Neha Nayak,
Larry Heck
Abstract:
We propose Machines Talking To Machines (M2M), a framework combining automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents for goal-oriented dialogues in arbitrary domains. M2M scales to new tasks with just a task schema and an API client from the dialogue system developer, but it is also customizable to cater to task-specific interactions. Compared to the Wizard-of-Oz appro…
▽ More
We propose Machines Talking To Machines (M2M), a framework combining automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents for goal-oriented dialogues in arbitrary domains. M2M scales to new tasks with just a task schema and an API client from the dialogue system developer, but it is also customizable to cater to task-specific interactions. Compared to the Wizard-of-Oz approach for data collection, M2M achieves greater diversity and coverage of salient dialogue flows while maintaining the naturalness of individual utterances. In the first phase, a simulated user bot and a domain-agnostic system bot converse to exhaustively generate dialogue "outlines", i.e. sequences of template utterances and their semantic parses. In the second phase, crowd workers provide contextual rewrites of the dialogues to make the utterances more natural while preserving their meaning. The entire process can finish within a few hours. We propose a new corpus of 3,000 dialogues spanning 2 domains collected with M2M, and present comparisons with popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows.
△ Less
Submitted 15 January, 2018;
originally announced January 2018.