Search | arXiv e-print repository

Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis

Authors: Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan

Abstract: We consider the problem of independently, in a disentangled fashion, controlling the outputs of text-to-image diffusion models with color and style attributes of a user-supplied reference image. We present the first training-free, test-time-only method to disentangle and condition text-to-image models on color and style attributes from reference image. To realize this, we propose two key innovatio… ▽ More We consider the problem of independently, in a disentangled fashion, controlling the outputs of text-to-image diffusion models with color and style attributes of a user-supplied reference image. We present the first training-free, test-time-only method to disentangle and condition text-to-image models on color and style attributes from reference image. To realize this, we propose two key innovations. Our first contribution is to transform the latent codes at inference time using feature transformations that make the covariance matrix of current generation follow that of the reference image, helping meaningfully transfer color. Next, we observe that there exists a natural disentanglement between color and style in the LAB image space, which we exploit to transform the self-attention feature maps of the image being generated with respect to those of the reference computed from its L channel. Both these operations happen purely at test time and can be done independently or merged. This results in a flexible method where color and style information can come from the same reference image or two different sources, and a new generation can seamlessly fuse them in either scenario. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: 16 pages, 17 figures

arXiv:2408.16749 [pdf]

Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge

Authors: Beidi Dong, Jin R. Lee, Ziwei Zhu, Balassubramanian Srinivasan

Abstract: The United States has experienced a significant increase in violent extremism, prompting the need for automated tools to detect and limit the spread of extremist ideology online. This study evaluates the performance of Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Trained Transformers (GPT) in detecting and classifying online domestic extremist posts. We collect… ▽ More The United States has experienced a significant increase in violent extremism, prompting the need for automated tools to detect and limit the spread of extremist ideology online. This study evaluates the performance of Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Trained Transformers (GPT) in detecting and classifying online domestic extremist posts. We collected social media posts containing "far-right" and "far-left" ideological keywords and manually labeled them as extremist or non-extremist. Extremist posts were further classified into one or more of five contributing elements of extremism based on a working definitional framework. The BERT model's performance was evaluated based on training data size and knowledge transfer between categories. We also compared the performance of GPT 3.5 and GPT 4 models using different prompts: naïve, layperson-definition, role-playing, and professional-definition. Results showed that the best performing GPT models outperformed the best performing BERT models, with more detailed prompts generally yielding better results. However, overly complex prompts may impair performance. Different versions of GPT have unique sensitives to what they consider extremist. GPT 3.5 performed better at classifying far-left extremist posts, while GPT 4 performed better at classifying far-right extremist posts. Large language models, represented by GPT models, hold significant potential for online extremism classification tasks, surpassing traditional BERT models in a zero-shot setting. Future research should explore human-computer interactions in optimizing GPT models for extremist detection and classification tasks to develop more efficient (e.g., quicker, less effort) and effective (e.g., fewer errors or mistakes) methods for identifying extremist content. △ Less

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2406.18893 [pdf, other]

AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models

Authors: Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan

Abstract: We consider the problem of customizing text-to-image diffusion models with user-supplied reference images. Given new prompts, the existing methods can capture the key concept from the reference images but fail to align the generated image with the prompt. In this work, we seek to address this key issue by proposing new methods that can easily be used in conjunction with existing customization meth… ▽ More We consider the problem of customizing text-to-image diffusion models with user-supplied reference images. Given new prompts, the existing methods can capture the key concept from the reference images but fail to align the generated image with the prompt. In this work, we seek to address this key issue by proposing new methods that can easily be used in conjunction with existing customization methods that optimize the embeddings/weights at various intermediate stages of the text encoding process. The first contribution of this paper is a dissection of the various stages of the text encoding process leading up to the conditioning vector for text-to-image models. We take a holistic view of existing customization methods and notice that key and value outputs from this process differs substantially from their corresponding baseline (non-customized) models (e.g., baseline stable diffusion). While this difference does not impact the concept being customized, it leads to other parts of the generated image not being aligned with the prompt. Further, we also observe that these keys and values allow independent control various aspects of the final generation, enabling semantic manipulation of the output. Taken together, the features spanning these keys and values, serve as the basis for our next contribution where we fix the aforementioned issues with existing methods. We propose a new post-processing algorithm, AlignIT, that infuses the keys and values for the concept of interest while ensuring the keys and values for all other tokens in the input prompt are unchanged. Our proposed method can be plugged in directly to existing customization methods, leading to a substantial performance improvement in the alignment of the final result with the input prompt while retaining the customization quality. △ Less

Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 10 pages, 9 figures

arXiv:2406.06938 [pdf, other]

Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges

Authors: Abhilasha Sancheti, Koustava Goswami, Balaji Vasan Srinivasan

Abstract: Attributing answer text to its source document for information-seeking questions is crucial for building trustworthy, reliable, and accountable systems. We formulate a new task of post-hoc answer attribution for long document comprehension (LDC). Owing to the lack of long-form abstractive and information-seeking LDC datasets, we refactor existing datasets to assess the strengths and weaknesses of… ▽ More Attributing answer text to its source document for information-seeking questions is crucial for building trustworthy, reliable, and accountable systems. We formulate a new task of post-hoc answer attribution for long document comprehension (LDC). Owing to the lack of long-form abstractive and information-seeking LDC datasets, we refactor existing datasets to assess the strengths and weaknesses of existing retrieval-based and proposed answer decomposition and textual entailment-based optimal selection attribution systems for this task. We throw light on the limitations of existing datasets and the need for datasets to assess the actual performance of systems on this task. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to *SEM 2024

arXiv:2406.04673 [pdf, other]

MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Authors: Sanjoy Chowdhury, Sayan Nag, K J Joseph, Balaji Vasan Srinivasan, Dinesh Manocha

Abstract: Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are predominantly conditioned on textual descriptions of it. Inspired by how musicians compose music not just from a movie script, but also through visualizations, w… ▽ More Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are predominantly conditioned on textual descriptions of it. Inspired by how musicians compose music not just from a movie script, but also through visualizations, we propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. MeLFusion is a text-to-music diffusion model with a novel "visual synapse", which effectively infuses the semantics from the visual modality into the generated music. To facilitate research in this area, we introduce a new dataset MeLBench, and propose a new evaluation metric IMSM. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music, measured both objectively and subjectively, with a relative gain of up to 67.98% on the FAD score. We hope that our work will gather attention to this pragmatic, yet relatively under-explored research area. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted at CVPR 2024 as Highlight paper. Webpage: https://schowdhury671.github.io/melfusion_cvpr2024/

arXiv:2405.17980 [pdf, other]

Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering

Authors: Anirudh Phukan, Shwetha Somasundaram, Apoorv Saxena, Koustava Goswami, Balaji Vasan Srinivasan

Abstract: With the enhancement in the field of generative artificial intelligence (AI), contextual question answering has become extremely relevant. Attributing model generations to the input source document is essential to ensure trustworthiness and reliability. We observe that when large language models (LLMs) are used for contextual question answering, the output answer often consists of text copied verb… ▽ More With the enhancement in the field of generative artificial intelligence (AI), contextual question answering has become extremely relevant. Attributing model generations to the input source document is essential to ensure trustworthiness and reliability. We observe that when large language models (LLMs) are used for contextual question answering, the output answer often consists of text copied verbatim from the input prompt which is linked together with "glue text" generated by the LLM. Motivated by this, we propose that LLMs have an inherent awareness from where the text was copied, likely captured in the hidden states of the LLM. We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of LLMs. Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers. Our experimental results demonstrate that our method performs on par or better than GPT-4 at identifying verbatim copied segments in LLM generations and in attributing these segments to their source. Importantly, our method shows robust performance across various LLM architectures, highlighting its broad applicability. Additionally, we present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2404.04521 [pdf, other]

Automated Computer Program Evaluation and Projects -- Our Experiences

Authors: Bama Srinivasan, Mala Nehru, Ranjani Parthasarathi, Saswati Mukherjee, Jeena A Thankachan

Abstract: This paper provides a few approaches to automating computer programming and project submission tasks, that we have been following for the last six years and have found to be successful. The approaches include using CodeRunner with Learning Management System (LMS) integration for programming practice and evaluation, and Git (GitHub) for project submissions and automatic code evaluation. In this pap… ▽ More This paper provides a few approaches to automating computer programming and project submission tasks, that we have been following for the last six years and have found to be successful. The approaches include using CodeRunner with Learning Management System (LMS) integration for programming practice and evaluation, and Git (GitHub) for project submissions and automatic code evaluation. In this paper, we describe the details of how we set up the tools and customized those for computer science courses. Based on our experiences, we also provide a few insights on using these tools for effective learning. △ Less

Submitted 6 April, 2024; originally announced April 2024.

Comments: 14 pages, 15 figures

ACM Class: D.3

Journal ref: https://www.sxcejournal.com/spe-apr-2023/17.pdf

arXiv:2402.14361 [pdf, other]

OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis

Abstract: Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to div… ▽ More Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to diversified data modalities and large table sizes. In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. Overall, OpenTab leverages table retriever to fetch relevant tables and then generates SQL programs to parse the retrieved tables efficiently. Utilizing the intermediate data derived from the SQL executions, it conducts grounded inference to produce accurate response. Extensive experimental evaluation shows that OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of our proposed designs of the system. △ Less

Submitted 12 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted by ICLR 2024

arXiv:2401.01637 [pdf, other]

Social Media Ready Caption Generation for Brands

Authors: Himanshu Maheshwari, Koustava Goswami, Apoorv Saxena, Balaji Vasan Srinivasan

Abstract: Social media advertisements are key for brand marketing, aiming to attract consumers with captivating captions and pictures or logos. While previous research has focused on generating captions for general images, incorporating brand personalities into social media captioning remains unexplored. Brand personalities are shown to be affecting consumers' behaviours and social interactions and thus are… ▽ More Social media advertisements are key for brand marketing, aiming to attract consumers with captivating captions and pictures or logos. While previous research has focused on generating captions for general images, incorporating brand personalities into social media captioning remains unexplored. Brand personalities are shown to be affecting consumers' behaviours and social interactions and thus are proven to be a key aspect of marketing strategies. Current open-source multimodal LLMs are not directly suited for this task. Hence, we propose a pipeline solution to assist brands in creating engaging social media captions that align with the image and the brand personalities. Our architecture is based on two parts: a the first part contains an image captioning model that takes in an image that the brand wants to post online and gives a plain English caption; b the second part takes in the generated caption along with the target brand personality and outputs a catchy personality-aligned social media caption. Along with brand personality, our system also gives users the flexibility to provide hashtags, Instagram handles, URLs, and named entities they want the caption to contain, making the captions more semantically related to the social media handles. Comparative evaluations against various baselines demonstrate the effectiveness of our approach, both qualitatively and quantitatively. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2311.15516 [pdf, other]

Active Foundational Models for Fault Diagnosis of Electrical Motors

Authors: Sriram Anbalagan, Sai Shashank GP, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan

Abstract: Fault detection and diagnosis of electrical motors are of utmost importance in ensuring the safe and reliable operation of several industrial systems. Detection and diagnosis of faults at the incipient stage allows corrective actions to be taken in order to reduce the severity of faults. The existing data-driven deep learning approaches for machine fault diagnosis rely extensively on huge amounts… ▽ More Fault detection and diagnosis of electrical motors are of utmost importance in ensuring the safe and reliable operation of several industrial systems. Detection and diagnosis of faults at the incipient stage allows corrective actions to be taken in order to reduce the severity of faults. The existing data-driven deep learning approaches for machine fault diagnosis rely extensively on huge amounts of labeled samples, where annotations are expensive and time-consuming. However, a major portion of unlabeled condition monitoring data is not exploited in the training process. To overcome this limitation, we propose a foundational model-based Active Learning framework that utilizes less amount of labeled samples, which are most informative and harnesses a large amount of available unlabeled data by effectively combining Active Learning and Contrastive Self-Supervised Learning techniques. It consists of a transformer network-based backbone model trained using an advanced nearest-neighbor contrastive self-supervised learning method. This approach empowers the backbone to learn improved representations of samples derived from raw, unlabeled vibration data. Subsequently, the backbone can undergo fine-tuning to address a range of downstream tasks, both within the same machines and across different machines. The effectiveness of the proposed methodology has been assessed through the fine-tuning of the backbone for multiple target tasks using three distinct machine-bearing fault datasets. The experimental evaluation demonstrates a superior performance as compared to existing state-of-the-art fault diagnosis methods with less amount of labeled data. △ Less

Submitted 26 November, 2023; originally announced November 2023.

Comments: 30 pages, 2 figures, 7 tables

arXiv:2311.11919 [pdf, other]

An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis

Authors: Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan

Abstract: We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference images into a single textual conditioning vector, enabling generation of ne… ▽ More We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference images into a single textual conditioning vector, enabling generation of new samples with this learned token. These methods, however, do not learn multiple tokens that are necessary to condition model outputs on the multiple attributes noted above. Another line of techniques expand the inversion space to learn multiple embeddings but they do this only along the layer dimension (e.g., one per layer of the DDPM model) or the timestep dimension (one for a set of timesteps in the denoising process), leading to suboptimal attribute disentanglement. To address the aforementioned gaps, the first contribution of this paper is an extensive analysis to determine which attributes are captured in which dimension of the denoising process. As noted above, we consider both the time-step dimension (in reverse denoising) as well as the DDPM model layer dimension. We observe that often a subset of these attributes are captured in the same set of model layers and/or across same denoising timesteps. For instance, color and style are captured across same U-Net layers, whereas layout and color are captured across same timestep stages. Consequently, an inversion process that is designed only for the time-step dimension or the layer dimension is insufficient to disentangle all attributes. This leads to our second contribution where we design a new multi-attribute inversion algorithm, MATTE, with associated disentanglement-enhancing regularization losses, that operates across both dimensions and explicitly leads to four disentangled tokens (color, style, layout, and object). △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.11471 [pdf]

Towards AI enabled automated tracking of multiple boxers

Authors: A. S. Karthikeyan, Vipul Baghel, Anish Monsley Kirupakaran, John Warburton, Ranganathan Srinivasan, Babji Srinivasan, Ravi Sadananda Hegde

Abstract: Continuous tracking of boxers across multiple training sessions helps quantify traits required for the well-known ten-point-must system. However, continuous tracking of multiple athletes across multiple training sessions remains a challenge, because it is difficult to precisely segment bout boundaries in a recorded video stream. Furthermore, re-identification of the same athlete over different per… ▽ More Continuous tracking of boxers across multiple training sessions helps quantify traits required for the well-known ten-point-must system. However, continuous tracking of multiple athletes across multiple training sessions remains a challenge, because it is difficult to precisely segment bout boundaries in a recorded video stream. Furthermore, re-identification of the same athlete over different period or even within the same bout remains a challenge. Difficulties are further compounded when a single fixed view video is captured in top-view. This work summarizes our progress in creating a system in an economically single fixed top-view camera. Specifically, we describe improved algorithm for bout transition detection and in-bout continuous player identification without erroneous ID updation or ID switching. From our custom collected data of ~11 hours (athlete count: 45, bouts: 189), our transition detection algorithm achieves 90% accuracy and continuous ID tracking achieves IDU=0, IDS=0. △ Less

Submitted 9 August, 2023; originally announced November 2023.

arXiv:2311.07002 [pdf, other]

doi 10.1038/s41598-024-57281-x

PICS in Pics: Physics Informed Contour Selection for Rapid Image Segmentation

Authors: Vikas Dwivedi, Balaji Srinivasan, Ganapathy Krishnamurthi

Abstract: Effective training of deep image segmentation models is challenging due to the need for abundant, high-quality annotations. Generating annotations is laborious and time-consuming for human experts, especially in medical image segmentation. To facilitate image annotation, we introduce Physics Informed Contour Selection (PICS) - an interpretable, physics-informed algorithm for rapid image segmentati… ▽ More Effective training of deep image segmentation models is challenging due to the need for abundant, high-quality annotations. Generating annotations is laborious and time-consuming for human experts, especially in medical image segmentation. To facilitate image annotation, we introduce Physics Informed Contour Selection (PICS) - an interpretable, physics-informed algorithm for rapid image segmentation without relying on labeled data. PICS draws inspiration from physics-informed neural networks (PINNs) and an active contour model called snake. It is fast and computationally lightweight because it employs cubic splines instead of a deep neural network as a basis function. Its training parameters are physically interpretable because they directly represent control knots of the segmentation curve. Traditional snakes involve minimization of the edge-based loss functionals by deriving the Euler-Lagrange equation followed by its numerical solution. However, PICS directly minimizes the loss functional, bypassing the Euler Lagrange equations. It is the first snake variant to minimize a region-based loss function instead of traditional edge-based loss functions. PICS uniquely models the three-dimensional (3D) segmentation process with an unsteady partial differential equation (PDE), which allows accelerated segmentation via transfer learning. To demonstrate its effectiveness, we apply PICS for 3D segmentation of the left ventricle on a publicly available cardiac dataset. While doing so, we also introduce a new convexity-preserving loss term that encodes the shape information of the left ventricle to enhance PICS's segmentation quality. Overall, PICS presents several novelties in network architecture, transfer learning, and physics-inspired losses for image segmentation, thereby showing promising outcomes and potential for further refinement. △ Less

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2310.20046 [pdf, other]

Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection

Authors: Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, George Karypis

Abstract: Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it does not require any parameter updates to the trained LLM, but only few annotated examples as input for the LLM. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, t… ▽ More Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it does not require any parameter updates to the trained LLM, but only few annotated examples as input for the LLM. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about, and performs semantic diversity-based example selection. Diversity-based sampling improves overall effectiveness, while uncertainty sampling improves budget efficiency and helps the LLM learn new information. Moreover, AdaICL poses its sampling strategy as a Maximum Coverage problem, that dynamically adapts based on the model's feedback and can be approximately solved via greedy algorithms. Extensive experiments on nine datasets and seven LLMs show that AdaICL improves performance by 4.4% accuracy points over SOTA (7.7% relative improvement), is up to 3x more budget-efficient than performing annotations uniformly at random, while it outperforms SOTA with 2x fewer ICL examples. △ Less

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.13196 [pdf, other]

NameGuess: Column Name Expansion for Tabular Data

Authors: Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Shen Wang, Huzefa Rangwala, George Karypis

Abstract: Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can negatively impact performance on various data search, access, and understanding tasks. To address this issue, we introduce a new task, called NameGuess, to expand… ▽ More Recent advances in large language models have revolutionized many sectors, including the database industry. One common challenge when dealing with large volumes of tabular data is the pervasive use of abbreviated column names, which can negatively impact performance on various data search, access, and understanding tasks. To address this issue, we introduce a new task, called NameGuess, to expand column names (used in database schema) as a natural language generation problem. We create a training dataset of 384K abbreviated-expanded column pairs using a new data fabrication method and a human-annotated evaluation benchmark that includes 9.2K examples from real-world tables. To tackle the complexities associated with polysemy and ambiguity in NameGuess, we enhance auto-regressive language models by conditioning on table content and column header names -- yielding a fine-tuned model (with 2.7B parameters) that matches human performance. Furthermore, we conduct a comprehensive analysis (on multiple LLMs) to validate the effectiveness of table content in NameGuess and identify promising future opportunities. Code has been made available at https://github.com/amazon-science/nameguess. △ Less

Submitted 19 October, 2023; originally announced October 2023.

Comments: This work has been accepted to EMNLP'23

arXiv:2310.09656 [pdf, other]

Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

Authors: Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, George Karypis

Abstract: Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular data. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a variational autoencoder (VAE) crafted late… ▽ More Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular data. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a variational autoencoder (VAE) crafted latent space. The key advantages of the proposed Tabsyn include (1) Generality: the ability to handle a broad spectrum of data types by converting them into a single unified space and explicitly capture inter-column relations; (2) Quality: optimizing the distribution of latent embeddings to enhance the subsequent training of diffusion models, which helps generate high-quality synthetic data, (3) Speed: much fewer number of reverse steps and faster synthesis speed than existing diffusion-based methods. Extensive experiments on six datasets with five metrics demonstrate that Tabsyn outperforms existing methods. Specifically, it reduces the error rates by 86% and 67% for column-wise distribution and pair-wise column correlation estimations compared with the most competitive baselines. △ Less

Submitted 11 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: Accepted by ICLR 2024 (Oral Presentation). Code is available at: https://github.com/amazon-science/tabsyn

arXiv:2310.03320 [pdf, other]

BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs

Authors: Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N. Ioannidis, Huzefa Rangwala, Rishita Anubhai

Abstract: Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs,… ▽ More Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we also show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs. △ Less

Submitted 18 January, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: ICLR 2024

arXiv:2309.00613 [pdf, other]

Iterative Multi-granular Image Editing using Diffusion Models

Authors: K J Joseph, Prateksha Udhayanan, Tripti Shukla, Aishwarya Agarwal, Srikrishna Karanam, Koustava Goswami, Balaji Vasan Srinivasan

Abstract: Recent advances in text-guided image synthesis has dramatically changed how creative professionals generate artistic and aesthetically pleasing visual assets. To fully support such creative endeavors, the process should possess the ability to: 1) iteratively edit the generations and 2) control the spatial reach of desired changes (global, local or anything in between). We formalize this pragmatic… ▽ More Recent advances in text-guided image synthesis has dramatically changed how creative professionals generate artistic and aesthetically pleasing visual assets. To fully support such creative endeavors, the process should possess the ability to: 1) iteratively edit the generations and 2) control the spatial reach of desired changes (global, local or anything in between). We formalize this pragmatic problem setting as Iterative Multi-granular Editing. While there has been substantial progress with diffusion-based models for image synthesis and editing, they are all one shot (i.e., no iterative editing capabilities) and do not naturally yield multi-granular control (i.e., covering the full spectrum of local-to-global edits). To overcome these drawbacks, we propose EMILIE: Iterative Multi-granular Image Editor. EMILIE introduces a novel latent iteration strategy, which re-purposes a pre-trained diffusion model to facilitate iterative editing. This is complemented by a gradient control operation for multi-granular control. We introduce a new benchmark dataset to evaluate our newly proposed setting. We conduct exhaustive quantitatively and qualitatively evaluation against recent state-of-the-art approaches adapted to our task, to being out the mettle of EMILIE. We hope our work would attract attention to this newly identified, pragmatic problem setting. △ Less

Submitted 28 October, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

arXiv:2308.16649 [pdf, other]

Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval

Authors: Prateksha Udhayanan, Srikrishna Karanam, Balaji Vasan Srinivasan

Abstract: We consider the problem of composed image retrieval that takes an input query consisting of an image and a modification text indicating the desired changes to be made on the image and retrieves images that match these changes. Current state-of-the-art techniques that address this problem use global features for the retrieval, resulting in incorrect localization of the regions of interest to be mod… ▽ More We consider the problem of composed image retrieval that takes an input query consisting of an image and a modification text indicating the desired changes to be made on the image and retrieves images that match these changes. Current state-of-the-art techniques that address this problem use global features for the retrieval, resulting in incorrect localization of the regions of interest to be modified because of the global nature of the features, more so in cases of real-world, in-the-wild images. Since modifier texts usually correspond to specific local changes in an image, it is critical that models learn local features to be able to both localize and retrieve better. To this end, our key novelty is a new gradient-attention-based learning objective that explicitly forces the model to focus on the local regions of interest being modified in each retrieval step. We achieve this by first proposing a new visual image attention computation technique, which we call multi-modal gradient attention (MMGrad) that is explicitly conditioned on the modifier text. We next demonstrate how MMGrad can be incorporated into an end-to-end model training strategy with a new learning objective that explicitly forces these MMGrad attention maps to highlight the correct local regions corresponding to the modifier text. By training retrieval models with this new loss function, we show improved grounding by means of better visual attention maps, leading to better explainability of the models as well as competitive quantitative retrieval performance on standard benchmark datasets. △ Less

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2307.16891 [pdf, other]

Foundational Models for Fault Diagnosis of Electrical Motors

Authors: Sriram Anbalagan, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan

Abstract: A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing s… ▽ More A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing studies for fault diagnosis, as they rely on fully labelled training data spanning all operating conditions and assume a consistent distribution. This is because obtaining a large number of labelled samples for several machines across different fault cases and operating scenarios may be unfeasible. In order to overcome the aforementioned limitations, this work proposes a framework to develop a foundational model for fault diagnosis of electrical motors. It involves building a neural network-based backbone to learn high-level features using self-supervised learning, and then fine-tuning the backbone to achieve specific objectives. The primary advantage of such an approach is that the backbone can be fine-tuned to achieve a wide variety of target tasks using very less amount of training data as compared to traditional supervised learning methodologies. The empirical evaluation demonstrates the effectiveness of the proposed approach by obtaining more than 90\% classification accuracy by fine-tuning the backbone not only across different types of fault scenarios or operating conditions, but also across different machines. This illustrates the promising potential of the proposed approach for cross-machine fault diagnosis tasks in real-world applications. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: 7 pages, 1 figure, 5 tables, submitted to IEEE PESGRE 2023

arXiv:2307.08623 [pdf, other]

HYTREL: Hypergraph-enhanced Tabular Data Representation Learning

Authors: Pei Chen, Soumajyoti Sarkar, Leonard Lausen, Balasubramaniam Srinivasan, Sheng Zha, Ruihong Huang, George Karypis

Abstract: Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariance… ▽ More Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table. △ Less

Submitted 26 October, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023 (spotlight)

arXiv:2307.00910 [pdf, other]

CoPL: Contextual Prompt Learning for Vision-Language Understanding

Authors: Koustava Goswami, Srikrishna Karanam, Prateksha Udhayanan, K J Joseph, Balaji Vasan Srinivasan

Abstract: Recent advances in multimodal learning has resulted in powerful vision-language models, whose representations are generalizable across a variety of downstream tasks. Recently, their generalization ability has been further extended by incorporating trainable prompts, borrowed from the natural language processing literature. While such prompt learning techniques have shown impressive results, we ide… ▽ More Recent advances in multimodal learning has resulted in powerful vision-language models, whose representations are generalizable across a variety of downstream tasks. Recently, their generalization ability has been further extended by incorporating trainable prompts, borrowed from the natural language processing literature. While such prompt learning techniques have shown impressive results, we identify that these prompts are trained based on global image features which limits itself in two aspects: First, by using global features, these prompts could be focusing less on the discriminative foreground image, resulting in poor generalization to various out-of-distribution test cases. Second, existing work weights all prompts equally whereas intuitively, prompts should be reweighed according to the semantics of the image. We address these as part of our proposed Contextual Prompt Learning (CoPL) framework, capable of aligning the prompts to the localized features of the image. Our key innovations over earlier works include using local image features as part of the prompt learning process, and more crucially, learning to weight these prompts based on local features that are appropriate for the task at hand. This gives us dynamic prompts that are both aligned to local image features as well as aware of local contextual relationships. Our extensive set of experiments on a variety of standard and few-shot datasets show that our method produces substantially improved performance when compared to the current state of the art methods. We also demonstrate both few-shot and out-of-distribution performance to establish the utility of learning dynamic prompts that are aligned to local image features. △ Less

Submitted 12 December, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted at AAAI 2024

arXiv:2306.14603 [pdf, other]

Learning with Difference Attention for Visually Grounded Self-supervised Representations

Authors: Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan

Abstract: Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding. To demonstrate this concretely, we propose visual difference attention (VDA) to compute visual attention maps in an unsupervised fashion by comparing an image with its salient-regions-masked-out… ▽ More Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding. To demonstrate this concretely, we propose visual difference attention (VDA) to compute visual attention maps in an unsupervised fashion by comparing an image with its salient-regions-masked-out version. We use VDA to derive attention maps for state-of-the art SSL methods and show they do not highlight all salient regions in an image accurately, suggesting their inability to learn strong representations for downstream tasks like segmentation. Motivated by these limitations, we cast VDA as a differentiable operation and propose a new learning objective, Differentiable Difference Attention (DiDA) loss, which leads to substantial improvements in an SSL model's visually grounding to an image's salient regions. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: 15 pages, 14 figures

arXiv:2306.14544 [pdf, other]

A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis

Authors: Aishwarya Agarwal, Srikrishna Karanam, K J Joseph, Apoorv Saxena, Koustava Goswami, Balaji Vasan Srinivasan

Abstract: While recent developments in text-to-image generative models have led to a suite of high-performing methods capable of producing creative imagery from free-form text, there are several limitations. By analyzing the cross-attention representations of these models, we notice two key issues. First, for text prompts that contain multiple concepts, there is a significant amount of pixel-space overlap (… ▽ More While recent developments in text-to-image generative models have led to a suite of high-performing methods capable of producing creative imagery from free-form text, there are several limitations. By analyzing the cross-attention representations of these models, we notice two key issues. First, for text prompts that contain multiple concepts, there is a significant amount of pixel-space overlap (i.e., same spatial regions) among pairs of different concepts. This eventually leads to the model being unable to distinguish between the two concepts and one of them being ignored in the final generation. Next, while these models attempt to capture all such concepts during the beginning of denoising (e.g., first few steps) as evidenced by cross-attention maps, this knowledge is not retained by the end of denoising (e.g., last few steps). Such loss of knowledge eventually leads to inaccurate generation outputs. To address these issues, our key innovations include two test-time attention-based loss functions that substantially improve the performance of pretrained baseline text-to-image diffusion models. First, our attention segregation loss reduces the cross-attention overlap between attention maps of different concepts in the text prompt, thereby reducing the confusion/conflict among various concepts and the eventual capture of all concepts in the generated output. Next, our attention retention loss explicitly forces text-to-image diffusion models to retain cross-attention information for all concepts across all denoising time steps, thereby leading to reduced information loss and the preservation of all concepts in the generated output. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: 15 pages, 16 figures

arXiv:2212.09825 [pdf, other]

What to Read in a Contract? Party-Specific Summarization of Legal Obligations, Entitlements, and Prohibitions

Authors: Abhilasha Sancheti, Aparna Garimella, Balaji Vasan Srinivasan, Rachel Rudinger

Abstract: Reviewing and comprehending key obligations, entitlements, and prohibitions in legal contracts can be a tedious task due to their length and domain-specificity. Furthermore, the key rights and duties requiring review vary for each contracting party. In this work, we propose a new task of party-specific extractive summarization for legal contracts to facilitate faster reviewing and improved compreh… ▽ More Reviewing and comprehending key obligations, entitlements, and prohibitions in legal contracts can be a tedious task due to their length and domain-specificity. Furthermore, the key rights and duties requiring review vary for each contracting party. In this work, we propose a new task of party-specific extractive summarization for legal contracts to facilitate faster reviewing and improved comprehension of rights and duties. To facilitate this, we curate a dataset comprising of party-specific pairwise importance comparisons annotated by legal experts, covering ~293K sentence pairs that include obligations, entitlements, and prohibitions extracted from lease agreements. Using this dataset, we train a pairwise importance ranker and propose a pipeline-based extractive summarization system that generates a party-specific contract summary. We establish the need for incorporating domain-specific notion of importance during summarization by comparing our system against various baselines using both automatic and human evaluation methods △ Less

Submitted 24 October, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: EMNLP 2023

arXiv:2211.12752 [pdf, other]

Agent-Specific Deontic Modality Detection in Legal Language

Authors: Abhilasha Sancheti, Aparna Garimella, Balaji Vasan Srinivasan, Rachel Rudinger

Abstract: Legal documents are typically long and written in legalese, which makes it particularly difficult for laypeople to understand their rights and duties. While natural language understanding technologies can be valuable in supporting such understanding in the legal domain, the limited availability of datasets annotated for deontic modalities in the legal domain, due to the cost of hiring experts and… ▽ More Legal documents are typically long and written in legalese, which makes it particularly difficult for laypeople to understand their rights and duties. While natural language understanding technologies can be valuable in supporting such understanding in the legal domain, the limited availability of datasets annotated for deontic modalities in the legal domain, due to the cost of hiring experts and privacy issues, is a bottleneck. To this end, we introduce, LEXDEMOD, a corpus of English contracts annotated with deontic modality expressed with respect to a contracting party or agent along with the modal triggers. We benchmark this dataset on two tasks: (i) agent-specific multi-label deontic modality classification, and (ii) agent-specific deontic modality and trigger span detection using Transformer-based (Vaswani et al., 2017) language models. Transfer learning experiments show that the linguistic diversity of modal expressions in LEXDEMOD generalizes reasonably from lease to employment and rental agreements. A small case study indicates that a model trained on LEXDEMOD can detect red flags with high recall. We believe our work offers a new research direction for deontic modality detection in the legal domain. △ Less

Submitted 23 November, 2022; originally announced November 2022.

Comments: Accepted at EMNLP 2022

arXiv:2203.10483 [pdf, other]

Entailment Relation Aware Paraphrase Generation

Authors: Abhilasha Sancheti, Balaji Vasan Srinivasan, Rachel Rudinger

Abstract: We introduce a new task of entailment relation aware paraphrase generation which aims at generating a paraphrase conforming to a given entailment relation (e.g. equivalent, forward entailing, or reverse entailing) with respect to a given input. We propose a reinforcement learning-based weakly-supervised paraphrasing system, ERAP, that can be trained using existing paraphrase and natural language i… ▽ More We introduce a new task of entailment relation aware paraphrase generation which aims at generating a paraphrase conforming to a given entailment relation (e.g. equivalent, forward entailing, or reverse entailing) with respect to a given input. We propose a reinforcement learning-based weakly-supervised paraphrasing system, ERAP, that can be trained using existing paraphrase and natural language inference (NLI) corpora without an explicit task-specific corpus. A combination of automated and human evaluations show that ERAP generates paraphrases conforming to the specified entailment relation and are of good quality as compared to the baselines and uncontrolled paraphrasing systems. Using ERAP for augmenting training data for downstream textual entailment task improves performance over an uncontrolled paraphrasing system, and introduces fewer training artifacts, indicating the benefit of explicit control during paraphrasing. △ Less

Submitted 20 March, 2022; originally announced March 2022.

Comments: 11 pages, 10 tables, 2 figures

arXiv:2111.02987 [pdf]

Numerical Approximation in CFD Problems Using Physics Informed Machine Learning

Authors: Siddharth Rout, Vikas Dwivedi, Balaji Srinivasan

Abstract: The thesis focuses on various techniques to find an alternate approximation method that could be universally used for a wide range of CFD problems but with low computational cost and low runtime. Various techniques have been explored within the field of machine learning to gauge the utility in fulfilling the core ambition. Steady advection diffusion problem has been used as the test case to unders… ▽ More The thesis focuses on various techniques to find an alternate approximation method that could be universally used for a wide range of CFD problems but with low computational cost and low runtime. Various techniques have been explored within the field of machine learning to gauge the utility in fulfilling the core ambition. Steady advection diffusion problem has been used as the test case to understand the level of complexity up to which a method can provide solution. Ultimately, the focus stays over physics informed machine learning techniques where solving differential equations is possible without any training with computed data. The prevalent methods by I.E. Lagaris et.al. and M. Raissi et.al are explored thoroughly. The prevalent methods cannot solve advection dominant problems. A physics informed method, called as Distributed Physics Informed Neural Network (DPINN), is proposed to solve advection dominant problems. It increases the lexibility and capability of older methods by splitting the domain and introducing other physics-based constraints as mean squared loss terms. Various experiments are done to explore the end to end possibilities with the method. Parametric study is also done to understand the behavior of the method to different tunable parameters. The method is tested over steady advection-diffusion problems and unsteady square pulse problems. Very accurate results are recorded. Extreme learning machine (ELM) is a very fast neural network algorithm at the cost of tunable parameters. The ELM based variant of the proposed model is tested over the advection-diffusion problem. ELM makes the complex optimization simpler and Since the method is non-iterative, the solution is recorded in a single shot. The ELM based variant seems to work better than the simple DPINN method. Simultaneously scope for various development in future are hinted throughout the thesis. △ Less

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2110.15794 [pdf, other]

CLAUSEREC: A Clause Recommendation Framework for AI-aided Contract Authoring

Authors: Vinay Aggarwal, Aparna Garimella, Balaji Vasan Srinivasan, Anandhavelu N, Rajiv Jain

Abstract: Contracts are a common type of legal document that frequent in several day-to-day business workflows. However, there has been very limited NLP research in processing such documents, and even lesser in generating them. These contracts are made up of clauses, and the unique nature of these clauses calls for specific methods to understand and generate such documents. In this paper, we introduce the t… ▽ More Contracts are a common type of legal document that frequent in several day-to-day business workflows. However, there has been very limited NLP research in processing such documents, and even lesser in generating them. These contracts are made up of clauses, and the unique nature of these clauses calls for specific methods to understand and generate such documents. In this paper, we introduce the task of clause recommendation, asa first step to aid and accelerate the author-ing of contract documents. We propose a two-staged pipeline to first predict if a specific clause type is relevant to be added in a contract, and then recommend the top clauses for the given type based on the contract context. We pretrain BERT on an existing library of clauses with two additional tasks and use it for our prediction and recommendation. We experiment with classification methods and similarity-based heuristics for clause relevance prediction, and generation-based methods for clause recommendation, and evaluate the results from various methods on several clause types. We provide analyses on the results, and further outline the advantages and limitations of the various methods for this line of research. △ Less

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2110.03785 [pdf, other]

Addressing practical challenges in Active Learning via a hybrid query strategy

Authors: Deepesh Agarwal, Pravesh Srivastava, Sergio Martin-del-Campo, Balasubramaniam Natarajan, Babji Srinivasan

Abstract: Active Learning (AL) is a powerful tool to address modern machine learning problems with significantly fewer labeled training instances. However, implementation of traditional AL methodologies in practical scenarios is accompanied by multiple challenges due to the inherent assumptions. There are several hindrances, such as unavailability of labels for the AL algorithm at the beginning; unreliable… ▽ More Active Learning (AL) is a powerful tool to address modern machine learning problems with significantly fewer labeled training instances. However, implementation of traditional AL methodologies in practical scenarios is accompanied by multiple challenges due to the inherent assumptions. There are several hindrances, such as unavailability of labels for the AL algorithm at the beginning; unreliable external source of labels during the querying process; or incompatible mechanisms to evaluate the performance of Active Learner. Inspired by these practical challenges, we present a hybrid query strategy-based AL framework that addresses three practical challenges simultaneously: cold-start, oracle uncertainty and performance evaluation of Active Learner in the absence of ground truth. While a pre-clustering approach is employed to address the cold-start problem, the uncertainty surrounding the expertise of labeler and confidence in the given labels is incorporated to handle oracle uncertainty. The heuristics obtained during the querying process serve as the fundamental premise for accessing the performance of Active Learner. The robustness of the proposed AL framework is evaluated across three different environments and industrial settings. The results demonstrate the capability of the proposed framework to tackle practical challenges during AL implementation in real-world scenarios. △ Less

Submitted 7 October, 2021; originally announced October 2021.

Comments: 15 pages, 4 figures, 6 tables

arXiv:2110.02910 [pdf, other]

Equivariant Subgraph Aggregation Networks

Authors: Beatrice Bevilacqua, Fabrizio Frasca, Derek Lim, Balasubramaniam Srinivasan, Chen Cai, Gopinath Balamurugan, Michael M. Bronstein, Haggai Maron

Abstract: Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part due to their simplicity and scalability. Unfortunately, it was shown that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is… ▽ More Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part due to their simplicity and scalability. Unfortunately, it was shown that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is that while two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs. Thus, we propose to represent each graph as a set of subgraphs derived by some predefined policy, and to process it using a suitable equivariant architecture. We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism, and prove lower bounds on the expressiveness of ESAN in terms of these new WL variants. We further prove that our approach increases the expressive power of both MPNNs and more expressive architectures. Moreover, we provide theoretical results that describe how design choices such as the subgraph selection policy and equivariant neural architecture affect our architecture's expressive power. To deal with the increased computational cost, we propose a subgraph sampling scheme, which can be viewed as a stochastic version of our framework. A comprehensive set of experiments on real and synthetic datasets demonstrates that our framework improves the expressive power and overall performance of popular GNN architectures. △ Less

Submitted 16 March, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Published at ICLR 2022, Spotlight. 46 pages

arXiv:2104.07000 [pdf, other]

IGA : An Intent-Guided Authoring Assistant

Authors: Simeng Sun, Wenlong Zhao, Varun Manjunatha, Rajiv Jain, Vlad Morariu, Franck Dernoncourt, Balaji Vasan Srinivasan, Mohit Iyyer

Abstract: While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored. We leverage advances in language modeling to build an interactive writing assistant that generates and rephrases text according to fine-grained author specifications. Users provide input to our In… ▽ More While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored. We leverage advances in language modeling to build an interactive writing assistant that generates and rephrases text according to fine-grained author specifications. Users provide input to our Intent-Guided Assistant (IGA) in the form of text interspersed with tags that correspond to specific rhetorical directives (e.g., adding description or contrast, or rephrasing a particular sentence). We fine-tune a language model on a dataset heuristically-labeled with author intent, which allows IGA to fill in these tags with generated text that users can subsequently edit to their liking. A series of automatic and crowdsourced evaluations confirm the quality of IGA's generated outputs, while a small-scale user study demonstrates author preference for IGA over baseline methods in a creative writing task. We release our dataset, code, and demo to spur further research into AI-assisted writing. △ Less

Submitted 19 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: EMNLP2021

arXiv:2101.11836 [pdf, other]

DRAG: Director-Generator Language Modelling Framework for Non-Parallel Author Stylized Rewriting

Authors: Hrituraj Singh, Gaurav Verma, Aparna Garimella, Balaji Vasan Srinivasan

Abstract: Author stylized rewriting is the task of rewriting an input text in a particular author's style. Recent works in this area have leveraged Transformer-based language models in a denoising autoencoder setup to generate author stylized text without relying on a parallel corpus of data. However, these approaches are limited by the lack of explicit control of target attributes and being entirely data-d… ▽ More Author stylized rewriting is the task of rewriting an input text in a particular author's style. Recent works in this area have leveraged Transformer-based language models in a denoising autoencoder setup to generate author stylized text without relying on a parallel corpus of data. However, these approaches are limited by the lack of explicit control of target attributes and being entirely data-driven. In this paper, we propose a Director-Generator framework to rewrite content in the target author's style, specifically focusing on certain target attributes. We show that our proposed framework works well even with a limited-sized target author corpus. Our experiments on corpora consisting of relatively small-sized text authored by three distinct authors show significant improvements upon existing works to rewrite input texts in target author's style. Our quantitative and qualitative analyses further show that our model has better meaning retention and results in more fluent generations. △ Less

Submitted 28 January, 2021; originally announced January 2021.

Comments: Accepted as Long Paper to EACL 2021

arXiv:2101.07773 [pdf, other]

Learning over Families of Sets -- Hypergraph Representation Learning for Higher Order Tasks

Authors: Balasubramaniam Srinivasan, Da Zheng, George Karypis

Abstract: Graph representation learning has made major strides over the past decade. However, in many relational domains, the input data are not suited for simple graph representations as the relationships between entities go beyond pairwise interactions. In such cases, the relationships in the data are better represented as hyperedges (set of entities) of a non-uniform hypergraph. While there have been wor… ▽ More Graph representation learning has made major strides over the past decade. However, in many relational domains, the input data are not suited for simple graph representations as the relationships between entities go beyond pairwise interactions. In such cases, the relationships in the data are better represented as hyperedges (set of entities) of a non-uniform hypergraph. While there have been works on principled methods for learning representations of nodes of a hypergraph, these approaches are limited in their applicability to tasks on non-uniform hypergraphs (hyperedges with different cardinalities). In this work, we exploit the incidence structure to develop a hypergraph neural network to learn provably expressive representations of variable sized hyperedges which preserve local-isomorphism in the line graph of the hypergraph, while also being invariant to permutations of its constituent vertices. Specifically, for a given vertex set, we propose frameworks for (1) hyperedge classification and (2) variable sized expansion of partially observed hyperedges which captures the higher order interactions among vertices and hyperedges. We evaluate performance on multiple real-world hypergraph datasets and demonstrate consistent, significant improvement in accuracy, over state-of-the-art models. △ Less

Submitted 19 January, 2021; originally announced January 2021.

Comments: Published as a conference paper at SIAM International Conference on Data Mining(SDM 2021)

arXiv:2010.11578 [pdf, other]

Multi-Style Transfer with Discriminative Feedback on Disjoint Corpus

Authors: Navita Goyal, Balaji Vasan Srinivasan, Anandhavelu Natarajan, Abhilasha Sancheti

Abstract: Style transfer has been widely explored in natural language generation with non-parallel corpus by directly or indirectly extracting a notion of style from source and target domain corpus. A common shortcoming of existing approaches is the prerequisite of joint annotations across all the stylistic dimensions under consideration. Availability of such dataset across a combination of styles limits th… ▽ More Style transfer has been widely explored in natural language generation with non-parallel corpus by directly or indirectly extracting a notion of style from source and target domain corpus. A common shortcoming of existing approaches is the prerequisite of joint annotations across all the stylistic dimensions under consideration. Availability of such dataset across a combination of styles limits the extension of these setups to multiple style dimensions. While cascading single-dimensional models across multiple styles is a possibility, it suffers from content loss, especially when the style dimensions are not completely independent of each other. In our work, we relax this requirement of jointly annotated data across multiple styles by using independently acquired data across different style dimensions without any additional annotations. We initialize an encoder-decoder setup with transformer-based language model pre-trained on a generic corpus and enhance its re-writing capability to multiple target style dimensions by employing multiple style-aware language models as discriminators. Through quantitative and qualitative evaluation, we show the ability of our model to control styles across multiple style dimensions while preserving content of the input text. We compare it against baselines involving cascaded state-of-the-art uni-dimensional style transfer models. △ Less

Submitted 12 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

Report number: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3500â3510

arXiv:2010.11553 [pdf, other]

Incorporating Stylistic Lexical Preferences in Generative Language Models

Authors: Hrituraj Singh, Gaurav Verma, Balaji Vasan Srinivasan

Abstract: While recent advances in language modeling have resulted in powerful generation models, their generation style remains implicitly dependent on the training data and can not emulate a specific target style. Leveraging the generative capabilities of a transformer-based language models, we present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lex… ▽ More While recent advances in language modeling have resulted in powerful generation models, their generation style remains implicitly dependent on the training data and can not emulate a specific target style. Leveraging the generative capabilities of a transformer-based language models, we present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lexical preferences of an author into generative language models. We introduce rewarding strategies in a reinforcement learning framework that encourages the use of words across multiple categorical dimensions, to varying extents. Our experiments demonstrate that the proposed approach can generate text that distinctively aligns with a given target author's lexical style. We conduct quantitative and qualitative comparisons with competitive and relevant baselines to illustrate the benefits of the proposed approach. △ Less

Submitted 22 October, 2020; originally announced October 2020.

Comments: To Appear in Findings of EMNLP 2020

arXiv:2008.06457 [pdf, other]

doi 10.1007/978-3-030-93080-6_15

Abstracting Deep Neural Networks into Concept Graphs for Concept Level Interpretability

Authors: Avinash Kori, Parth Natekar, Ganapathy Krishnamurthi, Balaji Srinivasan

Abstract: The black-box nature of deep learning models prevents them from being completely trusted in domains like biomedicine. Most explainability techniques do not capture the concept-based reasoning that human beings follow. In this work, we attempt to understand the behavior of trained models that perform image processing tasks in the medical domain by building a graphical representation of the concepts… ▽ More The black-box nature of deep learning models prevents them from being completely trusted in domains like biomedicine. Most explainability techniques do not capture the concept-based reasoning that human beings follow. In this work, we attempt to understand the behavior of trained models that perform image processing tasks in the medical domain by building a graphical representation of the concepts they learn. Extracting such a graphical representation of the model's behavior on an abstract, higher conceptual level would unravel the learnings of these models and would help us to evaluate the steps taken by the model for predictions. We show the application of our proposed implementation on two biomedical problems - brain tumor segmentation and fundus image classification. We provide an alternative graphical representation of the model by formulating a concept level graph as discussed above, which makes the problem of intervention to find active inference trails more tractable. Understanding these trails would provide an understanding of the hierarchy of the decision-making process followed by the model. [As well as overall nature of model]. Our framework is available at https://github.com/koriavinash1/BioExp △ Less

Submitted 17 November, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

arXiv:2005.05256 [pdf, other]

Reinforced Rewards Framework for Text Style Transfer

Authors: Abhilasha Sancheti, Kundan Krishna, Balaji Vasan Srinivasan, Anandhavelu Natarajan

Abstract: Style transfer deals with the algorithms to transfer the stylistic properties of a piece of text into that of another while ensuring that the core content is preserved. There has been a lot of interest in the field of text style transfer due to its wide application to tailored text generation. Existing works evaluate the style transfer models based on content preservation and transfer strength. In… ▽ More Style transfer deals with the algorithms to transfer the stylistic properties of a piece of text into that of another while ensuring that the core content is preserved. There has been a lot of interest in the field of text style transfer due to its wide application to tailored text generation. Existing works evaluate the style transfer models based on content preservation and transfer strength. In this work, we propose a reinforcement learning based framework that directly rewards the framework on these target metrics yielding a better transfer of the target style. We show the improved performance of our proposed framework based on automatic and human evaluation on three independent tasks: wherein we transfer the style of text from formal to informal, high excitement to low excitement, modern English to Shakespearean English, and vice-versa in all the three cases. Improved performance of the proposed framework over existing state-of-the-art frameworks indicates the viability of the approach. △ Less

Submitted 11 May, 2020; originally announced May 2020.

Comments: ECIR 2020

arXiv:2004.14243 [pdf, other]

Towards Transparent and Explainable Attention Models

Authors: Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran

Abstract: Recent studies on interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for th… ▽ More Recent studies on interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for the model's predictions. In this work, we first explain why current attention mechanisms in LSTM based encoders can neither provide a faithful nor a plausible explanation of the model's predictions. We observe that in LSTM based encoders the hidden representations at different time-steps are very similar to each other (high conicity) and attention weights in these situations do not carry much meaning because even a random permutation of the attention weights does not affect the model's predictions. Based on experiments on a wide variety of tasks and datasets, we observe attention distributions often attribute the model's predictions to unimportant words such as punctuation and fail to offer a plausible explanation for the predictions. To make attention mechanisms more faithful and plausible, we propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse. We show that the resulting attention distributions offer more transparency as they (i) provide a more precise importance ranking of the hidden states (ii) are better indicative of words important for the model's predictions (iii) correlate better with gradient-based attribution methods. Human evaluations indicate that the attention distributions learned by our model offer a plausible explanation of the model's predictions. Our code has been made publicly available at https://github.com/akashkm99/Interpretable-Attention △ Less

Submitted 29 April, 2020; originally announced April 2020.

Comments: Accepted at ACL 2020

arXiv:2001.00258 [pdf, other]

A Generalized Deep Learning Framework for Whole-Slide Image Segmentation and Analysis

Authors: Mahendra Khened, Avinash Kori, Haran Rajkumar, Balaji Srinivasan, Ganapathy Krishnamurthi

Abstract: Histopathology tissue analysis is considered the gold standard in cancer diagnosis and prognosis. Given the large size of these images and the increase in the number of potential cancer cases, an automated solution as an aid to histopathologists is highly desirable. In the recent past, deep learning-based techniques have provided state of the art results in a wide variety of image analysis tasks,… ▽ More Histopathology tissue analysis is considered the gold standard in cancer diagnosis and prognosis. Given the large size of these images and the increase in the number of potential cancer cases, an automated solution as an aid to histopathologists is highly desirable. In the recent past, deep learning-based techniques have provided state of the art results in a wide variety of image analysis tasks, including analysis of digitized slides. However, the size of images and variability in histopathology tasks makes it a challenge to develop an integrated framework for histopathology image analysis. We propose a deep learning-based framework for histopathology tissue analysis. We demonstrate the generalizability of our framework, including training and inference, on several open-source datasets, which include CAMELYON (breast cancer metastases), DigestPath (colon cancer), and PAIP (liver cancer) datasets. We discuss multiple types of uncertainties pertaining to data and model, namely aleatoric and epistemic, respectively. Simultaneously, we demonstrate our model generalization across different data distribution by evaluating some samples on TCGA data. On CAMELYON16 test data (n=139) for the task of lesion detection, the FROC score achieved was 0.86 and in the CAMELYON17 test-data (n=500) for the task of pN-staging the Cohen's kappa score achieved was 0.9090 (third in the open leaderboard). On DigestPath test data (n=212) for the task of tumor segmentation, a Dice score of 0.782 was achieved (fourth in the challenge). On PAIP test data (n=40) for the task of viable tumor segmentation, a Jaccard Index of 0.75 (third in the challenge) was achieved, and for viable tumor burden, a score of 0.633 was achieved (second in the challenge). Our entire framework and related documentation are freely available at GitHub and PyPi. △ Less

Submitted 18 November, 2020; v1 submitted 1 January, 2020; originally announced January 2020.

arXiv:1912.08492 [pdf, other]

Generating summaries tailored to target characteristics

Authors: Kushal Chawla, Hrituraj Singh, Arijit Pramanik, Mithlesh Kumar, Balaji Vasan Srinivasan

Abstract: Recently, research efforts have gained pace to cater to varied user preferences while generating text summaries. While there have been attempts to incorporate a few handpicked characteristics such as length or entities, a holistic view around these preferences is missing and crucial insights on why certain characteristics should be incorporated in a specific manner are absent. With this objective,… ▽ More Recently, research efforts have gained pace to cater to varied user preferences while generating text summaries. While there have been attempts to incorporate a few handpicked characteristics such as length or entities, a holistic view around these preferences is missing and crucial insights on why certain characteristics should be incorporated in a specific manner are absent. With this objective, we provide a categorization around these characteristics relevant to the task of text summarization: one, focusing on what content needs to be generated and second, focusing on the stylistic aspects of the output summaries. We use our insights to provide guidelines on appropriate methods to incorporate various classes characteristics in sequence-to-sequence summarization framework. Our experiments with incorporating topics, readability and simplicity indicate the viability of the proposed prescriptions △ Less

Submitted 18 December, 2019; originally announced December 2019.

Comments: Appeared in CiCLing 2019

arXiv:1911.08056 [pdf, other]

Modelling pressure-Hessian from local velocity gradients information in an incompressible turbulent flow field using deep neural networks

Authors: Nishant Parashar, Sawan S. Sinha, Balaji Srinivasan

Abstract: The understanding of the dynamics of the velocity gradients in turbulent flows is critical to understanding various non-linear turbulent processes. The pressure-Hessian and the viscous-Laplacian govern the evolution of the velocity-gradients and are known to be non-local in nature. Over the years, several simplified dynamical models have been proposed that models the viscous-Laplacian and the pres… ▽ More The understanding of the dynamics of the velocity gradients in turbulent flows is critical to understanding various non-linear turbulent processes. The pressure-Hessian and the viscous-Laplacian govern the evolution of the velocity-gradients and are known to be non-local in nature. Over the years, several simplified dynamical models have been proposed that models the viscous-Laplacian and the pressure-Hessian primarily in terms of local velocity gradients information. These models can also serve as closure models for the Lagrangian PDF methods. The recent fluid deformation closure model (RFDM) has been shown to retrieve excellent one-time statistics of the viscous process. However, the pressure-Hessian modelled by the RFDM has various physical limitations. In this work, we first demonstrate the limitations of the RFDM in estimating the pressure-Hessian. Further, we employ a tensor basis neural network (TBNN) to model the pressure-Hessian from the velocity gradient tensor itself. The neural network is trained on high-resolution data obtained from direct numerical simulation (DNS) of isotropic turbulence at Reynolds number of 433 (JHU turbulence database, JHTD). The predictions made by the TBNN are tested against two different isotropic turbulence datasets at Reynolds number of 433 (JHTD) and 315 (UP Madrid turbulence database, UPMTD) and channel flow dataset at Reynolds number of 1000 (UT Texas and JHTD). The evaluation of the neural network output is made in terms of the alignment statistics of the predicted pressure-Hessian eigenvectors with the strain-rate eigenvectors for turbulent isotropic flow as well as channel flow. Our analysis of the predicted solution leads to the discovery of ten unique coefficients of the tensor basis of strain-rate and rotation-rate tensors, the linear combination over which accurately captures key alignment statistics of the pressure-Hessian tensor. △ Less

Submitted 18 November, 2019; originally announced November 2019.

arXiv:1911.00523 [pdf, other]

What Gets Echoed? Understanding the "Pointers" in Explanations of Persuasive Arguments

Authors: David Atkinson, Kumar Bhargav Srinivasan, Chenhao Tan

Abstract: Explanations are central to everyday life, and are a topic of growing interest in the AI community. To investigate the process of providing natural language explanations, we leverage the dynamics of the /r/ChangeMyView subreddit to build a dataset with 36K naturally occurring explanations of why an argument is persuasive. We propose a novel word-level prediction task to investigate how explanation… ▽ More Explanations are central to everyday life, and are a topic of growing interest in the AI community. To investigate the process of providing natural language explanations, we leverage the dynamics of the /r/ChangeMyView subreddit to build a dataset with 36K naturally occurring explanations of why an argument is persuasive. We propose a novel word-level prediction task to investigate how explanations selectively reuse, or echo, information from what is being explained (henceforth, explanandum). We develop features to capture the properties of a word in the explanandum, and show that our proposed features not only have relatively strong predictive power on the echoing of a word in an explanation, but also enhance neural methods of generating explanations. In particular, while the non-contextual properties of a word itself are more valuable for stopwords, the interaction between the constituent parts of an explanandum is crucial in predicting the echoing of content words. We also find intriguing patterns of a word being echoed. For example, although nouns are generally less likely to be echoed, subjects and objects can, depending on their source, be more likely to be echoed in the explanations. △ Less

Submitted 1 November, 2019; originally announced November 2019.

Comments: 19 pages, 3 figures, EMNLP 2019, the code and dataset are available at https://chenhaot.com/papers/explanation-pointers.html

arXiv:1910.09563 [pdf, other]

doi 10.1145/3359265

Content Removal as a Moderation Strategy: Compliance and Other Outcomes in the ChangeMyView Community

Authors: Kumar Bhargav Srinivasan, Cristian Danescu-Niculescu-Mizil, Lillian Lee, Chenhao Tan

Abstract: Moderators of online communities often employ comment deletion as a tool. We ask here whether, beyond the positive effects of shielding a community from undesirable content, does comment removal actually cause the behavior of the comment's author to improve? We examine this question in a particularly well-moderated community, the ChangeMyView subreddit. The standard analytic approach of interrup… ▽ More Moderators of online communities often employ comment deletion as a tool. We ask here whether, beyond the positive effects of shielding a community from undesirable content, does comment removal actually cause the behavior of the comment's author to improve? We examine this question in a particularly well-moderated community, the ChangeMyView subreddit. The standard analytic approach of interrupted time-series analysis unfortunately cannot answer this question of causality because it fails to distinguish the effect of having made a non-compliant comment from the effect of being subjected to moderator removal of that comment. We therefore leverage a "delayed feedback" approach based on the observation that some users may remain active between the time when they posted the non-compliant comment and the time when that comment is deleted. Applying this approach to such users, we reveal the causal role of comment deletion in reducing immediate noncompliance rates, although we do not find evidence of it having a causal role in inducing other behavior improvements. Our work thus empirically demonstrates both the promise and some potential limits of content removal as a positive moderation strategy, and points to future directions for identifying causal effects from observational data. △ Less

Submitted 21 October, 2019; originally announced October 2019.

Comments: 21 pages, 8 figures, accepted at CSCW 2019, the dataset is available at https://chenhaot.com/papers/content-removal.html

arXiv:1910.00452 [pdf, other]

On the Equivalence between Positional Node Embeddings and Structural Graph Representations

Authors: Balasubramaniam Srinivasan, Bruno Ribeiro

Abstract: This work provides the first unifying theoretical framework for node (positional) embeddings and structural graph representations, bridging methods like matrix factorization and graph neural networks. Using invariant theory, we show that the relationship between structural representations and node embeddings is analogous to that of a distribution and its samples. We prove that all tasks that can b… ▽ More This work provides the first unifying theoretical framework for node (positional) embeddings and structural graph representations, bridging methods like matrix factorization and graph neural networks. Using invariant theory, we show that the relationship between structural representations and node embeddings is analogous to that of a distribution and its samples. We prove that all tasks that can be performed by node embeddings can also be performed by structural representations and vice-versa. We also show that the concept of transductive and inductive learning is unrelated to node embeddings and graph representations, clearing another source of confusion in the literature. Finally, we introduce new practical guidelines to generating and using node embeddings, which fixes significant shortcomings of standard operating procedures used today. △ Less

Submitted 21 September, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: This version corrects some typos in the definition of Σ, it should be Σ_n. Code available at https://github.com/PurdueMINDS/Equivalence

Journal ref: Published as a conference paper at the Eighth International Conference on Learning Representations (ICLR 2020)

arXiv:1909.09962 [pdf, other]

Adapting Language Models for Non-Parallel Author-Stylized Rewriting

Authors: Bakhtiyar Syed, Gaurav Verma, Balaji Vasan Srinivasan, Anandhavelu Natarajan, Vasudeva Varma

Abstract: Given the recent progress in language modeling using Transformer-based neural models and an active interest in generating stylized text, we present an approach to leverage the generalization capabilities of a language model to rewrite an input text in a target author's style. Our proposed approach adapts a pre-trained language model to generate author-stylized text by fine-tuning on the author-spe… ▽ More Given the recent progress in language modeling using Transformer-based neural models and an active interest in generating stylized text, we present an approach to leverage the generalization capabilities of a language model to rewrite an input text in a target author's style. Our proposed approach adapts a pre-trained language model to generate author-stylized text by fine-tuning on the author-specific corpus using a denoising autoencoder (DAE) loss in a cascaded encoder-decoder framework. Optimizing over DAE loss allows our model to learn the nuances of an author's style without relying on parallel data, which has been a severe limitation of the previous related works in this space. To evaluate the efficacy of our approach, we propose a linguistically-motivated framework to quantify stylistic alignment of the generated text to the target author at lexical, syntactic and surface levels. The evaluation framework is both interpretable as it leads to several insights about the model, and self-contained as it does not rely on external classifiers, e.g. sentiment or formality classifiers. Qualitative and quantitative assessment indicates that the proposed approach rewrites the input text with better alignment to the target style while preserving the original content better than state-of-the-art baselines. △ Less

Submitted 31 October, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

Comments: Accepted for publication in Main Technical Track at AAAI 20

arXiv:1909.08349 [pdf, other]

A Lexical, Syntactic, and Semantic Perspective for Understanding Style in Text

Authors: Gaurav Verma, Balaji Vasan Srinivasan

Abstract: With a growing interest in modeling inherent subjectivity in natural language, we present a linguistically-motivated process to understand and analyze the writing style of individuals from three perspectives: lexical, syntactic, and semantic. We discuss the stylistically expressive elements within each of these levels and use existing methods to quantify the linguistic intuitions related to some o… ▽ More With a growing interest in modeling inherent subjectivity in natural language, we present a linguistically-motivated process to understand and analyze the writing style of individuals from three perspectives: lexical, syntactic, and semantic. We discuss the stylistically expressive elements within each of these levels and use existing methods to quantify the linguistic intuitions related to some of these elements. We show that such a multi-level analysis is useful for developing a well-knit understanding of style - which is independent of the natural language task at hand, and also demonstrate its value in solving three downstream tasks: authors' style analysis, authorship attribution, and emotion prediction. We conduct experiments on a variety of datasets, comprising texts from social networking sites, user reviews, legal documents, literary books, and newswire. The results on the aforementioned tasks and datasets illustrate that such a multi-level understanding of style, which has been largely ignored in recent works, models style-related subjectivity in text and can be leveraged to improve performance on multiple downstream tasks both qualitatively and quantitatively. △ Less

Submitted 18 September, 2019; originally announced September 2019.

arXiv:1909.05355 [pdf, other]

Let's Ask Again: Refine Network for Automatic Question Generation

Authors: Preksha Nema, Akash Kumar Mohankumar, Mitesh M. Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran

Abstract: In this work, we focus on the task of Automatic Question Generation (AQG) where given a passage and an answer the task is to generate the corresponding question. It is desired that the generated question should be (i) grammatically correct (ii) answerable from the passage and (iii) specific to the given answer. An analysis of existing AQG models shows that they produce questions which do not adher… ▽ More In this work, we focus on the task of Automatic Question Generation (AQG) where given a passage and an answer the task is to generate the corresponding question. It is desired that the generated question should be (i) grammatically correct (ii) answerable from the passage and (iii) specific to the given answer. An analysis of existing AQG models shows that they produce questions which do not adhere to one or more of {the above-mentioned qualities}. In particular, the generated questions look like an incomplete draft of the desired question with a clear scope for refinement. {To alleviate this shortcoming}, we propose a method which tries to mimic the human process of generating questions by first creating an initial draft and then refining it. More specifically, we propose Refine Network (RefNet) which contains two decoders. The second decoder uses a dual attention network which pays attention to both (i) the original passage and (ii) the question (initial draft) generated by the first decoder. In effect, it refines the question generated by the first decoder, thereby making it more correct and complete. We evaluate RefNet on three datasets, \textit{viz.}, SQuAD, HOTPOT-QA, and DROP, and show that it outperforms existing state-of-the-art methods by 7-16\% on all of these datasets. Lastly, we show that we can improve the quality of the second decoder on specific metrics, such as, fluency and answerability by explicitly rewarding revisions that improve on the corresponding metric during training. The code has been made publicly available \footnote{https://github.com/PrekshaNema25/RefNet-QG} △ Less

Submitted 31 August, 2019; originally announced September 2019.

Comments: accepted in EMNLP 2019 in Main Conference, (10 pages)

arXiv:1907.08967 [pdf, other]

doi 10.1016/j.neucom.2020.09.006

Distributed physics informed neural network for data-efficient solution to partial differential equations

Authors: Vikas Dwivedi, Nishant Parashar, Balaji Srinivasan

Abstract: The physics informed neural network (PINN) is evolving as a viable method to solve partial differential equations. In the recent past PINNs have been successfully tested and validated to find solutions to both linear and non-linear partial differential equations (PDEs). However, the literature lacks detailed investigation of PINNs in terms of their representation capability. In this work, we first… ▽ More The physics informed neural network (PINN) is evolving as a viable method to solve partial differential equations. In the recent past PINNs have been successfully tested and validated to find solutions to both linear and non-linear partial differential equations (PDEs). However, the literature lacks detailed investigation of PINNs in terms of their representation capability. In this work, we first test the original PINN method in terms of its capability to represent a complicated function. Further, to address the shortcomings of the PINN architecture, we propose a novel distributed PINN, named DPINN. We first perform a direct comparison of the proposed DPINN approach against PINN to solve a non-linear PDE (Burgers' equation). We show that DPINN not only yields a more accurate solution to the Burgers' equation, but it is found to be more data-efficient as well. At last, we employ our novel DPINN to two-dimensional steady-state Navier-Stokes equation, which is a system of non-linear PDEs. To the best of the authors' knowledge, this is the first such attempt to directly solve the Navier-Stokes equation using a physics informed neural network. △ Less

Submitted 21 July, 2019; originally announced July 2019.

Comments: 16 pages, 8 figures

Journal ref: Neurocomputing, 420, 299-316

arXiv:1907.03507 [pdf, other]

Physics Informed Extreme Learning Machine (PIELM) -- A rapid method for the numerical solution of partial differential equations

Authors: Vikas Dwivedi, Balaji Srinivasan

Abstract: There has been rapid progress recently on the application of deep networks to the solution of partial differential equations, collectively labelled as Physics Informed Neural Networks (PINNs). In this paper, we develop Physics Informed Extreme Learning Machine (PIELM), a rapid version of PINNs which can be applied to stationary and time dependent linear partial differential equations. We demonstra… ▽ More There has been rapid progress recently on the application of deep networks to the solution of partial differential equations, collectively labelled as Physics Informed Neural Networks (PINNs). In this paper, we develop Physics Informed Extreme Learning Machine (PIELM), a rapid version of PINNs which can be applied to stationary and time dependent linear partial differential equations. We demonstrate that PIELM matches or exceeds the accuracy of PINNs on a range of problems. We also discuss the limitations of neural network based approaches, including our PIELM, in the solution of PDEs on large domains and suggest an extension, a distributed version of our algorithm -{}- DPIELM. We show that DPIELM produces excellent results comparable to conventional numerical techniques in the solution of time-dependent problems. Collectively, this work contributes towards making the use of neural networks in the solution of partial differential equations in complex domains as a competitive alternative to conventional discretization techniques. △ Less

Submitted 8 July, 2019; originally announced July 2019.

Comments: 29 pages, 30 figures

Showing 1–50 of 59 results for author: Srinivasan, B