-
Categorizing Sources of Information for Explanations in Conversational AI Systems for Older Adults Aging in Place
Authors:
Niharika Mathur,
Tamara Zubatiy,
Elizabeth Mynatt
Abstract:
As the permeability of AI systems in interpersonal domains like the home expands, their technical capabilities of generating explanations are required to be aligned with user expectations for transparency and reasoning. This paper presents insights from our ongoing work in understanding the effectiveness of explanations in Conversational AI systems for older adults aging in place and their family caregivers. We argue that in collaborative and multi-user environments like the home, AI systems will make recommendations based on a host of information sources to generate explanations. These sources may be more or less salient based on user mental models of the system and the specific task. We highlight the need for cross technological collaboration between AI systems and other available sources of information in the home to generate multiple explanations for a single user query. Through example scenarios in a caregiving home setting, this paper provides an initial framework for categorizing these sources and informing a potential design space for AI explanations surrounding everyday tasks in the home.
Submitted 7 June, 2024;
originally announced June 2024.
-
Training-efficient density quantum machine learning
Authors:
Brian Coyle,
El Amine Cherrat,
Nishant Jain,
Natansh Mathur,
Snehal Raj,
Skander Kazdaghli,
Iordanis Kerenidis
Abstract:
Quantum machine learning requires powerful, flexible and efficiently trainable models to be successful in solving challenging problems. In this work, we present density quantum neural networks, a learning model incorporating randomisation over a set of trainable unitaries. These models generalise quantum neural networks using parameterised quantum circuits, and allow a trade-off between expressibility and efficient trainability, particularly on quantum hardware. We demonstrate the flexibility of the formalism by applying it to two recently proposed model families. The first are commuting-block quantum neural networks (QNNs) which are efficiently trainable but may be limited in expressibility. The second are orthogonal (Hamming-weight preserving) quantum neural networks which provide well-defined and interpretable transformations on data but are challenging to train at scale on quantum devices. Density commuting QNNs improve capacity with minimal gradient complexity overhead, and density orthogonal neural networks admit a quadratic-to-constant gradient query advantage with minimal to no performance loss. We conduct numerical experiments on synthetic translationally invariant data and MNIST image data with hyperparameter optimisation to support our findings. Finally, we discuss the connection to post-variational quantum neural networks, measurement-based quantum machine learning and the dropout mechanism.
Submitted 30 May, 2024;
originally announced May 2024.
-
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout
Authors:
Ayan Banerjee,
Nityanand Mathur,
Josep Lladós,
Umapada Pal,
Anjan Dutta
Abstract:
Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions. Utilizing a pre-trained LLM for layout generation from text prompts, this framework introduces a technique for producing masked latents in specified bounding boxes for accurate object placement. It introduces a fusion mechanism for integrating attention maps and employs a diffusion U-Net for coherent composition, speeding up the drawing process. The resulting SVG is optimized using a pre-trained encoder and LPIPS loss with opacity modulation to maximize similarity. Additionally, this work explores the potential of primitive shapes in facilitating canvas completion in constrained environments. Through both qualitative and quantitative assessments, SVGCraft is demonstrated to surpass prior works in abstraction, recognizability, and detail, as evidenced by its performance metrics (CLIP-T: 0.4563, Cosine Similarity: 0.6342, Confusion: 0.66, Aesthetic: 6.7832). The code will be available at https://github.com/ayanban011/SVGCraft.
Submitted 30 March, 2024;
originally announced April 2024.
-
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models
Authors:
Shyam Marjit,
Harshit Singh,
Nityanand Mathur,
Sayak Paul,
Chia-Mu Yu,
Pin-Yu Chen
Abstract:
In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth offers a reduction in trainable parameters, it introduces a pronounced sensitivity to hyperparameters, leading to a compromise between parameter efficiency and the quality of T2I personalized image synthesis. Addressing these constraints, we introduce DiffuseKronA, a novel Kronecker product-based adaptation module that not only significantly reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. Crucially, DiffuseKronA mitigates the issue of hyperparameter sensitivity, delivering consistent high-quality generations across a wide range of hyperparameters, thereby diminishing the necessity for extensive fine-tuning. Furthermore, a more controllable decomposition makes DiffuseKronA more interpretable, and it can even achieve up to a 50% reduction with results comparable to LoRA-DreamBooth. Evaluated against diverse and complex input images and text prompts, DiffuseKronA consistently outperforms existing models, producing diverse images of higher quality with improved fidelity and a more accurate color distribution of objects, all the while upholding exceptional parameter efficiency, thus presenting a substantial advancement in the field of T2I generative modeling. Our project page, consisting of links to the code and pre-trained checkpoints, is available at https://diffusekrona.github.io/.
Submitted 28 February, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation
Authors:
Aradhya N. Mathur,
Phu Pham,
Aniket Bera,
Ojaswa Sharma
Abstract:
3D generation has rapidly accelerated in the past decade owing to the progress in the field of generative modeling. Score Distillation Sampling (SDS) based rendering has improved 3D asset generation to a great extent. Further, the recent work on Denoising Diffusion Policy Optimization (DDPO) demonstrates that the diffusion process is compatible with policy gradient methods and has been shown to improve 2D diffusion models using an aesthetic scoring function. We first show that this aesthetic scorer acts as a strong guide for a variety of SDS-based methods and demonstrate its effectiveness in text-to-3D synthesis. Further, we leverage the DDPO approach to improve the quality of the 3D rendering obtained from 2D diffusion models. Our approach, DDPO3D, employs the policy gradient method in tandem with aesthetic scoring. To the best of our knowledge, this is the first method that extends policy gradient methods to 3D score-based rendering and shows improvement across SDS-based methods such as DreamGaussian, which are currently driving research in text-to-3D synthesis. Our approach is compatible with score distillation-based methods, which would facilitate the integration of diverse reward functions into the generative process. Our project page can be accessed via https://ddpo3d.github.io.
Submitted 7 December, 2023;
originally announced December 2023.
-
CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis
Authors:
Nityanand Mathur,
Shyam Marjit,
Abhra Chaudhuri,
Anjan Dutta
Abstract:
With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like circles and straight lines. Although existing approaches achieve this by sketch-synthesis-through-optimization, they do so on the space of Bézier curves, which exhibit a wastefully large set of structures that they can evolve into, as most of them are non-essential for generating meaningful sketches. We present CLIPDrawX, an algorithm that provides significantly better visualizations for CLIP text embeddings, using only simple primitive shapes like straight lines and circles. This constrains the set of possible outputs to linear transformations on these primitives, thereby exhibiting an inherently simpler mathematical form. The synthesis process of CLIPDrawX can be tracked end-to-end, with each visual concept being explained exclusively in terms of primitives. Implementation will be released upon acceptance. Project Page: https://clipdrawx.github.io/.
Submitted 4 December, 2023;
originally announced December 2023.
-
Improved Financial Forecasting via Quantum Machine Learning
Authors:
Sohum Thakkar,
Skander Kazdaghli,
Natansh Mathur,
Iordanis Kerenidis,
André J. Ferreira-Martins,
Samurai Brito
Abstract:
Quantum algorithms have the potential to enhance machine learning across a variety of domains and applications. In this work, we show how quantum machine learning can be used to improve financial forecasting. First, we use classical and quantum Determinantal Point Processes to enhance Random Forest models for churn prediction, improving precision by almost 6%. Second, we design quantum neural network architectures with orthogonal and compound layers for credit risk assessment, which match classical performance with significantly fewer parameters. Our results demonstrate that leveraging quantum ideas can effectively enhance the performance of machine learning, both today as quantum-inspired classical ML solutions, and even more in the future, with the advent of better quantum hardware.
Submitted 3 April, 2024; v1 submitted 31 May, 2023;
originally announced June 2023.
-
Assessing New Hires' Programming Productivity Through UMETRIX -- An Industry Case Study
Authors:
Sai Anirudh Karre,
Neeraj Mathur,
Y. Raghu Reddy
Abstract:
New hires (novice or experienced) usually undergo an onboarding program for a specific period to get acquainted with the processes of the hiring organization and reach expected programming productivity levels. This paper presents a programming productivity framework developed as an outcome of a three-year-long industry study with small to medium-scale organizations using a usability evaluation and code recommendation tool, UMETRIX, to manage new hire programming productivity. We developed a programming productivity framework around this tool called "Utpada". Participating organizations expressed strong interest in relying on this programming productivity framework to assess the skill gap among new hires. It helped them identify under-performers early and plan their upskilling according to business needs. The participating organizations have seen an 89% rise in quality code contributions by new hires during their probation period compared to those of traditional new hires. This framework is reproducible for any new-hire team size and can be easily integrated into existing programming productivity improvement programs.
Submitted 5 May, 2023;
originally announced May 2023.
-
Quantum Vision Transformers
Authors:
El Amine Cherrat,
Iordanis Kerenidis,
Natansh Mathur,
Jonas Landman,
Martin Strahm,
Yun Yvonna Li
Abstract:
In this work, quantum transformers are designed and analysed in detail by extending the state-of-the-art classical transformer neural network architectures known to be very performant in natural language processing and image analysis. Building upon the previous work, which uses parametrised quantum circuits for data loading and orthogonal neural layers, we introduce three types of quantum transformers for training and inference, including a quantum transformer based on compound matrices, which guarantees a theoretical advantage of the quantum attention mechanism compared to its classical counterpart both in terms of asymptotic run time and the number of model parameters. These quantum architectures can be built using shallow quantum circuits and produce qualitatively different classification models. The three proposed quantum attention layers vary on the spectrum between closely following the classical transformers and exhibiting more quantum characteristics. As building blocks of the quantum transformer, we propose a novel method for loading a matrix as quantum states as well as two new trainable quantum orthogonal layers adaptable to different levels of connectivity and quality of quantum computers. We performed extensive simulations of the quantum transformers on standard medical image datasets that showed competitive, and at times better, performance compared to the classical benchmarks, including the best-in-class classical vision transformers. The quantum transformers we trained on these small-scale datasets require fewer parameters compared to standard classical benchmarks. Finally, we implemented our quantum transformers on superconducting quantum computers and obtained encouraging results for up to six-qubit experiments.
Submitted 20 February, 2024; v1 submitted 16 September, 2022;
originally announced September 2022.
-
LIFI: Towards Linguistically Informed Frame Interpolation
Authors:
Aradhya Neeraj Mathur,
Devansh Batra,
Yaman Kumar,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
In this work, we explore a new problem of frame interpolation for speech videos. Such content is today the major form of online communication. We try to solve this problem by using several deep learning video generation algorithms to generate the missing frames. We also provide examples where computer vision models, despite showing high performance on conventional non-linguistic metrics, fail to produce faithful interpolation of speech. With this motivation, we provide a new set of linguistically-informed metrics specifically targeted to the problem of speech video interpolation. We also release several datasets to test computer vision video generation models on their speech understanding.
Submitted 2 December, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics
Authors:
Nitika Mathur,
Timothy Baldwin,
Trevor Cohn
Abstract:
Automatic metrics are fundamental for the development and evaluation of machine translation systems. Judging whether, and to what extent, automatic metrics concur with the gold standard of human evaluation is not a straightforward problem. We show that current methods for judging metrics are highly sensitive to the translations used for assessment, particularly the presence of outliers, which often leads to falsely confident conclusions about a metric's efficacy. Finally, we turn to pairwise system ranking, developing a method for thresholding performance improvement under an automatic metric against human judgements, which allows quantification of type I versus type II errors incurred, i.e., insignificant human differences in system quality that are accepted, and significant human differences that are rejected. Together, these findings suggest improvements to the protocols for metric evaluation and system performance evaluation in machine translation.
Submitted 12 June, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Multimodal Medical Volume Colorization from 2D Style
Authors:
Aradhya Neeraj Mathur,
Apoorv Khattar,
Ojaswa Sharma
Abstract:
Colorization involves the synthesis of colors on a target image while preserving structural content as well as the semantics of the target image. This is a well-explored problem in 2D with many state-of-the-art solutions. We propose a novel deep learning-based approach for the colorization of 3D medical volumes. Our system is capable of directly mapping the colors of a 2D photograph to a 3D MRI volume in real-time, producing a high-fidelity color volume suitable for photo-realistic visualization. Since this work is the first of its kind, we discuss the full pipeline in detail and the challenges that it brings for 3D medical data. The colorization of medical MRI volumes also entails modality conversion that highlights the robustness of our approach in handling multi-modal data.
Submitted 6 April, 2020;
originally announced April 2020.
-
Load Balancing Optimization in LTE/LTE-A Cellular Networks: A Review
Authors:
Sumita Mishra,
Nidhi Mathur
Abstract:
During the past few decades, wireless technology has seen tremendous growth. The recent introduction of high-end mobile devices has further increased subscribers' demand for high bandwidth. Current cellular systems require manual configuration and management of networks, which is now costly, time-consuming, and error-prone due to the exponentially increasing number of mobile users and nodes. This has led to the introduction of self-organizing capabilities for network management with minimum human involvement, which are expected to permit higher end-user Quality of Service (QoS) along with lower operational and maintenance costs for telecom service providers. Self-organized cellular networks incorporate a collection of functions for automatic configuration, optimization, and maintenance of cellular networks. As mobile end users continue to use network resources while moving from one cell to another, the traffic load within a cell does not remain constant. Thus, load balancing, as a part of the self-organized network solution, has become one of the most active and emerging fields of research in cellular networks. It involves the transfer of load from overloaded cells to neighbouring cells with free resources for a more balanced load distribution, in order to maintain appropriate end-user experience and network performance. In this paper, a review of various load balancing techniques currently used in mobile networks is presented, with special emphasis on techniques that are suitable for the self-optimization feature in future cellular networks.
Submitted 23 December, 2014;
originally announced December 2014.