Search | arXiv e-print repository

On the Minimisation of Deterministic and History-Deterministic Generalised (co)Büchi Automata

Authors: Antonio Casares, Olivier Idir, Denis Kuperberg, Corto Mascle, Aditya Prakash

Abstract: We present a polynomial-time algorithm minimising the number of states of history-deterministic generalised coBüchi automata, building on the work of Abu Radi and Kupferman on coBüchi automata. On the other hand, we establish that the minimisation problem for both deterministic and history-deterministic generalised Büchi automata is NP-complete, as well as the problem of minimising at the same tim… ▽ More We present a polynomial-time algorithm minimising the number of states of history-deterministic generalised coBüchi automata, building on the work of Abu Radi and Kupferman on coBüchi automata. On the other hand, we establish that the minimisation problem for both deterministic and history-deterministic generalised Büchi automata is NP-complete, as well as the problem of minimising at the same time the number of states and colours of history-deterministic generalised coBüchi automata. △ Less

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.12777 [pdf, other]

Generalizable Human Gaussians for Sparse View Synthesis

Authors: Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella-Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, Aayush Prakash, Fernando De la Torre

Abstract: Recent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating {\em within the training data}, the challenge of generalizing to new scenes and objects from very sparse views persists. Specifically, modeling 3D… ▽ More Recent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating {\em within the training data}, the challenge of generalizing to new scenes and objects from very sparse views persists. Specifically, modeling 3D humans from sparse views presents formidable hurdles due to the inherent complexity of human geometry, resulting in inaccurate reconstructions of geometry and textures. To tackle this challenge, this paper leverages recent advancements in Gaussian Splatting and introduces a new method to learn generalizable human Gaussians that allows photorealistic and accurate view-rendering of a new human subject from a limited set of sparse views in a feed-forward manner. A pivotal innovation of our approach involves reformulating the learning of 3D Gaussian parameters into a regression process defined on the 2D UV space of a human template, which allows leveraging the strong geometry prior and the advantages of 2D convolutions. In addition, a multi-scaffold is proposed to effectively represent the offset details. Our method outperforms recent methods on both within-dataset generalization as well as cross-dataset generalization settings. △ Less

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.08620 [pdf, other]

History-Determinism vs Fair Simulation

Authors: Udi Boker, Thomas A. Henzinger, Karoliina Lehtinen, Aditya Prakash

Abstract: An automaton is history-deterministic if its nondeterminism can be resolved on the fly, only using the prefix of the word read so far. This mild form of nondeterminism has attracted particular attention for its applications in synthesis problems. An automaton $A$ is guidable with respect to a class $C$ of automata if it can fairly simulate every automaton in $C$ whose language is contained in that… ▽ More An automaton is history-deterministic if its nondeterminism can be resolved on the fly, only using the prefix of the word read so far. This mild form of nondeterminism has attracted particular attention for its applications in synthesis problems. An automaton $A$ is guidable with respect to a class $C$ of automata if it can fairly simulate every automaton in $C$ whose language is contained in that of $A$. In other words, guidable automata are those for which inclusion and simulation coincide, making them particularly interesting for model-checking. We study the connection between these two notions, and specifically the question of when they coincide. For classes of automata on which they do, deciding guidability, an otherwise challenging decision problem, reduces to deciding history-determinism, a problem that is starting to be well-understood for many classes. We provide a selection of sufficient criteria for a class of automata to guarantee the coincidence of the notions, and use them to show that the notions coincide for the most common automata classes, among which are $ω$-regular automata and many infinite-state automata with safety and reachability acceptance conditions, including vector addition systems with states, one-counter nets, pushdown-, Parikh-, and timed-automata. We also demonstrate that history-determinism and guidability do not always coincide, for example, for the classes of timed automata with a fixed number of clocks. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: Full version of the paper accepted at CONCUR 2024

arXiv:2407.02657 [pdf, other]

Large Scale Hierarchical Industrial Demand Time-Series Forecasting incorporating Sparsity

Authors: Harshavardhan Kamarthi, Aditya B. Sasanur, Xinjie Tong, Xingyu Zhou, James Peters, Joe Czyzyk, B. Aditya Prakash

Abstract: Hierarchical time-series forecasting (HTSF) is an important problem for many real-world business applications where the goal is to simultaneously forecast multiple time-series that are related to each other via a hierarchical relation. Recent works, however, do not address two important challenges that are typically observed in many demand forecasting applications at large companies. First, many t… ▽ More Hierarchical time-series forecasting (HTSF) is an important problem for many real-world business applications where the goal is to simultaneously forecast multiple time-series that are related to each other via a hierarchical relation. Recent works, however, do not address two important challenges that are typically observed in many demand forecasting applications at large companies. First, many time-series at lower levels of the hierarchy have high sparsity i.e., they have a significant number of zeros. Most HTSF methods do not address this varying sparsity across the hierarchy. Further, they do not scale well to the large size of the real-world hierarchy typically unseen in benchmarks used in literature. We resolve both these challenges by proposing HAILS, a novel probabilistic hierarchical model that enables accurate and calibrated probabilistic forecasts across the hierarchy by adaptively modeling sparse and dense time-series with different distributional assumptions and reconciling them to adhere to hierarchical constraints. We show the scalability and effectiveness of our methods by evaluating them against real-world demand forecasting datasets. We deploy HAILS at a large chemical manufacturing company for a product demand forecasting application with over ten thousand products and observe a significant 8.5\% improvement in forecast accuracy and 23% better improvement for sparse time-series. The enhanced accuracy and scalability make HAILS a valuable tool for improved business planning and customer experience. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted at KDD 2024

arXiv:2407.02641 [pdf, other]

Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting

Authors: Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodriguez, Chao Zhang, B Aditya Prakash

Abstract: Multi-variate time series forecasting is an important problem with a wide range of applications. Recent works model the relations between time-series as graphs and have shown that propagating information over the relation graph can improve time series forecasting. However, in many cases, relational information is not available or is noisy and reliable. Moreover, most works ignore the underlying un… ▽ More Multi-variate time series forecasting is an important problem with a wide range of applications. Recent works model the relations between time-series as graphs and have shown that propagating information over the relation graph can improve time series forecasting. However, in many cases, relational information is not available or is noisy and reliable. Moreover, most works ignore the underlying uncertainty of time-series both for structure learning and deriving the forecasts resulting in the structure not capturing the uncertainty resulting in forecast distributions with poor uncertainty estimates. We tackle this challenge and introduce STOIC, that leverages stochastic correlations between time-series to learn underlying structure between time-series and to provide well-calibrated and accurate forecasts. Over a wide-range of benchmark datasets STOIC provides around 16% more accurate and 14% better-calibrated forecasts. STOIC also shows better adaptation to noise in data during inference and captures important and useful relational information in various benchmarks. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.12747 [pdf, other]

TSI-Bench: Benchmarking Time Series Imputation

Authors: Wenjie Du, Jun Wang, Linglong Qian, Yiyuan Yang, Fanxing Liu, Zepu Wang, Zina Ibrahim, Haoxin Liu, Zhiyuan Zhao, Yingjie Zhou, Wenjia Wang, Kaize Ding, Yuxuan Liang, B. Aditya Prakash, Qingsong Wen

Abstract: Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellen… ▽ More Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellent performance, whether their modeling achievements can be transferred to time series imputation tasks remains unexplored. To bridge these gaps, we develop TSI-Bench, the first (to our knowledge) comprehensive benchmark suite for time series imputation utilizing deep learning techniques. The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms and identification of meaningful insights into the influence of domain-appropriate missingness ratios and patterns on model performance. Furthermore, TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes. Our extensive study across 34,804 experiments, 28 algorithms, and 8 datasets with diverse missingness scenarios demonstrates TSI-Bench's effectiveness in diverse downstream tasks and potential to unlock future directions in time series imputation research and analysis. The source code and experiment logs are available at https://github.com/WenjieDu/AwesomeImputation. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.09130 [pdf, other]

Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning

Authors: Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, B. Aditya Prakash

Abstract: Time-series forecasting (TSF) finds broad applications in real-world scenarios. Due to the dynamic nature of time-series data, it is crucial to equip TSF models with out-of-distribution (OOD) generalization abilities, as historical training data and future test data can have different distributions. In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning. We ident… ▽ More Time-series forecasting (TSF) finds broad applications in real-world scenarios. Due to the dynamic nature of time-series data, it is crucial to equip TSF models with out-of-distribution (OOD) generalization abilities, as historical training data and future test data can have different distributions. In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning. We identify fundamental challenges of invariant learning for TSF. First, the target variables in TSF may not be sufficiently determined by the input due to unobserved core variables in TSF, breaking the conventional assumption of invariant learning. Second, time-series datasets lack adequate environment labels, while existing environmental inference methods are not suitable for TSF. To address these challenges, we propose FOIL, a model-agnostic framework that enables timeseries Forecasting for Out-of-distribution generalization via Invariant Learning. FOIL employs a novel surrogate loss to mitigate the impact of unobserved variables. Further, FOIL implements a joint optimization by alternately inferring environments effectively with a multi-head network while preserving the temporal adjacency structure, and learning invariant representations across inferred environments for OOD generalized TSF. We demonstrate that the proposed FOIL significantly improves the performance of various TSF models, achieving gains of up to 85%. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages

ACM Class: H.0

arXiv:2406.08627 [pdf, other]

Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis

Authors: Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, B. Aditya Prakash

Abstract: Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of text… ▽ More Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of textual series data and the absence of a comprehensive, high-quality multimodal dataset. To overcome this obstacle, we introduce Time-MMD, the first multi-domain, multimodal time series dataset covering 9 primary data domains. Time-MMD ensures fine-grained modality alignment, eliminates data contamination, and provides high usability. Additionally, we develop MM-TSFlib, the first multimodal time-series forecasting (TSF) library, seamlessly pipelining multimodal TSF evaluations based on Time-MMD for in-depth analyses. Extensive experiments conducted on Time-MMD through MM-TSFlib demonstrate significant performance enhancements by extending unimodal TSF to multimodality, evidenced by over 15% mean squared error reduction in general, and up to 40% in domains with rich textual data. More importantly, our datasets and library revolutionize broader applications, impacts, research topics to advance TSA. The dataset and library are available at https://github.com/AdityaLab/Time-MMD and https://github.com/AdityaLab/MM-TSFlib. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.04273 [pdf, other]

ELFS: Enhancing Label-Free Coreset Selection via Clustering-based Pseudo-Labeling

Authors: Haizhong Zheng, Elisa Tsai, Yifu Lu, Jiachen Sun, Brian R. Bartoldson, Bhavya Kailkhura, Atul Prakash

Abstract: High-quality human-annotated data is crucial for modern deep learning pipelines, yet the human annotation process is both costly and time-consuming. Given a constrained human labeling budget, selecting an informative and representative data subset for labeling can significantly reduce human annotation effort. Well-performing state-of-the-art (SOTA) coreset selection methods require ground-truth la… ▽ More High-quality human-annotated data is crucial for modern deep learning pipelines, yet the human annotation process is both costly and time-consuming. Given a constrained human labeling budget, selecting an informative and representative data subset for labeling can significantly reduce human annotation effort. Well-performing state-of-the-art (SOTA) coreset selection methods require ground-truth labels over the whole dataset, failing to reduce the human labeling burden. Meanwhile, SOTA label-free coreset selection methods deliver inferior performance due to poor geometry-based scores. In this paper, we introduce ELFS, a novel label-free coreset selection method. ELFS employs deep clustering to estimate data difficulty scores without ground-truth labels. Furthermore, ELFS uses a simple but effective double-end pruning method to mitigate bias on calculated scores, which further improves the performance on selected coresets. We evaluate ELFS on five vision benchmarks and show that ELFS consistently outperforms SOTA label-free baselines. For instance, at a 90% pruning rate, ELFS surpasses the best-performing baseline by 5.3% on CIFAR10 and 7.1% on CIFAR100. Moreover, ELFS even achieves comparable performance to supervised coreset selection at low pruning rates (e.g., 30% and 50%) on CIFAR10 and ImageNet-1K. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.14899 [pdf, other]

DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning

Authors: Zijian Zhou, Xiaoqiang Lin, Xinyi Xu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low

Abstract: In-context learning (ICL) allows transformer-based language models that are pre-trained on general text to quickly learn a specific task with a few "task demonstrations" without updating their parameters, significantly boosting their flexibility and generality. ICL possesses many distinct characteristics from conventional machine learning, thereby requiring new approaches to interpret this learnin… ▽ More In-context learning (ICL) allows transformer-based language models that are pre-trained on general text to quickly learn a specific task with a few "task demonstrations" without updating their parameters, significantly boosting their flexibility and generality. ICL possesses many distinct characteristics from conventional machine learning, thereby requiring new approaches to interpret this learning paradigm. Taking the viewpoint of recent works showing that transformers learn in context by formulating an internal optimizer, we propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL. We empirically verify the effectiveness of our approach for demonstration attribution while being computationally efficient. Leveraging the results, we then show how DETAIL can help improve model performance in real-world scenarios through demonstration reordering and curation. Finally, we experimentally prove the wide applicability of DETAIL by showing our attribution scores obtained on white-box models are transferable to black-box models in improving model performance. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2404.17530 [pdf, other]

Lookahead Games and Efficient Determinisation of History-Deterministic Büchi Automata

Authors: Rohan Acharya, Marcin Jurdziński, Aditya Prakash

Abstract: Our main technical contribution is a polynomial-time determinisation procedure for history-deterministic Büchi automata, which settles an open question of Kuperberg and Skrzypczak, 2015. A key conceptual contribution is the lookahead game, which is a variant of Bagnol and Kuperberg's token game, in which Adam is given a fixed lookahead. We prove that the lookahead game is equivalent to the 1-token… ▽ More Our main technical contribution is a polynomial-time determinisation procedure for history-deterministic Büchi automata, which settles an open question of Kuperberg and Skrzypczak, 2015. A key conceptual contribution is the lookahead game, which is a variant of Bagnol and Kuperberg's token game, in which Adam is given a fixed lookahead. We prove that the lookahead game is equivalent to the 1-token game. This allows us to show that the 1-token game characterises history-determinism for semantically-deterministic Büchi automata, which paves the way to our polynomial-time determinisation procedure. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: Full version of paper accepted at ICALP 2024

arXiv:2404.04743 [pdf, other]

doi 10.1145/3656418

From Batch to Stream: Automatic Generation of Online Algorithms

Authors: Ziteng Wang, Shankara Pailoor, Aaryan Prakash, Yuepeng Wang, Isil Dillig

Abstract: Online streaming algorithms, tailored for continuous data processing, offer substantial benefits but are often more intricate to design than their offline counterparts. This paper introduces a novel approach for automatically synthesizing online streaming algorithms from their offline versions. In particular, we propose a novel methodology, based on the notion of relational function signature (RFS… ▽ More Online streaming algorithms, tailored for continuous data processing, offer substantial benefits but are often more intricate to design than their offline counterparts. This paper introduces a novel approach for automatically synthesizing online streaming algorithms from their offline versions. In particular, we propose a novel methodology, based on the notion of relational function signature (RFS), for deriving an online algorithm given its offline version. Then, we propose a concrete synthesis algorithm that is an instantiation of the proposed methodology. Our algorithm uses the RFS to decompose the synthesis problem into a set of independent subtasks and uses a combination of symbolic reasoning and search to solve each subproblem. We implement the proposed technique in a new tool called Opera and evaluate it on over 50 tasks spanning two domains: statistical computations and online auctions. Our results show that Opera can automatically derive the online version of the original algorithm for 98% of the tasks. Our experiments also demonstrate that Opera significantly outperforms alternative approaches, including adaptations of SyGuS solvers to this problem as well as two of Opera's own ablations. △ Less

Submitted 8 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

ACM Class: D.3.0

Journal ref: Proc. ACM Program. Lang. 8, PLDI, Article 188 (June 2024), 32 pages

arXiv:2403.19852 [pdf, other]

A Review of Graph Neural Networks in Epidemic Modeling

Authors: Zewen Liu, Guancheng Wan, B. Aditya Prakash, Max S. Y. Lau, Wei Jin

Abstract: Since the onset of the COVID-19 pandemic, there has been a growing interest in studying epidemiological models. Traditional mechanistic models mathematically describe the transmission mechanisms of infectious diseases. However, they often suffer from limitations of oversimplified or fixed assumptions, which could cause sub-optimal predictive power and inefficiency in capturing complex relation inf… ▽ More Since the onset of the COVID-19 pandemic, there has been a growing interest in studying epidemiological models. Traditional mechanistic models mathematically describe the transmission mechanisms of infectious diseases. However, they often suffer from limitations of oversimplified or fixed assumptions, which could cause sub-optimal predictive power and inefficiency in capturing complex relation information. Consequently, Graph Neural Networks (GNNs) have emerged as a progressively popular tool in epidemic research. In this paper, we endeavor to furnish a comprehensive review of GNNs in epidemic tasks and highlight potential future directions. To accomplish this objective, we introduce hierarchical taxonomies for both epidemic tasks and methodologies, offering a trajectory of development within this domain. For epidemic tasks, we establish a taxonomy akin to those typically employed within the epidemic domain. For methodology, we categorize existing work into Neural Models and Hybrid Models. Following this, we perform an exhaustive and systematic examination of the methodologies, encompassing both the tasks and their technical details. Furthermore, we discuss the limitations of existing methods from diverse perspectives and systematically propose future research directions. This survey aims to bridge literature gaps and promote the progression of this promising field, with a list of relevant papers at https://github.com/Emory-Melody/awesome-epidemic-modelingpapers. We hope that it will facilitate synergies between the communities of GNNs and epidemiology, and contribute to their collective progress. △ Less

Submitted 21 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.16428 [pdf, other]

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Zheng Liu, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3Dunderstanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the h… ▽ More We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3Dunderstanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the head movement. To this end, we designed the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks. Our analysis demonstrates the effectiveness of addressing distortion specific to egocentric cameras, adopting high-capacity transformers to learn complex hand-object interactions, and fusing predictions from different views. Our study further reveals challenging scenarios intractable with state-of-the-art methods, such as fast hand motion, object reconstruction from narrow egocentric views, and close contact between two hands and objects. Our efforts will enrich the community's knowledge foundation and facilitate future hand studies on egocentric hand-object interactions. △ Less

Submitted 5 August, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Accepted to ECCV 2024

arXiv:2403.05578 [pdf, other]

Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners

Authors: Shanu Vashishtha, Abhinav Prakash, Lalitesh Morishetti, Kaushiki Nag, Yokila Arora, Sushant Kumar, Kannan Achan

Abstract: Text-to-image models such as stable diffusion have opened a plethora of opportunities for generating art. Recent literature has surveyed the use of text-to-image models for enhancing the work of many creative artists. Many e-commerce platforms employ a manual process to generate the banners, which is time-consuming and has limitations of scalability. In this work, we demonstrate the use of text-to… ▽ More Text-to-image models such as stable diffusion have opened a plethora of opportunities for generating art. Recent literature has surveyed the use of text-to-image models for enhancing the work of many creative artists. Many e-commerce platforms employ a manual process to generate the banners, which is time-consuming and has limitations of scalability. In this work, we demonstrate the use of text-to-image models for generating personalized web banners with dynamic content for online shoppers based on their interactions. The novelty in this approach lies in converting users' interaction data to meaningful prompts without human intervention. To this end, we utilize a large language model (LLM) to systematically extract a tuple of attributes from item meta-information. The attributes are then passed to a text-to-image model via prompt engineering to generate images for the banner. Our results show that the proposed approach can create high-quality personalized banners for users. △ Less

Submitted 28 February, 2024; originally announced March 2024.

Comments: 10 pages

arXiv:2403.04781 [pdf]

Selective Encryption using Segmentation Mask with Chaotic Henon Map for Multidimensional Medical Images

Authors: S Arut Prakash, Aditya Ganesh Kumar, Prabhu Shankar K. C., Lithicka Anandavel, Aditya Lakshmi Narayanan

Abstract: A user-centric design and resource optimization should be at the center of any technology or innovation. The user-centric perspective gives the developer the opportunity to develop with task-based optimization. The user in the medical image field is a medical professional who analyzes the medical images and gives their diagnosis results to the patient. This scheme, having the medical professional… ▽ More A user-centric design and resource optimization should be at the center of any technology or innovation. The user-centric perspective gives the developer the opportunity to develop with task-based optimization. The user in the medical image field is a medical professional who analyzes the medical images and gives their diagnosis results to the patient. This scheme, having the medical professional user's perspective, innovates in the area of Medical Image storage and security. The architecture is designed with three main segments, namely: Segmentation, Storage, and Retrieval. This architecture was designed owing to the fact that the number of retrieval operations done by medical professionals was toweringly higher when compared to the storage operations done for some handful number of times for a particular medical image. This gives room for our innovation to segment out the medically indispensable part of the medical image, encrypt it, and store it. By encrypting the vital parts of the image using a strong encryption algorithm like the chaotic Henon map, we are able to keep the security intact. Now retrieving the medical image demands only the computationally less stressing decryption of the segmented region of interest. The decryption of the segmented region of interest results in the full recovery of the medical image which can be viewed on demand by the medical professionals for various diagnosis purposes. In this scheme, we were able to achieve a retrieval speed improvement of around 47% when compared to a full image encryption of brain medical CT images. △ Less

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.16132 [pdf, other]

LSTPrompt: Large Language Models as Zero-Shot Time Series Forecasters by Long-Short-Term Prompting

Authors: Haoxin Liu, Zhiyuan Zhao, Jindong Wang, Harshavardhan Kamarthi, B. Aditya Prakash

Abstract: Time-series forecasting (TSF) finds broad applications in real-world scenarios. Prompting off-the-shelf Large Language Models (LLMs) demonstrates strong zero-shot TSF capabilities while preserving computational efficiency. However, existing prompting methods oversimplify TSF as language next-token predictions, overlooking its dynamic nature and lack of integration with state-of-the-art prompt stra… ▽ More Time-series forecasting (TSF) finds broad applications in real-world scenarios. Prompting off-the-shelf Large Language Models (LLMs) demonstrates strong zero-shot TSF capabilities while preserving computational efficiency. However, existing prompting methods oversimplify TSF as language next-token predictions, overlooking its dynamic nature and lack of integration with state-of-the-art prompt strategies such as Chain-of-Thought. Thus, we propose LSTPrompt, a novel approach for prompting LLMs in zero-shot TSF tasks. LSTPrompt decomposes TSF into short-term and long-term forecasting sub-tasks, tailoring prompts to each. LSTPrompt guides LLMs to regularly reassess forecasting mechanisms to enhance adaptability. Extensive evaluations demonstrate consistently better performance of LSTPrompt than existing prompting methods, and competitive results compared to foundation TSF models. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: 9 pages, 4 figures, 3 tables, 2 page references, 2 page appendix

arXiv:2402.15911 [pdf, other]

PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails

Authors: Neal Mangaokar, Ashish Hooda, Jihye Choi, Shreyas Chandrashekaran, Kassem Fawaz, Somesh Jha, Atul Prakash

Abstract: Large language models (LLMs) are typically aligned to be harmless to humans. Unfortunately, recent work has shown that such models are susceptible to automated jailbreak attacks that induce them to generate harmful content. More recent LLMs often incorporate an additional layer of defense, a Guard Model, which is a second LLM that is designed to check and moderate the output response of the primar… ▽ More Large language models (LLMs) are typically aligned to be harmless to humans. Unfortunately, recent work has shown that such models are susceptible to automated jailbreak attacks that induce them to generate harmful content. More recent LLMs often incorporate an additional layer of defense, a Guard Model, which is a second LLM that is designed to check and moderate the output response of the primary LLM. Our key contribution is to show a novel attack strategy, PRP, that is successful against several open-source (e.g., Llama 2) and closed-source (e.g., GPT 3.5) implementations of Guard Models. PRP leverages a two step prefix-based attack that operates by (a) constructing a universal adversarial prefix for the Guard Model, and (b) propagating this prefix to the response. We find that this procedure is effective across multiple threat models, including ones in which the adversary has no access to the Guard Model at all. Our work suggests that further advances are required on defenses and Guard Models before they can be considered effective. △ Less

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.12280 [pdf, other]

Adaptive Skeleton Graph Decoding

Authors: Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo

Abstract: Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs. Recent approaches propose parallel decoding strategies, such as Skeleton-of-Thought (SoT), to improve performance by breaking prompts down into sub-problems that can b… ▽ More Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs. Recent approaches propose parallel decoding strategies, such as Skeleton-of-Thought (SoT), to improve performance by breaking prompts down into sub-problems that can be decoded in parallel; however, they often suffer from reduced response quality. Our key insight is that we can request additional information, specifically dependencies and difficulty, when generating the sub-problems to improve both response quality and performance. In this paper, we propose Skeleton Graph Decoding (SGD), which uses dependencies exposed between sub-problems to support information forwarding between dependent sub-problems for improved quality while exposing parallelization opportunities for decoding independent sub-problems. Additionally, we leverage difficulty estimates for each sub-problem to select an appropriately-sized model, improving performance without significantly reducing quality. Compared to standard autoregressive generation and SoT, SGD achieves a 1.69x speedup while improving quality by up to 51%. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.06126 [pdf, other]

Learn To be Efficient: Build Structured Sparsity in Large Language Models

Authors: Haizhong Zheng, Xiaoyan Bai, Xueshen Liu, Z. Morley Mao, Beidi Chen, Fan Lai, Atul Prakash

Abstract: Large Language Models (LLMs) have achieved remarkable success with their billion-level parameters, yet they incur high inference overheads. The emergence of activation sparsity in LLMs provides a natural approach to reduce this cost by involving only parts of the parameters for inference. However, existing methods only focus on utilizing this naturally formed activation sparsity in a post-training… ▽ More Large Language Models (LLMs) have achieved remarkable success with their billion-level parameters, yet they incur high inference overheads. The emergence of activation sparsity in LLMs provides a natural approach to reduce this cost by involving only parts of the parameters for inference. However, existing methods only focus on utilizing this naturally formed activation sparsity in a post-training setting, overlooking the potential for further amplifying this inherent sparsity. In this paper, we hypothesize that LLMs can learn to be efficient by achieving more structured activation sparsity. To achieve this, we introduce a novel training algorithm, Learn-To-be-Efficient (LTE), designed to train efficiency-aware LLMs to learn to activate fewer neurons and achieve a better trade-off between sparsity and performance. Furthermore, unlike SOTA MoEfication methods, which mainly focus on ReLU-based models, LTE can also be applied to LLMs like LLaMA using non-ReLU activations. Extensive evaluation on language understanding, language generation, and instruction tuning tasks show that LTE consistently outperforms SOTA baselines. Along with our hardware-aware custom kernel implementation, LTE reduces LLaMA2-7B inference latency by 25% at 50% sparsity. △ Less

Submitted 3 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03358 [pdf, other]

A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation

Authors: Mohammad Hashemi, Shengbo Gong, Juntong Ni, Wenqi Fan, B. Aditya Prakash, Wei Jin

Abstract: Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction, or graph summarization, has gained prominence for simplifying large graphs while preserving essential properties. In this survey, we aim to provide… ▽ More Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction, or graph summarization, has gained prominence for simplifying large graphs while preserving essential properties. In this survey, we aim to provide a comprehensive understanding of graph reduction methods, including graph sparsification, graph coarsening, and graph condensation. Specifically, we establish a unified definition for these methods and introduce a hierarchical taxonomy to categorize the challenges they address. Our survey then systematically reviews the technical details of these methods and emphasizes their practical applications across diverse scenarios. Furthermore, we outline critical research directions to ensure the continued effectiveness of graph reduction techniques, as well as provide a comprehensive paper list at \url{https://github.com/Emory-Melody/awesome-graph-reduction}. We hope this survey will bridge literature gaps and propel the advancement of this promising field. △ Less

Submitted 29 June, 2024; v1 submitted 28 January, 2024; originally announced February 2024.

Comments: Accepted by IJCAI 2024 (This ArXiv version is a long version of our IJCAI paper)

arXiv:2401.12437 [pdf, other]

Convex-Concave Zero-sum Markov Stackelberg Games

Authors: Denizalp Goktas, Arjun Prakash, Amy Greenwald

Abstract: Zero-sum Markov Stackelberg games can be used to model myriad problems, in domains ranging from economics to human robot interaction. In this paper, we develop policy gradient methods that solve these games in continuous state and action settings using noisy gradient estimates computed from observed trajectories of play. When the games are convex-concave, we prove that our algorithms converge to S… ▽ More Zero-sum Markov Stackelberg games can be used to model myriad problems, in domains ranging from economics to human robot interaction. In this paper, we develop policy gradient methods that solve these games in continuous state and action settings using noisy gradient estimates computed from observed trajectories of play. When the games are convex-concave, we prove that our algorithms converge to Stackelberg equilibrium in polynomial time. We also show that reach-avoid problems are naturally modeled as convex-concave zero-sum Markov Stackelberg games, and that Stackelberg equilibrium policies are more effective than their Nash counterparts in these problems. △ Less

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2312.06594 [pdf, other]

Mitigating Perspective Distortion-induced Shape Ambiguity in Image Crops

Authors: Aditya Prakash, Arjun Gupta, Saurabh Gupta

Abstract: Objects undergo varying amounts of perspective distortion as they move across a camera's field of view. Models for predicting 3D from a single image often work with crops around the object of interest and ignore the location of the object in the camera's field of view. We note that ignoring this location information further exaggerates the inherent ambiguity in making 3D inferences from 2D images… ▽ More Objects undergo varying amounts of perspective distortion as they move across a camera's field of view. Models for predicting 3D from a single image often work with crops around the object of interest and ignore the location of the object in the camera's field of view. We note that ignoring this location information further exaggerates the inherent ambiguity in making 3D inferences from 2D images and can prevent models from even fitting to the training data. To mitigate this ambiguity, we propose Intrinsics-Aware Positional Encoding (KPE), which incorporates information about the location of crops in the image and camera intrinsics. Experiments on three popular 3D-from-a-single-image benchmarks: depth prediction on NYU, 3D object detection on KITTI & nuScenes, and predicting 3D shapes of articulated objects on ARCTIC, show the benefits of KPE. △ Less

Submitted 11 December, 2023; originally announced December 2023.

Comments: Project Page: https://ap229997.github.io/projects/ambiguity/

arXiv:2312.06583 [pdf, other]

3D Hand Pose Estimation in Egocentric Images in the Wild

Authors: Aditya Prakash, Ruisen Tu, Matthew Chang, Saurabh Gupta

Abstract: We present WildHands, a method for 3D hand pose estimation in egocentric images in the wild. This is challenging due to (a) lack of 3D hand pose annotations for images in the wild, and (b) a form of perspective distortion-induced shape ambiguity that arises in the analysis of crops around hands. For the former, we use auxiliary supervision on in-the-wild data in the form of segmentation masks & gr… ▽ More We present WildHands, a method for 3D hand pose estimation in egocentric images in the wild. This is challenging due to (a) lack of 3D hand pose annotations for images in the wild, and (b) a form of perspective distortion-induced shape ambiguity that arises in the analysis of crops around hands. For the former, we use auxiliary supervision on in-the-wild data in the form of segmentation masks & grasp labels in addition to 3D supervision available in lab datasets. For the latter, we provide spatial cues about the location of the hand crop in the camera's field of view. Our approach achieves the best 3D hand pose on the ARCTIC leaderboard and outperforms FrankMocap, a popular and robust approach for estimating hand pose in the wild, by 45.3% when evaluated on 2D hand pose on our EPIC-HandKps dataset. △ Less

Submitted 11 December, 2023; originally announced December 2023.

Comments: Project page: https://ap229997.github.io/projects/hands/

arXiv:2311.11413 [pdf, other]

Large Pre-trained time series models for cross-domain Time series analysis tasks

Authors: Harshavardhan Kamarthi, B. Aditya Prakash

Abstract: Large pre-trained models have been vital in recent advancements in domains like language and vision, making model training for individual downstream tasks more efficient and provide superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch leveraging training data and domain expertise specific to the task. We tackle a s… ▽ More Large pre-trained models have been vital in recent advancements in domains like language and vision, making model training for individual downstream tasks more efficient and provide superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch leveraging training data and domain expertise specific to the task. We tackle a significant challenge for pre-training a foundational time-series model from multi-domain time-series datasets: extracting semantically useful tokenized inputs to the model across heterogenous time-series from different domains. We propose Large Pre-trained Time-series Models (LPTM) that introduces a novel method of \textit{adaptive segmentation} that automatically identifies optimal dataset-specific segmentation strategy during pre-training. This enables LPTM to perform similar to or better than domain-specific state-of-art model when fine-tuned to different downstream time-series analysis tasks and under zero-shot settings. LPTM achieves superior forecasting and time-series classification results taking up to 40% less data and 50% less training time compared to state-of-art baselines. △ Less

Submitted 11 July, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: 16 pages, 5 Figures, 4 Tables

arXiv:2311.07841 [pdf, other]

PEMS: Pre-trained Epidemic Time-series Models

Authors: Harshavardhan Kamarthi, B. Aditya Prakash

Abstract: Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that leveraging data-driven solutions that utilize advances in deep learning methods to learn from past data of an epidemic often outperform traditional mechanistic models. However, in many cases, the past data is sparse and may… ▽ More Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that leveraging data-driven solutions that utilize advances in deep learning methods to learn from past data of an epidemic often outperform traditional mechanistic models. However, in many cases, the past data is sparse and may not sufficiently capture the underlying dynamics. While there exists a large amount of data from past epidemics, leveraging prior knowledge from time-series data of other diseases is a non-trivial challenge. Motivated by the success of pre-trained models in language and vision tasks, we tackle the problem of pre-training epidemic time-series models to learn from multiple datasets from different diseases and epidemics. We introduce Pre-trained Epidemic Time-Series Models (PEMS) that learn from diverse time-series datasets of a variety of diseases by formulating pre-training as a set of self-supervised learning (SSL) tasks. We tackle various important challenges specific to pre-training for epidemic time-series such as dealing with heterogeneous dynamics and efficiently capturing useful patterns from multiple epidemic datasets by carefully designing the SSL tasks to learn important priors about the epidemic dynamics that can be leveraged for fine-tuning to multiple downstream tasks. The resultant PEM outperforms previous state-of-the-art methods in various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion including the novel Covid-19 pandemic unseen in pre-trained data with better efficiency using smaller fraction of datasets. △ Less

Submitted 19 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: 18 pages

arXiv:2310.13498 [pdf, other]

Checking History-Determinism is NP-hard for Parity Automata

Authors: Aditya Prakash

Abstract: We show that the problem of checking if a given nondeterministic parity automaton simulates another given nondeterministic parity automaton is NP-hard. We then adapt the techniques used for this result to show that the problem of checking history-determinism for a given parity automaton is NP-hard. This is an improvement from Kuperberg and Skrzypczak's previous lower bound of solving parity games… ▽ More We show that the problem of checking if a given nondeterministic parity automaton simulates another given nondeterministic parity automaton is NP-hard. We then adapt the techniques used for this result to show that the problem of checking history-determinism for a given parity automaton is NP-hard. This is an improvement from Kuperberg and Skrzypczak's previous lower bound of solving parity games from 2015. We also show that deciding if Eve wins the one-token game or the two-token game of a given parity automaton is NP-hard. Finally, we show that the problem of deciding if the language of a nondeterministic parity automaton is contained in the language of a history-deterministic parity automaton can be solved in quasi-polynomial time. △ Less

Submitted 23 January, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: Full version of the paper accepted at FoSSaCS 2024. Some minor editorial changes from the previous version following suggestions from anonymous reviewers

arXiv:2310.11569

When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting

Authors: Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodríguez, Chao Zhang, B. Aditya Prakash

Abstract: Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions. Recent state-of-art probabilistic forecasting methods also impose hierarc… ▽ More Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions. Recent state-of-art probabilistic forecasting methods also impose hierarchical relations on point predictions and samples of distribution which does not account for coherency of forecast distributions. Previous works also silently assume that datasets are always consistent with given hierarchical relations and do not adapt to real-world datasets that show deviation from this assumption. We close both these gap and propose PROFHiT, which is a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy. PROFHiT uses a flexible probabilistic Bayesian approach and introduces a novel Distributional Coherency regularization to learn from hierarchical relations for entire forecast distribution that enables robust and calibrated forecasts as well as adapt to datasets of varying hierarchical consistency. On evaluating PROFHiT over wide range of datasets, we observed 41-88% better performance in accuracy and significantly better calibration. Due to modeling the coherency over full distribution, we observed that PROFHiT can robustly provide reliable forecasts even if up to 10% of input time-series data is missing where other methods' performance severely degrade by over 70%. △ Less

Submitted 19 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: I have submitted this paper as a revision of arXiv:2206.07940. Apologies for the confusion

arXiv:2310.07506 [pdf, other]

Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation

Authors: Haizhong Zheng, Jiachen Sun, Shutong Wu, Bhavya Kailkhura, Zhuoqing Mao, Chaowei Xiao, Atul Prakash

Abstract: Given a real-world dataset, data condensation (DC) aims to synthesize a small synthetic dataset that captures the knowledge of a natural dataset while being usable for training models with comparable accuracy. Recent works propose to enhance DC with data parameterization, which condenses data into very compact parameterized data containers instead of images. The intuition behind data parameterizat… ▽ More Given a real-world dataset, data condensation (DC) aims to synthesize a small synthetic dataset that captures the knowledge of a natural dataset while being usable for training models with comparable accuracy. Recent works propose to enhance DC with data parameterization, which condenses data into very compact parameterized data containers instead of images. The intuition behind data parameterization is to encode shared features of images to avoid additional storage costs. In this paper, we recognize that images share common features in a hierarchical way due to the inherent hierarchical structure of the classification system, which is overlooked by current data parameterization methods. To better align DC with this hierarchical nature and encourage more efficient information sharing inside data containers, we propose a novel data parameterization architecture, Hierarchical Memory Network (HMN). HMN stores condensed data in a three-tier structure, representing the dataset-level, class-level, and instance-level features. Another helpful property of the hierarchical architecture is that HMN naturally ensures good independence among images despite achieving information sharing. This enables instance-level pruning for HMN to reduce redundant information, thereby further minimizing redundancy and enhancing performance. We evaluate HMN on five public datasets and show that our proposed method outperforms all baselines. △ Less

Submitted 18 July, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Journal ref: ECCV 2024

arXiv:2310.06077 [pdf, other]

Performative Time-Series Forecasting

Authors: Zhiyuan Zhao, Alexander Rodriguez, B. Aditya Prakash

Abstract: Time-series forecasting is a critical challenge in various domains and has witnessed substantial progress in recent years. Many real-life scenarios, such as public health, economics, and social applications, involve feedback loops where predictions can influence the predicted outcome, subsequently altering the target variable's distribution. This phenomenon, known as performativity, introduces the… ▽ More Time-series forecasting is a critical challenge in various domains and has witnessed substantial progress in recent years. Many real-life scenarios, such as public health, economics, and social applications, involve feedback loops where predictions can influence the predicted outcome, subsequently altering the target variable's distribution. This phenomenon, known as performativity, introduces the potential for 'self-negating' or 'self-fulfilling' predictions. Despite extensive studies in classification problems across domains, performativity remains largely unexplored in the context of time-series forecasting from a machine-learning perspective. In this paper, we formalize performative time-series forecasting (PeTS), addressing the challenge of accurate predictions when performativity-induced distribution shifts are possible. We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts and subsequently predicts targets accordingly. We provide theoretical insights suggesting that FPS can potentially lead to reduced generalization error. We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks. The results demonstrate that FPS consistently outperforms conventional time-series forecasting methods, highlighting its efficacy in handling performativity-induced challenges. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: 12 pages (7 main text, 2 reference, 3 appendix), 3 figures, 4 tables

arXiv:2309.06252 [pdf, other]

Predicting Routine Object Usage for Proactive Robot Assistance

Authors: Maithili Patel, Aswin Prakash, Sonia Chernova

Abstract: Proactivity in robot assistance refers to the robot's ability to anticipate user needs and perform assistive actions without explicit requests. This requires understanding user routines, predicting consistent activities, and actively seeking information to predict inconsistent behaviors. We propose SLaTe-PRO (Sequential Latent Temporal model for Predicting Routine Object usage), which improves upo… ▽ More Proactivity in robot assistance refers to the robot's ability to anticipate user needs and perform assistive actions without explicit requests. This requires understanding user routines, predicting consistent activities, and actively seeking information to predict inconsistent behaviors. We propose SLaTe-PRO (Sequential Latent Temporal model for Predicting Routine Object usage), which improves upon prior state-of-the-art by combining object and user action information, and conditioning object usage predictions on past history. Additionally, we find some human behavior to be inherently stochastic and lacking in contextual cues that the robot can use for proactive assistance. To address such cases, we introduce an interactive query mechanism that can be used to ask queries about the user's intended activities and object use to improve prediction. We evaluate our approach on longitudinal data from three households, spanning 24 activity classes. SLaTe-PRO performance raises the F1 score metric to 0.57 without queries, and 0.60 with user queries, over a score of 0.43 from prior work. We additionally present a case study with a fully autonomous household robot. △ Less

Submitted 28 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.16559 [pdf]

VisAhoi: Towards a Library to Generate and Integrate Visualization Onboarding Using High-level Visualization Grammars

Authors: Christina Stoiber, Daniela Moitzi, Holger Stitz, Florian Grassinger, Anto Silviya Geo Prakash, Dominic Girardi, Marc Streit, Wolfgang Aigner

Abstract: Visualization onboarding supports users in reading, interpreting, and extracting information from visual data representations. General-purpose onboarding tools and libraries are applicable for explaining a wide range of graphical user interfaces but cannot handle specific visualization requirements. This paper describes a first step towards developing an onboarding library called VisAhoi, which is… ▽ More Visualization onboarding supports users in reading, interpreting, and extracting information from visual data representations. General-purpose onboarding tools and libraries are applicable for explaining a wide range of graphical user interfaces but cannot handle specific visualization requirements. This paper describes a first step towards developing an onboarding library called VisAhoi, which is easy to integrate, extend, semi-automate, reuse, and customize. VisAhoi supports the creation of onboarding elements for different visualization types and datasets. We demonstrate how to extract and describe onboarding instructions using three well-known high-level descriptive visualization grammars - Vega-Lite, Plotly.js, and ECharts. We show the applicability of our library by performing two usage scenarios that describe the integration of VisAhoi into a VA tool for the analysis of high-throughput screening (HTS) data and, second, into a Flourish template to provide an authoring tool for data journalists for a treemap visualization. We provide a supplementary website that demonstrates the applicability of VisAhoi to various visualizations, including a bar chart, a horizon graph, a change matrix or heatmap, a scatterplot, and a treemap visualization. △ Less

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.05889 [pdf, other]

DF2: Distribution-Free Decision-Focused Learning

Authors: Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Aditya Prakash, Bo Dai, Chao Zhang

Abstract: Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems by customizing a predictive model to a downstream optimization task. However, existing end-to-end DFL methods are hindered by three significant bottlenecks: model mismatch error, sample average approximation error, and gradient approximation error. Model mismatch error stems from the misa… ▽ More Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems by customizing a predictive model to a downstream optimization task. However, existing end-to-end DFL methods are hindered by three significant bottlenecks: model mismatch error, sample average approximation error, and gradient approximation error. Model mismatch error stems from the misalignment between the model's parameterized predictive distribution and the true probability distribution. Sample average approximation error arises when using finite samples to approximate the expected optimization objective. Gradient approximation error occurs as DFL relies on the KKT condition for exact gradient computation, while most methods approximate the gradient for backpropagation in non-convex objectives. In this paper, we present DF2 -- the first \textit{distribution-free} decision-focused learning method explicitly designed to address these three bottlenecks. Rather than depending on a task-specific forecaster that requires precise model assumptions, our method directly learns the expected optimization function during training. To efficiently learn the function in a data-driven manner, we devise an attention-based model architecture inspired by the distribution-based parameterization of the expected objective. Our method is, to the best of our knowledge, the first to address all three bottlenecks within a single model. We evaluate DF2 on a synthetic problem, a wind power bidding problem, and a non-convex vaccine distribution problem, demonstrating the effectiveness of DF2. △ Less

Submitted 10 August, 2023; originally announced August 2023.

Comments: 24 pages

arXiv:2307.16331 [pdf, other]

Theoretically Principled Trade-off for Stateful Defenses against Query-Based Black-Box Attacks

Authors: Ashish Hooda, Neal Mangaokar, Ryan Feng, Kassem Fawaz, Somesh Jha, Atul Prakash

Abstract: Adversarial examples threaten the integrity of machine learning systems with alarming success rates even under constrained black-box conditions. Stateful defenses have emerged as an effective countermeasure, detecting potential attacks by maintaining a buffer of recent queries and detecting new queries that are too similar. However, these defenses fundamentally pose a trade-off between attack dete… ▽ More Adversarial examples threaten the integrity of machine learning systems with alarming success rates even under constrained black-box conditions. Stateful defenses have emerged as an effective countermeasure, detecting potential attacks by maintaining a buffer of recent queries and detecting new queries that are too similar. However, these defenses fundamentally pose a trade-off between attack detection and false positive rates, and this trade-off is typically optimized by hand-picking feature extractors and similarity thresholds that empirically work well. There is little current understanding as to the formal limits of this trade-off and the exact properties of the feature extractors/underlying problem domain that influence it. This work aims to address this gap by offering a theoretical characterization of the trade-off between detection and false positive rates for stateful defenses. We provide upper bounds for detection rates of a general class of feature extractors and analyze the impact of this trade-off on the convergence of black-box attacks. We then support our theoretical findings with empirical evaluations across multiple datasets and stateful defenses. △ Less

Submitted 30 July, 2023; originally announced July 2023.

Comments: 2nd AdvML Frontiers Workshop at ICML 2023

arXiv:2307.11833 [pdf, other]

PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks

Authors: Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash

Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs). However, conventional PINNs, relying on multilayer perceptrons (MLP), neglect the crucial temporal dependencies inherent in practical physics systems and thus fail to propagate the initial condition constraints globally and acc… ▽ More Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs). However, conventional PINNs, relying on multilayer perceptrons (MLP), neglect the crucial temporal dependencies inherent in practical physics systems and thus fail to propagate the initial condition constraints globally and accurately capture the true solutions under various scenarios. In this paper, we introduce a novel Transformer-based framework, termed PINNsFormer, designed to address this limitation. PINNsFormer can accurately approximate PDE solutions by utilizing multi-head attention mechanisms to capture temporal dependencies. PINNsFormer transforms point-wise inputs into pseudo sequences and replaces point-wise PINNs loss with a sequential loss. Additionally, it incorporates a novel activation function, Wavelet, which anticipates Fourier decomposition through deep neural networks. Empirical results demonstrate that PINNsFormer achieves superior generalization ability and accuracy across various scenarios, including PINNs failure modes and high-dimensional PDEs. Moreover, PINNsFormer offers flexibility in integrating existing learning schemes for PINNs, further enhancing its performance. △ Less

Submitted 7 May, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: 17 pages (including 9 pages of main text, 3 pages of references, and 5 pages of appendix), 9 figures, 7 tables

arXiv:2307.08849 [pdf, other]

Autoregressive Diffusion Model for Graph Generation

Authors: Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Aditya Prakash, Chao Zhang

Abstract: Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints… ▽ More Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an \emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a \emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a \emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting the node type of the new node and its edges with previously denoised nodes at a time. Based on the permutation invariance of graph, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance with previous state-of-the-art, and meanwhile enjoys fast generation speed. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: 18 pages

arXiv:2306.04527 [pdf, other]

ContriMix: Scalable stain color augmentation for domain generalization without domain labels in digital pathology

Authors: Tan H. Nguyen, Dinkar Juyal, Jin Li, Aaditya Prakash, Shima Nofallah, Chintan Shah, Sai Chowdary Gullapally, Limin Yu, Michael Griffin, Anand Sampat, John Abel, Justin Lee, Amaro Taylor-Weiner

Abstract: Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained from a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization during test… ▽ More Differences in staining and imaging procedures can cause significant color variations in histopathology images, leading to poor generalization when deploying deep-learning models trained from a different data source. Various color augmentation methods have been proposed to generate synthetic images during training to make models more robust, eliminating the need for stain normalization during test time. Many color augmentation methods leverage domain labels to generate synthetic images. This approach causes three significant challenges to scaling such a model. Firstly, incorporating data from a new domain into deep-learning models trained on existing domain labels is not straightforward. Secondly, dependency on domain labels prevents the use of pathology images without domain labels to improve model performance. Finally, implementation of these methods becomes complicated when multiple domain labels (e.g., patient identification, medical center, etc) are associated with a single image. We introduce ContriMix, a novel domain label free stain color augmentation method based on DRIT++, a style-transfer method. Contrimix leverages sample stain color variation within a training minibatch and random mixing to extract content and attribute information from pathology images. This information can be used by a trained ContriMix model to create synthetic images to improve the performance of existing classifiers. ContriMix outperforms competing methods on the Camelyon17-WILDS dataset. Its performance is consistent across different slides in the test set while being robust to the color variation from rare substances in pathology images. We make our code and trained ContriMix models available for research use. The code for ContriMix can be found at https://gitlab.com/huutan86/contrimix △ Less

Submitted 8 March, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.00349 [pdf, other]

CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception

Authors: Jiachen Sun, Haizhong Zheng, Qingzhao Zhang, Atul Prakash, Z. Morley Mao, Chaowei Xiao

Abstract: Perception is crucial in the realm of autonomous driving systems, where bird's eye view (BEV)-based architectures have recently reached state-of-the-art performance. The desirability of self-supervised representation learning stems from the expensive and laborious process of annotating 2D and 3D data. Although previous research has investigated pretraining methods for both LiDAR and camera-based 3… ▽ More Perception is crucial in the realm of autonomous driving systems, where bird's eye view (BEV)-based architectures have recently reached state-of-the-art performance. The desirability of self-supervised representation learning stems from the expensive and laborious process of annotating 2D and 3D data. Although previous research has investigated pretraining methods for both LiDAR and camera-based 3D object detection, a unified pretraining framework for multimodal BEV perception is missing. In this study, we introduce CALICO, a novel framework that applies contrastive objectives to both LiDAR and camera backbones. Specifically, CALICO incorporates two stages: point-region contrast (PRC) and region-aware distillation (RAD). PRC better balances the region- and scene-level representation learning on the LiDAR modality and offers significant performance improvement compared to existing methods. RAD effectively achieves contrastive distillation on our self-trained teacher model. CALICO's efficacy is substantiated by extensive evaluations on 3D object detection and BEV map segmentation tasks, where it delivers significant performance improvements. Notably, CALICO outperforms the baseline method by 10.5% and 8.6% on NDS and mAP. Moreover, CALICO boosts the robustness of multimodal 3D object detection against adversarial attacks and corruption. Additionally, our framework can be tailored to different backbones and heads, positioning it as a promising approach for multimodal BEV perception. △ Less

Submitted 27 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.16301 [pdf, other]

Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos

Authors: Matthew Chang, Aditya Prakash, Saurabh Gupta

Abstract: The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held. In this work, we propose to extract a factored r… ▽ More The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held. In this work, we propose to extract a factored representation of the scene that separates the agent (human hand) and the environment. This alleviates both occlusion and mismatch while preserving the signal, thereby easing the design of models for downstream robotics tasks. At the heart of this factorization is our proposed Video Inpainting via Diffusion Model (VIDM) that leverages both a prior on real-world images (through a large-scale pre-trained diffusion model) and the appearance of the object in earlier frames of the video (through attention). Our experiments demonstrate the effectiveness of VIDM at improving inpainting quality on egocentric videos and the power of our factored representation for numerous tasks: object detection, 3D reconstruction of manipulated objects, and learning of reward functions, policies, and affordances from videos. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: for project website with video, see https://matthewchang.github.io/vidm/

arXiv:2305.03036 [pdf, other]

Learning Hand-Held Object Reconstruction from In-The-Wild Videos

Authors: Aditya Prakash, Matthew Chang, Matthew Jin, Saurabh Gupta

Abstract: Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision which is challenging to gather in real world at scale. Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interaction… ▽ More Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision which is challenging to gather in real world at scale. Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interactions. In this paper, we automatically extract 3D supervision (via multiview 2D supervision) from such raw video data to scale up the learning of models for hand-held object reconstruction. This requires tackling two key challenges: unknown camera pose and occlusion. For the former, we use hand pose (predicted from existing techniques, e.g. FrankMocap) as a proxy for object pose. For the latter, we learn data-driven 3D shape priors using synthetic objects from the ObMan dataset. We use these indirect 3D cues to train occupancy networks that predict the 3D shape of objects from a single RGB image. Our experiments on the MOW and HO3D datasets show the effectiveness of these supervisory signals at predicting the 3D shape for real-world hand-held objects without any direct real-world 3D supervision. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Comments: Project Webpage: https://ap229997.github.io/projects/horse/

arXiv:2305.02401 [pdf, other]

Synthetic DOmain-Targeted Augmentation (S-DOTA) Improves Model Generalization in Digital Pathology

Authors: Sai Chowdary Gullapally, Yibo Zhang, Nitin Kumar Mittal, Deeksha Kartik, Sandhya Srinivasan, Kevin Rose, Daniel Shenker, Dinkar Juyal, Harshith Padigela, Raymond Biju, Victor Minden, Chirag Maheshwari, Marc Thibault, Zvi Goldstein, Luke Novak, Nidhi Chandra, Justin Lee, Aaditya Prakash, Chintan Shah, John Abel, Darren Fahy, Amaro Taylor-Weiner, Anand Sampat

Abstract: Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. To overcome this limitation and improve model generalization, we studied the effectiveness of two Syn… ▽ More Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. To overcome this limitation and improve model generalization, we studied the effectiveness of two Synthetic DOmain-Targeted Augmentation (S-DOTA) methods, namely CycleGAN-enabled Scanner Transform (ST) and targeted Stain Vector Augmentation (SVA), and compared them against the International Color Consortium (ICC) profile-based color calibration (ICC Cal) method and a baseline method using traditional brightness, color and noise augmentations. We evaluated the ability of these techniques to improve model generalization to various tasks and settings: four models, two model types (tissue segmentation and cell classification), two loss functions, six labs, six scanners, and three indications (hepatocellular carcinoma (HCC), nonalcoholic steatohepatitis (NASH), prostate adenocarcinoma). We compared these methods based on the macro-averaged F1 scores on in-distribution (ID) and out-of-distribution (OOD) test sets across multiple domains, and found that S-DOTA methods (i.e., ST and SVA) led to significant improvements over ICC Cal and baseline on OOD data while maintaining comparable performance on ID data. Thus, we demonstrate that S-DOTA may help address generalization due to domain shift in real world applications. △ Less

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2304.12483 [pdf, other]

Towards Realistic Generative 3D Face Models

Authors: Aashish Rai, Hiresh Gupta, Ayush Pandey, Francisco Vicente Carrasco, Shingo Jason Takagi, Amaury Aubel, Daeil Kim, Aayush Prakash, Fernando de la Torre

Abstract: In recent years, there has been significant progress in 2D generative face models fueled by applications such as animation, synthetic data generation, and digital avatars. However, due to the absence of 3D information, these 2D models often struggle to accurately disentangle facial attributes like pose, expression, and illumination, limiting their editing capabilities. To address this limitation,… ▽ More In recent years, there has been significant progress in 2D generative face models fueled by applications such as animation, synthetic data generation, and digital avatars. However, due to the absence of 3D information, these 2D models often struggle to accurately disentangle facial attributes like pose, expression, and illumination, limiting their editing capabilities. To address this limitation, this paper proposes a 3D controllable generative face model to produce high-quality albedo and precise 3D shape leveraging existing 2D generative models. By combining 2D face generative models with semantic face manipulation, this method enables editing of detailed 3D rendered faces. The proposed framework utilizes an alternating descent optimization approach over shape and albedo. Differentiable rendering is used to train high-quality shapes and albedo without 3D supervision. Moreover, this approach outperforms the state-of-the-art (SOTA) methods in the well-known NoW benchmark for shape reconstruction. It also outperforms the SOTA reconstruction models in recovering rendered faces' identities across novel poses by an average of 10%. Additionally, the paper demonstrates direct control of expressions in 3D faces by exploiting latent space leading to text-based editing of 3D faces. △ Less

Submitted 26 October, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: Preprint

arXiv:2303.13814 [pdf, other]

Multimodal Adaptive Fusion of Face and Gait Features using Keyless attention based Deep Neural Networks for Human Identification

Authors: Ashwin Prakash, Thejaswin S, Athira Nambiar, Alexandre Bernardino

Abstract: Biometrics plays a significant role in vision-based surveillance applications. Soft biometrics such as gait is widely used with face in surveillance tasks like person recognition and re-identification. Nevertheless, in practical scenarios, classical fusion techniques respond poorly to changes in individual users and in the external environment. To this end, we propose a novel adaptive multi-biomet… ▽ More Biometrics plays a significant role in vision-based surveillance applications. Soft biometrics such as gait is widely used with face in surveillance tasks like person recognition and re-identification. Nevertheless, in practical scenarios, classical fusion techniques respond poorly to changes in individual users and in the external environment. To this end, we propose a novel adaptive multi-biometric fusion strategy for the dynamic incorporation of gait and face biometric cues by leveraging keyless attention deep neural networks. Various external factors such as viewpoint and distance to the camera, are investigated in this study. Extensive experiments have shown superior performanceof the proposed model compared with the state-of-the-art model. △ Less

Submitted 24 March, 2023; originally announced March 2023.

Comments: -

arXiv:2303.06280 [pdf, other]

doi 10.1145/3576915.3623116

Stateful Defenses for Machine Learning Models Are Not Yet Secure Against Black-box Attacks

Authors: Ryan Feng, Ashish Hooda, Neal Mangaokar, Kassem Fawaz, Somesh Jha, Atul Prakash

Abstract: Recent work has proposed stateful defense models (SDMs) as a compelling strategy to defend against a black-box attacker who only has query access to the model, as is common for online machine learning platforms. Such stateful defenses aim to defend against black-box attacks by tracking the query history and detecting and rejecting queries that are "similar" and thus preventing black-box attacks fr… ▽ More Recent work has proposed stateful defense models (SDMs) as a compelling strategy to defend against a black-box attacker who only has query access to the model, as is common for online machine learning platforms. Such stateful defenses aim to defend against black-box attacks by tracking the query history and detecting and rejecting queries that are "similar" and thus preventing black-box attacks from finding useful gradients and making progress towards finding adversarial attacks within a reasonable query budget. Recent SDMs (e.g., Blacklight and PIHA) have shown remarkable success in defending against state-of-the-art black-box attacks. In this paper, we show that SDMs are highly vulnerable to a new class of adaptive black-box attacks. We propose a novel adaptive black-box attack strategy called Oracle-guided Adaptive Rejection Sampling (OARS) that involves two stages: (1) use initial query patterns to infer key properties about an SDM's defense; and, (2) leverage those extracted properties to design subsequent query patterns to evade the SDM's defense while making progress towards finding adversarial inputs. OARS is broadly applicable as an enhancement to existing black-box attacks - we show how to apply the strategy to enhance six common black-box attacks to be more effective against current class of SDMs. For example, OARS-enhanced versions of black-box attacks improved attack success rate against recent stateful defenses from almost 0% to to almost 100% for multiple datasets within reasonable query budgets. △ Less

Submitted 26 September, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: ACM CCS 2023

arXiv:2302.11052 [pdf, other]

doi 10.1145/3543873.3584633

Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace

Authors: Yunzhong He, Yuxin Tian, Mengjiao Wang, Feier Chen, Licheng Yu, Maolong Tang, Congcong Chen, Ning Zhang, Bin Kuang, Arul Prakash

Abstract: Embedding-based Retrieval (EBR) in e-commerce search is a powerful search retrieval technique to address semantic matches between search queries and products. However, commercial search engines like Facebook Marketplace Search are complex multi-stage systems optimized for multiple business objectives. At Facebook Marketplace, search retrieval focuses on matching search queries with relevant produc… ▽ More Embedding-based Retrieval (EBR) in e-commerce search is a powerful search retrieval technique to address semantic matches between search queries and products. However, commercial search engines like Facebook Marketplace Search are complex multi-stage systems optimized for multiple business objectives. At Facebook Marketplace, search retrieval focuses on matching search queries with relevant products, while search ranking puts more emphasis on contextual signals to up-rank the more engaging products. As a result, the end-to-end searcher experience is a function of both relevance and engagement, and the interaction between different stages of the system. This presents challenges to EBR systems in order to optimize for better searcher experiences. In this paper we presents Que2Engage, a search EBR system built towards bridging the gap between retrieval and ranking for end-to-end optimizations. Que2Engage takes a multimodal & multitask approach to infuse contextual information into the retrieval stage and to balance different business objectives. We show the effectiveness of our approach via a multitask evaluation framework and thorough baseline comparisons and ablation studies. Que2Engage is deployed on Facebook Marketplace Search and shows significant improvements in searcher engagement in two weeks of A/B testing. △ Less

Submitted 21 February, 2023; originally announced February 2023.

Comments: Accepted by WWW'2023

arXiv:2302.10527 [pdf, other]

doi 10.1145/3543873.3584622

HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

Authors: Yunzhong He, Cong Zhang, Ruoyan Kong, Chaitanya Kulkarni, Qing Liu, Ashish Gandhe, Amit Nithianandan, Arul Prakash

Abstract: Query categorization at customer-to-customer e-commerce platforms like Facebook Marketplace is challenging due to the vagueness of search intent, noise in real-world data, and imbalanced training data across languages. Its deployment also needs to consider challenges in scalability and downstream integration in order to translate modeling advances into better search result relevance. In this paper… ▽ More Query categorization at customer-to-customer e-commerce platforms like Facebook Marketplace is challenging due to the vagueness of search intent, noise in real-world data, and imbalanced training data across languages. Its deployment also needs to consider challenges in scalability and downstream integration in order to translate modeling advances into better search result relevance. In this paper we present HierCat, the query categorization system at Facebook Marketplace. HierCat addresses these challenges by leveraging multi-task pre-training of dual-encoder architectures with a hierarchical inference step to effectively learn from weakly supervised training data mined from searcher engagement. We show that HierCat not only outperforms popular methods in offline experiments, but also leads to 1.4% improvement in NDCG and 4.3% increase in searcher engagement at Facebook Marketplace Search in online A/B testing. △ Less

Submitted 21 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: Accepted by WWW'2023

arXiv:2302.06227 [pdf, other]

Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages

Authors: Sudhanshu Srivastava, Ishika Gupta, Anusha Prakash, Jom Kuriakose, Hema A. Murthy

Abstract: Hidden-Markov-model (HMM) based text-to-speech (HTS) offers flexibility in speaking styles along with fast training and synthesis while being computationally less intense. HTS performs well even in low-resource scenarios. The primary drawback is that the voice quality is poor compared to that of E2E systems. A hybrid approach combining HMM-based feature generation and neural-network-based HiFi-GAN… ▽ More Hidden-Markov-model (HMM) based text-to-speech (HTS) offers flexibility in speaking styles along with fast training and synthesis while being computationally less intense. HTS performs well even in low-resource scenarios. The primary drawback is that the voice quality is poor compared to that of E2E systems. A hybrid approach combining HMM-based feature generation and neural-network-based HiFi-GAN vocoder to improve HTS synthesis quality is proposed. HTS is trained on high-resolution mel-spectrograms instead of conventional mel generalized coefficients (MGC), and the output mel-spectrogram corresponding to the input text is used in a HiFi-GAN vocoder trained on Indic languages, to produce naturalness that is equivalent to that of E2E systems, as evidenced from the DMOS and PC tests. △ Less

Submitted 13 February, 2023; originally announced February 2023.

Comments: 5 pages, 5 figures

arXiv:2212.05614 [pdf, other]

Generic Tagging for RISC-V Binaries

Authors: David Demicco, Matthew Cole, Gokturk Yuksek, Ravi Theja Gollapudi, Aravind Prakash, Kanad Ghose, Zerksis Umrigar

Abstract: With the widespread popularity of RISC-V -- an open-source ISA -- custom hardware security solutions targeting specific defense needs are gaining popularity. These solutions often require specialized compilers that can insert metadata (called tags) into the generated binaries, and/or extend the RISC-V ISA with new instructions. Developing such compilers can be a tedious and time-consuming process.… ▽ More With the widespread popularity of RISC-V -- an open-source ISA -- custom hardware security solutions targeting specific defense needs are gaining popularity. These solutions often require specialized compilers that can insert metadata (called tags) into the generated binaries, and/or extend the RISC-V ISA with new instructions. Developing such compilers can be a tedious and time-consuming process. In this paper, we present COGENT, a generic instruction tag generator for RISC-V architecture. COGENT is capable of associating a tag of configurable and varying widths (1 to 20 bits) to each instruction. It is also capable of emitting labels that are central to the implementation of control-flow integrity (CFI) solutions. COGENT encodes all tags and labels as nop instructions thereby providing full backward compatibility. We evaluate COGENT on a subset of programs from the SPEC CPU2017 benchmark suite and report the binary size increase to be 29.3% and 18.27% for the lowest and highest tag coverage levels respectively. Additionally, we executed tagged programs on COTS RISC-V unmodified hardware and found the execution time overhead (with respect to backward compatibility) to be 13.4% and 5.72% for the lowest and highest coverage levels respectively. Finally, using a case study, we present possible use case scenarios where COGENT can be applied. △ Less

Submitted 11 December, 2022; originally announced December 2022.

arXiv:2211.13837 [pdf, other]

End-to-End Stochastic Optimization with Energy-Based Model

Authors: Lingkai Kong, Jiaming Cui, Yuchen Zhuang, Rui Feng, B. Aditya Prakash, Chao Zhang

Abstract: Decision-focused learning (DFL) was recently proposed for stochastic optimization problems that involve unknown parameters. By integrating predictive modeling with an implicitly differentiable optimization layer, DFL has shown superior performance to the standard two-stage predict-then-optimize pipeline. However, most existing DFL methods are only applicable to convex problems or a subset of nonco… ▽ More Decision-focused learning (DFL) was recently proposed for stochastic optimization problems that involve unknown parameters. By integrating predictive modeling with an implicitly differentiable optimization layer, DFL has shown superior performance to the standard two-stage predict-then-optimize pipeline. However, most existing DFL methods are only applicable to convex problems or a subset of nonconvex problems that can be easily relaxed to convex ones. Further, they can be inefficient in training due to the requirement of solving and differentiating through the optimization problem in every training iteration. We propose SO-EBM, a general and efficient DFL method for stochastic optimization using energy-based models. Instead of relying on KKT conditions to induce an implicit optimization layer, SO-EBM explicitly parameterizes the original optimization problem using a differentiable optimization layer based on energy functions. To better approximate the optimization landscape, we propose a coupled training objective that uses a maximum likelihood loss to capture the optimum location and a distribution-based regularizer to capture the overall energy landscape. Finally, we propose an efficient training procedure for SO-EBM with a self-normalized importance sampler based on a Gaussian mixture proposal. We evaluate SO-EBM in three applications: power scheduling, COVID-19 resource allocation, and non-convex adversarial security game, demonstrating the effectiveness and efficiency of SO-EBM. △ Less

Submitted 24 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022 Oral

arXiv:2211.01338 [pdf, other]

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

Authors: Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya , et al. (2 additional authors not shown)

Abstract: Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages… ▽ More Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean-opinion-score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation with scores of 4.09 and 3.74, respectively. The human effort also reduces by 75%. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Showing 1–50 of 156 results for author: Prakash, A