Search | arXiv e-print repository

PostMark: A Robust Blackbox Watermark for Large Language Models

Authors: Yapei Chang, Kalpesh Krishna, Amir Houmansadr, John Wieting, Mohit Iyyer

Abstract: The most effective techniques to detect LLM-generated text rely on inserting a detectable signature -- or watermark -- during the model's decoding process. Most existing watermarking methods require access to the underlying LLM's logits, which LLM API providers are loath to share due to fears of model distillation. As such, these watermarks must be implemented independently by each LLM provider. I… ▽ More The most effective techniques to detect LLM-generated text rely on inserting a detectable signature -- or watermark -- during the model's decoding process. Most existing watermarking methods require access to the underlying LLM's logits, which LLM API providers are loath to share due to fears of model distillation. As such, these watermarks must be implemented independently by each LLM provider. In this paper, we develop PostMark, a modular post-hoc watermarking procedure in which an input-dependent set of words (determined via a semantic embedding) is inserted into the text after the decoding process has completed. Critically, PostMark does not require logit access, which means it can be implemented by a third party. We also show that PostMark is more robust to paraphrasing attacks than existing watermarking methods: our experiments cover eight baseline algorithms, five base LLMs, and three datasets. Finally, we evaluate the impact of PostMark on text quality using both automated and human assessments, highlighting the trade-off between quality and robustness to paraphrasing. We release our code, outputs, and annotations at https://github.com/lilakk/PostMark. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: preprint; 18 pages, 5 figures

arXiv:2406.09264 [pdf, other]

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th… ▽ More Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match humans) rather than an ongoing, mutual alignment problem [429]. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans that ensures AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges for future directions and propose examples of potential future solutions. △ Less

Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 56 pages

arXiv:2405.08003 [pdf, ps, other]

doi 10.1142/S0219025724400022

Continuous Krishna-Parthasarathy Entropic Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived discrete quantum version of Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for arbitrary family of operators indexed by measure spaces having finite measure. We give an application to… ▽ More In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived discrete quantum version of Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for arbitrary family of operators indexed by measure spaces having finite measure. We give an application to the special case of compact groups. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: 7 pages, 0 Figures

MSC Class: 81P15; 94A17; 42C15

Journal ref: Special issue of Infinite Dimensional Analysis, Quantum Probability and Related Topics in honour of Prof. K. R. Parthasarathy, 18 March 2024

arXiv:2404.17922 [pdf, other]

Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM

Authors: Laksh Nanwani, Kumaraditya Gupta, Aditya Mathur, Swayam Agrawal, A. H. Abdul Hafez, K. Madhava Krishna

Abstract: Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasi… ▽ More Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings, which bring in the semantic understanding that natural language commands can query. Quantitatively, the work improves upon the success rate of language-guided tasks. At the same time, we qualitatively observe the ability to identify instances more clearly and leverage the foundational models and language and image-aligned embeddings to identify objects that, otherwise, a closed-set approach wouldn't be able to identify. △ Less

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.04643 [pdf, other]

Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation

Authors: Gaurav Singh, Sanket Kalwar, Md Faizal Karim, Bipasha Sen, Nagamanikandan Govindan, Srinath Sridhar, K Madhava Krishna

Abstract: Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore set… ▽ More Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving table-top/small objects and require augmented datasets to train, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries, as well as generates dense grasps on the target regions. CGDF uses a part-guided diffusion approach that enables it to get high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings to show that our method can generalize to generate stable grasps on complex objects, especially useful for dual-arm manipulation settings, while existing methods struggle to do so. △ Less

Submitted 6 April, 2024; originally announced April 2024.

Comments: Project Page: https://constrained-grasp-diffusion.github.io/

arXiv:2404.03307 [pdf, other]

Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model

Authors: Amith Manoharan, Aditya Sharma, Himani Belsare, Kaustab Pal, K. Madhava Krishna, Arun Kumar Singh

Abstract: Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-int… ▽ More Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6dof pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS based pose prediction closely matches the output from a high-fidelity physics engine. This result coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor, a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments, and comparison with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories. △ Less

Submitted 11 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 8 pages, 7 figures, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2404.02911 [pdf, ps, other]

Machine Learning Driven Global Optimisation Framework for Analog Circuit Design

Authors: Ria Rashid, Komala Krishna, Clint Pazhayidam George, Nandakumar Nambath

Abstract: We propose a machine learning-driven optimisation framework for analog circuit design in this paper. The primary objective is to determine the device sizes for the optimal performance of analog circuits for a given set of specifications. Our methodology entails employing machine learning models and spice simulations to direct the optimisation algorithm towards achieving the optimal design for anal… ▽ More We propose a machine learning-driven optimisation framework for analog circuit design in this paper. The primary objective is to determine the device sizes for the optimal performance of analog circuits for a given set of specifications. Our methodology entails employing machine learning models and spice simulations to direct the optimisation algorithm towards achieving the optimal design for analog circuits. Machine learning based global offline surrogate models, with the circuit design parameters as the input, are built in the design space for the analog circuits under study and is used to guide the optimisation algorithm, resulting in faster convergence and a reduced number of spice simulations. Multi-layer perceptron and random forest regressors are employed to predict the required design specifications of the analog circuit. Since the saturation condition of transistors is vital in the proper working of analog circuits, multi-layer perceptron classifiers are used to predict the saturation condition of each transistor in the circuit. The feasibility of the candidate solutions is verified using machine learning models before invoking spice simulations. We validate the proposed framework using three circuit topologies--a bandgap reference, a folded cascode operational amplifier, and a two-stage operational amplifier. The simulation results show better optimum values and lower standard deviations for fitness functions after convergence. Incorporating the machine learning-based predictions proposed in the optimisation method has resulted in the reduction of spice calls by 56%, 59%, and 83% when compared with standard approaches in the three test cases considered in the study. △ Less

Submitted 26 February, 2024; originally announced April 2024.

arXiv:2404.00910 [pdf, ps, other]

Unexpected Uncertainty Principle for Disc Banach Spaces

Authors: K. Mahesh Krishna

Abstract: Let $(\{f_n\}_{n=1}^\infty, \{τ_n\}_{n=1}^\infty)$ and $(\{g_n\}_{n=1}^\infty, \{ω_n\}_{n=1}^\infty)$ be unbounded continuous p-Schauder frames ($0<p<1$) for a disc Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad \|θ_f x\|_0\|θ_g x\|_0 \geq \frac{1}{\left(\displaystyle\sup_{n… ▽ More Let $(\{f_n\}_{n=1}^\infty, \{τ_n\}_{n=1}^\infty)$ and $(\{g_n\}_{n=1}^\infty, \{ω_n\}_{n=1}^\infty)$ be unbounded continuous p-Schauder frames ($0<p<1$) for a disc Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad \|θ_f x\|_0\|θ_g x\|_0 \geq \frac{1}{\left(\displaystyle\sup_{n,m \in \mathbb{N} }|f_n(ω_m)|\right)^p\left(\displaystyle\sup_{n, m \in \mathbb{N}}|g_m(τ_n)|\right)^p}, \end{align} where \begin{align*} & θ_f: \mathcal{D}(θ_f) \ni x \mapsto θ_fx := \{f_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}), \quad θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx := \{g_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}). \end{align*} Inequality (1) is unexpectedly different from both bounded uncertainty principle arXiv:2308.00312v1 and unbounded uncertainty principle arXiv:2312.00366v1 for Banach spaces. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: 6 Pages, 0 Figures

MSC Class: 42C15

arXiv:2403.20116 [pdf, other]

LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving

Authors: Pranjal Paul, Anant Garg, Tushar Choudhary, Arun Kumar Singh, K. Madhava Krishna

Abstract: Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subjective to their "world understanding" which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, wh… ▽ More Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subjective to their "world understanding" which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, which aims to address this issue by estimating a goal location based on the given language command as an intermediate representation in an end-to-end setting. The estimated goal might fall in a non-desirable region, like on top of a car for a parking-like command, leading to inadequate planning. Hence, we propose to train the architecture in an end-to-end manner, resulting in iterative refinement of both the goal and the trajectory collectively. We validate the effectiveness of our method through comprehensive experiments conducted in diverse simulated environments. We report significant improvements in standard autonomous driving metrics, with a goal reaching Success Rate of 81%. We further showcase the versatility of LeGo-Drive across different driving scenarios and linguistic inputs, underscoring its potential for practical deployment in autonomous vehicles and intelligent transportation systems. △ Less

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.17946 [pdf, ps, other]

Nonlinear Heisenberg-Robertson-Schrodinger Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces. We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces. △ Less

Submitted 1 March, 2024; originally announced March 2024.

Comments: 4 Pages, 0 Figures

MSC Class: 26A16; 46B99

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.12566 [pdf, other]

GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

Authors: Kundan Krishna, Sanjana Ramprasad, Prakhar Gupta, Byron C. Wallace, Zachary C. Lipton, Jeffrey P. Bigham

Abstract: LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that ar… ▽ More LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that are not supported by the reference document, and also presents evidence from the reference for facts that do appear to have support. We train models to execute these tasks, and design an interactive interface to present suggested edits and evidence to users. Comprehensive evaluation by human raters shows that GenAudit can detect errors in 8 different LLM outputs when summarizing documents from diverse domains. To ensure that most errors are flagged by the system, we propose a method that can increase the error recall while minimizing impact on precision. We release our tool (GenAudit) and fact-checking model for public use. △ Less

Submitted 16 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Code and models available at https://genaudit.org

arXiv:2402.08591 [pdf, ps, other]

Nonlinear Maccone-Pati Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: We show that one of the two important uncertainty principles derived by Maccone and Pati \textit{[Phys. Rev. Lett., 2014]} can be derived for arbitrary maps defined on subsets of $\mathcal{L}^p$ spaces for $1< p<\infty$. Our main tool is the Clarkson inequalities. We also derive a nonlinear uncertainty principle for weak parallelogram spaces and Type-p Banach spaces. We show that one of the two important uncertainty principles derived by Maccone and Pati \textit{[Phys. Rev. Lett., 2014]} can be derived for arbitrary maps defined on subsets of $\mathcal{L}^p$ spaces for $1< p<\infty$. Our main tool is the Clarkson inequalities. We also derive a nonlinear uncertainty principle for weak parallelogram spaces and Type-p Banach spaces. △ Less

Submitted 1 February, 2024; originally announced February 2024.

Comments: 6 pages, 0 figures

MSC Class: 46B20; 46E30

arXiv:2402.04255 [pdf, ps, other]

Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: Let $\mathcal{X}$ be a Banach space. Let $\{τ_j\}_{j=1}^n, \{ω_k\}_{k=1}^m\subseteq \mathcal{X}$ and $\{f_j\}_{j=1}^n$, $\{g_k\}_{k=1}^m\subseteq \mathcal{X}^*$ satisfy $ |f_j(τ_j)|\geq 1$ for all $ 1\leq j \leq n$, $|g_k(ω_k)|\geq 1 $ for all $1\leq k \leq m$. If $x \in \mathcal{X}\setminus \{0\}$ is such that $x=θ_τθ_f x=θ_ωθ_g x$, then we show that \begin{align}\label{FKDB} (1) \quad\quad\quad\… ▽ More Let $\mathcal{X}$ be a Banach space. Let $\{τ_j\}_{j=1}^n, \{ω_k\}_{k=1}^m\subseteq \mathcal{X}$ and $\{f_j\}_{j=1}^n$, $\{g_k\}_{k=1}^m\subseteq \mathcal{X}^*$ satisfy $ |f_j(τ_j)|\geq 1$ for all $ 1\leq j \leq n$, $|g_k(ω_k)|\geq 1 $ for all $1\leq k \leq m$. If $x \in \mathcal{X}\setminus \{0\}$ is such that $x=θ_τθ_f x=θ_ωθ_g x$, then we show that \begin{align}\label{FKDB} (1) \quad\quad\quad\quad \|θ_fx\|_0\|θ_gx\|_0\geq \frac{\bigg[1-(\|θ_fx\|_0-1)\max\limits_{1\leq j,r \leq n,j\neq r}|f_j(τ_r)|\bigg]^+\bigg[1-(\|θ_g x\|_0-1)\max\limits_{1\leq k,s \leq m,k\neq s}|g_k(ω_s)|\bigg]^+}{\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|f_j(ω_k)|\right)\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|g_k(τ_j)|\right)}. \end{align} We call Inequality (1) as \textbf{Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Kuppinger, Durisi and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]} (which improved the Donoho-Stark-Elad-Bruckstein uncertainty principle \textit{[SIAM J. Appl. Math. (1989), IEEE Trans. Inform. Theory (2002)]}). We also derive functional form of the uncertainity principle obtained by Studer, Kuppinger, Pope and Bölcskei \textit{[EEE Trans. Inform. Theory (2012)]}. △ Less

Submitted 1 January, 2024; originally announced February 2024.

Comments: 9 Pages, 0 Figures

MSC Class: 46A45; 46B45; 42C15

arXiv:2402.03509 [pdf, other]

Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

Authors: Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton, Byron C Wallace

Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (pote… ▽ More Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (potentially more specialized) domains? In this work we evaluate zero-shot generated summaries across specialized domains including biomedical articles, and legal bills (in addition to standard news benchmarks for reference). We focus especially on the factuality of outputs. We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors. We analyze whether the prevalence of a given domain in the pretraining corpus affects extractiveness and faithfulness of generated summaries of articles in this domain. We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles. The dataset can be downloaded from https://github.com/sanjanaramprasad/zero_shot_faceval_domains △ Less

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.17399 [pdf, other]

ATPPNet: Attention based Temporal Point cloud Prediction Network

Authors: Kaustab Pal, Aditya Sharma, Avinash Sharma, K. Madhava Krishna

Abstract: Point cloud prediction is an important yet challenging task in the field of autonomous driving. The goal is to predict future point cloud sequences that maintain object structures while accurately representing their temporal motion. These predicted point clouds help in other subsequent tasks like object trajectory estimation for collision avoidance or estimating locations with the least odometry d… ▽ More Point cloud prediction is an important yet challenging task in the field of autonomous driving. The goal is to predict future point cloud sequences that maintain object structures while accurately representing their temporal motion. These predicted point clouds help in other subsequent tasks like object trajectory estimation for collision avoidance or estimating locations with the least odometry drift. In this work, we present ATPPNet, a novel architecture that predicts future point cloud sequences given a sequence of previous time step point clouds obtained with LiDAR sensor. ATPPNet leverages Conv-LSTM along with channel-wise and spatial attention dually complemented by a 3D-CNN branch for extracting an enhanced spatio-temporal context to recover high quality fidel predictions of future point clouds. We conduct extensive experiments on publicly available datasets and report impressive performance outperforming the existing methods. We also conduct a thorough ablative study of the proposed architecture and provide an application study that highlights the potential of our model for tasks like odometry estimation. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted for presentation at the 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2401.01954 [pdf, ps, other]

Word-Representability of Graphs with respect to Split Recomposition

Authors: Tithi Dwary, K. V. Krishna

Abstract: In this work, we show that the class of word-representable graphs is closed under split recomposition and determine the representation number of the graph obtained by recomposing two word-representable graphs. Accordingly, we show that the class of parity graphs is word-representable. Further, we obtain a characteristic property by which the recomposition of comparability graphs is a comparability… ▽ More In this work, we show that the class of word-representable graphs is closed under split recomposition and determine the representation number of the graph obtained by recomposing two word-representable graphs. Accordingly, we show that the class of parity graphs is word-representable. Further, we obtain a characteristic property by which the recomposition of comparability graphs is a comparability graph. Consequently, we also establish the permutation-representation number (prn) of the resulting comparability graph. We also introduce a subclass of comparability graphs, called prn-irreducible graphs. We provide a criterion such that the split recomposition of two prn-irreducible graphs is a comparability graph and determine the prn of the resultant graph. △ Less

Submitted 3 January, 2024; originally announced January 2024.

MSC Class: 68R10; 68R15; 05C90; 06A07

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.00366 [pdf, ps, other]

Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principles

Authors: K. Mahesh Krishna

Abstract: Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces and $p=1$ or $p=\infty$. Let $(\{f_α\}_{α\in Ω}, \{τ_α\}_{α\in Ω})$ and $(\{g_β\}_{β\in Δ}, \{ω_β\}_{β\in Δ})$ be unbounded continuous p-Schauder frames for a Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad μ(\operatorname{supp}(θ_f… ▽ More Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces and $p=1$ or $p=\infty$. Let $(\{f_α\}_{α\in Ω}, \{τ_α\}_{α\in Ω})$ and $(\{g_β\}_{β\in Δ}, \{ω_β\}_{β\in Δ})$ be unbounded continuous p-Schauder frames for a Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad μ(\operatorname{supp}(θ_f x))ν(\operatorname{supp}(θ_g x)) \geq \frac{1}{\left(\displaystyle\sup_{α\in Ω, β\in Δ}|f_α(ω_β)|\right)\left(\displaystyle\sup_{α\in Ω, β\in Δ}|g_β(τ_α)|\right)}, \end{align} where \begin{align*} &θ_f:\mathcal{D}(θ_f) \ni x \mapsto θ_fx \in \mathcal{L}^p(Ω, μ); \quad θ_fx: Ω\ni α\mapsto (θ_fx) (α):= f_α(x) \in \mathbb{K},\\ &θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx \in \mathcal{L}^p(Δ, ν); \quad θ_gx: Δ\ni β\mapsto (θ_gx) (β):= g_β(x) \in \mathbb{K}. \end{align*} We call Inequality (1) as \textbf{Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Along with recent \textbf{Functional Continuous Uncertainty Principle} [arXiv:2308.00312], Inequality (1) also improves Ricaud-Torrésani uncertainty principle [IEEE Trans. Inform. Theory, 2013]. In particular, it improves Elad-Bruckstein uncertainty principle [IEEE Trans. Inform. Theory, 2002] and Donoho-Stark uncertainty principle [SIAM J. Appl. Math., 1989]. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: 6 Figures, 0 Figures

MSC Class: 42C15

arXiv:2311.14635 [pdf]

Automated Detection and Counting of Windows using UAV Imagery based Remote Sensing

Authors: Dhruv Patel, Shivani Chepuri, Sarvesh Thakur, K. Harikumar, Ravi Kiran S., K. Madhava Krishna

Abstract: Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and co… ▽ More Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and count the number of windows of a building by deploying an Unmanned Aerial Vehicle (UAV) based remote sensing system is proposed. The proposed two-stage method automates the identification and counting of windows by developing computer vision pipelines that utilize data from UAV's onboard camera and other sensors. Quantitative and Qualitative results show the effectiveness of our proposed approach in accurately detecting and counting the windows compared to the existing method. △ Less

Submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.13980 [pdf, ps, other]

On the Permutation-Representation Number of Bipartite Graphs using Neighborhood Graphs

Authors: Khyodeno Mozhui, K. V. Krishna

Abstract: The problems of determining the permutation-representation number (prn) and the representation number of bipartite graphs are open in the literature. Moreover, the decision problem corresponding to the determination of the prn of a bipartite graph is NP-complete. However, these numbers were established for certain subclasses of bipartite graphs, e.g., for crown graphs. Further, it was conjectured… ▽ More The problems of determining the permutation-representation number (prn) and the representation number of bipartite graphs are open in the literature. Moreover, the decision problem corresponding to the determination of the prn of a bipartite graph is NP-complete. However, these numbers were established for certain subclasses of bipartite graphs, e.g., for crown graphs. Further, it was conjectured that the crown graphs have the highest representation number among the bipartite graphs. In this work, first, we reconcile the relation between the prn of a comparability graph and the dimension of its induced poset and review the upper bounds on the prn of bipartite graphs. Then, we study the prn of bipartite graphs using the notion called neighborhood graphs. This approach substantiates the aforesaid conjecture and gives us theoretical evidence. In this connection, we devise a polynomial-time procedure to construct a word that represents a given bipartite graph permutationally. Accordingly, we provide a better upper bound for the prn of bipartite graphs. Further, we construct a class of bipartite graphs, viz., extended crown graphs, defined over posets and investigate its prn using the neighborhood graphs. △ Less

Submitted 23 November, 2023; originally announced November 2023.

MSC Class: 68R10; 68R15; 05C90; 06A07

arXiv:2311.09517 [pdf, other]

GEE! Grammar Error Explanation with Large Language Models

Authors: Yixiao Song, Kalpesh Krishna, Rajesh Bhatt, Kevin Gimpel, Mohit Iyyer

Abstract: Grammatical error correction tools are effective at correcting grammatical errors in users' input sentences but do not provide users with \textit{natural language} explanations about their errors. Such explanations are essential for helping users learn the language by gaining a deeper understanding of its grammatical rules (DeKeyser, 2003; Ellis et al., 2006). To address this gap, we propose the t… ▽ More Grammatical error correction tools are effective at correcting grammatical errors in users' input sentences but do not provide users with \textit{natural language} explanations about their errors. Such explanations are essential for helping users learn the language by gaining a deeper understanding of its grammatical rules (DeKeyser, 2003; Ellis et al., 2006). To address this gap, we propose the task of grammar error explanation, where a system needs to provide one-sentence explanations for each grammatical error in a pair of erroneous and corrected sentences. We analyze the capability of GPT-4 in grammar error explanation, and find that it only produces explanations for 60.2% of the errors using one-shot prompting. To improve upon this performance, we develop a two-step pipeline that leverages fine-tuned and prompted large language models to perform structured atomic token edit extraction, followed by prompting GPT-4 to generate explanations. We evaluate our pipeline on German and Chinese grammar error correction data sampled from language learners with a wide range of proficiency levels. Human evaluation reveals that our pipeline produces 93.9% and 98.0% correct explanations for German and Chinese data, respectively. To encourage further research in this area, we will open-source our data and code. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: Preprint, 24 pages, code and data available in https://github.com/Yixiao-Song/GEE-with-LLMs

arXiv:2310.13077 [pdf, other]

doi 10.1109/CASE56687.2023.10260513

NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving

Authors: Kaustab Pal, Aditya Sharma, Mohd Omama, Parth N. Shah, K. Madhava Krishna

Abstract: In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon tha… ▽ More In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon that precludes further resampling, an iterative process that makes sampling based optimal control formulations difficult to adopt in real time settings. Generating control samples around the network-predicted optimal mean retains the advantage of sample diversity while enabling real time rollout of trajectories that avoids multiple dynamic obstacles in an on-road navigation setting. Further the 3D CNN architecture implicitly learns the future trajectories of the dynamic agents in the scene resulting in successful collision free navigation despite no explicit future trajectory prediction. We show performance gain over multiple baselines in a number of on-road scenes through closed loop simulations in CARLA. We also showcase the real world applicability of our system by running it on our custom Autonomous Driving Platform (AutoDP). △ Less

Submitted 19 October, 2023; originally announced October 2023.

Comments: Published in 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE)

arXiv:2310.08270 [pdf, other]

Hilbert Space Embedding-based Trajectory Optimization for Multi-Modal Uncertain Obstacle Trajectory Prediction

Authors: Basant Sharma, Aditya Sharma, K. Madhava Krishna, Arun Kumar Singh

Abstract: Safe autonomous driving critically depends on how well the ego-vehicle can predict the trajectories of neighboring vehicles. To this end, several trajectory prediction algorithms have been presented in the existing literature. Many of these approaches output a multi-modal distribution of obstacle trajectories instead of a single deterministic prediction to account for the underlying uncertainty. H… ▽ More Safe autonomous driving critically depends on how well the ego-vehicle can predict the trajectories of neighboring vehicles. To this end, several trajectory prediction algorithms have been presented in the existing literature. Many of these approaches output a multi-modal distribution of obstacle trajectories instead of a single deterministic prediction to account for the underlying uncertainty. However, existing planners cannot handle the multi-modality based on just sample-level information of the predictions. With this motivation, this paper proposes a trajectory optimizer that can leverage the distributional aspects of the prediction in a computationally tractable and sample-efficient manner. Our optimizer can work with arbitrarily complex distributions and thus can be used with output distribution represented as a deep neural network. The core of our approach is built on embedding distribution in Reproducing Kernel Hilbert Space (RKHS), which we leverage in two ways. First, we propose an RKHS embedding approach to select probable samples from the obstacle trajectory distribution. Second, we rephrase chance-constrained optimization as distribution matching in RKHS and propose a novel sampling-based optimizer for its solution. We validate our approach with hand-crafted and neural network-based predictors trained on real-world datasets and show improvement over the existing stochastic optimization approaches in safety metrics. △ Less

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.04181 [pdf, other]

DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions

Authors: Sanket Kalwar, Mihir Ungarala, Shruti Jain, Aaron Monis, Krishna Reddy Konda, Sourav Garg, K Madhava Krishna

Abstract: Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in founda… ▽ More Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed $\nabla$HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios. Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach. Project page at https://diffprompter.github.io. △ Less

Submitted 26 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.02251 [pdf, other]

Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving

Authors: Tushar Choudhary, Vikrant Dewangan, Shivam Chandhok, Shubham Priyadarshan, Anushka Jain, Arun K. Singh, Siddharth Srivastava, Krishna Murthy Jatavallabhula, K. Madhava Krishna

Abstract: Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representation… ▽ More Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset. △ Less

Submitted 14 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Project page at https://llmbev.github.io/talk2bev/

arXiv:2307.01215 [pdf, ps, other]

Functional Donoho-Stark Approximate Support Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. If $ x \in \mathcal{X}\setminus\{0\}$ is such that $θ_fx$ is $\varepsilon$-supported on $M\subseteq \{1,\dots, n\}$ w.r.t. p-norm and $θ_gx$ is $δ$-supported on $N\subseteq \{1,\dots, n\}$ w.r.t. p-norm, then we show that \begin{align}\la… ▽ More Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. If $ x \in \mathcal{X}\setminus\{0\}$ is such that $θ_fx$ is $\varepsilon$-supported on $M\subseteq \{1,\dots, n\}$ w.r.t. p-norm and $θ_gx$ is $δ$-supported on $N\subseteq \{1,\dots, n\}$ w.r.t. p-norm, then we show that \begin{align}\label{ME} (1) \quad \quad \quad \quad &o(M)^\frac{1}{p}o(N)^\frac{1}{q}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|f_j(ω_k) |}\max \{1-\varepsilon-δ, 0\},\\ (2) \quad \quad \quad \quad&o(M)^\frac{1}{q}o(N)^\frac{1}{p}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}\max \{1-\varepsilon-δ, 0\},\label{ME2} \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^n \in \ell^p([n]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequalities (1) and (2) as \textbf{Functional Donoho-Stark Approximate Support Uncertainty Principle}. Inequalities (1) and (2) improve the finite approximate support uncertainty principle obtained by Donoho and Stark \textit{[SIAM J. Appl. Math., 1989]}. △ Less

Submitted 1 July, 2023; originally announced July 2023.

Comments: 7 Pages, 0 Figures

MSC Class: 42C15; 46B03; 46B04

arXiv:2307.00301 [pdf, ps, other]

Words for the Graphs with Permutation-Representation Number at most Three

Authors: Khyodeno Mozhui, K. V. Krishna

Abstract: The graphs with permutation-representation number (\textit{prn}) at most two are known. While a characterization for the class of graphs with the \textit{prn} at most three is an open problem, we summarize the graphs of this class that are known so far. Although it is known that the \textit{prn} of trees is at most three, in this work, we devise a polynomial-time algorithm for obtaining a word rep… ▽ More The graphs with permutation-representation number (\textit{prn}) at most two are known. While a characterization for the class of graphs with the \textit{prn} at most three is an open problem, we summarize the graphs of this class that are known so far. Although it is known that the \textit{prn} of trees is at most three, in this work, we devise a polynomial-time algorithm for obtaining a word representing a given tree permutationally. Consequently, we determine the words representing even cycles. Contributing to the class of graphs with the \textit{prn} at most three, we determine the \textit{prn} as well as the representation number of book graphs. △ Less

Submitted 29 September, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

MSC Class: 68R10; 68R15; 05C90; 06A07

arXiv:2306.06093 [pdf, other]

HyP-NeRF: Learning Improved NeRF Priors using a HyperNetwork

Authors: Bipasha Sen, Gaurav Singh, Aditya Agarwal, Rohith Agaram, K Madhava Krishna, Srinath Sridhar

Abstract: Neural Radiance Fields (NeRF) have become an increasingly popular representation to capture high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work on generalization, multi-view consistency and to… ▽ More Neural Radiance Fields (NeRF) have become an increasingly popular representation to capture high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work on generalization, multi-view consistency and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings resulting in significant quality gains. To improve quality even further, we incorporate a denoise and finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes it while retaining multiview consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating our state-of-the-art results. △ Less

Submitted 23 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Project Page: https://hyp-nerf.github.io

arXiv:2306.04939 [pdf, other]

UAP-BEV: Uncertainty Aware Planning using Bird's Eye View generated from Surround Monocular Images

Authors: Vikrant Dewangan, Basant Sharma, Tushar Choudhary, Sarthak Sharma, Aakash Aanegola, Arun K. Singh, K. Madhava Krishna

Abstract: Autonomous driving requires accurate reasoning of the location of objects from raw sensor data. Recent end-to-end learning methods go from raw sensor data to a trajectory output via Bird's Eye View(BEV) segmentation as an interpretable intermediate representation. Motion planning over cost maps generated via Birds Eye View (BEV) segmentation has emerged as a prominent approach in autonomous drivin… ▽ More Autonomous driving requires accurate reasoning of the location of objects from raw sensor data. Recent end-to-end learning methods go from raw sensor data to a trajectory output via Bird's Eye View(BEV) segmentation as an interpretable intermediate representation. Motion planning over cost maps generated via Birds Eye View (BEV) segmentation has emerged as a prominent approach in autonomous driving. However, the current approaches have two critical gaps. First, the optimization process is simplistic and involves just evaluating a fixed set of trajectories over the cost map. The trajectory samples are not adapted based on their associated cost values. Second, the existing cost maps do not account for the uncertainty in the cost maps that can arise due to noise in RGB images, and BEV annotations. As a result, these approaches can struggle in challenging scenarios where there is abrupt cut-in, stopping, overtaking, merging, etc from the neighboring vehicles. In this paper, we propose UAP-BEV: A novel approach that models the noise in Spatio-Temporal BEV predictions to create an uncertainty-aware occupancy grid map. Using queries of the distance to the closest occupied cell, we obtain a sample estimate of the collision probability of the ego-vehicle. Subsequently, our approach uses gradient-free sampling-based optimization to compute low-cost trajectories over the cost map. Importantly, the sampling distribution is adapted based on the optimal cost values of the sampled trajectories. By explicitly modeling probabilistic collision avoidance in the BEV space, our approach is able to outperform the cost-map-based baselines in collision avoidance, route completion, time to completion, and smoothness. To further validate our method, we also show results on the real-world dataset NuScenes, where we report improvements in collision avoidance and smoothness. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted to CASE 2023. Project video available at https://vikr-182.github.io/UAP-BEV

arXiv:2306.01014 [pdf, ps, other]

Functional Ghobber-Jaming Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. Let $M,N\subseteq \{1, \dots, n\}$ be such that \begin{align*} o(M)^\frac{1}{q}o(N)^\frac{1}{p}< \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}, \end{align*} where $q$ is the conjugate index of $p$. Then for all… ▽ More Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. Let $M,N\subseteq \{1, \dots, n\}$ be such that \begin{align*} o(M)^\frac{1}{q}o(N)^\frac{1}{p}< \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}, \end{align*} where $q$ is the conjugate index of $p$. Then for all $x \in \mathcal{X}$, we show that \begin{align}\label{FGJU} (1) \quad \quad \quad \quad \|x\|\leq \left(1+\frac{1}{1-o(M)^\frac{1}{q}o(N)^\frac{1}{p}\displaystyle\max_{1\leq j,k\leq n}|g_k(τ_j)|}\right)\left[\left(\sum_{j\in M^c}|f_j(x)|^p\right)^\frac{1}{p}+\left(\sum_{k\in N^c}|g_k(x) |^p\right)^\frac{1}{p}\right]. \end{align} We call Inequality (1) as \textbf{Functional Ghobber-Jaming Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Ghobber and Jaming \textit{[Linear Algebra Appl., 2011]}. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 7 Pages, 0 Figures

MSC Class: 42C15; 46B03; 46B04

arXiv:2305.14296 [pdf, other]

USB: A Unified Summarization Benchmark Across Tasks and Domains

Authors: Kundan Krishna, Prakhar Gupta, Sanjana Ramprasad, Byron C. Wallace, Jeffrey P. Bigham, Zachary C. Lipton

Abstract: While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks: (i) extractive summarization; (ii) abstractive summarization… ▽ More While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks: (i) extractive summarization; (ii) abstractive summarization; (iii) topic-based summarization; (iv) compressing selected sentences into a one-line summary; (v) surfacing evidence for a summary sentence; (vi) predicting the factual accuracy of a summary sentence; (vii) identifying unsubstantiated spans in a summary sentence; (viii) correcting factual errors in summaries. We compare various methods on this benchmark and discover that on multiple tasks, moderately-sized fine-tuned models consistently outperform much larger few-shot prompted language models. For factuality-related tasks, we also evaluate existing heuristics to create training data and find that training on them results in worse performance than training on $20\times$ less human-labeled data. Our articles draw from $6$ domains, facilitating cross-domain analysis. On some tasks, the amount of training data matters more than the domain where it comes from, while for other tasks training specifically on data from the target domain, even if limited, is more beneficial. △ Less

Submitted 4 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP Findings 2023 Camera Ready

arXiv:2305.14251 [pdf, other]

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

Authors: Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi

Abstract: Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of… ▽ More Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of atomic facts and computes the percentage of atomic facts supported by a reliable knowledge source. We conduct an extensive human evaluation to obtain FACTSCOREs of people biographies generated by several state-of-the-art commercial LMs -- InstructGPT, ChatGPT, and the retrieval-augmented PerplexityAI -- and report new analysis demonstrating the need for such a fine-grained score (e.g., ChatGPT only achieves 58%). Since human evaluation is costly, we also introduce an automated model that estimates FACTSCORE using retrieval and a strong language model, with less than a 2% error rate. Finally, we use this automated metric to evaluate 6,500 generations from a new set of 13 recent LMs that would have cost $26K if evaluated by humans, with various findings: GPT-4 and ChatGPT are more factual than public models, and Vicuna and Alpaca are some of the best public models. FACTSCORE is available for public use via `pip install factscore`. △ Less

Submitted 11 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 25 pages; 7 figures. Published as a main conference paper at EMNLP 2023. Code available at https://github.com/shmsw25/FActScore

arXiv:2305.12363 [pdf, other]

doi 10.1109/RO-MAN57019.2023.10309534

Instance-Level Semantic Maps for Vision Language Navigation

Authors: Laksh Nanwani, Anmol Agarwal, Kanishk Jain, Raghav Prabhakar, Aaron Monis, Aditya Mathur, Krishna Murthy, Abdul Hafez, Vineet Gandhi, K. Madhava Krishna

Abstract: Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment, allowing them to navigate on-demand when given linguistic instructions. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recent works take a step towards this… ▽ More Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment, allowing them to navigate on-demand when given linguistic instructions. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recent works take a step towards this goal by creating a semantic spatial map representation of the environment without any labeled data. However, their representations are limited for practical applicability as they do not distinguish between different instances of the same object. In this work, we address this limitation by integrating instance-level information into spatial map representation using a community detection algorithm and utilizing word ontology learned by large language models (LLMs) to perform open-set semantic associations in the mapping representation. The resulting map representation improves the navigation performance by two-fold (233%) on realistic language commands with instance-specific descriptions compared to the baseline. We validate the practicality and effectiveness of our approach through extensive qualitative and quantitative experiments. △ Less

Submitted 1 July, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

Journal ref: IEEE RO-MAN 2023

arXiv:2304.03326 [pdf, other]

Finite Time Lyapunov Exponent Analysis of Model Predictive Control and Reinforcement Learning

Authors: Kartik Krishna, Steven L. Brunton, Zhuoyuan Song

Abstract: Finite-time Lyapunov exponents (FTLEs) provide a powerful approach to compute time-varying analogs of invariant manifolds in unsteady fluid flow fields. These manifolds are useful to visualize the transport mechanisms of passive tracers advecting with the flow. However, many vehicles and mobile sensors are not passive, but are instead actuated according to some intelligent trajectory planning or c… ▽ More Finite-time Lyapunov exponents (FTLEs) provide a powerful approach to compute time-varying analogs of invariant manifolds in unsteady fluid flow fields. These manifolds are useful to visualize the transport mechanisms of passive tracers advecting with the flow. However, many vehicles and mobile sensors are not passive, but are instead actuated according to some intelligent trajectory planning or control law; for example, model predictive control and reinforcement learning are often used to design energy-efficient trajectories in a dynamically changing background flow. In this work, we investigate the use of FTLE on such controlled agents to gain insight into optimal transport routes for navigation in known unsteady flows. We find that these controlled FTLE (cFTLE) coherent structures separate the flow field into different regions with similar costs of transport to the goal location. These separatrices are functions of the planning algorithm's hyper-parameters, such as the optimization time horizon and the cost of actuation. Computing the invariant sets and manifolds of active agent dynamics in dynamic flow fields is useful in the context of robust motion control, hyperparameter tuning, and determining safe and collision-free trajectories for autonomous systems. Moreover, these cFTLE structures provide insight into effective deployment locations for mobile agents with actuation and energy constraints to traverse the ocean or atmosphere. △ Less

Submitted 17 May, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 22 pages, 12 figures

MSC Class: 37N35; 34H05; 37D10; 34D08; 37N10; 76F25; 93B45

arXiv:2304.03324 [pdf, ps, other]

Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle

Authors: K. Mahesh Krishna

Abstract: Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^m, \{ω_k\}_{k=1}^m)$ be p-Schauder frames for a finite dimensional Banach space $\mathcal{X}$. Then for every $x \in \mathcal{X}\setminus\{0\}$, we show that \begin{align} (1) \quad \|θ_f x\|_0^\frac{1}{p}\|θ_g x\|_0^\frac{1}{q} \geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|f_j(ω_k)|}\quad \text{and} \quad \|θ_g x\|_0^\f… ▽ More Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^m, \{ω_k\}_{k=1}^m)$ be p-Schauder frames for a finite dimensional Banach space $\mathcal{X}$. Then for every $x \in \mathcal{X}\setminus\{0\}$, we show that \begin{align} (1) \quad \|θ_f x\|_0^\frac{1}{p}\|θ_g x\|_0^\frac{1}{q} \geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|f_j(ω_k)|}\quad \text{and} \quad \|θ_g x\|_0^\frac{1}{p}\|θ_f x\|_0^\frac{1}{q}\geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|g_k(τ_j)|}. \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^m \in \ell^p([m]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequality (1) as \textbf{Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Inequality (1) improves Ricaud-Torrésani uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2013]}. In particular, it improves Elad-Bruckstein uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2002]} and Donoho-Stark uncertainty principle \textit{[SIAM J. Appl. Math., 1989]}. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: 5 Pages, 0 Figures

MSC Class: 42C15

arXiv:2304.01074 [pdf, other]

FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation

Authors: Sudarshan S Harithas, Gurkirat Singh, Aneesh Chavan, Sarthak Sharma, Suraj Patni, Chetan Arora, K. Madhava Krishna

Abstract: We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads… ▽ More We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In this original approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust, and generalizable than the current SOTA. We report significant performance gain in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on Kitti, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique allows to compress the original point cloud by over 830 times. To further test the robustness of our technique we create and opensource a custom dataset called Lidar-UrbanFly Dataset (LUF) which consists of point clouds obtained from a LiDAR mounted on a quadrotor. △ Less

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.13408 [pdf, other]

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

Authors: Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit Iyyer

Abstract: The rise in malicious usage of large language models, such as fake content creation and academic plagiarism, has motivated the development of approaches that identify AI-generated text, including those based on watermarking or outlier detection. However, the robustness of these detection algorithms to paraphrases of AI-generated text remains unclear. To stress test these detectors, we build a 11B… ▽ More The rise in malicious usage of large language models, such as fake content creation and academic plagiarism, has motivated the development of approaches that identify AI-generated text, including those based on watermarking or outlier detection. However, the robustness of these detection algorithms to paraphrases of AI-generated text remains unclear. To stress test these detectors, we build a 11B parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering. Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking, GPTZero, DetectGPT, and OpenAI's text classifier. For example, DIPPER drops detection accuracy of DetectGPT from 70.3% to 4.6% (at a constant false positive rate of 1%), without appreciably modifying the input semantics. To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider. Given a candidate text, our algorithm searches a database of sequences previously generated by the API, looking for sequences that match the candidate text within a certain threshold. We empirically verify our defense using a database of 15M generations from a fine-tuned T5-XXL model and find that it can detect 80% to 97% of paraphrased generations across different settings while only classifying 1% of human-written sequences as AI-generated. We open-source our models, code and data. △ Less

Submitted 17 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: NeurIPS 2023 camera ready (32 pages). Code, models, data available in https://github.com/martiansideofthemoon/ai-detection-paraphrases

arXiv:2303.04729 [pdf, other]

doi 10.1145/3576915.3616652

Stealing the Decoding Algorithms of Language Models

Authors: Ali Naseh, Kalpesh Krishna, Mohit Iyyer, Amir Houmansadr

Abstract: A key component of generating text from modern language models (LM) is the selection and tuning of decoding algorithms. These algorithms determine how to generate text from the internal probability distribution generated by the LM. The process of choosing a decoding algorithm and tuning its hyperparameters takes significant time, manual effort, and computation, and it also requires extensive human… ▽ More A key component of generating text from modern language models (LM) is the selection and tuning of decoding algorithms. These algorithms determine how to generate text from the internal probability distribution generated by the LM. The process of choosing a decoding algorithm and tuning its hyperparameters takes significant time, manual effort, and computation, and it also requires extensive human evaluation. Therefore, the identity and hyperparameters of such decoding algorithms are considered to be extremely valuable to their owners. In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms at very low monetary costs. Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo. We demonstrate the feasibility of stealing such information with only a few dollars, e.g., $\$0.8$, $\$1$, $\$4$, and $\$40$ for the four versions of GPT-3. △ Less

Submitted 1 December, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Journal ref: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security

arXiv:2301.13298 [pdf, other]

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization

Authors: Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo

Abstract: While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers on long-form summarization, we first shed light on current human evaluation practices surrounding long-form summaries. We find that 73% of t… ▽ More While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers on long-form summarization, we first shed light on current human evaluation practices surrounding long-form summaries. We find that 73% of these papers do not perform any human evaluation on model-generated summaries, while other works face new difficulties that manifest when dealing with long documents (e.g., low inter-annotator agreement). Motivated by our survey, we present LongEval, a set of guidelines for human evaluation of faithfulness in long-form summaries that addresses the following challenges: (1) How can we achieve high inter-annotator agreement on faithfulness scores? (2) How can we minimize annotator workload while maintaining accurate faithfulness scores? and (3) Do humans benefit from automated alignment between summary and source snippets? We deploy LongEval in annotation studies on two long-form summarization datasets in different domains (SQuALITY and PubMed), and we find that switching to a finer granularity of judgment (e.g., clause-level) reduces inter-annotator variance in faithfulness scores (e.g., std-dev from 18.5 to 6.8). We also show that scores from a partial annotation of fine-grained units highly correlates with scores from a full annotation workload (0.89 Kendall's tau using 50% judgments). We release our human judgments, annotation templates, and our software as a Python library for future research. △ Less

Submitted 30 January, 2023; originally announced January 2023.

Comments: EACL 2023 camera ready. Code and data can be found in https://github.com/martiansideofthemoon/longeval-summarization

arXiv:2212.09928 [pdf, other]

Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

Authors: Kundan Krishna, Yao Zhao, Jie Ren, Balaji Lakshminarayanan, Jiaming Luo, Mohammad Saleh, Peter J. Liu

Abstract: The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical… ▽ More The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points. △ Less

Submitted 4 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: EMNLP Findings 2023 Camera Ready

arXiv:2211.16882 [pdf, other]

MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks and Shelves

Authors: Pranjali Pathre, Anurag Sahu, Ashwin Rao, Avinash Prabhu, Meher Shashwat Nigam, Tanvi Karandikar, Harit Pandya, K. Madhava Krishna

Abstract: In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented… ▽ More In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks, the front and the top view layout of each shelf within a rack. With minimal effort, such an output is transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying number of objects on each shelf, number of shelves and in the presence of other such racks in the background. Further, MVRackLay shows superior performance vis-a-vis its single view counterpart, RackLay, in layout accuracy, quantized in terms of the mean IoU and mAP metrics. We also showcase a multi-view stitching of the 3D layouts resulting in a representation of the warehouse scene with respect to a global reference frame akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first such work to portray a 3D rendering of a warehouse scene in terms of its semantic components - Racks, Shelves and Objects - all from a single monocular camera. △ Less

Submitted 30 November, 2022; originally announced November 2022.

Journal ref: IEEE International Conference on Robotics and Biomimetics (ROBIO) 2022

arXiv:2210.14250 [pdf, other]

Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

Authors: Katherine Thai, Marzena Karpinska, Kalpesh Krishna, Bill Ray, Moira Inghilleri, John Wieting, Mohit Iyyer

Abstract: Literary translation is a culturally significant task, but it is bottlenecked by the small number of qualified literary translators relative to the many untranslated works published around the world. Machine translation (MT) holds potential to complement the work of human translators by improving both training procedures and their overall efficiency. Literary translation is less constrained than m… ▽ More Literary translation is a culturally significant task, but it is bottlenecked by the small number of qualified literary translators relative to the many untranslated works published around the world. Machine translation (MT) holds potential to complement the work of human translators by improving both training procedures and their overall efficiency. Literary translation is less constrained than more traditional MT settings since translators must balance meaning equivalence, readability, and critical interpretability in the target language. This property, along with the complex discourse-level context present in literary texts, also makes literary MT more challenging to computationally model and evaluate. To explore this task, we collect a dataset (Par3) of non-English language novels in the public domain, each aligned at the paragraph level to both human and automatic English translations. Using Par3, we discover that expert literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%, while state-of-the-art automatic MT metrics do not correlate with those preferences. The experts note that MT outputs contain not only mistranslations, but also discourse-disrupting errors and stylistic inconsistencies. To address these problems, we train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts. We publicly release Par3 at https://github.com/katherinethai/par3/ to spur future research into literary MT. △ Less

Submitted 25 October, 2022; originally announced October 2022.

Comments: EMNLP 2022

arXiv:2210.11689 [pdf, other]

SLING: Sino Linguistic Evaluation of Large Language Models

Authors: Yixiao Song, Kalpesh Krishna, Rajesh Bhatt, Mohit Iyyer

Abstract: To understand what kinds of linguistic knowledge are encoded by pretrained Chinese language models (LMs), we introduce the benchmark of Sino LINGuistics (SLING), which consists of 38K minimal sentence pairs in Mandarin Chinese grouped into 9 high-level linguistic phenomena. Each pair demonstrates the acceptability contrast of a specific syntactic or semantic phenomenon (e.g., The keys are lost vs.… ▽ More To understand what kinds of linguistic knowledge are encoded by pretrained Chinese language models (LMs), we introduce the benchmark of Sino LINGuistics (SLING), which consists of 38K minimal sentence pairs in Mandarin Chinese grouped into 9 high-level linguistic phenomena. Each pair demonstrates the acceptability contrast of a specific syntactic or semantic phenomenon (e.g., The keys are lost vs. The keys is lost), and an LM should assign lower perplexity to the acceptable sentence. In contrast to the CLiMP dataset (Xiang et al., 2021), which also contains Chinese minimal pairs and was created by translating the vocabulary of the English BLiMP dataset, the minimal pairs in SLING are derived primarily by applying syntactic and lexical transformations to naturally-occurring, linguist-annotated sentences from the Chinese Treebank 9.0, thus addressing severe issues in CLiMP's data generation process. We test 18 publicly available pretrained monolingual (e.g., BERT-base-zh, CPM) and multi-lingual (e.g., mT5, XLM) language models on SLING. Our experiments show that the average accuracy for LMs is far below human performance (69.7% vs. 97.1%), while BERT-base-zh achieves the highest accuracy (84.8%) of all tested LMs, even much larger ones. Additionally, we find that most LMs have a strong gender and number (singular/plural) bias, and they perform better on local phenomena than hierarchical ones. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: 29 pages, EMNLP 2022 camera ready

arXiv:2210.07188 [pdf, other]

ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution

Authors: Ankita Gupta, Marzena Karpinska, Wenlong Zhao, Kalpesh Krishna, Jack Merullo, Luke Yeh, Mohit Iyyer, Brendan O'Connor

Abstract: Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreferences and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers to curate a unified set of guidelines suitable for annotators with var… ▽ More Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreferences and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers to curate a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable quality annotations were already achievable (>90% agreement between the crowd and expert annotations) even without extensive training. On carefully analyzing the remaining disagreements, we identify the presence of linguistic cases that our annotators unanimously agree upon but lack unified treatments (e.g., generic pronouns, appositives) in existing datasets. We propose the research community should revisit these phenomena when curating future unified annotation guidelines. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: preprint (19 pages), code in https://github.com/gnkitaa/ezCoref

arXiv:2210.07062 [pdf, ps, other]

Non-Archimedean Welch Bounds and Non-Archimedean Zauner Conjecture

Authors: K. Mahesh Krishna

Abstract: Let $\mathbb{K}$ be a non-Archimedean (complete) valued field satisfying \begin{align*} \left|\sum_{j=1}^{n}λ_j^2\right|=\max_{1\leq j \leq n}|λ_j|^2, \quad \forall λ_j \in \mathbb{K}, 1\leq j \leq n, \forall n \in \mathbb{N}. \end{align*} For $d\in \mathbb{N}$, let $\mathbb{K}^d$ be the standard $d$-dimensional non-Archimedean Hilbert space. Let $m \in \mathbb{N}$ and… ▽ More Let $\mathbb{K}$ be a non-Archimedean (complete) valued field satisfying \begin{align*} \left|\sum_{j=1}^{n}λ_j^2\right|=\max_{1\leq j \leq n}|λ_j|^2, \quad \forall λ_j \in \mathbb{K}, 1\leq j \leq n, \forall n \in \mathbb{N}. \end{align*} For $d\in \mathbb{N}$, let $\mathbb{K}^d$ be the standard $d$-dimensional non-Archimedean Hilbert space. Let $m \in \mathbb{N}$ and $\text{Sym}^m(\mathbb{K}^d)$ be the non-Archimedean Hilbert space of symmetric m-tensors. We prove the following result. If $\{τ_j\}_{j=1}^n$ is a collection in $\mathbb{K}^d$ satisfying $\langle τ_j, τ_j\rangle =1$ for all $1\leq j \leq n$ and the operator $\text{Sym}^m(\mathbb{K}^d)\ni x \mapsto \sum_{j=1}^n\langle x, τ_j^{\otimes m}\rangle τ_j^{\otimes m} \in \text{Sym}^m(\mathbb{K}^d)$ is diagonalizable, then \begin{align} (1) \quad \quad \quad \max_{1\leq j,k \leq n, j \neq k}\{|n|, |\langle τ_j, τ_k\rangle|^{2m} \}\geq \frac{|n|^2}{\left|{d+m-1 \choose m}\right| }. \end{align} We call Inequality (1) as the non-Archimedean version of Welch bounds obtained by Welch [\textit{IEEE Transactions on Information Theory, 1974}]. We formulate non-Archimedean Zauner conjecture. △ Less

Submitted 28 August, 2022; originally announced October 2022.

Comments: 9 Pages, 0 Figures

MSC Class: 12J25; 46S10; 47S10

arXiv:2209.15558 [pdf, other]

Out-of-Distribution Detection and Selective Generation for Conditional Language Models

Authors: Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mohammad Saleh, Balaji Lakshminarayanan, Peter J. Liu

Abstract: Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the nex… ▽ More Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs as the prediction is done auto-regressively over many steps. Furthermore, the space of potential low-quality outputs is larger as arbitrary text can be generated and it is important to know when to trust the generated output. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used under the common and realistic setting of distribution shift for selective generation (analogous to selective prediction for classification) of high-quality outputs, while automatically abstaining from low-quality ones, enabling safer deployment of generative language models. △ Less

Submitted 7 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: Published in ICLR 2023

arXiv:2209.14922 [pdf, other]

GDIP: Gated Differentiable Image Processing for Object-Detection in Adverse Conditions

Authors: Sanket Kalwar, Dhruv Patel, Aakash Aanegola, Krishna Reddy Konda, Sourav Garg, K Madhava Krishna

Abstract: Detecting objects under adverse weather and lighting conditions is crucial for the safe and continuous operation of an autonomous vehicle, and remains an unsolved problem. We present a Gated Differentiable Image Processing (GDIP) block, a domain-agnostic network architecture, which can be plugged into existing object detection networks (e.g., Yolo) and trained end-to-end with adverse condition ima… ▽ More Detecting objects under adverse weather and lighting conditions is crucial for the safe and continuous operation of an autonomous vehicle, and remains an unsolved problem. We present a Gated Differentiable Image Processing (GDIP) block, a domain-agnostic network architecture, which can be plugged into existing object detection networks (e.g., Yolo) and trained end-to-end with adverse condition images such as those captured under fog and low lighting. Our proposed GDIP block learns to enhance images directly through the downstream object detection loss. This is achieved by learning parameters of multiple image pre-processing (IP) techniques that operate concurrently, with their outputs combined using weights learned through a novel gating mechanism. We further improve GDIP through a multi-stage guidance procedure for progressive image enhancement. Finally, trading off accuracy for speed, we propose a variant of GDIP that can be used as a regularizer for training Yolo, which eliminates the need for GDIP-based image enhancement during inference, resulting in higher throughput and plausible real-world deployment. We demonstrate significant improvement in detection performance over several state-of-the-art methods through quantitative and qualitative studies on synthetic datasets such as PascalVOC, and real-world foggy (RTTS) and low-lighting (ExDark) datasets. △ Less

Submitted 29 September, 2022; originally announced September 2022.

Comments: Submitted to ICRA2023. More information at https://gatedip.github.io

arXiv:2209.14389 [pdf, other]

Downstream Datasets Make Surprisingly Good Pretraining Corpora

Authors: Kundan Krishna, Saurabh Garg, Jeffrey P. Bigham, Zachary C. Lipton

Abstract: For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent these gains are attributable to the massive background corpora employed for pretraining versus to the pretraining objectives themselves. This paper introduces a la… ▽ More For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent these gains are attributable to the massive background corpora employed for pretraining versus to the pretraining objectives themselves. This paper introduces a large-scale study of self-pretraining, where the same (downstream) training data is used for both pretraining and finetuning. In experiments addressing both ELECTRA and RoBERTa models and 10 distinct downstream classification datasets, we observe that self-pretraining rivals standard pretraining on the BookWiki corpus (despite using around $10\times$--$500\times$ less data), outperforming the latter on $7$ and $5$ datasets, respectively. Surprisingly, these task-specific pretrained models often perform well on other tasks, including the GLUE benchmark. Besides classification tasks, self-pretraining also provides benefits on structured output prediction tasks such as span based question answering and commonsense inference, often providing more than $50\%$ of the performance boosts provided by pretraining on the BookWiki corpus. Our results hint that in many scenarios, performance gains attributable to pretraining are driven primarily by the pretraining objective itself and are not always attributable to the use of external pretraining data in massive amounts. These findings are especially relevant in light of concerns about intellectual property and offensive content in web-scale pretraining data. △ Less

Submitted 26 May, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: ACL2023 Camera Ready

arXiv:2209.13418 [pdf, other]

UAV-based Visual Remote Sensing for Automated Building Inspection

Authors: Kushagra Srivastava, Dhruv Patel, Aditya Kumar Jha, Mohhit Kumar Jha, Jaskirat Singh, Ravi Kiran Sarvadevabhatla, Pradeep Kumar Ramancharla, Harikumar Kandath, K. Madhava Krishna

Abstract: Unmanned Aerial Vehicle (UAV) based remote sensing system incorporated with computer vision has demonstrated potential for assisting building construction and in disaster management like damage assessment during earthquakes. The vulnerability of a building to earthquake can be assessed through inspection that takes into account the expected damage progression of the associated component and the co… ▽ More Unmanned Aerial Vehicle (UAV) based remote sensing system incorporated with computer vision has demonstrated potential for assisting building construction and in disaster management like damage assessment during earthquakes. The vulnerability of a building to earthquake can be assessed through inspection that takes into account the expected damage progression of the associated component and the component's contribution to structural system performance. Most of these inspections are done manually, leading to high utilization of manpower, time, and cost. This paper proposes a methodology to automate these inspections through UAV-based image data collection and a software library for post-processing that helps in estimating the seismic structural parameters. The key parameters considered here are the distances between adjacent buildings, building plan-shape, building plan area, objects on the rooftop and rooftop layout. The accuracy of the proposed methodology in estimating the above-mentioned parameters is verified through field measurements taken using a distance measuring sensor and also from the data obtained through Google Earth. Additional details and code can be accessed from https://uvrsabi.github.io/ . △ Less

Submitted 27 September, 2022; originally announced September 2022.

Comments: Paper accepted at CVCIE Workshop at ECCV, 2022 and the project page is https://uvrsabi.github.io/

Showing 1–50 of 164 results for author: Krishna, K