-
In Defense of RAG in the Era of Long-Context Language Models
Authors:
Tan Yu,
Anbang Xu,
Rama Akkiraju
Abstract:
Overcoming the limited context limitations in early-generation LLMs, retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation in the past. Recently, the emergence of long-context LLMs allows the models to incorporate much longer text sequences, making RAG less attractive. Recent studies show that long-context LLMs significantly outperform RAG in long-co…
▽ More
Overcoming the limited context limitations in early-generation LLMs, retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation in the past. Recently, the emergence of long-context LLMs allows the models to incorporate much longer text sequences, making RAG less attractive. Recent studies show that long-context LLMs significantly outperform RAG in long-context applications. Unlike the existing works favoring the long-context LLM over RAG, we argue that the extremely long context in LLMs suffers from a diminished focus on relevant information and leads to potential degradation in answer quality. This paper revisits the RAG in long-context answer generation. We propose an order-preserve retrieval-augmented generation (OP-RAG) mechanism, which significantly improves the performance of RAG for long-context question-answer applications. With OP-RAG, as the number of retrieved chunks increases, the answer quality initially rises, and then declines, forming an inverted U-shaped curve. There exist sweet points where OP-RAG could achieve higher answer quality with much less tokens than long-context LLM taking the whole context as input. Extensive experiments on public benchmark demonstrate the superiority of our OP-RAG.
△ Less
Submitted 3 September, 2024;
originally announced September 2024.
-
ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming
Authors:
Jaylin Herskovitz,
Andi Xu,
Rahaf Alharbi,
Anhong Guo
Abstract:
Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating cu…
▽ More
Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating custom filters for visual information, e.g., 'find NUMBER on BUS', leveraging three end-user programming approaches: block programming, natural language, and programming by example. To implement ProgramAlly, we designed a representation of visual filtering tasks based on scenarios encountered by blind people, and integrated a set of on-device and cloud models for generating and running these programs. In user studies with 12 blind adults, we found that participants preferred different programming modalities depending on the task, and envisioned using visual access programs to address unique accessibility challenges that are otherwise difficult with existing applications. Through ProgramAlly, we present an exploration of how blind end-users can create visual access programs to customize and control their experiences.
△ Less
Submitted 19 August, 2024;
originally announced August 2024.
-
HELP: Hierarchical Embeddings-based Log Parsing
Authors:
Andy Xu,
Arno Gau
Abstract:
Logs are a first-hand source of information for software maintenance and failure diagnosis. Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heur…
▽ More
Logs are a first-hand source of information for software maintenance and failure diagnosis. Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heuristics-based parsers require handcrafted features and domain knowledge, which are difficult to generalize at scale. Second, existing large language model-based parsers rely on periodic offline processing, limiting their effectiveness in real-time use cases. Third, existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies. To address these challenges, we propose HELP, a Hierarchical Embeddings-based Log Parser. HELP is the first online semantic-based parser to leverage LLMs for performant and cost-effective log parsing. We achieve this through a novel hierarchical embeddings module, which fine-tunes a text embedding model to cluster logs before parsing, reducing querying costs by multiple orders of magnitude. To combat log drift, we also develop an iterative rebalancing module, which periodically updates existing log groupings. We evaluate HELP extensively on 14 public large-scale datasets, showing that HELP achieves significantly higher F1-weighted grouping and parsing accuracy than current state-of-the-art online log parsers. We also implement HELP into Iudex's production observability platform, confirming HELP's practicality in a production environment. Our results show that HELP is effective and efficient for high-throughput real-world log parsing.
△ Less
Submitted 15 August, 2024;
originally announced August 2024.
-
Imagen 3
Authors:
Imagen-Team-Google,
:,
Jason Baldridge,
Jakob Bauer,
Mukul Bhutani,
Nicole Brichtova,
Andrew Bunner,
Kelvin Chan,
Yichang Chen,
Sander Dieleman,
Yuqing Du,
Zach Eaton-Rosen,
Hongliang Fei,
Nando de Freitas,
Yilin Gao,
Evgeny Gladchenko,
Sergio Gómez Colmenarejo,
Mandy Guo,
Alex Haig,
Will Hawkins,
Hexiang Hu,
Huilian Huang,
Tobenna Peter Igwe,
Christos Kaplanis,
Siavash Khodadadeh
, et al. (227 additional authors not shown)
Abstract:
We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.
We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.
△ Less
Submitted 13 August, 2024;
originally announced August 2024.
-
Text-Driven Neural Collaborative Filtering Model for Paper Source Tracing
Authors:
Aobo Xu,
Bingyu Chang,
Qingpeng Liu,
Ling Jian
Abstract:
Identifying significant references within the complex interrelations of a citation knowledge graph is challenging, which encompasses connections through citations, authorship, keywords, and other relational attributes. The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles utilizing advanced data mining techniques. In the KDD CUP…
▽ More
Identifying significant references within the complex interrelations of a citation knowledge graph is challenging, which encompasses connections through citations, authorship, keywords, and other relational attributes. The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles utilizing advanced data mining techniques. In the KDD CUP OAG-Challenge PST track, we design a recommendation-based framework tailored for the PST task. This framework employs the Neural Collaborative Filtering (NCF) model to generate final predictions. To process the textual attributes of the papers and extract input features for the model, we utilize SciBERT, a pre-trained language model. According to the experimental results, our method achieved a score of 0.37814 on the Mean Average Precision (MAP) metric, outperforming baseline models and ranking 11th among all participating teams. The source code is publicly available at https://github.com/MyLove-XAB/KDDCupFinal.
△ Less
Submitted 19 August, 2024; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Generally-Occurring Model Change for Robust Counterfactual Explanations
Authors:
Ao Xu,
Tieru Wu
Abstract:
With the increasing impact of algorithmic decision-making on human lives, the interpretability of models has become a critical issue in machine learning. Counterfactual explanation is an important method in the field of interpretable machine learning, which can not only help users understand why machine learning models make specific decisions, but also help users understand how to change these dec…
▽ More
With the increasing impact of algorithmic decision-making on human lives, the interpretability of models has become a critical issue in machine learning. Counterfactual explanation is an important method in the field of interpretable machine learning, which can not only help users understand why machine learning models make specific decisions, but also help users understand how to change these decisions. Naturally, it is an important task to study the robustness of counterfactual explanation generation algorithms to model changes. Previous literature has proposed the concept of Naturally-Occurring Model Change, which has given us a deeper understanding of robustness to model change. In this paper, we first further generalize the concept of Naturally-Occurring Model Change, proposing a more general concept of model parameter changes, Generally-Occurring Model Change, which has a wider range of applicability. We also prove the corresponding probabilistic guarantees. In addition, we consider a more specific problem, data set perturbation, and give relevant theoretical results by combining optimization theory.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
FACTS About Building Retrieval Augmented Generation-based Chatbots
Authors:
Rama Akkiraju,
Anbang Xu,
Deepak Bora,
Tan Yu,
Lu An,
Vishal Seth,
Aaditya Shukla,
Pritam Gundecha,
Hridhay Mehta,
Ashwin Jha,
Prithvi Raj,
Abhinav Balasubramanian,
Murali Maram,
Guru Muthusamy,
Shivakesh Reddy Annepally,
Sidney Knowles,
Min Du,
Nick Burnett,
Sean Javiya,
Ashok Marannan,
Mamta Kumari,
Surbhi Jha,
Ethan Dereszenski,
Anupam Chakraborty,
Subhash Ranjan
, et al. (13 additional authors not shown)
Abstract:
Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This…
▽ More
Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots."
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Authors:
Anton Xue,
Avishree Khare,
Rajeev Alur,
Surbhi Goel,
Eric Wong
Abstract:
We study how to subvert language models from following the rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. We prove that although transformers can faithfully abide by such rules, maliciously crafted prompts can nevertheless mislead even theoretically const…
▽ More
We study how to subvert language models from following the rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form "if $P$ and $Q$, then $R$" for some propositions $P$, $Q$, and $R$. We prove that although transformers can faithfully abide by such rules, maliciously crafted prompts can nevertheless mislead even theoretically constructed models. Empirically, we find that attacks on our theoretical models mirror popular attacks on large language models. Our work suggests that studying smaller theoretical models can help understand the behavior of large language models in rule-based settings like logical reasoning and jailbreak attacks.
△ Less
Submitted 21 June, 2024;
originally announced July 2024.
-
Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation
Authors:
Hui Wei,
Maxwell A. Xu,
Colin Samplawski,
James M. Rehg,
Santosh Kumar,
Benjamin M. Marlin
Abstract:
Wearable sensors enable health researchers to continuously collect data pertaining to the physiological state of individuals in real-world settings. However, such data can be subject to extensive missingness due to a complex combination of factors. In this work, we study the problem of imputation of missing step count data, one of the most ubiquitous forms of wearable sensor data. We construct a n…
▽ More
Wearable sensors enable health researchers to continuously collect data pertaining to the physiological state of individuals in real-world settings. However, such data can be subject to extensive missingness due to a complex combination of factors. In this work, we study the problem of imputation of missing step count data, one of the most ubiquitous forms of wearable sensor data. We construct a novel and large scale data set consisting of a training set with over 3 million hourly step count observations and a test set with over 2.5 million hourly step count observations. We propose a domain knowledge-informed sparse self-attention model for this task that captures the temporal multi-scale nature of step-count data. We assess the performance of the model relative to baselines and conduct ablation studies to verify our specific model designs.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
Authors:
Anfeng Xu,
Kevin Huang,
Tiantian Feng,
Lue Shen,
Helen Tager-Flusberg,
Shrikanth Narayanan
Abstract:
Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker…
▽ More
Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker Confusion Rate, respectively, compared to previous speaker diarization methods. In addition, we benchmark and evaluate the speaker diarization results of the speech foundation models with varying the input audio window size, speaker demographics, and training data ratio. Our results highlight promising pathways for understanding and adopting speech foundation models to facilitate child speech understanding.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models
Authors:
Ancheng Xu,
Minghuan Tan,
Lei Wang,
Min Yang,
Ruifeng Xu
Abstract:
Numeral systems and units of measurement are two conjoined topics in activities of human beings and have mutual effects with the languages expressing them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs…
▽ More
Numeral systems and units of measurement are two conjoined topics in activities of human beings and have mutual effects with the languages expressing them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs. In this paper, we scrutinize existing LLMs on processing of numerals and units of measurement by constructing datasets with perturbations. We first anatomize the reasoning of math word problems to different sub-procedures like numeral conversions from language to numbers and measurement conversions based on units. Then we further annotate math word problems from ancient Chinese arithmetic works which are challenging in numerals and units of measurement. Experiments on perturbed datasets demonstrate that LLMs still encounter difficulties in handling numeral and measurement conversions.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space
Authors:
Yukai Zhang,
Ao Xu,
Zihao Li,
Tieru Wu
Abstract:
In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized, particularly as AI models become more integral to our lives. One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions and offers guidance on altering these decisions. Specifically in the context of…
▽ More
In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized, particularly as AI models become more integral to our lives. One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions and offers guidance on altering these decisions. Specifically in the context of image classification models, effective image counterfactual explanations can significantly enhance user understanding. This paper introduces a novel method for computing feature importance within the feature space of a black-box model. By employing information fusion techniques, our method maximizes the use of data to address feature counterfactual explanations in the feature space. Subsequently, we utilize an image generation model to transform these feature counterfactual explanations into image counterfactual explanations. Our experiments demonstrate that the counterfactual explanations generated by our method closely resemble the original images in both pixel and feature spaces. Additionally, our method outperforms established baselines, achieving impressive experimental results.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Weak Robust Compatibility Between Learning Algorithms and Counterfactual Explanation Generation Algorithms
Authors:
Ao Xu,
Tieru Wu
Abstract:
Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied the robustness based on the perturbation of input insta…
▽ More
Counterfactual explanation generation is a powerful method for Explainable Artificial Intelligence. It can help users understand why machine learning models make specific decisions, and how to change those decisions. Evaluating the robustness of counterfactual explanation algorithms is therefore crucial. Previous literature has widely studied the robustness based on the perturbation of input instances. However, the robustness defined from the perspective of perturbed instances is sometimes biased, because this definition ignores the impact of learning algorithms on robustness. In this paper, we propose a more reasonable definition, Weak Robust Compatibility, based on the perspective of explanation strength. In practice, we propose WRC-Test to help us generate more robust counterfactuals. Meanwhile, we designed experiments to verify the effectiveness of WRC-Test. Theoretically, we introduce the concepts of PAC learning theory and define the concept of PAC WRC-Approximability. Based on reasonable assumptions, we establish oracle inequalities about weak robustness, which gives a sufficient condition for PAC WRC-Approximability.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
High-Performance Privacy-Preserving Matrix Completion for Trajectory Recovery
Authors:
Jiahao Guo,
An-Bao Xu
Abstract:
Matrix completion has important applications in trajectory recovery and mobile social networks. However, sending raw data containing personal, sensitive information to cloud computing nodes may lead to privacy exposure issue.The privacy-preserving matrix completion is a useful approach to perform matrix completion while preserving privacy. In this paper, we propose a high-performance method for pr…
▽ More
Matrix completion has important applications in trajectory recovery and mobile social networks. However, sending raw data containing personal, sensitive information to cloud computing nodes may lead to privacy exposure issue.The privacy-preserving matrix completion is a useful approach to perform matrix completion while preserving privacy. In this paper, we propose a high-performance method for privacy-preserving matrix completion. First,we use a lightweight encryption scheme to encrypt the raw data and then perform matrix completion using alternating direction method of multipliers (ADMM). Then,the complemented matrix is decrypted and compared with the original matrix to calculate the error. This method has faster speed with higher accuracy. The results of numerical experiments reveal that the proposed method is faster than other algorithms.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Authors:
Angchi Xu,
Wei-Shi Zheng
Abstract:
Weakly-supervised action segmentation is a task of learning to partition a long video into several action segments, where training videos are only accompanied by transcripts (ordered list of actions). Most of existing methods need to infer pseudo segmentation for training by serial alignment between all frames and the transcript, which is time-consuming and hard to be parallelized while training.…
▽ More
Weakly-supervised action segmentation is a task of learning to partition a long video into several action segments, where training videos are only accompanied by transcripts (ordered list of actions). Most of existing methods need to infer pseudo segmentation for training by serial alignment between all frames and the transcript, which is time-consuming and hard to be parallelized while training. In this work, we aim to escape from this inefficient alignment with massive but redundant frames, and instead to directly localize a few action transitions for pseudo segmentation generation, where a transition refers to the change from an action segment to its next adjacent one in the transcript. As the true transitions are submerged in noisy boundaries due to intra-segment visual variation, we propose a novel Action-Transition-Aware Boundary Alignment (ATBA) framework to efficiently and effectively filter out noisy boundaries and detect transitions. In addition, to boost the semantic learning in the case that noise is inevitably present in the pseudo segmentation, we also introduce video-level losses to utilize the trusted video-level supervision. Extensive experiments show the effectiveness of our approach on both performance and training speed.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
Large Language Model Augmented Exercise Retrieval for Personalized Language Learning
Authors:
Austin Xu,
Will Monroe,
Klinton Bicknell
Abstract:
We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector similarity approaches poorly capture the relationship between exercise content and the language that learners use to express wha…
▽ More
We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector similarity approaches poorly capture the relationship between exercise content and the language that learners use to express what they want to learn. This semantic gap between queries and content dramatically reduces the effectiveness of general-purpose retrieval models pretrained on large scale information retrieval datasets like MS MARCO. We leverage the generative capabilities of large language models to bridge the gap by synthesizing hypothetical exercises based on the learner's input, which are then used to search for relevant exercises. Our approach, which we call mHyER, overcomes three challenges: (1) lack of relevance labels for training, (2) unrestricted learner input content, and (3) low semantic similarity between input and retrieval candidates. mHyER outperforms several strong baselines on two novel benchmarks created from crowdsourced data and publicly available data.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Large (and Deep) Factor Models
Authors:
Bryan Kelly,
Boris Kuznetsov,
Semyon Malamud,
Teng Andrea Xu
Abstract:
We open up the black box behind Deep Learning for portfolio optimization and prove that a sufficiently wide and arbitrarily deep neural network (DNN) trained to maximize the Sharpe ratio of the Stochastic Discount Factor (SDF) is equivalent to a large factor model (LFM): A linear factor pricing model that uses many non-linear characteristics. The nature of these characteristics depends on the arch…
▽ More
We open up the black box behind Deep Learning for portfolio optimization and prove that a sufficiently wide and arbitrarily deep neural network (DNN) trained to maximize the Sharpe ratio of the Stochastic Discount Factor (SDF) is equivalent to a large factor model (LFM): A linear factor pricing model that uses many non-linear characteristics. The nature of these characteristics depends on the architecture of the DNN in an explicit, tractable fashion. This makes it possible to derive end-to-end trained DNN-based SDFs in closed form for the first time. We evaluate LFMs empirically and show how various architectural choices impact SDF performance. We document the virtue of depth complexity: With enough data, the out-of-sample performance of DNN-SDF is increasing in the NN depth, saturating at huge depths of around 100 hidden layers.
△ Less
Submitted 20 January, 2024;
originally announced February 2024.
-
A Test-Time Learning Approach to Reparameterize the Geophysical Inverse Problem with a Convolutional Neural Network
Authors:
Anran Xu,
Lindsey J. Heagy
Abstract:
Regularization is critical for solving ill-posed geophysical inverse problems. Explicit regularization is often used, but there are opportunities to explore the implicit regularization effects that are inherent in a Neural Network structure. Researchers have discovered that the Convolutional Neural Network (CNN) architecture inherently enforces a regularization that is advantageous for addressing…
▽ More
Regularization is critical for solving ill-posed geophysical inverse problems. Explicit regularization is often used, but there are opportunities to explore the implicit regularization effects that are inherent in a Neural Network structure. Researchers have discovered that the Convolutional Neural Network (CNN) architecture inherently enforces a regularization that is advantageous for addressing diverse inverse problems in computer vision, including de-noising and in-painting. In this study, we examine the applicability of this implicit regularization to geophysical inversions. The CNN maps an arbitrary vector to the model space. The predicted subsurface model is then fed into a forward numerical simulation to generate corresponding predicted measurements. Subsequently, the objective function value is computed by comparing these predicted measurements with the observed measurements. The backpropagation algorithm is employed to update the trainable parameters of the CNN during the inversion. Note that the CNN in our proposed method does not require training before the inversion, rather, the CNN weights are estimated in the inversion process, hence this is a test-time learning (TTL) approach. In this study, we choose to focus on the Direct Current (DC) resistivity inverse problem, which is representative of typical Tikhonov-style geophysical inversions (e.g. gravity, electromagnetic, etc.), to test our hypothesis. The experimental results demonstrate that the implicit regularization can be useful in some DC resistivity inversions. We also provide a discussion of the potential sources of this implicit regularization introduced from the CNN architecture and discuss some practical guides for applying the proposed method to other geophysical methods.
△ Less
Submitted 9 July, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Compilation for Surface Code Quantum Computers
Authors:
Abtin Molavi,
Amanda Xu,
Swamit Tannu,
Aws Albarghouthi
Abstract:
Practical applications of quantum computing depend on fault-tolerant devices with error correction. Today, the most promising approach is a class of error-correcting codes called surface codes. In this paper, we study the problem of compiling quantum circuits for quantum computers implementing surface codes. The problem involves (1) mapping circuit qubits to the device qubits and (2) routing execu…
▽ More
Practical applications of quantum computing depend on fault-tolerant devices with error correction. Today, the most promising approach is a class of error-correcting codes called surface codes. In this paper, we study the problem of compiling quantum circuits for quantum computers implementing surface codes. The problem involves (1) mapping circuit qubits to the device qubits and (2) routing execution paths between pairs of interacting qubits. We call this the surface code mapping and routing problem (SCMR).
Solving SCMR near-optimally is critical for both efficiency and correctness. An optimal solution limits the cost of a computation in terms of precious quantum resources and also minimizes the probability of incurring an undetected logical error, which increases with each additional time step.
We study SCMR from a theoretical and practical perspective. First, we prove that SCMR, as well as a constrained version of the problem, is NP-complete. Second, we present a optimal algorithm for solving SCMR that is based on a SAT encoding. Third, we present a spectrum of efficient relaxations of SCMR, for example, by exploiting greedy algorithms for solving the problem of node-disjoint paths. Finally, we implement and evaluate our algorithms on a large suite of real and synthetic circuits. Our results suggest that our relaxations are a powerful tool for compiling realistic workloads. The relaxation-based algorithms are orders of magnitude faster than the optimal algorithm (solving instances with tens of thousands of gates in minutes), while still finding high-quality solutions, achieving the theoretical lower bound on up to 55 out of 168 circuits from a diverse benchmark suite.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Hyperbolic Space with Hierarchical Margin Boosts Fine-Grained Learning from Coarse Labels
Authors:
Shu-Lin Xu,
Yifan Sun,
Faen Zhang,
Anqi Xu,
Xiu-Shen Wei,
Yi Yang
Abstract:
Learning fine-grained embeddings from coarse labels is a challenging task due to limited label granularity supervision, i.e., lacking the detailed distinctions required for fine-grained tasks. The task becomes even more demanding when attempting few-shot fine-grained recognition, which holds practical significance in various applications. To address these challenges, we propose a novel method that…
▽ More
Learning fine-grained embeddings from coarse labels is a challenging task due to limited label granularity supervision, i.e., lacking the detailed distinctions required for fine-grained tasks. The task becomes even more demanding when attempting few-shot fine-grained recognition, which holds practical significance in various applications. To address these challenges, we propose a novel method that embeds visual embeddings into a hyperbolic space and enhances their discriminative ability with a hierarchical cosine margins manner. Specifically, the hyperbolic space offers distinct advantages, including the ability to capture hierarchical relationships and increased expressive power, which favors modeling fine-grained objects. Based on the hyperbolic space, we further enforce relatively large/small similarity margins between coarse/fine classes, respectively, yielding the so-called hierarchical cosine margins manner. While enforcing similarity margins in the regular Euclidean space has become popular for deep embedding learning, applying it to the hyperbolic space is non-trivial and validating the benefit for coarse-to-fine generalization is valuable. Extensive experiments conducted on five benchmark datasets showcase the effectiveness of our proposed method, yielding state-of-the-art results surpassing competing methods.
△ Less
Submitted 18 November, 2023;
originally announced November 2023.
-
Comparative Knowledge Distillation
Authors:
Alex Wilf,
Alex Tianyi Xu,
Paul Pu Liang,
Alexander Obolenskiy,
Daniel Fried,
Louis-Philippe Morency
Abstract:
In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance. Traditional KD paradigms, however, assume readily available access to teacher models for frequent inference -- a notion increasingly at odds with the realities of c…
▽ More
In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance. Traditional KD paradigms, however, assume readily available access to teacher models for frequent inference -- a notion increasingly at odds with the realities of costly, often proprietary, large scale models. Addressing this gap, our paper considers how to minimize the dependency on teacher model inferences in KD in a setting we term Few Teacher Inference Knowledge Distillation (FTI KD). We observe that prevalent KD techniques and state of the art data augmentation strategies fall short in this constrained setting. Drawing inspiration from educational principles that emphasize learning through comparison, we propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples. Critically, CKD provides additional learning signals to the student without making additional teacher calls. We also extend the principle of CKD to groups of samples, enabling even more efficient learning from limited teacher calls. Empirical evaluation across varied experimental settings indicates that CKD consistently outperforms state of the art data augmentation and KD techniques.
△ Less
Submitted 3 November, 2023;
originally announced November 2023.
-
REBAR: Retrieval-Based Reconstruction for Time-series Contrastive Learning
Authors:
Maxwell A. Xu,
Alexander Moreno,
Hui Wei,
Benjamin M. Marlin,
James M. Rehg
Abstract:
The success of self-supervised contrastive learning hinges on identifying positive data pairs, such that when they are pushed together in embedding space, the space encodes useful information for subsequent downstream tasks. Constructing positive pairs is non-trivial as the pairing must be similar enough to reflect a shared semantic meaning, but different enough to capture within-class variation.…
▽ More
The success of self-supervised contrastive learning hinges on identifying positive data pairs, such that when they are pushed together in embedding space, the space encodes useful information for subsequent downstream tasks. Constructing positive pairs is non-trivial as the pairing must be similar enough to reflect a shared semantic meaning, but different enough to capture within-class variation. Classical approaches in vision use augmentations to exploit well-established invariances to construct positive pairs, but invariances in the time-series domain are much less obvious. In our work, we propose a novel method of using a learned measure for identifying positive pairs. Our Retrieval-Based Reconstruction (REBAR) measure measures the similarity between two sequences as the reconstruction error that results from reconstructing one sequence with retrieved information from the other. Then, if the two sequences have high REBAR similarity, we label them as a positive pair. Through validation experiments, we show that the REBAR error is a predictor of mutual class membership. Once integrated into a contrastive learning framework, our REBAR method learns an embedding that achieves state-of-the-art performance on downstream tasks across various modalities.
△ Less
Submitted 16 March, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Audio-visual child-adult speaker classification in dyadic interactions
Authors:
Anfeng Xu,
Kevin Huang,
Tiantian Feng,
Helen Tager-Flusberg,
Shrikanth Narayanan
Abstract:
Interactions involving children span a wide range of important domains from learning to clinical diagnostic and therapeutic contexts. Automated analyses of such interactions are motivated by the need to seek accurate insights and offer scale and robustness across diverse and wide-ranging conditions. Identifying the speech segments belonging to the child is a critical step in such modeling. Convent…
▽ More
Interactions involving children span a wide range of important domains from learning to clinical diagnostic and therapeutic contexts. Automated analyses of such interactions are motivated by the need to seek accurate insights and offer scale and robustness across diverse and wide-ranging conditions. Identifying the speech segments belonging to the child is a critical step in such modeling. Conventional child-adult speaker classification typically relies on audio modeling approaches, overlooking visual signals that convey speech articulation information, such as lip motion. Building on the foundation of an audio-only child-adult speaker classification pipeline, we propose incorporating visual cues through active speaker detection and visual processing models. Our framework involves video pre-processing, utterance-level child-adult speaker detection, and late fusion of modality-specific predictions. We demonstrate from extensive experiments that a visually aided classification pipeline enhances the accuracy and robustness of the classification. We show relative improvements of 2.38% and 3.97% in F1 macro score when one face and two faces are visible, respectively.
△ Less
Submitted 9 October, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning
Authors:
Austin Xu,
Andrew D. McRae,
Jingyan Wang,
Mark A. Davenport,
Ashwin Pananjady
Abstract:
We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query ( PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobi…
▽ More
We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query ( PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobis distance. This gives rise to a high-dimensional, low-rank matrix estimation problem to which standard matrix estimators cannot be applied. Consequently, we develop a two-stage estimator for metric learning from PAQs, and provide sample complexity guarantees for this estimator. We present numerical simulations demonstrating the performance of the estimator and its notable properties.
△ Less
Submitted 8 September, 2023;
originally announced September 2023.
-
MM-AU:Towards Multimodal Understanding of Advertisement Videos
Authors:
Digbalay Bose,
Rajat Hebbar,
Tiantian Feng,
Krishna Somandepalli,
Anfeng Xu,
Shrikanth Narayanan
Abstract:
Advertisement videos (ads) play an integral part in the domain of Internet e-commerce as they amplify the reach of particular products to a broad audience or can serve as a medium to raise awareness about specific issues through concise narrative structures. The narrative structures of advertisements involve several elements like reasoning about the broad content (topic and the underlying message)…
▽ More
Advertisement videos (ads) play an integral part in the domain of Internet e-commerce as they amplify the reach of particular products to a broad audience or can serve as a medium to raise awareness about specific issues through concise narrative structures. The narrative structures of advertisements involve several elements like reasoning about the broad content (topic and the underlying message) and examining fine-grained details involving the transition of perceived tone due to the specific sequence of events and interaction among characters. In this work, to facilitate the understanding of advertisements along the three important dimensions of topic categorization, perceived tone transition, and social message detection, we introduce a multimodal multilingual benchmark called MM-AU composed of over 8.4K videos (147 hours) curated from multiple web sources. We explore multiple zero-shot reasoning baselines through the application of large language models on the ads transcripts. Further, we demonstrate that leveraging signals from multiple modalities, including audio, video, and text, in multimodal transformer-based supervised models leads to improved performance compared to unimodal approaches.
△ Less
Submitted 27 August, 2023;
originally announced August 2023.
-
Tensor Completion via Leverage Sampling and Tensor QR Decomposition for Network Latency Estimation
Authors:
Jun Lei,
Ji-Qian Zhao,
Jing-Qi Wang,
An-Bao Xu
Abstract:
In this paper, we consider the network latency estimation, which has been an important metric for network performance. However, a large scale of network latency estimation requires a lot of computing time. Therefore, we propose a new method that is much faster and maintains high accuracy. The data structure of network nodes can form a matrix, and the tensor model can be formed by introducing the t…
▽ More
In this paper, we consider the network latency estimation, which has been an important metric for network performance. However, a large scale of network latency estimation requires a lot of computing time. Therefore, we propose a new method that is much faster and maintains high accuracy. The data structure of network nodes can form a matrix, and the tensor model can be formed by introducing the time dimension. Thus, the entire problem can be be summarized as a tensor completion problem. The main idea of our method is improving the tensor leverage sampling strategy and introduce tensor QR decomposition into tensor completion. To achieve faster tensor leverage sampling, we replace tensor singular decomposition (t-SVD) with tensor CSVD-QR to appoximate t-SVD. To achieve faster completion for incomplete tensor, we use the tensor $L_{2,1}$-norm rather than traditional tensor nuclear norm. Furthermore, we introduce tensor QR decomposition into alternating direction method of multipliers (ADMM) framework. Numerical experiments witness that our method is faster than state-of-art algorithms with satisfactory accuracy.
△ Less
Submitted 27 June, 2023;
originally announced July 2023.
-
Stability Guarantees for Feature Attributions with Multiplicative Smoothing
Authors:
Anton Xue,
Rajeev Alur,
Eric Wong
Abstract:
Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. We develop a smoothi…
▽ More
Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. We develop a smoothing method called Multiplicative Smoothing (MuS) to achieve such a model. We show that MuS overcomes the theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with various feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
△ Less
Submitted 26 October, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Optimal and Stable Multi-Layer Object Rearrangement on a Tabletop
Authors:
Andy Xu,
Kai Gao,
Si Wei Feng,
Jingjin Yu
Abstract:
Object rearrangement is a fundamental sub-task in accomplishing a great many physical tasks. As such, effectively executing rearrangement is an important skill for intelligent robots to master. In this study, we conduct the first algorithmic study on optimally solving the problem of Multi-layer Object Rearrangement on a Tabletop (MORT), in which one object may be relocated at a time, and an object…
▽ More
Object rearrangement is a fundamental sub-task in accomplishing a great many physical tasks. As such, effectively executing rearrangement is an important skill for intelligent robots to master. In this study, we conduct the first algorithmic study on optimally solving the problem of Multi-layer Object Rearrangement on a Tabletop (MORT), in which one object may be relocated at a time, and an object can only be moved if other objects do not block its top surface. In addition, any intermediate structure during the reconfiguration process must be physically stable, i.e., it should stand without external support. To tackle the dual challenges of untangling the dependencies between objects and ensuring structural stability, we develop an algorithm that interleaves the computation of the optimal rearrangement plan and structural stability checking. Using a carefully constructed integer linear programming (ILP) model, our algorithm, Stability-aware Integer Programming-based Planner (SIPP), readily scales to optimally solve complex rearrangement problems of 3D structures with over 60 building blocks, with solution quality significantly outperforming natural greedy best-first approaches.
Upon the publication of the manuscript, source code and data will be available at https://github.com/arc-l/mort/
△ Less
Submitted 30 June, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
dotears: Scalable, consistent DAG estimation using observational and interventional data
Authors:
Albert Xue,
Jingyou Rao,
Sriram Sankararaman,
Harold Pimentel
Abstract:
New biological assays like Perturb-seq link highly parallel CRISPR interventions to a high-dimensional transcriptomic readout, providing insight into gene regulatory networks. Causal gene regulatory networks can be represented by directed acyclic graph (DAGs), but learning DAGs from observational data is complicated by lack of identifiability and a combinatorial solution space. Score-based structu…
▽ More
New biological assays like Perturb-seq link highly parallel CRISPR interventions to a high-dimensional transcriptomic readout, providing insight into gene regulatory networks. Causal gene regulatory networks can be represented by directed acyclic graph (DAGs), but learning DAGs from observational data is complicated by lack of identifiability and a combinatorial solution space. Score-based structure learning improves practical scalability of inferring DAGs. Previous score-based methods are sensitive to error variance structure; on the other hand, estimation of error variance is difficult without prior knowledge of structure. Accordingly, we present $\texttt{dotears}$ [doo-tairs], a continuous optimization framework which leverages observational and interventional data to infer a single causal structure, assuming a linear Structural Equation Model (SEM). $\texttt{dotears}$ exploits structural consequences of hard interventions to give a marginal estimate of exogenous error structure, bypassing the circular estimation problem. We show that $\texttt{dotears}$ is a provably consistent estimator of the true DAG under mild assumptions. $\texttt{dotears}$ outperforms other methods in varied simulations, and in real data infers edges that validate with higher precision and recall than state-of-the-art methods through differential expression tests and high-confidence protein-protein interactions.
△ Less
Submitted 20 February, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Lightweight Learner for Shared Knowledge Lifelong Learning
Authors:
Yunhao Ge,
Yuecheng Li,
Di Wu,
Ao Xu,
Adam M. Jones,
Amanda Sofie Rios,
Iordanis Fostiropoulos,
Shixian Wen,
Po-Hsuan Huang,
Zachary William Murdock,
Gozde Sahin,
Shuo Ni,
Kiran Lekkala,
Sumedh Anand Sontakke,
Laurent Itti
Abstract:
In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentral…
▽ More
In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially. Dedicated LL machinery is then deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentralized population of LL agents that each sequentially learn different tasks, with all agents operating independently and in parallel. After learning their respective tasks, agents share and consolidate their knowledge over a decentralized communication network, so that, in the end, all agents can master all tasks. We present one solution to SKILL which uses Lightweight Lifelong Learning (LLL) agents, where the goal is to facilitate efficient sharing by minimizing the fraction of the agent that is specialized for any given task. Each LLL agent thus consists of a common task-agnostic immutable part, where most parameters are, and individual task-specific modules that contain fewer parameters but are adapted to each task. Agents share their task-specific modules, plus summary information ("task anchors") representing their tasks in the common task-agnostic latent space of all agents. Receiving agents register each received task-specific module using the corresponding anchor. Thus, every agent improves its ability to solve new tasks each time new task-specific modules and anchors are received. On a new, very challenging SKILL-102 dataset with 102 image classification tasks (5,033 classes in total, 2,041,225 training, 243,464 validation, and 243,464 test images), we achieve much higher (and SOTA) accuracy over 8 LL baselines, while also achieving near perfect parallelization. Code and data can be found at https://github.com/gyhandy/Shared-Knowledge-Lifelong-Learning
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
Estimating Large Language Model Capabilities without Labeled Test Data
Authors:
Harvey Yiyun Fu,
Qinyuan Ye,
Albert Xu,
Xiang Ren,
Robin Jia
Abstract:
Large Language Models (LLMs) have the impressive ability to perform in-context learning (ICL) from only a few examples, but the success of ICL varies widely from task to task. Thus, it is important to quickly determine whether ICL is applicable to a new task, but directly evaluating ICL accuracy can be expensive in situations where test data is expensive to annotate -- the exact situations where I…
▽ More
Large Language Models (LLMs) have the impressive ability to perform in-context learning (ICL) from only a few examples, but the success of ICL varies widely from task to task. Thus, it is important to quickly determine whether ICL is applicable to a new task, but directly evaluating ICL accuracy can be expensive in situations where test data is expensive to annotate -- the exact situations where ICL is most appealing. In this paper, we propose the task of ICL accuracy estimation, in which we predict the accuracy of an LLM when doing in-context learning on a new task given only unlabeled test data for that task. To perform ICL accuracy estimation, we propose a method that trains a meta-model using LLM confidence scores as features. We compare our method to several strong accuracy estimation baselines on a new benchmark that covers 4 LLMs and 3 task collections. The meta-model improves over all baselines across 8 out of 12 settings and achieves the same estimation performance as directly evaluating on 40 collected labeled test examples per task. At the same time, no existing approach provides an accurate and reliable ICL accuracy estimation in every setting, highlighting the need for better ways to measure the uncertainty of LLM predictions.
△ Less
Submitted 26 October, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings
Authors:
Anfeng Xu,
Rajat Hebbar,
Rimita Lahiri,
Tiantian Feng,
Lindsay Butler,
Lue Shen,
Helen Tager-Flusberg,
Shrikanth Narayanan
Abstract:
Speech processing techniques are useful for analyzing speech and language development in children with Autism Spectrum Disorder (ASD), who are often varied and delayed in acquiring these skills. Early identification and intervention are crucial, but traditional assessment methodologies such as caregiver reports are not adequate for the requisite behavioral phenotyping. Natural Language Sample (NLS…
▽ More
Speech processing techniques are useful for analyzing speech and language development in children with Autism Spectrum Disorder (ASD), who are often varied and delayed in acquiring these skills. Early identification and intervention are crucial, but traditional assessment methodologies such as caregiver reports are not adequate for the requisite behavioral phenotyping. Natural Language Sample (NLS) analysis has gained attention as a promising complement. Researchers have developed benchmarks for spoken language capabilities in children with ASD, obtainable through the analysis of NLS. This paper proposes applications of speech processing technologies in support of automated assessment of children's spoken language development by classification between child and adult speech and between speech and nonverbal vocalization in NLS, with respective F1 macro scores of 82.6% and 67.8%, underscoring the potential for accurate and scalable tools for ASD research and clinical use.
△ Less
Submitted 31 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Fractional eternal domination: securely distributing resources across a network
Authors:
Fnu Devvrit,
Aaron Krim-Yee,
Nithish Kumar,
Gary MacGillivray,
Ben Seamone,
Virgélot Virgile,
AnQi Xu
Abstract:
This paper initiates the study of fractional eternal domination in graphs, a natural relaxation of the well-studied eternal domination problem. We study the connections to flows and linear programming in order to obtain results on the complexity of determining the fractional eternal domination number of a graph $G$, which we denote $γ_{\,\textit{f}}^{\infty}(G)$. We study the behaviour of…
▽ More
This paper initiates the study of fractional eternal domination in graphs, a natural relaxation of the well-studied eternal domination problem. We study the connections to flows and linear programming in order to obtain results on the complexity of determining the fractional eternal domination number of a graph $G$, which we denote $γ_{\,\textit{f}}^{\infty}(G)$. We study the behaviour of $γ_{\,\textit{f}}^{\infty}(G)$ as it relates to other domination parameters. We also determine bounds on, and in some cases exact values for, $γ_{\,\textit{f}}^{\infty}(G)$ when $G$ is a member of one of a variety of important graph classes, including trees, split graphs, strongly chordal graphs, Kneser graphs, abelian Cayley graphs, and graph products.
△ Less
Submitted 23 April, 2023;
originally announced April 2023.
-
Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge
Authors:
Ziyang Zhang,
Liuwei An,
Zishun Cui,
Ao xu,
Tengteng Dong,
Yueqi Jiang,
Jingyi Shi,
Xin Liu,
Xiao Sun,
Meng Wang
Abstract:
In this paper, we present our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes four sub-challenges of Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection and Emotional Reaction Intensity (ERI) Estimation. The 5th ABAW competition focuses on facial affect recognition utilizing different modalit…
▽ More
In this paper, we present our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes four sub-challenges of Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection and Emotional Reaction Intensity (ERI) Estimation. The 5th ABAW competition focuses on facial affect recognition utilizing different modalities and datasets. In our work, we extract powerful audio and visual features using a large number of sota models. These features are fused by Transformer Encoder and TEMMA. Besides, to avoid the possible impact of large dimensional differences between various features, we design an Affine Module to align different features to the same dimension. Extensive experiments demonstrate that the superiority of the proposed method. For the VA Estimation sub-challenge, our method obtains the mean Concordance Correlation Coefficient (CCC) of 0.6066. For the Expression Classification sub-challenge, the average F1 Score is 0.4055. For the AU Detection sub-challenge, the average F1 Score is 0.5296. For the Emotional Reaction Intensity Estimation sub-challenge, the average pearson's correlations coefficient on the validation set is 0.3968. All of the results of four sub-challenges outperform the baseline with a large margin.
△ Less
Submitted 20 March, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Delving Deep into Simplicity Bias for Long-Tailed Image Recognition
Authors:
Xiu-Shen Wei,
Xuhao Sun,
Yang Shen,
Anqi Xu,
Peng Wang,
Faen Zhang
Abstract:
Simplicity Bias (SB) is a phenomenon that deep neural networks tend to rely favorably on simpler predictive patterns but ignore some complex features when applied to supervised discriminative tasks. In this work, we investigate SB in long-tailed image recognition and find the tail classes suffer more severely from SB, which harms the generalization performance of such underrepresented classes. We…
▽ More
Simplicity Bias (SB) is a phenomenon that deep neural networks tend to rely favorably on simpler predictive patterns but ignore some complex features when applied to supervised discriminative tasks. In this work, we investigate SB in long-tailed image recognition and find the tail classes suffer more severely from SB, which harms the generalization performance of such underrepresented classes. We empirically report that self-supervised learning (SSL) can mitigate SB and perform in complementary to the supervised counterpart by enriching the features extracted from tail samples and consequently taking better advantage of such rare samples. However, standard SSL methods are designed without explicitly considering the inherent data distribution in terms of classes and may not be optimal for long-tailed distributed data. To address this limitation, we propose a novel SSL method tailored to imbalanced data. It leverages SSL by triple diverse levels, i.e., holistic-, partial-, and augmented-level, to enhance the learning of predictive complex patterns, which provides the potential to overcome the severe SB on tail data. Both quantitative and qualitative experimental results on five long-tailed benchmark datasets show our method can effectively mitigate SB and significantly outperform the competing state-of-the-arts.
△ Less
Submitted 7 February, 2023;
originally announced February 2023.
-
A Simple Algorithm For Scaling Up Kernel Methods
Authors:
Teng Andrea Xu,
Bryan Kelly,
Semyon Malamud
Abstract:
The recent discovery of the equivalence between infinitely wide neural networks (NNs) in the lazy training regime and Neural Tangent Kernels (NTKs) (Jacot et al., 2018) has revived interest in kernel methods. However, conventional wisdom suggests kernel methods are unsuitable for large samples due to their computational complexity and memory requirements. We introduce a novel random feature regres…
▽ More
The recent discovery of the equivalence between infinitely wide neural networks (NNs) in the lazy training regime and Neural Tangent Kernels (NTKs) (Jacot et al., 2018) has revived interest in kernel methods. However, conventional wisdom suggests kernel methods are unsuitable for large samples due to their computational complexity and memory requirements. We introduce a novel random feature regression algorithm that allows us (when necessary) to scale to virtually infinite numbers of random features. We illustrate the performance of our method on the CIFAR-10 dataset.
△ Less
Submitted 30 January, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Autonomous Material Composite Morphing Wing
Authors:
Daniel Morton,
Artemis Xu,
Alberto Matute,
Robert F. Shepherd
Abstract:
Aeronautics research has continually sought to achieve the adaptability and morphing performance of avian wings, but in practice, wings of all scales continue to use the same hinged control-surface embodiment. Recent research into compliant and bio-inspired mechanisms for morphing wings and control surfaces has indicated promising results, though often these are mechanically complex, or limited in…
▽ More
Aeronautics research has continually sought to achieve the adaptability and morphing performance of avian wings, but in practice, wings of all scales continue to use the same hinged control-surface embodiment. Recent research into compliant and bio-inspired mechanisms for morphing wings and control surfaces has indicated promising results, though often these are mechanically complex, or limited in the number of degrees-of-freedom (DOF) they can control. Seeking to improve on these limitations, we apply a new paradigm denoted Autonomous Material Composites to the design of avian-scale morphing wings. With this methodology, we reduce the need for complex actuation and mechanisms, and allow for three-dimensional placement of stretchable fiber optic strain gauges (Optical Lace) throughout the metamaterial structure. This structure centers around elastomeric conformal lattices, and by applying functionally-graded warping and thickening to this lattice, we allow for local tailoring of the compliance properties to fit the desired morphing. As a result, the wing achieves high-deformation morphing in three DOF: twist, camber, and extension/compression, with these morphed shapes effectively modifying the aerodynamic performance of the wing, as demonstrated in low-Reynolds wind tunnel testing. Our sensors also successfully demonstrate differentiable trends across all degrees of morphing, enabling the future state estimation and control of this wing.
△ Less
Submitted 18 January, 2023;
originally announced January 2023.
-
Matrix Multiplication: Verifying Strong Uniquely Solvable Puzzles
Authors:
Matthew Anderson,
Zongliang Ji,
Anthony Yang Xu
Abstract:
Cohn and Umans proposed a framework for developing fast matrix multiplication algorithms based on the embedding computation in certain groups algebras. In subsequent work with Kleinberg and Szegedy, they connected this to the search for combinatorial objects called strong uniquely solvable puzzles (strong USPs). We begin a systematic computer-aided search for these objects. We develop and implemen…
▽ More
Cohn and Umans proposed a framework for developing fast matrix multiplication algorithms based on the embedding computation in certain groups algebras. In subsequent work with Kleinberg and Szegedy, they connected this to the search for combinatorial objects called strong uniquely solvable puzzles (strong USPs). We begin a systematic computer-aided search for these objects. We develop and implement constraint-based algorithms build on reductions to $\mathrm{SAT}$ and $\mathrm{IP}$ to verify that puzzles are strong USPs, and to search for large strong USPs. We produce tight bounds on the maximum size of a strong USP for width $k \le 5$, construct puzzles of small width that are larger than previous work, and improve the upper bounds on strong USP size for $k \le 12$. Although our work only deals with puzzles of small-constant width, the strong USPs we find imply matrix multiplication algorithms that run in $O(n^ω)$ time with exponent $ω\le 2.66$. While our algorithms do not beat the fastest algorithms, our work provides evidence and, perhaps, a path to finding families of strong USPs that imply matrix multiplication algorithms that are more efficient than those currently known.
△ Less
Submitted 30 December, 2022;
originally announced January 2023.
-
Bayesian Learning for Dynamic Inference
Authors:
Aolin Xu,
Peng Guan
Abstract:
The traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of the quantity. In some sequential estimation problems however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problems has been formulated as the dynamic inference problem. In this…
▽ More
The traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of the quantity. In some sequential estimation problems however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problems has been formulated as the dynamic inference problem. In this work, we formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn according to a random model parameter. We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss. Moreover, learning for dynamic inference can serve as a meta problem, such that all familiar machine learning problems, including supervised learning, imitation learning and reinforcement learning, can be cast as its special cases or variants. Gaining a good understanding of this unifying meta problem thus sheds light on a broad spectrum of machine learning problems as well.
△ Less
Submitted 30 December, 2022;
originally announced January 2023.
-
HandsOff: Labeled Dataset Generation With No Additional Human Annotations
Authors:
Austin Xu,
Mariya I. Vasileva,
Achal Dave,
Arjun Seshadri
Abstract:
Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of generated labels. We introduce the HandsOff framework, a technique capable of pr…
▽ More
Recent work leverages the expressive power of generative adversarial networks (GANs) to generate labeled synthetic datasets. These dataset generation methods often require new annotations of synthetic images, which forces practitioners to seek out annotators, curate a set of synthetic images, and ensure the quality of generated labels. We introduce the HandsOff framework, a technique capable of producing an unlimited number of synthetic images and corresponding labels after being trained on less than 50 pre-existing labeled images. Our framework avoids the practical drawbacks of prior work by unifying the field of GAN inversion with dataset generation. We generate datasets with rich pixel-wise labels in multiple challenging domains such as faces, cars, full-body human poses, and urban driving scenes. Our method achieves state-of-the-art performance in semantic segmentation, keypoint detection, and depth estimation compared to prior dataset generation approaches and transfer learning baselines. We additionally showcase its ability to address broad challenges in model development which stem from fixed, hand-annotated datasets, such as the long-tail problem in semantic segmentation. Project page: austinxu87.github.io/handsoff.
△ Less
Submitted 30 March, 2023; v1 submitted 23 December, 2022;
originally announced December 2022.
-
PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation
Authors:
Maxwell A. Xu,
Alexander Moreno,
Supriya Nagesh,
V. Burak Aydemir,
David W. Wetter,
Santosh Kumar,
James M. Rehg
Abstract:
The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and…
▽ More
The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.
△ Less
Submitted 15 December, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Contrastive Novelty-Augmented Learning: Anticipating Outliers with Large Language Models
Authors:
Albert Xu,
Xiang Ren,
Robin Jia
Abstract:
In many task settings, text classification models are likely to encounter examples from novel classes on which they cannot predict correctly. Selective prediction, in which models abstain on low-confidence examples, provides a possible solution, but existing models are often overly confident on unseen classes. To remedy this overconfidence, we introduce Contrastive Novelty-Augmented Learning (CoNA…
▽ More
In many task settings, text classification models are likely to encounter examples from novel classes on which they cannot predict correctly. Selective prediction, in which models abstain on low-confidence examples, provides a possible solution, but existing models are often overly confident on unseen classes. To remedy this overconfidence, we introduce Contrastive Novelty-Augmented Learning (CoNAL), a two-step method that generates OOD examples representative of novel classes, then trains to decrease confidence on them. First, we generate OOD examples by prompting a large language model twice: we prompt it to enumerate relevant novel classes, then generate examples from each novel class matching the task format. Second, we train a classifier with a novel contrastive objective that encourages lower confidence on generated OOD examples than training examples. When trained with CoNAL, classifiers improve in their ability to detect and abstain on novel class examples over prior methods by an average of 2.3% in terms of accuracy under the accuracy-coverage curve (AUAC) and 5.5% AUROC across 4 NLP datasets, with no cost to in-distribution accuracy.
△ Less
Submitted 26 May, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Synthesizing Quantum-Circuit Optimizers
Authors:
Amanda Xu,
Abtin Molavi,
Lauren Pick,
Swamit Tannu,
Aws Albarghouthi
Abstract:
Near-term quantum computers are expected to work in an environment where each operation is noisy, with no error correction. Therefore, quantum-circuit optimizers are applied to minimize the number of noisy operations. Today, physicists are constantly experimenting with novel devices and architectures. For every new physical substrate and for every modification of a quantum computer, we need to mod…
▽ More
Near-term quantum computers are expected to work in an environment where each operation is noisy, with no error correction. Therefore, quantum-circuit optimizers are applied to minimize the number of noisy operations. Today, physicists are constantly experimenting with novel devices and architectures. For every new physical substrate and for every modification of a quantum computer, we need to modify or rewrite major pieces of the optimizer to run successful experiments. In this paper, we present QUESO, an efficient approach for automatically synthesizing a quantum-circuit optimizer for a given quantum device. For instance, in 1.2 minutes, QUESO can synthesize an optimizer with high-probability correctness guarantees for IBM computers that significantly outperforms leading compilers, such as IBM's Qiskit and TKET, on the majority (85%) of the circuits in a diverse benchmark suite.
A number of theoretical and algorithmic insights underlie QUESO: (1) An algebraic approach for representing rewrite rules and their semantics. This facilitates reasoning about complex symbolic rewrite rules that are beyond the scope of existing techniques. (2) A fast approach for probabilistically verifying equivalence of quantum circuits by reducing the problem to a special form of polynomial identity testing. (3) A novel probabilistic data structure, called a polynomial identity filter (PIF), for efficiently synthesizing rewrite rules. (4) A beam-search-based algorithm that efficiently applies the synthesized symbolic rewrite rules to optimize quantum circuits.
△ Less
Submitted 10 May, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Generating counterfactual explanations of tumor spatial proteomes to discover effective strategies for enhancing immune infiltration
Authors:
Zitong Jerry Wang,
Alexander M. Xu,
Aman Bhargava,
Matt W. Thomson
Abstract:
The tumor microenvironment (TME) significantly impacts cancer prognosis due to its immune composition. While therapies for altering the immune composition, including immunotherapies, have shown exciting results for treating hematological cancers, they are less effective for immunologically-cold, solid tumors. Spatial omics technologies capture the spatial organization of the TME with unprecedented…
▽ More
The tumor microenvironment (TME) significantly impacts cancer prognosis due to its immune composition. While therapies for altering the immune composition, including immunotherapies, have shown exciting results for treating hematological cancers, they are less effective for immunologically-cold, solid tumors. Spatial omics technologies capture the spatial organization of the TME with unprecedented molecular detail, revealing the relationship between immune cell localization and molecular signals. Here, we formulate T-cell infiltration prediction as a self-supervised machine learning problem and develop a counterfactual optimization strategy that leverages large scale spatial omics profiles of patient tumors to design tumor perturbations predicted to boost T-cell infiltration. A convolutional neural network predicts T-cell distribution based on signaling molecules in the TME provided by imaging mass cytometry. Gradient-based counterfactual generation, then, computes perturbations predicted to boost T-cell abundance. We apply our framework to melanoma, colorectal cancer liver metastases, and breast tumor data, discovering combinatorial perturbations predicted to support T-cell infiltration across tens to hundreds of patients. This work presents a paradigm for counterfactual-based prediction and design of cancer therapeutics using spatial omics data.
△ Less
Submitted 13 October, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI
Authors:
Ao Xu,
Bo-Tao Li
Abstract:
We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for multi-graphics processing units (GPUs) accelerated thermal lattice Boltzmann (LB) simulation. The OpenACC accelerates computation on a single GPU, and the MPI synchronizes the information between multiple GPUs. With a single GPU, the two-dimension (2D) simulation achieved 1.93 billio…
▽ More
We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for multi-graphics processing units (GPUs) accelerated thermal lattice Boltzmann (LB) simulation. The OpenACC accelerates computation on a single GPU, and the MPI synchronizes the information between multiple GPUs. With a single GPU, the two-dimension (2D) simulation achieved 1.93 billion lattice updates per second (GLUPS) with a grid number of $8193^{2}$, and the three-dimension (3D) simulation achieved 1.04 GLUPS with a grid number of $385^{3}$, which is more than 76% of the theoretical maximum performance. On multi-GPUs, we adopt block partitioning, overlapping communications with computations, and concurrent computation to optimize parallel efficiency. We show that in the strong scaling test, using 16 GPUs, the 2D simulation achieved 30.42 GLUPS and the 3D simulation achieved 14.52 GLUPS. In the weak scaling test, the parallel efficiency remains above 99% up to 16 GPUs. Our results demonstrated that, with improved data and task management, the hybrid OpenACC and MPI technique is promising for thermal LB simulation on multi-GPUs.
△ Less
Submitted 17 November, 2022; v1 submitted 6 November, 2022;
originally announced November 2022.
-
Mathematical Justification of Hard Negative Mining via Isometric Approximation Theorem
Authors:
Albert Xu,
Jhih-Yi Hsieh,
Bhaskar Vundurthy,
Eliana Cohen,
Howie Choset,
Lu Li
Abstract:
In deep metric learning, the Triplet Loss has emerged as a popular method to learn many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point. Researchers predom…
▽ More
In deep metric learning, the Triplet Loss has emerged as a popular method to learn many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point. Researchers predominately solve this problem by using triplet mining strategies. While hard negative mining is the most effective of these strategies, existing formulations lack strong theoretical justification for their empirical success. In this paper, we utilize the mathematical theory of isometric approximation to show an equivalence between the Triplet Loss sampled by hard negative mining and an optimization problem that minimizes a Hausdorff-like distance between the neural network and its ideal counterpart function. This provides the theoretical justifications for hard negative mining's empirical efficacy. In addition, our novel application of the isometric approximation theorem provides the groundwork for future forms of hard negative mining that avoid network collapse. Our theory can also be extended to analyze other Euclidean space-based metric learning methods like Ladder Loss or Contrastive Learning.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
Benign Autoencoders
Authors:
Semyon Malamud,
Teng Andrea Xu,
Antoine Didisheim
Abstract:
Recent progress in Generative Artificial Intelligence (AI) relies on efficient data representations, often featuring encoder-decoder architectures. We formalize the mathematical problem of finding the optimal encoder-decoder pair and characterize its solution, which we name the "benign autoencoder" (BAE). We prove that BAE projects data onto a manifold whose dimension is the optimal compressibilit…
▽ More
Recent progress in Generative Artificial Intelligence (AI) relies on efficient data representations, often featuring encoder-decoder architectures. We formalize the mathematical problem of finding the optimal encoder-decoder pair and characterize its solution, which we name the "benign autoencoder" (BAE). We prove that BAE projects data onto a manifold whose dimension is the optimal compressibility dimension of the generative problem. We highlight surprising connections between BAE and several recent developments in AI, such as conditional GANs, context encoders, stable diffusion, stacked autoencoders, and the learning capabilities of generative models. As an illustration, we show how BAE can find optimal, low-dimensional latent representations that improve the performance of a discriminator under a distribution shift. By compressing "malignant" data dimensions, BAE leads to smoother and more stable gradients.
△ Less
Submitted 28 August, 2023; v1 submitted 2 October, 2022;
originally announced October 2022.
-
Qubit Mapping and Routing via MaxSAT
Authors:
Abtin Molavi,
Amanda Xu,
Martin Diges,
Lauren Pick,
Swamit Tannu,
Aws Albarghouthi
Abstract:
Near-term quantum computers will operate in a noisy environment, without error correction. A critical problem for near-term quantum computing is laying out a logical circuit onto a physical device with limited connectivity between qubits. This is known as the qubit mapping and routing (QMR) problem, an intractable combinatorial problem. It is important to solve QMR as optimally as possible to redu…
▽ More
Near-term quantum computers will operate in a noisy environment, without error correction. A critical problem for near-term quantum computing is laying out a logical circuit onto a physical device with limited connectivity between qubits. This is known as the qubit mapping and routing (QMR) problem, an intractable combinatorial problem. It is important to solve QMR as optimally as possible to reduce the amount of added noise, which may render a quantum computation useless. In this paper, we present a novel approach for optimally solving the QMR problem via a reduction to maximum satisfiability (MAXSAT). Additionally, we present two novel relaxation ideas that shrink the size of the MAXSAT constraints by exploiting the structure of a quantum circuit. Our thorough empirical evaluation demonstrates (1) the scalability of our approach compared to state-of-the-art optimal QMR techniques (solves more than 3x benchmarks with 40x speedup), (2) the significant cost reduction compared to state-of-the-art heuristic approaches (an average of ~5x swap reduction), and (3) the power of our proposed constraint relaxations.
△ Less
Submitted 29 August, 2022;
originally announced August 2022.
-
Invariant Structure Learning for Better Generalization and Causal Explainability
Authors:
Yunhao Ge,
Sercan Ö. Arik,
Jinsung Yoon,
Ao Xu,
Laurent Itti,
Tomas Pfister
Abstract:
Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target acr…
▽ More
Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target across different environments by imposing a consistency constraint. An aggregation mechanism then selects the optimal classifier based on a graph structure that reflects the causal mechanisms in the data more accurately compared to the structures learnt from individual environments. Furthermore, we extend ISL to a self-supervised learning setting where accurate causal structure discovery does not rely on any labels. This self-supervised ISL utilizes invariant causality proposals by iteratively setting different nodes as targets. On synthetic and real-world datasets, we demonstrate that ISL accurately discovers the causal structure, outperforms alternative methods, and yields superior generalization for datasets with significant distribution shifts.
△ Less
Submitted 13 June, 2022;
originally announced June 2022.
-
Chordal Sparsity for SDP-based Neural Network Verification
Authors:
Anton Xue,
Lars Lindemann,
Rajeev Alur
Abstract:
Neural networks are central to many emerging technologies, but verifying their correctness remains a major challenge. It is known that network outputs can be sensitive and fragile to even small input perturbations, thereby increasing the risk of unpredictable and undesirable behavior. Fast and accurate verification of neural networks is therefore critical to their widespread adoption, and in recen…
▽ More
Neural networks are central to many emerging technologies, but verifying their correctness remains a major challenge. It is known that network outputs can be sensitive and fragile to even small input perturbations, thereby increasing the risk of unpredictable and undesirable behavior. Fast and accurate verification of neural networks is therefore critical to their widespread adoption, and in recent years, various methods have been developed as a response to this problem. In this paper, we focus on improving semidefinite programming (SDP) based techniques for neural network verification. Such techniques offer the power of expressing complex geometric constraints while retaining a convex problem formulation, but scalability remains a major issue in practice. Our starting point is the DeepSDP framework proposed by Fazlyab et al., which uses quadratic constraints to abstract the verification problem into a large-scale SDP. However, solving this SDP quickly becomes intractable when the network grows. Our key observation is that by leveraging chordal sparsity, we can decompose the primary computational bottleneck of DeepSDP -- a large linear matrix inequality (LMI) -- into an equivalent collection of smaller LMIs. We call our chordally sparse optimization program Chordal-DeepSDP and prove that its construction is identically expressive as that of DeepSDP. Moreover, we show that additional analysis of Chordal-DeepSDP allows us to further rewrite its collection of LMIs in a second level of decomposition that we call Chordal-DeepSDP-2 -- which results in another significant computational gain. Finally, we provide numerical experiments on real networks of learned cart-pole dynamics, showcasing the computational advantage of Chordal-DeepSDP and Chordal-DeepSDP-2 over DeepSDP.
△ Less
Submitted 8 January, 2024; v1 submitted 7 June, 2022;
originally announced June 2022.