Search | arXiv e-print repository

Variance reduction of diffusion model's gradients with Taylor approximation-based control variate

Authors: Paul Jeha, Will Grathwohl, Michael Riis Andersen, Carl Henrik Ek, Jes Frellsen

Abstract: Score-based models, trained with denoising score matching, are remarkably effective in generating high dimensional data. However, the high variance of their training objective hinders optimisation. We attempt to reduce it with a control variate, derived via a $k$-th order Taylor expansion on the training objective and its gradient. We prove an equivalence between the two and demonstrate empirically the effectiveness of our approach on a low dimensional problem setting; and study its effect on larger problems.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 14 pages, ICML Structured Probabilistic Inference & Generative Modeling 2024

arXiv:2404.04268 [pdf]

The Use of Generative Search Engines for Knowledge Work and Complex Tasks

Authors: Siddharth Suri, Scott Counts, Leijie Wang, Chacha Chen, Mengting Wan, Tara Safavi, Jennifer Neville, Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Sathish Manivannan, Nagu Rangan, Longqi Yang

Abstract: Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities such as the ability to generate new digital artifacts like text, images, code etc., resulting in a new tool, a generative search engine, which combines the capabilities of LLMs with a traditional search engine. Through the empirical analysis of Bing Copilot (Bing Chat), one of the first publicly available generative search engines, we analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search. Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.

Submitted 19 March, 2024; originally announced April 2024.

Comments: 32 pages, 3 figures, 4 tables

ACM Class: J.4

arXiv:2403.12388 [pdf, other]

Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models

Authors: Ying-Chun Lin, Jennifer Neville, Jack W. Stokes, Longqi Yang, Tara Safavi, Mengting Wan, Scott Counts, Siddharth Suri, Reid Andersen, Xiaofeng Xu, Deepak Gupta, Sujay Kumar Jauhar, Xia Song, Georg Buscher, Saurabh Tiwary, Brent Hecht, Jaime Teevan

Abstract: Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.

Submitted 8 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.12173 [pdf, other]

TnT-LLM: Text Mining at Scale with Large Language Models

Authors: Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan

Abstract: Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine. Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 9 pages main content, 8 pages references and appendix

arXiv:2312.08805 [pdf, other]

Zoom in on the Plant: Fine-grained Analysis of Leaf, Stem and Vein Instances

Authors: Ronja Güldenring, Rasmus Eckholdt Andersen, Lazaros Nalpantidis

Abstract: Robot perception is far from what humans are capable of. Humans not only have a complex semantic scene understanding but also extract fine-grained intra-object properties for the salient ones. When humans look at plants, they naturally perceive the plant architecture with its individual leaves and branching system. In this work, we want to advance the granularity in plant understanding for agricultural precision robots. We develop a model to extract fine-grained phenotypic information, such as leaf, stem, and vein instances. The underlying dataset RumexLeaves is made publicly available and is the first of its kind with keypoint-guided polyline annotations leading along the line from the lowest stem point along the leaf basal to the leaf apex. Furthermore, we introduce an adapted metric POKS complying with the concept of keypoint-guided polylines. In our experimental evaluation, we provide baseline results for our newly introduced dataset while showcasing the benefits of POKS over OKS.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted at Robotics and Automation Letters (RA-L)

arXiv:2311.09389 [pdf, other]

Neural machine translation for automated feedback on children's early-stage writing

Authors: Jonas Vestergaard Jensen, Mikkel Jordahn, Michael Riis Andersen

Abstract: In this work, we address the problem of assessing and constructing feedback for early-stage writing automatically using machine learning. Early-stage writing is typically vastly different from conventional writing due to phonetic spelling and lack of proper grammar, punctuation, spacing etc. Consequently, early-stage writing is highly non-trivial to analyze using common linguistic metrics. We propose to use sequence-to-sequence models for "translating" early-stage writing by students into "conventional" writing, which allows the translated text to be analyzed using linguistic metrics. Furthermore, we propose a novel robust likelihood to mitigate the effect of noise in the dataset. We investigate the proposed methods using a set of numerical experiments and demonstrate that the conventional text can be predicted with high accuracy.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 9 pages, 1 figure, 1 table, to be published in the proceedings of the Northern Lights Deep Learning Conference 2024

ACM Class: I.2.7

arXiv:2309.13063 [pdf, other]

Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies

Authors: Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Snigdha Sarathi Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Xiaochuan Ni, Nagu Rangan, Tara Safavi, Siddharth Suri, Mengting Wan, Leijie Wang, Longqi Yang

Abstract: Log data can reveal valuable information about how users interact with Web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for emerging forms of Web search such as AI-driven chat. To understand user intents from log data, we need a way to label them with meaningful categories that capture their diversity and dynamics. Existing methods rely on manual or machine-learned labeling, which are either expensive or inflexible for large and dynamic datasets. We propose a novel solution using large language models (LLMs), which can generate rich and relevant concepts, descriptions, and examples for user intents. However, using LLMs to generate a user intent taxonomy and apply it for log analysis can be problematic for two main reasons: (1) such a taxonomy is not externally validated; and (2) there may be an undesirable feedback loop. To address this, we propose a new methodology with human experts and assessors to verify the quality of the LLM-generated taxonomy. We also present an end-to-end pipeline that uses an LLM with human-in-the-loop to produce, refine, and apply labels for user intent analysis in log data. We demonstrate its effectiveness by uncovering new insights into user intents from search and chat logs from the Microsoft Bing commercial search engine. The proposed work's novelty stems from the method for generating purpose-driven user intent taxonomies with strong validation. This method not only helps remove methodological and practical bottlenecks from intent-focused research, but also provides a new framework for generating, validating, and applying other kinds of taxonomies in a scalable and adaptable way with reasonable human effort.

Submitted 9 May, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Report number: MSR-TR-2023-32

arXiv:2309.08827 [pdf, other]

S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs

Authors: Sarkar Snigdha Sarathi Das, Chirag Shah, Mengting Wan, Jennifer Neville, Longqi Yang, Reid Andersen, Georg Buscher, Tara Safavi

Abstract: The traditional Dialogue State Tracking (DST) problem aims to track user preferences and intents in user-agent conversations. While sufficient for task-oriented dialogue systems supporting narrow domain applications, the advent of Large Language Model (LLM)-based chat systems has introduced many real-world intricacies in open-domain dialogues. These intricacies manifest in the form of increased complexity in contextual interactions, extended dialogue sessions encompassing a diverse array of topics, and more frequent contextual shifts. To handle these intricacies arising from evolving LLM-based chat systems, we propose joint dialogue segmentation and state tracking per segment in open-domain dialogue systems. Assuming a zero-shot setting appropriate to a true open-domain dialogue system, we propose S3-DST, a structured prompting technique that harnesses Pre-Analytical Recollection, a novel grounding mechanism we designed for improving long context tracking. To demonstrate the efficacy of our proposed approach in joint segmentation and state tracking, we evaluate S3-DST on a proprietary anonymized open-domain dialogue dataset, as well as publicly available DST and segmentation datasets. Across all datasets and settings, S3-DST consistently outperforms the state-of-the-art, demonstrating its potency and robustness for the next generation of LLM-based chat systems.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.04607 [pdf]

Linking Symptom Inventories using Semantic Textual Similarity

Authors: Eamonn Kennedy, Shashank Vadlamani, Hannah M Lindsey, Kelly S Peterson, Kristen Dams OConnor, Kenton Murray, Ronak Agarwal, Houshang H Amiri, Raeda K Andersen, Talin Babikian, David A Baron, Erin D Bigler, Karen Caeyenberghs, Lisa Delano-Wood, Seth G Disner, Ekaterina Dobryakova, Blessen C Eapen, Rachel M Edelstein, Carrie Esopenko, Helen M Genova, Elbert Geuze, Naomi J Goodrich-Hunsaker, Jordan Grafman, Asta K Haberg, Cooper B Hodges, et al. (57 additional authors not shown)

Abstract: An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long-standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content, a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment.

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2304.04048 [pdf, other]

Polygonizer: An auto-regressive building delineator

Authors: Maxim Khomiakov, Michael Riis Andersen, Jes Frellsen

Abstract: In geospatial planning, it is often essential to represent objects in a vectorized format, as this format easily translates to downstream tasks such as web development, graphics, or design. While these problems are frequently addressed using semantic segmentation, which requires additional post-processing to vectorize objects in a non-trivial way, we present an Image-to-Sequence model that allows for direct shape inference and is ready for vector-based workflows out of the box. We demonstrate the model's performance in various ways, including perturbations to the image input that correspond to variations or artifacts commonly encountered in remote sensing applications. Our model outperforms prior works when using ground truth bounding boxes (one object per image), achieving the lowest maximum tangent angle error.

Submitted 8 April, 2023; originally announced April 2023.

Comments: ICLR 2023 Workshop on Machine Learning in Remote Sensing

arXiv:2303.11215 [pdf, other]

Learning to Generate 3D Representations of Building Roofs Using Single-View Aerial Imagery

Authors: Maxim Khomiakov, Alejandro Valverde Mahou, Alba Reinders Sánchez, Jes Frellsen, Michael Riis Andersen

Abstract: We present a novel pipeline for learning the conditional distribution of a building roof mesh given pixels from an aerial image, under the assumption that roof geometry follows a set of regular patterns. Unlike alternative methods that require multiple images of the same object, our approach enables estimating 3D roof meshes using only a single image for predictions. The approach employs PolyGen, a deep generative transformer architecture for 3D meshes. We apply this model in a new domain and investigate its sensitivity to image resolution. We propose a novel metric to evaluate the performance of the inferred meshes, and our results show that the model is robust even at lower resolutions, while qualitatively producing realistic representations for out-of-distribution samples.

Submitted 20 March, 2023; originally announced March 2023.

Comments: Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2301.05983 [pdf, other]

On the role of Model Uncertainties in Bayesian Optimization

Authors: Jonathan Foldager, Mikkel Jordahn, Lars Kai Hansen, Michael Riis Andersen

Abstract: Bayesian optimization (BO) is a popular method for black-box optimization, which relies on uncertainty as part of its decision-making process when deciding which experiment to perform next. However, not much work has addressed the effect of uncertainty on the performance of the BO algorithm and to what extent calibrated uncertainties improve the ability to find the global optimum. In this work, we provide an extensive study of the relationship between the BO performance (regret) and uncertainty calibration for popular surrogate models and compare them across both synthetic and real-world experiments. Our results confirm that Gaussian Processes are strong surrogate models and that they tend to outperform other popular models. Our results further show a positive association between calibration error and regret, but interestingly, this association disappears when we control for the type of model in the analysis. We also studied the effect of re-calibration and demonstrate that it generally does not lead to improved regret. Finally, we provide theoretical justification for why uncertainty calibration might be difficult to combine with BO due to the small sample sizes commonly used.

Submitted 14 January, 2023; originally announced January 2023.

Comments: 14 pages, 4 figures, 2 tables

arXiv:2212.01260 [pdf, other]

SolarDK: A high-resolution urban solar panel image classification and localization dataset

Authors: Maxim Khomiakov, Julius Holbech Radzikowski, Carl Anton Schmidt, Mathias Bonde Sørensen, Mads Andersen, Michael Riis Andersen, Jes Frellsen

Abstract: The body of research on classification of solar panel arrays from aerial imagery is increasing, yet there are still not many public benchmark datasets. This paper introduces two novel benchmark datasets for classifying and localizing solar panel arrays in Denmark: a human-annotated dataset for classification and segmentation, as well as a classification dataset acquired using self-reported data from the Danish national building registry. We explore the performance of prior works on the new benchmark dataset, and present results after fine-tuning models using a similar approach as recent works. Furthermore, we train models of newer architectures and provide benchmark baselines to our datasets in several scenarios. We believe the release of these datasets may improve future research in both local and global geospatial domains for identifying and mapping of solar panel arrays from aerial imagery. The data is accessible at https://osf.io/aj539/.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 7 pages, 2 figures, to access the dataset, see https://osf.io/aj539/

arXiv:2203.15945 [pdf, other]

A Framework for Improving the Reliability of Black-box Variational Inference

Authors: Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, Jonathan H. Huggins

Abstract: Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization methods for BBVI remain unreliable and require substantial expertise and hand-tuning to apply effectively. In this paper, we propose Robust and Automated Black-box VI (RABVI), a framework for improving the reliability of BBVI optimization. RABVI is based on rigorously justified automation techniques, includes just a small number of intuitive tuning parameters, and detects inaccurate estimates of the optimal variational approximation. RABVI adaptively decreases the learning rate by detecting convergence of the fixed-learning-rate iterates, then estimates the symmetrized Kullback-Leibler (KL) divergence between the current variational approximation and the optimal one. It also employs a novel optimization termination criterion that enables the user to balance desired accuracy against computational cost by comparing (i) the predicted relative decrease in the symmetrized KL divergence if a smaller learning rate were used and (ii) the predicted computation required to converge with the smaller learning rate. We validate the robustness and accuracy of RABVI through carefully designed simulation studies and on a diverse set of real-world model and data examples.

Submitted 16 May, 2024; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2202.03268 [pdf, other]

Cyber-resilience for marine navigation by information fusion and change detection

Authors: Dimitrios Dagdilelis, Mogens Blanke, Rasmus Hjorth Andersen, Roberto Galeazzi

Abstract: Cyber-resilience is an increasing concern in developing autonomous navigation solutions for marine vessels. This paper scrutinizes cyber-resilience properties of marine navigation through a prism with three edges: multiple sensor information fusion, diagnosis of not-normal behaviours, and change detection. It proposes a two-stage estimator for diagnosis and mitigation of sensor signals used for coastal navigation. Developing a Likelihood Field approach, a first stage extracts shoreline features from radar and matches them to the electronic navigation chart. A second stage associates buoy and beacon features from the radar with chart information. Using real data logged at sea tests combined with simulated spoofing, the paper verifies the ability to timely diagnose and isolate an attempt to compromise position measurements. A new approach is suggested for high level processing of received data to evaluate their consistency, that is agnostic to the underlying technology of the individual sensory input. A combined parametric Gaussian modelling and Kernel Density Estimation is suggested and compared with a generalized likelihood ratio change detector that uses sliding windows. The paper shows how deviations from nominal behaviour and isolation of the components is possible when under attack or when defects in sensors occur.

Submitted 1 February, 2022; originally announced February 2022.

Comments: 18 pages, 21 figures

ACM Class: G.3; I.2; I.4; I.5

arXiv:2103.01085 [pdf, other]

Challenges and Opportunities in High-dimensional Variational Inference

Authors: Akash Kumar Dhaka, Alejandro Catalina, Manushi Welandawe, Michael Riis Andersen, Jonathan Huggins, Aki Vehtari

Abstract: Current black-box variational inference (BBVI) methods require the user to make numerous design choices -- such as the selection of variational objective and approximating family -- yet there is little principled guidance on how to do so. We develop a conceptual framework and set of experimental tools to understand the effects of these choices, which we leverage to propose best practices for maximizing posterior approximation accuracy. Our approach is based on studying the pre-asymptotic tail behavior of the density ratios between the joint distribution and the variational approximation, then exploiting insights and tools from the importance sampling literature. Our framework and supporting experiments help to distinguish between the behavior of BBVI methods for approximating low-dimensional versus moderate-to-high-dimensional posteriors. In the latter case, we show that mass-covering variational objectives are difficult to optimize and do not improve accuracy, but flexible variational families can improve accuracy and the effectiveness of importance sampling -- at the cost of additional optimization challenges. Therefore, for moderate-to-high-dimensional posteriors we recommend using the (mode-seeking) exclusive KL divergence since it is the easiest to optimize, and improving the variational family or using model parameter transformations to make the posterior and optimal variational approximation more similar. On the other hand, in low-dimensional settings, we show that heavy-tailed variational families and mass-covering divergences are effective and can increase the chances that the approximation can be improved by importance sampling.

Submitted 30 June, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

arXiv:2009.00666 [pdf, other]

Robust, Accurate Stochastic Optimization for Variational Inference

Authors: Akash Kumar Dhaka, Alejandro Catalina, Michael Riis Andersen, Måns Magnusson, Jonathan H. Huggins, Aki Vehtari

Abstract: We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution, (2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the best-case scenario when the exact posterior belongs to the assumed variational family, common stochastic optimization methods lead to poor variational approximations if the problem dimension is moderately large. We also demonstrate that these methods are not robust across diverse model types. Motivated by these findings, we develop a more robust and accurate stochastic optimization framework by viewing the underlying optimization algorithm as producing a Markov chain. Our approach is theoretically motivated and includes a diagnostic for convergence and a novel stopping rule, both of which are robust to noisy evaluations of the objective function. We show empirically that the proposed framework works well on a diverse set of models: it can automatically detect stochastic optimization failure or inaccurate variational approximation.

Submitted 3 September, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

Journal ref: NeurIPS 2020

arXiv:2007.05994 [pdf, other]

State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes

Authors: William J. Wilkinson, Paul E. Chang, Michael Riis Andersen, Arno Solin

Abstract: We formulate approximate Bayesian inference in non-conjugate temporal and spatio-temporal Gaussian process models as a simple parameter update rule applied during Kalman smoothing. This viewpoint encompasses most inference schemes, including expectation propagation (EP), the classical (Extended, Unscented, etc.) Kalman smoothers, and variational inference. We provide a unifying perspective on these algorithms, showing how replacing the power EP moment matching step with linearisation recovers the classical smoothers. EP provides some benefits over the traditional methods via introduction of the so-called cavity distribution, and we combine these benefits with the computational efficiency of linearisation, providing extensive empirical analysis demonstrating the efficacy of various algorithms under this unifying framework. We provide a fast implementation of all methods in JAX.

Submitted 12 July, 2020; originally announced July 2020.

Comments: Accepted to International Conference on Machine Learning (ICML) 2020

arXiv:2003.11435 [pdf, other]

Preferential Batch Bayesian Optimization

Authors: Eero Siivola, Akash Kumar Dhaka, Michael Riis Andersen, Javier Gonzalez, Pablo Garcia Moreno, Aki Vehtari

Abstract: Most research in Bayesian optimization (BO) has focused on \emph{direct feedback} scenarios, where one has access to exact values of some expensive-to-evaluate objective. This direction has been mainly driven by the use of BO in machine learning hyper-parameter configuration problems. However, in domains such as modelling human preferences, A/B tests, or recommender systems, there is a need for methods that can replace direct feedback with \emph{preferential feedback}, obtained via rankings or pairwise comparisons. In this work, we present preferential batch Bayesian optimization (PBBO), a new framework that allows finding the optimum of a latent function of interest, given any type of parallel preferential feedback for a group of two or more points. We do so by using a Gaussian process model with a likelihood specially designed to enable parallel and efficient data collection mechanisms, which are key in modern machine learning. We show how the acquisitions developed under this framework generalize and augment previous approaches in Bayesian optimization, expanding the use of these techniques to a wider range of domains. An extensive simulation study shows the benefits of this approach, both with simulated functions and four real data sets.

Submitted 31 August, 2021; v1 submitted 25 March, 2020; originally announced March 2020.

Comments: 6 pages + 7 pages in supplementary material

arXiv:1904.10679 [pdf, other]

Bayesian leave-one-out cross-validation for large data

Authors: Måns Magnusson, Michael Riis Andersen, Johan Jonasson, Aki Vehtari

Abstract: Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference techniques and probability-proportional-to-size-sampling (PPS) for fast LOO model evaluation for large datasets. We provide both theoretical and empirical results showing good properties for large data.

Submitted 24 April, 2019; originally announced April 2019.

Comments: Accepted to ICML 2019. This version is the submitted paper

Journal ref: Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253, 2019

arXiv:1901.11436 [pdf, other]

End-to-End Probabilistic Inference for Nonstationary Audio Analysis

Authors: William J. Wilkinson, Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin

Abstract: A typical audio signal processing pipeline includes multiple disjoint analysis stages, including calculation of a time-frequency representation followed by spectrogram-based feature analysis. We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters. Further, we formulate this nonlinear model's state space representation, making it amenable to infinite-horizon Gaussian process regression with approximate inference via expectation propagation, which scales linearly in the number of time steps and quadratically in the state dimensionality. By doing so, we are able to process audio signals with hundreds of thousands of data points. We demonstrate, on various tasks with empirical data, how this inference scheme outperforms more standard techniques that rely on extended Kalman filtering.

Submitted 27 April, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

Comments: Accepted to the Thirty-sixth International Conference on Machine Learning (ICML) 2019

arXiv:1811.02489 [pdf, other]

Unifying Probabilistic Models for Time-Frequency Analysis

Authors: William J. Wilkinson, Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin

Abstract: In audio signal processing, probabilistic time-frequency models have many benefits over their non-probabilistic counterparts. They adapt to the incoming signal, quantify uncertainty, and measure correlation between the signal's amplitude and phase information, making time domain resynthesis straightforward. However, these models are still not widely used since they come at a high computational cost, and because they are formulated in such a way that it can be difficult to interpret all the modelling assumptions. By showing their equivalence to Spectral Mixture Gaussian processes, we illuminate the underlying model assumptions and provide a general framework for constructing more complex models that better approximate real-world signals. Our interpretation makes it intuitive to inspect, compare, and alter the models since all prior knowledge is encoded in the Gaussian process kernel functions. We utilise a state space representation to perform efficient inference via Kalman smoothing, and we demonstrate how our interpretation allows for efficient parameter learning in the frequency domain.

Submitted 12 February, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

arXiv:1401.3830 [pdf]

doi 10.1613/jair.2905

Interactive Cost Configuration Over Decision Diagrams

Authors: Henrik Reif Andersen, Tarik Hadzic, David Pisinger

Abstract: In many AI domains such as product configuration, a user should interactively specify a solution that must satisfy a set of constraints. In such scenarios, offline compilation of feasible solutions into a tractable representation is an important approach to delivering efficient backtrack-free user interaction online. In particular, binary decision diagrams (BDDs) have been successfully used as a compilation target for product and service configuration. In this paper we discuss how to extend BDD-based configuration to scenarios involving cost functions which express user preferences. We first show that an efficient, robust and easy to implement extension is possible if the cost function is additive, and feasible solutions are represented using multi-valued decision diagrams (MDDs). We also discuss the effect on MDD size if the cost function is non-additive or if it is encoded explicitly into the MDD. We then discuss interactive configuration in the presence of multiple cost functions. We prove that even in its simplest form, multiple-cost configuration is NP-hard in the input MDD. However, for solving two-cost configuration we develop a pseudo-polynomial scheme and a fully polynomial approximation scheme. The applicability of our approach is demonstrated through experiments over real-world configuration models and product-catalogue datasets. Response times are generally within a fraction of a second even for very large instances.

Submitted 15 January, 2014; originally announced January 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 37, pages 99-139, 2010

arXiv:0907.3631 [pdf, ps, other]

Interchanging distance and capacity in probabilistic mappings

Authors: Reid Andersen, Uriel Feige

Abstract: Harald Racke [STOC 2008] described a new method to obtain hierarchical decompositions of networks in a way that minimizes the congestion. Racke's approach is based on an equivalence that he discovered between minimizing congestion and minimizing stretch (in a certain setting). Here we present Racke's equivalence in an abstract setting that is more general than the one described in Racke's work, and clarifies the power of Racke's result. In addition, we present a related (but different) equivalence that was developed by Yuval Emek [ESA 2009] and is only known to apply to planar graphs.

Submitted 21 July, 2009; originally announced July 2009.

Comments: 16 pages, no figures

ACM Class: F.2.2; G.2.2

arXiv:0811.3779 [pdf, ps, other]

Finding Sparse Cuts Locally Using Evolving Sets

Authors: Reid Andersen, Yuval Peres

Abstract: A {\em local graph partitioning algorithm} finds a set of vertices with small conductance (i.e. a sparse cut) by adaptively exploring part of a large graph $G$, starting from a specified vertex. For the algorithm to be local, its complexity must be bounded in terms of the size of the set that it outputs, with at most a weak dependence on the number $n$ of vertices in $G$. Previous local partitioning algorithms find sparse cuts using random walks and personalized PageRank. In this paper, we introduce a randomized local partitioning algorithm that finds a sparse cut by simulating the {\em volume-biased evolving set process}, which is a Markov chain on sets of vertices. We prove that for any set of vertices $A$ that has conductance at most $φ$, for at least half of the starting vertices in $A$ our algorithm will output (with probability at least half), a set of conductance $O(φ^{1/2} \log^{1/2} n)$. We prove that for a given run of the algorithm, the expected ratio between its computational complexity and the volume of the set that it outputs is $O(φ^{-1/2} polylog(n))$. In comparison, the best previous local partitioning algorithm, due to Andersen, Chung, and Lang, has the same approximation guarantee, but a larger ratio of $O(φ^{-1} polylog(n))$ between the complexity and output volume. Using our local partitioning algorithm as a subroutine, we construct a fast algorithm for finding balanced cuts. Given a fixed value of $φ$, the resulting algorithm has complexity $O((m+nφ^{-1/2}) polylog(n))$ and returns a cut with conductance $O(φ^{1/2} \log^{1/2} n)$ and volume at least $v_φ/2$, where $v_φ$ is the largest volume of any set with conductance at most $φ$.

Submitted 23 November, 2008; originally announced November 2008.

Comments: 20 pages, no figures

ACM Class: F.2.2

arXiv:0705.4604 [pdf, other]

Temporal Runtime Verification using Monadic Difference Logic

Authors: Henrik Reif Andersen, Kaare J. Kristoffersen

Abstract: In this paper we present an algorithm for performing runtime verification of a bounded temporal logic over timed runs. The algorithm consists of three elements. First, the bounded temporal formula to be verified is translated into a monadic first-order logic over difference inequalities, which we call monadic difference logic. Second, at each step of the timed run, the monadic difference formula is modified by computing a quotient with the state and time of that step. Third, the resulting formula is checked for being a tautology or being unsatisfiable by a decision procedure for monadic difference logic. We further provide a simple decision procedure for monadic difference logic based on the data structure Difference Decision Diagrams. The algorithm is complete in a very strong sense on a subclass of temporal formulae characterized as homogeneously monadic and it is approximate on other formulae. The approximation comes from the fact that not all unsatisfiable or tautological formulae are recognised at the earliest possible time of the runtime verification. Contrary to existing approaches, the presented algorithms do not work by syntactic rewriting but employ efficient decision structures which make them applicable in real applications within for instance business software.

Submitted 31 May, 2007; originally announced May 2007.

ACM Class: D.2.4; D.2.5

arXiv:0704.1394 [pdf, ps, other]

Calculating Valid Domains for BDD-Based Interactive Configuration

Authors: Tarik Hadzic, Rune Moller Jensen, Henrik Reif Andersen

Abstract: In these notes we formally describe the functionality of Calculating Valid Domains from the BDD representing the solution space of valid configurations. The formalization is largely based on the CLab configuration framework.

Submitted 11 April, 2007; originally announced April 2007.

arXiv:cs/0702170 [pdf, ps, other]

Generic Global Constraints based on MDDs

Authors: Peter Tiedemann, Henrik Reif Andersen, Rasmus Pagh

Abstract: Constraint Programming (CP) has been successfully applied to both constraint satisfaction and constraint optimization problems. A wide variety of specialized global constraints provide critical assistance in achieving a good model that can take advantage of the structure of the problem in the search for a solution. However, a key outstanding issue is the representation of 'ad-hoc' constraints that do not have an inherent combinatorial nature, and hence are not modeled well using narrowly specialized global constraints. We attempt to address this issue by considering a hybrid of search and compilation. Specifically we suggest the use of Reduced Ordered Multi-Valued Decision Diagrams (ROMDDs) as the supporting data structure for a generic global constraint. We give an algorithm for maintaining generalized arc consistency (GAC) on this constraint that amortizes the cost of the GAC computation over a root-to-leaf path in the search tree without requiring asymptotically more space than used for the MDD. Furthermore we present an approach for incrementally maintaining the reduced property of the MDD during the search, and show how this can be used for providing domain entailment detection. Finally we discuss how to apply our approach to other similar data structures such as AOMDDs and Case DAGs. The technique used can be seen as an extension of the GAC algorithm for the regular language constraint on finite length input.

Submitted 28 February, 2007; originally announced February 2007.

Comments: Preliminary 15 pages version of the tech-report cs.AI/0611141

arXiv:cs/0702078 [pdf, ps, other]

A Local Algorithm for Finding Dense Subgraphs

Authors: Reid Andersen

Abstract: We present a local algorithm for finding dense subgraphs of bipartite graphs, according to the definition of density proposed by Kannan and Vinay. Our algorithm takes as input a bipartite graph with a specified starting vertex, and attempts to find a dense subgraph near that vertex. We prove that for any subgraph S with k vertices and density theta, there are a significant number of starting vertices within S for which our algorithm produces a subgraph S' with density theta / O(log n) on at most O(D k^2) vertices, where D is the maximum degree. The running time of the algorithm is O(D k^2), independent of the number of vertices in the graph.

Submitted 13 February, 2007; originally announced February 2007.

Comments: 14 pages, no figures

ACM Class: F.2.2; G.2.2

arXiv:cs/0702032 [pdf, ps, other]

Finding large and small dense subgraphs

Authors: Reid Andersen

Abstract: We consider two optimization problems related to finding dense subgraphs. The densest at-least-k-subgraph problem (DalkS) is to find an induced subgraph of highest average degree among all subgraphs with at least k vertices, and the densest at-most-k-subgraph problem (DamkS) is defined similarly. These problems are related to the well-known densest k-subgraph problem (DkS), which is to find the densest subgraph on exactly k vertices. We show that DalkS can be approximated efficiently, while DamkS is nearly as hard to approximate as the densest k-subgraph problem.

Submitted 5 February, 2007; originally announced February 2007.

Comments: 12 pages, no figures

ACM Class: F.2.2; G.2.2

arXiv:cs/0612068 [pdf, ps, other]

Interactive Configuration by Regular String Constraints

Authors: Esben Rune Hansen, Henrik Reif Andersen

Abstract: A product configurator which is complete, backtrack free and able to compute the valid domains at any state of the configuration can be constructed by building a Binary Decision Diagram (BDD). Despite the fact that the size of the BDD is exponential in the number of variables in the worst case, BDDs have proved to work very well in practice. Current BDD-based techniques can only handle interactive configuration with small finite domains. In this paper we extend the approach to handle string variables constrained by regular expressions. The user is allowed to change the strings by adding letters at the end of the string. We show how to make a data structure that can perform fast valid domain computations given some assignment on the set of string variables. We first show how to do this by using one large DFA. Since this approach is too space consuming to be of practical use, we construct a data structure that simulates the large DFA and in most practical cases is much more space efficient. As an example, for a configuration problem on $n$ string variables with only one solution, in which each string variable is assigned a value of length $k$, the former structure will use $Ω(k^n)$ space whereas the latter only needs $O(kn)$. We also show how this framework can easily be combined with the recent BDD techniques to allow boolean, integer and string variables in the configuration problem.

Submitted 12 December, 2006; originally announced December 2006.

Comments: Tech Report

arXiv:cs/0611141 [pdf, ps, other]

A Generic Global Constraint based on MDDs

Authors: Peter Tiedemann, Henrik Reif Andersen, Rasmus Pagh

Abstract: The paper suggests the use of Multi-Valued Decision Diagrams (MDDs) as the supporting data structure for a generic global constraint. We give an algorithm for maintaining generalized arc consistency (GAC) on this constraint that amortizes the cost of the GAC computation over a root-to-terminal path in the search tree. The technique used is an extension of the GAC algorithm for the regular language constraint on finite length input. Our approach adds support for skipped variables, maintains the reduced property of the MDD dynamically and provides domain entailment detection. Finally we also show how to adapt the approach to constraint types that are closely related to MDDs, such as AOMDDs and Case DAGs.

Submitted 28 November, 2006; originally announced November 2006.

Comments: Tech report, 31 pages, 3 figures

Showing 1–32 of 32 results for author: Andersen, R