Search | arXiv e-print repository

Steering Llama 2 via Contrastive Activation Addition

Authors: Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner

Abstract: We introduce Contrastive Activation Addition (CAA), an innovative method for steering language models by modifying their activations during forward passes. CAA computes "steering vectors" by averaging the difference in residual stream activations between pairs of positive and negative examples of a particular behavior, such as factual versus hallucinatory responses. During inference, these steering vectors are added at all token positions after the user's prompt with either a positive or negative coefficient, allowing precise control over the degree of the targeted behavior. We evaluate CAA's effectiveness on Llama 2 Chat using multiple-choice behavioral question datasets and open-ended generation tasks. We demonstrate that CAA significantly alters model behavior, is effective over and on top of traditional methods like finetuning and system prompt design, and minimally reduces capabilities. Moreover, we gain deeper insights into CAA's mechanisms by employing various activation space interpretation methods. CAA accurately steers model outputs and sheds light on how high-level concepts are represented in Large Language Models (LLMs).

Submitted 5 July, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

arXiv:2207.02523 [pdf, other]

doi 10.1007/978-3-031-21131-7_18

Modeling Node Exposure for Community Detection in Networks

Authors: Sameh Othman, Johannes Schulz, Marco Baity-Jesi, Caterina De Bacco

Abstract: In community detection, datasets often suffer a sampling bias for which nodes which would normally have a high affinity appear to have zero affinity. This happens for example when two affine users of a social network were not exposed to one another. Community detection on this kind of data suffers then from considering affine nodes as not affine. To solve this problem, we explicitly model the (non-)exposure mechanism in a Bayesian community detection framework, by introducing a set of additional hidden variables. Compared to approaches which do not model exposure, our method is able to better reconstruct the input graph, while maintaining a similar performance in recovering communities. Importantly, it allows to estimate the probability that two nodes have been exposed, a possibility not available with standard models.

Submitted 6 July, 2022; originally announced July 2022.

Comments: 13 pages, 4 figures

arXiv:2206.06456 [pdf, other]

doi 10.3390/e24081021

A comparison of partial information decompositions using data from real and simulated layer 5b pyramidal cells

Authors: Jim W. Kay, Jan M. Schulz, W. A. Phillips

Abstract: Partial information decomposition allows the joint mutual information between an output and a set of inputs to be divided into components that are synergistic or shared or unique to each input. We consider five different decompositions and compare their results on data from layer 5b pyramidal cells in two different studies. The first study was of the amplification of somatic action potential output by apical dendritic input and its regulation by dendritic inhibition. We find that two of the decompositions produce much larger estimates of synergy and shared information than the others, as well as large levels of unique misinformation. When within-neuron differences in the components are examined, the five methods produce more similar results for all but the shared information component, for which two methods produce a different statistical conclusion from the others. There are some differences in the expression of unique information asymmetry among the methods. It is significantly larger, on average, under dendritic inhibition. Three of the methods support a previous conclusion that apical amplification is reduced by dendritic inhibition. The second study used a detailed compartmental model to produce action potentials for many combinations of the numbers of basal and apical synaptic inputs. Two analyses of decompositions are conducted on subsets of the data. In the first, the decompositions reveal a bifurcation in unique information asymmetry. For three of the methods this suggests that apical drive switches to basal drive as the strength of the basal input increases, while the other two show changing mixtures of information and misinformation. Decompositions produced using the second set of subsets show that all five decompositions provide support for properties of cooperative context-sensitivity - to varying extents.

Submitted 13 June, 2022; originally announced June 2022.

Comments: 27 pages, 11 figures

Journal ref: Published in Entropy, 24th July, 2022, 24(8), 1021

arXiv:2206.02912 [pdf]

doi 10.1088/1361-6560/accdb0

Learning Image Representations for Content Based Image Retrieval of Radiotherapy Treatment Plans

Authors: Charles Huang, Varun Vasudevan, Oscar Pastor-Serrano, Md Tauhidul Islam, Yusuke Nomura, Piotr Dubrowski, Jen-Yeu Wang, Joseph B. Schulz, Yong Yang, Lei Xing

Abstract: Objective: Knowledge based planning (KBP) typically involves training an end-to-end deep learning model to predict dose distributions. However, training end-to-end methods may be associated with practical limitations due to the limited size of medical datasets that are often used. To address these limitations, we propose a content based image retrieval (CBIR) method for retrieving dose distributions of previously planned patients based on anatomical similarity. Approach: Our proposed CBIR method trains a representation model that produces latent space embeddings of a patient's anatomical information. The latent space embeddings of new patients are then compared against those of previous patients in a database for image retrieval of dose distributions. All source code for this project is available on github. Main Results: The retrieval performance of various CBIR methods is evaluated on a dataset consisting of both publicly available plans and clinical plans from our institution. This study compares various encoding methods, ranging from simple autoencoders to more recent Siamese networks like SimSiam, and the best performance was observed for the multitask Siamese network. Significance: Applying CBIR to inform subsequent treatment planning potentially addresses many limitations associated with end-to-end KBP. Our current results demonstrate that excellent image retrieval performance can be obtained through slight changes to previously developed Siamese networks. We hope to integrate CBIR into automated planning workflow in future works, potentially through methods like the MetaPlanner framework.

Submitted 23 August, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

arXiv:2111.09121 [pdf, other]

Uncertainty Quantification of Surrogate Explanations: an Ordinal Consensus Approach

Authors: Jonas Schulz, Rafael Poyiadzi, Raul Santos-Rodriguez

Abstract: Explainability of black-box machine learning models is crucial, in particular when deployed in critical applications such as medicine or autonomous cars. Existing approaches produce explanations for the predictions of models, however, how to assess the quality and reliability of such explanations remains an open question. In this paper we take a step further in order to provide the practitioner with tools to judge the trustworthiness of an explanation. To this end, we produce estimates of the uncertainty of a given explanation by measuring the ordinal consensus amongst a set of diverse bootstrapped surrogate explainers. While we encourage diversity by using ensemble techniques, we propose and analyse metrics to aggregate the information contained within the set of explainers through a rating scheme. We empirically illustrate the properties of this approach through experiments on state-of-the-art Convolutional Neural Network ensembles. Furthermore, through tailored visualisations, we show specific examples of situations where uncertainty estimates offer concrete actionable insights to the user beyond those arising from standard surrogate explainers.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2109.03027 [pdf, other]

Statistical analysis of locally parameterized shapes

Authors: Mohsen Taheri, Jörn Schulz

Abstract: The alignment of shapes has been a crucial step in statistical shape analysis, for example, in calculating mean shape, detecting locational differences between two shape populations, and classification. Procrustes alignment is the most commonly used method and state of the art. In this work, we uncover that alignment might seriously affect the statistical analysis. For example, alignment can induce false shape differences and lead to misleading results and interpretations. We propose a novel hierarchical shape parameterization based on local coordinate systems. The local parameterized shapes are translation and rotation invariant. Thus, the inherent alignment problems from the commonly used global coordinate system for shape representation can be avoided using this parameterization. The new parameterization is also superior for shape deformation and simulation. The method's power is demonstrated on the hypothesis testing of simulated data as well as the left hippocampi of patients with Parkinson's disease and controls.

Submitted 18 August, 2021; originally announced September 2021.

Comments: 25 pages, 20 figures

arXiv:2109.02230 [pdf, other]

Non-Euclidean Analysis of Joint Variations in Multi-Object Shapes

Authors: Zhiyuan Liu, Jörn Schulz, Mohsen Taheri, Martin Styner, James Damon, Stephen Pizer, J. S. Marron

Abstract: This paper considers joint analysis of multiple functionally related structures in classification tasks. In particular, our method developed is driven by how functionally correlated brain structures vary together between autism and control groups. To do so, we devised a method based on a novel combination of (1) non-Euclidean statistics that can faithfully represent non-Euclidean data in Euclidean spaces and (2) a non-parametric integrative analysis method that can decompose multi-block Euclidean data into joint, individual, and residual structures. We find that the resulting joint structure is effective, robust, and interpretable in recognizing the underlying patterns of the joint variation of multi-block non-Euclidean data. We verified the method in classifying the structural shape data collected from cases that developed and did not develop into Autistic Spectrum Disorder (ASD).

Submitted 7 September, 2021; v1 submitted 5 September, 2021; originally announced September 2021.

arXiv:1906.05264 [pdf, other]

GluonTS: Probabilistic Time Series Models in Python

Authors: Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, Yuyang Wang

Abstract: We introduce Gluon Time Series (GluonTS, available at https://gluon-ts.mxnet.io), a library for deep-learning-based time series modeling. GluonTS simplifies the development of and experimentation with time series models for common tasks such as forecasting or anomaly detection. It provides all necessary components and tools that scientists need for quickly building new models, for efficiently running and analyzing experiments and for evaluating model accuracy.

Submitted 14 June, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

Comments: ICML Time Series Workshop 2019

arXiv:1901.01291 [pdf, other]

On the Utility of Model Learning in HRI

Authors: Gokul Swamy, Jens Schulz, Rohan Choudhury, Dylan Hadfield-Menell, Anca Dragan

Abstract: Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn a policy for how they act directly. But it can also model the human as an agent, and rely on a "theory of mind" to guide or bias the learning (grey box). We contribute a characterization of the performance of these methods for an autonomous driving task under the optimistic case of having an ideal theory of mind, as well as under different scenarios in which the assumptions behind the robot's theory of mind for the human are wrong, as they inevitably will be in practice.

Submitted 21 May, 2020; v1 submitted 4 January, 2019; originally announced January 2019.

arXiv:1804.10467 [pdf, other]

Interaction-Aware Probabilistic Behavior Prediction in Urban Environments

Authors: Jens Schulz, Constantin Hubmann, Julian Löchner, Darius Burschka

Abstract: Planning for autonomous driving in complex, urban scenarios requires accurate prediction of the trajectories of surrounding traffic participants. Their future behavior depends on their route intentions, the road-geometry, traffic rules and mutual interaction, resulting in interdependencies between their trajectories. We present a probabilistic prediction framework based on a dynamic Bayesian network, which represents the state of the complete scene including all agents and respects the aforementioned dependencies. We propose Markovian, context-dependent motion models to define the interaction-aware behavior of drivers. At first, the state of the dynamic Bayesian network is estimated over time by tracking the single agents via sequential Monte Carlo inference. Secondly, we perform a probabilistic forward simulation of the network's estimated belief state to generate the different combinatorial scene developments. This provides the corresponding trajectories for the set of possible, future scenes. Our framework can handle various road layouts and number of traffic participants. We evaluate the approach in online simulations and real-world scenarios. It is shown that our interaction-aware prediction outperforms interaction-unaware physics- and map-based approaches.

Submitted 28 August, 2018; v1 submitted 27 April, 2018; originally announced April 2018.

arXiv:1102.3643 [pdf, ps, other]

A Constant Factor Approximation Algorithm for Unsplittable Flow on Paths

Authors: Paul Bonsma, Jens Schulz, Andreas Wiese

Abstract: In the unsplittable flow problem on a path, we are given a capacitated path $P$ and $n$ tasks, each task having a demand, a profit, and start and end vertices. The goal is to compute a maximum profit set of tasks, such that for each edge $e$ of $P$, the total demand of selected tasks that use $e$ does not exceed the capacity of $e$. This is a well-studied problem that has been studied under alternative names, such as resource allocation, bandwidth allocation, resource constrained scheduling, temporal knapsack and interval packing. We present a polynomial time constant-factor approximation algorithm for this problem. This improves on the previous best known approximation ratio of $O(\log n)$. The approximation ratio of our algorithm is $7+ε$ for any $ε>0$. We introduce several novel algorithmic techniques, which might be of independent interest: a framework which reduces the problem to instances with a bounded range of capacities, and a new geometrically inspired dynamic program which solves a special case of the maximum weight independent set of rectangles problem to optimality. In the setting of resource augmentation, wherein the capacities can be slightly violated, we give a $(2+ε)$-approximation algorithm. In addition, we show that the problem is strongly NP-hard even if all edge capacities are equal and all demands are either 1, 2, or 3.

Submitted 19 March, 2012; v1 submitted 17 February, 2011; originally announced February 2011.

Comments: 37 pages, 5 figures. Version 2 contains the same results as version 1, but the presentation has been greatly revised and improved. References have been added.

Showing 1–11 of 11 results for author: Schulz, J