PIBBSS Fellowship 2025 Research Management Post
Research Manager – PIBBSS Fellowship 2025
Role Overview
We’re seeking a Research Manager to co-own the planning, execution, and evaluation of our 2025 Fellowship program. The Fellowship brings together researchers from diverse academic backgrounds to work on high-impact AI safety research projects with an AI safety mentor already established in the field. While each fellow works with a dedicated project mentor, the research manager will ensure fellows have a productive and useful experience by unblocking them in critical moments and helping to guide their professional development.
Ideal Time Commitment & Structure
Part-time (0.5 FTE): September 2025 (wrap-up and evaluation)
Total: 4.5 months FTE equivalent
We are hiring two part-time managers: one as above, and one for fewer hours to top up our capacity. If you cannot commit to as many hours as above, we still encourage you to apply.
Location
Fellowship Period (June-August): Berkeley or San Francisco (TBD)
Preparation & Wrap-up: Remote
Primary Responsibilities
Design and revise fellowship curriculum and materials
Plan and execute opening and closing retreats
Facilitate weekly research salons and community-building activities
Conduct regular check-ins with fellows and mentors
Coordinate with external speakers and partner organizations
Evaluate program effectiveness and write comprehensive reports
Support fellows in preparing and showcasing their research
A detailed breakdown of responsibilities by fellowship stage is available at this link. This is the second year we are hiring for this role, and materials to help you settle into it will be provided during onboarding.
Qualifications
Mandatory:
Experience managing research programs or researchers
Strong communication or facilitation skills
Broad understanding of the AI risk landscape
A proactive spirit and proven ability to independently follow through on complex plans
Proven ability to work effectively with researchers from diverse disciplines, and an interest in doing more of this (our fellows work across interpretability, philosophy, law, biology, and many more).
Project management experience
Event planning experience
Preferred:
Interdisciplinary research experience
Familiarity with the PIBBSS bet
5+ years of research experience
Previous AI safety research experience
We understand that people come from different disciplines with varied backgrounds, and encourage you to apply if you think you would be a good fit, even if your CV does not check every box.
Compensation & Benefits
Salary: $27,000-$36,000 USD (pre-tax, for full participation as above) based on experience
Accommodation: $1,000 per month during the Fellowship period (June-August)
Meals: One or two meals provided daily during the Fellowship
Travel: Fellowship-related travel costs (plane, train, bus tickets) will be provided
Reporting Structure
You will report directly to Dušan D. Nešić (PIBBSS Executive Team, Operations) and will be supported by Lucas Teixeira (PIBBSS Executive Team, Research)
Application Process
To apply, please fill in this form by March 30th EoD AoE.
Questions?
For questions about this role, please contact Dušan D. Nešić at dusan@pibbss.ai.
Fellowship 2025 Research Management Role
Research Manager – PIBBSS Fellowship 2025
About PIBBSS
PIBBSS is a research initiative aiming to grow the AI Safety field by fostering interdisciplinary engagement with the natural and social sciences. Our mission is to support foundational scientific contributions which would aid in the development of safe and globally beneficial AI systems.
Role Overview
We’re seeking a Research Manager to own (or co-own) the planning, execution, and evaluation of our 2025 Fellowship program. The Fellowship brings together researchers from diverse academic backgrounds to work on high-impact AI safety research for both career transition and direct research impact. While each fellow works with a dedicated project mentor, the research manager will ensure fellows have a productive and useful experience by unblocking them in critical moments and helping to guide their professional development.
Ideal Time Commitment & Structure
Part-time (0.5 FTE): September 2025 (wrap-up and evaluation)
Total: 5 months FTE equivalent
We are open to hiring two part-time managers, in which case the time would be split accordingly.
Location
Fellowship Period (June-August): Either London or San Francisco (to be determined within the next few weeks)
Preparation & Wrap-up: Remote
Primary Responsibilities
Design and revise fellowship curriculum and materials
Plan and execute opening and closing retreats
Facilitate weekly research salons and community-building activities
Conduct regular check-ins with fellows and mentors
Coordinate with external speakers and partner organizations
Evaluate program effectiveness and write comprehensive reports
Support fellows in preparing and showcasing their research
A detailed breakdown of responsibilities by fellowship stage is available at this link.
Qualifications
Mandatory:
Experience managing research programs or researchers
Strong communication or facilitation skills
Broad understanding of the AI risk landscape
A proactive spirit and proven ability to independently follow through on complex plans
Proven ability to work effectively with researchers from diverse disciplines, and an interest in doing more of this (our fellows work across interpretability, philosophy, law, biology, and many more).
Project management experience
Event planning experience
Preferred:
Interdisciplinary research experience
Familiarity with the PIBBSS bet
5+ years of research experience
Previous AI safety research experience
We understand that people come from different disciplines with varied backgrounds, and encourage you to apply if you think you would be a good fit, even if your CV does not check every box.
Compensation & Benefits
Salary: $6,000-$8,000 USD per month (pre-tax, FTE equivalent) based on experience
Accommodation: $1,000 per month during the Fellowship period (June-August)
Meals: One or two meals provided daily during the Fellowship
Travel: Fellowship-related travel costs (plane, train, bus tickets) will be provided
Reporting Structure
You will report directly to Dušan D. Nešić (PIBBSS Executive Team, Operations) and will be supported by:
Lucas Teixeira (PIBBSS Executive Team, Research)
The Horizon Scanning team
Application Process
To apply, please fill in this form by March 10th EoD AoE.
Questions?
For questions about this role, please contact Dušan D. Nešić at dusan@pibbss.ai.
Symposium ’24
PIBBSS presents:
Summer Symposium 2024
The Symposium presents an opportunity to learn about the work of PIBBSS fellows conducted over the summer program. The program has concluded and recordings are available on our YouTube page.
The symposium took place in person at the LISA offices in London, and online over several days in the week of September 9th.
Find a program overview here.
Find the full agenda (including brief descriptions for each talk) below by toggling between the days.
Register at this link. We recommend joining through the Zoom application and making sure that you have the latest version of Zoom, so that you can join the breakout rooms at the end and chat with the speakers more privately.
Click here to add the schedule to your Google calendar.
Agenda
TUESDAY (10/09)
17:00 GMT [09:00 San Francisco, 18:00 London] – Solvable models of in-context learning
Nischal Mainali
Recent work has shown that linear transformers can learn an in-context regression algorithm at the global minimum of SGD dynamics. We extend this by analytically solving nonlinear training dynamics of these models. Despite the model’s linearity, the learning dynamics exhibits highly nonlinear training phenomena similar to those observed in practice. Properties of these dynamics can then be used to identify feature learning in real-world models and compare between them. Finally, we also explore conservation laws in training dynamics and how they, combined with initialization, influence learning capabilities and dynamics.
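For readers who want a concrete picture of the setting, the snippet below is an illustration of the in-context regression task only (not the speaker's models or code): it builds one task from context pairs and computes the least-squares predictor on the context, the kind of solution prior work shows trained linear transformers approximate. All sizes and names are assumptions made for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_context = 5, 20

# One in-context regression task: context pairs (x_i, y_i) with y_i = w·x_i, plus a query x*.
w = rng.normal(size=d)
X = rng.normal(size=(n_context, d))
y = X @ w
x_query = rng.normal(size=d)

# Reference solution: the least-squares predictor computed on the context.  Prior work shows
# trained linear transformers implement something close to this map; no transformer is run here.
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("in-context prediction:", float(x_query @ w_hat), "  true value:", float(x_query @ w))
```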
18:00 GMT [10:00 San Francisco, 19:00 London] – Factored Space Models: Causality Between Levels of Abstraction
Magdalena Wache
Causality plays an important role in understanding intelligent behavior, and there is a wealth of literature on models for causality, most of which is focused on causal graphs. However, causal graphs are limited when it comes to modeling variables that are deterministically related. In the presence of deterministic relationships there is generally no causal graph that satisfies both the Markov condition and the faithfulness condition. This is an important limitation, since deterministic relationships appear in many applications in mathematics, physics and computer science, and when modeling systems at different levels of abstraction. We introduce Factored Space Models as an alternative to causal graphs which naturally represent both probabilistic and deterministic relationships at all levels of abstraction. Moreover, we introduce structural independence and establish that it is equivalent to statistical independence in every distribution that factorizes over the factored space. This theorem generalizes the classical soundness and completeness theorem for d-separation.
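To see concretely why deterministic relationships break faithfulness, here is an illustrative toy check (not the authors' construction): let Y be a deterministic copy of X and let Z depend on X. For the graph X → Y, X → Z, d-separation implies Z ⊥ Y | X but not Z ⊥ X | Y; because Y determines X, the second independence holds anyway, so no DAG over these variables is both Markov and faithful to the distribution. The variable names and noise level below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.integers(0, 2, n)
Y = X.copy()                                   # deterministic copy of X
Z = X ^ (rng.random(n) < 0.25).astype(int)     # Z is a noisy function of X

def cmi(a, b, c):
    """Plug-in estimate of the conditional mutual information I(a; b | c) for binary arrays."""
    total = 0.0
    for cv in (0, 1):
        mask = c == cv
        if not mask.any():
            continue
        pc = mask.mean()
        for av in (0, 1):
            for bv in (0, 1):
                pab = ((a[mask] == av) & (b[mask] == bv)).mean()
                pa = (a[mask] == av).mean()
                pb = (b[mask] == bv).mean()
                if pab > 0:
                    total += pc * pab * np.log(pab / (pa * pb))
    return total

const = np.zeros(n, dtype=int)
print("I(Z;X)   =", round(cmi(Z, X, const), 4))  # clearly positive
print("I(Z;Y|X) =", round(cmi(Z, Y, X), 4))      # ~0, implied by d-separation in X->Y, X->Z
print("I(Z;X|Y) =", round(cmi(Z, X, Y), 4))      # ~0 as well, not implied: faithfulness fails
```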
19:00 GMT [11:00 San Francisco, 20:00 London] – Fixing our concepts to understand minds and agency: preliminary results
Mateusz Bagiński
The hermeneutic net is a provisional method for large-scale analysis of concepts and cognitive frameworks relevant to a particular domain of investigation. It aims to examine the roles that various elements play in our thinking, their relationships, and inadequacies, in order to put us in a better position to comprehensively revise our thinking about the domain. Applying this method to the domain of minds and agency was my main focus during the PIBBSS Fellowship. In the presentation I will cover the methodology and trajectory of my project, the results it has produced so far, and how I see its potential contributions to agent foundations and adjacent areas of research.
20:00 GMT – Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
WEDNESDAY (11/09)
17:00 GMT [09:00 San Francisco, 18:00 London] – Features that Fire Together Wire Together: Examining Co-occurrence of SAE Features
Matthew A. Clarke
Sparse autoencoders (SAEs) aim to decompose activation patterns in neural networks into interpretable features. This promises to let us see what a model is ‘thinking’ and so facilitates understanding, detection and possibly correction of dangerous behaviours. SAE features will be easiest to interpret if they are independent. However, we find that even in large SAEs, features co-occur more often than chance would predict. We set out to investigate whether understanding these co-occurrences is necessary for understanding model function, or whether features remain independently interpretable. We find that co-occurrence is rarer in larger SAEs, and reduces in relevance, likely due to feature splitting. When there is co-occurrence, features jointly map interpretable subspaces, e.g. days of the week, or position in a URL. However, only a subset of these can be interpreted by looking at features independently, suggesting that features that fire together may need to be understood as a group to best interpret network behaviour.
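A minimal sketch of this kind of co-occurrence check, run on synthetic binary activations rather than a real SAE; the token count, firing rates, the hand-wired pair of features, and the "lift" statistic are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for SAE feature activations: rows = tokens, columns = features, entry = 1 if the
# feature fired on that token.  Real data would come from running an SAE over model activations;
# here features 0 and 1 are wired to co-occur by construction.
n_tokens, n_feats = 50_000, 8
acts = (rng.random((n_tokens, n_feats)) < 0.05).astype(int)
acts[:, 1] = np.where(rng.random(n_tokens) < 0.8, acts[:, 0], acts[:, 1])

p = acts.mean(axis=0)                         # marginal firing rates
joint = (acts.T @ acts) / n_tokens            # P(feature i and feature j fire on the same token)
expected = np.outer(p, p)                     # baseline if features fired independently
lift = joint / np.maximum(expected, 1e-12)    # lift > 1 means more co-occurrence than chance

np.fill_diagonal(lift, np.nan)
print("largest lift over the independence baseline:", round(float(np.nanmax(lift)), 1))
```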
18:00 GMT [10:00 San Francisco, 19:00 London] – Minimum Description Length for singular models
Yevgeny Liokumovich
Understanding generalization properties of neural networks is crucial for AI safety and alignment. One approach towards explaining modern models’ remarkable generalization abilities is via the Minimum Description Length Principle, a mathematical realization of Occam’s razor. For regular models the MDL principle leads to the Bayesian Information Criterion, but neural networks are generally singular. In this talk I will describe a new formula for the MDL for a class of singular models and discuss its implications for the choice of prior and the geometry of the parameter space.
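For orientation, here are two standard results in the background of the talk (the new formula announced in the abstract is not reproduced here): the BIC that MDL yields for regular models, and Watanabe's free-energy asymptotics for singular models, in which the learning coefficient λ (the real log canonical threshold) and its multiplicity m replace the naive d/2 parameter count.

```latex
% Regular models: MDL reduces to the Bayesian Information Criterion
% (n samples, d parameters, \hat{w} the maximum-likelihood estimate).
\mathrm{MDL}_{\mathrm{regular}} \;\approx\; n\,L_n(\hat{w}) + \frac{d}{2}\log n

% Singular models (Watanabe): the learning coefficient \lambda \le d/2
% and its multiplicity m govern the Bayes free energy asymptotics instead.
F_n = n\,L_n(w_0) + \lambda \log n - (m-1)\log\log n + O_p(1)
```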
19:00 GMT [11:00 San Francisco, 20:00 London] – Are Neuro-Symbolic Approaches the Path to Safe LLM-Based Agents?
Agustín Martinez-Suñé
Large Language Model (LLM)-based agents are increasingly used to autonomously operate in both digital and physical environments, given natural language instructions. However, ensuring these agents act safely and avoid unintended harmful consequences remains a significant challenge. This project explores how neuro-symbolic approaches might address this issue. Specifically, we will discuss how integrating automated planning techniques with LLMs’ ability to formalize natural language can not only improve these agents’ safety but also help provide measurable safety guarantees.
20:00 GMT [12:00 San Francisco, 21:00 London] – Heavy-tailed Noise & Stochastic Gradient Descent
Wesley Erickson
Heavy-tailed distributions describe systems where extreme events dominate the dynamics. These distributions are prominent in the gradient noise of Stochastic Gradient Descent, and have strong connections to the ability for SGD to generalize. Despite this, it is common in machine learning to assume Normal/Gaussian dynamics, which may impact safety by underestimating the possibility of rare events or sudden changes in behavior or ability. In this talk I explore simulations of ML systems exhibiting heavy-tailed noise, discuss how this noise arises, and how its characteristics may act as a useful statistical signature for monitoring and interpreting ML systems.
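A toy contrast between the two noise assumptions (an illustration, not the speaker's simulations): the "model" is a one-dimensional iterate that just integrates gradient noise, and the Student-t increments with df < 2 are an assumed heavy-tailed stand-in with infinite variance.

```python
import numpy as np

rng = np.random.default_rng(0)
steps, lr = 100_000, 1e-3

def walk(noise):
    """Toy one-dimensional 'SGD' on a flat loss: the iterate simply integrates the noise."""
    return np.cumsum(lr * noise)

gaussian = walk(rng.normal(size=steps))
heavy = walk(rng.standard_t(df=1.5, size=steps))   # heavy-tailed: infinite variance for df < 2

for name, path in [("gaussian", gaussian), ("heavy-tailed", heavy)]:
    jumps = np.abs(np.diff(path))
    print(f"{name:>12}: largest step / median step = {jumps.max() / np.median(jumps):10.1f}")
```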
21:00 GMT — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
THURSDAY (12/09)
17:00 GMT [09:00 San Francisco, 18:00 London] – Exploring the potential of formal approaches to emergence for AI safety
Nadine Spychala
Multi-scale relationships in emergent phenomena and complex systems are studied across various disciplines. They explore the unresolved subject of how macro- and micro-scales relate to each other – are they independent, reducible, or is their relationship more complex? Historically, the lack of formalism hindered empirical investigation, but recent research introduced quantitative measures based on information theory to quantify emergence in systems whose components evolve over time. In this talk, I elaborate on how such multi-scale measures can be leveraged to characterize and understand evolving macro-scales in AI. This bears direct relevance for AI safety, as those macro-scales potentially relate to an AI’s behaviour and capabilities – measures of emergence may thus help detect, predict and perhaps even control them. Measuring emergence is new terrain, hence the applicability of measures thereof to empirical data as well as their validity and informativeness form active research areas themselves. During my fellowship, I explored a) the feasibility of applying those measures to neural networks, and b) the value of such an endeavour. I identify important sub-questions and bottlenecks that refine the research agenda, and outline what further progress on quantifying emergence in AI (with decision-making relevance w. r. t. their capabilities) would entail.
18:00 GMT [10:00 San Francisco, 19:00 London] – What I’ve learned as a PIBBSS fellow, and what I plan to do with it
Shaun Raviv
Starting from scratch on AI safety as a journalist with a non-technical background.
19:00 GMT [11:00 San Francisco, 20:00 London] – Searching for indicators of phenomenal consciousness in LLMs: Metacognition & higher-order theory
Euan McLean
It’s time to start the hard grind of testing LMs for indicators of phenomenal consciousness under all the different popular (computational functionalist) theories of mind. We might as well start with higher-order theory (HOT). Some versions of HOT imply that to have phenomenal consciousness, a thing must have the ability to distinguish reliable internal/mental states from noise. Can language models do this?
20:00 GMT — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
FRIDAY (13/09)
17:00 GMT [09:00 San Francisco, 18:00 London] – Dynamics of LLM beliefs during chain-of-thought reasoning
Baram Sosis
Chain-of-thought and other forms of scaffolding can significantly enhance the capabilities of language models, making a better understanding of how they work an important goal for safety research. Benchmark accuracy can measure how much scaffolding improves performance, but this does not give us insight into how the LLMs arrive at their answers. I will present the results of my work studying how the internal beliefs of LLMs of different sizes evolve over the course of chain-of-thought reasoning as measured by linear probes.
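A schematic of the probing setup on synthetic data (the actual experiments probe real model activations; the "belief direction", layer sizes, and the way the signal strengthens with each step are assumptions made purely to show the mechanics):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_examples, n_steps, d_model = 2000, 6, 64

# Synthetic stand-in: hidden_states[i, t] plays the role of the model's residual-stream state
# on example i after reasoning step t, and labels[i] the correct answer.  The label is injected
# along a fixed direction with growing strength, mimicking a belief that sharpens during CoT.
labels = rng.integers(0, 2, n_examples)
direction = rng.normal(size=d_model)
hidden_states = rng.normal(size=(n_examples, n_steps, d_model))
for t in range(n_steps):
    hidden_states[:, t] += (0.05 * t) * np.outer(2 * labels - 1, direction)

train, test = slice(0, 1500), slice(1500, None)
for t in range(n_steps):
    probe = LogisticRegression(max_iter=1000).fit(hidden_states[train, t], labels[train])
    print(f"step {t}: probe accuracy = {probe.score(hidden_states[test, t], labels[test]):.2f}")
```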
18:00 GMT [10:00 San Francisco, 19:00 London] – Cultural Evolution of Cooperation in LLMs
Aron Vallinder
Cultural evolution has been a critical driver of the scope and sophistication of cooperation in humans, and may plausibly play a similar role in interactions among advanced AI systems. In this talk, I present results from a simulation study in which LLM agents are assigned Big-5 personalities, generate a strategy, and then play an indirect reciprocity game with cultural transmission of strategy. These results indicate that cultural transmission can indeed drive increases in cooperative behavior, although there is large variation across models. Moreover, Big-5 personality plays a substantial role in determining the level of cooperation.
19:00 GMT [11:00 San Francisco, 20:00 London] – The geometry of in-context learning
Jan Bauer
Reasoning is a deductive process: a conclusion is reached through a series of intermediate states. Recent work has shown that in-context learning capabilities of sequence models can accommodate such a stateful process, termed Chain-of-Thought (CoT). While CoT empirically often leads to superior capabilities, it is unclear what facilitates it on a neural level. For such a setting where neural activations depend on each other, the study of the biological brain has developed normative analyses that explain global structure through geometric manifolds. Here we adopt this framework in a simple example that suggests that CoT-like sequence completion is implemented as extrapolation on a manifold whose geometry has been shaped during in-weight learning. Overall, we argue that the global structure of features can be normatively interpreted by studying the neural geometry, and thereby complement descriptive analyses that focus on local features in isolation.
20:00 GMT — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
Symposium ’23
PIBBSS presents:
Summer Symposium 2023
The Symposium ‘23 has finished! You can find materials from the event below, and you can find recordings of some talks on our YouTube page playlist.
The symposium took place online over several days in the week of September 18th.
Find a program overview here. Find the full agenda (including brief descriptions for each talk) below by toggling between the days.
Learn more about the event here.
Agenda
18:00 — Auto-Intentional Agency and AI Risk
Giles Howdle
Abstract: The dynamic we identify, ‘auto-intentional agency’, is found in systems which create abstract explanations of their own behaviour — in other words, systems that apply the intentional stance (or something functionally similar) to themselves. We argue that auto-intentional agents acquire planning capacities distinct from those of goal-directed systems which, while amenable to being understood via the intentional stance, are not fruitfully understood as applying the intentional stance to themselves. We unpack this notion of auto-intentional agency with reference to hierarchically deep self-models in the active inference framework. We also show how auto-intentional agency dovetails with insights from the philosophy of action and moral psychology. We then show the implications of this distinct form of agency for AI safety. In particular, we argue that auto-intentional agents, in modelling themselves as temporally extended, are likely to have more sophisticated planning capacities and to be more prone to explicit self-preservation and power-seeking than other artificially intelligent systems.
18:30 — Allostasis emergence of auto-intentional agency
George Deane
Abstract: This talk argues that ‘thick’ agency and conscious experience are likely to be co-emergent as systems come to form abstractions of themselves and their own control that they can use as the basis of action selection. I consider this argument by looking at minimal biological systems, and then consider the possibility of artificial agency and consciousness through this perspective.
19:00 — (Basal) Memory and the Cognitive Light Cone of Artificial Systems
Urte Laukaityte
Abstract: What’s memory got to do with AI? On the face of it – not much, perhaps. However, memory is increasingly conceptualised as crucial in enabling future planning and prediction in biological organisms – capacities which, in artificial systems, worry the AI safety community. One way to flesh out the relationship is in terms of the notion of a cognitive light cone. Specifically, it references the set of “things [the] system can possibly care about” (Levin, 2019). The boundary of this light cone is hence “the most distant (in time and space) set of events” of significance to the system in question – with spatial and temporal distances on the horizontal and vertical axes, respectively. Drawing on the basal cognition framework more broadly, this talk will explore the implications that the cognitive light cone idea, if taken seriously, might have vis-à-vis restricting the space of possible AIs.
19:30 — Searching For a Science of Abstraction
Aysja Johnson
Abstract: I suspect it tends to be easier to achieve goals insofar as you’re better able to exploit abstractions in your environment. To take a simple example: it’s easier to play chess if you have access to an online chess program, rather than just assembly code. But what is it about abstractions, exactly, that makes learning some of them far more useful for helping you achieve goals than others? I’m trying to figure out how to answer this and similar questions, in the hope that they may help us better characterize phenomena like agency and intelligence.
20:00 – Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
WEDNESDAY (20/09)
18:00 — Agent, behave! Learning and Sustaining Social Norms as Normative Equilibria.
Ninell Oldenburg
Abstract: Learning social norms is an inherent, cooperative feature of human societies. How can we build learning agents that do the same, so they cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there is a shared set of rules that most others comply with and enforce while pursuing their individual interests. By assuming shared rules, a newly introduced agent can infer the rules that are practiced by an existing population from observations of compliance and violation. Furthermore, groups of agents can converge to a shared set of rules, even if they initially diverge in their beliefs about the rules. This, in turn, enables the stability of a shared rule system: since agents can bootstrap common knowledge of the rules, this gives them a reason to trust that rules will continue being practiced by others, and hence an incentive to participate in sustaining this normative infrastructure. We demonstrate this in a multi-agent environment which allows for a range of cooperative institutions, including property norms and compensation for pro-social labour.
18:30 — Detecting emergent capabilities in multi-agent AI Systems
Matthew Lutz
Abstract: After summarizing my research on army ants as a model system of collective misalignment, I will discuss potential alignment failure modes that may arise from the emergence of analogous collective capabilities in AI systems (predatory or otherwise). I will present initial results from my research at PIBBSS – designing and conducting experiments to predict and detect emergent capabilities in LLM collectives, informed by social insects, swarm robotics, and collective decision making. I am interested in what fundamentally novel capabilities might emerge in LLM collectives vs. single instances of more powerful models (if any), and how we might design experiments to safely explore this possibility space.
19:00 — An overview of AI misuse risks (and what to do about them)
Sammy Martin
Abstract: One of the most significant reasons why Transformative AI might be dangerous is its potential to enable powerful weapons that are potentially cheap and easy to proliferate. This talk aims to provide a structured framework for understanding the misuse risks associated with AI-enabled weaponry across three timelines: near-term, intermediate, and long-term. We will explore the gamut of threats, from AI-driven cyberattacks and bioweapons to advanced lethal autonomous weapons and nanotech. I will then explain how an understanding of the dangers we face can inform the overall governance strategies required to mitigate these risks, and outline five general categories of solutions to these problems.
19:30 — Tort law as a tool for mitigating catastrophic risk from artificial intelligence
Gabriel Weil
Abstract: Building and deploying potentially dangerous AI systems generates risk externalities that the tort liability system should seek to internalize. I address several practical and doctrinal barriers to fully aligning AI companies’ incentives in order to induce a socially optimal investment in AI safety measures. The single largest barrier is that, under plausible assumptions, most of the expected harm from AI systems comes in truly catastrophic scenarios where the harm would be practically non-compensable. To address this, I propose a form of punitive damages designed to pull forward this expected liability into cases of practically compensable harm. To succeed in internalizing the full risk of legally compensable harm, such punitive damages would need to be available even in the absence of human malice or recklessness. Another key doctrinal change considered is recognizing the training and deployment of advanced AI systems as an abnormally dangerous activity subject to strict liability.
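A back-of-the-envelope illustration of the "pulling forward" idea (the numbers and the simple uplift formula are assumptions for exposition, not the paper's proposal or calibration): scale the damages awarded in ordinary, compensable cases so that expected liability matches expected total harm, including the practically non-compensable tail.

```python
# Illustrative figures only (all four inputs are assumptions).
p_compensable = 1e-3        # chance, per deployment, of an ordinary compensable harm
harm_compensable = 50e6     # $50M of ordinary harm when that happens
p_catastrophe = 1e-6        # chance of a practically non-compensable catastrophe
harm_catastrophe = 10e12    # $10T-equivalent harm in that scenario

# Expected harm that ordinary compensatory damages would never internalize, pulled forward
# into the cases that do reach court as a punitive uplift.
expected_uncompensated = p_catastrophe * harm_catastrophe
punitive_uplift = expected_uncompensated / p_compensable
award = harm_compensable + punitive_uplift

print(f"punitive uplift per compensable case: ${punitive_uplift / 1e9:,.1f}B")
print(f"total award needed to internalize the risk: ${award / 1e9:,.2f}B")
```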
20:00 — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
THURSDAY (21/09)
18:00 — The role of model degeneracy in the dynamics of SGD
Guillaume Corlouer
Abstract: Stochastic Gradient Descent (SGD) is a fundamental algorithm enabling deep neural networks to learn. We lack a comprehensive understanding of how the training dynamics governs the selection of specific internal representations within these networks, which consequently hampers their interpretability. In the Bayesian paradigm, Singular Learning Theory (SLT) shows that the accumulation of the posterior in a learning machine with a degenerate statistical model – i.e. a singular model – can be influenced by the degree of degeneracy in such models. However, it remains unclear how SLT predictions translate to the deep learning paradigm. In this talk I will present research in progress about the potential of SLT to help elucidate the dynamics of SGD. The talk will begin with a review of SLT and its relevance to AI safety. Subsequently, ongoing experiments looking at the dynamics of SGD on singular toy models will be discussed. Preliminary observations suggest that the degeneracy of statistical models influences the convergence of SGD.
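A minimal example of the kind of singular toy model the abstract mentions (illustrative, not the speaker's experiments): the two-parameter model f(x) = a·b·x has the degenerate zero-loss set {ab = 0}, the union of the two coordinate axes, and plain SGD on noisy data from the zero function settles onto that set. The learning rate, noise level, and step count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy singular model f_{a,b}(x) = a*b*x fit to data generated by the zero function plus noise.
a, b = rng.normal(), rng.normal()
lr, steps = 1e-2, 20_000
for _ in range(steps):
    x = rng.normal()
    y = 0.1 * rng.normal()
    err = a * b * x - y
    a, b = a - lr * err * b * x, b - lr * err * a * x   # SGD on the squared error

print(f"final parameters: a = {a:.3f}, b = {b:.3f}, ab = {a * b:.4f}")  # ab ends up near 0
```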
18:30 — A Geometry Viewpoint for Interpretability
Nishal Mainali
Abstract: Computational Neuroscience has fruitfully posited that computations and representations in the brains of behaving animals can be understood in terms of the geometric features of neural population activity. This has led to a shift from circuit search to understanding population geometry directly in order to understand and theorize about neural systems. Can this viewpoint be usefully imported into interpretability? I’ll present some simple initial findings that show geometric regularities in toy LLMs. These regularities can be understood both as non-behavioral measures that might identify model capabilities and as empirical findings in the search for a theory. I’ll end with a brief sketch of a further research program this viewpoint suggests.
19:00 — An Overview of Problems in the Study of Language Model Behavior
Eleni Angelou
Abstract: There are at least two distinct ways to approach Language Model (LM) cognition. The first is the equivalent of behavioral psychology for LMs and the second is the equivalent of neuroscience for human brains, i.e., interpretability. I focus on the behavioral study of LMs and discuss some key problems that are observed in attempts to interpret LM outputs. A broader question about studying the behavior of models concerns the potential contribution to solving the AI alignment problem. While it is unclear to what extent LM behavior is indicative of the internal workings of a system, and consequently, of the degree of danger a model may pose, it surely seems that further work in LM behavioral psychology would at least provide some tools for evaluating novel behaviors and informing governance regimes.
19:30 — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
FRIDAY (22/09)
18:00 — Beyond vNM: Self-modification and Reflective Stability
Cecilia Wood
Abstract: Any tool we use to increase AI safety should be robust to self-modification. I present a formalism of an agent able to self-modify and argue this captures a wide range of goals or preferences beyond standard von Neumann-Morgenstern utility. In particular, I address mild or soft optimization approaches, such as quantilizers or maximising the worst case over credal sets, by applying existing preference axiomatisations from economic theory which are more general than the axiomatisation given in the von Neumann-Morgenstern utility theorem.
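For readers unfamiliar with the mild-optimization targets named above, here is a minimal q-quantilizer sketch (a standard construction, not the talk's formalism; the action set, utilities, and base policy are arbitrary): rather than taking the argmax of utility, sample from a trusted base distribution restricted to its top-q utility mass.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantilize(actions, utility, base_probs, q=0.1):
    """Sample from the base distribution restricted to its top-q utility quantile."""
    order = np.argsort(utility)[::-1]            # best actions first
    cum = np.cumsum(base_probs[order])
    keep = order[cum <= q + 1e-12]               # top q of the base probability mass
    if keep.size == 0:                           # q smaller than any single action's mass
        keep = order[:1]
    p = base_probs[keep] / base_probs[keep].sum()
    return int(rng.choice(actions[keep], p=p))

actions = np.arange(10)
utility = actions.astype(float)                  # utility increases with the action index
base = np.full(10, 0.1)                          # uniform trusted base policy
picks = sorted({quantilize(actions, utility, base, q=0.3) for _ in range(200)})
print("argmax would always pick 9; the 0.3-quantilizer picks among", picks)
```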
18:30 — Constructing Logically Updateless Decision Theory
Martín Soto
Abstract: Both CDT and EDT, the two most prominent decision theories, update on all the information they are provided. As a consequence, they present apparently irrational behavior in Parfit’s Hitchhiker and Counterfactual Mugging. These are particular examples of the general phenomenon of dynamic instability: different instantiations of the agent (in different time positions or counterfactual realities) aren’t able to cooperate. Updateless Decision Theory aims to solve dynamic instability completely, by having a single agent-moment decide all future policy. This is straightforward for the case of empirical uncertainty, where we can assume logical omniscience. But for logical uncertainty, we face more complicated tradeoffs, and in fact not even the correct formalization is clear. We propose a formalization using Garrabrant’s Logical Inductors, develop desiderata for UDT, and present an algorithm satisfying most of them. We also explore fundamental impossibilities for certain dynamically stable agents, and sketch ways forward by relaxing dynamic stability.
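The standard Counterfactual Mugging numbers make the dynamic-instability point concrete (these are the conventional illustrative payoffs, not figures from the talk): an agent that updates after seeing tails refuses to pay, is therefore predicted not to pay, and forgoes the heads-branch reward, while a policy fixed before the coin flip does better in expectation.

```python
# Counterfactual Mugging with the conventional payoffs: a fair coin is flipped; on tails you are
# asked to pay $100, and on heads you receive $10,000 only if you would have paid on tails.
p_heads = 0.5
ev_commit_to_pay = p_heads * 10_000 + (1 - p_heads) * (-100)   # policy chosen before the flip
ev_refuse_after_update = 0.0                                   # updating agent refuses on tails, gets nothing on heads

print("EV of committing to pay:      ", ev_commit_to_pay)      # 4950.0
print("EV of refusing after updating:", ev_refuse_after_update)
```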
19:00 — A Mathematical Model of Deceptive Policy Optimization
Tom Ringstrom
Abstract: A pressing concern in AI alignment is the possibility of agents intentionally deceiving other agents for their own gain. However, little is known about how this could be achieved in a rapid, deliberative, and real-time manner within the simpler domain of Markov decision processes. I will outline the representations and computations that might be necessary for an agent to intentionally deceive an observer in the domain of discrete-state planning. I will close by formalizing how we might be able to compute the probability that an agent is being intentionally deceptive.
19:30 — Causal approaches to agency and directedness
Brady Pelkey
Abstract: Within AI safety, one way to describe intentional directedness and adaptive behavior comes from the ‘causal incentives’ framework. I’ll discuss some connections this has with work on causal dynamical systems and equilibration. I’ll also explore where these tools might fit into broader debates in philosophy of biology.
20:00 — Breakout room with each presenter
Join speaker-specific breakout rooms if you have more questions or want to continue the discussion.
Become a PIBBSS Research Affiliate!
TLDR: PIBBSS is hiring research affiliates.
PIBBSS is a research initiative facilitating work that draws on the parallels between intelligent behavior in natural and artificial systems, and leveraging these insights towards making AI systems safe, beneficial and aligned.
We want to support excellent researchers pursuing “PIBBSS-style” AI alignment research by providing them with tailored longer-term support: a full-time salary, a lively research community and operational support. The initial commitment is 6 months, with potential extensions to 12 or more.
Applications are currently closed. The next round of applications is due to start in Spring 2024.
At PIBBSS (“Principles of Intelligent Behavior in Biological and Social Systems”), our mission is to facilitate AI alignment research that draws on the parallels between intelligent behavior in natural and artificial systems. (For more details, see the section “About PIBBSS and our research priorities” below.)
Since PIBBSS’ inception in 2022, we have run two iterations of our 3-month research fellowships, as well as a reading group, an ongoing speaker series, and a number of research retreats exploring topics in line with PIBBSS’ research interests. Our alumni have gone on to pursue AI alignment research at places like OpenAI, Anthropic, the Alignment of Complex Systems research group, and academia, as well as through independent funding.
While we continue to be excited about the value of those activities, we also believe that there is more value to be had! In particular, we see potential in a program geared towards supporting longer-term, more sustained research efforts in line with PIBBSS’ research bet.
As such, we are delighted to launch our new PIBBSS Affiliate Program.
Goals and key features of the Affiliate Program
At its core, our goal in running this program is to counterfactually help produce substantial, high-quality and legible research progress on problems in AI risk, alignment and governance. The affiliate program aims to do this by supporting promising individual researchers in a highly tailored fashion to scope, develop, and progress on their personal research agendas.
Concrete success scenarios might look like (but are not limited to) affiliates publishing insightful research results; joining the efforts of existing research groups or founding new ones; identifying, scoping, testing and building traction on novel or neglected research avenues; maturing their personal research agendas and securing independent funding; etc.
The PIBBSS Affiliate Program has the following key features:
Affiliates receive a salary of 5,000 USD/month/FTE. We are open to paying higher salaries to exceptional candidates.
We are, by default, looking for affiliates who will be joining the program full-time. That said, we are open to adapting to affiliates’ circumstances where this seems appropriate, such as joining the program in a part-time fashion. We also have the capacity to offer support to people who do not want to leave their current research position.
Affiliates benefit from a range of support structures aimed at helping them focus on and advance in their research endeavors, such as:
Quarterly research retreats bringing together affiliates as well as other alignment researchers from our extended network
Personalized research and strategic support, typically in the form of bi-weekly calls with your point person (with the option to adapt the frequency or nature of support to the affiliate’s needs and preferences)
Access to the PIBBSS research community and extended network, including joining our slack space, tailored introductions, opportunities to participate in workshops, etc.
Financial and administrative support in order to e.g. visit relevant research conferences, get access to base models and compute where required for research, etc.
Access to office space depending on the needs/preferences of affiliates.
Note: A key feature of the program is to provide highly tailored support to our affiliates, according to what counterfactually helps them most make progress on their research. As such, the above list, while representative, is not necessarily fully precise or comprehensive.
The initial duration of the affiliateship is 6 months, at which point we review affiliates for a possible extension to 12 months. In exceptional circumstances, we will make initial offers of 12 months.
Our vision is for the affiliate program to provide a serious long-term home for promising researchers pursuing PIBBSS-style AI alignment research at time scales of 1 year and more, including the possibility of permanent positions.
We want to build on and strengthen the supportive, curious and dedicated culture that PIBBSS has built over the last two years. We care about good epistemic practices while navigating the intricacies of making sense of a complex world and particularly emphasize the value of epistemic pluralism in doing so.
At the current point in time, we orient to the affiliate program with an experimental lens – testing the hypothesis that this program can provide significant value and learning about how exactly we can support our affiliates most effectively. Depending on what we learn, we might scale up or scale down, streamline the program or make it more tailored to individual affiliates, aim to have affiliates for time-limited or for open-ended periods, or do other things which increase the impact of the program.
Requirements of the Affiliate Program
The affiliate program is a very open-ended program with limited requirements. Here is what we ask of our affiliates:
Submitting a quarterly report reflecting on your research progress and prospective research plan.
The reports are meant to help the affiliate by providing an accountability structure and encouraging them to zoom out and reflect on a regular basis. They also help us understand what affiliates have been working on and whether there is anything we can do to support them better going forward.
Participating in our ~quarterly research retreats
A genuine commitment to making progress on questions related to AI safety, alignment and governance, and constructive, truth-seeking and kind participation in our shared epistemic culture
What are we looking for in an ideal affiliate?
Affiliates are selected for being in a strong position to pursue promising PIBBSS-aligned research directions in AI alignment in a highly self-directed fashion. As such, we are looking for (a mix of) the following characteristics:
Scholarly competence/caliber in your domain(s) of expertise
Strong prior understanding of AI Risk & Alignment
Evidence of good research judgment/taste
Promisingness of your proposed line of research
Ability to counterfactually benefit from what PIBBSS can offer
Evidence for good research productivity, including strong self-management skills
Note that, in our experience, great PIBBSS-style researchers have often had non-standard backgrounds. As such, we wish to encourage individuals with varied backgrounds and perspectives to apply.
About PIBBSS and our research priorities
PIBBSS is a research initiative facilitating work that draws on the parallels between intelligent behavior in natural and artificial systems, and leveraging these insights towards making AI systems safe, beneficial and aligned.
In essence, we believe that the study of complex natural and social systems can help us better understand the risk involved in implementing such behaviors in artificial substrates, and improve our ability to develop AI systems such that they reliably display desired safety- and governability-relevant properties. (We describe the nature of our epistemic bet, and some of the background assumptions going into it, in more detail in this write-up, as well as in this talk.)
To provide more concrete pointers towards the type of research we’re excited about, here is a sample of the work that we have supported in the past:
Aiming to develop a principled understanding of the dynamics emerging from multi-agent interactions between AI systems, such as by drawing on the study of collective behavior in army ants or models from evolutionary biology and ecology
Exploring how to include understanding of multi-agent AI dynamics in developing model evaluations [forthcoming]
Developing a more realistic account of values and practical reasoning and exploring their implications for our understanding of the AI alignment problem [forthcoming]
Evaluating the potential and limits of existing legal tools for reducing catastrophic risks from AI
Looking at the philosophy of science (and the special sciences) to find ways to make research efforts more productive
Our fellows and collaborators have brought expertise from many domains to bear on the question of AI alignment, including but not limited to:
Ecology
Evolutionary and developmental biology
Systems biology
Biophysics
Neuroscience
Cognitive science
Linguistics
Legal theory
Political theory
Economic theory
Social theory
Complex systems studies
Cybernetics
Network science
Physics
Philosophy of Science
How to apply
Applications are now open. You can apply by following this link.
We accept new affiliates on a ~quarterly basis. To be considered as part of the next iteration, apply by November 5th.
We expect to hire 2-6 (median 4) affiliates over the coming 6 months.
We expect this round of affiliates to start between mid-December and mid-January.
About the application process: The details of the application process may vary between candidates, depending on which information we think will most help us make well-informed decisions. In most cases, the initial stage (application form) is followed by one or more interviews to discuss your research in more depth. We may also reach out to your references or, in some cases, ask you to submit a work task of ~2-5h.