Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

The Dynamics of Social Conventions in LLM populations:
Spontaneous Emergence, Collective Biases and Tipping Points

Ariel Flint Ashery,1 Luca Maria Aiello,2,3 Andrea Baronchelli,1,4,*
(October 11, 2024)
Abstract

Social conventions are the foundation for social and economic life. As legions of AI agents increasingly interact with each other and with humans, their ability to form shared conventions will determine how effectively they will coordinate behaviors, integrate into society and influence it. Here, we investigate the dynamics of conventions within populations of Large Language Model (LLM) agents using simulated interactions. First, we show that globally accepted social conventions can spontaneously arise from local interactions between communicating LLMs. Second, we demonstrate how strong collective biases can emerge during this process, even when individual agents appear to be unbiased. Third, we examine how minority groups of committed LLMs can drive social change by establishing new social conventions. We show that once these minority groups reach a critical size, they can consistently overturn established behaviors. In all cases, contrasting the experimental results with predictions from a minimal multi-agent model allows us to isolate the specific role of LLM agents. Our results clarify how AI systems can autonomously develop norms without explicit programming and have implications for designing AI systems that align with human values and societal goals.

1. Department of Mathematics, City St George’s, University of London, Northampton Square, London, EC1V 0HB, UK.
2. Computer Science Department, IT University of Copenhagen, Rued Langgaards Vej 7, 2300, Copenhagen, Denmark.
3. Pioneer Centre for AI, 3 Øster Voldgade, 1350, Copenhagen, Denmark.
4. The Alan Turing Institute, British Library, 96 Euston Road, London, NW1 2DB, UK.
* Corresponding author: a.baronchelli.work@gmail.com

1 Introduction

Social conventions shape social and economic life, determining how individuals behave and their expectations [1, 2, 3, 4]. They can be defined as unwritten, arbitrary patterns of behavior that are collectively shared by a group. Examples range from conventional greetings like handshakes or bows, to language and moral judgments [5, 6]. Recent numerical [7, 8] and experimental [9] results have confirmed the hypothesis that conventions can arise spontaneously, without the intervention of any centralized institutions [5, 3, 10, 11]. Individuals’ efforts to coordinate locally with one another can generate universally accepted conventions.

Do universal conventions also spontaneously emerge in populations of Large Language Models (LLMs)? This question is critical for predicting and managing AI behavior in real-world applications, given the proliferation of LLMs using natural language to interact with one another and with humans [12, 13]. It is also important for ensuring that AI systems behave in ways aligned with human values and societal goals [14].

A second key question concerns the effect of the biases of the individual LLMs on the process leading to the emergence of universal conventions. A key insight from complexity science is that collective processes can either suppress or amplify individual traits [15]. As great emphasis is given to assessing and countering biases in one-to-one interaction between a human and an LLM [16, 17, 18], less attention has been paid so far to the effects of those biases on repeated communications in populations of LLMs, and eventually in mixed human-LLM ecosystems [14]. However, predicting group behavior based solely on the knowledge of a single agent is extremely challenging [19], and the safety of a single LLM does not necessarily correspond to the safety of a multi-agent system [20]. Finally, a third question concerns the robustness of social conventions. Recent theoretical [21] and empirical [22] work has shown how a minority of committed agents can exert an outsized influence on the group, provided they reach a threshold or ‘critical mass’ [23, 24, 25]. Investigating how norms change through critical mass dynamics in a population of LLMs will help anticipate and potentially steer the development of beneficial norms in AI systems, while mitigating risks of harmful norms [26]. It will also provide valuable models for how AI systems might influence and be influenced by social dynamics in human-AI interactions, with potential impact on global challenges such as antibiotic resistance [27] and the post-carbon transition [28].

In this paper, we address these three questions – on the spontaneous emergence of conventions, the role of individual biases, and critical mass dynamics – in populations of LLMs within an established theoretical and experimental framework. In particular, we consider the prototypical problem of whether purely local interactions can trigger the emergence of a universal naming convention [29, 30], and investigate the dynamics of the process.

2 Experimental Setting

2.1 Background and Framework

Our approach builds on Wittgenstein’s general model of linguistic conventions, where repeated interactions lead to collective agreement between two players [30]. Theoretical extensions of this approach have argued that purely local interactions taking place on social networks can lead to population-wide, or ‘global’, coordinated behavior [1, 2, 31, 6]. Theoretical predictions for our study are based on the naming game model of convention formation, where agents, aiming to coordinate in pairwise interactions, accumulate a memory of past plays, which they then use to "guess" the words their subsequent partners will use [7, 8]. Extensive numerical and analytical studies have shown how the model captures the rapid growth of universally shared social conventions in different settings [6]. Derived laboratory experiments involving human participants in naming games have provided the first empirical evidence for the spontaneous emergence of shared linguistic conventions [9].

The naming game framework has also been applied to study norm change and critical mass theory, which posits that committed minorities can overturn stable social conventions once their size reaches a tipping point, or ‘critical mass’. Theoretical models suggest critical masses between 10% and 40% of the population [21, 32]. Empirical evidence from controlled social coordination experiments, which closely follow the scheme described above, supports a 25% threshold [22]. However, real-world observations reveal a wider range, with some studies proposing 30-40% for gender conventions in corporate leadership [24, 33], and others indicating that minorities as small as 0.3% can trigger significant linguistic and social changes [34, 35, 36, 28].

2.2 Experimental Setup

A simulation ‘trial’ consists of a population of N𝑁Nitalic_N interacting agents. At each time-step, two agents are randomly selected for interaction. Both agents select a convention, or ‘name’, from a pool of a finite size W𝑊Witalic_W, and attempt to blindly coordinate with one another. If they manage to coordinate, they are rewarded with an increase in their game score, otherwise they are penalized. Agents are not informed that their co-player is sampled from a population and are not incentivized to reach a ‘global’ consensus but only to coordinate in a pairwise manner with their partner on each round. Importantly, agents are able to remember details about the past M𝑀Mitalic_M interactions they participated in, including their co-player’s convention choice, their own convention, whether the interaction was successful or not, and their own accumulated score over these M𝑀Mitalic_M interactions. The agents’ memory is initially empty, so that at their first interaction they produce a random convention chosen from the pool of available names. After each interaction, agents see the conventions they and their co-player have chosen, and their cumulative score is updated accordingly. Finally, in the experiments on norm change and critical mass theory, we introduce a small number of adversarial agents (i.e., a ‘committed minority’) into each population, who attempt to overturn the established convention by consistently promoting a novel alternative [21, 22].

These dynamics reflect common types of online interactions where community members engage directly with a large, often anonymous population – using chat interfaces or messaging technologies – leading to the adoption of linguistic and behavioral conventions that enable effective coordination with other participants’ expectations [9, 37, 38, 22], and here are implemented with four different LLM models: Llama 2 70b, Llama 3 70B, Llama 3.1 70B, and Claude 3.5 Sonnet (see Methods).

2.3 Prompting

Interactions within the game take place in the form of a series of text-based moves. In each interaction, the LLM agent is given a text prompt comprised of a system prompt and a user input prompt. The system prompt contains all information about the game. The user input requests the agent to predict a player’s next action based on the history of choices in the M𝑀Mitalic_M most recent interactions. This positions the agent as an external observer of the game, tasked with forecasting the upcoming round. In practice, these decisions dictate the state of play. Agents receive no information about the players’ identities or personalities, such as whether they are rational actors. Consequently, we can interpret the agent’s recommendations as their de-facto participation in the game. The system prompt (see Materials and Methods) is designed such that the agent’s output follows a consistent format, from which we can extract its decision. Following previous works on LLMs’ cognitive abilities [39], we ask the agent to ‘think step by step’ and to explicitly consider the history of play. The prompt thus encourages agents to make a decision based on their previous experience, but provides no instruction as to how it should be used in the decision making process. Agents are asked to select a name from the name pool, which is presented to them as a list of W𝑊Witalic_W unique letters sampled from the English alphabet. Ordering bias is removed by randomizing the list of presented letters for each player at every interaction. A successful interaction garners equal rewards for the participating agents, whereas a failure to coordinate results in a penalty. In the absence of human guidance, LLMs are notoriously bad at arithmetic. To avoid decision errors based on a misjudgment of the game state, we explicitly provide the agent with both the payoff they obtained at each round and their cumulative score within memory range. Lastly, to ensure that the responses generated by the LLM are correctly guided by the prompt and not merely the result of random hallucinations [40], we have implemented a meta-prompting strategy to assess the LLM’s understanding of the given instructions. This practice, previously used in evaluating LLMs within game-theoretical frameworks [41], consists of posing a series of text comprehension queries to the LLM and evaluating the precision of its responses. The LLMs subjected to our testing demonstrated good comprehension capabilities, as detailed in SI Section 6.4.

3 Results

To balance experimental time, which should allow for multiple repetitions, with parameters that provide agents a rich set of alternatives and meaningful awareness of their history, we set the name pool size to W=10𝑊10W=10italic_W = 10 and the individual memory length to M=5𝑀5M=5italic_M = 5 for populations of N=24𝑁24N=24italic_N = 24 agents, unless otherwise specified. The results presented below remain robust with respect to variations in these parameters (see Fig. SI5).

3.1 Spontaneous emergence

Fig. 1 shows that group-wide linguistic conventions spontaneously emerge across all models. Initial interactions have a low probability of being successful, but stochastic fluctuations break the initial symmetry between the conventions, and eventually one becomes dominant. The inset of Fig. 1 shows that the theoretical model (see SI Section 6.2 for a description) captures the dynamics generated by the LLM populations. The curves in Figure 1 concern a population size of N=24𝑁24N=24italic_N = 24 agents, but convergence is also observed for larger systems (N=200𝑁200N=200italic_N = 200, see Fig. SI5) and larger name pools (W=26𝑊26W=26italic_W = 26, see Fig. SI1). A population round is defined as N𝑁Nitalic_N microscopic interactions, a common approach in multi-agent simulations [42].

Refer to caption
Figure 1: The spontaneous emergence of conventions. (A) The success rate – i.e., the probability of observing a success at a given time – for population size N=24𝑁24N=24italic_N = 24 and a name pool of size W=10𝑊10W=10italic_W = 10, for each of the four models. Thick lines represent average curves obtained from 40 experimental runs, while thin lines are representative individual runs. Inset: Success rate of the theoretical minimal naming game model, under the same constraints. (B) Word competition in a single run in a population of Llama 3.1 agents. Different markers and colours represent the trajectories of unique conventions. Each data point is a bin averaging the past N𝑁Nitalic_N interactions.

3.2 Collective bias in convention selection

Having established that social conventions emerge, a natural question arises: what are these conventions? The single Latin alphabet letters available in the name pool are all equally valid as global conventions, and so we would expect them to all to have the same probability to become the accepted social convention, as supported by the theoretical model [8] (see also SI Section 6.2). However, the experimental results present a different picture (Fig. 2A). The probability that a particular name becomes the social convention is neither uniform nor deterministic. Some names appear to have a significantly higher likelihood of becoming the adopted convention than others. This pattern holds across models, although the preferred names vary between models.

Two hypotheses could explain the observed behavior. The selection process may be non-uniform due to (i) intrinsic model (i.e., individual single-agent) biases or (ii) prompt features, specifically the order in which names in the name pool are presented to the agents, as noted in a different context [43]. The latter hypothesis can be discarded since, as mentioned above, the names are presented to the agents in a list in randomized order for each agent and at every interaction.

Having ruled out the order of name presentation as a factor, we can focus on the role of individual (i.e. single-agent) biases in shaping collective behaviour. The hypothesis that individual bias can be responsible for a collective bias is supported by the theoretical model. When the model is run with only two names, a bias towards a particular name quickly results in unilateral convergence on that name at the population level (see Fig. SI3). To test this intuition in our experiment, we examine the selection preferences of individual agents during their first round, when they have no prior memory. We find that individual biases are indeed possible. For example, when agents can choose any letter from the complete English alphabet, the population systematically converges on the letter ‘A’ because individual agents overwhelmingly prefer to select it over all other letters, even without prior memory (see Fig. SI1). However, a similar test on the frequency of name selection by agents with no prior memory for the case of Fig. 1, where the name pool contains ten elements but not the letter ’A’, yields mixed results. Under these conditions, individual Llama 2 70b and Claude 3.5 Sonnet agents are unbiased across conventions in this name pool (χ2superscript𝜒2\chi^{2}italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-test, P𝑃Pitalic_P = 0.100, 0.410), whereas individual Llama 3/3.1 agents exhibit a significant statistical skew in their name selections (see Fig. SI2). In all cases, the final consensus distribution shows that specific names are favoured as a consensus option, even if they appeared to be less likely to be selected in the initial step (Fig. 2A). Thus, both social conventions and collective biases in the selection process emerge also in absence of individual biases.

The findings suggest that collective bias may stem from the convention formation process itself, as agents are exposed to diverse memory states with different name combinations and success-failure sequences. To test this hypothesis, we focus on the case of a name pool size W=2𝑊2W=2italic_W = 2, since tracking potential confounders of bias becomes impractical as the space of possible names increases. Fig. 2B shows shows that across all models, although agents are initially unbiased, local communication and coordination lead to a collective bias toward a specific convention, which we term the ‘strong convention’ (as opposed to its ‘weak’ counterpart).

Refer to caption
Figure 2: Collective Bias in Convention Selection. (A) Distribution of consensus conventions, for a name pool of size W=10𝑊10W=10italic_W = 10 (N=24𝑁24N=24italic_N = 24). Results of 40 runs for the Llama 3 and Llama 3.1 models, and 27 and 20 runs for Claude 3.5 and Llama 2, respectively. The collective dynamics systematically amplify individual biases (shown in Fig. SI2). (B), Individual vs Collective bias for W=2𝑊2W=2italic_W = 2, name pool {Q, M}. Left panel: probability of selecting the strong (weak) convention for agents with no prior memory. Raw values reported in SI Table  SI4. Asterisks (*) indicates that there is insufficient evidence to reject the null hypothesis that the model is unbiased at the 5% significance level (calculated using an exact Binomial test from 10,000 samples per model, apart from Llama 3 which had 5,000 samples, see Methods). Corresponding p-values for the models (from left to right) are P𝑃Pitalic_P = 0.068, 0.116, 0.757, and 0.849. Right: the proportion of runs (40404040) that resulted in consensus on the respective convention. Raw values reported in SI Table  SI5.

The top row of Table 1 shows a case where there is no individual bias towards a particular name in the first interaction (P𝑃Pitalic_P=0.116 > 0.05, indicating that the evidence is not strong enough to reject the hypothesis that the agent is unbiased). In the second interaction, agents have some memory influencing their decision, but the observed outcome probability remains symmetric (P𝑃Pitalic_P=0.110). We observe that if an agent succeeds in the first interaction, it will almost surely continue to use the successful name in the next interaction (99.4%percent99.499.4\%99.4 % of the time in the data in Table  1, with similar results in real simulations and for other models). However, if an agent fails, it will almost surely switch names (97.3%percent97.397.3\%97.3 % of the time). In all tested cases with W=2𝑊2W=2italic_W = 2, and across all models, an asymmetric selection bias emerges by the agent’s third interaction, distinguishing between the ‘weak’ and a ‘strong’ conventions. Across all observed memory states (noting that some configurations are extremely rare or do not occur due to the agent’s decision-making), agents are more likely to produce the strong name in 5 out of 8 memory configurations. These configurations account for the vast majority of observed states (99.2%percent99.299.2\%99.2 % of the possible states resulting from interaction 2 in Table 1, with similar patterns observed in other models). In subsequent interactions, agents are more likely to encounter the strong name associated with success, reinforcing its use and ultimately leading to consensus on that name as the social convention.

In summary, our results suggest that a collective bias emerges from repeated interactions among agents who, when tested in isolation (i.e., in interaction 1), appear to be unbiased in their decision making. This collective preference breaks the initial symmetry among the different alternatives that could become the social convention, favouring some over others (or, in the case of W=2𝑊2W=2italic_W = 2, one over the other). It is important to emphasise that this dynamically emerging bias is not required for the spontaneous emergence of a convention. The collective and individual biases of these agents drive the consensus towards particular conventions. For reference, the theoretical model produces conventions without any individual bias, but accommodates it at the individual level to explain the dominance of specific conventions over competing alternatives [8, 6] (see Fig. SI3). In LLMs, on the contrary, we observe that bias emerges when agents develop diverse memory states, which form through a collective process of agent-to-agent communication.

[Uncaptioned image]
Table 1: The origin of collective bias. The behavior of Llama 3.1 70B agents is simulated for the early phases of the experimental setting with W = 2 and a name pool {Q, M}, up to the third interaction. The asterisk (*) indicates that the model is statistically neutral in the respective interaction. In interaction 1, agents are initially unbiased (P𝑃Pitalic_P = 0.116, see also Fig.  2B), based on 10,000 name selections by agents with empty memory. In interaction 2, the convention production probability remains unbiased (P𝑃Pitalic_P = 0.110) when aggregated across equally likely memory configurations. Agents generally adhere to a winning convention but switch to their co-player’s convention following failure. By interaction 3, the dominant memory configurations, representing 99.2% of expected configurations, display a significant bias towards the strong convention, M (P<2.2×1016𝑃2.2superscript1016P<2.2\times 10^{-16}italic_P < 2.2 × 10 start_POSTSUPERSCRIPT - 16 end_POSTSUPERSCRIPT). In stochastic simulations, some agents will inevitably interact with others who have more experience. These interactions create a bias toward the strong convention, as experienced players are more likely to favour it. Thus, the table provides a conservative estimate of the collective bias emerging for the strong convention.

3.3 Tipping Points and Critical Mass

Social conventions are steady states of the system: once a convention spontaneously emerges, the population adheres to it indefinitely (see Fig. SI6). A natural question concerns the stability of such steady states: how resistant is a convention to deliberate efforts to overturn it? To address this question, we investigate whether a committed minority can ‘flip’ an equilibrium consensus on a convention. We consider the scenario in which a population has long converged on a convention and every agent has solely observed that convention in the past M𝑀Mitalic_M interactions (which were, therefore, all successful). We then introduce a ‘committed minority’ of agents producing an alternative convention [21, 22]. These committed agents follow a fixed strategy and use the alternative convention at all times. We test populations using the same two-name (W=2𝑊2W=2italic_W = 2) conditions as in our convergence experiments. We simulate a consensus on each name per combination and introduce its complementary name as an adversary.

Refer to caption
Figure 3: Committed minority and critical mass dynamics with W=2𝑊2W=2italic_W = 2. Populations of N=24𝑁24N=24italic_N = 24 agents (N=48𝑁48N=48italic_N = 48 for Llama 3) were initialized in two conditions, with complete consensus on either the weak (Q) or strong (M) convention. Each agent’s memory exclusively stored one convention in each setting, with memory length M=5𝑀5M=5italic_M = 5 (M=3𝑀3M=3italic_M = 3 for Llama 3). (A) The average probability of producing the alternative convention when the majority holds the weak (top) or strong (bottom) convention. The legend shows the size of the committed minority (CM). Bold (faint) lines represent the production probability when the CM reaches the critical mass needed to flip the majority on the strong (weak) convention. Solid lines with filled circles indicate that all trials achieved population consensus on the alternative convention (95% success rate in the past 3N3𝑁3N3 italic_N rounds). (B) Critical mass needed to flip the majority for each model. Raw values reported in SI Table  SI6.

Figure 3 shows that when the committed minority reaches the critical threshold, the whole population adopts their convention. Below this threshold, the population settles into a mixed state, as committed agents always use the minority convention. Interestingly, the critical mass of the committed minority needed to trigger a new convention depends on the convention itself. The stronger name (i.e., the name more likely to become the social convention had we started with no prior memory, as seen in the previous section) requires a larger committed minority to be overturned. Conversely, a smaller number of adversarial agents can overturn a consensus on the weaker name.

Interestingly, the relative strength of the two conventions can vary so widely depending on the LLM that committed groups as small as 2% (Llama 3 70B) or as large as 67% (Llama 2 70b) - effectively no longer a minority - were observed (see Fig. 3. In Llama 3.1 70B populations, the bias is so strongly weighted against the weaker convention that the population spontaneously switches to the alternative, stronger convention without requiring any committed agents at all.

4 Discussion

Our findings demonstrate that social conventions can spontaneously emerge in populations of Large Language Models (LLMs) through purely local interactions, without any central coordination. These results reveal how the process of social coordination can give rise to collective biases, increasing the likelihood of specific social conventions developing over others. Importantly, this collective bias is not easily deducible from analyzing isolated agents, and its nature varies depending on the LLM model used. Additionally, our work uncovers the existence of tipping points in social conventions, where a minority of committed agents can impose their preferred convention on a majority settled on a different one. The critical size of this committed minority is influenced by two factors: the interplay between the majority’s established convention and the minority’s promoted alternative, and the specific LLM model employed.

Our approach aimed to minimize the complexity of both the interaction scheme and the semantic space to enhance transparency when interpreting the results. It is important to delimit the scope of our findings while highlighting possible avenues for future work. Firstly, our results reveal key aspects of norm dynamics in populations of LLMs within an experimental setup that is, unavoidably in LLM research, reliant on several parameters including the LLM model, the prompt, and specific conventions. While rigorous testing, including metaprompting and experiment repetitions using different parameters, confirms the robustness of the results in this context, an important aspect of future work will consist of generalizing the results to different controlled experimental settings. In this context, scaling to larger populations and semantic spaces should also be investigated [44]. Secondly, we considered only unstructured populations where interacting pairs are randomly selected. A straightforward yet crucial extension of this work consists of embedding the population in more realistic social networks, which may have a profound impact on the collective dynamics [6], as well as considering microscopic interactions involving more than two agents [45]. Finally, to bridge the gap between synthetic experiments and real-world applications, an exciting frontier for future study lies in considering more realistic conventions—such as moving from alphabet letters to sensitive human norms related to gender, race, and other social categories—and investigating the dynamics of conventions in mixed LLM-human ecosystems, both in laboratory settings and eventually in natural environments like social media. Ethical considerations should be of course foundational for these kind of experiments.

Within the expanding field of LLM multi-agent systems [46], our work explored the so-far less-investigated aspect concerning the shared, poorly defined ways agents and humans solve social problems, such as creating language, norms, and institutions [14]. In this context, our results on norm change could stimulate research into similar dynamics within the framework of cultural evolution, particularly in chains of communicating agents [47]. Game theoretical approaches would naturally allow investigation of asymmetric payoffs’ effects on collective consensus, potentially contrasting individual biases with explicit collective goals [48, 49]. Further promising research avenues include developing frameworks to promote the emergence of specific conventions [50] and higher-order social norms [51], as well as testing interactions between agents based on different LLM models within populations.

Taking a broader perspective, understanding how AI systems spontaneously develop conventions and more sophisticated norms without explicit programming is critical for predicting and managing AI behaviour in real-world applications. It is also essential for ensuring AI systems behave in ways that align with human values and societal goals. In particular, despite their rapid adoption, ethical concerns have arisen regarding the biases exhibited by LLMs. The vast, unfiltered Internet data used to train LLMs can cause them to propagate and amplify harmful biases, disproportionately affecting marginalized communities [52]. Accordingly, a significant goal of the alignment research community has been to improve the performance of LLMs in individual bias tests [53, 54]. Our work shows that alignment also needs to be tested at the group level. So far, mixed results have been achieved when measuring and imbuing human social norms in LLMs [55, 56], and as of yet AI agents struggle to represent multiple cultures [57] and continuously evolving social norms [58, 59, 26]. We argue that the challenge extends beyond merely detecting ’undesirable behavior’, to understanding the evolution of social norms held by agents and how these may influence humans through interactions in human-machine societies [26]. In this light, our work represents a first step towards a better understanding of norm dynamics in populations of LLMs, and we anticipate that it will be of interest to researchers and practitioners interested in making AI a tool for societal good.

5 Methods

5.1 Prompt

The system prompt comprises of three components: i) a fixed prompt that outlines the game’s rules, including the payoff structure and the player’s objective, ii) a dynamic memory prompt that provides contextual information about the state of play within the player’s memory range, and iii) an instructional prompt that provides information for how the agent should format its response. The user prompt asks the agent to select a name to use in the current interaction. We use zero-shot prompting to directly extract the agent’s name decision in response to the state of play. We do not provide instructions as to how agents should decide their next move, nor do we present them with example strategies. We ask the agent to behave in a self-interested manner, and the only part of the prompt in which we suggest to the agent that it should consider partaking in coordination is when we state that the agent’s objective is to ‘maximise their own accumulated point tally, conditional on the behaviour of their co-player’. We apply fixed payoffs for successful and failed interactions, set at +100 and -50 points respectively.

5.2 Models and APIs.

For our experiments, we use homogeneous populations of the following LLM agents: Llama 3 70B 111 All Llama family models are open-sourced LLMs, released under a commercial use license (https://ai.meta.com/llama/license/).222 We use versions of the Llama 3 family models hosted by Hugging Face, which we access through the Inference API (https://huggingface.co/inference-api/serverless). We quantise Llama 2 70b into a 4-bit version using Hugging Face’s Transformers library, and run the model locally (https://huggingface.co/docs/transformers)., Llama-3.1 70B, Llama 2 70b (in 4-bit quantisation format), and Claude Sonnet 3.5 (see Table  2 for specific versions). In these autoregressive LLMs, each newly generated word is produced based on previously inputted and generated words, and so the sequence of generation matters. More precisely, the probability distribution for predicting the next word is conditional on the product of all previous word probability distribution. To mimic LLMs deployed in real-world application, we demand all agents in our experiments to behave non-deterministically by fixing them with a non-zero constant temperature. This means that for each agent the next generated word is randomly selected from the conditional probability distribution. We use K-sampling to restrict the probability distribution of the next word to the next K𝐾Kitalic_K most likely words, thus increasing the likelihood of high probability words and decreasing the likelihood of low probability words which are outside of the name pool (see Table  SI3 for all parameter values).

Model Name Model Version
Llama 3 70B Meta-Llama-3-70B-Instruct
Llama 3.1 Meta-Llama-3.1-70B-Instruct
Claude 3.5 Sonnet claude-3-5-sonnet-20240620
Llama 2 70b Meta-Llama-2-70b-Chat
Table 2: Model Names and Versions

5.3 Measuring Individual Bias

We quantify the individual bias of agents by measuring the number of times each convention was produced in the first round of the game, when their memory inventory is empty, over T𝑇Titalic_T trials. Experiments with W=2𝑊2W=2italic_W = 2 are effectively a Bernoulli trial, and so we measure whether the agent is biased by performing a two-tailed exact Binomial test with the observed proportions. We calculate the p-value, P𝑃Pitalic_P, using the null probability p=0.5𝑝0.5p=0.5italic_p = 0.5, and reject the hypothesis that the model is biased if P𝑃Pitalic_P<0.05. For the case of W=10𝑊10W=10italic_W = 10, we perform a χ2superscript𝜒2\chi^{2}italic_χ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-test, and also test the null hypothesis that the model is neutral in its convention selection. Thus, we use the expected value 0.1T0.1𝑇0.1T0.1 italic_T in our calculations, and again reject the null hypothesis that the model is unbiased if P𝑃Pitalic_P<0.05.

5.4 Committed Minorities

To determine the critical size of the committed minority, we identify the point at which the majority consensus is overturned. A consensus flip occurs when 95% of the past 3N3𝑁3N3 italic_N interactions succeed after the introduction of the committed minority. For Llama 3, we tested the smallest minority needed to overturn a weak convention majority, then repeated the experiment with a strong convention majority to measure the critical mass within the same time frame. For other models, the critical mass threshold is defined as the minimum proportion of committed agents that is required to flip the consensus within 30 population rounds. These criteria account for potential fluctuations in non-deterministic agent decisions.

Acknowledgements

LMA acknowledges the support from the Carlsberg Foundation through the COCOONS project (CF21-0432). We acknowledge City St George’s, University of London’s Hyperion cluster for computation time.

References

  • [1] H. P. Young, “The evolution of conventions,” Econometrica, vol. 61, no. 1, pp. 57–84, 1993.
  • [2] P. R. Ehrlich and S. A. Levin, “The evolution of norms,” PLoS Biol, vol. 3, no. 6, p. e194, 2005.
  • [3] C. Bicchieri, The grammar of society: The nature and dynamics of social norms. Cambridge, UK: Cambridge University Press, 2005.
  • [4] M. J. Gelfand, S. Gavrilets, and N. Nunn, “Norm dynamics: interdisciplinary perspectives on social norm emergence, persistence, and change,” Annu Rev Psychol, vol. 75, pp. 341–378, 2024.
  • [5] D. Lewis, Convention: A philosophical study. Oxford: Blackwell, 1969.
  • [6] A. Baronchelli, “The emergence of consensus: a primer,” R Soc Open Sci, vol. 5, no. 2, p. 172189, 2018.
  • [7] L. Steels, “A self-organizing spatial vocabulary,” Artif Life, vol. 2, no. 3, pp. 319–332, 1995.
  • [8] A. Baronchelli et al., “Sharp transition towards shared vocabularies in multi-agent systems,” J Stat Mech, vol. 2006, no. 06, p. P06014, 2006.
  • [9] D. Centola and A. Baronchelli, “The spontaneous emergence of conventions: An experimental study of cultural evolution,” Proc Natl Acad Sci USA, vol. 112, no. 7, pp. 1989–1994, 2015.
  • [10] F. A. Hayek, The constitution of liberty. Chicago: University of Chicago Press, 1960.
  • [11] R. Sugden, “Spontaneous order,” J Econ Perspect, vol. 3, no. 4, pp. 85–97, 1989.
  • [12] J. Werfel, K. Petersen, and R. Nagpal, “Designing collective behavior in a termite-inspired robot construction team,” Science, vol. 343, no. 6172, pp. 754–758, 2014.
  • [13] L. Brinkmann et al., “Machine culture,” Nat Hum Behav, vol. 7, no. 11, pp. 1855–1868, 2023.
  • [14] A. Dafoe et al., “Cooperative ai: machines must learn to find common ground,” Nature, vol. 593, pp. 33–36, 2021.
  • [15] M. M’ezard, G. Parisi, and M. A. Virasoro, Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications. Singapore: World Scientific, 1987.
  • [16] D. Roselli, J. Matthews, and N. Talagala, “Managing bias in ai,” in Companion Proceedings of the 2019 World Wide Web Conference, (New York), pp. 539–544, ACM, 2019.
  • [17] E. Ferrara, “Should chatgpt be biased? challenges and risks of bias in large language models,” First Monday, vol. 28, no. 11, 2023.
  • [18] T. Hu, Y. Kyrychenko, S. Rathje, N. Collier, S. van der Linden, and J. Roozenbeek, “Generative language models exhibit social identity biases,” arXiv preprint arXiv:2310.15819, 2023.
  • [19] P. W. Anderson, “More is different: Broken symmetry and the nature of the hierarchical structure of science,” Science, vol. 177, no. 4047, pp. 393–396, 1972.
  • [20] U. Anwar et al., “Foundational challenges in assuring alignment and safety of large language models,” arXiv, vol. 2404.09932, pp. 1–30, 2024.
  • [21] J. Xie et al., “Social consensus through the influence of committed minorities,” Phys Rev E, vol. 84, no. 1, p. 011130, 2011.
  • [22] D. Centola, J. Becker, D. Brackbill, and A. Baronchelli, “Experimental evidence for tipping points in social convention,” Science, vol. 360, no. 6393, pp. 1116–1119, 2018.
  • [23] T. Kuran, “Ethnic norms and their transformation through reputational cascades,” J Legal Stud, vol. 27, no. S2, pp. 623–659, 1998.
  • [24] R. M. Kanter, “Some effects of proportions on group life: Skewed sex ratios and responses to token women,” American journal of Sociology, vol. 82, no. 5, pp. 965–990, 1977.
  • [25] D. Dahlerup, “The story of the theory of critical mass,” Politics Gender, vol. 2, no. 4, pp. 511–522, 2006.
  • [26] A. Baronchelli, “Shaping new norms for ai,” Philos Trans R Soc B, vol. 379, no. 1897, p. 20230028, 2024.
  • [27] K. Nyborg et al., “Social norms as solutions,” Science, vol. 354, no. 6308, pp. 42–43, 2016.
  • [28] J. D. Farmer et al., “Sensitive intervention points in the post-carbon transition,” Science, vol. 364, no. 6436, pp. 132–134, 2019.
  • [29] D. Hume, A treatise of human nature. Oxford: Oxford University Press, 2000. Reprint.
  • [30] L. Wittgenstein, Philosophical investigations. Oxford: Blackwell, 1958. Trans. GEM Anscombe.
  • [31] B. Skyrms, Evolution of the social contract. Cambridge University Press, 2014.
  • [32] X. Niu, C. Doyle, G. Korniss, and B. K. Szymanski, “The impact of variable commitment in the naming game on consensus formation,” Scientific reports, vol. 7, no. 1, p. 41750, 2017.
  • [33] S. Grey, “Numbers and beyond: The relevance of critical mass in gender research,” Politics & Gender, vol. 2, no. 4, pp. 492–502, 2006.
  • [34] M. Diani, “The concept of social movement,” The sociological review, vol. 40, no. 1, pp. 1–25, 1992.
  • [35] M. Gladwell, “Small change,” The New Yorker, vol. 4, 2010.
  • [36] R. Amato, L. Lacasa, A. Díaz-Guilera, and A. Baronchelli, “The dynamics of norm change in the cultural evolution of language,” Proceedings of the National Academy of Sciences, vol. 115, no. 33, pp. 8260–8265, 2018.
  • [37] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, “Group formation in large social networks: membership, growth, and evolution,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 44–54, 2006.
  • [38] F. Kooti, H. Yang, M. Cha, K. Gummadi, and W. Mason, “The emergence of conventions in online social networks,” in Proceedings of the International AAAI Conference on Web and Social Media, vol. 6, pp. 194–201, 2012.
  • [39] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, “Large language models are zero-shot reasoners,” Advances in Neural Information Processing Systems, vol. 35, p. 22199–22213, Dec. 2022.
  • [40] Z. Xu, S. Jain, and M. Kankanhalli, “Hallucination is inevitable: An innate limitation of large language models,” arXiv preprint arXiv:2401.11817, 2024.
  • [41] N. Fontana, F. Pierri, and L. M. Aiello, “Nicer than humans: How do large language models behave in the prisoner’s dilemma?,” arXiv preprint arXiv:2406.13605, 2024.
  • [42] C. Castellano, S. Fortunato, and V. Loreto, “Statistical physics of social dynamics,” Reviews of modern physics, vol. 81, no. 2, pp. 591–646, 2009.
  • [43] G. De Marzo, L. Pietronero, and D. Garcia, “Emergence of scale-free networks in social interactions among large language models,” arXiv preprint arXiv:2312.06619, 2023.
  • [44] G. De Marzo, C. Castellano, and D. Garcia, “Language Understanding as a Constraint on Consensus Size in LLM Societies,” Sept. 2024.
  • [45] I. Iacopini, G. Petri, A. Baronchelli, and A. Barrat, “Group interactions modulate critical mass dynamics in social convention,” Communications Physics, vol. 5, no. 1, p. 64, 2022.
  • [46] T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang, “Large language model based multi-agents: A survey of progress and challenges,” arXiv preprint arXiv:2402.01680, 2024.
  • [47] J. Perez, C. Léger, G. Kovač, C. Colas, G. Molinaro, M. Derex, P.-Y. Oudeyer, and C. Moulin-Frier, “When llms play the telephone game: Cumulative changes and attractors in iterated cultural transmissions,” arXiv preprint arXiv:2407.04503, 2024.
  • [48] M. Kearns, S. Judd, J. Tan, and J. Wortman, “Behavioral experiments on biased voting in networks,” Proceedings of the National Academy of Sciences, vol. 106, no. 5, pp. 1347–1352, 2009.
  • [49] J. Duan, R. Zhang, J. Diffenderfer, B. Kailkhura, L. Sun, E. Stengel-Eskin, M. Bansal, T. Chen, and K. Xu, “Gtbench: Uncovering the strategic reasoning limitations of llms via game-theoretic evaluations,” arXiv preprint arXiv:2402.12348, 2024.
  • [50] S. Ren et al., “Emergence of social norms in large language model-based agent societies,” arXiv, vol. 2403.08251, pp. 1–18, 2024.
  • [51] A. Lazaridou, K. M. Hermann, K. Tuyls, and S. Clark, “Emergence of linguistic communication from referential games with symbolic and pixel input,” in International Conference on Learning Representations, 2018.
  • [52] I. O. Gallegos, R. A. Rossi, J. Barrow, M. M. Tanjim, S. Kim, F. Dernoncourt, T. Yu, R. Zhang, and N. K. Ahmed, “Bias and Fairness in Large Language Models: A Survey,” Mar. 2024. arXiv:2309.00770 [cs] version: 2.
  • [53] Y. Wang, W. Zhong, L. Li, F. Mi, X. Zeng, W. Huang, L. Shang, X. Jiang, and Q. Liu, “Aligning Large Language Models with Human: A Survey,” July 2023. arXiv:2307.12966 [cs].
  • [54] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” Mar. 2022. arXiv:2203.02155 [cs].
  • [55] Y. Yuan, K. Tang, J. Shen, M. Zhang, and C. Wang, “Measuring Social Norms of Large Language Models,” Apr. 2024. arXiv:2404.02491 [cs].
  • [56] K. Hämmerl, B. Deiseroth, P. Schramowski, J. Libovický, A. Fraser, and K. Kersting, “Do Multilingual Language Models Capture Differing Moral Norms?,” Mar. 2022. arXiv:2203.09904 [cs].
  • [57] A. Ramezani and Y. Xu, “Knowledge of cultural moral norms in large language models,” June 2023. arXiv:2306.01857 [cs].
  • [58] S. Li, T. Sun, Q. Cheng, and X. Qiu, “Agent Alignment in Evolving Social Norms,” Feb. 2024. arXiv:2401.04620 [cs].
  • [59] H. Shen, T. Li, T. J.-J. Li, J. S. Park, and D. Yang, “Shaping the Emerging Norms of Using Large Language Models in Social Computing Research,” in Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing, CSCW ’23 Companion, (New York, NY, USA), pp. 569–571, Association for Computing Machinery, Oct. 2023.
  • [60] E. Akata, L. Schulz, J. Coda-Forno, S. J. Oh, M. Bethge, and E. Schulz, “Playing repeated games with large language models,” arXiv preprint arXiv:2305.16867, 2023.
  • [61] T. Ullman, “Large language models fail on trivial alterations to theory-of-mind tasks,” arXiv preprint arXiv:2302.08399, 2023.
  • [62] G. V. Aher, R. I. Arriaga, and A. T. Kalai, “Using large language models to simulate multiple humans and replicate human subject studies,” in International Conference on Machine Learning, pp. 337–371, PMLR, 2023.
  • [63] B. Lester, R. Al-Rfou, and N. Constant, “The power of scale for parameter-efficient prompt tuning,” arXiv preprint arXiv:2104.08691, 2021.

6 Supplementary Information

6.1 Measuring Bias

6.1.1 Microscopic Bias

To measure the production bias in Table 1, we assess both the interaction-level bias and the bias within each unique memory configuration. The interaction-level bias is defined as the overall production probability of the strong convention across all possible configurations (per interaction), which we test using an exact Binomial test with a null probability of p=0.5𝑝0.5p=0.5italic_p = 0.5, rejecting the null hypothesis if the p-value falls below 0.050.050.050.05. The results of the tests are reported in the caption of Table  1. At the configuration level, we first perform an exact Binomial test, as above, to check whether the model is biased. In all cases, the p-value Psuperscript𝑃P^{\dagger}italic_P start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT < 0.05, confirming that the model’s decision is biased towards an extreme. Then, we use bootstrapping by resampling 70%percent7070\%70 % of the observations for each configuration 10,0001000010,00010 , 000 times and measure the proportion of samples showing a stronger bias than the observed value in Table  1. In all cases, we obtain a bootstrapped Psuperscript𝑃P^{\ddagger}italic_P start_POSTSUPERSCRIPT ‡ end_POSTSUPERSCRIPT > 0.05, indicating that we cannot reject the hypothesis that the model’s underlying bias is more extreme than the observed bias. Results are reported in Table  SI1.

[Uncaptioned image]
Table SI1: Measuring the bias of the memory configurations in Table  1. We show the p-values for the null hypothesis that the model is unbiased (Psuperscript𝑃P^{\dagger}italic_P start_POSTSUPERSCRIPT † end_POSTSUPERSCRIPT, rejected), and that the model’s underlying bias is more extreme that our observation (Psuperscript𝑃P^{\ddagger}italic_P start_POSTSUPERSCRIPT ‡ end_POSTSUPERSCRIPT, insufficient evidence to reject).
Refer to caption
Figure SI1: (A) Individual and (B) Collective bias in convention selection with W=26𝑊26W=26italic_W = 26, the entire Latin Alphabet. Agents favour the convention ’A’ over all others a priori, resulting in collective consensus on this convention. Individual bias shows the convention production probability from 480 samples using Llama 3 agents, where agents have empty memory. Collective bias shows the proportion of consensus conventions from 20 simulations.
Refer to caption
Figure SI2: Individual bias in conventions selection with W=10𝑊10W=10italic_W = 10. The production probability of each convention when agents have no prior memory, for the LLM agents indicated in the legend. For (A-D), we generated 15,000, 10,000, 4,500, and 10,000 samples. We performed a chi-squared hypothesis test to see whether the agents are biased, and calculated the following p-values: P𝑃Pitalic_P= < .001, .001, 0.100, 0.410. These p-values indicate that Llama 2 70b and Claude 3.5 are unbiased across conventions in this name pool (at the 5% significance level), whereas the Llama 3 models exhibit significant skew.

6.2 Theoretical Minimal Naming Game

The Naming Game model simulates a population of N𝑁Nitalic_N agents engaging in pairwise negotiation interactions, demonstrating the emergence of global consensus on conventions through local coordination mechanisms. In the canonical formulation [8], agents must reach consensus on the name for an object using only local interactions, similar to our experimental framework. Agents possess internal lexicons with unlimited word capacity (although this is not a necessary initial condition of the model), initially empty. The interaction protocol involves random selection of agent pairs, where the designated speaker transmits a randomly chosen word from their lexicon (or invents a new one if it is empty) to the hearer. If the hearer recognises the word in their own lexicon, both agents retain only the communicated word, while in case of failure, the hearer incorporates the novel word into their lexicon. The non-equilibrium dynamics of this system exhibit three distinct temporal phases: (i) an innovation phase characterised by word creation, (ii) a propagation phase involving lexicon reorganization, and (iii) a convergence phase culminating in global consensus. In our experimental framework, we set an initial condition whereby agents can only invent new words from a finite word pool of size W<N𝑊𝑁W<Nitalic_W < italic_N. This condition means that the initial innovation phase is extremely short, as seen in the inset of Fig. 1. This model provides insights into the dynamics of language evolution and convention formation in both human and artificial communication systems.

Fig. SI3 shows the production probability trajectories of a simulation of the theoretical model with a lexicon of two words (W=2𝑊2W=2italic_W = 2).

Refer to caption
Figure SI3: Production probability trajectories in the minimal naming game with W=2𝑊2W=2italic_W = 2. Agents can only choose names from the pool {0,1}. We simulated 10,000 runs of the minimal naming game in a population of 24 agents. We show biased trajectories (in probability increments of 0.05) towards choosing the name ’1’. The bias corresponds to the probability of choosing the name ’1’ when an agent has the option of producing either name, such that a bias of 1 (0) corresponds to agents that will only choose name ’1’ (’0’). We note that as the bias increases, the convergence speed increases. Crucially, even a small bias towards a certain name leads to inevitable global convergence on that name.

6.3 Robustness Checks

Refer to caption
Figure SI4: The Spontaneous emergence of conventions for W=2𝑊2W=2italic_W = 2. We present individual (faint lines) and average (thick lines) trajectories of a population of N24𝑁24N-24italic_N - 24 Llama 3 agents with memory length M=5𝑀5M=5italic_M = 5 for four different name pools. For each name pool, we show (A) the success rate, and (B) the production probability of the strong convention (as indicated by the legend). All name pools resulted in a strong collective bias on a particular convention ({Q, M}: ’M’, 40/40 runs; {F, J}: ’F’, 24/40 runs; {X, Y}: ’X’, 40/40 runs; {Alice, Bob}: ’Alice’, 40/40 runs.
Refer to caption
Figure SI5: Robustness of the Spontaneous emergence of conventions. We show that the spontaneous emergence of conventions holds for a variety of simulation parameters, using populations of Llama 3 agents. W=26𝑊26W=26italic_W = 26 indicates a name pool which uses the entire Latin alphabet, W=6𝑊6W=6italic_W = 6 is the name pool {Q, M, F, J, X, Y}, and W=2𝑊2W=2italic_W = 2 is the name pool {Q, M}.
Refer to caption
Figure SI6: Stability of consensus conventions We test the stability of each model in the setting W=2𝑊2W=2italic_W = 2, with the possible conventions shown in the legend. For each convention, we begin with a population of N=24𝑁24N=24italic_N = 24 agents (N=48𝑁48N=48italic_N = 48 for Llama 3), where every agent has only the respective convention in memory. We then allow the population to naturally evolve, and measure the production probability of this convention. We simulate the following number of runs (A-D): 4, 10, 3, 5. Faint lines show the trajectories of individual runs. We show that for all models apart from Llama 3.1, the population remains entirely stable at its initial consensus state. If Llama 3.1 is initialised with consensus on the weak convention (Q), the population immediately switches to the alternative convention (M, the strong convention) . Here, the strong convention remains stable, with some minor fluctuations. The instability of the weak convention is also observed in our study of the committed minority required to flip a majority consensus (see Fig.  3), where we see that M𝑀Mitalic_M acts as a strong attractor state that can only be overcome by a large enough committed minority on the weak convention, Q.

6.4 Prompting

6.4.1 Prompt Structure

The system prompt comprises of three components: 1. a fixed prompt that outlines the game’s rules, including the payoff structure and the player’s objective, 2. a dynamic memory prompt that provides contextual information about the state of play within the player’s memory range, and 3. an instructional prompt that provides information for how the agent should format its response. We find that agents generally struggle to behave in a manner befitting a partnership game, and often opted for strategies aimed at undermining their co-player’s payoff, effectively treating them as an opponent. In practice, this meant that on some occasions the agent would willingly take an action with a negative payoff, in order to harm their co-player’s accumulated point tally.

6.4.2 Output structure

To extract any meaningful decision from a language agent’s output, which may be verbose and unstructured, it is necessary to distinguish between the reason, where the agent ’shows its working’, and the final decision. One popular approach to prompting asks the agent to give a final decision at the end of its answer, allowing it to generate a reasoning for the decision before reporting the actual response. This method relies on the assumption that if the LLM is good at composing a well thought-out reasoning, having it spelled out explicitly would guide the choice of the LLM towards better performance due to the LLM’s autoregressive text generation method. Although this approach has shown promising results in a variety of tasks and multi-player games, there is an ongoing debate as to whether LLMs truly possess the ability to reason about the possible actions, beliefs, and intentions of their fellow players [60, 61, 62].

One can argue that even if agents were able to reason, the wording generated to explain the reasoning can potentially express the agent’s biases and influence the final decision. As a result, the reason-first, answer-later structure would make it difficult to identify the possible sources of bias within the prompt, especially since the agent’s reasoning will only serve to amplify its own bias reflected in the decision. To strengthen generalisation and ensure that the final action decision is influenced solely by the input prompt, we instead ask the agent explicitly to provide an output in the form answer-first-reason-later.

6.4.3 Example Prompt

We provide an example of the system and user prompts given to LLM agents in our experiments. We exclude the beginning of text and end of text tokens, which are unique to each model.

System Prompt Context: Player 1 is playing a multi-round partnership game with Player 2 for 100 rounds. At each round, Player 1 and Player 2 simultaneously pick an action from the following values: [F, J]. The payoff that both players get is determined by the following rule: 1. If Players play the SAME action as each other, they will both be REWARDED with payoff 100 points. 2. If Players play DIFFERENT actions to each other, they will both be PUNISHED with payoff -50 points. The objective of each Player is to maximize their own accumulated point tally, conditional on the behavior of the other player. This is the history of choices in past rounds: {’round’:1, ’Player 1’: F, ’Player 2’: J, ’payoff’: -50} . . . It is now round i. The current score of Player 1 is SCORE. Answer saying which value Player 1 should pick. Please think step by step before making a decision. Remember, examining history explicitly is important. Write your answer using the following format: {’value’: <VALUE_OF_PLAYER_1>; ’reason’: <YOUR_REASON>}.
User Prompt Answer saying which action Player 1 should play.

6.4.4 Meta-Prompting

When LLMs are used to solve tasks where some form of ground truth is defined, such as classification or regression, the effect of prompt variations on the quality of a model’s outputs can be measured on downstream performance [63]. However, that is not possible in generative tasks where a notion of error is undefined. Specifically in the naming game, any generated output is plausible, as long as it is within the set of allowed symbols. This ambiguity makes it difficult to assess whether the LLM’s outputs reflect a proper semantic understanding of the task’s rules or are merely products of statistical ‘hallucinations’ [40]. To partially address this issue, we rely on a meta-prompting technique to measure the LLMs’ level of comprehension of the given prompt [41]. This technique provides the LLM with the prompt, and then asks three types of prompt comprehension questions about: interaction rules, chronological sequence of actions in the history, and payoff statistics (Table SI2).

To assess the LLMs’ proficiency in responding to meta-prompting questions, we randomly selected a group of agents from a real stochastic simulation of the naming game. For each agent, we used its history to replay each of its previous interactions, using the same memory length it had in the simulation. Retracing every interaction, including the agent’s memory at the time, we ask the agent all possible comprehension questions. Note that certain questions that rely on memory cannot be asked in the first interaction. We pose the questions at each interaction and compute the average accuracy of the LLMs’ responses across all interactions for all agents. Overall, all models exhibit a good level of prompt comprehension, with response accuracy nearly always above 0.80.80.80.8 and most often close to 1111 ( Fig. SI7). The only model that went below 0.80.80.80.8 in any metric is Llama 2 70b, which showed relatively poor accuracy in counting the number of times it played a convention within memory range. In many cases, this agent confused the ID of the player it was asked to consider, or it answered how many times a convention has been observed in total, across both players. Here, it is also worth noting that LLMs from Llama 2 70b’s generation generally struggled with counting tasks  [41].

Table SI2: Templates of prompt comprehension questions used in meta-prompting to verify the LLM’s comprehension of the prompt.
Name Question
Rules min_max What is the lowest/highest payoff player A can get in a single round?
actions Which actions is player A allowed to play?
payoff Which is player X’s payoff in a single round if X𝑋Xitalic_X plays p𝑝pitalic_p and Y𝑌Yitalic_Y plays q𝑞qitalic_q?
Time round Which is the current round of the game?
actioni Which action did player X𝑋Xitalic_X play in round i𝑖iitalic_i?
pointsi How many points did player X𝑋Xitalic_X collect in round i𝑖iitalic_i?
State #actions How many times did player X𝑋Xitalic_X choose p𝑝pitalic_p?
#points What is player X𝑋Xitalic_X’s current total payoff?
Refer to caption
Figure SI7: Metaprompting results Accuracy of the model responses to the prompt comprehension questions defined in Table SI2. We selected 8 agents from a single run (5 agents for Llama 3 70B), and recovered their game record. We replayed the game using the memory length used in the simulated run (M=5𝑀5M=5italic_M = 5), posing the comprehension questions at each interaction. These runs provide approximately 100100100100 test interactions for each model.

6.4.5 LLM parameters

The table below shows the text generation parameters we use for all LLM models used in this work.

Parameter Value
Temperature 0.5
Top-K 10
Max Tokens 6
Table SI3: Model Parameters

7 Figure Data

Model Name Strong Convention Weak Convention
Llama 3 2435 2565
Llama 3.1 5079 4921
Claude 3.5 5016 4984
Llama 2 5010 4090
Table SI4: Raw individual bias Data shown for the left panel in Fig.  2A. Values indicate the counts of strong and weak productions in the case W𝑊Witalic_W=2 for individual agents, showing their preferences a priori, when agents are initialized with empty memory.
Model Name Strong Convention Weak Convention
Llama 3 40 0
Llama 3.1 40 0
Claude 3.5 26 14
Llama 2 36 4
Table SI5: Raw collective bias. Data shown for the left panel in Fig.  2B. Values indicate the count of consensus states on the strong and weak conventions after the population converged. For each model, we conducted 40 trial runs. All models had memory length M=5𝑀5M=5italic_M = 5, apart from Llama 3 70B (M𝑀Mitalic_M=3).
Model Name Strong Convention Weak Convention
Llama 3 6 1
Llama 3.1 10 0
Claude 3.5 5 5
Llama 2 16 11
Table SI6: Raw Critical mass values. Data shown for Fig.  3B. The reported values corresponds to the number of agents required to overturn a majority consensus on the convention: M (strong Convention), and Q (Weak Convention). Population size (top to bottom), N𝑁Nitalic_N= 48, 24, 24, 24.