
Conversation and the Evolution of Metacognition

2023, Evolutionary Linguistic Theory


Ronald J. Planer
School of Liberal Arts
University of Wollongong

Abstract

While the term “metacognition” is sometimes used to refer to any form of thinking about thinking, in cognitive psychology, it is typically reserved for thinking about one’s own thinking, as opposed to thinking about others’ thinking. How metacognition in this more specific sense relates to other-directed mindreading is one of the main theoretical issues debated in the literature. This article considers the idea that we make use of the same or a largely similar package of resources in conceptually interpreting our own mind as we do in interpreting others’. I assume that a capacity for other-directed mindreading is minimally shared with our great-ape relatives, but I argue that the architecture of this system had to be substantially modified before it could efficiently and adaptively be turned inwards on one’s own mind. I contend that an important piece of the overall evolutionary explanation here likely concerns selection pressures arising from the domain of conversational interaction. Specifically, drawing on work carried out in the human interaction studies tradition (e.g., conversation analysis), I argue that the smooth to-and-fro of conversational interaction can be seen to heavily depend on metacommunicative capacities, which, in turn, are underpinned by metacognitive capacities. I conclude with a thumbnail sketch of an evolutionary account of the emergence of these metacognitive capacities in the human line. Their appearance and spread—whether via genes, cultural learning, or more likely, some combination of the two—helps to explain the transition from great-ape communication to human conversation.

Keywords: metacognition; other-directed mindreading; procedural vs. explicit metacognition; theory of mind system; interaction engine; conversation; turn-taking; comprehension monitoring; repair initiators; backchanneling; cultural learning; cooperation and coordination

1. Introduction

In cognitive psychology, “metacognition” is standardly defined as thinking about one’s own thinking (see, e.g., collections such as Weinert and Kluwe 1987; Nelson 1992; Metcalfe and Shimamura 1994; Chambres et al. 2002; Dunlosky and Metcalfe 2009). This definition marks a critical contrast with thinking about others’ thinking. We obviously have access to information about our own minds that we lack in the case of others’ minds. Moreover, that is true even if there are fundamental symmetries in the way we interpret our own minds and those of others. For even then our evidential base differs considerably (e.g., I have access to my own mental imagery, but not yours). In addition, the two cases – interpreting one’s own mind and interpreting others’ – differ in their potential to involve a conflict of interest. It is very hard (though perhaps not impossible) for the evolutionary interests of that which interprets and that which is interpreted to come apart in the case of metacognition. (On this possibility, see Haig 2008; Spurrett 2016. An individual’s evolutionary interests are of course separate from their psychological interests; the latter is the relevant notion of interest in the case of self-deception [Trivers 2000, 2011], a topic I leave aside in this work.) For it is (parts of) the mind of one and the same biological individual that serves in both roles in the case of metacognition. That is obviously not true in the case of other-directed mindreading.
While, as a highly cooperative species, it is very often in our interest to have our minds accurately read by others, there are also plenty of contexts in which it is in your interest to hide your mind from mine—most obviously, when you intend to deceive me. This suggests that the evolution of metacognition and other-directed mindreading may well have been shaped by different selective forces (assuming they are at least partially subserved by different cognitive mechanisms). Information tends to flow freely when fitness interests are closely aligned, whereas conflict tends to breed silence or deceit (Searcy and Nowicki 2010; Skyrms 2010).

However, the definition of metacognition in terms of thinking about one’s own thinking needs clarification in other respects. For starters, “thinking about one’s own thinking” connotes conscious mental activity in which I explicitly judge myself as being in some mental state (e.g., as when I judge myself as knowing the capital of my home state). While it is true that the lion’s share of the research on human metacognition has focused on such explicit metacognitive judgments, as we’ll see shortly, there are other forms of self-knowledge that researchers wish to recognize as genuinely metacognitive that do not take this form. By limiting our focus only to such high-level mental activity, we run the risk of obscuring the origins of metacognition in both development and evolution, as we are likely to overlook plausible, lower-level precursors to such cognition. In addition, researchers also standardly wish to use the term “metacognition” to include cases of monitoring and control of cognitive activity. In many cases, metacognitive knowledge is likely to be implicated in monitoring and control of cognition (e.g., as when I judge that I am failing to learn some subject matter, and hence decide to switch cognitive strategies). But it is not obvious that such knowledge must always be involved. Much attention has been paid by philosophers interested in metacognition (few though there are) to ironing out these conceptual issues (see, e.g., Proust 2006, 2007, 2010, 2012, 2019; Carruthers 2009, 2011, 2021; Carruthers and Ritchie 2012; Shea 2018, 2020; Shea et al. 2014).

One crucial distinction that has been drawn is between procedural and explicit metacognition. For some researchers, explicitness is the same as or closely related to consciousness; explicit mental states are either conscious, or potentially conscious, ones. For others, explicitness is rather a matter of how information is instantiated in the mind, regardless of whether or not it is (potentially) conscious. Information that is explicitly represented is somehow encoded in the mind, where encoded information contrasts with information that is embodied in one of the mind’s computational procedures. To give a very simple illustration of this distinction: a computational system might compute the AND function either by looking up the rule for AND (i.e., A AND B is true if and only if A is true and B is true) and applying it to its inputs (explicit), or it might simply be hard-wired to output a state symbolizing truth if and only if both of its inputs symbolize truth (implicit). To the extent that the mind (or some part of it) has a conventional computing architecture, the distinction between encoded vs. embodied information should straightforwardly apply to it.
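The AND example can be made concrete with a minimal sketch in Python (my illustration, not drawn from the literature; the names AND_RULE, and_explicit, and and_implicit are invented). Both procedures compute the same function; they differ only in whether the rule exists as an encoded, consultable structure:

```python
# Explicit: the AND rule is encoded as data (a lookup table) that the
# procedure consults at run time; the rule is an item the system could,
# in principle, inspect or modify.
AND_RULE = {
    (True, True): True,
    (True, False): False,
    (False, True): False,
    (False, False): False,
}

def and_explicit(a: bool, b: bool) -> bool:
    return AND_RULE[(a, b)]

# Implicit: the same input-output mapping is embodied in the procedure's
# "wiring"; no data structure anywhere encodes the rule itself.
def and_implicit(a: bool, b: bool) -> bool:
    return a and b

# The two are behaviorally indistinguishable across all inputs.
assert all(
    and_explicit(a, b) == and_implicit(a, b)
    for a in (True, False)
    for b in (True, False)
)
```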
So, one way that metacognitive information (information about our own mind, or about minds in the abstract) might be present in one’s mind is for it to be embodied in some of the mind’s procedures. Procedural metacognition involves such procedures, but typically, also “noetic” or “epistemic feelings.” A paradigm case of such feelings is that of confidence (or the lack thereof). Feelings of confidence attach to the information we use in decision-making. A lack of confidence that, e.g., I have locked my car is likely to cause me to walk back to my car and (re)lock it. A strong feeling of confidence that I have locked my car is likely to cause me to dismiss the possibility that I might not have as an idle worry. The idea is that such feelings function to carry behaviorally useful information about one’s own cognitive states (our certainty, uncertainty) even if they do not represent them in overtly mentalistic terms. A mind can be organized to exploit such feelings, and hence metacognitive information, in the adaptive control of behavior or cognition without having the concept of a mistaken belief, for example. That, at any rate, is the interpretation some researchers prefer of the uncertainty-monitoring research that has been carried out with non-human primates (see, e.g., Proust 2010, 2019). Other researchers remain skeptical of this interpretation (Metcalfe 2008; Carruthers 2011; Carruthers and Ritchie 2012).

Here I propose to set aside some of the more complex debates about whether such epistemic feelings are legitimately regarded as metacognitive. I shall assume that they are. Even so, there is a real and important distinction between explicit and procedural metacognition. One way theorists have attempted to further conceptually disentangle the two is by appeal to the distinction between conceptual vs. non-conceptual content (with explicit metacognition being conceptual; procedural, non-conceptual). This distinction is not trivial to pin down, as it depends on the (arguably vexed) question of what concepts are. But the main idea, which I shall assume is a good one, is that conceptual representations are, in virtue of their structure, usable in a range of cognitive processes. A conceptual representation of a dog, for example, can be used in categorizing stimuli, in planning, in reasoning, and so on. That is not true in the case of a non-conceptual representation (e.g., a raw experience of a visual expanse). This is not to say that non-conceptual representations are causally inefficacious, but rather that they do not form a part of a larger, integrated inferential network spanning multiple cognitive functions.

This raises the question, again debated at great length by philosophers of metacognition and increasingly by psychologists and neuroscientists, of what the connection is between procedural and explicit metacognition. In this article, I’ll confine myself to two main threads of thinking. The first is that procedural metacognition provides the raw material for explicit metacognition. It has been hypothesized, for example, that procedural metacognitive knowledge is turned into explicit knowledge via a process Karmiloff-Smith (1991) has called “representational redescription” (Proust 2003, 2007). In representational redescription, information that is present in the mind in one format is translated into another format.
In some cases, this is held to involve making explicit information that is wholly implicit in the mind, that is, embodied in one of the mind’s computational procedures—the equivalent of moving from a situation in which the logic of the AND rule is embodied in the operation of some (part of a) system, to one in which the AND rule is encoded. In other cases, redescription involves translation from one encoded format to another, thus affecting the computational role of the information in one’s cognitive economy (on the non-trivial assumption that there are procedures poised to consume the newly formatted information). It is an important feature of the view (i.e., one that does real explanatory work) that redescribed structures do not disappear over the course of development, but rather remain in place and continue to drive certain forms of cognition and behavior.

According to another line of thinking, procedural and explicit metacognition are more loosely related. On this view, explicit metacognition is the result of turning a third-person or other-directed conceptual mindreading faculty inwards on the self (see, e.g., Carruthers 2011). More fully: the idea is that mindreading, and the metarepresentational resources that support it, evolved in the service of navigating both the competitive and cooperative demands of life in a complex social group. Human social life is the most complex of all, and that is why we show the most developed set of resources for mindreading others. But more to the present point: the idea is that other-directed mindreading immediately makes possible, as a by-product, self-directed mindreading (i.e., explicit metacognition). As regards procedural metacognition, such a view can be combined with a number of different stances. One might be skeptical that such cognition is genuinely metacognitive; one might accept such cognition as genuinely metacognitive, but hold that procedural metacognition evolved (largely) independently of explicit metacognition; or one might simply remain neutral on the question.

In this article, I assume that some version of the other-directed mindreading account is right about the origins of explicit metacognition. I find this account more compelling on both computational and evolutionary grounds. Computationally speaking, the mechanistic details of how representational redescription is supposed to go have, at least to my mind, never been very clearly spelled out. (It is particularly unclear how information that is wholly implicit in procedures is supposed to be made explicit for the mind.) And evolutionarily speaking, I am thoroughly convinced that other great apes possess considerable theory of mind abilities (for an up-to-date overview of the evidence, see Planer 2021), but probably have little or no use for explicit metacognition in their social niche (which is not to say that procedural metacognition is unimportant for them).

However, I want to defend a version of the other-directed mindreading theory with an important twist. Specifically, I shall argue that the process of turning a mindreading faculty that evolved for other-directed mindreading inwards on the self is not at all a trivial task. Adaptive and efficient self-directed mindreading is not something we get “for free.” Instead, self-directed mindreading of the explicit variety very likely required substantial change to the architecture of the previously evolved mindreading system. If so, then an account must be provided of the selective pressure(s) that drove this change.
What benefits attached to explicit self-directed mindreading in our evolution as a species such that our great-ape-inherited mindreading faculty was significantly modified and upgraded? The proposal I shall seek to flesh out in what follows is this: metacognition, including and especially metacognition of the explicit variety, became increasingly important to managing the flow of human communicative interaction. By “communicative interaction,” I particularly have in mind here fast-paced conversational interaction, but also proto-versions of it. My picture of the nature of such interaction will draw heavily from work in the human interaction studies tradition (see, e.g., Schegloff 1968; Schegloff 2006; Schegloff et al. 1977; Clark 1996; Sherzer 1989; Levinson 2006; Enfield 2017). More specifically, I take as a main and recent exemplar of such work that of Steve Levinson and colleagues (Holler and Levinson 2019; Hömke, Holler and Levinson 2018; Levinson and Enfield 2020; Levinson 2020, forthcoming). According to Levinson—and an increasing number of others—there are few, if any, bona fide linguistic universals (Evans and Levinson 2009; Seifart et al. 2018). Evidently, languages can and do vary along every, or virtually every, possible dimension. Nevertheless, there are robust universals to be found at the level of communicative interaction. Levinson sees the latter features as generated by what he calls the “human interaction engine”—a mosaic of co-adapted cognitive mechanisms, likely with diverse phylogenetic histories. And while Levinson does not specifically mention metacognition of either a procedural or explicit variety (though he and others in the field frequently talk of metacognition in the other-directed sense; e.g., Levinson claims interaction typically involves “Schelling thinking”; but, as already explained, that is a separate notion from self-directed mindreading), it is clear that the picture of communicative interaction he presents heavily depends on metacognition (and hence metacognition should be included as part of the interaction engine). In a word, then: I shall argue that the demands of real-time communication were a crucial driver of the evolution of self-directed, explicit mindreading, as such a form of social interaction became central to our lives.

An outline of the discussion to follow: I begin by presenting in more detail the other-directed mindreading account of the origins of explicit metacognition advanced by Carruthers (2011), followed by my critique of that account. After that, I turn my focus to the nature of conversational interaction. There is a vast literature in this area which I cannot possibly hope to do proper justice to here. Rather, I simply focus on some aspects of such interaction that I think are plausibly demanding of metacognition. Finally, I provide a sketch of an evolutionary account of the origins of these metacognitive capacities in the human line which integrates the insights and lessons of the preceding sections.

2. Other-Directed Mindreading and the Origins of Explicit Metacognition

Along with many other theorists, Carruthers maintains that our minds contain a computational system specialized for mindreading. This system is conceived of as outputting conceptual representations—hence ones that can go on to affect a variety of cognitive states and processes in the brain (including in our language faculty). Moreover, the mindreading system can consult various forms of conceptual information in the course of processing its inputs.
Some of this information is held internal to the mindreading system itself (in a proprietary database), while other forms of information are sourced from other systems instantiated elsewhere in the brain. All this is sensible enough, in my view. Also sensible, I think, is Carruthers’ contention that the mindreading system originally evolved primarily for other-directed mindreading. There is a version of this system in other great apes, and in them, it may well be that it exclusively serves that purpose. The system (at least in us) is not impermeable to learning, and can be influenced by various forms of individual and social learning. (Carruthers envisages this system as a “module,” though that is entirely optional from the perspective of the view being discussed here.)

As mentioned above, Carruthers contends that, with this system in place, explicit metacognition can be generated simply by turning this system inwards, that is, on the contents of one’s own mind. More fully, he tells us:

The simplest hypothesis … is that self-knowledge is achieved by turning one’s mindreading capacities on oneself. All of the conceptual resources necessary for this to happen would already be present in the mindreading faculty, designed for attributing mental states to other agents. And the mindreading faculty would receive a rich body of information about one’s own mental life through the global broadcast of sensory and imagistic events …. Other sorts of mental state could then be self-attributed through inference and interpretation of data concerning one’s own behavior, circumstances, and sensory experience. (p. 66)

Carruthers refers to this view as the “one-system theory” (Carruthers 2011; Carruthers and Ritchie 2012; see also Nicholson et al. 2021). There is one system that underlies both other-directed and self-directed mindreading. Carruthers contrasts it with a “two-systems theory,” which, as the name suggests, holds that metacognition and mindreading are subserved by two distinct faculties (a position he attributes to Stich and Nichols [2003]), as well as a “metacognition-is-prior theory.” The metacognition-is-prior theory holds that other-directed mindreading abilities develop (in both ontogeny and phylogeny) out of a pre-existing capacity for metacognition when grafted on to certain capacities for imaginative projection. The metacognition-is-prior theory is associated with various forms of simulationism about other-directed mindreading (e.g., Goldman 2006).

There is good evidence for thinking that there is something right in the one-system view, though I direct the reader to other resources for a full look at that evidence (Carruthers 2011 is a balanced overview). Suffice it to say that there is substantial overlap in the neurobiological areas that are active during both self-directed and other-directed mindreading. This, alone, would not necessarily tell against the metacognition-is-prior view, however. For it is open to defenders of the metacognition-is-prior view to hold that self- and other-directed mindreading make use of a substantially overlapping set of cognitive resources. However, other evidence arguably does tell in favor of the one-system theory. For example, whereas the one-system theory predicts that any factor (clinical or artificially induced) that interferes with other-directed mindreading should likewise inhibit metacognition, no such prediction is made by the metacognition-is-prior theory.
Indeed, some defenders of the metacognition-is-prior theory have claimed just the opposite, arguing that metacognition can be spared in certain clinical conditions like autism spectrum disorder while other-directed mindreading is impaired. Recent work instead suggests that no such dissociation between the two exists, providing support for the one-system theory (Nicholson et al. 2021).

I have no objection to the idea that we make use of a largely overlapping set of conceptual resources in mentalistically interpreting both ourselves and others. Rather, the move I take issue with is the claim that no significant evolutionary change was required in order for those resources to be efficiently and adaptively deployed in the first-person case. To see this, let us take a step back. To say that a system is specialized for processing information from a particular domain is generally understood to mean that the algorithms the system runs, and the information stores it is set up to consult (and the way it is set up to consult them), are “tuned” to that domain. The specs of the system make adaptive sense when dealing with information from the domain it is dedicated to. But not otherwise. To the extent that a system dedicated to, say, spatial navigation, happens to adaptively process information about biology, that is a miracle (Machery 2008; Planer 2017a). In practice, we should expect such processing to result in useless outputs (or even counterproductive ones). The mindreading system is no exception. In order for this system to function adaptively, it needs a way of recognizing those, and only those, representations which have to do with minded agents. But semantic content, whether pertaining to agents or otherwise, is not directly visible to a computational system. So, how is the system to be set up such that it shows selectivity in what it processes? How is it to home in on just that information it is “supposed to” process?

One way for this to go is for the system to be located in the brain in such a way that the only information it ever receives is information of the appropriate type. Here, the problem of gating or selectivity is solved “upstream” of the system rather than by the system itself. Another solution is for the system to be physically configured so that it is caused to operate on some information only if the vehicle carrying that information bears an appropriate syntactic signature. This is the model via which selectivity at the level of gene transcription and intra- and intercellular communication is achieved. It is like a lock being opened by a particular key. (See Barrett 2005 for an excellent discussion of these design motifs.) Each of these designs has its pros and cons, though it is beyond the scope of this article to enter into a discussion of them here. Suffice it to say that, for complex information-processing systems (i.e., ones that instantiate a large number and range of computations, and deal with large amounts of data), it is widely agreed that the latter design is preferable. This design has the virtue of allowing a variety of specialist systems to “probe” a representation of one and the same form, each checking for a match with its input conditions. I will assume a design like this in what follows (as it happens, it is also the one Carruthers implicitly works with). To the extent that the mindreading system does things the other way, the problem I raise would be even more daunting.
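A toy sketch of the second, signature-based design motif may help (my own illustration; the class names and signature labels are invented, and actual neural gating is of course far messier than this). Several specialist systems probe one and the same broadcast representation, and each fires only when the vehicle bears its input signature:

```python
from dataclasses import dataclass

# A representation's semantic content is invisible to downstream systems;
# what they can check is a syntactic "signature" borne by the vehicle.
@dataclass
class Representation:
    signature: str  # e.g., "AGENT_IMAGE" or "PLACE_IMAGE"
    content: str

class SpecialistSystem:
    def __init__(self, name: str, accepted_signatures: set):
        self.name = name
        self.accepted_signatures = accepted_signatures  # the "lock"

    def probe(self, rep: Representation) -> None:
        # Every system probes the same representation, but only one whose
        # input conditions match the signature (the "key") actually fires.
        if rep.signature in self.accepted_signatures:
            print(f"{self.name} fires on: {rep.content}")

mindreading = SpecialistSystem("mindreading", {"AGENT_IMAGE"})
navigation = SpecialistSystem("navigation", {"PLACE_IMAGE"})

percept = Representation("AGENT_IMAGE", "conspecific gesturing nearby")
for system in (mindreading, navigation):
    system.probe(percept)  # only the mindreading system responds
```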
Assuming that the mindreading system initially evolved for other-directed mindreading, then, what kind of signature feature of information vehicles might it have been set up to detect? For a variety of reasons, the most plausible candidate is imagistic representations of agents (Planer 2021). A visual perception of a conspecific agent at a particular location in its environment is a paradigm case. Conspecific individuals bear a range of perceptible properties that serve to distinguish them from non-conspecifics, and imagistic representations can be expected to capture these differences. So, a mindreading system that is designed to be caused to operate by a representation of this sort will do a good job at locking on to appropriate information, to information it is useful for the system to process.

But now we can see there is bound to be a problem. For the signals emanating from the inside of the organism are not of this general type (mental imagery of self or others aside). To see this, take a simple example. Suppose I happen to be viewing a watermelon. I am thus in a state of visual perception representing a watermelon. A mindreading system that supports metacognition would presumably use this visual perceptual state as evidence that I am currently seeing a watermelon; to self-attribute to me, in other words, a mental state with the content I see a watermelon there. The problem is that a visual representation of a watermelon is not a visual representation of a conspecific agent. There is no reason to think that this representation should cause the mindreading system to fire. More generally: when Carruthers tells us that “the mindreading faculty would receive a rich body of information about one’s own mental life through the global broadcast of sensory and imagistic events” (p. 66), he is in effect assuming that such sensory and imagistic representations will cause the mindreading system to operate and that such information will be subsequently processed by the system in an adaptive way. I see no reason to think this would be true of a system designed purely for other-directed mindreading, however.

And in fact, the problem might be worse yet. Other remarks Carruthers makes reveal that he assumes the totality of our quasi-perceptual and affective states will serve as inputs to the mindreading system. All of these states are evidence for the mindreading system to use in attributing mental states to the self. On the one hand, this seems right. But on the other, it now seems that all selectivity on the part of the mindreading system will have gone out the window. Any state of perception, any state of imagination, any affective state—all of these will cue the mindreading system into action. To me, this seems like a significant modification in the specs of our specialist mindreading system indeed. While we can plausibly assume that no new neural wiring was necessary in order for this information to be delivered to the interface of the mindreading system, the system itself must have been reconfigured to now accept and operate on representations of an entirely different and very heterogeneous sort. What benefits were produced by self-directed mindreading such that these changes were selected for?

That is all concerning the input side of things. However, it is plausible that there would also need to be certain changes to the internal structure of the system—to the algorithms or inferential procedures it instantiates—as well.
Once the mindreading system has tokened a representation of the form I see a watermelon there, the system will be disposed to infer I believe there is a watermelon there. There is really no difference, looked at computationally, between this case and one involving a third party. The knowledge that seeing leads to believing is either explicitly represented in the mindreading system or else embodied in one of its procedures. Either way, we expect it to be insensitive to the identity of the agent to whom the perceptual or propositional attitude state is ascribed. (The information/procedure needs to be general in this sense, as it works for any possible conspecific we might mindread.) But consider the first inferential step. How does the system “know” that a visual perceptual state of a watermelon there is evidence of the fact that I see a watermelon there? Or put differently, where does the premise/procedure come from that causes the system to token a representation of the form I see a watermelon there on the basis of a watermelon-representing visual perceptual state of mine? This might seem trivial, but it is not trivial from a computational point of view. This encoded/embodied information must somehow be added to the mindreading system. (A toy sketch of this point is given at the end of this section.) And so on, for the many other inferences the mindreading system will initially have to make in the first-person case before it can harness its general knowledge of minds. That this is not at all trivial is especially easy to see in the case of learning to associate certain inner feelings (e.g., emotions) with their corresponding conceptual representations. Anger experienced from the inside does not at all “look like” anger expressed by another agent. So, we must add this additional inferential machinery to the bill.

The discussion in this section has been quite detail-oriented. But that is necessary, as the devil lies in these details. Pace Carruthers’ suggestion that explicit metacognition would result as a side effect of a mindreading system designed for other-directed mindreading, we see that a number of non-trivial changes to the architecture of that system were likely required. One possibility is that these changes were effected through a process of natural selection operating on genes. Alternatively, it may be that there was/remains a critical role for individual and cultural learning in this area (Heyes et al. 2020). In addition, some hypothesize a particularly central role for public language symbols in the development of explicit metarepresentational (including metacognitive) capacities (see, e.g., Clark 2006; Dingemanse 2020). For example, communication about mental states (e.g., with caretakers) may be crucial to learning how to categorize certain of our mental states on the basis of internal data. For my part, I strongly suspect that a complete account of the changes that brought about explicit metacognitive capacities as we now know them will give a role to all of these selective forces. Genetic change may be especially important to “pinning down” changes that first occur through cultural learning (West-Eberhard 2003; Jablonka and Lamb 2014), in a process of genetic accommodation. And it may well be that different patterns of explanation hold for different metacognitive capacities (e.g., some capacities might be more genetically channelled than others). I leave all of this open in what follows.
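Here is the promised toy sketch of the computational point above (my illustration; the tuple format and the function self_attribute_seeing are invented stipulations, not anyone’s model). The seeing-leads-to-believing step is agent-general and so comes for free, while the step from a raw first-person perceptual state to a self-attribution of seeing is precisely the extra machinery a purely other-directed system has no reason to contain:

```python
# An agent-general inference rule: from ("sees", agent, content), token
# ("believes", agent, content), whoever the agent happens to be. This is
# the kind of knowledge an other-directed system already embodies.
def seeing_to_believing(attribution):
    kind, agent, content = attribution
    assert kind == "sees"
    return ("believes", agent, content)

# The missing first step: a mapping from one's own raw perceptual state to
# a first-person attribution of seeing. Here it is simply stipulated; in a
# real system this premise/procedure must somehow be added.
def self_attribute_seeing(visual_state):
    return ("sees", "self", visual_state)

raw_percept = "watermelon there"
print(seeing_to_believing(self_attribute_seeing(raw_percept)))
# -> ('believes', 'self', 'watermelon there')
```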
The important point for us is that, given these requisite changes to the architecture of our original mindreading system, there must be some benefit, or set of benefits, to metacognition driving these changes.

3. Conversation

It is now widely accepted that mindreading plays a crucial role in human communication, even if the exact extent of involvement, and the form that such mindreading takes (e.g., how many “orders of intentionality” it involves), remain controversial (e.g., Tomasello 2003, 2008; Sperber and Wilson 1986; Sperber and Origgi 2012; Fitch 2010; Sterelny 2012; Bar-On 2013; Gamble et al. 2014; Scott-Phillips 2015; Bar-On and Moore 2017; Moore 2017a, 2017b; Planer 2017b, 2017c; Planer and Godfrey-Smith 2020; Planer and Sterelny 2021). There has been a similar and increasing consensus in the field of interaction studies, though the terminology used often differs (e.g., “mutual understanding” [Schegloff 1992], “intersubjectivity” [Enfield and Sidnell 2022], etc.). Mindreading was probably even more crucial during early stages of language evolution, when we did not yet possess rich bodies of communicative conventions to aid us in the process of communication. It is very plausible, then, that a key part of the story as to how our ancestors evolved language was through evolving increasingly sophisticated capacities for reasoning about each other’s minds. At the same time, as communication itself became richer and more demanding on cognition, selection would have rewarded those who were better at reading minds, the better to communicate. Communication and mindreading have almost certainly co-evolved via a positive feedback loop (Planer 2021; Planer and Sterelny 2021).

In contrast to the attention other-directed mindreading has received in this area, very little attention has been paid to the role of metacognition in the self-directed sense in communication (for an exception, see Proust 2016). However, even a cursory inspection suggests that such cognition plays a crucial role in human communication. In this section, I highlight several such functions. In particular, I wish to focus specifically on the role of metacognition in regulating the flow of conversation(-like) interaction. In my view, the importance of metacognition in this domain has been seriously underappreciated.

First, some background on conversational interaction. Despite the tremendous variation in languages spoken, it turns out that there are very robust cross-cultural features of linguistic communication (Levinson 2006, 2020, forthcoming). Historically—and to this very day in traditional societies—the vast majority of human communication is face-to-face. Two or more individuals stand in close proximity to one another while speaking (or signing). In such a setting, they communicate with a lot more than their voices; facial expression, gaze, and gesture all play crucial roles in fleshing out utterance meaning. Communication takes place in short bursts, with one individual speaking, then another, then the first again or another individual. Each such turn tends to average around 2 seconds (in English, about 10 words). Responses are made rapidly. Indeed, they are lightning-fast: on average, a response is made within 200 msec of the speaker finishing his or her turn. Pauses longer than this tend to be interpreted as semantically significant; for example, as indicating that a negative response is imminent (e.g., a declined invitation).
Among other things, the rapidity of response demonstrates, conclusively, that hearers are busy planning their response long before the speaker finishes his or her utterance. Hence, hearers must—and this is confirmed by the brain-imaging data we now have—predict how a speaker will finish his or her utterance even earlier. These features are robust across both signed and spoken languages in diverse parts of the world (see, e.g., Enfield et al. 2015; Dingemanse et al. 2015).

Against this backdrop, the flexibility and open-endedness of human communication is even more impressive. Utterance meaning is almost always underdetermined by literal meaning (Levinson 1983; Sperber and Wilson 1986; Scott-Phillips 2015). To understand what a speaker is actually saying—what they intend to communicate to us on some concrete occasion—we have to get at the intention behind their words. And often, there is quite some distance between the two. As researchers in pragmatics are always quick to point out, a simple sentence such as “It’s five o’clock” could not only indicate a number of different propositions depending on the context of utterance; it may express entirely different speech acts (a complaint, a question, an indicative statement). The highly inferential nature of human communicative interaction (at least in part) explains why such interaction assumes the dynamical form that it does. In particular, it makes frequent turn-taking crucial. Speakers thereby gain the opportunity to receive feedback from hearers on how well their message is being understood. And receivers, in turn, gain the opportunity to request clarification, or to check their understanding. In this way, the give-and-take of a conversation is clearly cooperative. Without such rapid switching of the roles of speaker and hearer, the mental representations of the two (or more) parties would very quickly drift apart, effectively rendering their communication useless. The prevention of this—and hence the competent managing of the to-and-fro of human communicative interaction—requires several things (Schegloff 2006; Enfield and Sidnell 2022; Levinson forthcoming).

Comprehension monitoring. If communication is to run smoothly, individuals first and foremost have to monitor how well they are understanding one another’s utterances. There has in fact been a good deal of research in the area of metacognition on children’s so-called comprehension monitoring of spoken and written language (though mostly written). Understanding a bit of dialogue of course requires assigning meaning to individual sentences as they come in. But it involves a lot more than that. While there are a number of different accounts of how comprehension is achieved (see, e.g., Oakhill 2017 and Soto et al. 2019 for recent overviews), the dominant ones are united in positing some kind of situation or mental model in Johnson-Laird’s (1983) sense. This mental model, which is constantly tweaked in the course of comprehension, collects and organizes both information that has come before in the dialogue as well as relevant background information. The model, in turn, supports the drawing of the various inferences on which comprehension depends, many of them merely implied by what has been said so far. In short: comprehension almost certainly involves the construction and maintenance in real time of a rather rich and complex mental representation. To monitor one’s comprehension, then, would seem to necessitate some kind of awareness of this model.
How closely and carefully one monitors comprehension is of course influenced by one’s motivation. That motivation will be much stronger in cases which one perceives as being high stakes.

Repair initiators. Having detected some form of deficiency in their understanding, receivers can and generally do act to redress that deficiency. Specifically, they typically deploy so-called repair initiators of some kind. A considerable amount of research has been done on repair initiators, including much cross-cultural research. They are a very stable and robust feature of conversation across cultures (Levinson 2006; Levinson 2020; Dingemanse et al. 2015; Albert and de Ruiter 2018; Micklos and Woensdregt 2022). More specifically, repair initiators tend to come in two forms: specific and general. In English, we are all familiar with general repair initiators such as “huh?” Initiators of this kind are often the least helpful to the speaker, as they simply express puzzlement about the entirety of the speaker’s last utterance. However, sometimes this is the best we can do (e.g., if we have had difficulty hearing the speaker due to environmental noise). Typically, receivers try to be more cooperative by providing as specific a repair initiator as possible. An example would be, “Where did John ride his bike to last night?” The specificity of this repair initiator makes it clear to the original speaker that they need only specify the relevant location to get the conversation back on track; everything else about the utterance was grasped just fine by the audience. Often, speakers catch themselves being unclear or making a mistake midstream, and offer a correction even before the receiver can request repair, as in “Pass me the salt—sorry, I mean the garlic salt.” Self-repairs can also be quite a bit more complicated than this, requiring non-trivial cognitive effort from the receiver: “To get to the shop, take your first left, then your first right—urgh!, sorry, the other way around.”

From a lay perspective, it is perhaps easy to be unimpressed by this aspect of communication. But it turns out that repair initiators are absolutely critical to successful conversation (as Enfield 2017 nicely drives home). Indeed, it has been shown that, cross-culturally and linguistically, repair initiators occur as much as once every 80 seconds in conversation (Dingemanse et al. 2015). Here, we have framed repair initiators as mostly relevant to signaling a deficiency in communication of some kind. However, as Mark Dingemanse has pointed out to me, specific repair initiators also signal what has been successfully grasped (or at least what the receiver believes has been successfully grasped). Looked at this way, repair initiators also help to move communication along, not just get it back on track, and as such, one might think that the distinction between repair initiators and backchanneling signals (see below) is a fuzzy one.

Signaling understanding. Even when communication is operating smoothly, receivers are not entirely passive. Specifically, they signal their understanding to senders; that they are (or better: take themselves to be) adequately tracking what the sender is saying to them (White 1989; Peters and Wong 2015). These signals take both verbal and non-verbal forms, and their production is often called “backchanneling.” We are all familiar with interjecting such phrases as “yep,” “uh-huh,” and so on, as someone speaks to us. Ditto for nodding our head more or less vigorously.
And ditto for the phenomenon of finishing or filling out others’ utterances, resulting in a jointly constructed utterance (e.g., A: “Darn! I forgot—” B: “the passports”). But we produce other signals of this type that fly under the radar of consciousness. For example, how we blink our eyes while another is addressing us indicates our evaluation of our degree of understanding. Moreover, it has been shown that senders are sensitive to this information: in an experiment using an avatar whose blinking behavior could be systematically modulated, different blink patterns altered sender behavior (specifically, longer blinks caused speakers to finish their utterance turn abruptly) (Hömke et al. 2018).

Pulling all this together: it is clear that participants in a conversation are constantly engaged in monitoring their understanding and performance in that conversation. This is, needless to say, no trivial feat, given the immense flexibility and open-endedness of human communication, and given the breakneck speed at which it unfolds. At the same time, they take an active role in signaling to others their self-evaluation: either seeking to initiate repair or seeking to let others know that everything is humming along. Note that it is a further question why, exactly, conversational participants act in such ways. One very proximal answer is that they want communication to be successful. But (we can ask again) why do they want this? Enfield (2017) and Enfield and Sidnell (2022) suggest that it is because conversation is a form of cooperation, and as such, participants are socially responsible or accountable for how communication goes. This strikes me as plausible (though I am less sure that conversation is always properly regarded as cooperative).

All of these activities might be subsumed under the umbrella concept of “metacommunication.” Each activity, in its own way, is about the process of ongoing communication. There is a close connection here with the notion of linguistic reflexivity (Hockett 1966). But arguably, language is not necessary to communicate about communication. So—and this is the main point this section has sought to establish—metacommunication is demanding of metacognition.

But there is another feature of human communication which one might think is linked to metacognition: namely, the way we track and make sophisticated use of common ground in communication. There is much literature on this issue (for some canonical works, see, e.g., Lewis 1969; Sperber and Wilson 1986; Stalnaker 2002). Some promote quite rich theories of tracking (e.g., Clark 1996), others more minimal ones (e.g., Keysar and Barr 2002). On the standard way of understanding common ground, some proposition p (or more generally, some piece of information) is (or is in) common ground between two communicators A and B just if (i) A believes p, and B believes p; (ii) A believes that B believes p, and B believes that A believes p; and finally (iii) A believes that B believes that A believes p, and B believes that A believes that B believes p. (Some theorists require further, or even an indefinite number of, recursive “layers” of believing/knowing. We need not worry about that presently.) Communicative acts are selected from amongst a range of possible ones in a way that is sensitive to information that is currently in common ground. Likewise, communicative acts are interpreted against this informational backdrop.
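Writing B_A p for “A believes that p,” the three clauses just listed can be stated compactly (this is simply a rendering of the definition as given; the fully iterated variant would add all further finite alternations of B_A and B_B):

```latex
\mathrm{CG}_{A,B}(p) \iff
\underbrace{B_A p \wedge B_B p}_{\text{(i)}}
\;\wedge\;
\underbrace{B_A B_B p \wedge B_B B_A p}_{\text{(ii)}}
\;\wedge\;
\underbrace{B_A B_B B_A p \wedge B_B B_A B_B p}_{\text{(iii)}}
```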
No doubt it can seem hard to believe that communication, particularly given its characteristic speed and fluidity, is actually underpinned by such cognition. And yet, even the most mundane aspects of our communication—resolving the referent of a generic referring expression (e.g., “the guy from the party”), for example—seem unexplainable without it. (Or, at least, that is how orthodoxy would have it.) As reflected by clause (iii), common ground cognition thus involves thinking about another’s thinking about one’s own thinking (in line with the remarks earlier in this article, “thinking” is construed broadly here). This is a form of metacognition, albeit a different sort of metacognition than we have so far discussed. Here, there is thinking about our own mental states. But such thinking is indirect, running via how another individual represents one’s own mental states.

Might there nevertheless be some connection between this form of cognition and direct metacognition? Some theorists have thought so. For example, Gärdenfors (1995) writes:

The final step in the evolution of higher-level inner representation is small but crucial for self-awareness in its proper sense: I must realize that the inner environment of my opponent does not only contain a representation of myself as a bodily agent, but as an agent with inner representations as well. I propose that it is only after this insight that the agent can become self-conscious in the sense that it can form representations of its own representations. … [S]elf-awareness can then develop as a shortcut in the representations involved in the deception game: I can, in my inner environment, have a representation of my own inner environment. However, I submit that this kind of self-awareness could never develop without the previous establishment of a representation of the inner environment of other individuals. (p. 270; citations and a footnote omitted)

In my view, it is very hard to evaluate such claims in the absence of a worked-out picture of the cognitive architecture underlying thinking about others’ thinking about one’s own thinking. And unfortunately, we still do not have such a picture. That said, it would not be surprising if the resources we use to construct a representation of another’s representation of our own mind overlap considerably with the resources we use to construct a representation of our own mind directly. If so, then to the extent that some selective process (natural selection, reinforcement learning, cultural learning) was busy building the former set of resources, it may well have been providing key elements of the latter set of resources, too.

To see the potential for overlap here: suppose I want to know whether my conversational partner believes that I can see the book on the table before me, for example. Answering this question seems to require that I take an external perspective (in this case, theirs) on my own perspective. Can he see that I can see the book? But once I can do this, it may well be that the presence of an actual other is not required; perhaps I can simply reflect directly on my perceptual perspective on things, by assuming an indirect standpoint on my perspective. Here I would be exploiting certain external cues to guide the self-attribution of perceptual states (e.g., where I am located in the environment, whether there is anything obstructing my line of sight to the book, etc.).
But this would also seem to open up the possibility of forming associations between perceptual states self-attributed “from the outside,” as it were, and the internal perceptual states of which I am consciously aware. Later, I would presumably be able to recognize that I am in said states on the basis of internal signals alone. No doubt these remarks are speculative. But that is fine, as their role here is simply to illustrate a type of possibility: that selection for tracking others’ perspectives on our own perspective may build much of the machinery that is necessary to track our own perspective directly. To the extent that this is true, the demands of human-style communication would go even farther towards explaining our explicit metacognitive capacities. But this is a topic for another day. For now, I am content to have provided a strong case for a deep connection between the two via the mechanisms of metacommunication.

4. From Great-Ape Communication to Human Conversation

Let me now offer a sketch of an evolutionary scenario linking the metacognitive capacities of the great apes with those of modern humans. Crucially, I intend this scenario as no more than a piece of the overall story that must eventually be provided here (and which at least some others are now attempting to provide, e.g., Proust 2016). However, it seems to me an important piece.

It is widely appreciated that great apes show very sophisticated communicative abilities. That is especially true in the gestural domain. It is estimated that their gesture repertoires can reach as many as 80 or so signs (Genty et al. 2009). Moreover, great apes regularly produce gesture sequences: these take the form either of gesture bouts (which are mainly iterations of the same gesture type) or rapid-fire sequences (which feature distinct gesture types) (Genty and Byrne 2010; Byrne et al. 2017). For a long time, it was assumed that great-ape vocalization was rather unremarkable, though there is increasing evidence that we have underestimated the sophistication and flexibility of their vocal signaling (see, e.g., Crockford et al. 2012; Schel et al. 2013; Crockford et al. 2017). Minimally, we can say that great-ape communication is multi-modal, just as ours is. So, too, then, would the communication system of our last common ancestor with chimps and bonobos have been.

Less widely known are the facts about great-ape response behavior, in part because this topic has received much less research attention to date. Recent work in this area suggests that in at least some contexts (e.g., mother-infant communication relating to travel), great apes approximate the very striking 200 msec response time seen in humans (Fröhlich et al. 2016; Rossano 2019). So, it is likely that the communicative interactions of the Pan-Human last common ancestor also approximated this response window. Strikingly, and importantly for the present argument, what we do not find is evidence for any of the metacommunicative phenomena highlighted above. There is no evidence (as of yet, anyway) that great apes request clarification when they do not understand the meaning of an utterance, nor any apparent form of backchanneling (signaling of understanding) (Fröhlich et al. 2016; Fröhlich and van Schaik 2022; Heesen et al. 2022).
The two possible exceptions here are the behaviors known as “persistence” and “elaboration.” These behaviors are produced when a communicative partner is unresponsive, and respectively take the form of repeating one’s gesture or producing an alternative one. One might see these as forms of self-initiated repair, though it is also possible to provide a leaner interpretation, not least because they are not pre-emptive. That said, such metacommunicative capacities are likely composed out of simpler cognitive and behavioral capacities, many of which appear to be shared with our primate relatives (and perhaps beyond) (see Heesen et al. 2022 and references therein).

Despite the absence of metacommunication in great apes, given the evidence for feelings of uncertainty in primates in general (see Proust 2019 for a good overview), it is plausible that great apes nonetheless have feelings of understanding and not understanding (or, equivalently, stronger and weaker feelings of understanding) what another agent is attempting to communicate to them. Or so I shall here assume. If so, then, as I see it, the main theoretical question we face at this point is how and why such feelings, which would have likewise been present in our last common ancestor with great apes, were recruited into metacommunicative service.

Let us take how first. What is particularly striking about the mental states we self-attribute during conversation is their specificity. As a general rule, during communication misfires, we do not simply feel at a total loss as to the meaning of some utterance. Rather, we form particular beliefs (or some similar propositional attitude) regarding the character of our misunderstanding or confusion. So, for example, I might represent myself as failing to understand/being unsure about which restaurant my partner meant when she said to me “Meet me at the restaurant at 7pm.” And it is the specificity of this self-attributed state that, in turn, enables me to request clarification with a specific repair initiator: “Meet you at which restaurant at 7pm?” or simply, “Which restaurant?”

Moreover—and this is a crucial point—the influence of the mental states we self-attribute in this domain is not limited to the selection of certain repair initiators (or backchanneling signals). Instead, they appear to be available for use across a range of cognitive processes. For example, if, for some reason, I am unable to request clarification from the speaker in the here and now (say, she has left a note for me), I can nonetheless reason in ways that are sensitive to and informed by the specific character of my uncertainty. Adapting the above example: I might call ahead to several restaurants and check whether any of them has a reservation under my partner’s name for 7pm. Examples such as these can be proliferated with ease. So, these discrete, contentful metacommunicative beliefs interface smoothly not only with our linguistic faculty, but with other cognitive faculties (e.g., reasoning, planning, practical decision-making, etc.), too.

What I would like to propose, then, is this: feelings of understanding and not understanding others’ utterances that were present in early hominins came to play a central role in our communication by way of being conceptually interpreted by our specialist theory of mind system.
It is quite plausible that this system had already seen an upgrade relative to the version present in other great apes by the time this development occurred (for a discussion of the possible nature and evolution of this upgrade, see Planer 2021), though on that issue I shall remain neutral in this article. The crucial thing to note for present purposes is that, in line with my remarks in Section 2, this development required non-trivial changes to the architecture of our theory of mind system. In particular, the specifications of that system had to be tweaked to allow it to operate efficiently on these internally generated (and perhaps externally generated, too) signals.

But why? What benefit(s) drove this change? In brief: because doing so improved the quality of real-time human communication. Or more fully put: as hominins transitioned from living in a dense rainforest habitat to a more open and arid savanna-woodlands habitat, they confronted a range of new challenges. Novel food sources had to be identified and exploited, which likely required novel technologies and techniques. Moreover, they faced a range of novel threats (e.g., new or intensified predation threats). All this is widely agreed. It is also now increasingly agreed that our ancestors very likely met these challenges through enhanced cooperation and coordination. Collective defense, collaborative foraging (in the form of, e.g., aggressive scavenging, and later group hunting), and cooperative breeding were all likely early and important players (see, e.g., Hrdy 1999, 2009; Tomasello 2010; Tomasello et al. 2012; Sterelny 2012; van Schaik 2016; Planer and Sterelny 2021). Such forms of cultural learning, cooperation, and coordination created strong selection pressures for enhanced communication, including communication that was more efficient. Given the central role of communication in each of these areas, any upgrade in our communicative capacities very likely produced multiple fitness-relevant benefits. But most obviously: more efficient communication meant more efficient collective action.

Much of our conversation, however, flies free of specific practical aims. We enjoy telling stories, telling jokes, and simply making “small talk.” Put differently, we often (though not always, of course) enjoy conversation simply for conversation’s sake (i.e., we often find it intrinsically rewarding). There is every ethnographic indication that this is a human universal. So, one would expect the evolution of this tendency to lie deep in our sapiens past, and it would not be surprising if it evolved earlier still. No doubt the complete evolutionary story here is bound to be complex. But it seems plausible to think that once communication took on “a life of its own,” so to speak, those individuals who were better equipped for conversation would have had at least some social advantage. For example, assuming that part of the function of conversation is to promote social bonding (Dunbar 1998), those with enhanced conversational skills would have been better at navigating and managing their social worlds.

Bottom line: given the foundational role of conversational interaction in so many areas of human life, and given the direct social importance of such interaction, it is very plausible that those who were better conversationalists would have enjoyed some fitness advantage. This advantage, I propose, would have been more than enough to drive selection for the above-hypothesized changes to the architecture of humans’ theory of mind system.
Acknowledgments

I would like to thank Mark Dingemanse and Nick Enfield (both of whom signed their referee reports on an earlier version of this manuscript) for their valuable feedback.

References

Albert, S., & de Ruiter, J. P. (2018). Repair: The interface between interaction and cognition. Topics in Cognitive Science, 10(2), 279–313. doi: 10.1111/tops.12339
Bar-On, D. (2013). Origins of meaning: Must we ‘go Gricean’? Mind & Language, 28(3), 342–375.
Bar-On, D., & Moore, R. (2017). Pragmatic interpretation and signaler-receiver asymmetries in animal communication.
Barr, D. J., & Keysar, B. (2005). Making sense of how we make sense: The paradox of egocentrism in language use. In Figurative language comprehension: Social and cultural influences (pp. 21–41).
Barrett, H. C. (2005). Enzymatic computation and cognitive modularity. Mind & Language, 20(3), 259–287.
Bauman, R., & Sherzer, J. (Eds.). (1989). Explorations in the ethnography of speaking (No. 8). Cambridge University Press.
Byrne, R. W., Cartmill, E., et al. (2017). Great ape gestures: Intentional communication with a rich set of innate signals. Animal Cognition, 20(4), 755–769.
Carruthers, P. (2009). Mindreading underlies metacognition. Behavioral and Brain Sciences, 32(2), 164–182.
Carruthers, P. (2011). The opacity of mind: An integrative theory of self-knowledge. Oxford University Press.
Carruthers, P. (2021). Explicit nonconceptual metacognition. Philosophical Studies, 178(7), 2337–2356.
Carruthers, P., & Ritchie, J. B. (2012). The emergence of metacognition: Affect and uncertainty in animals. In Foundations of metacognition (pp. 76–93).
Chambres, P., Izaute, M., & Marescaux, P. J. (Eds.). (2002). Metacognition: Process, function, and use. Springer Science & Business Media.
Clark, A. (2006). Language, embodiment, and the cognitive niche. Trends in Cognitive Sciences, 10(8), 370–374.
Clark, H. H. (1996). Using language. Cambridge University Press.
Crockford, C., Wittig, R. M., et al. (2012). Wild chimpanzees inform ignorant group members of danger. Current Biology, 22(2), 142–146.
Crockford, C., Wittig, R., et al. (2017). Vocalizing in chimpanzees is influenced by social-cognitive processes. Science Advances, 3(11).
Dingemanse, M. (2020). Resource-rationality beyond individual minds: The case of interactive language use. Behavioral and Brain Sciences, 43, 23–24.
Dingemanse, M., Roberts, S. G., Baranova, J., Blythe, J., Drew, P., Floyd, S., ... & Enfield, N. J. (2015). Universal principles in the repair of communication problems. PLoS One, 10(9), e0136100.
Dunbar, R. I. M. (1998). Grooming, gossip, and the evolution of language. Harvard University Press.
Dunlosky, J., & Metcalfe, J. (2009). Metacognition. Sage Publications.
Enfield, N. J. (2017). How we talk: The inner workings of conversation. Basic Books.
Enfield, N. J., & Sidnell, J. (2022). Consequences of language: From primary to enhanced intersubjectivity. MIT Press.
Enfield, N. J., Dingemanse, M., Baranova, J., Blythe, J., Brown, P., Dirksmeyer, T., ... & Torreira, F. (2013). Huh? What? – A first survey in 21 languages. In Conversational repair and human understanding (pp. 343–380). Cambridge University Press.
Evans, N., & Levinson, S. C. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32(5), 429–448.
Fitch, W. T. (2010). The evolution of language. Cambridge University Press.
Fröhlich, M., & van Schaik, C. P. (2022). Social tolerance and interactional opportunities as drivers of gestural redoings in orang-utans. Philosophical Transactions of the Royal Society B, 377(1859), 20210106.
Fröhlich, M., Kuchenbuch, P., Müller, G., Fruth, B., Furuichi, T., Wittig, R., & Pika, S. (2016). Unpeeling the layers of language: Bonobos and chimpanzees engage in cooperative turn-taking sequences. Scientific Reports, 6, 25887.
Gärdenfors, P. (1995). Cued and detached representations in animal cognition. Behavioural Processes, 35(1–3), 263–273.
Genty, E., & Byrne, R. W. (2010). Why do gorillas make sequences of gestures? Animal Cognition, 13(2), 287–301.
Genty, E., Breuer, T., et al. (2009). Gestural communication of the gorilla (Gorilla gorilla): Repertoire, intentionality and possible origins. Animal Cognition, 12(3), 527–546.
Goldman, A. I. (2006). Simulating minds: The philosophy, psychology, and neuroscience of mindreading. Oxford University Press.
Haig, D. (2008). Conflicting messages: Genomic imprinting and internal communication. In Sociobiology of communication (pp. 209–223).
Heesen, R., Fröhlich, M., Sievers, C., Woensdregt, M., & Dingemanse, M. (2022). Coordinating social action: A primer for the cross-species investigation of communicative repair. Philosophical Transactions of the Royal Society B, 377(1859), 20210110.
Heyes, C., Bang, D., Shea, N., Frith, C. D., & Fleming, S. M. (2020). Knowing ourselves together: The cultural origins of metacognition. Trends in Cognitive Sciences, 24(5), 349–362.
Hockett, C. F. (1966). The problem of universals in language. In J. H. Greenberg (Ed.), Universals of language (pp. 1–29). MIT Press.
Holler, J., & Levinson, S. C. (2019). Multimodal language processing in human communication. Trends in Cognitive Sciences, 23(8), 639–652.
Hömke, P., Holler, J., & Levinson, S. C. (2018). Eye blinks are perceived as communicative signals in human face-to-face interaction. PLoS One, 13(12), e0208030.
Hrdy, S. B. (2009). Mothers and others: The evolutionary origins of mutual understanding. Harvard University Press.
Hrdy, S. B. (2016). Development plus social selection in the emergence of “emotionally modern” humans. In Childhood: Origins, evolution, and implications (pp. 11–44).
Jablonka, E., & Lamb, M. J. (2014). Evolution in four dimensions, revised edition: Genetic, epigenetic, behavioral, and symbolic variation in the history of life. MIT Press.
Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness (No. 6). Harvard University Press.
Karmiloff-Smith, A. (1991). Beyond modularity: Innate constraints and developmental change. In The epigenesis of mind: Essays on biology and cognition (pp. 171–197).
Levinson, S. C. (1983). Pragmatics. Cambridge University Press.
Levinson, S. C. (2020). On the human “interaction engine”. In Roots of human sociality (pp. 39–69). Routledge.
Levinson, S. C. (forthcoming). The interaction engine and the evolution of language.
Levinson, S. C., & Enfield, N. J. (Eds.). (2020). Roots of human sociality: Culture, cognition and interaction. Routledge.
Lewis, D. (1969). Convention: A philosophical study. John Wiley & Sons.
Machery, E. (2008). Massive modularity and the flexibility of human cognition. Mind & Language, 23(3), 263–272.
Metcalfe, J. (2008). Evolution of metacognition. In Handbook of metamemory and memory (pp. 29–46).
Micklos, A., & Woensdregt, M. (2022, August 19). Cognitive and interactive mechanisms for mutual understanding in conversation. PsyArXiv. doi: 10.31234/osf.io/aqtfb
Moore, R. (2017). Pragmatics-first approaches to the evolution of language. Psychological Inquiry, 28(2–3), 206–210.
Moore, R. (2018). Gricean communication, language development, and animal minds. Philosophy Compass, 13(12), e12550.
Nelson, T. O. (Ed.). (1992). Metacognition: Core readings.
Nichols, S., & Stich, S. P. (2003). Mindreading: An integrated account of pretence, self-awareness, and understanding other minds. Clarendon Press/Oxford University Press.
Nicholson, T., Williams, D. M., Lind, S. E., Grainger, C., & Carruthers, P. (2021). Linking metacognition and mindreading: Evidence from autism and dual-task investigations. Journal of Experimental Psychology: General, 150(2), 206.
Oakhill, J. (2020). Four decades of research into children’s reading comprehension: A personal review. Discourse Processes, 57(5–6), 402–419.
Peters, P., & Wong, D. (2015). Turn management and backchannels. In Corpus pragmatics: A handbook (pp. 408–429). Cambridge University Press.
Planer, R. J. (2017a). How language couldn’t have evolved: A critical examination of Berwick and Chomsky’s theory of language evolution. Biology & Philosophy, 32(6), 779–796.
Planer, R. J. (2017b). Protolanguage might have evolved before ostensive communication. Biological Theory, 12(2), 72–84.
Planer, R. J. (2017c). Talking about tools: Did early Pleistocene hominins have a protolanguage? Biological Theory, 12(4), 211–221.
Planer, R. J. (2021). Theory of mind, System-2 thinking, and the origins of language. In Explorations in Archaeology and Philosophy (pp. 171–195). Springer, Cham.
Planer, R. J., & Godfrey-Smith, P. (2020). Communication and representation understood as sender–receiver coordination. Mind & Language.
Planer, R., & Sterelny, K. (2021). From signal to symbol: The evolution of language. MIT Press.
Proust, J. (2006). Rationality and metacognition in non-human animals.
Proust, J. (2007). Metacognition and metarepresentation: Is a self-directed theory of mind a precondition for metacognition? Synthese, 159(2), 271–295.
Proust, J. (2010). Metacognition. Philosophy Compass, 5(11), 989–998.
Proust, J. (2012). Metacognition and mindreading: One or two functions? In Foundations of metacognition (pp. 234–251).
Proust, J. (2016). The evolution of primate communication and metacommunication. Mind & Language, 31(2), 177–203.
Proust, J. (2019). From comparative studies to interdisciplinary research on metacognition. Animal Behavior and Cognition, 6(4), 309–328.
Rossano, F. (2019). The structure and timing of human vs. primate social interaction. In P. Hagoort (Ed.), Human language (pp. 201–220). MIT Press.
Schegloff, E. A. (1968). Sequencing in conversational openings. American Anthropologist, 70(6), 1075–1095.
Schegloff, E. A. (1992). Repair after next turn: The last structurally provided defense of intersubjectivity in conversation. American Journal of Sociology, 97(5), 1295–1345.
Schegloff, E. A. (2006). Interaction: The infrastructure for social institutions, the natural ecological niche for language, and the arena in which culture is enacted. In Roots of human sociality (pp. 70–96). Routledge.
Schegloff, E. A., Jefferson, G., & Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language, 53(2), 361–382.
Schel, A. M., Townsend, S. W., et al. (2013). Chimpanzee alarm call production meets key criteria for intentionality. PLoS One, 8(10), e76674.
Searcy, W. A., & Nowicki, S. (2010). The evolution of animal communication. Princeton University Press.
Seifart, F., Evans, N., Hammarström, H., & Levinson, S. C. (2018). Language documentation twenty-five years on. Language, 94(4), e324–e345.
Shea, N. (2018). Metacognition and abstract concepts. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1752), 20170133.
Shea, N. (2020). Concept-metacognition. Mind & Language, 35(5), 565–582.
Shea, N., Boldt, A., Bang, D., Yeung, N., Heyes, C., & Frith, C. D. (2014). Supra-personal cognitive control and metacognition. Trends in Cognitive Sciences, 18(4), 186–193.
Shimamura, A. P., & Metcalfe, J. (1994). The neuropsychology of metacognition. In Metacognition: Knowing about knowing (pp. 253–276).
Skyrms, B. (2010). Signals: Evolution, learning, and information. Oxford University Press.
Soto, C., Gutiérrez de Blume, A. P., Jacovina, M., McNamara, D., Benson, N., & Riffo, B. (2019). Reading comprehension and metacognition: The importance of inferential skills. Cogent Education, 6(1), 1565067.
Sperber, D., & Origgi, G. (2012). A pragmatic perspective on the evolution of language. In Meaning and relevance (p. 331).
Sperber, D., & Wilson, D. (1986). Relevance: Communication and cognition. Harvard University Press.
Spurrett, D. (2016). Does intragenomic conflict predict intrapersonal conflict? Biology & Philosophy, 31(3), 313–333.
Stalnaker, R. (2002). Common ground. Linguistics and Philosophy, 25(5/6), 701–721.
Sterelny, K. (2012). Language, gesture, skill: The co-evolutionary foundations of language. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1599), 2141–2151.
Sterelny, K. (2012). The evolved apprentice. MIT Press.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Harvard University Press.
Tomasello, M. (2008). Origins of human communication. MIT Press.
Tomasello, M. (2010). Origins of human communication. MIT Press.
Trivers, R. (2000). The elements of a scientific theory of self-deception. Annals of the New York Academy of Sciences, 907(1), 114–131.
Trivers, R. (2011). Deceit and self-deception: Fooling yourself the better to fool others. Penguin UK.
van Schaik, C. P. (2016). The primate origins of human nature. John Wiley & Sons.
Weinert, F. E., & Kluwe, R. H. (1987). Metacognition, motivation, and understanding.
West-Eberhard, M. J. (2003). Developmental plasticity and evolution. Oxford University Press.
White, S. (1989). Backchannels across cultures: A study of Americans and Japanese. Language in Society, 18(1), 59–76.