License: CC BY 4.0
arXiv:2401.00009v2 [cs.AI] 07 Jan 2024

Turing’s Test, a Beautiful Thought Experiment

Bernardo Gonçalves (ORCID 0000-0003-2794-8478), Polytechnic School, University of São Paulo, Brazil, begoncalves@usp.br. Also Visiting Fellow at King’s College and Research Affiliate at the Department of History and Philosophy of Science, University of Cambridge, UK.
(2024)
Abstract.

In the wake of large language models, there has been a resurgence of claims and questions about the Turing test and its value for AI, which are reminiscent of decades of practical “Turing” tests. If AI were quantum physics, by now several “Schrödinger’s” cats could have been killed. Better late than never, it is time for a historical reconstruction of Turing’s beautiful thought experiment. In this paper I present a wealth of evidence, including new archival sources, give original answers to several open questions about Turing’s 1950 paper, and address the core question of the value of Turing’s test.

Alan Turing, Turing test, Thought experiment, Foundations of AI & computer science, Galileo Galilei, History of science, History of AI
Figure 1. Alan Turing (1912-1954). Photographs of Alan Turing, copyright The Provost and Scholars of King’s College Cambridge 2023. Archives Centre, King’s College, Cambridge, AMT/K/7/12. Reproduced with permission.

1. Introduction

In recent decades, the Turing test has been used as a practical experiment for publicity purposes and aptly criticized (Shieber, 1994a, b; Vardi, 2014), and has been the whipping boy of AI (Hayes and Ford, 1995), cognitive sciences and analytic philosophy (Shieber, 2004), and increasingly, with the rise of AI, the humanities and social sciences (Brynjolfsson, 2022). It is not uncommon for criticism coming from all these areas to take Turing’s test literally, assuming that he encouraged deception as a criterion and/or proposed a crucial experiment to establish the existence of machine intelligence.

Now, in the wake of large language models, science and technology outlets ask whether Turing’s test can be a ‘benchmark’ for AI (Biever, 2023), and whether it is ‘dead’ (Wells, 2023). Based on recent primary research (Gonçalves, 2023e, a, b, c, d), in this paper I present a mass of evidence, including newly discovered archival sources, and a new perspective on Turing’s test.

I address a few problems that we will keep track of as we go along: ($P_1$) Why would Turing design a test based on imitation, which can be seen as encouraging deception? ($P_2$) Why present multiple versions of a test, as opposed to a well-defined, controlled experiment? ($P_3$) Why gender imitation in a test of machine intelligence, and why conversation for an intelligence task? ($P_4$) Why are practical “Turing” tests circular, as Hayes and Ford (Hayes and Ford, 1995) claimed? Finally, the core problem, ($P_5$) What is the value of Turing’s test for AI?

Section 2 presents a reading of Turing’s 1950 paper. Section 3 introduces newly discovered archival sources and examines Turing’s concept of imitation, addressing $P_1$. Section 4 shows that his presentation of the imitation game fits “the basic method of thought experiments” (Mach, 1897), addressing $P_2$. Section 5 reconstructs the historical conditions of Turing’s proposal, addressing $P_3$. Section 6 draws a parallel with the history of “the most beautiful experiment in the history of science” (Palmieri, 2005), addressing $P_4$. Finally, Section 7 revisits the history of AI, addressing $P_5$, and Section 8 concludes the paper.

2. What is the Turing Test?

In 1950, Alan Turing (Fig. 1) published the second of his three seminal papers, ‘Computing machinery and intelligence’ (Turing, 1950). The text has 28 pages, divided into seven sections, §1-§7. Three main logical steps can be identified in his argument: the proposal (§1-§3, 3+ pp.), the science (§4-§5, 6 pp.), and the discussion (§6-§7, 18+ pp.).

The proposal sought to replace the question “Can machines think?,” which he considered “too meaningless to deserve discussion” (p. 442),[1] with the imitation game. The purpose was to change the common meaning of the word ‘machine’ (e.g., a steam engine, a bulldozer) in light of the new mathematical science of ‘universal’ digital computing. The imitation game would allow for a grounded discussion of ‘machine’ and ‘thinking,’ seeking to expand the meaning of ‘thinking’ and detach it from the human species, much as the meaning of ‘universe’ was once detached from the Earth.

[1] Turing was coming from unstructured multidisciplinary debates in at least two editions of a seminar, “Mind and Machine,” held at the Philosophy Department of Manchester University in October and December, 1949. Of the latter, one participant wrote in a Christmas postcard sent to Warren McCulloch: “I wish you had been with us a few days ago we had an amusing evening discussion with Thuring [sic], Williams, Max Newman, Polyani [sic], Jefferson, J Z Young & myself … An electronic analyser and a digital computer (universal type) might have sorted the arguments out a bit.” Jules Y. Bogue to McCulloch, c. December 1949. American Philosophical Society, Warren S. McCulloch Papers, Mss.B.M139_005. Thanks to J. Swinton for this archival finding.

In 1950, one of the OED definitions of ‘machine’ was:[2] “a combination of parts moving mechanically as contrasted with a being having life, consciousness and will … Hence applied to a person who acts merely from habit or obedience to a rule, without intelligence, or to one whose actions have the undeviating precision and uniformity of a machine.” Thus, by definition, common sense did not allow the meanings of ‘machine’ and ‘thinking’ to overlap. Despite Turing’s emphasis in his opening paragraph that he did not intend to discuss how these words were “commonly used” (p. 433), the hostility to his proposal can be seen from one of the first reactions, from a participant in the 1949 Manchester seminars, who quoted the above OED definition to appeal to common sense (Mays, 1952).

[2] New English Dictionary. Oxford, Vol. VI, Part II, M-N, p. 7.

The new question, which Turing considered to have a “more accurate form” (Turing, 1950, p. 442), would be based on a vivid image, his “criterion for ‘thinking’ ” (p. 436), which he called interchangeably the ‘imitation game’ and his ‘test.’[3] The new question is whether a machine, playing A, the deceiver, can imitate a woman, a man, a human being, or another machine, playing B, the assistant, in a remotely played conversation game, and deceive an average interrogator, playing C, the judge, about its machine condition.

[3] For Turing’s exact references to his test in all known sources, see (Gonçalves, 2023e, p. 2).

However, the details and exact conditions of the imitation game as an experiment slipped through Turing’s text in a series of variations that defies interpretation. A structural reading of the text identifies four different conditions of the game with respect to players A-B, namely, man-woman (p. 433), machine-woman (p. 434), machine-machine (p. 441), and machine-man (p. 442). These different conditions relate to four variants of the “new” question that Turing posed to replace his “original” question (see Box 1). In addition to varying the genus/species (types) of the players, he also increased the storage and speed of the machine and provided it with a hypothetically appropriate program ($Q'''$), and suggested a base time for the interrogation session ($Q''''$). Other seemingly relevant parameters were not mentioned, such as the number of interrogators used to arrive at a statistically sound conclusion. Their profile, however, is mentioned (they should be “average”) and later reiterated (they “should not be expert about machines”).[4]

[4] ‘Can automatic calculating machines be said to think?’, Broadcast on BBC Third Programme, 14 and 23 Jan. 1952. Archives Centre, King’s College, Cambridge, AMT/B/6.

BOX 1
 
Turing’s various questions and conditions
 
$Q$: “I propose to consider the question, ‘Can machines think?’ ” (p. 433)
$Q'$: “ ‘What will happen when a machine takes the part of A in this [man-woman] game?’ Will the interrogator decide wrongly as often when the game is played like this [machine-woman] as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’ ” (pp. 433-434)
$Q''$: “There are already a number of digital computers in working order, and it may be asked, ‘Why not try the experiment straight away? It would be easy to satisfy the conditions of the game. A number of interrogators could be used, and statistics compiled to show how often the right identification was given.’ The short answer is that we are not asking whether all digital computers would do well in the game nor whether the computers at present available would do well, but whether there are imaginable computers which would do well.” (p. 436)
$Q'''$: “It was suggested tentatively that the question [$Q$] should be replaced by [$Q''$] … But in view of the universality property we see that either of these questions is equivalent to this, ‘Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate programme, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?’ ” (p. 442)
$Q''''$: “I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about $10^{9}$, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent. chance of making the right identification after five minutes of questioning.” (p. 442)
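
Read literally, the figures in $Q''''$ and the “statistics compiled” of $Q''$ amount to a simple arithmetic criterion. The following minimal sketch only unpacks that arithmetic; it is not a proposal for running the test. The function names and the session data are hypothetical and appear nowhere in Turing’s text.

```python
def identification_rate(verdicts):
    """Fraction of sessions in which the interrogator made the right identification."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(verdicts) / len(verdicts)


def within_1950_figure(verdicts, threshold=0.70):
    """True when right identifications do not exceed the 70 per cent figure of Q''''."""
    return identification_rate(verdicts) <= threshold


# Hypothetical verdicts from ten five-minute sessions:
# True means the interrogator identified the machine correctly.
sessions = [True, False, True, True, False, True, False, True, True, False]

print(identification_rate(sessions))   # 0.6
print(within_1950_figure(sessions))    # True
```

Note that the sketch leaves open exactly what Turing left open: how many interrogators and sessions would make the estimate statistically sound.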

3. Imitation: from 1936 to 1950

Because the machine must imitate stereotypes of what it is not, Turing’s proposal has often been criticized for encouraging fakes and tricks. But this criticism rests on the literal reading, according to which Turing meant his test as a practical experiment about short conversations, gendered machines, and how to fine-tune them to fool average human interrogators. It misses the point of Turing’s irony (Gonçalves, 2023b) and the fact that his notion of imitation in 1950 was largely in continuity with his 1936 paper (Turing, 1936), as hinted at in his 1947 lecture,[5] and as newly discovered correspondence with the Mexican-Canadian computer pioneer Beatrice Worsley (1921-1972) helps to clarify (see Box 2).

[5] ‘Lecture to L.M.S. Feb. 20 1947.’ Archives Centre, King’s College, Cambridge, AMT/B/1.

In his letter to Worsley, Turing seems to be more interested in the relations between “the motions” of Turing machines and infinite computers, whose behavior can be non-periodic. Perhaps he thought of the living human brain as an infinite computer, in the sense that it has a continuous interface with its environment, which constantly intervenes and changes its logical structure.[6] Now, the imitation game puts into empirical form the relation between digital computers, whose behavior is ultimately periodic, and the behavior of the human players. Can the behavior of their brains be approximated by a digital computer? Turing pursued this question. For his May 1951 broadcast, he wrote: “the view which I hold myself, that it is not altogether unreasonable to describe digital computers as brains … If it is accepted that real brains, as found in animals, and in particular in men, are a sort of machine it will follow that our digital computer, suitably programmed, will behave like a brain.”[7]

[6] Cf. “Intelligent Machinery”, written in 1948 as a technical report to the National Physical Laboratory. Archives Centre, King’s College, Cambridge, AMT/C/11.
[7] ‘Can digital computers think?’, broadcast on BBC Third Programme, 15 May 1951. Archives Centre, King’s College, Cambridge, AMT/B/5.

Even if the human brain can only be compared to an infinite computer, could it not be simulated by a digital computer equipped with a sufficiently large memory? An excerpt of another newly discovered Turing letter to Worsley from mid-1951 can give more contour and provide further insight into Turing’s views (see Box 3).

BOX 2
 
Turing’s mathematical concept of imitation
 
Dear Miss Worsley,
I was interested in your work on the relation between computers and Turing machines. I think it would be better though if you could try and find a realtion [sic] between T machines and infinite computers, rahter [sic] than between finite T machines and computers. The relation that you suggest is rather too trivial. The fact is that the motions of either a finite T machine or a finite computer are ultimately periodic, and therefore any sequence computed by them is ultimately periodic. It is easy therefore in theory to make one imitate the other, though the size of the imitating machine will (if this technique is adopted) have to be of the order of the exponential of the size of the imitated machine. Probably your methods could prove that this exponential relation could be reduced to a multiplicative factor.
Yours sincerely, A. M. Turing[8]

[8] Turing to B. H. Worsley, June 11, 1951, typeset; emphasis added. Unpublished writings of Alan Turing, copyright The Provost and Scholars of King’s College Cambridge 2023. B.H. Worsley Collection, Archives Center, National Museum of American History, Smithsonian Institution. Quoted with permission. Thanks to Mark Priestley for finding and kindly sharing this source.
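
Turing’s remark that the motions of a finite machine are “ultimately periodic” can be illustrated with a short sketch. A deterministic machine with finitely many states must eventually revisit a state, and from then on it repeats itself. The transition rule `toy_step` below is a hypothetical ten-state example chosen only for illustration; it does not come from the letter.

```python
def preperiod_and_period(step, start):
    """Return (mu, lam): the trajectory start, step(start), ... repeats with
    period lam from index mu onward. Terminates whenever the state set is finite."""
    seen = {}                 # state -> first time index at which it occurred
    state, t = start, 0
    while state not in seen:
        seen[state] = t
        state = step(state)
        t += 1
    mu = seen[state]          # index where the cycle begins
    return mu, t - mu         # (length of the transient, length of the cycle)


def toy_step(s):
    """Hypothetical finite machine whose states are the integers 0..9."""
    return (s * s + 1) % 10


print(preperiod_and_period(toy_step, 3))   # (1, 6)
```

Starting from state 3, the machine takes one step to reach state 0 and then cycles through 0, 1, 2, 5, 6, 7 forever. An infinite computer, by contrast, need never revisit a state, which is why its behavior can be non-periodic.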

A highlight in this excerpt is Turing’s view that to the extent that the behavior of a neuron can be described as a stochastic process, it would be possible to “calculate random samples” of the mechanism that embodies the brain and then imitate it.[9] An effective imitation of the brain by a machine would require knowledge of the anatomy and physiology of the brain to inspire an appropriate program, as well as much more storage and speed than was available to the Ferranti Mark I at the time (see Fig. 2). Another important element in the excerpt is Turing’s point that, even if a thinking machine is possible, the relation he has in mind is not one of identity but one of analogy: “It’ll just be another species of the thinking genus.”

[9] Note the connection with the imitation game and his wartime experience studying and imitating the behavior of the Enigma machines used by the Nazi forces (Hodges, 1983).

An original answer to problem $P_1$, why design a test based on imitation, which can be seen as encouraging deception, is as follows. Actually, imitation was Turing’s fundamental principle of the new science of universal digital computing. He conceived his 1950 paper largely in continuity with his 1936 paper. Both were based on his core concepts of machine and imitation, i.e., what it takes for a machine to imitate another machine. A point of difference is that by 1950 he had generalized the machine architecture and how ‘universal’ imitation can be achieved. Using Turing’s 1948 language,[10] universality could be achieved with an ‘organized’ machine (1936), or with an ‘unorganized’/‘learning’ machine (1948/1950). Whereas in 1936 the machine would be given an a priori, specified, and fixed table of instructions for each task, in 1950 it would also be able to perform a new task by changing its logical structure as a result of learning from experience, much as the brain does, “by changing its neuron circuits by the growth of axons and dendrites.”[11]

[10] “Intelligent Machinery,” op. cit.
[11] Turing to Ross Ashby, circa November 19, 1946. British Library, Collection ‘W. Ross Ashby: correspondence of W. Ross Ashby’, Add MS 89153/26.

The kind of forgery and trickery that occurs in publicity-oriented practical “Turing” tests has nothing to do with Turing’s 1950 proposal. In 1951, he warned: “It would be quite easy to arrange the [machine’s] experiences in such a way that they automatically caused the structure of the machine to build up into a previously intended form, and this would obviously be a gross form of cheating, almost on a par with having a man inside the machine.”[12] The “human fallibility” that Turing encouraged the machine to show was not meant to be artificially introduced, but rather a by-product of learning from experience. This 1950 passage clarifies: “Another important result of preparing our machine for its part in the imitation game by a process of teaching and learning is that ‘human fallibility’ is likely to be omitted in a rather natural way, i.e., without special ‘coaching’ ” (Turing, 1950, p. 459). That is, for the machine to be a valid player, it must not be specially prepared for the test. This is one of the reasons why we have never seen a practical Turing test.

[12] ‘Intelligent machinery, a heretical theory’, a lecture given to ‘51 Society’ at Manchester, c. 1951. Archives Centre, King’s College, Cambridge, AMT/B/4.

BOX 3
 
Simulation of neuron processes
 
Dear Miss Worsley, …
I do not think you will be able to find any clue to essential differences between brains and computing machines (if there are any), in neuron behaviour. So long as what we know about a neuron can be embodied in the description of stochastic processes, the behaviour of any mechanism embodying such neurons can, in principle, be calculated by a suitable enlarged and speeded up Ferranti [Mark II] machine.[13] More accurately I should say that one can calculate random samples of its behaviour. I think any attempt to draw any sharp line between what machine and brain can do will fail. I think it is largely a quantitative matter. Probably one needs immensely more storage capacity then [sic] we have got, and possibly more than we shall ever have. Perhaps we may have enough capacity, but just won’t find an appropriate programme. Naturally one won’t make a man that way ever. It’ll just be another species of the thinking genus.
Yours sincerely, A. M. Turing[14]

[13] ‘Ferranti’ is typed and erased, and ‘Mark II’ is added in pencil, referring to a version of the evolving Manchester electronic computer.
[14] Turing to B. H. Worsley, circa June, 1951, Turing’s emphasis. Credit for this source is exactly the same as for that of Box 2. Quoted with permission.
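
What it might mean to “calculate random samples” of the behaviour of a mechanism built from stochastic neurons can be suggested by a toy sketch. Everything in it is an assumption made for illustration: the firing rule, the weights, and the name `sample_behaviour` are hypothetical and are not Turing’s model of a neuron.

```python
import random


def sample_behaviour(weights, bias, steps, seed):
    """Draw one random sample of firing patterns from a toy stochastic network.

    Assumed rule (not Turing's): neuron i fires at the next step with a
    probability equal to bias plus the weighted sum of currently firing
    neurons, clamped to the interval [0, 1].
    """
    rng = random.Random(seed)      # seeded, so a given sample is reproducible
    n = len(weights)
    state = [0] * n                # all neurons silent at the start
    history = []
    for _ in range(steps):
        nxt = []
        for i in range(n):
            drive = bias + sum(weights[i][j] * state[j] for j in range(n))
            p_fire = min(1.0, max(0.0, drive))
            nxt.append(1 if rng.random() < p_fire else 0)
        state = nxt
        history.append(tuple(state))
    return history


# Hypothetical three-neuron mechanism.
W = [[0.0, 0.5, 0.0],
     [0.3, 0.0, 0.4],
     [0.0, 0.6, 0.0]]

for seed in (1, 2):
    print(sample_behaviour(W, bias=0.2, steps=5, seed=seed))
```

Each seed yields one sample of the mechanism’s behaviour rather than the behaviour itself, which is the distinction Turing draws when he corrects himself with “More accurately.”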
Figure 2. Control console of Ferranti Mark I and a group with Turing’s secretary at the Computing Machine Laboratory, Sylvia Robinson, pretending to play chess with the machine, c. 1955. Courtesy of The University of Manchester.

4. The method of thought experiments

The various rhetorical questions Turing posed, $Q' \ldots Q''''$, to replace the original question, $Q$, can be generalized as follows (Gonçalves, 2023e):

Question $Q^{\star}$: could player A imitate intellectual stereotypes associated with player B’s type successfully (well enough to deceive player C), despite A and B’s physical differences?

It has been largely unnoticed that the various questions instantiating $Q^{\star}$ follow a case-control methodology, applied in two stages. At the more obvious intra-game level, A plays the case, and B plays the control. However, at the inter-game level, two variants set the case (machine-woman and machine-man) and the other two set the control (man-woman and machine-machine). While the first two are open, creating suspense around the test, the latter two are resolved as follows. It is known that a man (A) can possibly imitate gender stereotypes associated with a woman (B) to the point of deceiving an interrogator (C) despite their physical differences. This is the very premise of the parlor games that existed at the time. Further, regarding the machine-machine variant, it is also known that a digital computer (A), because of its universality property, as Turing explained in the paper (Turing, 1950, §§4, 5), can possibly imitate any discrete-state machine (B), despite their architectural differences.
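
The two-level case-control layout described above can be tabulated in a minimal sketch. The class and field names are hypothetical labels introduced here for illustration; the player types and page numbers are those of the four conditions identified in Section 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    player_a: str   # the imitator (the intra-game "case")
    player_b: str   # the imitated type (the intra-game "control")
    page: int       # page of Turing (1950) where the condition appears
    role: str       # inter-game role: "case" (open) or "control" (resolved)


VARIANTS = [
    Variant("man", "woman", 433, "control"),
    Variant("machine", "woman", 434, "case"),
    Variant("machine", "machine", 441, "control"),
    Variant("machine", "man", 442, "case"),
]


def by_role(role):
    """List the A-B pairings that play the given inter-game role."""
    return [f"{v.player_a}-{v.player_b}" for v in VARIANTS if v.role == role]


print(by_role("case"))      # ['machine-woman', 'machine-man']
print(by_role("control"))   # ['man-woman', 'machine-machine']
```

Laid out this way, the two control variants are the ones whose outcome is already known, and the two case variants are the ones the thought experiment leaves open.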

We can now explore how Turing’s presentation of his test conforms to Ernst Mach’s reconstruction of “the basic method of thought experiments,” which is variation, continuously if possible. Mach is the author of perhaps the most classic text on thought experiments in the modern scientific tradition (Mach, 1897), in which he developed observations and insights based on a wealth of examples from the history of modern physics, mathematics, and common sense experience. He wrote: “By varying the conditions (continuously if possible), the scope of ideas (expectations) tied to them is extended: by modifying and specializing the conditions we modify and specialize the ideas, making them more determinate, and the two processes alternate” (p. 139). Mach illustrated his point with the process of discovery of universal gravitation (Mach, 1897, pp. 138-139):

A stone falls to the ground. Increase the stone’s distance from the earth, and it would go against the grain to expect that this continuous increase would lead to some discontinuity. Even at lunar distance the stone will not suddenly lose its tendency to fall. Moreover, big stones fall like small ones: the moon tends to fall to the earth. Our ideas would lose the requisite determination if one body were attracted to the other but not the reverse, thus the attraction is mutual and remains so with unequal bodies, for the cases merge into one another continuously … discontinuities are quite conceivable, but it is highly improbable that their existence would not have betrayed itself by some experience. Besides, we prefer the point of view that causes less mental exertion, so long as it is compatible with experience.

The conditions, i.e., the distance of the fall and the size of the stones, are continuously varied in the physicist’s mind and eventually stretched to the celestial scale. Reciprocally, the concept of a celestial body, such as the Earth or the Moon, becomes interchangeable with the concept of a stone, and quite unequal stones can then become mutually attracted. The cases continuously merge into one another, and a conceptual integration is established that connects near-earth bodies to celestial bodies under a unified concept.

Turing’s imitation game extended the scope of ideas and expectations established earlier in his 1936 paper, moving from machine-machine and restricted human-machine imitation in 1936[15] to more general human-machine imitation in 1950.

[15] “We may compare a man in the process of computing a real number to a machine which is only capable of a finite number of conditions” (Turing, 1936, p. 231).

To understand this better, let us take a brief look at Turing’s 1948 report “Intelligent Machinery” (op. cit.). In section (§3) ‘Varieties of machinery,’ he noted: “All machinery can be regarded as continuous, but when it is possible to regard it as discrete it is usually best to do so.” A brain, he wrote, “is probably” a ‘continuous controlling’ machine, but in light of the digital nature of neural impulses, it “is very similar to much discrete machinery.” In section (§6) ‘Man as Machine,’ he referred to the imitation of “any small part of a man” by machines: “A great positive reason for believing in the possibility of making thinking machinery is the fact that it is possible to make machinery to imitate any small part of a man” (p. 420). In light of this, he argued: “One way of setting about our task of building a ‘thinking machine’ would be to take a man as a whole and to try to replace all the parts of him by machinery.” But Turing dismissed such a method as “altogether too slow and impracticable,” and later alluded to moral and aesthetic reasons as well.[16]

[16] For the May 1951 broadcast (op. cit.), he wrote: “I certainly hope and believe that no great efforts will be put into making machines with the most distinctively human, but non-intellectual characteristics such as the shape of the human body; it appears to me to be quite futile to make such attempts and their results would have something like the unpleasant quality of artificial flowers.”

We can now follow Turing’s use of the method of continuous variation in the design of his imitation tests. The essential question ($Q^{\star}$) Turing asks is whether the intellectual and cultural performances (the stereotypes)[17] associated with woman, man, machine (the types) could be imitated, and thus softly transposed. Note that for any arbitrarily chosen type, say, a woman, further specific subtypes can be continuously conceived and considered as varied conditions of the imitation game: women having property $p$, women having subproperty $p' \subset p$, and so on. For any two arbitrarily chosen types, a new type can be conceived, whether as a specialization or a modification (“any small part of a man”). Because concepts are fluid entities, there is an evolving continuum of levels and types.

[17] S. Sterrett first emphasized the importance of stereotypes in the imitation game (Sterrett, 2000).

The question across the various versions of the game can be posed this way: how does C’s perception of A’s performance against B’s performance change as the game’s conditions are (continuously) varied? Will it change if gendered verbal behavior is required as a subtype of human verbal behavior? Will it change if the machine’s hardware is increased and/or its learning program is modified? For Turing, there is no conceptual discontinuity among the various conditions that instantiate his thought experiment.

From 1948 to 1952, Turing presented various imitation tests based on both the game of chess and conversation. This raises a historically sound problem, one that does not struggle with the materiality of Turing’s texts and their chronological coherence, does not erase some of his tests in favor of others, and does not ignore the historical conditions of his proposal: ($P_2$) Why would Turing present multiple versions of his test, as opposed to a well-defined, controlled experiment? I have presented an answer by reconstructing Turing’s use of “the method of thought experiments,” and this will be reinforced in the next section. Turing’s purpose in his 1950 paper was not to propose a ‘benchmark’ for the non-existent field of AI, but to respond to critics. Especially in 1949, he felt compelled to point out that the new science of universal digital computing would eventually have an impact and expand our view of ‘thinking.’

5. 1949, the crucial year

As is often the case with thought experiments, Turing proposed his test out of a controversy (Gonçalves, 2023d). He was coming from his continuing disputes with the physicist and computer pioneer, Fellow of the Royal Society (FRS), Douglas Hartree (1897-1958), over the meaning of the newly existing digital computers, which had started in 1946 (Gonçalves, 2023c). Now, in mid-1949, new opponents had arrived, notably the neurosurgeon Geoffrey Jefferson (1886-1961), and the chemist and philosopher Michael Polanyi (1891-1976), both also FRS and based at the same institution as Turing, the University of Manchester, where Turing had spent a year as a Reader in the Department of Mathematics (Hodges, 1983). These three thinkers challenged Turing’s claims about the future possibilities and limitations of digital computers.

In June 1949, Hartree published his Calculating Instruments and Machines (Hartree, 1949), in which Ada Lovelace’s work was acknowledged seemingly for the first time by a twentieth-century computer pioneer (Gonçalves, 2023c). Since November 1946, Hartree had been opposing the use of the term ‘electronic brain.’ He wrote in a letter to the Times: “These machines can only do precisely what they are instructed to do by the operators who set them up.”[18] Now in 1949, Hartree added strength to his argument by quoting the words of Ada Lovelace from the 1840s about Charles Babbage’s machine: “The Analytical Engine has no pretensions to originate anything … It can do whatever we know how to order it to perform” (her emphasis) (Hartree, 1949, p. 70). Noting Hartree’s anachronism in taking Lovelace’s words out of their time and place, Turing further developed his earlier, 1947 response to Hartree’s challenge,[19] now calling it ‘(6) Lady Lovelace’s objection’ (Turing, 1950, p. 450). Turing argued that intelligent behavior is the result of learning, a capability he had no problem attributing to future digital computers. He also questioned the implicit assumption of Hartree’s challenge: “Who can be certain that ‘original work’ that he has done was not simply the growth of the seed planted in him by teaching, or the effect of following well-known general principles” (p. 450). In the imitation game, Turing suggested, the interrogator would be able to evaluate the machine’s ability to learn: “The game (with the player B omitted) is frequently used in practice under the name of viva voce to discover whether some one really understands something or has ‘learnt it parrot fashion’ ” (p. 446). But then we might ask, what is player B doing in the imitation game? Following the 1949 events will suggest an answer.

[18] “The ‘Electronic Brain’: A Misleading Term; No Substitute for Thought,” Times, November 7, 1946.
[19] ‘Lecture to L.M.S. Feb. 20 1947’, p. 22; op. cit.

On June 9, in London, Jefferson delivered his prestigious Lister Oration on ‘The Mind of Mechanical Man,’ which was published in the debuting British Medical Journal on June 25 (Jefferson, 1949). His lecture was headlined in the Times on June 10,[20] emphasizing his claim that “Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain” (p. 1110). This prompted Turing’s ironic response: “I do not think you can even draw the line about sonnets, though the comparison is perhaps a little bit unfair because a sonnet written by a machine will be better appreciated by another machine.”[21] In October and December 1949, two seminars on ‘Mind and Machine’ were organized by Polanyi et al., and attended by Jefferson, Turing et al., at the Philosophy Department in Manchester (Polanyi, 1958, p. 275; cf. also note 1 above). These seminar discussions, followed by Jefferson giving Turing an offprint of his Lister Oration,[22] which Turing read and marked with a pencil,[23] led him to write his 1950 paper and propose his test (Gonçalves, 2023d).

[20] ‘No Mind For Mechanical Man.’ Times, 10 June 1949, p. 2.
[21] ‘The Mechanical Brain.’ Times, 11 June 1949, p. 4.
[22] This may have happened in the evening of the December meeting of the Manchester seminar (op. cit.), when, according to a later letter from Jefferson to Ethel S. Turing, Turing and J.Z. Young went to dinner at Jefferson’s house (Turing, 1959, p. xx).
[23] Off-print, ‘The mind of mechanical man’ by Geoffrey Jefferson. Archives Centre, King’s College, Cambridge, AMT/B/44.

In his Lister Oration (Jefferson, 1949), Jefferson had characterized intelligence as an emergent property of the animal nervous system. He emphasized that “sex hormones introduce peculiarities of behaviour often as inexplicable as they are impressive” (p. 1107). Because “modern automata” are not moved by male and female sex hormones, they could not exhibit such peculiarities to imitate the actions of animals or ‘men.’ Specifically, he used a thought experiment to criticize Grey Walter’s mechanical turtles by suggesting that gendered behavior is causally related to the physiology of sex hormones (ibid.):

[…It] should be possible to construct a simple animal such as a tortoise (as Grey Walter ingeniously proposed) that would show by its movements that it disliked bright lights, cold, and damp, and be apparently frightened by loud noises, moving towards or away from such stimuli as its receptors were capable of responding to. In a favourable situation the behaviour of such a toy could appear to be very lifelike — so much so that a good demonstrator might cause the credulous to exclaim ‘This is indeed a tortoise.’ I imagine, however, that another tortoise would quickly find it a puzzling companion and a disappointing mate.

Jefferson thus brought forward the image of a genuine individual of a kind, which is placed side by side with the artificial one so that the latter’s artificiality is emphasized. The function of the genuine individual is to expose the artificiality of the impostor. The means of exposure is to fail at demonstrating interesting (sexual) behavior. This can explain Turing’s introduction of a (gendered) control player B, who appears in Turing’s 1950 test, whose design was prompted by his reading of Jefferson, but not in Turing’s 1948, 1951, and 1952 tests. In discussing “(4) The Argument from Consciousness,” Turing addressed Jefferson directly and quoted in full his conditions for agreeing “that machine equals brain,” including “be warmed by flattery” and “be charmed by sex” (Turing, 1950, pp. 445-446). In discussing the “(5) Argument from Various Disabilities,” Turing again mentioned Jefferson and argued that to say that a machine could never “fall in love” or “make someone fall in love with it” was a flawed scientific induction from the capabilities of present machines.

Thus, an answer to the first part of problem $P_3$, why gender imitation in a test of machine intelligence, is that Turing’s design of his test was an ironic response to Jefferson’s association of sex and gender with intelligence, particularly his suggestion that gendered behavior is causally related to the physiology of male and female sex hormones. It remains to address the second part of problem $P_3$, why Turing chose conversation as the intelligence task of his test.

Surviving minutes of the ‘Mind and Machine’ seminar held on October 27, 1949, were published in 2000 by a participant, Wolfe Mays.[24] In the first session, Polanyi presented a statement, ‘Can the mind be represented by a machine?,’[25] which was a Gödelian argument that humans can do things that machines cannot. Although Turing had already addressed this argument in his 1947 lecture (op. cit.), Polanyi’s insistence may help explain Turing’s inclusion of “(3) The Mathematical Objection” (Turing, 1950). Further, the minutes (op. cit.) show that Polanyi tried to distinguish the formal “rules of the logical system” from the informal “rules which determine our own behaviour,” and this can explain Turing’s inclusion of “(8) The Argument from Informality of Behaviour” (Turing, 1950).

[24] W. Mays, ‘Turing and Polanyi on minds and machines.’ Appraisal, 3(2), 55-62. Andrew Hodges also published it on his website: https://www.turing.org.uk/sources/wmays1.html. Accessed December 6, 2023.
[25] Polanyi, Michael. Papers, Box 22, Folder 19, Hanna Holborn Gray Special Collections Research Center, University of Chicago Library.

Years later (Polanyi, 1958, p. 275), Polanyi remembered “a communication to a Symposium held on ‘Mind and Machine’ at Manchester University in October, 1949,” in which “A.M. Turing has shown that it is possible to devise a machine which will both construct and assert as new axioms an indefinite sequence of Gödelian sentences.”[26] Polanyi resumed, showing that he assimilated the punch: “Any heuristic process of a routine character—for which in the deductive sciences the Gödelian process is an example—could likewise be carried out automatically.” However, Polanyi used the same argument to dismiss the game of chess as a testbed for machine intelligence, noting: “A routine game of chess can be played automatically by a machine, and indeed, all arts can be performed automatically to the extent to which the rules of the art can be specified.”

[26] Polanyi added that “this is foreshadowed” in Turing’s 1938 paper based on his Ph.D. thesis, ‘Systems of Logic Based on Ordinals,’ J. London Math. Soc. s2-45(1), 161-228.

Chess, not conversation, had been Turing’s chosen field to illustrate, develop, and test machine intelligence since at least February 1946.[27] In his 1948 ‘Intelligent Machinery’ report (op. cit.), Turing had discussed a tradeoff between convenient and impressive intellectual fields for exploring machine intelligence. After discussing “various games e.g. chess,” Turing wrote: “Of the above possible fields the learning of languages would be the most impressive, since it is the most human of these activities.” However, he avoided language learning because it seemed “to depend rather too much on sense organs and locomotion to be feasible,” stuck with chess, and ended up describing a chess-based imitation game. Now in October 1949, he saw chess being dismissed as an unimpressive field in which to make the case for machine intelligence, because its rules could be specified.

[27] ‘Proposed electronic calculator,’ February 1946. Archives Centre, King’s College, Cambridge, AMT/C/32. On p. 16, Turing asks: “Can the machine play chess?”

Some time later, probably around Christmas 1949, Turing would read Jefferson’s Lister Oration (Jefferson, 1949) and mark the passage quoting René Descartes (p. 1106), which starts: “Descartes made the point, and a basic one it is, that a parrot repeated only what it had been taught and only a fragment of that; it never used words to express its own thoughts.” Overall, Jefferson suggested ‘speech’ to be the distinguishing feature of human intelligence compared to other kinds of animal intelligence: “Granted that much that goes on in our heads is wordless … we certainly require words for conceptual thinking as well as for expression … It is here that there is the sudden and mysterious leap from the highest animal to man, and it is in the speech areas of the dominant hemisphere … that Descartes should have put the soul, the highest intellectual faculties” (p. 1109).

Unlike chess, which is governed by definite rules, good performance in conversation was thought to be reserved for humans. So Turing’s 1950 change to “the learning of languages” as the intellectual field addressed by his test can best be understood as yet another concession to Jefferson, and in this case to Polanyi as well.

In summary, Turing varied the design of his imitation tests to respond to the challenges posed by Hartree, Polanyi, and Jefferson. This fits neatly into Popper’s methodological rule for “the use of imaginary experiments in critical argumentation” (Popper, 1959): “the idealizations made must be concessions to the opponent, or at least acceptable to the opponent” (p. 466, Popper’s emphasis).

6. A hammer and a feather on the moon

On August 2, 1971, more than three centuries after Galileo’s death, a live anecdotal demonstration of Galileo’s legendary tower experiment was performed for the television cameras by astronaut David Scott during the final Apollo 15 moonwalk (Fig. 3). Far from the Earth’s atmosphere, essentially in a vacuum, the astronaut simultaneously released a heavy object (an aluminum geological hammer) and a light object (a falcon feather) from approximately the same height, which fell to the ground at the same rate to the naked eye. The performer, who attributed their successful mission in part to “a rather significant discovery about falling objects in gravity fields” made long ago by “a gentleman named Galileo,” celebrated: “How about that! Mr Galileo was correct in his findings.”[28]

[28] For footage and a technical description of the demonstration, see <http://nssdc.gsfc.nasa.gov/planetary/lunar/apollo_15_feather_drop.html>. Accessed December 6, 2023.

Variants of Galileo’s falling-bodies experiment first appeared in his De motu drafts written in the 1590s. Decades later came the punchy presentation of his 1638 Two New Sciences (Galilei, 1638, pp. 66-67):

SALVIATI: But without experiences, by a short and conclusive demonstration, we can prove clearly that it is not true that a heavier moveable is moved more swiftly than another, less heavy, these being of the same material, and in a word, those of which Aristotle speaks. Tell me, Simplicio, whether you assume that for every heavy falling body there is a speed determined by nature such that this cannot be increased or diminished except by using force or opposing some impediment to it … [SIMPLICIO agrees]
Then if we had two moveables whose natural speeds were unequal, it is evident that were we to connect the slower to the faster, the latter would be partly retarded by the slower, and this would be partly speeded up by the faster …

Figure 3. Alan Bean’s painting “The Hammer and the Feather,” 1986. Reproduced with permission. Source: http://alanbeangallery.com/hammerfeather-story.html.

From the outset, Galileo makes Simplicio accept Salviati’s carefully formulated assumption that for every heavy falling body there is a natural speed that cannot be altered except by external intervention. However, the Aristotelian could have found a way out by noting the imprecision and denying that weight and natural speed are physically determinate for connected but not unified bodies (Gendler, 1998).
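
The reductio that Salviati begins in the passage above can be laid out schematically. What follows is only a sketch in modern notation, not Galileo’s own: the symbols $H$ (the heavier body), $L$ (the lighter body), $H{+}L$ (the two connected), and $v(\cdot)$ (natural speed) are assumptions introduced here for illustration, and step (3) is the standard continuation of the argument, which lies beyond the excerpt quoted.

\[
\begin{aligned}
&(1)\ \text{Aristotelian premise (heavier falls faster): } v(H) > v(L) \\
&(2)\ \text{when connected, the slower retards the faster: } v(L) < v(H{+}L) < v(H) \\
&(3)\ \text{but } H{+}L \text{ is heavier than } H \text{, so by the premise: } v(H{+}L) > v(H)
\end{aligned}
\]

Steps (2) and (3) contradict each other, so the premise must be given up for bodies of the same material. The Aristotelian way out amounts to denying that $v(H{+}L)$ is well defined for bodies that are connected but not unified.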

Could Galileo have run a variant of this experiment from the Leaning Tower of Pisa and obtained the results claimed in the story? Physicists went to the laboratory and concluded that it is doubtful (Adler and Coulter, 1978). Decades earlier, it was found (Cooper, 1935) that the only source for the legendary story was actually an apologetic biography written by Galileo’s disciple Vincenzo Viviani 12 years after Galileo’s death and first published in 1717. For centuries, Galileo was largely considered “the first true empiricist,” but the role of experiment in Galilean science was more complex than previously thought (Segre, 1989; Palmieri, 2005).

Galileo’s falling-bodies experiment suggests an anomaly in Aristotle’s theory of motion under certain idealized conditions, the existence of motion in a void, which was unacceptable at the time. Testing such an existential hypothesis was in infinite regress with the conditions it required, and creating those conditions would require long-term scientific and technological progress.

The Galilean impasse could only be broken by the power of his thought experiments as propaganda[29] for a next generation of scientists (Gonçalves, 2023a). Robert Boyle (1627-1692) was one of them. As a teenager, he visited Florence shortly before Galileo’s death and was impressed by “the new paradoxes of the great star-gazer Galileo” (Fulton, 1960, p. 119). In his career, Boyle built air pumps and special chambers to study vertical fall in small evacuated environments, and became an exponent of experimental philosophy in the Royal Society. There is no record of Boyle interpreting Galilean science literally and performing any tower experiments. The value of Galileo’s thought experiments was to lay conceptual foundations and to conjecture a class of idealized phenomena to be pursued by progressive science and technology. There is a path that connects Galileo’s thought experiments and Boyle’s vacuum chambers to the space programs of the 1950s and finally to the anecdotal confirmation of Galileo’s hypothesis by the crew of the Apollo 15 mission to the moon.

[29] ‘Propaganda’ is meant here in its pre-Nazi, neutral sense of propagating, spreading.

Now note the analogy with the Turing test (Gonçalves, 2023a). As Hayes and Ford claimed: “The tests are circular: they define the qualities they are claiming to be evidence for” (p. 974). Turing’s existential hypothesis of a Turing-test-passing machine is in infinite regress with the conditions assumed by the test: an idealized computer equipped with a hypothetically appropriate program (Box 1). But if these conditions exist, why do we need a Turing test at all? This shows that practical Turing tests can serve at best as anecdotal confirmation of Turing’s hypothesis, and at worst, as we have seen for decades, as publicity stunts. The value of Turing’s test must lie elsewhere.

7. The value of Turing’s test for AI

By May 1953, John McCarthy and Claude Shannon were working on their collection Automata Studies (McCarthy and Shannon, 1956), which revolved around “the theory of Turing machines” (p. vii), and to which they invited Turing to contribute.[30] Turing declined the invitation, saying that he had been working for the last two years on “the mathematics of morphogenesis,” although he expected “to get back to cybernetics very shortly.”[31] One year and four days later, Turing was dead, and early AI would not note his biological turn. Commenting on “the Turing definition of thinking” (p. vi), McCarthy and Shannon found it “interesting” because it “has the advantages of being operational or, in the psychologists’ term, behavioristic … No metaphysical notions of consciousness, ego and the like are involved.” They also thought that this very strength could be a weakness, because it has “the disadvantage” of being susceptible to a memorizing machine playing the imitation game by looking up “a suitable dictionary.”

[30] Shannon and McCarthy to Turing, May 18, 1953. Alan Turing Papers (Additional), University of Manchester Library, GB133 TUR/Add/123.
[31] Turing to Shannon, June 3, 1953 (ibid.).

McCarthy and Shannon referred interchangeably to ‘definition’ and to a word that Turing actually used, ‘criterion:’ “While certainly no machines at the present time can even make a start at satisfying this rather strong criterion, Turing has speculated that within a few decades it will be possible to program general purpose computers in such a way as to satisfy this test” (McCarthy and Shannon, 1956, p. v, emphasis added).

In 1955, before the publication of Automata Studies, McCarthy and Shannon, together with Marvin Minsky and Nathaniel Rochester, co-authored their well-known ‘Proposal’ for AI research (McCarthy et al., 1955). Unlike Turing himself, they seem to have thought of machine intelligence in terms of Turing machines, as their opening paragraph suggests: “The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” Whether or not they followed Turing on the ‘how,’ they certainly followed him on the ‘what,’ in writing: “For the present purpose the artificial intelligence problem is taken to be that of making a machine behave in ways that would be called intelligent if a human were so behaving” (p. 7). This definition — compare it with “the Turing definition of thinking” — would stay.

In the early 1960s, Edward Feigenbaum and Julian Feldman noted in Computers and Thought (Feigenbaum and Feldman, 1963) that Turing’s 1950 paper “appeared five years before concrete developments in intelligent behavior by machine began to occur;” and “yet,” they continued, “it remains today one of the most cogent and thorough discussions in the literature on the general question ‘Can a machine think?’” (pp. 9-10). They observed Turing’s “behavioristic posture relative to the question,” which “is to be decided by an unprejudiced comparison of the alleged ‘thinking behavior’ of the machine with normal ‘thinking behavior’ in human beings” (emphasis added). They concluded: “He proposes an experiment — commonly called ‘Turing’s test’ — in which the unprejudiced comparison could be made … Though the test has flaws, it is the best that has been proposed to date.”

Minsky, in the preface to his 1967 collection (Minsky, 1968), reiterates the definition of AI as “the science of making machines do things that would require intelligence if done by men” (p. v). Around the same time, Minsky collaborated with Stanley Kubrick and Arthur Clarke on their 1968 screenplay, also written as a novel, 2001: A Space Odyssey (Clarke, 1968), which featured a futuristic computer named HAL:

Whether HAL could actually think was a question which had been settled by the British mathematician Alan Turing back in the 1940s. Turing had pointed out that, if one could carry out a prolonged conversation with a machine — whether by typewriter or microphone was immaterial — without being able to distinguish between its replies and those that a man might give, then the machine was thinking, by any sensible definition of the word. HAL could pass the Turing test with ease.

The “Turing definition of thinking” was to become legendary.

Stuart Shieber studied McCarthy and Shannon’s memorizing machine objection in depth, elaborated on its assumptions, and concluded that it is invalid (Shieber, 2014). But McCarthy’s concept of ‘memorizing’ may have been more elastic, as his later comment on Deep Blue’s defeat of Garry Kasparov seems to suggest (McCarthy, 1997). He expressed disappointment that it was mostly an achievement of computational power rather than thinking, and gave a clear argument why he thought so. Essentially, McCarthy pointed out, computer chess advanced by replacing heuristic techniques, which relied on the expertise of human players to prune the search space of possible moves, with brute force computing. “[I]t is a measure of our limited understanding of the principles of artificial intelligence,” McCarthy wrote, “that this level of play requires many millions of times as much computing as a human chess player does.” That may be so, but Turing had already hinted that the problem was “largely a quantitative matter” in his letter to Worsley of c. June 1951 (Box 3).

Ten years after Deep Blue vs. Kasparov, McCarthy referred to Turing’s 1947 lecture (op. cit.) as “the first scientific discussion of human level machine intelligence,” and to Turing’s 1950 paper as “amplifying” that discussion into a “goal” (McCarthy, 2007, p. 1174).

In 1992, Minsky co-authored a work of fiction, The Turing Option (Warner, New York), in which Turing’s test is featured in the preface. In 1995, Minsky took a stand against Loebner’s Weizenbaum experiments, pleading to “revoke his stupid prize, save himself some money, and spare us the horror of this obnoxious and unproductive annual publicity campaign.”[32] In 2013, when asked about the Turing test in a taped interview, Minsky said: “The Turing test is a joke, sort of, about saying ‘A machine would be intelligent if it does things that an observer would say must be being done by a human’ … it was suggested by Alan Turing as one way to evaluate a machine but he had never intended it as being the way to decide whether a machine was really intelligent.”[33] This materially connects McCarthy et al.’s definition of “the AI problem” with Turing’s test, if material evidence were still needed.

[32] ‘Annual Minsky Loebner Prize Revocation Prize 1995 Announcement,’ 2 March 1995. Available at: https://groups.google.com/g/comp.ai/c/dZtU8vDD_bk/m/QYaYB18qAToJ. Accessed 25 Nov 2023.
[33] ‘Marvin Minsky on AI: the Turing test is a joke!’, from 23’ 35” to 24’45”. Available at https://www.singularityweblog.com/marvin-minsky/. Accessed Dec. 6, 2023.

Overall, it seems that all of these AI pioneers understood and were inspired by Turing’s test at the level of conceptual foundations. Even if some of them also used the term ‘experiment,’ none of them took it literally as a practical experiment, which would indeed imply an astonishing lack of imagination on their part. Turing’s test moved the burgeoning field of AI away from unproductive debates about the meaning of words, for example, allowing Minsky to write in 1967 (Minsky, 1967): “Turing discusses some of these issues in his brilliant article, ‘Computing Machines and Intelligence” [sic], and I will not recapitulate his arguments … They amount, in my view, to a satisfactory refutation of many such objections” (p. 107).

The value of Turing’s test ($P_5$) is that it has long been and still is a unifying ‘definition,’ a ‘criterion,’ a ‘goal’ for, in the words of McCarthy et al., the science and engineering of “making a machine behave in ways that would be called intelligent if a human were so behaving.” Every time AI succeeds in automating a new task that was once reserved for humans because it requires intelligence, “the Turing definition” conquers new territory, and the significance of Turing’s early message to his contemporaries becomes clearer.

8. Conclusion

In this paper, I presented a mass of evidence, including newly discovered archival sources, and a new perspective on Turing’s test.

New light was shed on Turing’s concept of imitation, emphasizing that it does not encourage deception in AI. Rather, it is a mathematical concept, largely in continuity with his 1936 paper. I also showed that Turing’s presentation of the various versions of his test fits what Mach characterized as “the basic method of thought experiments” in science. I reconstructed the historical conditions of Turing’s proposal, explaining that gender imitation was his ironic response to Jefferson, and conversation was yet another concession to his opponents. Further, I compared Turing’s test to Galileo’s falling-bodies experiment, showing that the problem of circularity is inherent in existential hypotheses, and that the solution may lie in propaganda and the progressive scientific and technological developments of a next generation of scientists. I then revisited the history of AI, showing that Turing’s test provided McCarthy, Minsky, and others with a definition of the AI problem that, at the level of conceptual foundations, still drives AI research today.

But whatever its utility, Turing’s test has secured its place as one of the most beautiful thought experiments in the history of science.

Acknowledgements.
The author thanks Andrew Hodges, Jim Miles, and H. V. Jagadish for their valuable comments on an earlier version of this article, Mark Priestley for the gift of Turing’s letters to Worsley, and Fabio Cozman and Murray Shanahan for their support. The author is solely responsible for the accuracy of this work. This research has been supported by the São Paulo Research Foundation (FAPESP grants 2022/16793-9 and 2019/21489-4, “The Future of Artificial Intelligence: The Logical Structure of Alan Turing’s Argument”), and by the IBM Corporation and the São Paulo Research Foundation (FAPESP grant 2019/07665-4).

References

  • Adler and Coulter (1978) Carl G. Adler and Byron L. Coulter. 1978. Galileo and the Tower of Pisa experiment. Am. J. Phys. 46, 3 (1978), 199–201. doi:10.1119/1.11165.
  • Biever (2023) Celeste Biever. 2023. ChatGPT broke the Turing test — the race is on for new ways to assess AI. Nature 619 (2023), 686–689. doi:10.1038/d41586-023-02361-7.
  • Brynjolfsson (2022) Erik Brynjolfsson. 2022. The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence. Daedalus 151, 2 (2022), 272–287. doi:10.1162/DAED_a_01915.
  • Clarke (1968) Arthur C. Clarke. 1968. 2001: a space odyssey. Dutton, New York.
  • Cooper (1935) Lane Cooper. 1935. Aristotle, Galileo, and the Tower of Pisa. Ithaca, New York.
  • Feigenbaum and Feldman (1963) Edward A. Feigenbaum and Julian Feldman (Eds.). 1963. Computers and Thought. McGraw-Hill, New York.
  • Fulton (1960) John F. Fulton. 1960. The honourable Robert Boyle, F. R. S. (1627-1692). Notes Rec. 15, 1 (1960). doi:10.1098/rsnr.1960.0012.
  • Galilei (1638) Galileo Galilei. 1974 [1638]. Two new sciences. University of Wisconsin Press, Madison. Translated by Stillman Drake.
  • Gendler (1998) Tamar S. Gendler. 1998. Galileo and the indispensability of scientific thought experiment. Brit. J. Philos. Sci. 49, 3 (1998), 397–424. doi:10.1093/bjps/49.3.397.
  • Gonçalves (2023a) Bernardo Gonçalves. 2023a. Galilean resonances: the role of experiment in Turing’s construction of machine intelligence. Ann. Sci. (2023). doi:10.1080/00033790.2023.2234912.
  • Gonçalves (2023b) Bernardo Gonçalves. 2023b. Irony with a point: Alan Turing and his intelligent machine utopia. Philos. Technol. 36, 3 (2023). doi:10.1007/s13347-023-00650-7.
  • Gonçalves (2023c) Bernardo Gonçalves. 2023c. Lady Lovelace’s objection: the Turing-Hartree disputes over the meaning of digital computers, 1946-1951. IEEE Ann. Hist. Comput. (2023). doi:10.1109/MAHC.2023.3326607.
  • Gonçalves (2023d) Bernardo Gonçalves. 2023d. The Turing Test Argument. Routledge, New York. doi:10.4324/9781003300267.
  • Gonçalves (2023e) Bernardo Gonçalves. 2023e. The Turing test is a thought experiment. Minds Mach. 33, 1 (2023), 1–31. doi:10.1007/s11023-022-09616-8.
  • Hartree (1949) Douglas R. Hartree. 1949. Calculating Instruments and Machines. University of Illinois Press, Urbana.
  • Hayes and Ford (1995) Patrick Hayes and Kenneth Ford. 1995. Turing test considered harmful. In Proc. IJCAI. 972–977.
  • Hodges (1983) Andrew Hodges. 1983. Alan Turing: The Enigma. Burnett, London.
  • Jefferson (1949) Geoffrey Jefferson. 1949. The Mind of Mechanical Man. Brit. Med. J. 1, 4616 (1949), 1105–1110. doi:10.1136/bmj.1.4616.1105.
  • Mach (1897) Ernst Mach. 1976 [1897]. On Thought Experiments. In Knowledge and Error: Sketches on the Psychology of Enquiry, Erwin N. Hiebert (Ed.). Springer, Dordrecht, Chapter 11, 134–147.
  • Mays (1952) W. Mays. 1952. Can Machines Think? Philosophy 27, 101 (1952), 148–162. doi:10.1017/S003181910002266X.
  • McCarthy (1997) John McCarthy. 1997. AI as sport. Science 276, 5318 (1997), 1518–1519. doi:10.1126/science.276.5318.1518.
  • McCarthy (2007) John McCarthy. 2007. From here to human-level AI. Artif. Intell. 171, 18 (2007), 1174–1182. doi:10.1016/j.artint.2007.10.009.
  • McCarthy et al. (1955) John McCarthy, M. L. Minsky, N. Rochester, and C.E. Shannon. 2006 [1955]. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AI Mag. 27, 4 (2006 [1955]), 1–12. doi:10.1609/aimag.v27i4.1904.
  • McCarthy and Shannon (1956) John McCarthy and Claude Shannon. 1956. Preface. In Automata Studies, Claude Shannon and John McCarthy (Eds.). University Press, Princeton.
  • Minsky (1967) Marvin Minsky. 1967. Computation: Finite and Infinite Machines. Prentice-Hall, Hoboken, NJ.
  • Minsky (1968) Marvin Minsky. 1968. Semantic Information Processing. MIT Press, Cambridge, MA.
  • Palmieri (2005) Paolo Palmieri. 2005. ‘Spuntar lo scoglio più duro:’ did Galileo ever think the most beautiful thought experiment in the history of science? Stud. Hist. Philos. Sci. A 36, 2 (2005), 223–240. doi:10.1016/j.shpsa.2005.03.001.
  • Polanyi (1958) Michael Polanyi. 1962 [1958]. Personal Knowledge: Towards a Post-Critical Philosophy (second ed.). University Press, Chicago.
  • Popper (1959) Karl Popper. 2002 [1959]. The Logic of Scientific Discovery. Routledge, London.
  • Segre (1989) Michael Segre. 1989. Galileo, Viviani and the Tower of Pisa. Stud. Hist. Philos. Sci. A 20, 4 (1989), 435–451. doi:10.1016/0039-3681(89)90018-6.
  • Shieber (1994a) Stuart M. Shieber. 1994a. Lessons from a Restricted Turing Test. Comm. ACM 37, 6 (1994), 70–78. doi:10.1145/175208.175217.
  • Shieber (1994b) Stuart M. Shieber. 1994b. On Loebner’s lessons. Comm. ACM 37, 6 (1994), 83–84. doi:10.1145/175208.175604.
  • Shieber (2004) Stuart M. Shieber (Ed.). 2004. The Turing Test: Verbal Behavior as the Hallmark of Intelligence. MIT Press, Cambridge, MA.
  • Shieber (2014) Stuart M. Shieber. 2014. There can be no Turing-test-passing memorizing machines. Philos. Impr. 14, 16 (2014), 1–13. doi:2027/spo.3521354.0014.016.
  • Sterrett (2000) Susan G. Sterrett. 2000. Turing’s two tests for intelligence. Minds Mach. 10 (2000), 541–559. doi:10.1023/A:1011242120015.
  • Turing (1936) Alan M. Turing. 1936. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society s2-42, 1 (1936), 230–265. doi:10.1112/plms/s2-42.1.230.
  • Turing (1950) Alan M. Turing. 1950. Computing machinery and intelligence. Mind 59, 236 (1950), 433–460. doi:10.1093/mind/LIX.236.433.
  • Turing (1959) Ethel S. Turing. 2012 [1959]. Alan M. Turing: Centenary Edition. University Press, Cambridge.
  • Vardi (2014) Moshe Y. Vardi. 2014. Would Turing have passed the Turing test? Comm. ACM 57, 9 (2014), 5. doi:10.1145/2643596.
  • Wells (2023) Sarah Wells. 2023. Is the Turing Test Dead? IEEE Spectr. (2023). https://spectrum.ieee.org/turing-test.