Abstract
Here we consider some well-known facts in syntax from a physics perspective, which allows us to establish analogies between both fields with many consequences. Mainly, we observe that the operation MERGE, put forward by Chomsky (in: Evolution and Revolution in Linguistic Theory, Essays in Honor of Carlos Otero, eds. Hector Campos and Paula Kempchinsky, 1995), can be interpreted as a physical information coarse-graining. Thus, MERGE in linguistics entails information renormalization in physics, according to different time scales. We make this point mathematically formal in terms of statistical language models. In this setting, MERGE amounts to a probability tensor implementing a coarse-graining, akin to a probabilistic context-free grammar. The probability vectors of meaningful sentences are given by stochastic tensor networks (TN) built from diagonal tensors and mostly free of loops, such as Tree Tensor Networks and Matrix Product States, which are therefore computationally very efficient to manipulate. We show that this implies the polynomially-decaying (long-range) correlations experimentally observed in language, and also provides arguments in favour of certain types of neural networks for language processing. Moreover, we show how to obtain such language models from quantum states that can be efficiently prepared on a quantum computer, and use this to find bounds on the perplexity of the probability distribution of words in a sentence. Implications of our results are discussed across several domains.
Notes
A clarification is in order: here we understand “renormalization” as the tool that allows the mathematical description of the information in a system at different physical scales, accounted for by relevant degrees of freedom at every scale. Of course, the implementation of this idea in several contexts leads to different consequences. Well-known examples in physics are the existence of critical systems, critical exponents, universality classes, phase transitions, the c-theorem, the reshuffling of Hilbert spaces, the \(\beta\)-function, fixed points, RG flows, scaling laws, relevant/irrelevant/marginal perturbations... the list is unending. In our case, however, we do not necessarily assume the existence of any of these in the case of language (though some of them may also be there), and adopt instead the most fundamental and general perspective of what “renormalization” means at its very basic core at the level of information.
To be defined in Eq. (17).
Meaning that \([ \alpha , [\beta , \gamma ]] \ne [[\alpha , \beta ], \gamma ]\) using the bracketed notation.
Other operations can be accounted for by introducing extra links in the graphical representation, as we shall explain, but the renormalization picture still holds.
Notice that the converse is not true.
Unlike some TNs with loops, which are \(\sharp\)P-hard and therefore need an exponential time to be contracted [51].
This is in fact a very efficient procedure to compute the overall probability of a given tree in a language model.
A recent attempt in this direction, also related to quantum physics and linear algebra, is the Matrix Syntax model in Ref. [65].
In the words of N. Chomsky, “there is only one human language” (private communication).
References
See, e.g., https://en.wikipedia.org/wiki/Linguistics
Russell SJ, Norvig P. Artificial intelligence: a modern approach. 3rd ed. Upper Saddle River: Prentice Hall; 2009.
Chomsky N, Gallego ÁJ, Ott D. Generative grammar and the faculty of language: insights, questions, and challenges. Ms., MIT / UAB / UOttawa (2017); Available at http://ling.auf.net/lingbuzz/003507
Descartes R. Discours de la méthode, 1662.
Chomsky N. The language capacity: architecture and evolution. Psychon Bull Rev. 2017;24:200–3.
Hauser MD, Chomsky N, Fitch WT. The faculty of language: What is it, who has it, and how did it evolve? Science. 2002;298:1569–79.
Anderson SR. Doctor Dolittle’s Delusion. Animals and the uniqueness of human language. New Haven: Yale University Press; 2004.
Chomsky N. Some simple evo-devo theses: how true might they be for language? In: Larson RK, Déprez V, Yamakido H, editors. The evolution of human language: biolinguistic perspectives. Cambridge: Cambridge University Press; 2012. p. 45–62.
Chomsky N. A minimalist program for linguistic theory, MIT occasional papers in linguistics no. 1. Cambridge, MA: Distributed by MIT Working Papers in Linguistics, 1993.
Chomsky N. Three factors in language design. Linguistic Inquiry. 2005;36:1–22.
Thompson DW. On growth and form. Cambridge: Cambridge University Press; 1917.
Turing AM. The chemical basis of morphogenesis. Philos Trans R Soc B. 1952;237(642):37–42.
Chomsky N. Bare phrase structure. In: Campos H, Kempchinsky P, editors. Evolution and revolution in linguistic theory: essays in honor of Carlos Otero; 1995. p. 51–109.
See, e.g., https://en.wikipedia.org/wiki/Emergentism
Anderson PW. More is different. Sci New Ser. 1972;177(4047):393–6.
There are plenty of books and introductory articles on renormalization in physics. Some good original sources, though, are L. Kadanoff, Scaling laws for Ising models near \(T_c\), Physics, 1966;2:263.
Wilson KG. The renormalization group: critical phenomena and the Kondo problem. Rev Mod Phys. 1975;47(4):773.
Wilson KG. Problems in Physics with many Scales of Length. Sci Am. 1979;241:140–57.
Also K. G. Wilson’s Nobel Prize lecture from 1982, available at http://www.nobelprize.org.
See, e.g., Shankar R. Renormalization-group approach to interacting fermions. Rev Mod Phys. 1994;66:129; White SR. Density matrix formulation for quantum renormalization groups. Phys Rev Lett. 1992;69:2863.
See, e.g., Weinberg S. The Quantum Theory of Fields (3 volumes), Cambridge University Press (1995).
Verstraete F, et al. Renormalization-group transformations on quantum states. Phys Rev Lett. 2005;94:140601.
Vidal G. Entanglement renormalization. Phys Rev Lett. 2007;99:220405.
See, e.g., https://en.wikipedia.org/wiki/Language_model
Verstraete F, Cirac JI, Murg V. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Adv Phys. 2008;57:143.
Cirac JI, Verstraete F. Renormalization and tensor product states in spin chains and lattices. J Phys A: Math Theor. 2009;42:504004.
Eisert J. Entanglement and tensor network states. Model Simul. 2009;3:520.
Schuch N. Condensed matter applications of entanglement theory, QIP, Lecture Notes of the 44th IFF Spring School (2013).
Orús R. Advances on tensor network theory: symmetries, fermions, entanglement, and holography. Eur Phys J B. 2014;87:280.
Orús R. A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann Phys. 2014;349:117–158.
Chomsky N. Problems of projection. Lingua. 2013;130:33–49.
Nielsen M, Chuang I. Quantum computation and quantum information. New York: Cambridge University Press; 2000.
Sengupta B, Stemmler MN. Power consumption during neuronal computation. Proc IEEE 2014;102(5).
Smith NA, Johnson M. Weighted and probabilistic context-free grammars are equally expressive. Comput Linguist. 2007;33(4):477.
Shi Y, Duan L, Vidal G. Classical simulation of quantum many-body systems with a tree tensor network. Phys Rev A. 2006;74:022320.
Tagliacozzo L, Evenbly G, Vidal G. Simulation of two-dimensional quantum systems using a tree tensor network that exploits the entropic area law. Phys Rev B. 2009;80:235127.
Murg V, et al. Simulating strongly correlated quantum systems with tree tensor networks. Phys Rev B. 2010;82:205105.
Gerster M, et al. Unconstrained tree tensor network: an adaptive gauge picture for enhanced performance. Phys Rev B. 2014;90:125154.
Fannes M, Nachtergaele B, Werner RF. Finitely correlated states on quantum spin chains. Commun Math Phys. 1992;144:443–90.
Klümper A, Schadschneider A, Zittartz J. Equivalence and solution of anisotropic spin-1 models and generalized t-J fermion models in one dimension. J Phys A. 1991;24:L955.
Klümper A, Schadschneider A, Zittartz J. Matrix-product-groundstates for one-dimensional spin-1 quantum antiferromagnets. Europhys Lett. 1993;24:293.
Schollwöck U. The density-matrix renormalization group in the age of matrix product states. Ann Phys. 2011;326:96.
Oseledets IV. Tensor-train decomposition. SIAM J Sci Comput. 2011;33(5):2295–317.
Chomsky N. Syntactic structures. The Hague/Paris: Mouton; 1957.
See, e.g., Liu H. Dependency Grammar: from Theory to Practice. Beijing: Science Press (2009).
Papadimitriou CH. Computational complexity. Reading: Addison-Wesley; 1994.
Alvarez-Lacalle E, Dorow B, Eckmann J-P, Moses E. Hierarchical structures induce long-range dynamical correlations in written texts. PNAS. 2006;103(21):7956–61.
Altmann EG, Cristadoro G, Esposti MD. On the origin of long-range correlations in texts. PNAS. 2012;109(29):11582–7.
Lin HW, Tegmark M. Critical behavior in physics and probabilistic formal languages. Entropy. 2017;19:299.
Schuch N, et al. Computational complexity of projected entangled pair states. Phys Rev Lett. 2007;98:140506.
Temme K, Verstraete F. Stochastic matrix product states. Phys Rev Lett. 2010;104:210502.
De las Cuevas G, et al. Purifications of multipartite states: limitations and constructive methods. New J Phys. 2013;15:123021.
Wolf MM, Verstraete F, Hastings MB, Cirac JI. Area laws in quantum systems: mutual information and correlations. Phys Rev Lett. 2008;100:070502.
Holzhey C, Larsen F, Wilczek F. Geometric and renormalized entropy in conformal field theory. Nucl Phys B. 1994;424:443.
Vidal G, et al. Entanglement in quantum critical phenomena. Phys Rev Lett. 2003;90:227902.
Latorre JI, Rico E, Vidal G. Ground state entanglement in quantum spin chains. Quantum Inf Comput. 2004;4:48.
Eisert J, Cramer M. Single-copy entanglement in critical quantum spin chains. Phys Rev A. 2005;72:042112.
Orús R, et al. Half the entanglement in critical systems is distillable from a single specimen. Phys Rev A. 2006;73:060303(R).
See, e.g., Bhatia R. Matrix analysis. New York: Springer-Verlag; 1997; Nielsen M, Vidal G. Quantum Inf Comput. 2001;1(1):76–93.
Kadanoff LP. More is the same; phase transitions and mean field theories. J Stat Phys. 2009;137:777.
Sidorov G, et al. Syntactic dependency-based n-grams as classification features. LNAI. 2012;7630:1–11.
Orús R, Martin R, Uriagereka J. Mathematical foundations of matrix syntax. arXiv:1710.00372; Martin R, Orús R, Uriagereka J. Towards matrix syntax. Catalan Journal of Linguistics. 2019:27–44.
Ferrer i Cancho R, Solé RV. The small world of human language. Proc R Soc Lond B. 2001;268:2261–5.
Zamolodchikov AB. Irreversibility of the flux of the renormalization group in a 2D field theory. JETP Lett. 1986;43:730–2.
Latorre JI, et al. Fine-grained entanglement loss along renormalization-group flows. Phys Rev A. 2005;71:034301.
Orús R. Entanglement and majorization in (1+1)-dimensional quantum systems. Phys Rev A. 2005;71:052327.
Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994;22(11):2079–88.
Sakakibara Y, et al. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res. 1994;22(23):5112–20.
Durbin R, Eddy S, Krogh A, Mitchinson G, editors. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press; 1998.
Searls D. A primer in macromolecular linguistics. Biopolymers. 2013;99(3):203–17.
Krogh A, et al. Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 1994;235:1501–31.
Sigrist C, et al. PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 2002;3(3):265–74.
Dyrka W, Nebel J-C. A stochastic context free grammar based framework for analysis of protein sequences. BMC Bioinform. 2009;10:323.
Dyrka W, Nebel J-C, Kotulska M. Probabilistic grammatical model for helix–helix contact site classification. Algorithms Mol Biol. 2013;8:31.
Levine Y. et al. Deep learning and quantum entanglement: fundamental connections with implications to network design, arXiv:1704.01552.
Lin HW, Tegmark M, Rolnick D. Why does deep and cheap learning work so well? J Stat Phys. 2017;168(6):1223–47.
Graps A. An introduction to wavelets. IEEE Comput Sci Eng. 1995;2(2):50–61.
Latorre JI. Image compression and entanglement, arxiv:quant-ph/0510031
Stoudenmire EM, Schwab DJ. Supervised learning with tensor networks. Adv Neural Inf Process Syst. 2016;29:4799.
Katz J, Pesetsky D. The identity thesis for language and music, http://ling.auf.net/lingBuzz/000959
Alcock JK, et al. Pitch and timing abilities in inherited speech and language impairment. Brain Lang. 2000;75:34–46.
Lai CSL, et al. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature. 2001;413:519–23.
Peretz I. Music, language and modularity framed in action. Psychol Belg. 2009;49:157–75.
Rimfeld K, et al. Pleiotropy across academic subjects at the end of compulsory education. Sci Rep. 2015;5:11713.
See, e.g., https://en.wikipedia.org/wiki/Successor_function, and also Halmos PR. Naive set theory. Van Nostrand; 1968.
Chomsky N. On phases. Cambridge, MA: MIT Press; 2008.
Nelson MJ, et al. Neurophysiological dynamics of phrase-structure building during sentence processing. PNAS. 2017;114(18).
Carlon E, Henkel M, Schollwöck U. Density matrix renormalization group and reaction-diffusion processes. Eur Phys J B. 1999;12:99.
For other examples in different contexts see, e.g., Piattelli-Palmarini M, Vitiello G. Linguistics and some aspects of its underlying dynamics, arXiv:1506.08663.
Piattelli-Palmarini M, Vitiello G. Quantum field theory and the linguistic Minimalist Program: a remarkable isomorphism. J Phys Conf Ser. 2017;880:012016.
Solé R. Synthetic transitions: towards a new synthesis. Philos Trans R Soc B. 2016;371:20150438.
Bolhuis J, Tattersall I, Chomsky N, Berwick RC. How could language have evolved? PLoS Biol. 2014;12:e1001934.
Berwick RC, Chomsky N. Why only us. Cambridge: MIT Press; 2016.
Chomsky N. Three models for the description of language. IRE Trans Inf Theory. 1956;2:113–24.
Gallego ÁJ, editor. Phases. Developing the framework. Berlin: De Gruyter; 2012.
Chomsky N, Halle M, Lukoff F. On accent and juncture in English. In: Halle M, et al., editors. For Roman Jakobson: essays on the occasion of his sixtieth birthday. The Hague: Mouton; 1956. p. 65–80.
Chomsky N, Halle M. The sound pattern of English. New York: Harper Row; 1968.
Chomsky N. Problems of projection: extensions. In: di Domenico E, et al., editors. Structures, strategies and beyond. Amsterdam: John Benjamins; 2015. p. 1–16.
Acknowledgements
We acknowledge the Universitat Autònoma de Barcelona, the Max Planck Institute of Quantum Optics, and the University of Mainz, where the ideas in this paper materialized from interdisciplinary discussions over several years. Discussions with Ondiz Aizpuru, Gustavo Ariel Schwartz, Gemma de las Cuevas, Laura García-Álvarez, Géza Giedke, José I. Latorre, Enrique Solano, Miles Stoudenmire and Juan Uriagereka are acknowledged. A special acknowledgement to Sergi Quintana, for explaining to us the basics of Chomsky’s generative grammar 28 years ago: qui sembra, recull (who sows, reaps).
Funding
This project was not funded by any grant.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Appendix A: More Analogies by Digging Deeper
Our paper is written with a reader with a background in physics and mathematics in mind. The topic itself, however, is strongly interdisciplinary. Because of this, in this appendix we add some extra information useful for readers with a background in theoretical linguistics. In particular, we define a few concepts more precisely in linguistic jargon. By digging deeper into this jargon, more analogies with physics show up, in turn strengthening our thesis that MERGE in linguistics and RG in physics are deeply linked to each other.
To begin with, the term Universal Grammar (UG) is nothing but a label for the striking difference in cognitive capacity between “us and them”, i.e., humans versus the rest of animal species. UG is thus the research topic of generative grammar in its attempt to understand what this capacity is and how it evolved in our species. Finding a satisfying answer to the latter question may be impossible with the tools we have right now, but any theory of UG seeking to address the former must meet a criterion of evolvability: any properties, mechanisms, etc. attributed to UG should have emerged in what appears to have been a unique and relatively sudden event on the evolutionary timescale [95, 96]. This line of thought presupposes that UG (the genetic encoding of the human linguistic capacity) manifests bona fide traits of perfect design, in the sense that it contains operations and mechanisms that follow from conceptual necessities, efficiency principles or interface demands. In this respect, linguistic expressions (sentences, phrases, words, etc.) are built up by adhering to these principles, and therefore in an optimal fashion. While these notions are intuitively clear, their precise formulations remain vague and controversial.
One of the most important mathematical achievements of generative grammar is the so-called “Chomsky Hierarchy” [97], a classification of formal grammars according to their complexity. As Chomsky showed 60 years ago, human language manifests both context-free and context-sensitive properties, needed to construct PHRASES and CHAINS respectively, shown in Sentences A1(a,b):
In Sen. A1(a) (a PHRASE) we have two tokens of the lexical item “John” that participate in phrasal dependencies to yield a compositional interpretation whereby the first John is the agent of a killing event, and the second John is the patient of that event. What we have in Sen. A1(b) (a CHAIN) is more complex. This time, we do not have two tokens of “John”, but two occurrences of the same lexical item, as if they were one and the same object in two positions at the same time, where the notation \(\mathrm{<John>}\) means that the word is not pronounced at that position, although it is still interpreted there from the logical point of view. This is what is called a CHAIN in linguistics. In languages of the English type, the first (leftmost) occurrence is spelled out, whereas the second (rightmost) is necessary to keep a syntax-semantics homomorphism (that is, to capture the desideratum that a specific interpretation is tied to a specific position). Notice that the same type of object (a CHAIN) is necessary in Sen. A2, where “John” is pronounced to the left of seem, although it is interpreted as the patient of killed.
In order to account for these properties, generative grammar has resorted to phrase structure rules (PSR) and transformations. The most articulated version of PSR is known as X-bar Theory, which resorted to different devices that have been subject to a revision within minimalism. In particular, Chomsky [13] argued that the basic properties of PSR could be understood by means of a computational operation, dubbed MERGE [13], which captures two empirical properties of human language that are non-negotiable: discrete infinity and displacement. To be able to account for those properties, one must assume an operation that constructs hierarchically structured expressions with displacement. And that is what MERGE does. MERGE applies to two objects X and Y (be these words or bigger units), yielding a new one, K, which is the set containing X and Y, i.e., \(\{ X, Y \}\). If X, Y are distinct (taken directly from the lexicon or independently assembled), K is constructed by what is called EXTERNAL MERGE (EM); if Y is part of X (if Y is contained in X), then we have what is called INTERNAL MERGE (IM). The latter scenario is that of Sentences A1(b) and A2 above, where MERGE turns “John” into a discontinuous object (a CHAIN). For completeness, if the operation is at the beginning of a derivation (e.g., with bare lexical items from a lexicon), it is called FIRST MERGE, and if it operates with partially-derived items (phrases), it is called ELSEWHERE MERGE.
Chomsky [13] takes MERGE to be strictly binary, as it is what is minimally necessary to create hierarchical structure. Generation by MERGE thus entails a restrictive class of recursively defined, binary-branching and discrete-hierarchical structures.
It is also worth mentioning that in X-bar Theory, the label identifies the properties of the entire phrase, at the cost of this being a theory-internal symbol that departs from inclusiveness demands. An alternative to this is a label-free representation (see Fig. 1), where endocentricity (the assumption that all phrases must be headed) is not preserved. This entails that syntactic objects can be exocentric, as seems to be necessary for objects formed by the combination of two phrases, \(\{XP,YP \}\). Syntactic objects are “endocentric” if they contain an element that can be determined by Minimal Search—typically, a head. Given this logic, \(\{X,YP \}\) is endocentric and \(\{XP,YP \}\) exocentric. Consequently, such a system freely generates objects of different kinds, without stipulating their endocentric nature.
Moreover, MERGE is subject to efficiency and economy conditions. One such condition is inclusiveness, which precludes the introduction of extraneous objects, like the ones that X-bar Theory deployed: traces, bar-levels, projections, etc. Inclusiveness also bars introduction of features that are not present in lexical items.
To further clarify MERGE, we stress that the combination of two objects, X and Y, yields a new one, K, which is the set \(\{ X, Y \}\). Once we have \(\{X,Y \}\), we may want to merge K and some object W, which can be either internal to K or external to it (see above). In any event, the merger of W cannot change or tamper with \(\{X,Y \}\), which behaves as a unit. More precisely, subsequent applications of MERGE must yield Eq. A3(a), not A3(b):
The driving force of this work is the fact that MERGE and renormalization seem to play a similar role in various respects. As noted above, MERGE takes two objects, X and Y, to yield a new one, K, thus removing X and Y from the computational workspace (WS). In the simplest scenario, MERGE maps \(\mathrm{WS} = [X,Y]\) onto \(\mathrm{WS'} = [\{X,Y \}]\), reducing the complexity of WS. Notice that MERGE never extends the WS, at least in terms of cardinality; thus \(\mathrm{WS} = [\{X,Y \}]\) and \(\mathrm{WS'} = [\{W,\{X,Y\}\}]\) are equally large, since each contains a single set. A new element can be added to WS (or \(\mathrm{WS'}\)) in only one way: by taking two items W, Z from the lexicon and introducing \(\{W,Z \}\) into WS as a new element, yielding \(\mathrm{WS''} = [\{W,Z\},\{X,Y\}]\). Of course, cardinality can be reduced if we apply EM (EXTERNAL MERGE) and neither of the elements is taken from the lexicon, as when we map \(\mathrm{WS''} = [\{W,Z\},\{X,Y\}]\) onto \(\mathrm{WS'''} = [\{\{W,Z\},\{X,Y\}\}]\). This idea is indeed very similar to that of a coarse-graining in physics, in the sense made precise throughout the paper.
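The workspace bookkeeping just described can also be mimicked computationally. The following toy Python sketch (again our own illustrative rendering, with hypothetical items) shows that MERGE over two workspace members never increases the cardinality of WS, and that the only way to enlarge WS is to introduce a new set built from fresh lexical items.

# Toy workspace dynamics (our own illustrative sketch, not the authors' formal definition).
# A workspace (WS) is a list of syntactic objects; MERGE of two WS members replaces
# them with the single set {X, Y}, so the cardinality of WS never grows via MERGE.

def merge_in_workspace(ws, x, y):
    """Apply MERGE to two members of the workspace, replacing them by {x, y}."""
    new_ws = list(ws)
    new_ws.remove(x)
    new_ws.remove(y)
    new_ws.append(frozenset({x, y}))
    return new_ws

def introduce_from_lexicon(ws, w, z):
    """The only way to enlarge WS: merge two fresh lexical items and add the result."""
    return ws + [frozenset({w, z})]

ws = ["X", "Y"]                                 # WS    = [X, Y]
ws1 = merge_in_workspace(ws, "X", "Y")          # WS'   = [{X, Y}]            (2 -> 1)
ws2 = introduce_from_lexicon(ws1, "W", "Z")     # WS''  = [{W, Z}, {X, Y}]    (1 -> 2)
ws3 = merge_in_workspace(ws2, ws2[0], ws2[1])   # WS''' = [{{W, Z}, {X, Y}}]  (2 -> 1)
print(len(ws), len(ws1), len(ws2), len(ws3))    # 2 1 2 1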
Additionally, the possibility that computational load is reduced by MERGE is perhaps somewhat new, as this typically follows from a principle in linguistics that is called STRICT CYCLICITY. The notion of cycle (and thus cyclicity) goes back to the fifties, when work in phonology [99, 100] showed that stress-assigning rules apply from innermost to outermost units of a word, putting aside linear order information. More generally, an object is built under cyclic principles if it is COMPOSITIONAL, which means that its interpretation is fixed by the elements it contains and the way in which they are combined. Consider this with Sentences A4, where the interpretation is crucially different (Brutus is an agent in (a), and a patient in (b)), although both examples contain the same three words:
The concept of STRICT CYCLICITY is a stronger version of cyclicity. The key intuition behind it is that once a certain linguistic object has been constructed in a derivation (say, a VP), further computation should not modify it. Let us see this with the example in Eq. A5, where the verb “leave” is merged with the NP “the room” to yield the complex VP “leave the room”, which we can call K for ease of reference.
What is of interest here is that the interpretation of K (that is, of “leave the room”) is determined at that stage of the derivation (at that “cycle”), and cannot be changed at subsequent stages (“cycles”). Therefore, if we add “Mary” to obtain “Mary leaves the room” (call it \(K'\)), as in Eq. A6, the interpretation of K will be the same in Eq. A5 and in Eq. A6.
In a nutshell, the interpretation of complex objects is constructed stepwise, and whatever has been done at a stage s cannot be undone at stage \(s+1\) (Eqs. A5 and A6 above). This, in turn, is quite analogous to the idea of the irreversibility of RG flows in physics, which matches perfectly with our interpretation of MERGE as a coarse-graining of information.
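The stepwise, irreversible character of interpretation can be made vivid with a small caching sketch. The Python code below (our own illustration; the “interpretation” of an object is a deliberately crude string) computes meanings bottom-up and freezes them, so that the value fixed for K at its cycle is reused untouched when K is later embedded in K'.

# Toy bottom-up interpretation with strict cyclicity (our own illustrative sketch).
# Once the "interpretation" of a sub-object K is computed at its cycle, it is frozen
# and merely reused (never altered) when K is embedded into larger objects.

frozen_interpretations = {}   # cache: syntactic object -> fixed interpretation

def interpret(obj):
    if obj in frozen_interpretations:          # already closed off at an earlier cycle
        return frozen_interpretations[obj]
    if isinstance(obj, str):                   # lexical item: its interpretation is itself
        meaning = obj
    else:                                      # set {X, Y}: compose the parts' meanings
        parts = sorted(interpret(member) for member in obj)
        meaning = "(" + " + ".join(parts) + ")"
    frozen_interpretations[obj] = meaning      # freeze: later merges cannot change it
    return meaning

K = frozenset({"leave", frozenset({"the", "room"})})   # Eq. A5: "leave the room"
K_prime = frozenset({"Mary", K})                       # Eq. A6: "Mary leaves the room"
print(interpret(K))        # fixed at this cycle
print(interpret(K_prime))  # reuses the frozen interpretation of K unchanged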
Such stages of a derivation, where a “computation” is done and cannot be altered afterwards, correspond to the so-called linguistic PHASES, and the device responsible for ensuring that the interior of a PHASE is no longer accessible is the PHASE IMPENETRABILITY CONDITION (PIC for short). What has been called “phase” roughly corresponds to the notion of “cycle” described above. Using the physical interpretation that we introduce in this paper, one would say that a PHASE in linguistics is the analogue of an RG scale in physics.
To be more precise, a phase is defined in linguistics as a domain D where uninterpretable features (e.g., the number and person features of verbs) are valued. When a phase is closed off, the complement domain \(\Omega\) (which can itself be complex, in the sense of having some inner structure) of the phase head P cannot be modified from the outside; this means, for instance, that the case of an NP within \(\Omega\) (e.g., “the book” in the VP “read the book”) cannot be changed once the phase headed by P is complete [31, 101]. Among other things, this entails that “the book”, which is the Direct Object of “read” in Sentence A7 (it receives accusative case from “read”), cannot also be the Direct Object of the matrix verb “believe”:
That “the book” is the Direct Object of “read” and not of “believe” is shown in Sentences A8, where we see that this NP can be passivized in the embedded clause, but not in the matrix clause (\(*\) signals ungrammaticality):
This “shielding” effect that makes the VP impenetrable is captured by the phase impenetrability condition mentioned above. Physically, this is the irreversibility of the RG flow when moving from one RG scale to the next. There are various approaches to Phase Theory [98], but all of them share the key intuition that PHASES are domains where complexity is reduced by somehow allowing the system to “forget” about an amount of structure that has been created and which will be no longer accessible. This process of “forgetting” is, in fact, analogous to the process of “discarding irrelevant degrees of freedom” in an RG-step in physics.
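To connect this “forgetting” with the probabilistic picture of the main text, the following NumPy sketch (our own toy example with made-up numbers) shows one MERGE-like coarse-graining step: a stochastic tensor maps the joint distribution of two fine-grained symbols (say, two words) onto the distribution of a single coarse-grained symbol (their phrase), after which the fine-grained detail is no longer accessible, much as the interior of a PHASE becomes inaccessible under the PIC.

import numpy as np

# A single MERGE-as-coarse-graining step (our own toy numbers, PCFG-like).
# p_xy: joint distribution over two fine-grained symbols (say, 3 x 3 word choices).
rng = np.random.default_rng(0)
p_xy = rng.random((3, 3))
p_xy /= p_xy.sum()                    # normalize to a probability distribution

# W[k, i, j] = P(coarse symbol k | fine symbols i, j): a stochastic "MERGE tensor".
W = rng.random((2, 3, 3))
W /= W.sum(axis=0, keepdims=True)     # sum_k W[k, i, j] = 1 for every (i, j)

# Coarse-graining: contract the MERGE tensor with the joint distribution.
p_k = np.einsum('kij,ij->k', W, p_xy)

print(p_k, p_k.sum())                 # a valid distribution over the coarse variable
# The detailed (i, j) information is gone: only the phrase-level variable k survives,
# just as the interior of a PHASE is no longer accessible once it is closed off.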
Moreover, the “rescaling” step in RG has not been discussed in this paper, but also appears naturally when particularizing to specific models of language. For instance, in the Matrix Syntax model [65] this rescaling appears naturally in order to recover the correct linguistic labels after a MERGE operation (see Ref. [65] and the discussions therein for more information). We believe that this is a general feature: the “rescaling” in physics is nothing but the “mathematical relabelling” that one needs in order to recover the correct labels (NP, VP, etc) after a MERGE operation when dealing in practice with models of language.