Abstract
Here we consider some well-known facts in syntax from a physics perspective, which allows us to establish analogies between both fields with many consequences. Mainly, we observe that the operation MERGE, put forward by Chomsky (in: Evolution and Revolution in Linguistic Theory, Essays in Honor of Carlos Otero, eds. Hector Campos and Paula Kempchinsky, 1995), can be interpreted as a physical information coarse-graining. Thus, MERGE in linguistics entails information renormalization in physics, according to different time scales. We make this point mathematically formal in terms of statistical language models. In this setting, MERGE amounts to a probability tensor implementing a coarse-graining, akin to a probabilistic context-free grammar. The probability vectors of meaningful sentences are given by stochastic tensor networks (TN) built from diagonal tensors and mostly free of loops, such as Tree Tensor Networks and Matrix Product States, which are therefore computationally very efficient to manipulate. We show that this implies the polynomially-decaying (long-range) correlations experimentally observed in language, and also provides arguments in favour of certain types of neural networks for language processing. Moreover, we show how to obtain such language models from quantum states that can be efficiently prepared on a quantum computer, and use this to find bounds on the perplexity of the probability distribution of words in a sentence. Implications of our results are discussed across several domains.
Notes
A clarification is in order: here we understand “renormalization” as the tool that allows the mathematical description of the information in a system at different physical scales, accounted for by relevant degrees of freedom at every scale. Of course, the implementation of this idea in several contexts leads to different consequences. Well-known examples in physics are the existence of critical systems, critical exponents, universality classes, phase transitions, the c-theorem, the reshuffling of Hilbert spaces, the \(\beta\)-function, fixed points, RG flows, scaling laws, relevant/irrelevant/marginal perturbations... the list is unending. In our case, however, we do not necessarily assume the existence of any of these in the case of language (though some of them may also be there), and adopt instead the most fundamental and general perspective of what “renormalization” means at its very basic core at the level of information.
To be defined in Eq. (17).
Meaning that \([ \alpha , [\beta , \gamma ]] \ne [[\alpha , \beta ], \gamma ]\) using the bracketed notation.
Other operations can be accounted for by introducing extra links in the graphical representation, as we shall explain, but the renormalization picture still holds.
Notice that the converse is not true.
Unlike some TNs with loops, which are \(\sharp\)P-hard and therefore need an exponential time to be contracted [51].
This is in fact a very efficient procedure to compute the overall probability of a given tree in a language model.
A recent attempt in this direction, also related to quantum physics and linear algebra, is the Matrix Syntax model in Ref. [65].
In the words of N. Chomsky, “there is only one human language” (private communication).
References
See, e.g., https://en.wikipedia.org/wiki/Linguistics
Russell SJ, Norvig P. Artificial intelligence: a modern approach. 3rd ed. Upper Saddle River: Prentice Hall; 2009.
Chomsky N, Gallego ÁJ, Ott D. Generative grammar and the faculty of language: insights, questions, and challenges. Ms., MIT / UAB / UOttawa (2017); Available at http://ling.auf.net/lingbuzz/003507
Descartes R. Discours de la méthode, 1662.
Chomsky N. The language capacity: architecture and evolution. Psychon Bull Rev. 2017;24:200–3.
Hauser MD, Chomsky N, Fitch WT. The faculty of language: What is it, who has it, and how did it evolve? Science. 2002;298:1569–79.
Anderson SR. Doctor Dolittle’s Delusion. Animals and the uniqueness of human language. New Haven: Yale University Press; 2004.
Chomsky N. Some simple evo-devo theses: how true might they be for language? In: Larson RK, Déprez V, Yamakido H, editors. The evolution of human language: biolinguistic perspectives. Cambridge: Cambridge University Press; 2012. p. 45–62.
Chomsky N. A minimalist program for linguistic theory, MIT occasional papers in linguistics no. 1. Cambridge, MA: Distributed by MIT Working Papers in Linguistics, 1993.
Chomsky N. Three factors in language design. Linguistic Inquiry. 2005;36:1–22.
Thompson DW. On growth and form. Cambridge: Cambridge University Press; 1917.
Turing AM. The chemical basis of morphogenesis. Philos Trans R Soc B. 1952;237(642):37–42.
Chomsky N. Bare phrase structure. In: Campos H, Kempchinsky P, editors. Evolution and revolution in linguistic theory: essays in honor of Carlos Otero; 1995. p. 51–109.
See, e.g., https://en.wikipedia.org/wiki/Emergentism
Anderson PW. More is different. Sci New Ser. 1972;177(4047):393–6.
There are plenty of books and introductory articles on renormalization in physics. Some good original sources, though, are L. Kadanoff, Scaling laws for Ising models near \(T_c\), Physics, 1966;2:263.
Wilson KG. The renormalization group: critical phenomena and the Kondo problem. Rev Mod Phys. 1975;47(4):773.
Wilson KG. Problems in Physics with many Scales of Length. Sci Am. 1979;241:140–57.
Also K. G. Wilson’s Nobel Prize lecture from 1982, available at http://www.nobelprize.org.
See, e.g., Shankar R. Renormalization-group approach to interacting fermions. Rev Mod Phys. 1994;66:129; White SR. Density matrix formulation for quantum renormalization groups. Phys Rev Lett. 1992;69:2863.
See, e.g., Weinberg S. The Quantum Theory of Fields (3 volumes), Cambridge University Press (1995).
Verstraete F, et al. Renormalization-group transformations on quantum states. Phys Rev Lett. 2005;94:140601.
Vidal G. Entanglement renormalization. Phys Rev Lett. 2007;99:220405.
See, e.g., https://en.wikipedia.org/wiki/Language_model
Verstraete F, Cirac JI, Murg V. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Adv Phys. 2008;57:143.
Cirac JI, Verstraete F. Renormalization and tensor product states in spin chains and lattices. J Phys A: Math Theor. 2009;42:504004.
Eisert J. Entanglement and tensor network states. Model Simul. 2009;3:520.
Schuch N. Condensed matter applications of entanglement theory, QIP, Lecture Notes of the 44th IFF Spring School (2013).
Orús R. Advances on tensor network theory: symmetries, fermions, entanglement, and holography. Eur Phys J B. 2014;87:280.
Orús R. A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann Phys. 2014;349:117–158.
Chomsky N. Problems of projection. Lingua. 2013;130:33–49.
Nielsen M, Chuang I. Quantum computation and quantum information. New York: Cambridge University Press; 2000.
Sengupta B, Stemmler MN. Power consumption during neuronal computation. Proc IEEE 2014;102(5).
Smith NA, Johnson M. Weighted and probabilistic context-free grammars are equally expressive. Comput Linguist. 2007;33(4):477.
Shi Y, Duan L, Vidal G. Classical simulation of quantum many-body systems with a tree tensor network. Phys Rev A. 2006;74:022320.
Tagliacozzo L, Evenbly G, Vidal G. Simulation of two-dimensional quantum systems using a tree tensor network that exploits the entropic area law. Phys Rev B. 2009;80:235127.
Murg V, et al. Simulating strongly correlated quantum systems with tree tensor networks. Phys Rev B. 2010;82:205105.
Gerster M, et al. Unconstrained tree tensor network: an adaptive gauge picture for enhanced performance. Phys Rev B. 2014;90:125154.
Fannes M, Nachtergaele B, Werner RF. Finitely correlated states on quantum spin chains. Commun Math Phys. 1992;144:443–90.
Klümper A, Schadschneider A, Zittartz J. Equivalence and solution of anisotropic spin-1 models and generalized t-J fermion models in one dimension. J Phys A. 1991;24:L955.
Klümper A, Schadschneider A, Zittartz J. Matrix-product-groundstates for one-dimensional spin-1 quantum antiferromagnets. Europhys Lett. 1993;24:293.
Schollwöck U. The density-matrix renormalization group in the age of matrix product states. Ann Phys. 2011;326:96.
Oseledets IV. Tensor-train decomposition. SIAM J Sci Comput. 2011;33(5):2295–317.
Chomsky N. Syntactic structures. The Hague/Paris: Mouton; 1957.
See, e.g., Liu H. Dependency Grammar: from Theory to Practice. Beijing: Science Press (2009).
Papadimitriou CH. Computational complexity. Reading: Addison-Wesley; 1994.
Alvarez-Lacalle E, Dorow B, Eckmann J-P, Moses E. Hierarchical structures induce long-range dynamical correlations in written texts. PNAS. 2006;103(21):7956–61.
Altmann EG, Cristadoro G, Esposti MD. On the origin of long-range correlations in texts. PNAS. 2012;109(29):11582–7.
Lin HW, Tegmark M. Critical behavior in physics and probabilistic formal languages. Entropy. 2017;19:299.
Schuch N, et al. Computational complexity of projected entangled pair states. Phys Rev Lett. 2007;98:140506.
Temme K, Verstraete F. Stochastic matrix product states. Phys Rev Lett. 2010;104:210502.
De las Cuevas G, et al. Purifications of multipartite states: limitations and constructive methods. New J Phys. 2013;15:123021.
Wolf MM, Verstraete F, Hastings MB, Cirac JI. Area laws in quantum systems: mutual information and correlations. Phys Rev Lett. 2008;100:070502.
Holzhey C, Larsen F, Wilczek F. Geometric and renormalized entropy in conformal field theory. Nucl Phys B. 1994;424:443.
Vidal G, et al. Entanglement in quantum critical phenomena. Phys Rev Lett. 2003;90:227902.
Latorre JI, Rico E, Vidal G. Ground state entanglement in quantum spin chains. Quantum Inf Comput. 2004;4:48.
Eisert J, Cramer M. Single-copy entanglement in critical quantum spin chains. Phys Rev A. 2005;72:042112.
Orús R, et al. Half the entanglement in critical systems is distillable from a single specimen. Phys Rev A. 2006;73:060303(R).
See, e.g., Bhatia R. Matrix analysis. New York: Springer-Verlag; 1997; Nielsen M, Vidal G. Quantum Inf Comput. 2001;1(1):76–93.
Kadanoff LP. More is the same; phase transitions and mean field theories. J Stat Phys. 2009;137:777.
Sidorov G, et al. Syntactic dependency-based n-grams as classification features. LNAI. 2012;7630:1–11.
Orús R, Martin R, Uriagereka J. Mathematical foundations of matrix syntax. arXiv:1710.00372; Martin R, Orús R, Uriagereka J. Towards matrix syntax. Catalan Journal of Linguistics. 2019:27–44.
Ferrer i Cancho R, Solé RV. The small world of human language. Proc R Soc Lond B. 2001;268:2261–5.
Zamolodchikov AB. Irreversibility of the flux of the renormalization group in a 2D field theory. JETP Lett. 1986;43:730–2.
Latorre JI, et al. Fine-grained entanglement loss along renormalization-group flows. Phys Rev A. 2005;71:034301.
Orús R. Entanglement and majorization in (1+1)-dimensional quantum systems. Phys Rev A. 2005;71:052327.
Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994;22(11):2079–88.
Sakakibara Y, et al. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res. 1994;22(23):5112–20.
Durbin R, Eddy S, Krogh A, Mitchinson G, editors. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press; 1998.
Searls D. A primer in macromolecular linguistics. Biopolymers. 2013;99(3):203–17.
Krogh A, et al. Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 1994;235:1501–31.
Sigrist C, et al. PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 2002;3(3):265–74.
Dyrka W, Nebel J-C. A stochastic context free grammar based framework for analysis of protein sequences. BMC Bioinform. 2009;10:323.
Dyrka W, Nebel J-C, Kotulska M. Probabilistic grammatical model for helix–helix contact site classification. Algorithms Mol Biol. 2013;8:31.
Levine Y. et al. Deep learning and quantum entanglement: fundamental connections with implications to network design, arXiv:1704.01552.
Lin HW, Tegmark M, Rolnick D. Why does deep and cheap learning work so well? J Stat Phys. 2017;168(6):1223–47.
Graps A. An introduction to wavelets. IEEE Comput Sci Eng. 1995;2(2):50–61.
Latorre JI. Image compression and entanglement, arxiv:quant-ph/0510031
Stoudenmire EM, Schwab DJ. Supervised learning with tensor networks. Adv Neural Inf Process Syst. 2016;29:4799.
Katz J, Pesetsky D. The identity thesis for language and music, http://ling.auf.net/lingBuzz/000959
Alcock JK, et al. Pitch and timing abilities in inherited speech and language impairment. Brain Lang. 2000;75:34–46.
Lai CSL, et al. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature. 2001;413:519–23.
Peretz I. Music, language and modularity framed in action. Psychol Belg. 2009;49:157–75.
Rimfeld K, et al. Pleiotropy across academic subjects at the end of compulsory education. Sci Rep. 2015;5:11713.
See, e.g., https://en.wikipedia.org/wiki/Successor_function, and also Halmos PR. Naive set theory. Van Nostrand; 1968.
Chomsky N. On phases. Cambridge, MA: MIT Press; 2008.
Nelson MJ, et al. Neurophysiological dynamics of phrase-structure building during sentence processing. PNAS. 2017;114(18).
Carlon E, Henkel M, Schollwöck U. Density matrix renormalization group and reaction-diffusion processes. Eur Phys J B. 1999;12:99.
For other examples in different contexts see, e.g., Piattelli-Palmarini M, Vitiello G. Linguistics and some aspects of its underlying dynamics, arXiv:1506.08663.
Piattelli-Palmarini M, Vitiello G. Quantum field theory and the linguistic Minimalist Program: a remarkable isomorphism. J Phys Conf Ser. 2017;880:012016.
Solé R. Synthetic transitions: towards a new synthesis. Philos Trans R Soc B. 2016;371:20150438.
Bolhuis J, Tattersall I, Chomsky N, Berwick RC. How could language have evolved? PLoS Biol. 2014;12:e1001934.
Berwick RC, Chomsky N. Why only us. Cambridge: MIT Press; 2016.
Chomsky N. Three models for the description of language. IRE Trans Inf Theory. 1956;2:113–24.
Gallego ÁJ, editor. Phases. Developing the framework. Berlin: De Gruyter; 2012.
Chomsky N, Halle M, Lukoff F. On accent and juncture in English. In: Halle M, et al., editors. For Roman Jakobson: essays on the occasion of his sixtieth birthday. The Hague: Mouton; 1956. p. 65–80.
Chomsky N, Halle M. The sound pattern of English. New York: Harper Row; 1968.
Chomsky N. Problems of projection: extensions. In: di Domenico E, et al., editors. Structures, strategies and beyond. Amsterdam: John Benjamins; 2015. p. 1–16.
Acknowledgements
We acknowledge the Universitat Autònoma de Barcelona, the Max Planck Institute of Quantum Optics, and the University of Mainz, where the ideas in this paper materialized from interdisciplinary discussions over several years. Discussions with Ondiz Aizpuru, Gustavo Ariel Schwartz, Gemma de las Cuevas, Laura García-Álvarez, Géza Giedke, José I. Latorre, Enrique Solano, Miles Stoudenmire and Juan Uriagereka are acknowledged. A special acknowledgement to Sergi Quintana, for explaining to us the basics of Chomsky’s generative grammar 28 years ago: qui sembra, recull (who sows, reaps).
Funding
This project was not funded by any grant.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Appendix A: More Analogies by Digging Deeper
Our paper is written with a reader with a background in physics and mathematics in mind. The topic itself, however, is strongly interdisciplinary. Because of this, in this appendix we add some extra information useful for readers with a background in theoretical linguistics. In particular, we define a few concepts more precisely in linguistic jargon. By digging deeper into this jargon, more analogies with physics show up, in turn strengthening our thesis that MERGE in linguistics and RG in physics are deeply linked to each other.
To begin with, the term Universal Grammar (UG) is nothing but a label for the striking difference in cognitive capacity between “us and them”, i.e., humans versus the rest of animal species. UG is thus the research topic of generative grammar in its attempt to understand what this capacity is and how it evolved in our species. Finding a satisfying answer to the latter question may be impossible with the tools we have right now, but any theory of UG seeking to address the former must meet a criterion of evolvability: any properties, mechanisms, etc. attributed to UG should have emerged in what appears to have been a unique and relatively sudden event on the evolutionary timescale [95, 96]. This line of thought presupposes that UG (the genetic encoding of the human linguistic capacity) manifests bona fide traits of perfect design, in the sense that it contains operations and mechanisms that follow from conceptual necessities, efficiency principles or interface demands. In this respect, linguistic expressions (sentences, phrases, words, etc.) are built up by adhering to these principles, and therefore in an optimal fashion. While these notions are intuitively clear, their precise formulations remain vague and controversial.
One of the most important mathematical achievements of generative grammar is the so-called “Chomsky Hierarchy” [97], a classification of formal grammars according to their complexity. As Chomsky showed 60 years ago, human language manifests both context-free and context-sensitive properties, needed to construct PHRASES and CHAINS respectively, shown in Sentences A1(a,b):
In Sen. A1(a) (a PHRASE) we have two tokens of the lexical item “John” that participate in phrasal dependencies to yield a compositional interpretation whereby the first John is the agent of a killing event, and the second John is the patient of that event. What we have in Sen. A1(b) (a CHAIN) is more complex. This time, we do not have two tokens of “John”, but two occurrences of the same lexical item, as if they were one and the same object in two positions at the same time, where the notation \(\mathrm{<John>}\) means that the word is not pronounced at that position, although it is still interpreted there from the logical point of view. This is what is called a CHAIN in linguistics. In languages of the English type, the first (leftmost) occurrence is spelled out, whereas the second (rightmost) is necessary to keep a syntax-semantics homomorphism (that is, to capture the desideratum that a specific interpretation is tied to a specific position). Notice that the same type of object (a CHAIN) is necessary in Sen. A2, where “John” is pronounced to the left of seem, although it is interpreted as the patient of killed.
In order to account for these properties, generative grammar has resorted to phrase structure rules (PSR) and transformations. The most articulated version of PSR is known as X-bar Theory, which resorted to different devices that have been subject to a revision within minimalism. In particular, Chomsky [13] argued that the basic properties of PSR could be understood by means of a computational operation, dubbed MERGE [13], which captures two empirical properties of human language that are non-negotiable: discrete infinity and displacement. To be able to account for those properties, one must assume an operation that constructs hierarchically structured expressions with displacement. And that is what MERGE does. MERGE applies to two objects X and Y (be these words or bigger units), yielding a new one, K, which is the set containing X and Y, i.e., \(\{ X, Y \}\). If X, Y are distinct (taken directly from the lexicon or independently assembled), K is constructed by what is called EXTERNAL MERGE (EM); if Y is part of X (if Y is contained in X), then we have what is called INTERNAL MERGE (IM). The latter scenario is that of Sentences A1(b) and A2 above, where MERGE turns “John” into a discontinuous object (a CHAIN). For completeness, if the operation is at the beginning of a derivation (e.g., with bare lexical items from a lexicon), it is called FIRST MERGE, and if it operates with partially-derived items (phrases), it is called ELSEWHERE MERGE.
Chomsky [13] takes MERGE to be strictly binary, as it is what is minimally necessary to create hierarchical structure. Generation by MERGE thus entails a restrictive class of recursively defined, binary-branching and discrete-hierarchical structures.
It is also worth mentioning that in X-bar Theory, the label identifies the properties of the entire phrase, at the cost of this being a theory-internal symbol that departs from inclusiveness demands. An alternative to this is a label-free representation (see Fig. 1), where endocentricity (the assumption that all phrases must be headed) is not preserved. This entails that syntactic objects can be exocentric, as seems to be necessary for objects formed by the combination of two phrases, \(\{XP,YP \}\). Syntactic objects are “endocentric” if they contain an element that can be determined by Minimal Search—typically, a head. Given this logic, \(\{X,YP \}\) is endocentric and \(\{XP,YP \}\) exocentric. Consequently, such a system freely generates objects of different kinds, without stipulating their endocentric nature.
Moreover, MERGE is subject to efficiency and economy conditions. One such condition is inclusiveness, which precludes the introduction of extraneous objects, like the ones that X-bar Theory deployed: traces, bar-levels, projections, etc. Inclusiveness also bars introduction of features that are not present in lexical items.
To further clarify MERGE, we stress that the combination of two objects, X and Y, yields a new one, K, which is the set \(\{ X, Y \}\). Once we have \(\{X,Y \}\), we may want to merge K and some object W, which can be either internal to K or external to it (see above). In any event, the merger of W cannot change or tamper with \(\{X,Y \}\), which behaves as a unit. More precisely, subsequent applications of MERGE must yield Eq. A3(a), not A3(b):
The driving force of this work is the fact that MERGE and renormalization seem to play a similar role in various respects. As noted above, MERGE takes two objects, X and Y, to yield a new one, K, thus removing X and Y from the computational workspace (WS). In the simplest scenario, MERGE maps \(\mathrm{WS} = [X,Y]\) onto \(\mathrm{WS'} = [\{X,Y \}]\), reducing the complexity of WS. Notice that MERGE never extends the WS, at least in terms of cardinality; thus \(\mathrm{WS} = [\{X,Y \}]\) and \(\mathrm{WS'} = [\{W,\{X,Y\}\}]\) are equally large, since each contains a single set. A new element can be added to WS (or \(\mathrm{WS'}\)) in only one way: by taking two items W, Z from the lexicon and introducing \(\{W,Z \}\) into WS as a new element, yielding \(\mathrm{WS''} = [\{W,Z\},\{X,Y\}]\). Of course, cardinality can be reduced if we apply EM (EXTERNAL MERGE) and neither of the elements is taken from the lexicon, as when we map \(\mathrm{WS''} = [\{W,Z\},\{X,Y\}]\) onto \(\mathrm{WS'''} = [\{\{W,Z\},\{X,Y\}\}]\). This idea is indeed very similar to that of a coarse-graining in physics, in the sense made precise throughout the paper.
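The workspace bookkeeping just described can also be mimicked computationally. The following toy Python sketch (again our own illustrative rendering, with hypothetical items) shows that MERGE over two workspace members never increases the cardinality of WS, and that the only way to enlarge WS is to introduce a new set built from fresh lexical items.

# Toy workspace dynamics (our own illustrative sketch, not the authors' formal definition).
# A workspace (WS) is a list of syntactic objects; MERGE of two WS members replaces
# them with the single set {X, Y}, so the cardinality of WS never grows via MERGE.

def merge_in_workspace(ws, x, y):
    """Apply MERGE to two members of the workspace, replacing them by {x, y}."""
    new_ws = list(ws)
    new_ws.remove(x)
    new_ws.remove(y)
    new_ws.append(frozenset({x, y}))
    return new_ws

def introduce_from_lexicon(ws, w, z):
    """The only way to enlarge WS: merge two fresh lexical items and add the result."""
    return ws + [frozenset({w, z})]

ws = ["X", "Y"]                                 # WS    = [X, Y]
ws1 = merge_in_workspace(ws, "X", "Y")          # WS'   = [{X, Y}]            (2 -> 1)
ws2 = introduce_from_lexicon(ws1, "W", "Z")     # WS''  = [{W, Z}, {X, Y}]    (1 -> 2)
ws3 = merge_in_workspace(ws2, ws2[0], ws2[1])   # WS''' = [{{W, Z}, {X, Y}}]  (2 -> 1)
print(len(ws), len(ws1), len(ws2), len(ws3))    # 2 1 2 1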
Additionally, the possibility that computational load is reduced by MERGE is perhaps somewhat new, as this typically follows from a principle in linguistics that is called STRICT CYCLICITY. The notion of cycle (and thus cyclicity) goes back to the fifties, when work in phonology [99, 100] showed that stress-assigning rules apply from innermost to outermost units of a word, putting aside linear order information. More generally, an object is built under cyclic principles if it is COMPOSITIONAL, which means that its interpretation is fixed by the elements it contains and the way in which they are combined. Consider this with Sentences A4, where the interpretation is crucially different (Brutus is an agent in (a), and a patient in (b)), although both examples contain the same three words:
The concept of STRICT CYCLICITY is a stronger version of cyclicity. The key intuition behind it is that once a certain linguistic object has been constructed in a derivation (say, a VP), further computation should not modify it. Let us see this with the example in Eq. A5, where the verb “leave” is merged with the NP “the room” to yield the complex VP “leave the room”, which we can call K for ease of reference.
What is of interest here is that the interpretation of K (that is, of “leave the room”) is determined at that stage of the derivation (at that “cycle”), and cannot be changed at subsequent stages (“cycles”). Therefore, if we add “Mary” to obtain “Mary leaves the room” (call it \(K'\)), as in Eq. A6, the interpretation of K will be the same in Eq. A5 and in Eq. A6.
In a nutshell, the interpretation of complex objects is constructed stepwise, and whatever has been done at a stage s cannot be undone at stage \(s+1\) (Eqs. A5 and A6 above). This, in turn, is quite analogous to the idea of the irreversibility of RG flows in physics, which matches perfectly with our interpretation of MERGE as a coarse-graining of information.
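The stepwise, irreversible character of interpretation can be made vivid with a small caching sketch. The Python code below (our own illustration; the “interpretation” of an object is a deliberately crude string) computes meanings bottom-up and freezes them, so that the value fixed for K at its cycle is reused untouched when K is later embedded in K'.

# Toy bottom-up interpretation with strict cyclicity (our own illustrative sketch).
# Once the "interpretation" of a sub-object K is computed at its cycle, it is frozen
# and merely reused (never altered) when K is embedded into larger objects.

frozen_interpretations = {}   # cache: syntactic object -> fixed interpretation

def interpret(obj):
    if obj in frozen_interpretations:          # already closed off at an earlier cycle
        return frozen_interpretations[obj]
    if isinstance(obj, str):                   # lexical item: its interpretation is itself
        meaning = obj
    else:                                      # set {X, Y}: compose the parts' meanings
        parts = sorted(interpret(member) for member in obj)
        meaning = "(" + " + ".join(parts) + ")"
    frozen_interpretations[obj] = meaning      # freeze: later merges cannot change it
    return meaning

K = frozenset({"leave", frozenset({"the", "room"})})   # Eq. A5: "leave the room"
K_prime = frozenset({"Mary", K})                       # Eq. A6: "Mary leaves the room"
print(interpret(K))        # fixed at this cycle
print(interpret(K_prime))  # reuses the frozen interpretation of K unchanged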
Such stages of a derivation, where a “computation” is done and cannot be altered afterwards, correspond to the so-called linguistic PHASES, and the device responsible for ensuring that the interior of a PHASE is no longer accessible is the PHASE IMPENETRABILITY CONDITION (PIC for short). What has been called “phase” roughly corresponds to the notion of “cycle” described above. Using the physical interpretation that we introduce in this paper, one would say that a PHASE in linguistics is the analogue of an RG scale in physics.
To be more precise, a phase is defined in linguistics as a domain D where uninterpretable features (e.g., the number and person features of verbs) are valued. When a phase is closed off, the complement domain \(\Omega\) (which can itself be complex, in the sense of having some inner structure) of the phase head P cannot be modified from the outside; this means, for instance, that the case of an NP within \(\Omega\) (e.g., “the book” in the VP “read the book”) cannot be changed once the phase headed by P is complete [31, 101]. Among other things, this entails that “the book”, which is the Direct Object of “read” in Sentence A7 (it receives accusative case from “read”), cannot also be the Direct Object of the matrix verb “believe”:
That “the book” is the Direct Object of “read” and not of “believe” is shown in Sentences A8, where we see that this NP can be passivized in the embedded clause, but not in the matrix clause (\(*\) signals ungrammaticality):
This “shielding” effect that makes the VP impenetrable is captured by the phase impenetrability condition mentioned above. Physically, this is the irreversibility of the RG flow when moving from one RG scale to the next. There are various approaches to Phase Theory [98], but all of them share the key intuition that PHASES are domains where complexity is reduced by somehow allowing the system to “forget” about an amount of structure that has been created and which will be no longer accessible. This process of “forgetting” is, in fact, analogous to the process of “discarding irrelevant degrees of freedom” in an RG-step in physics.
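To connect this “forgetting” with the probabilistic picture of the main text, the following NumPy sketch (our own toy example with made-up numbers) shows one MERGE-like coarse-graining step: a stochastic tensor maps the joint distribution of two fine-grained symbols (say, two words) onto the distribution of a single coarse-grained symbol (their phrase), after which the fine-grained detail is no longer accessible, much as the interior of a PHASE becomes inaccessible under the PIC.

import numpy as np

# A single MERGE-as-coarse-graining step (our own toy numbers, PCFG-like).
# p_xy: joint distribution over two fine-grained symbols (say, 3 x 3 word choices).
rng = np.random.default_rng(0)
p_xy = rng.random((3, 3))
p_xy /= p_xy.sum()                    # normalize to a probability distribution

# W[k, i, j] = P(coarse symbol k | fine symbols i, j): a stochastic "MERGE tensor".
W = rng.random((2, 3, 3))
W /= W.sum(axis=0, keepdims=True)     # sum_k W[k, i, j] = 1 for every (i, j)

# Coarse-graining: contract the MERGE tensor with the joint distribution.
p_k = np.einsum('kij,ij->k', W, p_xy)

print(p_k, p_k.sum())                 # a valid distribution over the coarse variable
# The detailed (i, j) information is gone: only the phrase-level variable k survives,
# just as the interior of a PHASE is no longer accessible once it is closed off.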
Moreover, the “rescaling” step in RG has not been discussed in this paper, but also appears naturally when particularizing to specific models of language. For instance, in the Matrix Syntax model [65] this rescaling appears naturally in order to recover the correct linguistic labels after a MERGE operation (see Ref. [65] and the discussions therein for more information). We believe that this is a general feature: the “rescaling” in physics is nothing but the “mathematical relabelling” that one needs in order to recover the correct labels (NP, VP, etc) after a MERGE operation when dealing in practice with models of language.