Abstract
Informal judgments of the well-formedness of phrases and sentences have long been the primary data source for syntacticians. In recent years, the reliability of data based on linguists’ introspective intuitions has come under increasing scrutiny. Although a number of studies have replicated the vast majority of English judgments published in a textbook and in peer-reviewed journal articles, the status of data in many non-English languages has yet to be experimentally examined. In this work, we employed formal quantitative methods to evaluate the reliability of judgments in the widely used textbook The Syntax of Chinese (Huang et al. 2009). We first assessed example sentences based on acceptability ratings from 148 native Mandarin Chinese speakers. Using a target forced-choice task, we further explored the potentially problematic sentence pairs. Results of the two experiments suggest an eminently successful replication of judgments in the book: out of the 557 data samples tested, only five sentence pairs require further investigation. This large-scale study represents the first attempt to replicate the judgments in a non-English syntax textbook, in the hope of bridging the gap between informal data collection in Chinese linguistic research and the protocols of experimental cognitive science.
Notes
Sprouse and Almeida (2017) were the first to compare the statistical power of various judgment tasks. A follow-up work by Langsford et al. (2018) estimated how much of the variability within each task is due to psychometric properties, including participant-level individual differences, sample size, response styles, and item effects.
Sprouse et al. (2013) defined “predominantly” as more than 80% of the data points in an article.
While the HLL book was originally published in English, the experimental stimuli were presented in scripts from the book’s simplified Chinese edition (Huang et al. 2013) where spaces segmenting two adjacent words were removed. For all examples in this paper, the page numbers that we refer to are from the book’s English edition.
The experimental materials (and the excluded sentences), data, and code for this manuscript are available at https://osf.io/374h6/.
These quadruples, e.g., a group of four “bad” sentences, cannot be divided into two Pair contrasts, nor can they be analyzed using two-way ANOVAs.
It is also possible to address the rating bias issue by testing baseline items along with target sentences in the same rating experiment. Lin (2018), for example, created three baseline groups in his naturalness-rating experiment by manipulating the degree of word-order and grammaticality violation in sentence items.
Seven participants majored in language-related degree programs, such as linguistics, applied linguistics, Chinese literature, or foreign literature.
In this paper, we choose to also report statistical analyses based on the raw data following recommendations by Juzek (2015) and others.
We suspect that a number of participants may have misunderstood the instructions for the catch trials. Fifteen participants correctly answered at least one catch trial and only seven missed both. Nonetheless, we excluded the data produced by anyone who had failed even one catch trial.
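The exclusion rule stated in this note (drop anyone who failed even one catch trial) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the participant IDs, the data layout, and the function name `passed_all_catch_trials` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical catch-trial results: participant ID -> one boolean per catch trial
# (True = answered correctly). IDs and values are invented for illustration.
catch_results: Dict[str, List[bool]] = {
    "p01": [True, True],    # passed both catch trials
    "p02": [True, False],   # failed one catch trial
    "p03": [False, False],  # failed both catch trials
}


def passed_all_catch_trials(results: List[bool]) -> bool:
    """Return True only if every catch trial was answered correctly."""
    return all(results)


# Retain a participant only if they failed no catch trial at all.
retained = sorted(
    pid for pid, res in catch_results.items() if passed_all_catch_trials(res)
)
excluded = sorted(pid for pid in catch_results if pid not in retained)

print("retained:", retained)
print("excluded:", excluded)
```

Under this strict rule, a participant who failed a single catch trial is treated the same as one who failed both, which matches the conservative criterion described above.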
For most contrasts in the Control 1 group, participants did not choose any “bad” sentence. The calculated z-value was therefore smaller in the Control 1 group than in the Control 2 group, as the mixed-effects model considered results of individual contrasts.
We focus on four contrasts where the predicted directionality of results is reversed. For contrast c8-60, more participants indeed chose the acceptable sentence as the preferred one. Its marginal result in Experiment 2 may be a power issue.
Bare NPs in Chinese are often ambiguous between definite and indefinite readings.
See Fukuda (to appear) for a comprehensive review of studies using acceptability and truth value judgment methods in East Asian languages, including Chinese.
References
Adger, David. 2003. Core syntax: A minimalist approach. Oxford: Oxford University Press.
Aoun, Joseph, and Yen-hui Audrey Li. 2003. Essays on the representational and derivational nature of grammar: The diversity of wh-constructions. Cambridge: MIT Press.
Birdsong, David. 1989. Metalinguistic performance and interlinguistic competence. Dordrecht: Springer.
Chaves, Rui P. and Jeruen E. Dery. 2014. Which subject islands will the acceptability of improve with repeated exposure? In Proceedings of the 31st west coast conference on formal linguistics. Cascadilla proceedings project, ed. R. E. Santana La Barge. Somerville, MA.
Chen, Zhong, Lena Jäger, and Shravan Vasishth. 2012. How structure-sensitive is the parser? Evidence from Mandarin Chinese. In Empirical approaches to linguistic theory. Studies in generative grammar, ed. B. Stolterfoht, and S. Featherston, 43–62. Berlin: Mouton de Gruyter.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1973. Conditions on transformations. In A Festschrift for Morris Halle, ed. S. Anderson, and P. Kiparsky, 232–286. New York: Holt, Rinehart and Winston.
Chomsky, Noam. 1986. Barriers. Cambridge: MIT Press.
Cowart, Wayne. 1997. Experimental Syntax: Applying objective methods to sentence judgements. Thousand Oaks, CA: SAGE Publications.
den Dikken, Marcel, Judy B. Bernstein, Christina Tortora, and Raffaella Zanuttini. 2007. Data and grammar: Means and individuals. Theoretical Linguistics 33 (3): 335–352.
Do, Monica L., and Elsi Kaiser. 2017. The relationship between syntactic satiation and syntactic priming: A first look. Frontiers in Psychology: Language Sciences 8: 1851.
Edelman, Shimon, and Morten H. Christiansen. 2003. How seriously should we take minimalist syntax? A comment on Lasnik. Trends in Cognitive Sciences 7 (2): 60–61.
Erlewine, Michael Yoshitaka, and Hadas Kotek. 2016. A streamlined approach to online linguistic surveys. Natural Language & Linguistic Theory 34 (2): 481–495.
Featherston, Sam. 2005. Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115 (11): 1525–1550.
Featherston, Sam. 2007. Data in generative grammar: The stick and the carrot. Theoretical Linguistics 33 (3): 269–318.
Ferreira, Fernanda. 2005. Psycholinguistics, formal grammars, and cognitive science. The Linguistic Review 22 (2–4): 365–380.
Francom, Jerid Cole. 2009. Experimental Syntax: Exploring the effect of repeated exposure to anomalous syntactic structure–evidence from rating and reading tasks. Ph. D. thesis, University of Arizona, Tucson, AZ.
Fukuda, Shin. Acceptability and truth value judgment studies in East Asian languages. In The Cambridge handbook of experimental syntax, ed. G. Goodall. Cambridge: Cambridge University Press (to appear).
Gibson, Edward, and Evelina Fedorenko. 2010. Weak quantitative standards in linguistics research. Trends in Cognitive Sciences 14 (6): 233–234.
Gibson, Edward, and Evelina Fedorenko. 2013. The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes 28 (1–2): 88–124.
Gibson, Edward, Steven T. Piantadosi, and Evelina Fedorenko. 2013. Quantitative methods in syntax/semantics research: A response to Sprouse and Almeida (2013). Language and Cognitive Processes 28 (3): 229–240.
Gong, Tao, Lan Shuai, and Yicheng Wu. 2019. The acceptability judgment of Chinese pseudo-modifiers with and without a sentential context. PLOS ONE 14 (7): e0219896.
Goodall, Grant. 2011. Syntactic satiation and the inversion effect in English and Spanish wh-questions. Syntax 14 (1): 29–47.
Hartley, James. 2014. Some thoughts on Likert-type scales. International Journal of Clinical and Health Psychology 14 (1): 83–86.
Hiramatsu, Kazuko. 2000. Accessing linguistic competence: Evidence from children’s and adults’ acceptability judgments. Ph. D. thesis, University of Connecticut.
Hofmeister, Philip, and Ivan A. Sag. 2010. Cognitive constraints and island effects. Language 86 (2): 366–415.
Huang, Cheng-Teh James. 1982. Logical relations in Chinese and the theory of grammar. Ph. D. thesis, MIT.
Huang, Cheng-Teh James, Yen-Hui Audrey Li, and Yafei Li. 2009. The syntax of Chinese. Cambridge: Cambridge University Press.
Huang, Cheng-Teh James, Yen-Hui Audrey Li, and Yafei Li. 2013. Hanyu Jufa Xue [The Syntax of Chinese] (Simplified, Chinese ed.; Yang Gu, Ed. and Heyou Zhang, Trans.). Beijing, China: World Publishing Corporation.
Juzek, Thomas. 2015. Acceptability judgement tasks and grammatical theory. Ph. D. thesis, University of Oxford.
Juzek, Thomas, and Jana Häussler. Data convergence in syntactic theory and the role of sentence pairs. Zeitschrift für Sprachwissenschaft (to appear).
Keller, Frank. 2000. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph. D. thesis, University of Edinburgh.
Khoo, Yong Kang, and Jingxia Lin. 2018. Grammatical variations between Singapore, Mainland China, and Taiwan Mandarin: A pilot study of aspect marking. In Proceedings of the 32nd Pacific Asia conference on language, information and computation.
Labov, William. 1978. Sociolinguistics. In A survey of linguistic science, ed. W.O. Dingwall, 339–72. Stamford, CT: Greylock.
Langendoen, D. Terence, Nancy Kalish-Landon, and John Dore. 1973. Dative questions: A study in the relation of acceptability to grammaticality of an English sentence type. Cognition 2 (4): 451–478.
Langsford, Steven, Amy Perfors, Andrew T. Hendrickson, Lauren A. Kennedy, and Danielle J. Navarro. 2018. Quantifying sentence acceptability measures: Reliability, bias, and variability. Glossa: A Journal of General Linguistics 3 (1): 1–34.
Lau, Jey Han, Alexander Clark, and Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41 (5): 1202–1241.
Laws, Jacqueline, and Boping Yuan. 2010. Is the core-peripheral distinction for unaccusative verbs cross-linguistically consistent?: Empirical evidence from Mandarin. Chinese Language and Discourse 1 (2): 220–263.
Levelt, Willem J.M., J.A.W.M. van Gent, A.F.J. Haans, and A.J.A. Meijers. 1977. Grammaticality, paraphrase, and imagery. In Acceptability in language, ed. S. Greenbaum, 87–101. The Hague: Mouton.
Likert, Rensis. 1932. A technique for the measurement of attitudes. Archives of Psychology 140: 44–60.
Lin, Chien-Jer Charles. 2012. Distinguishing grammatical and processing explanations of syntactic acceptability. In In search of grammar: Experimental and corpus-based studies. Language and linguistics monograph series, vol. 48, ed. J. Myers. Taipei: Academia Sinica.
Lin, Chien-Jer Charles. 2018. Subject prominence and processing dependencies in prenominal relative clauses: The comprehension of possessive relative clauses and adjunct relative clauses in Mandarin Chinese. Language 94 (4): 758–797.
Linzen, Tal, and Yohei Oseki. 2018. The reliability of acceptability judgments across languages. Glossa: A Journal of General Linguistics 3(1) (100): 1–25.
Lu, Jiayi, Cynthia K. Thompson, and Masaya Yoshida. 2020. Chinese wh-in-situ and islands: A formal judgment study. Linguistic Inquiry 51 (3).
Mahowald, Kyle, Peter Graff, Jeremy Hartman, and Edward Gibson. 2016. Snap judgments: A small N acceptability paradigm (SNAP) for linguistic acceptability judgments. Language 92 (3): 619–635.
Marantz, Alec. 2005. Generative linguistics within the cognitive neuroscience of language. The Linguistic Review 22 (2–4): 429–445.
Munro, Robert, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen, and Harry Tily. 2010. Crowdsourcing and language studies: The new generation of linguistic data. In Proceedings of the NAACL-HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk, 122–130. Association for Computational Linguistics.
Myers, James. 2007. MiniJudge: Software for small-scale experimental syntax. International Journal of Computational Linguistics & Chinese Language Processing 12: 175–194.
Myers, James. 2009a. The design and analysis of small-scale syntactic judgment experiments. Lingua 119 (3): 425–444.
Myers, James. 2009b. Syntactic judgment experiments. Language and Linguistics Compass 3 (1): 406–423.
Myers, James. 2012. Testing adjunct and conjunct island constraints in Chinese. Language and Linguistics 13 (3): 437.
Newmeyer, Frederick J. 1983. Grammatical theory: Its limits and its possibilities. Chicago: University of Chicago Press.
Newmeyer, Frederick J. 2013. Goals and methods of generative syntax. In The Cambridge handbook of generative syntax, ed. M. den Dikken, 61–92. Cambridge: Cambridge University Press.
Ou, Tzu-Shan. 2006. Suo relative clauses in Mandarin Chinese. Master’s thesis, National Chung Cheng University, Taiwan.
Phillips, Colin. 2009. Should we impeach armchair linguists? In Japanese/Korean Linguistics, vol. 17, ed. S. Iwasaki, 49–64. Stanford: CSLI Publications.
Phillips, Colin, and Howard Lasnik. 2003. Linguistics and empirical evidence: Reply to Edelman and Christiansen. Trends in Cognitive Sciences 7 (2): 61–62.
Rosenbach, Anette. 2003. Aspects of iconicity and economy in the choice between the s-genitive and the of-genitive in English. In Determinants of grammatical variation in English. Volume 43 of topics in English linguistics [TiEL], ed. G. Rohdenburg, and B. Mondorf, 379–412. Berlin: De Gruyter Mouton.
Schütze, Carson. 1996. The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Chicago: University of Chicago Press.
Schütze, Carson. 2020. Acceptability ratings cannot be taken at face value. In Linguistic intuitions, ed. S. Schindler, A. Drozdzowicz, and K. Brøcker. Oxford: Oxford University Press.
Schütze, Carson, and Jon Sprouse. 2014. Judgment data. In Research methods in linguistics, ed. R.J. Podesva, and D. Sharma, 27–50. Cambridge: Cambridge University Press.
Scontras, Gregory, Maria Polinsky, Cheng-Yu Edwin Tsai, and Kenneth Mai. 2017. Cross-linguistic scope ambiguity: When two systems meet. Glossa: A Journal of General Linguistics 2 (1): 1–28.
Scontras, Gregory, Cheng-Yu Edwin Tsai, Kenneth Mai, and Maria Polinsky. 2014. Chinese scope: An experimental investigation. Proceedings of Sinn und Bedeutung 18: 396–414.
Shi, Dingxu. 1994. The nature of Chinese wh-questions. Natural Language & Linguistic Theory 12 (2): 301–333.
Snyder, William. 2000. An experimental investigation of syntactic satiation effects. Linguistic Inquiry 31 (3): 575–582.
Song, Sanghoun, Jae-Woong Choe, and Eunjeong Oh. 2014. FAQ: Do non-linguists share the same intuition as linguists? Language Research 50 (2): 357–386.
Sprouse, Jon. 2009. Revisiting satiation: Evidence for an equalization response strategy. Linguistic Inquiry 40 (2): 329–341.
Sprouse, Jon. 2011. A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behavior Research Methods 43 (1): 155–167.
Sprouse, Jon, and Diogo Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s Core Syntax. Journal of Linguistics 48 (3): 609–652.
Sprouse, Jon, and Diogo Almeida. 2013. The empirical status of data in syntax: A reply to Gibson and Fedorenko. Language and Cognitive Processes 28 (3): 222–228.
Sprouse, Jon, and Diogo Almeida. 2017. Design sensitivity and statistical power in acceptability judgment experiments. Glossa: A Journal of General Linguistics 2 (1): 1–32.
Sprouse, Jon, Carson Schütze, and Diogo Almeida. 2013. Assessing the reliability of journal data in syntax: Linguistic inquiry 2001–2010. Lingua 134: 219–248.
Sprouse, Jon, Matt Wagers, and Colin Phillips. 2012. Working-memory capacity and island effects: A reminder of the issues and the facts. Language 88 (2): 401–407.
Sprouse, Jon, Beracah Yankama, Sagar Indurkhya, Sandiway Fong, and Robert C. Berwick. 2018. Colorless green ideas do sleep furiously: Gradient acceptability and the nature of the grammar. The Linguistic Review 35 (3): 575–599.
Thurstone, Louis L. 1927. A law of comparative judgment. Psychological Review 34 (4): 273.
Wang, Shichang, Chu-Ren Huang, Yao Yao, and Angel Chan. 2015. Mechanical turk-based experiment vs laboratory-based experiment: A case study on the comparison of semantic transparency rating data. In Proceedings of the 29th Pacific Asia conference on language, information and computation, 53–62.
Xu, Liejiong. 1990. Remarks on LF movement in Chinese questions. Linguistics 28 (2): 355–383.
Xu, Liejiong. 1996. Construction and destruction of theories by data: A case study. Chicago Linguistics Society 32: 107–118.
Yao, Yao, Zhiguo Xie, Chien-Jer Charles Lin, and Chu-Ren Huang. Acceptability or grammaticality: Judging Chinese sentences for linguistic studies. In Cambridge handbook of Chinese linguistics. Cambridge: Cambridge University Press (to appear).
Zhou, Peng, and Liqun Gao. 2009. Scope processing in Chinese. Journal of Psycholinguistic Research 38: 11–24.
Acknowledgements
This work was supported by a faculty research grant to Zhong Chen from the College of Liberal Arts at Rochester Institute of Technology. We are grateful to Jeff Runner for discussions at various stages of this project. We thank Tian Tian for her assistance in preparing experimental materials, Qingrong Chen, Qiongpeng Luo and Zhuang Wu for helping with recruiting participants, as well as Jacquelyn Haller for editorial suggestions. We would also like to thank the anonymous reviewers and the editors of this journal for their insightful comments and suggestions.
Cite this article
Chen, Z., Xu, Y. & Xie, Z. Assessing introspective linguistic judgments quantitatively: the case of The Syntax of Chinese. J East Asian Linguist 29, 311–336 (2020). https://doi.org/10.1007/s10831-020-09210-y