Introduction

Word frequency effects have long played a central role in developing and evaluating models of visual word recognition and reading. Although models differ in how they account for frequency effects, most have assumed that lexical representations encode some form of token frequency.

Recently, there has been increased interest in studying word recognition in character-based orthographies because differences from alphabetic orthographies can be used to address a range of theoretical issues in reading (e.g., Rayner, Li, & Pollatsek, 2007). In Chinese, two-character words comprise 72% of the lexicon (Lexicon of Common Words in Contemporary Chinese Research Team, 2008). Studies of frequency effects for two-character words show both word-level frequency effects and, with word-level frequency controlled, smaller frequency effects for the initial character (e.g., Shen & Li, 2012; Yan, Tian, Bai, & Rayner, 2006). This is consistent with models in which, as in alphabetic orthographies, the word is the primary unit but with independent effects at the character level, especially for word-initial characters, which are often semantically related to the entire word.

Emerging evidence suggests that measures that take into account the contexts in which a word is likely to occur provide plausible alternative explanations for effects previously attributed to token frequency. Thus it becomes crucial to establish the empirical foundations for context-dependent measures. Adelman and colleagues (2006) operationalized one such measure, contextual diversity (CD) – the proportion of texts in a corpus in which a word occurs. CD is typically correlated with frequency: High frequency words typically occur in a wider variety of contexts than lower frequency words. When Adelman, Brown, and Quesada (2006) manipulated frequency while controlling for CD, and vice versa, naming and lexical decision times were affected by CD but not by token frequency. Subsequent research suggests that the underlying dimension reflected in CD effects may be the semantic distinctiveness of a word used across contexts (Jones, Johns, & Recchia, 2012), which will generally be correlated with CD. We return to this issue in the “General discussion”.
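
To make the distinction between token frequency and CD concrete, the following minimal sketch computes both measures for a tokenized corpus. The function name and the toy corpus are hypothetical and serve only as illustration; they are not taken from Adelman et al. (2006). CD is computed here as the proportion of documents containing a word at least once.

```python
from collections import Counter


def frequency_and_cd(documents):
    """Return (token_frequency, contextual_diversity) for a tokenized corpus.

    documents: list of documents, each a list of word tokens.
    token_frequency[w]: total number of occurrences of w in the corpus.
    contextual_diversity[w]: proportion of documents containing w at least once.
    """
    token_frequency = Counter()
    document_count = Counter()
    for tokens in documents:
        token_frequency.update(tokens)       # every occurrence counts
        document_count.update(set(tokens))   # each document counts once per word
    n_docs = len(documents)
    contextual_diversity = {w: document_count[w] / n_docs for w in document_count}
    return token_frequency, contextual_diversity


# Hypothetical toy corpus: "goal" and "time" have the same token frequency,
# but "goal" is confined to one document whereas "time" occurs in all four.
corpus = [
    ["goal", "goal", "goal", "goal", "time"],
    ["time", "rain"],
    ["time", "road"],
    ["rain", "time"],
]
freq, cd = frequency_and_cd(corpus)
print(freq["goal"], cd["goal"])  # 4 0.25
print(freq["time"], cd["time"])  # 4 1.0
```

In this toy example the two words are matched on token frequency but differ in CD, which is the kind of dissociation that the frequency-matched, higher-CD comparison relies on.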

Two recent eye-movement studies found CD effects and the absence of frequency effects (CD-only effects) for words embedded in sentences in English (Plummer, Perea, & Rayner, 2014) and Chinese (Chen, Huang, Xu, Yang, & Tanenhaus, 2017). Chen et al. provide the strongest evidence. First, they demonstrated a CD-only effect in a design which rotated triples of control words, frequency-matched words with higher CD values, and CD-matched words with lower frequency values through the same sentence frames, thus ruling out potential confounds from differences in sentence frames. Second, small differences that were consistent with a residual frequency effect did not increase with an expanded frequency manipulation.

The current study uses the same manipulations as Chen et al. (2017) with two-character Chinese words to examine first character frequency and CD effects, when overall word CD is controlled. The possibilities for first-character effects are: (1) both frequency and CD effects; (2) frequency-only effects; (3) CD-only effects; and (4) neither frequency nor CD effects. Each outcome would place potential constraints on models of word recognition in sentence contexts, and suggest different avenues for research that would likely shed new light on long-standing theoretical debates (e.g., how context influences different aspects of visual word recognition). We consider implications of a few possible outcomes to highlight the theoretical motivation for examining character-based frequency and CD.

Classic word recognition models assume multiple levels of representation. For Chinese, these include radicals and, for two-character Chinese words, the component characters. Whereas word-level CD-only effects could reflect post-lexical integration and/or word-level expectations, character-level frequency effects would be consistent with models in which characters, as pre-lexical units, are primarily influenced by bottom-up processing, and thus sensitive only to token frequency. In contrast, CD-only character effects, especially if they were localized to first-pass measures, would be consistent with prediction-based models (e.g., Smith & Levy, 2013) in which the first character in parafoveal vision has special status. The absence of both character-based frequency and CD effects would be consistent with models in which all prediction and/or integration effects are word-based.

We controlled word frequency and CD, while independently manipulating frequency and CD of the first character. Experiment 1 rotated yoked triples through the same sentence frame to establish whether there are frequency and/or CD effects, while ruling out any confounding effects of sentence-context. Experiment 2 used a separate sentence frame for each character, allowing for more words with a larger frequency range and longer and more varied sentence frames.

Experiment 1: Contextual diversity effect on characters

We compared three groups of characters: higher frequency characters with lower CD (Control group); lower frequency characters (LCF group) with a CD similar to that of the Control group but lower frequency; and a higher-CD (HCD) group matched in frequency with the Control group but with higher CD. We adopted this design because there is a dearth of characters with high CD and low frequency (also see Hills, Maouene, Riordan, & Smith, 2010; Perea, Soares, & Comesaña, 2013; Plummer et al., 2014). Comparisons of primary interest are the LCF group with the Control group, which would reveal any effects of frequency with CD controlled, and the HCD group with the Control group, which would reveal any effects of CD with frequency controlled.

Method

Participants

Participants were 30 native speakers of Mandarin with normal or corrected-to-normal vision. Participants gave informed consent and were paid 20 RMB.

Materials

Critical characters were chosen from SUBTLEX-CH-CHR (Cai & Brysbaert, 2010), which uses 6,243 films to compute character frequency based on the log10 transformed number of occurrences from 46 million characters and CD based on the log10 transformed number of films in which the character appears. Frequencies from this corpus explain significantly more of the variance in word and character reading than frequencies based on written texts (see Cai & Brysbaert for details).

We selected 27 character triples, with one character per condition (see Table 1). Characters in the LCF group had a similar CD to the control condition (t (52) = 1.43, p = 0.16) but a lower character frequency, t (52) = 18.10, p < 0.001. Characters in the HCD group had a similar character frequency to the control condition (t (52) = −0.58, p = 0.56) but a higher CD, t (52) = −12.50, p < 0.001. Characters were matched in number of strokes (ts < 0.64, ps > 0.53), radicals (ts < 1.22, ps > 0.22), and orthographic neighborhood size (ts < 0.68, ps > 0.50). All targets were the first character of compound nouns. The characters after the target were matched in frequency (ts < 1.09, ps > 0.28), CD (ts < 1.03, ps > 0.30), stroke (ts < 0.79, ps > 0.43), radical (ts < 0.41, ps > 0.68), and orthographic neighborhood size (ts < 1.10, ps > 0.27) across conditions (see Table 2). We also controlled frequency (ts < 0.48, ps > 0.63), CD (ts < 1.10, ps > 0.28), and semantic diversity (ts < 1.07, ps > 0.29) of the two-character word that began with the target character. We created 27 sentence frames and rotated target characters from each triple through the same frame across three lists such that each list contained nine sentences from each condition, with one version of each sentence.
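
The rotation of triples through frames across three lists can be sketched as a simple Latin-square assignment. This is an illustrative reconstruction under the assumption that conditions were cycled by frame index; the source does not specify the exact assignment scheme, and the function name is hypothetical.

```python
from collections import Counter

CONDITIONS = ["Control", "LCF", "HCD"]


def build_lists(n_frames=27, conditions=CONDITIONS):
    """Assign one condition per sentence frame in each of len(conditions) lists.

    Assumed scheme: the condition shown for a frame is shifted by one step
    from list to list, so every frame appears once per list and in every
    condition across lists.
    """
    n = len(conditions)
    lists = [[] for _ in range(n)]
    for frame in range(n_frames):
        for list_index in range(n):
            condition = conditions[(frame + list_index) % n]
            lists[list_index].append((frame, condition))
    return lists


lists = build_lists()
for list_index, items in enumerate(lists):
    counts = Counter(condition for _, condition in items)
    print(list_index, dict(counts))  # nine frames per condition in every list
```

With 27 frames and three conditions, each list contains nine sentences per condition, and each frame is seen in a different condition on each list, so differences between conditions cannot be attributed to the sentence frames.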

Table 1 Experimental conditions and exemplar sentences in Experiment 1
Table 2 Characteristics of the target in each group

Control norms

A norming study in which 30 participants rated plausibility on a 7-point scale (see Table 1) revealed no significant differences between conditions, ts < 0.92, ps > 0.36. Character predictability was assessed by 20 students in a cloze completion task with the target word replaced by a blank. The target character appeared in only 2.36% of the completions, with no significant differences across conditions (ts < 0.22, ps > 0.83). Concreteness values obtained from 30 participants on a 7-point scale (Table 2) revealed no significant differences across conditions for the target characters (ts < 1, ps > 0.34) or for the two-character words they were embedded in (ts < 1.09, ps > 0.28).

Apparatus

Eye movements were monitored using a SensoMotoric Instruments (Teltow/Berlin, Germany) iView Hi-Speed system, sampling at 1,250 Hz (tracking resolution < 0.01°). Viewing was binocular but data were collected only from the right eye. Sentences were presented on a 17-in. CRT monitor. Each character subtended 1.05° of visual angle at a viewing distance of 70 cm.
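
As a worked check on the display geometry, the physical size of a character follows from the reported visual angle and viewing distance by standard trigonometry. The function name below is hypothetical, and the resulting size is derived here, not reported in the source.

```python
import math


def stimulus_size_cm(visual_angle_deg, viewing_distance_cm):
    """Physical extent of a stimulus subtending a given visual angle.

    Uses size = 2 * d * tan(angle / 2), with the angle converted to radians.
    """
    return 2 * viewing_distance_cm * math.tan(math.radians(visual_angle_deg) / 2)


size = stimulus_size_cm(1.05, 70)
print(round(size, 2))  # 1.28
```

Under these values, each character would have occupied roughly 1.3 cm on the screen.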

Procedure

Trials began with presentation of a fixation point left aligned with the first character of each sentence. The order of the 27 experimental sentences and 27 filler sentences was randomized for each participant. Participants were instructed to read silently at a normal rate and to answer a yes/no question. The experiment began with eight practice sentences.

Results and discussion

All participants responded correctly on at least 85% of the questions. Prior to analysis, sentences with track losses were excluded (less than 2.49% of trials). Trials with blinks, fixations shorter than 80 ms or longer than 800 ms, and fixations more than three standard deviations above or below the mean were deleted (6.73%).
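
The duration-based exclusion criteria can be sketched as a two-step filter. This is a minimal illustration, assuming that the mean and standard deviation are computed over the durations that survive the 80–800 ms cutoff within whatever cell is being cleaned; the source does not state the set over which the mean was computed, and the function name is hypothetical.

```python
import statistics


def keep_fixation_durations(durations_ms, lower=80, upper=800, sd_cutoff=3.0):
    """Return the durations that survive the exclusion criteria.

    Step 1: drop durations shorter than `lower` or longer than `upper`.
    Step 2: drop durations more than `sd_cutoff` standard deviations from
            the mean of the remaining durations (assumed reference set).
    """
    in_range = [d for d in durations_ms if lower <= d <= upper]
    if len(in_range) < 2:
        return in_range  # a standard deviation needs at least two values
    mean = statistics.mean(in_range)
    sd = statistics.stdev(in_range)
    return [d for d in in_range if abs(d - mean) <= sd_cutoff * sd]


# Hypothetical durations: 60 ms and 900 ms fall outside the fixed cutoffs.
cleaned = keep_fixation_durations([60, 210, 220, 230, 240, 250, 900])
print(cleaned)  # [210, 220, 230, 240, 250]
```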

We computed three first-pass measures, considered to primarily reflect early processes, and three measures (see Table 3) which reflect later processes (e.g., Rayner, 1998, 2009). Early measures were first-fixation duration, gaze duration, and skipping rate. Later measures were go-past time, regression rate, and total fixation time.
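
For readers less familiar with these measures, the sketch below derives them from a temporally ordered fixation sequence using the standard definitions (e.g., Rayner, 1998). The data representation and function name are hypothetical. The sketch omits regression rate and does not implement the adjusted skipping criterion described in the results (counting a fixation adjacent to and preceding a skipped character), so it should be read as illustrative, not as the scoring procedure used here.

```python
def target_measures(fixations, target):
    """Compute reading measures for one target character on one trial.

    fixations: list of (character_index, duration_ms) in temporal order.
    target: index of the target character.
    """
    # Total fixation time: all fixations on the target, in any pass.
    total_time = sum(d for pos, d in fixations if pos == target)

    # First fixation that lands on or beyond the target.
    first = next((i for i, (pos, _) in enumerate(fixations) if pos >= target), None)
    if first is None or fixations[first][0] != target:
        # The eyes passed the target (or never reached it) without fixating it.
        return {"skipped": True, "ffd": None, "gd": None,
                "go_past": None, "total_time": total_time}

    # First-fixation duration: the first first-pass fixation on the target.
    ffd = fixations[first][1]

    # Gaze duration: consecutive fixations on the target before leaving it.
    gd = 0
    for pos, d in fixations[first:]:
        if pos != target:
            break
        gd += d

    # Go-past time: everything from first entry until the eyes move past the
    # target to the right, including time spent regressing to earlier text.
    go_past = 0
    for pos, d in fixations[first:]:
        if pos > target:
            break
        go_past += d

    return {"skipped": False, "ffd": ffd, "gd": gd,
            "go_past": go_past, "total_time": total_time}


# Hypothetical trial: two first-pass fixations on character 5, a regression
# to character 3, a return to character 5, then onward to character 7.
trial = [(2, 210), (5, 230), (5, 180), (3, 150), (5, 200), (7, 240)]
print(target_measures(trial, target=5))
# {'skipped': False, 'ffd': 230, 'gd': 410, 'go_past': 760, 'total_time': 610}
```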

Table 3 Eye movement measures on target characters in each group

We implemented planned comparisons using linear mixed-effects models for fixation durations and mixed logit models for skipping rate using the lme4 package (Bates, Maechler, & Bolker, 2012) in the R environment (R Development Core Team, 2014). The regression model included fixed effects (e.g., log-transformed character frequency and log-transformed CD) and maximal random effects for participants and items (Barr, Levy, Scheepers, & Tily, 2013; Jaeger, 2008). We report t-values for linear mixed-effects models, z-values for mixed logit models, and corresponding p-values (see Table 4). For t-values, the lmerTest package was used to estimate p values using the Satterthwaite approximation for degrees of freedom (Kuznetsova, Christensen, & Brockhoff, 2014).
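
The general form of such a model, for a single predictor, can be written as follows. This is a schematic sketch and not the exact specification fitted here: it assumes one predictor x (a condition contrast or a log-transformed measure), participants indexed by i, items indexed by j, and by-participant and by-item random intercepts and slopes (P and I, notation introduced for illustration).

```latex
\begin{align}
y_{ij} &= (\beta_0 + P_{0i} + I_{0j}) + (\beta_1 + P_{1i} + I_{1j})\,x_{ij} + \varepsilon_{ij}, \\
(P_{0i}, P_{1i}) &\sim \mathcal{N}(\mathbf{0}, \Sigma_P), \quad
(I_{0j}, I_{1j}) \sim \mathcal{N}(\mathbf{0}, \Sigma_I), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \\
\operatorname{logit} \Pr(y_{ij} = 1) &= (\beta_0 + P_{0i} + I_{0j}) + (\beta_1 + P_{1i} + I_{1j})\,x_{ij}
\quad \text{(binary measures)}.
\end{align}
```

The first two lines correspond to the linear mixed-effects models used for fixation durations, and the third to the mixed logit models used for binary outcomes such as skipping. A by-item slope is only estimable when the predictor varies within items, as it does when a triple is rotated through a shared sentence frame.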

Table 4 Regression coefficients and test statistics from linear mixed-effects and logistic mixed-effects models for eye movement measures on the target character

Effect of character frequency (control group vs. LCF group)

There were no effects of character frequency in first-pass measures, ts < 0.86, ps > 0.39, βs < 6.20 (skipping rate, z = −0.17, p = 0.86, β = −0.08) or in later measures, ts < 1.25, ps > 0.22, βs < 9.07 (Reg.in, z = 1.09, p = 0.28, β = 0.29). Note that the skipping rates are low compared to some previous studies (e.g., Li, Bicknell, et al., 2014). The reason is that we included a fixation that preceded and was adjacent to a skipped character, which makes it likely that the character was processed on that fixation (Ehrlich & Rayner, 1983; Garrod, Freudenthal, & Boyle, 1994). We also calculated skipping rates without that fixation. We found similar skipping rates to the most comparable study (Cui, Yan, Bai, Hyönä, Wang, & Liversedge, 2013) using that procedure and repeated all analyses using those data (see Footnote 1).

Effect of contextual diversity (control group vs. HCD group)

For first-pass measures, first-fixation duration (β = −15.64) and gaze duration (β = −24.69) were shorter for the HCD characters than for characters with lower CD (ts > −2.28, ps < 0.03). The effect of CD was not significant in skipping rate (β = 0.54). For the later measures, HCD target characters had shorter go-past times (β = −28.49) and total fixation times (β = −33.76), ts > −2.28, ps < 0.007. There was no difference in regression rate (β = −0.36).

An additional regression analysis used the eye movement measures as dependent variables and log-transformed character CD and log-transformed character frequency as predictors. There were facilitative effects of character CD on FFD (β = −0.12, t = −2.83, p = 0.005), GD (β = −0.16, t = −3.77, p < 0.001), TTime (β = −0.15, t = −3.45, p = 0.001), and Go-past (β = −0.17, t = −3.48, p < 0.001) but no effects of character frequency (βs < 0.10, |t|s/Wald < 1.50, ps > 0.13).

In sum, when CD was controlled, character frequency did not affect reading times, whereas when frequency was controlled, CD affected both first-pass and later measures.

Experiment 2: Contextual diversity effects on characters with large frequency range

In Experiment 1, the direction of the effect for four of the six measures is consistent with a small increase in processing difficulty for the LCF characters (see Table 3). If there were an underlying frequency effect, differences between the Control and the LCF conditions should increase with a larger frequency range. In Experiment 2, we used separate sentence frames for each word, enabling us to increase the difference in mean log frequencies for the LCF and Control characters from 0.4 to 0.8. This also allowed us to use a greater variety of sentence structures than in the materials used in Experiment 1.

Method

Participants

Participants were 48 native speakers of Mandarin with normal or corrected-to-normal vision. Participants gave informed consent and were paid 20 RMB.

Materials

We selected 48 critical characters, 16 for each of three conditions (see Tables 5 and 6). Characters in the LCF group had a similar CD to the control condition (t < 1.72, p > 0.10) but a lower character frequency, t (30) = 10.59, p < 0.001. Characters in the HCD group had a similar character frequency to the control condition (t < 0.12, p > 0.24) but a higher CD value, t (30) = 9.98, p < 0.001. All targets were the first character of compound nouns, matched in number of strokes (ts < 0.67, ps > 0.50), radicals (ts < 1.55, ps > 0.13), and orthographic neighborhood size (ts < 0.63, ps > 0.53). Characters prior to and after the target character were matched in frequency (ts < 1.39, ps > 0.17), CD (ts < 1.66, ps > 0.10), stroke (ts < 1.40, ps > 0.17), radical (ts < 1.45, ps > 0.15), and orthographic neighborhood size (ts < 1.65, ps > 0.38) across conditions. We also controlled the frequency (ts < 0.14, ps > 0.89), CD (ts < 0.92, ps > 0.36), and semantic diversity (ts < 0.20, ps > 0.26) of the two-character word that began with the target character (Table 5). Each target character was placed in its own sentence frame (see Table 6).

Table 5 Characteristics of the target in each group
Table 6 Experimental conditions and exemplar sentences in Experiment 2

Control norms

Thirty participants rated perceived difficulty and plausibility, using a 7-point scale (see Table 6). Differences between conditions were not significant (ts < 1.15, ps > 0.26). Character predictability, as assessed by 15 students in a cloze completion task, did not differ across conditions (ts < 1.09, ps > 0.28). Concreteness ratings by 30 participants using a 7-point scale found no significant differences across conditions for the target characters (ts < 1, ps > 0.54) and compound words they were embedded in (ts < 1, ps > 0.41).

Apparatus and procedure

Apparatus and procedure were identical to Experiment 1.

Results and discussion

We used the same exclusion criteria as in Experiment 1 to delete trials with track losses, blinks, and fixations more than three standard deviations above or below the mean (9.6%). Table 7 displays condition means for each eye movement measure. Planned comparisons were performed using linear mixed-effects and mixed logit models (see Table 8). We again repeated all analyses with fixations immediately to the left of a skipped target character excluded (see Table 7). We note that reading times are in general longer for Experiment 2 compared to Experiment 1. This most likely reflects the increased variety, length, and likely complexity of the sentences.

Table 7 Eye movement measures on target character
Table 8 Regression coefficients and test statistics from linear mixed-effects and logistic mixed-effects models for eye movement measures on the target character

Effect of character frequency (control group vs. LCF group)

Effects of character frequency were not significant in any of the first-pass measures (ps > 0.55, βs < −3.42) or later measures (ps > 0.61, βs < 6.10).

Effect of contextual diversity (control group vs. HCD group)

For first-pass measures, first-fixation duration (β = −16.67) and gaze duration (β = −39.12) were shorter for the HCD characters than for characters with lower CD (ts > −3.13, ps < 0.005), with no main effect of CD on skipping rate, z = 0.53, p = 0.597, β = 0.26. For later measures, HCD target characters had shorter go-past time (β = −49.88) and total fixation time (β = −45.39), ts > −2.28, ps < 0.007. The regression rate was numerically, but not significantly, lower for HCD characters (β = −0.12).

A regression analysis found significant effects of character CD on FFD (β = −0.10, t = −3.07, p = 0.002), GD (β = −0.12, t = −3.83, p < 0.001), TTime (β = −0.12, t = −3.87, p < 0.001), and Go-past (β = −0.12, t = −3.58, p < 0.001), but no significant effects of character frequency (βs < 0.30, |t|s/Wald < 1.40, ps > 0.19).

General discussion

When word frequency and CD for two-character words were controlled, first-character fixation times were shorter for higher CD characters but were not affected by character frequency. These results complement Chen et al. (2017), who found the same pattern for word-level frequency and CD. These studies establish that CD, a measure that takes contexts into account, better accounts for differences that have typically been attributed to token frequency. They go beyond previous findings in demonstrating that the absence of frequency effects in these and previous studies is unlikely to be a consequence of using restricted frequency ranges.

We conclude by outlining some of the theoretical implications. First and foremost, it is important to develop models that can explain both why CD influences word recognition in reading and why effects that have previously been attributed to token frequency are eliminated when CD is controlled. Although standard models of word frequency have assumed that token frequency is a characteristic of form-based lexical representations, it now seems increasingly unlikely that simple form-based (token) frequencies are encoded in lexical representations. We note, however, that future research with measures such as ERP could, in principle, isolate token frequency effects that are not reflected in processing difficulty (cf. Vergara-Martínez, Comesaña, & Perea, 2017).

Setting aside this possibility, why might CD have the effects it does, and what are the implications for models of word recognition and reading? As Jones et al. (2012) argue, the relevant dimension approximated by CD is the likelihood that a word will occur in different types of contexts. Indeed, Jones et al. show that a measure that they operationalize as semantic diversity – roughly the likelihood that a word will occur in semantically distinct contexts, as measured by co-occurrence – is correlated with CD but provides a better fit to data in which semantic diversity and CD are varied orthogonally.

We believe that a promising approach is to embed context-dependent measures in models in which reading times are a function of predictability (e.g., Smith & Levy, 2013) by assuming that predictability effects are at least partially conditionalized on types of contexts. In the absence of knowledge that one is in a particular context, the most appropriate prior for a word is the likelihood that it will occur in any class of context. In unconstrained contexts, then, CD is a better estimate of the likelihood that a word will be encountered than is token frequency. This approach is related to, but conceptually distinct from, semantic diversity as defined by Jones et al. (2012) in that it predicts that (a) token frequency effects will emerge in constrained contexts and (b) judgments of relative frequency would be more accurate for words within the same category (e.g., trumpet and trombone) than for words from different categories (e.g., trumpet and daffodil).

If token frequency effects emerge in constrained contexts when CD is controlled, frequency and CD effects could prove useful in elucidating the grain at which contexts are defined. For example, local within-sentence predictability might be less important than predictability that is tied to a more global context, such as a text or conversation about a particular sporting event (e.g., baseball or football [soccer in American English]) or a type of activity (e.g., cooking).

This, in turn, raises questions about how to operationalize the notion of a context, such that it includes domains which shift the distributions of families of words. One possibility is that a context is theoretically similar to the notion of topic that has been explored in models by Steyvers and colleagues (2007). Examining issues related to the likelihood of different word senses appearing in these broad-based contexts is likely to shed light on the relevant dimension(s) over which readers and listeners generate expectations about forms versus senses. These questions are of particular interest given the emerging evidence for form-based expectation effects in visual word recognition (e.g., Dikker, Rabagliati, Farmer, & Pylkkanen, 2010; Farmer, Christiansen, & Monaghan, 2006; Farmer, Monaghan, Misyak, & Christiansen, 2011; Farmer, Yan, Bicknell, & Tanenhaus, 2015). From this perspective, word-independent, CD-based initial character effects for two-character words occur because form-based predictions about the upcoming character are useful in rapidly estimating surprisal. Chinese will be particularly useful for addressing these issues because of the independent effects of word and character CD in two-character words, and the quasi-systematic relationship between the sense of the first character and the sense of the word.