De Leeuw, Andrews, et al. (2018)

Similar Event-Related Potentials to Music and Language: A Replication of Patel, Gibson, Ratner, Besson, & Holcomb (1998)
Joshua R. de Leeuw*, Jan Andrews, Zariah Altman1, Rebecca Andrews1, Robert Appleby1, James L. Bonanno1, Isabella DeStefano1, Eileen Doyle-Samay1, Ayela Faruqui1, Christina M. Griesmer1, Jackie Hwang1, Kate Lawson1, Rena A. Lee1, Yunfei Liang1, John Mernacaj1, Henry J. Molina1, Hui Xin Ng1, Steven Park1, Thomas Possidente1, Anne Shriver1
Vassar College
We report a replication of Patel, Gibson, Ratner, Besson, and Holcomb (1998). The results of our replication are largely consistent with the conclusions of the original study. We found evidence of a P600 component of the event-related potential (ERP) in response to syntactic violations in language and harmonic inconsistencies in music. There were some minor differences in the spatial distribution of the P600 on the scalp between the replication and the original. The experiment was pre-registered at https://osf.io/g3b5j/. We conducted this experiment as part of an undergraduate cognitive science research methods class at Vassar College; we discuss the practice of integrating replication work into research methods courses.
Keywords: EEG, ERP, P600, language, music, replication.
Patel, Gibson, Ratner, Besson, and Holcomb (1998) found that violations of expected syntactic structure in language and violations of expected harmonic structure in music both elicit the P600 component of the event-related potential (ERP). The P600 is a positive ERP component that occurs approximately 600 ms after stimulus onset. While previous work had established a link between the P600 component and syntactic violations in language (Osterhout & Holcomb, 1992, 1993; Osterhout, Holcomb, & Swinney, 1994), Patel and colleagues were the first to report a direct comparison of the P600 for violations of musical and linguistic structure, finding that the amplitude and scalp distribution of the P600 was similar for linguistic and musical violations.

This result has been influential in theorizing about the relationship between music and language, with more than 700 citations twenty years after publication (Google Scholar search, September 2018). It has been used as evidence for the “shared syntactic integration resource hypothesis,” a theory that posits that structural processing of music and language utilizes the same cognitive and neural resources (Patel, 2003). It has also been used to argue more broadly for the shared neurological basis of music and language (e.g., Abrams et al., 2011; Besson & Schön, 2001; Herdener et al., 2014; Merrill et al., 2012; Patel, 2010; Sammler et al., 2010, 2013), and for the existence of shared cognitive resources/constraints for processing music and language (e.g., Besson, Chobert, & Marie, 2011; Chobert, François, Velay, & Besson, 2014; Christiansen & Chater, 2008; Lima & Castro, 2011; Moreno et al., 2009; Thompson, Schellenberg, & Husain, 2004; Tillmann, 2012).

Though the work has been influential, we are not aware of any published direct replications of the main result. Several studies have found ERP correlates of structural violations in music (Besson & Faïta, 1995; Besson, Faïta, & Requin, 1994; Janata, 1995), though there is variation in the kinds of components that are found (Featherstone, Morrison,

We are extremely grateful to Debra Ratchford and Prashit Parikh for their assistance supervising EEG data collection, and Polyphony Bruna for assisting with data organization and stimulus measurement.

1 Authors contributed equally to the work and are listed in alphabetical order.
Waterman, & MacGregor, 2013; Featherstone, Waterman, & Morrison, 2012). Other studies have found that ERP markers of violations of linguistic structure are systematically affected by the presence or absence of simultaneous structural violations in music (Koelsch, Gunter, Wittfoth, & Sammler, 2005; Steinbeis & Koelsch, 2008). These findings, along with many other behavioral and non-ERP neural measures (see Koelsch, 2011, for a review), support the general conclusion of Patel et al. (1998) that there is overlap between the processing of structural violations in music and language. While this converging evidence should bolster our belief in the results, there is no substitute for a direct replication given the well-documented problem of publication bias in the literature (e.g., Ingre & Nilsonne, 2018; Rosenthal, 1979).

This experiment was part of an undergraduate research methods course in cognitive science, which 2 of us co-taught, 17 of us were enrolled in, and 1 of us served as a course intern. A major focus of this course was exposure to and training in practices that have developed in response to the replication crisis, including an increased emphasis on direct replications (Zwaan, Etz, Lucas, & Donnellan, 2017), pre-registration of experiments (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012), and transparency through public sharing of materials, data, and analysis scripts (Nosek et al., 2015). To gain hands-on experience with these practices, the class conducted this replication study. We chose to replicate Patel et al. (1998) given its theoretical significance in the field, lack of prior direct replications, and practical considerations like the complexity of the data analysis and study design.

Our replication is what LeBel et al. (2017) would call a very close replication. While we were able to operationalize the independent and dependent variables in the same manner as Patel et al. and were able to use either the same exact (music) or close replicas (language) of the original stimuli, we did make some changes to their procedure. We removed two conditions (out of six) to shorten the overall length of the experiment, which was necessary to run the experiment in a classroom environment. We also focused the analysis on what we took to be the key findings of the original. We highlight these deviations from the original throughout the methods section below. Very close replications like this one are efforts to establish the “basic existence” of phenomena (LeBel et al., 2017), which is an essential step for creating a set of robust empirical facts for theory development.

Method

All stimuli, experiment scripts, data, and analysis scripts are available on the Open Science Framework at https://osf.io/zpm9t/. The study pre-registration is available at https://osf.io/g3b5j/. All participants provided informed consent and this study was approved by the Vassar College Institutional Review Board.

Overview

In both the original experiment and our replication, participants listened to short sentences and musical excerpts and made judgments about whether the sentence/music was acceptable or unacceptable. ERPs in response to particular words or musical events were measured with EEG.

In the original experiment there were three critical kinds of sentences (grammatically simple, grammatically complex, and ungrammatical) and three critical kinds of musical excerpts (in key, nearby out of key, and distant out of key). The P600 is measured by comparing the amplitude of the ERP in the grammatically simple condition to the other two language conditions and the in-key condition to the other two music conditions (see Results, below).

Due to logistical constraints of lab availability, time, and class schedule, we opted to restrict the replication to two kinds of sentences and two kinds of musical excerpts. We used only the grammatically simple and ungrammatical sentences for the language stimuli (plus their associated control stimuli, see Stimuli below), and only the in-key and distant out-of-key musical excerpts. We believe that this choice is justifiable, as the theoretical claims of Patel et al. are most strongly based on the P600 that was found in the ungrammatical and distant out-of-key conditions, as these are the stronger contrasts (i.e., they are more “syntactically wrong”). The grammatical and in-key conditions serve as the baseline for these analyses, and so must also be included. The original also included unanalyzed filler sentences and musical excerpts to balance certain (possibly confounding) properties of the stimuli; by not including many of the original stimuli these properties
were more balanced in the critical stimuli, and we were able to drop all of the fillers in the music condition and 20 of the fillers in the language condition. Altogether, the original experiment contained 150 sentences (3 x 30 plus 60 fillers) and 144 musical excerpts (3 x 36 plus 36 fillers), and our replication contained 100 sentences (2 x 30 plus 40 fillers) and 72 musical excerpts (2 x 36).

Participants

44 Vassar College students, ages 18-22 (M = 19.8 years, SD = 1.2 years), participated in the study. Our pre-registered target was 40, which is slightly more than 2.5 times the original sample (N = 15). We aimed for at least 2.5 times the original sample based on the heuristic provided by Simonsohn (2015). The goal of the heuristic is for replications to have sufficient power to detect effects that are smaller than the original but still plausibly detectable by the original study. While we ran more participants than the original target of 40, 5 participants did not complete the experiment due to technical difficulties such as recording problems with the EEG equipment. Thus, we ended up with 39 participants, one under our pre-registered target. We stopped data collection because we reached our pre-registered cutoff date of 2/24/18 prior to having 40 usable recordings. The cutoff date was necessary for the schedule of the class.

Participants in Patel et al. (1998) were musically trained but specifically did not have perfect pitch. Their participants had an average of 11 years of musical experience, had studied music theory, and played a musical instrument for an average of 6.2 hours per week. All of our participants had at least 4 years of prior musical experience (M = 9.7 years, SD = 3.3 years), which we defined as participation in music lessons, enrollment in music coursework, or experience with any musical instrument (including voice). We also required that participants not have perfect pitch (by self-report). We did not require that participants had studied music theory. Our participants played a musical instrument for an average of 5.8 hours per week (SD = 3.4 hours per week).

Stimuli

Patel graciously provided the music stimuli used in the original study. The language stimuli were no longer available in audio form, but we were provided with a list of the text of the original stimuli. We refer the reader to Patel et al. (1998) for the full details of the stimuli. Here we describe a basic overview of the format to provide enough context for understanding the experiment, as well as our process for recording the audio stimuli.

The music stimuli were short sequences of chords synthesized using a generic piano MIDI instrument. They were about 6 seconds long. The chords initially established a harmonic key. The target chord — either the root chord of the established key (in-key condition) or the root chord of a distantly-related harmonic key (distant out-of-key condition) — occurred in the second half of the excerpt. An example in-key sequence can be heard at https://osf.io/z6vcu/. An example out-of-key sequence can be heard at https://osf.io/wde67/. To simplify condition labeling in what follows, the in-key (harmonically congruous) musical stimuli will be called grammatical and the distant out-of-key (harmonically incongruous) musical stimuli will be called ungrammatical, even though we recognize that the application of those terms to music is not necessarily as straightforward as it is for language.

The language stimuli were spoken sentences with a target noun phrase that was either grammatical or ungrammatical given the prior context. There were two primary types of sentences (grammatical and ungrammatical) as well as two kinds of filler sentences, designed to prevent listeners from using cues other than the target noun phrase in context to judge the acceptability of the sentence. The grammatical but unacceptable fillers make it so that not all instances of “had” are acceptable. The grammatical fillers make it so that not all instances of verb + “the” are unacceptable. Examples of each sentence type are below (the target noun phrase is italicized):

Grammatical: Some of the soldiers had discovered a new strategy for survival.

Ungrammatical: Some of the marines pursued the discovered a new strategy for survival.

Grammatical, unacceptable (filler): Some of the lieutenants had reimbursed a new strategy for survival.

Grammatical (filler): Some of the explorers pursued the idea of a new strategy for survival.

Sentences ranged from 2.9 to 4.8 seconds long, spoken by one of the female experimenters at a rate
of approximately six syllables per second using a Blue Snowball iCE Condenser microphone, sampled at 44.1 kHz. The audio files were later amplified in Audacity in order to be at a volume similar across sentences and approximately comparable to that of the music stimuli. For each file, the onset and duration of the target noun phrase was recorded (in milliseconds) to refer to in analysis when identifying the onset of ERP components (see https://osf.io/tr7mq/ for a complete list).

In addition to the music and language stimuli used in the original experiment, we created sample stimuli to provide a short pre-task tutorial for participants. These consisted of six new sentences and six new musical excerpts, designed to match the properties of the original stimuli. The music files were created in MuseScore (MuseScore Development Team, 2018).

Procedure

Participants completed the experiment in a quiet room seated at a computer screen and keyboard. Audio files were played through a pair of speakers (the original study used headphones). The experiment was built using the jsPsych library (de Leeuw, 2015). Communication between jsPsych and the EEG recording equipment was managed through a Chrome extension that enables JavaScript-based control over a parallel port (Rivas, 2016).

Each trial began with the audio file playing while a fixation cross was present on the screen. Participants were asked to avoid blinking or moving their eyes while the fixation cross was present, to prevent eye movement artifacts in the EEG data. After the audio file concluded, participants saw a blank screen for 1450 ms. Finally, a text prompt appeared on the screen asking participants if the sentence or musical excerpt was acceptable or unacceptable. Participants pressed either the A (acceptable) or U (unacceptable) key in response. This procedure is nearly identical to the original, except for the use of a keyboard instead of a response box held in the participant’s lap.

The experiment started with a short set of practice trials: 6 language trials followed by 6 music trials. Following the practice trials, the experimenter verified that the participant understood the instructions before the experiment proceeded.

The experiment consisted of 5 blocks: 3 language blocks containing 33, 33, and 34 trials, and 2 music blocks containing 36 trials each. The experiment always started with a language block and then alternated between language and music. Grammatical and ungrammatical trials were randomly intermixed within each block. At the conclusion of a block, participants were given the opportunity to take a break. Participants controlled the length of the break.

ERP Recording

We recorded EEG activity using a 128-channel sensor net (Electrical Geodesics Inc.) at a sampling rate of 1000 samples/s referenced to Cz. The data were amplified using a Net Amps 400 Amplifier (Electrical Geodesics Inc.). We focused on the 13 scalp locations that were used in Patel et al. (1998). The locations and their corresponding electrode number on the EGI-128 system were Fz (11), Cz (129), and Pz (62) (midline sites), and F8 (122), ATR (115), TR (108), WR (93), O2 (83), F7 (33), ATL (39), TL (45), WL (42), and O1 (70) (lateral sites). Vertical eye movements and blinks were monitored by means of two electrodes located above and one located below each eye; horizontal eye movements were monitored by means of one electrode located to the outer side of each eye. Impedances for all of these electrodes were kept below 50 kΩ prior to data collection.

NetStation 5.4 waveform tools were used to process the EEG data offline, first by applying a high pass filter at 0.1 Hz and a low pass filter at 30 Hz. Data were segmented into 1100 ms segments starting 100 ms prior to and ending 1000 ms after target stimulus onset. Segments containing ocular artifacts were excluded from further analyses, as were any segments that had more than 20 bad channels. The NetStation bad channel replacement tool was applied to the EEG data, which were re-referenced using an average reference and baseline corrected to
Figure 1. Grand average waveforms for language stimuli. The shaded box highlights the time window for analyzing P600 differences (500-800 ms after stimulus onset) and the area surrounding each line represents ±1 SE. The plots are arranged to represent approximate scalp position of each electrode, with posterior electrodes at the bottom.
the 100 ms prior to stimulus onset. These processing steps are similar to those used by Patel et al. (1998; see pgs. 729-730), with some minor differences due to the use of a different EEG system. Information about all tool settings is available at https://osf.io/96bjn/.

Results

We conducted our analyses in R v3.4.2 (R Core Team, 2017) using several packages (Henry & Wickham, 2017; Lawrence, 2016; Morey & Rouder, 2015; Wickham, 2016; Wickham, Francois, Henry, & Müller, 2017; Wickham & Henry, 2018; Wickham, Hester, & Francois, 2017; Wilke, 2017). The complete annotated analysis script is available as an R Notebook at https://osf.io/m9kej/.

Data Exclusions

39 participants had a complete data set. We pre-registered a plan to exclude trials that contained artifacts, but we did not pre-register a decision rule for how many good ERP segments a participant would need in each condition to be included in the analysis. To avoid making a biased decision, we tabulated the number of artifact-free segments for each of the four conditions for each participant and chose a cutoff as the very first step in our analysis, prior to any examination of the waveforms. Based on this ad-hoc inspection of the data (see https://osf.io/w7hrm/), we decided to exclude 4 participants who had at least one condition with fewer than 19 good segments. We chose 19 as the cutoff because the data had some natural clustering; the 4 participants who did not meet that cutoff had 15 or fewer good segments in at least one condition. This left us with data from 35 participants. All subsequent analyses are based only on these 35 participants. The mean number of usable trials across participants was 27.7 for language-grammatical and
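The segmentation, baseline-correction, and per-condition exclusion steps described above reduce to a few lines of code. The following is an illustrative sketch in Python rather than the NetStation tools and R scripts actually used (see https://osf.io/96bjn/ and https://osf.io/m9kej/); the function names and data layout are our own.

```python
def segment_and_baseline(channel, onsets, sr=1000, pre_ms=100, post_ms=1000):
    """Cut 1100 ms epochs (100 ms before to 1000 ms after each target
    onset) from one channel of continuous EEG sampled at `sr` Hz, then
    baseline-correct each epoch to the mean of the pre-stimulus interval."""
    pre = pre_ms * sr // 1000
    post = post_ms * sr // 1000
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(channel):
            continue  # skip targets too close to the recording edges
        seg = channel[onset - pre:onset + post]
        baseline = sum(seg[:pre]) / pre
        epochs.append([v - baseline for v in seg])
    return epochs

def usable_participants(good_segments, cutoff=19):
    """Keep participants with at least `cutoff` artifact-free segments in
    every condition; `good_segments` maps a participant id to a dict of
    per-condition segment counts."""
    return [p for p, counts in good_segments.items()
            if min(counts.values()) >= cutoff]
```

With the cutoff of 19 used here, a participant is dropped as soon as any single condition falls below the threshold, mirroring the exclusion rule described above.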
Figure 2. Grand average waveforms for music stimuli. The shaded box highlights the time window for analyzing P600 differences (500-800 ms after stimulus onset) and the area surrounding each line represents ±1 SE. The plots are arranged to represent approximate scalp position of each electrode, with posterior electrodes at the bottom.
Condition                  Patel et al. (1998)   Replication
Language, Grammatical      M = 95%               M = 93.3%, SD = 5.3%
Language, Ungrammatical    M = 96%               M = 88.2%, SD = 18.0%
Music, Grammatical         M = 80%               M = 84.5%, SD = 14.5%
Music, Ungrammatical       M = 72%               M = 69.1%, SD = 15.5%

EEG Data

In the original experiment, Patel et al. analyzed the EEG data in two primary ways. We repeat and extend these analyses below.

First, they calculated mean amplitude of the waveforms in all conditions (they had six total conditions, but we have four) and then used ANOVAs to model the effects of grammaticality and electrode site on the amplitude of the ERP. They used separate ANOVA models for the language and music conditions and did not treat stimulus type as a factor in this part of the analysis. They analyzed three time windows, 300-500 ms, 500-800 ms, and 800-1100 ms, replicating the ANOVAs separately in each time window. Finally, they repeated this analysis separately for
Figure 3. Grand average difference waves (ungrammatical minus grammatical) for language and music. The shaded box highlights the time window for analyzing P600 differences (500-800 ms after stimulus onset) and the area surrounding each line represents ±1 SE.
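The difference waves plotted in Figure 3, and the mean-amplitude measure used in the windowed analyses, are simple arithmetic on the averaged waveforms. A minimal sketch in Python (the published analysis was done in R; the names and data layout here are our own):

```python
def difference_wave(ungrammatical, grammatical):
    """Point-by-point subtraction of the grammatical ERP from the
    ungrammatical ERP, intended to isolate the P600-range positivity."""
    return [u - g for u, g in zip(ungrammatical, grammatical)]

def mean_amplitude(epoch, t0_ms, t1_ms, sr=1000, baseline_ms=100):
    """Mean amplitude in the window [t0_ms, t1_ms) relative to target
    onset, for an epoch that starts `baseline_ms` before onset (so at
    1000 samples/s the 500-800 ms window is samples 600 through 899)."""
    i0 = (baseline_ms + t0_ms) * sr // 1000
    i1 = (baseline_ms + t1_ms) * sr // 1000
    window = epoch[i0:i1]
    return sum(window) / len(window)
```

The dependent variable in the ANOVAs below is `mean_amplitude(epoch, 500, 800)` computed per participant, condition, and electrode.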
midline electrodes and lateral electrodes. This was a total of 12 ANOVAs. Given that the P600 should be strongest in the 500-800 ms window, we pre-registered a decision to restrict our analysis to the 500-800 ms window only, reducing the number of ANOVAs to 4. We view this as the strongest test of the original conclusion.

The results of these four ANOVAs are reported in Table 2. While we cannot make direct comparisons with the ANOVA results reported by Patel et al. because we dropped one of the levels of the grammar factor from the procedure, we can look at whether the results align at a high level. For both music and language stimuli, Patel et al. report a significant effect of grammaticality at both midline and lateral electrode sites, as well as a significant interaction between electrode location and grammaticality at both midline and lateral electrode sites. We found most of these effects; the exceptions were that we found no main effect of grammaticality for lateral electrodes and language stimuli, and no main effect of grammaticality for lateral electrodes and music stimuli. However, we did consistently find an interaction between electrode site and grammaticality for all conditions, which makes the differences in main effects somewhat difficult to interpret. For language stimuli, the interaction between electrode site and grammaticality was due to a stronger effect of grammaticality at posterior electrode sites. This is also what Patel et al. found. For music stimuli, the effect of grammaticality was also stronger at posterior sites, with the exception of the two most posterior sites (O1 and O2), where there was no clear effect of grammaticality. This is a difference from the original study, as Patel et al. did observe the music-based P600 effect at these sites.

The second analysis that Patel et al. ran was to calculate difference waves — subtracting the grammatical ERP from the ungrammatical ERP — to, in theory, isolate the P600 and then directly compare the amplitude of the difference waves for language and music stimuli. For an unexplained reason, they shifted the time window of analysis to 450-750 ms. We pre-registered a decision to analyze the difference waves in the 500-800 ms range, to remain consistent with the prior analysis.

Patel et al. found no significant difference in the amplitude of the difference waves and concluded “in the latency range of the P600, the positivities to
Table 2.
ANOVA results for grammaticality × electrode models. “Electrode” refers to the specific electrode sites within the midline and lateral site groups.

                   Grammaticality       Electrode             Grammaticality × Electrode
Language Midline   F(1, 34) = 6.41,     F(2, 68) = 1.00,      F(2, 68) = 5.11,
                   p = 0.016            p = 0.372             p = 0.009
Language Lateral   F(1, 34) = 0.44,     F(9, 306) = 3.68,     F(9, 306) = 4.99,
                   p = 0.512            p = 0.0002            p = 0.000003
Music Midline      F(1, 34) = 23.94,    F(2, 68) = 7.00,      F(2, 68) = 12.43,
                   p = 0.00002          p = 0.002             p = 0.00003
Music Lateral      F(1, 34) = 1.12,     F(9, 306) = 11.10,    F(9, 306) = 6.47,
                   p = 0.298            p < 0.000001          p < 0.000001
structurally incongruous elements in language and music do not appear to be distinguishable” (pg. 726). We note that a failure to find a statistically significant difference is not necessarily indicative of equivalence (Gallistel, 2009; Lakens, 2017). We repeat this analysis for the sake of comparison, but we also include an analysis using Bayes factors to examine how the relative probabilities of models that do and do not include the factor of stimulus type (language v. music) are affected by these data.

The results of the 2 ANOVAs are shown in Table 3. Like Patel et al., we found no main effect of stimulus type (language v. music) in either lateral or midline electrodes. However, we did find a significant interaction between stimulus type and electrode site for lateral electrodes, though we note that the p-value is relatively high (p = 0.038) with no correction for multiple comparisons.

We conducted the Bayes factor analysis using the BayesFactor R package (Morey & Rouder, 2015). Briefly, the analysis evaluates the relative support for five different models of the data. All models contain a random effect of participant; models 2-5 also contain one or more fixed effects. Model 2 contains the fixed effect of electrode; model 3 contains the fixed effect of stimulus type; model 4 contains both fixed effects; and model 5 contains both fixed effects
Table 3.
ANOVA results for the difference waves. “Stimulus” refers to language v. music.

          Stimulus             Electrode              Stimulus × Electrode
Midline   F(1, 34) = 0.226,    F(2, 68) = 16.289,     F(2, 68) = 0.315,
          p = 0.637            p = 0.000002           p = 0.731
Lateral   F(1, 34) = 1.784,    F(9, 306) = 12.181,    F(9, 306) = 2.009,
          p = 0.190            p < 0.000001           p = 0.038
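Because the Bayes factors reported in Table 4 are each expressed against the same baseline (the participant-only model), two derived quantities follow directly: the Bayes factor between any pair of models, and posterior model probabilities under equal prior odds. A small illustrative sketch in Python (the reported values themselves come from the BayesFactor R package; these helper names are our own):

```python
def pairwise_bf(bf_a, bf_b):
    """Bayes factor for model A over model B, given each model's Bayes
    factor against a common baseline model; Bayes factors are transitive,
    so the pairwise factor is simply the ratio of the two."""
    return bf_a / bf_b

def posterior_model_probs(bfs_vs_baseline):
    """Posterior probability of each model assuming equal prior
    probability for every model. The baseline model itself enters with
    an implicit Bayes factor of 1; the result is ordered baseline-first
    and sums to 1."""
    bfs = [1.0] + list(bfs_vs_baseline)
    total = sum(bfs)
    return [bf / total for bf in bfs]
```

For example, a model with a Bayes factor of 12 against the baseline is favored by a factor of 4 over a model with a Bayes factor of 3 against that same baseline.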
Table 4.
Bayes factors for models of the effect of electrode site and stimulus type (language v. music) at midline and lateral electrodes. Bayes factors indicate the change in posterior odds for the model relative to the model that contains only the random effect of participant. Bayes factors larger than 1 therefore indicate relative support for the model, with larger Bayes factors representing more support. Bayes factors less than 1 indicate relative support for the participant-only model, with numbers closer to 0 indicating more support.
size (N = 35) that is more than twice the original (N in the 500-800 ms window after structural viola-
= 15). tions in music and language might reflect shared
One aspect of the data that is visually striking is processing resources, but it’s also possible that
the clear differences in the shape of the waveforms there are two distinct processes that both generate
for music and language stimuli (Figures 1 and 2). Pa- this kind of EEG signal. As we described in the intro-
tel et al. (1998) also noted this difference and at- duction, there is already a literature with numerous
tributed it to theoretically-irrelevant differences studies that examine the behavioral and neurologi-
between the musical and linguistic stimuli. The mu- cal overlap between music and language, a literature
sical excerpts are rhythmic with short gaps of si- in which debates about the best theoretical inter-
lence, while the sentences are more variable and pretation of the empirical findings are unfolding.
continuous. Patel et al. argued that this could ex- Finally, we note that there has been a growing in-
plain the difference. This seems plausible, but the terest in conducting serious replication studies in
statistical models they (and therefore we) used are undergraduate and graduate research methods
limited to making comparisons on the mean ampli- classes (Frank & Saxe, 2012; Grahe, Brandt, IJzerman,
tude in a particular time window, which is a substan- & Cohoon, 2014; Hawkins et al., 2018; Wagge, Baciu,
tial reduction in the information content of the Banas, Nadler, & Schwarz, 2019; Wagge, Brandt, et
waveforms. An advantage of making the full data set al., 2019). The hypothesized benefits are numerous:
available is that other researchers can choose to an- students act as real scientists with tangible out-
alyze the data with other kinds of models. Another comes, motivating careful and engaged work on the
difference between the language and music wave- part of the students and benefiting the scientific
forms reported by Patel et al. was a right anterior community with the generation of new evidence;
temporal negativity (RATN) in the 300-400 ms range students learn about the mechanics and process of
(N350) only for the music condition. This was re- conducting scientific research with well-defined re-
ported as an interesting, unexpected effect but not search questions and procedures, providing a
one that was important theoretically for the main stronger foundation for generating novel research
result of similar processing of language and music structural violations. The RATN pattern was not evident in our music waveform data, and the relevant statistical analysis did not replicate this element of Patel et al.'s findings (see Appendix for further details).

Of course, some concerns can only be addressed through changes to the experimental design, such as creating stimuli with different properties designed to control for additional factors. Featherstone, Waterman, and Morrison (2012) point out potential confounding factors in the stimuli used by Patel et al. (1998) and other similar, subsequent studies. For example, the musical violations are both improbable in context and violate various rules of Western musical harmony. Direct replications, while crucial for establishing the reliability of a particular finding, necessarily also contain any methodological weakness of the original study. While we contend that this replication supports the empirical conclusions of the original study, we are mindful of the need to also examine support for the theoretical conclusion with a variety of methodological approaches. The relative increase in mean amplitude

in the future; reading papers with the goal of replication teaches students to critically evaluate the methods and rationales in order to be able to replicate the work (Frank & Saxe, 2012). Exposing the next generation of researchers to methodological innovations that improve replicability and reproducibility spreads those practices, hopefully producing a more reliable corpus of knowledge in the future.

Our experience with this project anecdotally supports these hypotheses. Students were engaged and produced high-quality work. Moreover, the replication project provided a strong foundation for novel experimental work. The class was structured so that smaller teams of students conducted original studies following the whole-class replication effort. Students were able to apply a variety of methodological skills learned from the replication project (pre-registration, data analysis techniques, use of the Open Science Framework, and, more abstractly, an understanding of what the complete research process entails) to this second round of projects. Given our experiences, we endorse similar initiatives that involve students in replication work as part of their methodological training.
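The component analyses discussed above (P600, RATN) rest on comparing mean ERP amplitudes within fixed post-stimulus time windows across conditions. As a minimal sketch of that measure (this is a generic illustration, not the authors' actual analysis pipeline; the function name, the 500–800 ms window, and the synthetic data are assumptions for demonstration):

```python
import numpy as np

def mean_amplitude(epochs, times, window):
    """Mean voltage in a time window for each epoch.

    epochs: (n_epochs, n_samples) baseline-corrected EEG, in microvolts
    times:  (n_samples,) sample times in seconds relative to stimulus onset
    window: (start, end) of the measurement window in seconds
    """
    mask = (times >= window[0]) & (times <= window[1])
    return epochs[:, mask].mean(axis=1)

# Illustrative synthetic data: 20 epochs of 1 s sampled at 250 Hz,
# with a positive deflection added in the 500-800 ms window to
# mimic a P600-like effect.
rng = np.random.default_rng(0)
times = np.arange(0, 1.0, 1 / 250)
epochs = rng.normal(0, 2, size=(20, times.size))
epochs[:, (times >= 0.5) & (times <= 0.8)] += 3.0  # simulated effect

amps = mean_amplitude(epochs, times, (0.5, 0.8))
print(amps.mean())  # positive on average for this simulated effect
```

Per-epoch (or per-participant) mean amplitudes like these are what enter the ANOVA or Bayes factor comparisons between violation and control conditions.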
SIMILAR EVENT-RELATED POTENTIALS TO MUSIC AND LANGUAGE: A REPLICATION OF PATEL, GIBSON, RATNER, BESSON, & HOLCOMB (1998) 11
Personality and Social Psychology, 113(2), 254–261.

Lima, C. F., & Castro, S. L. (2011). Speaking to the trained ear: musical expertise enhances the recognition of emotions in speech prosody. Emotion, 11(5), 1021–1031.

Merrill, J., Sammler, D., Bangert, M., Goldhahn, D., Lohmann, G., Turner, R., & Friederici, A. D. (2012). Perception of words and pitch patterns in song and speech. Frontiers in Psychology, 3, 76.

Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity. Cerebral Cortex, 19(3), 712–723.

Morey, R. D., & Rouder, J. N. (2015). BayesFactor: Computation of Bayes Factors for Common Designs. Retrieved from https://CRAN.R-project.org/package=BayesFactor

MuseScore Development Team. (2018). MuseScore (Version 2.3.2). Retrieved from https://musescore.org/

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., … Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.

Osterhout, L., & Holcomb, P. J. (1992). Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31(6), 785–806.

Osterhout, L., & Holcomb, P. J. (1993). Event-related potentials and syntactic anomaly: Evidence of anomaly detection during the perception of continuous speech. Language and Cognitive Processes, 8(4), 413–437.

Osterhout, L., Holcomb, P. J., & Swinney, D. A. (1994). Brain potentials elicited by garden-path sentences: evidence of the application of verb information during parsing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(4), 786–803.

Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6(7), 674–681.

Patel, A. D. (2010). Music, Language, and the Brain. Oxford University Press, USA.

Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. J. (1998). Processing syntactic relations in language and music: an event-related potential study. Journal of Cognitive Neuroscience, 10(6), 717–733.

R Core Team. (2017). R: A Language and Environment for Statistical Computing (Version 3.4.2). Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Rivas, D. (2016). jsPsych Hardware (Version v0.2-alpha). Retrieved from https://github.com/rivasd/jsPsychHardware

Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56(5), 356–374.

Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The relationship of lyrics and tunes in the processing of unfamiliar songs: a functional magnetic resonance adaptation study. Journal of Neuroscience, 30(10), 3572–3578.

Sammler, D., Koelsch, S., Ball, T., Brandt, A., Grigutsch, M., Huppertz, H.-J., … Schulze-Bonhage, A. (2013). Co-localizing linguistic and musical syntax with intracranial EEG. NeuroImage, 64, 134–146.

Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569.

Steinbeis, N., & Koelsch, S. (2008). Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex, 18(5), 1169–1178.

Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: do music lessons help? Emotion, 4(1), 46–64.

Tillmann, B. (2012). Music and language perception: expectations, structural integration, and cognitive sequencing. Topics in Cognitive Science, 4(4), 568–584.

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638.