HHS Public Access
Author manuscript
Int J Psychophysiol. Author manuscript; available in PMC 2018 April 01.
Published in final edited form as:
Int J Psychophysiol. 2017 April ; 114: 24–30. doi:10.1016/j.ijpsycho.2017.01.012.
How Many Blinks are Necessary for a Reliable Startle
Response? A Test Using the NPU-threat Task
Lynne Lieberman1, Elizabeth S. Stevens1, Carter J. Funkhouser1, Anna Weinberg2, Casey
Sarapas1, Ashley A. Huggins3, and Stewart A. Shankman1
1University of Illinois at Chicago, Department of Psychology, Chicago, IL 60657
2McGill University, Department of Psychology, Montreal, QB
3University of Wisconsin, Department of Psychology, Milwaukee, WI 53211
Abstract
Emotion-modulated startle is a frequently used method in affective science. Although there is a
growing literature on the reliability of this measure, it is presently unclear how many startle
responses are necessary to obtain a reliable signal. The present study therefore evaluated the
reliability of startle responding as a function of number of startle responses (NoS) during a widely
used threat-of-shock paradigm, the NPU-threat task, in a clinical (N = 205) and non-clinical (N =
92) sample. In the clinical sample, internal consistency was also examined independently for
healthy controls vs. those with panic disorder and/or major depression and retest reliability was
assessed as a function of NoS. Although results varied somewhat by diagnosis and for retest
reliability, the overall pattern of results suggested that six startle responses per condition were
necessary to obtain acceptable reliability in clinical and non-clinical samples during this threat-of-shock paradigm in the present study.
Keywords
anxiety-potentiated startle; emotion-modulated startle; eyeblink startle reflex; fear-potentiated
startle; reliability
1. Introduction
Establishing the reliability of a measure is an essential first step towards establishing its
validity (Cronbach, 1947; Cronbach & Meehl, 1955). Although this fact is well accepted in
the development of self-report and interview measures, the psychometric properties of
psychophysiological indices of psychological constructs have received less attention until
recently (Hajcak & Patrick, 2015; Tomarken, 1995). This is particularly important given the
Correspondence concerning this article should be addressed to Stewart A. Shankman, University of Illinois at Chicago, 1007 W.
Harrison St. (M/C 285), Chicago, IL, 60657. Phone: (312)-355-3812; stewarts@uic.edu.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our
customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of
the resulting proof before it is published in its final citable form. Please note that during the production process errors may be
discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
increasingly prominent role of psychophysiological measures within psychology (and
affective science more specifically; Schwartz, Lilienfeld, Meca, & Sauvigné, 2016;
Shankman & Gorka, 2015). The present study therefore seeks to contribute to this
burgeoning literature by examining the reliability of a widely used psychophysiological
index of emotion – electromyography of the eyeblink startle reflex (EMG startle).
The startle reflex is particularly conducive to translational research on emotion because it is
present across species and its magnitude is modulated by an organism’s emotional state.
More specifically, the magnitude of the startle reflex is potentiated or blunted relative to
baseline when an organism is in an aversive (e.g., fear) or appetitive (e.g., excitement)
emotional state, respectively (Grillon & Ameli, 2001; Vrana, Spence & Lang, 1988). Startle
is also commonly used to examine emotional processing abnormalities that may contribute
to the development and maintenance of psychopathology. For example, heightened aversive
responding to particular threatening stimuli/situations has been implicated in the
pathogenesis of several internalizing disorders (e.g., panic disorder and interoceptive cues;
posttraumatic stress disorder and trauma-related cues; social anxiety disorder and social
evaluation; Craske et al., 2009). However, unpredictable threatening stimuli are particularly
aversive for anxious individuals. Panic disorder (PD), posttraumatic stress disorder, and
social anxiety disorder have all been associated with heightened startle potentiation during
the anticipation of unpredictable threat (Cornwell, Johnson, Berardi & Grillon, 2006; Grillon
et al., 2009; Shankman et al., 2013). Thus, aberrant emotion-modulated startle, particularly
during the anticipation of unpredictable threat, may represent a transdiagnostic marker for
several internalizing disorders.
The literature on the psychometric properties of emotion-modulated startle has also grown in
recent years. Investigations of the retest reliability of emotion-modulated startle elicited
during an affective picture-viewing task have yielded mixed results, with some investigations
finding strong retest reliability (Bradley, Gianaros, & Lang, 1995; Larson, Ruffalo, Nietert,
& Davidson, 2000) and others finding weak retest reliability (Kaye, Bradford, & Curtin,
2016; Manber, Allen, Burton, & Kaszniak, 2000). Only two studies to date have examined
retest reliability of emotion-modulated startle during the No threat-Predictable threat-Unpredictable threat task (NPU; Schmitz & Grillon, 2012), a startle paradigm that is widely
used to differentiate startle potentiation to predictable threat (i.e., fear-potentiated startle)
and unpredictable threat (i.e., anxiety-potentiated startle). Both studies reported retest
correlations above .69 for anxiety-potentiated startle and fear-potentiated startle (Kaye et al.,
2016; Shankman et al., 2013). Kaye and colleagues (2016) reported acceptable internal
consistency (i.e., Cronbach’s alphas > .70 [Nunnally, 1978]) for anxiety-potentiated startle
and fear-potentiated startle during the NPU-threat task.
Despite growing focus in the field of psychology on exploring the reliability of emotion-modulated startle, there are several major gaps in the extant literature on the psychometric
properties of this psychophysiological measure. For example, it is presently unknown how
many startle responses are necessary to obtain a reliable index of startle potentiation scores
during emotion-modulated startle paradigms. It is also presently unknown whether the
number of startle responses (NoS) necessary for reliable condition averages (which are used
to calculate startle potentiation scores) and potentiation scores differs for those with
internalizing psychopathology relative to those without. This is a particularly important
question to address given the abovementioned association between internalizing
psychopathology and aberrant emotion-modulated startle.
Condition averages and potentiation scores calculated from a sufficient NoS should
demonstrate acceptable internal consistency and strong retest reliability. Determining the
minimum number of startle responses (NoS) necessary for reliable condition averages and
potentiation scores would be highly beneficial for the design of future experimental
protocols (at least with the NPU startle paradigm), which should be as brief as possible to
reduce participant burden and the potential impact of startle habituation on task effects
(Blumenthal et al., 2005). An empirically determined minimum NoS could also help
experimenters determine when a participant has too few usable startle responses to be
included in data analyses. This is critical given that certain trials may be excluded for some
participants due to artifacts (e.g., excessive participant movement just before or after the
presentation of a startle probe) and non-responses (i.e., failure to exhibit a discernable startle
response) and some participants may withdraw from the study prior to study completion.
Several studies have examined the reliability of event-related potentials as a function of
number of trials (e.g., Foti, Kotov & Hajcak, 2013; Moran, Jendrusina & Moser, 2013;
Meyer, Riesel & Proudfit, 2013). However, only one study to our knowledge has examined
this question with respect to EMG startle data. Our laboratory recently investigated the NoS
necessary for adequate internal consistency (i.e., degree of interrelatedness or stability;
Tavakol & Dennick, 2011) of average startle magnitude during each condition of the NPU-threat task (i.e., condition averages) in a non-clinical sample. Startle magnitude exhibited
excellent internal consistency (Cronbach’s alpha > .80) for all NPU conditions with as few
as three responses (Nelson, Hajcak, & Shankman, 2015). The present study will replicate
our previous investigation by examining the internal consistency of condition averages
during NPU as a function of NoS across two additional samples, one clinical and one non-clinical. We will also extend our previous investigation by examining (a) the internal
consistency of potentiation scores (i.e., fear-potentiated startle and anxiety-potentiated
startle) as a function of NoS; and (b) whether the NoS necessary for adequate consistency of
condition averages and potentiation scores differs for those with an anxiety and/or
depressive disorder. Lastly, we will conduct exploratory analyses to assess the NoS
necessary for significant retest reliability of condition averages and potentiation scores in a
subset of participants.
2. Methods
2.1. Participants
Data for the present study were collected as part of two investigations of emotional and
cognitive processes. Details of the two studies are provided elsewhere (see Sarapas,
Weinberg, Langenecker, & Shankman, 2017; Shankman et al., 2013). In brief, Study 1 (n
= 92) was a non-clinical sample of undergraduates. Study 2 (n = 205) was a clinical sample
recruited from the community to be in one of four groups: (1) no history of Axis I
psychopathology (i.e., healthy controls; n = 82), (2) current major depressive disorder
(MDD) and no lifetime history of any anxiety disorder (i.e., MDD-only group; n = 37), (3)
current PD and no lifetime history of MDD (i.e., PD-only group; n = 28), (4) current PD and
MDD (i.e., comorbid PD and MDD group; n = 58). Diagnoses were made via the Structured
Clinical Interview for DSM-IV (SCID; First, Spitzer, Gibbon, & Williams, 1996).
Exclusion criteria for both studies were a history of head trauma, left-handedness, and a
lack of English fluency. Participants in Study 2 were additionally required to have no lifetime
history of a psychotic disorder, bipolar disorder, or dementia. Participant demographics can
be found in Table 1, along with clinical characteristics, such as self-reported anxiety and
depressive symptomology.
2.2. Procedure and NPU-Threat Task
The full procedure for Studies 1 and 2 has been reported elsewhere (Sarapas et al., 2017;
Shankman et al., 2013). In brief, after providing informed consent, all participants completed the
NPU-threat task. For Study 2, 34 participants returned to the laboratory 5–17 (M = 9.46, SD =
3.71) days after their initial visit to complete NPU a second time. Of these 34 individuals, 7
had MDD-only, 5 had PD-only, 10 had comorbid PD and MDD, and 12 were healthy
controls. All procedures were approved by the local Institutional Review Board.
The NPU-threat task was designed to assess responses to predictable and unpredictable
threats (Schmitz & Grillon, 2012). In brief, prior to the task, shock electrodes were placed on
participants’ left wrist and a shock work-up procedure was completed to identify the level of
shock intensity each participant described as “highly annoying but not painful” (between 1–
5 mA). Participants also completed a 2-min startle habituation task prior to the task to
reduce early, exaggerated startle potentiation.
The NPU-threat task included three within-subjects conditions - no shock (N), predictable
shock (P), and unpredictable shock (U). Text at the bottom of the computer monitor
informed participants of the current threat condition and each condition lasted for 90s. In
Study 1, a 6-s countdown was displayed five times within each condition, and in Study 2, an
8-s geometric cue (blue circle for N, red square for P, and green star for U) was presented
four times within each condition. Interstimulus intervals ranged from 7 to 17-s during which
only the text describing the condition was on the screen (i.e., ISI conditions).
During N, no shocks were delivered. During P, Study 1 participants only received a shock
when the countdown reached 1 and Study 2 participants only received a shock when the cue
(red square) was on the screen (i.e., the shock was predicted by the countdown or cue in
Studies 1 and 2, respectively). In the U condition, shocks were administered at any time (i.e.,
during the cue countdown [hereafter: cue] or ISI). Study 1 participants received 20 shocks
(10 each during P and U) and 48 startle probes (16 each during N, P, and U). Study 2
participants received 12 shocks (6 during P and 6 during U) and 72 startle probes (24 each
during N, P, and U). Study 2’s NPU was divided into two recording blocks, separated by a
rest period.
Stimuli (i.e., shocks, white noise) were administered using PSYLAB (Contact Precision
Instruments, London, UK) hardware and software. Psychophysiological data were acquired
using Neuroscan 4.4 (Compumedics, Charlotte, NC). Acoustic startle probes were 40-ms,
103-dB bursts of white noise presented binaurally through headphones. Electric shocks were
400-ms. Consistent with published guidelines (Blumenthal et al., 2005), EMG startle was
recorded from two 4-mm Ag/AgCl electrodes placed over the orbicularis oculi muscle below
the right eye and the ground electrode was at the frontal pole (AFZ). Data were collected
using a bandpass filter of DC to 200 Hz at a sampling rate of 1,000 Hz.
Startle blinks were scored according to published guidelines (Blumenthal et al., 2005). Data
processing included applying a 28 Hz high-pass filter, rectifying, and then smoothing using a
40 Hz low-pass filter. Blink response was defined as the peak amplitude of EMG activity
within the 20–150 ms period following startle probe onset relative to baseline. The baseline
period was defined as the average baseline EMG level for the 50 ms preceding the startle
probe onset. Each peak was identified by software but examined by hand to ensure
acceptability. Blinks were scored as nonresponses if EMG activity during the 20–150 ms
poststimulus time frame did not produce a blink peak that was visually differentiated from
baseline activity. Blinks that were scored as nonresponses were included as zeros. Blinks
were scored as missing if the baseline period was contaminated with noise, movement
artifact, or if a spontaneous or voluntary blink began before minimal onset latency and thus
interfered with the startle probe-elicited blink response.
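The peak-minus-baseline scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' scoring software: it assumes the EMG trace has already been filtered, rectified, and smoothed, that it is sampled at 1,000 Hz (one sample per millisecond), and that the function name `score_blink` and its parameters are hypothetical. It does not reproduce the visual inspection used to classify non-responses or missing trials.

```python
from statistics import mean

SAMPLING_RATE_HZ = 1000  # taken from the recording description: 1 sample per ms


def score_blink(emg, probe_onset, baseline_ms=50, window_ms=(20, 150)):
    """Return peak EMG in the response window minus the mean pre-probe baseline.

    `emg` is assumed to be an already filtered, rectified, and smoothed trace
    (list of microvolt values); `probe_onset` is the sample index of probe onset.
    """
    samples_per_ms = SAMPLING_RATE_HZ // 1000
    base_start = probe_onset - baseline_ms * samples_per_ms
    if base_start < 0:
        raise ValueError("not enough pre-probe samples for the baseline")
    # Baseline: average EMG level over the 50 ms preceding probe onset.
    baseline = mean(emg[base_start:probe_onset])
    # Response window: 20-150 ms after probe onset (end point inclusive).
    win_start = probe_onset + window_ms[0] * samples_per_ms
    win_end = probe_onset + window_ms[1] * samples_per_ms + 1
    window = emg[win_start:win_end]
    if not window:
        raise ValueError("trace ends before the response window")
    return max(window) - baseline


# Hypothetical trace: flat 1.0 microvolt with a single peak 60 ms after the probe.
trace = [1.0] * 300
trace[160] = 11.0
print(score_blink(trace, probe_onset=100))
```

In practice each automatically detected peak would still be checked by hand, as described in the text.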
2.3. Data Analysis Plan
Reliability was examined separately for Studies 1 and 2. Reliability was also examined
separately for startle amplitude (non-responses scored as missing values) and magnitude
(nonresponses scored as zeros). Cronbach’s alpha was used to index internal consistency
(Santos, 1999). We first examined Cronbach’s alpha as a function of the NoS entered into
the averages for each condition (NCue, PCue, UCue, NISI, PISI, and UISI) with a maximum of
8 (Study 1) and 12 (Study 2) probes per condition. Condition averages were derived from
raw microvolt values. For each NoS (NoS = 2, NoS = 3, etc.), startle probes were selected in
the order in which they occurred (i.e., sequentially).1 Given that, as mentioned above, some
startle responses were scored as missing during EMG data processing, it is important to note
that the available sample size of participants for all reliability analyses decreased as the NoS
increased. The median number of probes that elicited missing responses was 2 (out of 48)
for Study 1 and 4 (out of 72) for Study 2, and the median number of non-responses was 1 in
each sample (see Tables 2 and 3).2 Also of note is that no case analyses were conducted, and
no model outliers were removed. That is, all participants who completed the NPU-threat task
in each study were included in the analyses.
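As an illustration of the analysis described above, the following sketch computes Cronbach's alpha for the first n responses of one condition, for n = 2 up to a maximum. It rests on two assumptions that the text does not spell out: that "sequential" selection means the first n probes in presentation order, and that participants with any missing value among those n probes are dropped from that estimate (which would be consistent with the sample size shrinking as NoS increases). The function names and the data layout (`None` for missing trials, `0.0` for non-responses) are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data: rows = participants, columns = responses."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_by_nos(responses, max_nos, nonresponse_as_zero=True):
    """Map each NoS (2..max_nos) to (alpha, number of participants used).

    `responses` holds one list per participant, in presentation order.
    None marks a missing (artifact) trial; 0.0 marks a non-response.
    nonresponse_as_zero=True mirrors magnitude; False mirrors amplitude.
    """
    results = {}
    for n in range(2, max_nos + 1):
        rows = []
        for trials in responses:
            first_n = list(trials[:n])
            if not nonresponse_as_zero:
                # Amplitude: non-responses are treated as missing values.
                first_n = [None if v == 0 else v for v in first_n]
            # Assumed listwise deletion: keep only participants with n complete values.
            if len(first_n) == n and None not in first_n:
                rows.append(first_n)
        alpha = cronbach_alpha(rows) if len(rows) >= 2 else None
        results[n] = (alpha, len(rows))
    return results


# Hypothetical data for one condition: four participants, up to three probes.
example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, None]]
for nos, (alpha, n_used) in alpha_by_nos(example, max_nos=3).items():
    print(nos, alpha, n_used)
```

An estimate would then be read against the .70 criterion used in this study.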
Internal consistency analyses were conducted separately for each diagnostic group for Study
2 (i.e., healthy controls, PD-only, MDD-only, and comorbid MDD/PD). Cronbach’s alpha
was defined as ‘acceptable’ when equal to or greater than .70 (Nunnally, 1978). Split-half
reliability analyses were conducted to examine the internal consistency of potentiation
1The pattern of results was comparable when internal consistency analyses were conducted by adding startle responses to reliability
estimates in a random order. For this method, at each NoS (NoS = 2; NoS = 3, etc), startle probes were randomly selected from all
possible non-missing startle probes. For example, for NoS = 3, if a participant in study two had all 12 non-missing startle probes for a
condition, 3 of the 12 were randomly selected for the analyses.
2The median is more appropriate than the mean in this context as ‘number of missings’ and ‘number of nonresponses’ were highly
skewed (i.e., the vast majority of probes elicited startle responses).
scores as a function of NoS. To do so, averages of odd-numbered trials and even-numbered
trials were first separately calculated as a function of NoS (e.g., the average of startle
responses one and three; the average of startle responses two and four, etc.). Spearman-Brown corrected coefficients were then calculated to assess the relation between odd and
even trials (see Kappenman et al., 2014 and Kaye et al., 2016 for a similar approach).
Consistent with the literature, Spearman-Brown coefficients were interpreted as acceptable
if greater than .50 (Kaye et al., 2016).
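The odd/even split-half procedure can be sketched as below. The sketch assumes that a potentiation score for each half is the threat-condition average minus the no-threat average computed from the same half of the trials; the text does not state this detail, so it should be read as one plausible implementation. The function names and the data layout are hypothetical, and the Spearman-Brown step uses the standard prophecy formula for doubling test length, 2r / (1 + r).

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Spearman-Brown correction of a half-test correlation to full length."""
    return 2 * r_half / (1 + r_half)


def split_half_potentiation(threat, neutral, nos):
    """Corrected odd/even reliability of a potentiation score using `nos` trials.

    `threat` and `neutral` hold one list of responses per participant
    (same participant order), with no missing values among the first `nos`.
    """
    odd_scores, even_scores = [], []
    for t, n in zip(threat, neutral):
        t, n = t[:nos], n[:nos]
        # Trials 1, 3, 5, ... form the odd half; trials 2, 4, 6, ... the even half.
        odd_scores.append(mean(t[0::2]) - mean(n[0::2]))
        even_scores.append(mean(t[1::2]) - mean(n[1::2]))
    return spearman_brown(pearson_r(odd_scores, even_scores))


# Hypothetical data: three participants, four probes per condition.
u_cue = [[10, 12, 11, 13], [20, 22, 21, 23], [30, 31, 32, 33]]
n_cue = [[5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]
print(split_half_potentiation(u_cue, n_cue, nos=4))
```

The resulting coefficient would be compared with the .50 threshold adopted here.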
For Study 2, retest reliabilities were tested as a function of NoS for: (1) average startle in
each of the six NPU conditions, (2) startle potentiation to the unpredictable threat (average
UCue minus average NCue and average UISI minus average NISI), and (3) startle potentiation
to the predictable threat (average PCue minus average NCue). Pearson’s r was used to
assess retest reliability.
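The three potentiation contrasts and their retest correlations could be computed along the following lines. This is a sketch under stated assumptions: each participant's data are held in a dictionary keyed by condition name, the first `nos` responses per condition are averaged, and the contrast labels (`anxiety_cue`, `anxiety_isi`, `fear_cue`) and function names are hypothetical conveniences rather than terms from the study.

```python
from statistics import mean

# Threat-minus-no-threat contrasts as defined in the text.
CONTRASTS = {
    "anxiety_cue": ("UCue", "NCue"),
    "anxiety_isi": ("UISI", "NISI"),
    "fear_cue": ("PCue", "NCue"),
}


def potentiation_scores(condition_trials, nos):
    """Potentiation scores for one participant from the first `nos` responses."""
    return {
        name: mean(condition_trials[threat][:nos]) - mean(condition_trials[neutral][:nos])
        for name, (threat, neutral) in CONTRASTS.items()
    }


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def retest_correlations(visit1, visit2, nos):
    """Pearson r between visits for each contrast (same participant order)."""
    s1 = [potentiation_scores(p, nos) for p in visit1]
    s2 = [potentiation_scores(p, nos) for p in visit2]
    return {
        name: pearson_r([s[name] for s in s1], [s[name] for s in s2])
        for name in CONTRASTS
    }


# Hypothetical single participant with two responses per condition.
participant = {"UCue": [10, 14], "UISI": [8, 8], "PCue": [9, 11],
               "NCue": [4, 6], "NISI": [3, 5]}
print(potentiation_scores(participant, nos=2))
```

Significance testing of each r, as reported in the Results, is not shown here.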
3. Results
3.1. Internal Consistency in the Non-Clinical Sample (Study 1)
At only two responses (NoS=2), Cronbach’s alphas for average startle magnitude ranged
from .70–.83 for all conditions (see Figure 1A). For average startle amplitude with two
responses, Cronbach’s alphas were comparable, ranging from .79–.86 for all conditions
except PCue (.68). Cronbach’s alpha for amplitude during PCue reached an acceptable level
of .75 at three responses. For magnitude and amplitude potentiation scores, Spearman-Brown coefficients reached an acceptable level across all conditions at just two responses
total (range of rs = .73–.86 and rs = .71–.86, respectively, p < .05 [see Figure 1B]).
3.2. Internal Consistency in the Clinical Sample (Study 2)
Across all four groups, at two responses, Cronbach’s alphas for startle magnitude and
amplitude ranged from .85–.90 across all six conditions (see Figure 1C). Similarly, for
magnitude and amplitude potentiation scores, split-half correlations reached an acceptable
level across all conditions at just two responses (range of rs = .85–.86 for magnitude and
amplitude, p < .05 [see Figure 1D]).
The number of responses necessary to reach acceptable Cronbach’s alpha levels across all
conditions was comparable across diagnostic groups. In healthy controls, alphas across all
conditions ranged from .86–.90 for magnitude (see Figure 2A) and .85–.90 for amplitude at
NoS = 2. In the MDD-only group, alphas across all conditions ranged from .78–.92 for
magnitude (see Figure 2B) and .77–.91 for amplitude at NoS = 2. For startle amplitude in the
PD-only group, alphas ranged from .80–.90 across all conditions except NISI at NoS = 2.
Likewise, for startle magnitude in the PD-only group, alphas ranged from .82–.90 across
all conditions except NISI at NoS = 2. Alpha for magnitude and amplitude during NISI
reached an acceptable level of .81 at NoS = 3 (see Figure 2C). Lastly, in the comorbid
MDD/PD group, alphas across all conditions ranged from .82–.94 for magnitude (see Figure
2D) and .83–.93 for amplitude at NoS = 2.
Given that alpha values for magnitude and amplitude were acceptable for all conditions
across all diagnostic groups at NoS = 3, exploratory follow-up analyses were conducted to
examine whether Cronbach’s alpha values significantly differed between those with a
diagnosis of PD and/or MDD relative to healthy controls. To compare internal consistency
estimates at this NoS between individuals with and without a diagnosis, Cronbach’s alpha
values at NoS = 3 were calculated for individuals with any diagnosis (i.e., collapsing across
individuals with PD-only, MDD-only, or comorbid PD/MDD). We then conducted a series
of pairwise comparisons using a dependent-alpha calculator developed by Abd-El-Fattah &
Hassan (2011) to statistically compare Cronbach’s alpha at NoS = 3 for individuals with any
diagnosis, relative to healthy controls for the key threat conditions of the NPU-threat task:
PCue, UCue, and UISI. These comparisons revealed no significant differences between
Cronbach’s alpha values at NoS = 3 for individuals with a diagnosis, relative to those
without.
3.3. Retest Reliability (Study 2)
For all conditions except NCue and PISI, there was a significant positive retest correlation for
startle magnitude across the two visits with as few as NoS = 2 (range of rs = .38–.71, ps < .05;
see Figure 3A). Retest correlations for startle magnitude reached significance for NCue
and PISI at NoS = 3 (rs = .28 and .31, respectively, p < .05). Similarly, for all conditions
except NCue and PISI, there was a positive retest correlation for startle amplitude with
as few as NoS = 2 (range of rs across conditions at three responses = .44–.78, ps < .05).
Retest correlations for NCue startle amplitude reached significance at NoS = 5 (r = .39, p < .05),
and for PISI at NoS = 3 (r = .35, p < .05).
Startle potentiation to unpredictable threat during visit one positively predicted startle
potentiation during visit two with as few as two startle responses for magnitude (rs for UCue
and UISI at two responses = .61 and .49, respectively, p < .05) and amplitude (rs for UCue
and UISI at two responses = .59 and .56, respectively, p < .05). Retest reliability for PCue
reached significance at NoS = 6 for amplitude (r = .38, p < .05) and magnitude (r = .36, p < .05
[Figure 3B]).
4. Discussion
EMG of emotion-modulated startle is a commonly used index of emotional processing and
startle potentiation to threat has been used as a measure of heightened negative emotional
responding to threatening stimuli/situations in various anxiety disorders (Cornwell et al.,
2006; Grillon et al., 2009). Given the potential for emotion-modulated startle to serve as a
transdiagnostic marker of multiple internalizing conditions, there is a growing literature on
the psychometric properties of this psychophysiological measure. This is the first study,
however, to examine the reliability of EMG startle as a function of number of startle
responses during each condition of the NPU-threat task, a widely used threat of shock
paradigm, in two samples – one clinical and one non-clinical. In the clinical sample, we also
explored retest reliability in a smaller subset of subjects as a function of number of startle
responses for: (1) NPU condition averages, (2) anxiety-potentiated startle to unpredictable
threat (UISI/UCue), and (3) fear-potentiated startle to predictable threat (PCue).
In the non-clinical sample, two responses were necessary for magnitude and three responses
for amplitude condition averages to reach acceptable internal consistency (alpha >.70) across
all conditions. This pattern of results is similar to our laboratory’s previous finding that as
few as two responses were necessary for magnitude to reach acceptable internal consistency
across all NPU conditions in a non-clinical sample (Nelson et al., 2015). In the clinical
sample, just two startle responses were necessary for condition averages (for magnitude and
amplitude) to reach acceptable internal consistency across all conditions. Importantly, the
internal consistency results for condition averages were similar across MDD-only, PD-only,
comorbid MDD/PD, and healthy controls, suggesting that internalizing psychopathology did not
negatively impact reliability. Internal consistency of startle potentiation to threat, a
commonly used index of negative emotional responding, was comparable to that of
condition averages. More specifically, split-half correlations for magnitude and amplitude
startle potentiation scores reached an acceptable level across all threat conditions in the non-clinical and clinical samples at just two responses total.
Of note is that the NoS necessary for significant retest reliability of average startle and
potentiation scores differed between task conditions. All condition averages exhibited
significant retest reliability at just two responses except for PISI and NCue. For PISI and NCue
to exhibit significant retest reliability for amplitude and magnitude, five responses were
necessary. As safety conditions in a threatening task, PISI and NCue may elicit greater
variability and inconsistency in startle responding within a given task administration than do
clearly threatening conditions (Lissek, Pine & Grillon, 2006). Retest reliability reached
significance at just two responses for amplitude and magnitude potentiation to UCue and
UISI. However, retest reliability did not reach significance for PCue until NoS = 6, suggesting
that startle potentiation to predictable threat may be somewhat more variable than to
unpredictable threat.
It is noteworthy that reactivity to unpredictable threat may be more reliable than reactivity to
predictable threat, as the literature on the relation between startle potentiation to predictable
threat and anxiety psychopathology is less consistent (e.g., Shankman et al., 2013; Grillon et
al., 2008) than the literature on the relation between startle potentiation to unpredictable
threat and anxiety psychopathology (e.g., Gorka et al., 2017; Lieberman et al., 2016;
Shankman et al., 2013). That is, mixed findings on the relation between anxiety
psychopathology and reactivity to predictable threat may be in part due to the poorer
reliability of startle potentiation during the anticipation of predictable threat. It is also
noteworthy that a higher NoS was necessary for significant retest reliability of PCue relative
to the NoS necessary for acceptable internal consistency of PCue. This suggests that
researchers may need to obtain a greater number of startle responses for temporal stability of
startle potentiation to predictable threat, whereas fewer responses may be necessary for
internal consistency of startle condition averages during PCue. Relatedly, researchers may
place a greater emphasis on the results from retest analyses when designing a study that aims
to obtain a temporally stable index of startle. Temporally stable indices of startle may be
particularly relevant in clinical research, which may use startle responding as a predictor of
risk for psychopathology or response to treatment for psychopathology.
In interpreting retest reliability results, however, it is important to consider several factors.
First, these were exploratory analyses conducted in a smaller sample (n = 34). Second,
although retest correlations reached statistical significance for the majority of conditions at
just two responses, the coefficients were moderate at this NoS. Retest correlations increased
in magnitude as the NoS increased. This pattern of results suggests that the retest reliability
of startle condition averages and potentiation scores is improved by a greater NoS.
In sum, investigators may only need six startle responses in non-clinical and clinical samples
to obtain reliable and stable indices of average startle amplitude or magnitude in each
condition of NPU, as well as of anxiety-potentiated and fear-potentiated startle during NPU.
It is worth noting that potentiation scores (rather than startle during the individual
conditions) are often the metric of interest in the NPU-threat task and other emotion-modulated startle paradigms. Given this, it is encouraging for psychophysiological
researchers that so few startle responses were necessary for potentiation scores and the
condition averages that are used to calculate those potentiation scores. As mentioned above,
compared to self-report and interview measures of psychological variables, the
psychometrics of psychophysiological tasks are often ignored, but this pattern has begun to
change. For example, there have been recent investigations on how best to quantify startle
potentiation or change within a paradigm (Bradford, Starr, Shackman & Curtin, 2015).
Moreover, Kaye et al. (2016) investigated the internal consistency of startle condition
averages and potentiation scores. Results from the present study are consistent with those
reported by Kaye et al. (2016), such that startle during the NPU-threat task exhibited
acceptable internal consistency and temporal stability. Furthermore, NoS analyses reported
here suggest that the significant retest reliability reported by Shankman et al. (2013) in this
same clinical sample could have been obtained with half as many startle responses.
There have also been recent investigations to determine the number of events necessary to
obtain reliable ERP averages (Foti, Kotov & Hajcak, 2013; Moran, Jendrusina & Moser,
2013; Meyer, Riesel & Proudfit, 2013). Results from ERP investigations of this nature
yielded results that are similar to those of the present study, such that a minimum of seven and
eight responses have been suggested to obtain a reliable index of the late positive potential
and error-related negativity, respectively. This exploratory study therefore adds to this
growing methodological literature, and provides an empirically determined guideline to
consider when developing a task to assess for emotion-modulated startle (or at least with the
NPU paradigm).
Given that startle probes are naturally aversive and participant startle responses tend to
habituate over the course of a task (Blumenthal et al., 2005; Campbell et al., 2014), it is
important that researchers design their startle tasks to be as brief as possible to decrease
participant burden and increase the quality of the psychophysiological data collected.
Although data from the present study suggest that a minimum of six responses per condition
may be sufficient to obtain reliable and stable indices of startle during NPU, it is important
to note that several responses were excluded from analyses after data collection due to
artifacts or non-responses. For the non-clinical sample in the present study, a median of two
responses was scored as missing and one as non-response (out of 48 responses across six
conditions). For the clinical sample, a median of four responses was scored as missing and one
as non-response (out of 72 responses across six conditions). Taken together, these data suggest that
approximately 6–7% of startle responses may need to be excluded from data analyses due to
artifacts (which typically occur at random throughout a task). It may therefore be necessary
to increase the size of one’s task by this percentage so as to improve the likelihood that there
are six responses available for analyses.
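As a rough planning aid, the arithmetic in the preceding paragraph can be sketched in a few lines. The sketch is illustrative only: the function names are hypothetical, the exclusion rate is computed as (missing + non-response) / total from the medians reported above, and exclusions are assumed to be spread evenly across conditions, which individual participants will not match exactly.

```python
import math


def exclusion_rate(n_missing: int, n_nonresponse: int, n_total: int) -> float:
    """Proportion of startle responses lost to artifacts or non-responses."""
    return (n_missing + n_nonresponse) / n_total


def probes_to_schedule(n_required: int, rate: float) -> int:
    """Probes per condition needed so that, on average, n_required remain usable."""
    return math.ceil(n_required / (1.0 - rate))


# Median exclusions reported in the text (hypothetical variable names).
non_clinical_rate = exclusion_rate(n_missing=2, n_nonresponse=1, n_total=48)
clinical_rate = exclusion_rate(n_missing=4, n_nonresponse=1, n_total=72)

for label, rate in [("non-clinical", non_clinical_rate), ("clinical", clinical_rate)]:
    print(f"{label}: rate = {rate:.3f}, probes per condition = {probes_to_schedule(6, rate)}")
```

Under these assumptions, both samples point to scheduling seven probes per condition in order to expect six usable responses; participants with unusually noisy data could still fall short of six.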
Although the overarching goal of this study was to provide an empirically determined
guideline to inform the development of startle tasks, a second and related goal was to inform
data pre-processing and analytic procedures for emotion-modulated startle paradigms. For
example, if some participants have multiple unusable trials due to randomly occurring
artifacts, researchers may choose to include those participants in analyses so long as there
are still six usable trials per condition. Researchers should, however, consider their sample
size when determining whether subjects with noisier EMG data should be excluded from
analyses. When sample sizes are small, researchers may choose to include subjects with
fewer than six usable trials per condition in order to improve the signal-to-noise ratio of the
data. Ultimately, research of this nature can also inform the selection of artifact-rejection
procedures that strike an appropriate balance to maximize signal-to-noise ratio. Two
important caveats to the abovementioned guideline (i.e., the minimum NoS per condition =
six) should be noted. First, this guideline may only generalize to studies that utilize the
NPU-threat task (Schmitz & Grillon, 2012). That is, a different NoS may be (and likely will
be) necessary to obtain reliable signals for other emotion modulated startle paradigms (e.g.,
affective picture viewing [e.g., Lang, Bradley & Cuthbert, 1997], or fear conditioning [Duits
et al., 2015]). Second, given that the present study’s clinical sample only included
individuals with select internalizing disorders (i.e., MDD and/or PD), the suggested
minimum NoS may not apply to individuals with other types of psychopathologies, such as
externalizing or psychotic disorders.
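The trial-count inclusion rule described above (retaining participants who still have at least six usable trials per condition) can be sketched as follows. This is a minimal illustration, not the authors' pre-processing pipeline: the data layout, the use of None to mark rejected trials, and all function names are assumptions, and the threshold of six is taken from the NPU-threat guideline discussed here.

```python
from typing import Dict, List, Optional

MIN_USABLE = 6  # guideline from the text; may not generalize beyond the NPU-threat task
CONDITIONS = ("NISI", "NCue", "PISI", "PCue", "UISI", "UCue")

Trials = Dict[str, List[Optional[float]]]  # None marks a trial rejected as an artifact


def usable_counts(trials: Trials) -> Dict[str, int]:
    """Count the usable (non-rejected) trials in each condition."""
    return {cond: sum(v is not None for v in values) for cond, values in trials.items()}


def meets_guideline(trials: Trials, min_usable: int = MIN_USABLE) -> bool:
    """True if every condition retains at least min_usable usable trials."""
    return all(n >= min_usable for n in usable_counts(trials).values())


# Placeholder values for illustration only (not data from the study).
participant_ok: Trials = {cond: [1.0] * 6 + [None] * 2 for cond in CONDITIONS}
participant_short: Trials = dict(participant_ok)
participant_short["UCue"] = [1.0] * 5 + [None] * 3

print(meets_guideline(participant_ok))     # six usable trials in every condition
print(meets_guideline(participant_short))  # only five usable trials in UCue
```

Keeping the threshold as a parameter makes it straightforward to relax the rule when sample sizes are small, as the text suggests.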
There are also several limitations to the present study that should be noted. First, the two
samples had slightly different NPU-threat tasks (e.g., countdowns vs. geometric shapes for
cues), although the overall recommended NoS for both samples were quite comparable.
Second, the sample size for retest reliability analyses was too small to evaluate whether
retest reliability differed by diagnosis. Third, although analyses were also conducted with
startle responses added in a random order (see Footnote 1), startle responses were only
randomized once for this purpose. Thus, future studies should examine whether results
change as a function of repeated random sampling. Additionally, further studies should
examine whether a similar NoS is necessary to obtain a reliable index of baseline startle
magnitude. However, this study benefited from several strengths including the assessment of
the reliability of startle across two samples, one of which included individuals with
diagnosed internalizing psychopathology. Additionally, the reliability of both startle magnitude
and amplitude was examined, which is important given that both methods of startle
quantification are frequently used in research.
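The repeated random sampling suggested above as a direction for future work could look roughly like the following sketch. It is not the analysis used in this study: it assumes complete data (no missing trials), draws random subsets of trials rather than adding responses in a fixed order, uses the standard formula for Cronbach's alpha, and runs on simulated values generated purely for illustration. All function and variable names are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants x trials table with no missing values."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[j] for r in rows]) for j in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def alpha_by_nos(
    rows: Sequence[Sequence[float]],
    nos_values: Sequence[int],
    n_draws: int = 200,
    seed: int = 0,
) -> Dict[int, Tuple[float, float]]:
    """Mean and SD of alpha across random trial subsets of each size (NoS)."""
    rng = random.Random(seed)
    n_trials = len(rows[0])
    summary = {}
    for nos in nos_values:
        alphas: List[float] = []
        for _ in range(n_draws):
            cols = rng.sample(range(n_trials), nos)  # random subset of trials
            alphas.append(cronbach_alpha([[r[j] for j in cols] for r in rows]))
        summary[nos] = (statistics.mean(alphas), statistics.stdev(alphas))
    return summary


# Simulated data only: a stable participant-level component plus trial-level noise.
sim = random.Random(1)
simulated = []
for _ in range(100):
    trait = sim.gauss(0, 1)
    simulated.append([trait + sim.gauss(0, 1) for _ in range(8)])

result = alpha_by_nos(simulated, nos_values=[2, 4, 6], n_draws=50)
for nos, (mean_alpha, sd_alpha) in result.items():
    print(f"NoS = {nos}: mean alpha = {mean_alpha:.2f} (SD across draws = {sd_alpha:.2f})")
```

The spread of alpha across draws at each NoS gives a sense of how much a single random ordering could differ from the average, which is the concern raised by the third limitation.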
5. Conclusions
Results from the present study provide information that may help researchers obtain
psychometrically sound indices of emotional processing using the eyeblink startle reflex. In
particular, our findings suggest that a minimum of six responses may be sufficient for
obtaining a reliable and stable index of emotion-modulated startle (i.e., anxiety-potentiated
and fear-potentiated startle) during the NPU-threat task in non-clinical and clinical samples.
Although this guideline may apply to other emotion-modulated paradigms, future studies
should test this directly.
Acknowledgments
Funding: This study was supported by grants from the National Institute of Mental Health (S.S., grant numbers R01
MH098093 and R21 MH080689).
References
Abd-El-Fattah SM, Hassan HK. Dependent-Alpha Calculator: Testing the Differences between
Dependent Coefficients Alpha. Journal of Applied Quantitative Methods. 2011; 6:59–61.
Blumenthal TD, Cuthbert BN, Filion DL, Hackley S, Lipp OV, van Boxtel A. Committee report:
Guidelines for human startle eyeblink electromyographic studies. Psychophysiology. 2005; 42(1):1–
15. DOI: 10.1111/j.1469-8986.2005.00271.x [PubMed: 15720576]
Bonett DG. Sample Size Requirements for Testing and Estimating Coefficient Alpha. Journal of
Educational and Behavioral Statistics. 2002; 27(4):335–340. DOI: 10.3102/10769986027004335
Bradford DE, Starr MJ, Shackman AJ, Curtin JJ. Empirically based comparisons of the reliability and
validity of common quantification approaches for eyeblink startle potentiation in humans.
Psychophysiology. 2015; 52(12):1669–1681. DOI: 10.1111/psyp.12545 [PubMed: 26372120]
Bradley MM, Gianaros P, Lang PJ. As time goes by: Stability of affective startle modulation.
Psychophysiology. 1995; 32:21.
Campbell ML, Gorka SM, McGowan SK, Nelson BD, Sarapas C, Katz AC, … Shankman SA. Does
anxiety sensitivity correlate with startle habituation? An examination in two independent samples.
Cognition & Emotion. 2014; 28(1):46–58. DOI: 10.1080/02699931.2013.799062 [PubMed:
23746071]
Carleton RN, Norton MPJ, Asmundson GJ. Fearing the unknown: A short version of the Intolerance of
Uncertainty Scale. Journal of Anxiety Disorders. 2007; 21(1):105–117. DOI: 10.1016/j.janxdis.2006.03.014
[PubMed: 16647833]
Cornwell BR, Johnson L, Berardi L, Grillon C. Anticipation of public speaking in virtual reality
reveals a relationship between trait social anxiety and startle reactivity. Biological Psychiatry. 2006;
59(7):664–666. DOI: 10.1016/j.biopsych.2005.09.015 [PubMed: 16325155]
Craske MG, Rauch SL, Ursano R, Prenoveau J, Pine DS, Zinbarg RE. What is anxiety disorder?
Depression and Anxiety. 2009; 26(12):1066–1085. DOI: 10.1002/da.20633 [PubMed: 19957279]
Cronbach LJ. Test “reliability”: Its meaning and determination. Psychometrika. 1947; 12(1):1–16.
DOI: 10.1007/BF02289289 [PubMed: 20293842]
Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychological Bulletin. 1955; 52(4):
281–302. DOI: 10.1037/h0040957 [PubMed: 13245896]
First, MB., Spitzer, RL., Gibbon, M., Williams, JBW. Structured Clinical Interview for DSM-IV Axis I
disorders (SCID I). New York: Biometric Research Department; 1996.
Foti D, Kotov R, Hajcak G. Psychometric considerations in using error-related brain activity as a
biomarker in psychotic disorders. Journal of Abnormal Psychology. 2013; 122(2):520–531. DOI:
10.1037/a0032618 [PubMed: 23713506]
Gorka SM, Lieberman L, Shankman SA, Phan KL. Startle potentiation to uncertain threat as a
psychophysiological indicator of fear-based psychopathology: An examination across multiple
internalizing disorders. Journal of Abnormal Psychology. 2017; 126(1):8–18. DOI: 10.1037/abn0000233
[PubMed: 27868423]
Gorka SM, Liu H, Sarapas C, Shankman SA. Time course of threat responding in panic disorder and
depression. International Journal of Psychophysiology. 2015; 98(1):87–94.
DOI: 10.1016/j.ijpsycho.2015.07.005 [PubMed: 26168883]
Grillon C, Ameli R. Conditioned inhibition of fear-potentiated startle and skin conductance in humans.
Psychophysiology. 2001; 38(5):807–815. DOI: 10.1017/S0048577201000294 [PubMed:
11577904]
Grillon C, Pine DS, Lissek S, Rabin S, Bonne O, Vythilingam M. Increased anxiety during
anticipation of unpredictable aversive stimuli in posttraumatic stress disorder but not in
generalized anxiety disorder. Biological Psychiatry. 2009; 66(1):47–53.
DOI: 10.1016/j.biopsych.2008.12.028 [PubMed: 19217076]
Hajcak G, Patrick CJ. Situating psychophysiological science within the research domain criteria
(RDoC) framework. International Journal of Psychophysiology. 2015; 98(2):223–226. DOI:
10.1016/j.ijpsycho.2015.11.001 [PubMed: 26546861]
Kaye JT, Bradford DE, Curtin JJ. Psychometric properties of startle and corrugator response in NPU,
affective picture viewing, and resting state tasks. Psychophysiology. 2016;
DOI: 10.1111/psyp.12663
Lang, PJ., Bradley, MM., Cuthbert, BN. Motivated attention: Affect, activation, and action. In: Lang,
PJ., Simons, RF., Balaban, MT., editors. Attention and orienting: Sensory and motivational
processes. Hillsdale NJ: Lawrence Erlbaum Associates; 1997. p. 97-135.
Larson CL, Ruffalo D, Nietert JY, Davidson RJ. Temporal stability of the emotion-modulated startle
response. Psychophysiology. 2000; 37(1):92–101. DOI: 10.1111/1469-8986.3710092 [PubMed:
10705771]
Duits P, Cath DC, Lissek S, Hox JJ, Hamm AO, Engelhard IM, … Baas JM. Updated meta-analysis of
classical fear conditioning in the anxiety disorders. Depression and Anxiety. 2015; 32(4):239–253.
DOI: 10.1002/da.22353 [PubMed: 25703487]
Manber R, Allen JJB, Burton K, Kaszniak AW. Valence-dependent modulation of psychophysiological
measures: Is there consistency across repeated testing? Psychophysiology. 2000; 37(5):683–692.
DOI: 10.1111/1469-8986.3750683 [PubMed: 11037044]
Meyer A, Riesel A, Proudfit GH. Reliability of the ERN across multiple tasks as a function of
increasing errors. Psychophysiology. 2013; 50(12):1220–1225. DOI: 10.1111/psyp.12132
[PubMed: 24730035]
Moran TP, Jendrusina AA, Moser JS. The psychometric properties of the late positive potential during
emotion processing and regulation. Brain Research. 2013; 1516:66–75.
DOI: 10.1016/j.brainres.2013.04.018 [PubMed: 23603408]
Nelson BD, Hajcak G, Shankman SA. Event–related potentials to acoustic startle probes during the
anticipation of predictable and unpredictable threat. Psychophysiology. 2015; 52(7):887–894.
DOI: 10.1111/psyp.12418 [PubMed: 25703182]
Nunnally, JC. Psychometric theory. 2nd ed. New York: McGraw-Hill; 1978.
Santos J. Cronbach’s alpha: A tool for assessing the reliability of scales. Journal of Extension. 1999;
37(2):34–36.
Sarapas C, Weinberg A, Langenecker SA, Shankman SA. Relationships among attention networks and
physiological responding to threat. Brain and Cognition. 2017; 111:63–72. [PubMed: 27816781]
Schmitz A, Grillon C. Assessing fear and anxiety in humans using the threat of predictable and
unpredictable aversive events (the NPU-threat test). Nature Protocols. 2012; 7(3):527–532. DOI:
10.1038/nprot.2012.001 [PubMed: 22362158]
Schwartz SJ, Lilienfeld SO, Meca A, Sauvigné KC. The role of neuroscience within psychology: A
call for inclusiveness over exclusiveness. American Psychologist. 2016; 71(1):52–70. DOI:
10.1037/a0039678 [PubMed: 26766765]
Shankman SA, Gorka SM. Psychopathology research in the RDoC era: Unanswered questions and the
importance of the psychophysiological unit of analysis. International Journal of Psychophysiology.
2015; 98(2):330–337. [PubMed: 25578646]
Shankman SA, Nelson BD, Sarapas C, Robison-Andrew E, Campbell ML, Altman SE, … Gorka SM.
A psychophysiological investigation of threat and reward sensitivity in individuals with panic
disorder and/or major depressive disorder. Journal of Abnormal Psychology. 2013; 122(2):322–
338. DOI: 10.1037/a0030747 [PubMed: 23148783]
Tavakol M, Dennick R. Making sense of Cronbach’s alpha. International Journal of Medical
Education. 2011; 2:53–55. DOI: 10.5116/ijme.4dfb.8dfd [PubMed: 28029643]
Tomarken AJ. A psychometric perspective on psychophysiological measures. Psychological
Assessment. 1995; 7(3):387–395. DOI: 10.1037/1040-3590.7.3.387
Vrana SR, Spence EL, Lang PJ. The startle probe response: A new measure of emotion? Journal of
Abnormal Psychology. 1988; 97(4):487–491. DOI: 10.1037/0021-843X.97.4.487 [PubMed:
3204235]
Watson D, O’Hara MW, Simms LJ, Kotov R, Chmielewski M, McDade-Montez EA, … Stuart S.
Development and validation of the Inventory of Depression and Anxiety Symptoms (IDAS).
Psychological Assessment. 2007; 19(3):253. DOI: 10.1037/1040-3590.19.3.253 [PubMed:
17845118]
Highlights
• An investigation of the number of startle responses needed for a reliable signal
• Six startle responses were needed for acceptable reliability
• This number did not differ for those with internalizing psychopathology
• This guideline can inform task development and artifact rejection procedures
Figure 1.
Note. Internal consistency, as indexed by Cronbach’s alpha, of startle magnitude as a
function of number of responses during each condition of the NPU-threat task in the (A)
non-clinical and (C) clinical sample (across all diagnostic groups). Split-level correlations
as a function of responses for potentiation scores in the (B) non-clinical and (D) clinical
sample (across all diagnostic groups). Error bars represent a 95% confidence interval.
Figure 2.
Note. Internal consistency, as indexed by Cronbach’s alpha, of startle magnitude as a
function of number of responses during each condition of the NPU-threat task in the clinical
sample among individuals with (A) no history of psychopathology, (B) MDD-only, (C) PD-only,
and (D) comorbid PD/MDD. Error bars represent a 95% confidence interval.
Figure 3.
Note. Retest reliability in the clinical sample, as indexed by Pearson’s r, of average startle
magnitude during (A) each condition of the NPU-threat task, as well as (B) startle
magnitude potentiation to predictable and unpredictable threats (PCue - NCue, UCue - NCue,
UISI - NISI). Error bars represent a 95% confidence interval.
Table 1
Sample Demographics and Clinical Characteristics

Characteristic             Clinical Sample    Non-Clinical Sample
Age                        32.93 (12.31)      19.02 (1.38)
Gender (% Female)          64.40              76.1
Ethnicity (% Caucasian)    46.30              35.9
IDAS-Dysphoria             22.26 (10.61)      21.74 (81.90)
IDAS-Panic                 11.93 (5.34)       11.78 (4.00)
IUS-12                     28.22 (10.09)      27.74 (8.67)

Note. IDAS = Inventory of Depression and Anxiety Symptoms (Watson et al., 2007); IUS-12 = Intolerance of Uncertainty Scale (Carleton, Norton
& Asmundson, 2007)
Table 2
Sample size at each NoS for the Non-Clinical Sample’s Cronbach’s Alpha Analyses of Magnitude Condition Averages

NoS    NISI    NCue    PISI    PCue    UISI    UCue
2      80      81      72      80      80      83
3      77      73      66      72      74      79
4      74      71      60      68      69      76
5      72      66      56      67      66      73
6      68      64      53      64      63      67
7      66      61      50      61      63      63
8      64      58      50      60      58      61

Note. NoS = Number of startle responses; N = No shock condition; P = Predictable shock condition; U = Unpredictable shock condition; ISI = Inter-stimulus interval
Table 3
Sample size at each NoS for the Clinical Sample’s Cronbach’s Alpha Analyses of Magnitude Condition Averages

NoS    NISI    NCue    PISI    PCue    UISI    UCue
2      178     172     187     193     187     187
3      169     159     183     188     175     166
4      152     139     173     180     168     156
5      135     128     156     166     159     151
6      130     119     147     155     145     143
7      122     109     142     147     139     137
8      116     105     135     141     133     134
9      110     102     129     136     129     126
10     102     99      122     130     126     123
11     100     94      118     119     115     118
12     97      88      111     109     108     112

Note. NoS = Number of startle responses; N = No shock condition; P = Predictable shock condition; U = Unpredictable shock condition; ISI = Inter-stimulus interval