Reliability and Validity of a Structured Interview Guide for the Hamilton Anxiety Rating Scale (SIGH-A)

DEPRESSION AND ANXIETY 13:166–178 (2001)
M. Katherine Shear, M.D.,1* Joni Vander Bilt, M.P.H.,1 Paola Rucci, D. Stat.,1 Jean Endicott, Ph.D.,2
Bruce Lydiard, M.D.,3 Michael W. Otto, Ph.D.,4 Mark H. Pollack, M.D.,4 Linda Chandler, Ph.D.,5
Jenna Williams, B.S.,1 Arjumand Ali, and David M. Frank1
The Hamilton Anxiety Rating Scale, a widely used clinical interview assessment tool, lacks instructions for administration and clear anchor points for the assignment of severity ratings. We developed a Structured Interview Guide for the Hamilton Anxiety Scale (SIGH-A) and report on a study comparing this version to the traditional form of this scale. Experienced interviewers from three Anxiety Disorders research sites conducted videotaped interviews using both traditional and structured instruments in 89 participants. A subset of the tapes was co-rated by all raters. Participants completed self-report symptom questionnaires. We observed high inter-rater and test-retest reliability using both formats. The structured format produced similar but consistently higher (+4.2) scores. Correlation with a self-report measure of overall anxiety was also high and virtually identical for the two versions. We conclude that in settings where extensive training is not practical, the structured scale is an acceptable alternative to the traditional Hamilton Anxiety instrument. Depression and Anxiety 13:166–178, 2001. © 2001 Wiley-Liss, Inc.
Key words: anxiety disorders; generalized anxiety disorders; assessment instrument; outcomes
1 University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
2 Columbia University, New York, New York
3 Medical University of South Carolina, Charleston, South Carolina
4 Massachusetts General Hospital, Boston, Massachusetts
5 Pfizer Incorporated, New York, New York

Contract grant sponsor: Pfizer Company; Contract grant sponsor: National Institute of Mental Health; Contract grant numbers: MH-53817, MH30915

*Correspondence to: Dr. M. Katherine Shear, M.D., Western Psychiatric Institute and Clinic, 3811 O’Hara Street, Pittsburgh, Pennsylvania, 15213. E-mail: shearmk@msx.upmc.edu

Received for publication 11 May 2000; Accepted 12 October 2000

INTRODUCTION

The Hamilton Anxiety Rating Scale [HAM-A or HARS: Hamilton, 1959, 1969] is a 14-item clinical interview measure of somatic and psychic anxiety symptoms. This scale was one of the first attempts to measure the clinical status of patients diagnosed with “neurotic anxiety states” quantitatively and has become one of the most widely used symptom rating scales in the world. Although the scale assesses a broad range of symptoms that are common to all eight of the DSM IV Anxiety Disorders, it is most often used to assess severity of Generalized Anxiety Disorder (GAD). The Hamilton Anxiety Rating Scale serves as the main outcome measure in most treatment studies of this disorder. However, in its original form, the scale has no established reliability aids, such as instructions for administration or for scoring, and there are no scripted questions to guide the interviewers who administer this scale. Without such guidelines, the method of administering each item and assigning the level of symptom severity can be quite arbitrary. These deficiencies could result in inconsistent use of this scale, which could increase the variability of treatment outcome ratings and decrease the accuracy of cross-site or cross-rater comparisons [Bruss et al., 1994]. The industry sponsor of this study was concerned that such variance in outcome ratings might have contributed to difficulty detecting true drug-placebo differences in several large and expensive multicenter clinical studies of GAD. To address this problem, we developed a modified form of the instrument that includes instructions for standard administration and better-specified anchors for assigning severity ratings.

The structured interview approach was motivated in part by the successful development of a structured interview guide for the Hamilton Depression Rating Scale [SIGH-D; Williams, 1988]. The SIGH-A was designed to improve the ease and consistency of use by employing specific instructions for administration, structured questions for each item, and operationalized criteria for scoring. The instructions include suggestions for probing and handling “boundary” problems in rating severity, as well as for establishing a uniform time frame during which the symptoms are rated (Fig. 1). Instructions to rate items on a boundary by using the higher rating may provide more sensitivity at lower levels of symptoms and might be useful to avoid “floor effects.” The purpose of this paper is to report results of a study designed to determine the reliability of the structured interview compared to the traditional form of the Hamilton Anxiety Rating Scale and to document the correlation between the two forms, when administered to outpatients with different anxiety disorders.

Figure 1. SIGH-A: Structured interview for the Hamilton Anxiety Rating Scale.

METHODS

The study sample included individuals who sought treatment for an anxiety disorder at one of three sites (Western Psychiatric Institute and Clinic, Massachusetts General Hospital, and the Medical University of South Carolina) between April 1, 1997 and February 28, 1998. Eligible patients signed informed consent as approved by the Institutional Review Boards associated with the three study sites. All interviews were videotaped for the purpose of co-rating. In addition to completing structured interviews, all patients completed self-report questionnaires assessing anxiety-related symptoms. Participants underwent two interviews on each of 2 days and were compensated $50 for each day.

Study participants were consenting patients, aged 18 years or older, who met criteria for a DSM IV Anxiety Disorder of at least 6 months in duration, as determined by trained raters using the Structured Clinical Interview for DSM-IV [SCID-P with Psychotic Screen; Spitzer et al., 1995]. Participants were excluded from the study if they had a primary diagnosis of Major Depressive Disorder, Panic Disorder without comorbid Generalized Anxiety Disorder, or psychotic disorder; if they met criteria for any psychoactive substance abuse or dependence within the past 6 months; or if they were judged unable to participate in the interviews reliably because of a chaotic lifestyle, practical problems, or other characteristics judged by the coordinator to be likely to interfere with providing accurate or complete data.

Participants underwent 2 days of testing within a 7-day period. On each day all participants completed both the traditional Hamilton Anxiety Rating Scale and the structured interview form of the Hamilton scale, with the order of these scales randomly assigned and counterbalanced across the two testing days. On day 2 the scales were administered in the opposite order from day 1. In addition, all participants completed the Beck Anxiety Inventory (BAI). Patients completed the Patient Global Improvement (PGI) on day 2 to ensure that there was no substantial change in clinical state. The mean PGI score on day 2 was 3.73 (SD 1.02) (3 = minimally improved; 4 = unchanged).

Raters who participated in this study were experienced research interviewers. Similar to standard procedures in pharmaceutical trials, they received a brief introduction and instructions for using each scale but did not undergo formal training and certification procedures. There was no cross-site training. In order to avoid crossover effects, e.g., inadvertent use of the guidelines from the structured interview, raters at each site administered either the structured SIGH-A only or the unstructured HAM-A version only. A different rater administered each scale on day 1 and day 2, thus providing a more stringent test of inter-rater reliability than co-rated videotapes. All interviews were videotaped and sent to Western Psychiatric Institute and Clinic. Two raters at each site, along with two additional raters from New York State Psychiatric Institute, carried out co-ratings of 32 videotaped interviews for the structured interview form of the Hamilton Scale and the traditional Hamilton Anxiety Rating Scale. The 32 tapes were selected at random, 16 from the first half of the sample and 16 from the second half, with an even distribution of the range of Clinical Global Impression severity scores across sites.

INSTRUMENTS

HAMILTON ANXIETY RATING SCALE [HARS; HAMILTON, 1959]

This instrument was developed to assess and quantify symptom severity among patients with anxiety neurosis. Inter-rater reliability has been reported as an Intraclass Correlation Coefficient of 0.74–0.96 [Bruss et al., 1994].

BECK ANXIETY INVENTORY [BAI; BECK ET AL., 1988]

This instrument is a 21-item, self-report questionnaire designed to assess and evaluate the frequency of anxiety symptoms over a one-week period. This test assesses two factors: cognitive and somatic symptoms. The instrument has good internal consistency (α = 0.92), test-retest reliability (r = 0.75; df = 81, P < .001), and convergent and discriminant validity.

RESULTS

DEMOGRAPHIC AND CLINICAL CHARACTERISTICS

Eighty-nine adults participated in the study, including 30 at Massachusetts General Hospital, 30 at the Medical University of South Carolina, and 29 at Western Psychiatric Institute and Clinic. Sixty percent
were female. Participants' mean age was 37.1 years (SD 10.9) with a range of 19 to 68 years. Eighty-eight percent were European-American, 6% African-American, 4% Hispanic, and 2% other. Thirty-eight (43%) were married. Three participants (3%) had no high school diploma or G.E.D., for 30 participants (34%) the highest degree was a high school diploma or G.E.D., and 56 (63%) had completed some college, graduate school, or vocational/technical/business school following high school. The mean years of education was 14.4 (SD = 3.4). Sixty-three participants (71%) were either employed or were students and 26 (29%) were unemployed.

Primary DSM IV diagnoses included Generalized Anxiety Disorder (n = 46, 52%), Social Phobia (n = 20, 22%), Panic Disorder (n = 13, 15%), and Obsessive Compulsive Disorder (n = 10, 11%). Considering any current diagnosis, 74% met criteria for Generalized Anxiety Disorder (n = 66), 33% Social Phobia (n = 29), 24% Panic Disorder (n = 21), 14% Major Depression (n = 12), 14% Obsessive Compulsive Disorder (n = 12), 10% Dysthymic Disorder (n = 9), and 5% Posttraumatic Stress Disorder (n = 4).

RELIABILITY

Test-retest reliability was assessed with the Intraclass Correlation Coefficient (ICC) [Bartko and Carpenter, 1966] for the total score on each instrument, assigned by raters on day 1 and day 2. Using a two-way random effects model, the ICC for the HAM-A was 0.86 (95% CI 0.78–0.91). The corresponding ICC for the SIGH-A day 1 to day 2 ratings was 0.89 (95% CI 0.83–0.93).
Inter-rater reliability was assessed using the ICC for the co-ratings of the 32 selected videotaped interviews. The ICC was 0.98 (95% CI 0.97–0.99) for the traditional scale and 0.99 (95% CI 0.98–0.99) for the SIGH-A. Item correlations for the traditional Hamilton Anxiety Rating Scale ranged from 0.94 to 0.98, except for item 14 (observational rating of behavior), where the ICC was 0.76. Item-total correlations for the structured interview guide ranged from 0.91 to 0.99, except for item 14, where the correlation was 0.81.
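
For readers who want to see how an ICC of this kind is obtained, the following is a minimal sketch, not the analysis code used in this study. The text specifies only a two-way random effects model; the sketch assumes the single-rating, absolute-agreement form, often written ICC(2,1), and the function name `icc_2_1` and the small ratings matrix are illustrative and are not study data. Rows are subjects and columns are raters (or, for test-retest data, the day 1 and day 2 ratings).

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    `ratings` is a list of rows, one row per subject and one column per
    rater (or per occasion, for test-retest data). This form of the ICC
    is an assumption; the paper does not state which form was used.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters or occasions
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings only: 6 subjects rated by 4 raters.
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example_ratings), 2))  # 0.29
```

Because this form penalizes systematic differences between raters as well as random disagreement, a rater who scores uniformly higher than the others lowers the coefficient even when the rank ordering of subjects is preserved.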
VALIDITY OF SIGH-A AS A MEASURE OF HAMILTON ANXIETY SCORE

The total score obtained using the structured interview format correlated highly with the total score of the traditional format on both day one (r = 0.77, P < .01) and day two (r = 0.75, P < .01). We also examined the relationship between each of the two versions of the Hamilton Scale and scores on the Beck Anxiety Inventory, a different way of obtaining an overall measure of somatic and cognitive anxiety. The correlation with the Beck Anxiety Inventory was essentially the same for the two forms (0.53 for the traditional Hamilton scale and 0.57 for the SIGH-A). This finding provides further confirmation of the convergent validity of the two forms of the instrument.

RELATIONSHIP BETWEEN THE HAM-A AND SIGH-A

The structured interview guide was associated with consistently higher scores, such that the mean score for this version was 24.6 (SD 9.1), while the mean for the traditional format was 20.5 (SD 8.4) on day 1 (paired-sample t = 6.4, df = 88, P < 0.001) and 23.5 (SD 9.0) and 19.2 (SD 7.6) on day 2 (paired-sample t = 6.8, df = 88, P < 0.001). Considering individual items, correlation between the two forms ranged from 0.33 for item 14 (behavior at interview) to 0.78 for item 5 (trouble concentrating or remembering). In all cases, on both days, the mean item score from the structured interview was higher than for the same item on the traditional scale. The magnitude of this difference ranged from a mean of 0.02 (items 8, 9, and 10) to 0.57 (item 4). The mean difference between the scales was 4.1 for day 1 and 4.3 for day 2. The relationship between the two forms, as obtained by linear regression for day 1, was SIGH-A score = 7.547 + 0.832 × HAM-A score (Fig. 2). These differences are most likely a result of explicit instructions on the structured interview form, which describe how to rate severity and indicate how to resolve boundary differences, as well as the systematic inquiry made for each item. For consistency, these instructions tell the rater to score questionable cases at the higher level.

Figure 2. Relationship between SIGH-A and HAM-A total score.

RELIABILITY OF THE HAM-A AND SIGH-A IN PATIENTS WITH AND WITHOUT CURRENT GAD

Since the HAM-A is the most frequently used outcome measure in patients with Generalized Anxiety Disorder, we examined reliability and validity in patients with (n = 66) and without (n = 23) current GAD. Table 1 compares the reliability results. On the more stringent test of inter-rater reliability (different raters on day 1 and day 2), the HAM-A performs significantly worse in subjects with GAD than in subjects with other anxiety disorders (t-test on Fisher’s z transformation = –2.6; P < 0.05), whereas this is not true of the SIGH-A.

TABLE 1. Reliability of SIGH-A and HAM-A Scales in Patients With and Without Current GAD

Measure                                    Total sample                Current GAD                 No current GAD
ICC (test-retest) for HAM-A                .86 (.78–.91)               .79 (.66–.87)               .94 (.85–.97)
ICC (test-retest) for SIGH-A               .89 (.83–.93)               .88 (.80–.92)               .93 (.84–.97)
ICC (inter-rater reliability) for HAM-A    .98 (.97–.99)               .98                         .98
ICC (inter-rater reliability) for SIGH-A   .99 (.98–.99)               .98                         .99
HAM-A/SIGH-A correlation                   .77 (day 1), .75 (day 2)    .70 (day 1), .72 (day 2)    .89 (day 1), .84 (day 2)
Internal consistency (alpha) for SIGH-A    .82                         .79                         .88
Internal consistency (alpha) for HAM-A     .85                         .82                         .92
Mean HAM-A total score, day 1 (SD)         20.58 (8.48)                20.85 (7.47)                19.80 (11.05)
Mean SIGH-A total score, day 1 (SD)        24.62 (9.09)                24.67 (8.68)                24.48 (10.38)

DISCUSSION

The Hamilton Anxiety Rating Scale is an extensively used assessment instrument, which was developed for rating severity of anxiety symptoms prior to the development of reliable diagnostic criteria for different Anxiety Disorders. In some studies it has shown sensitivity to change and may be useful as an outcome measure in clinical settings. The HAM-A is the primary outcome measure most often used in treatment studies of Generalized Anxiety Disorder, and it is also used to rate severity of anxiety symptoms in other disorders. The lack of instructions for administration and the absence of clear anchor points for severity ratings mean training is somewhat difficult and decisions for both administration and scoring can be idiosyncratic.

We developed a structured interview to provide an explicit guide for the use of the Hamilton scale. The study reported here documents good reliability of the SIGH-A, though the traditional scale also performed well. Of some interest, there was significantly lower rater reliability across 2 days on the unstructured HAM-A in GAD patients compared to non-GAD patients. Scores on the two instruments were highly correlated in this study, with a uniform and reliable difference between them. One possible reason that the SIGH-A has yielded slightly higher scores on average is that, unlike the traditional scale, the SIGH-A instructs clinicians to probe subject responses for frequency, distress, and interference before making ratings. Furthermore, these ratings are based on distinct severity scale anchor points. Because the HAM-A does not instruct clinicians to probe subject responses before making ratings, raters may obtain less information from subjects, which may contribute to lower HAM-A total scores. A second possibility is that the raters, despite not having formal training on the SIGH-A instrument, may have been aware of the hypothesis of the study. These rater expectancies could have influenced the ratings in a consistent direction.

Our study is limited by the fact that this was a
sample of subjects who presented to our university-based clinics. The subjects we recruited were similar to those who come to our settings for treatment of anxiety and depressive disorders in having a range of diagnoses and severity. Moreover, collection of data from three sites improves generalizability. However, we cannot be certain if results would be similar for patients presenting to other clinical settings or for those who do not seek treatment. In addition, all raters in this study had prior experience as research raters; we do not know whether untrained raters would achieve similar levels of reliability on either instrument.

We conclude that either form of the Hamilton scale can be used with confidence by trained research raters. The main advantage of the traditional format is that it has been used for many years. However, the advantage of the structured interview is that it provides instructions to assist in training and increases consistency of administration and scoring, which may also generate more appropriate cross-site comparisons and reduce the variability of treatment outcome ratings. The fact that raters in this study were all experienced research assessors may have contributed to the lack of significant differences between the two instruments. Providing clear instructions may be especially useful when raters are inexperienced and extensive training is impractical. A study comparing ratings in such a situation would be of interest.

ACKNOWLEDGMENTS

The authors acknowledge the contributions of assessors at Columbia University: Richard Blumenthal, PhD; Miriam Gibbon, MSW; Hillary Glick, PhD; and Kristin Trautman, MSW; at Massachusetts General Hospital: Steve Safren, PhD; Isabel Scarinci, PhD; Naomi Simon, MD; and Sabine Wilhelm, PhD; at the Medical University of South Carolina: Sarah Book, MD; Marsha Crawford, RN,C; Naresh Emmanuel, MD; Michael Johnson, MD; Rebecca Kapp, RN; and Alex Morton, PharmD; and at the University of Pittsburgh: Ulrike Feske, PhD; Briggett Ford, PhD; Carolyn Hughes, LSW; Mark Jones, LSW; Carl Lejuez, MS; Mary McShea, M.Ed.; and Pamela Stimac, LSW. This project was supported in part by a grant to Dr. Shear from the Pfizer Company and in part by the National Institute of Mental Health grants MH-53817 and MH30915.

REFERENCES

Bartko JJ, Carpenter WT. 1966. The Intraclass Correlation Coefficient as a measure of reliability. Psychol Rep 19:3–11.
Beck AT, Epstein N, Brown G, Steer RA. 1988. An inventory for measuring clinical anxiety: psychometric properties. J Consult Clin Psychol 56:893–897.
Bruss GS, Gruenberg AM, Goldstein RD, Barber JP. 1994. Hamilton anxiety rating scale interview guide: joint interview and test-retest methods for interrater reliability. Psychiatry Res 53:191–202.
Hamilton M. 1959. The assessment of anxiety states by rating. Br J Med Psychol 32:50–55.
Hamilton M. 1969. Diagnosis and rating of anxiety. Br J Psychiatry Special Pub 3:76–79.
Williams JB. 1988. A structured interview guide for the Hamilton Depression Rating Scale. Arch Gen Psychiatry 45:742–747.
