Positive and Negative Syndrome Scale (PANSS) Training: Challenges, Solutions, and Future Directions
Innov Clin Neurosci. 2017;14(11–12):77–81
ABSTRACT: Accuracy and consistency of ratings are critical to ensuring reliability of study measures and sensitivity to changes in the course of a clinical trial. The Positive and Negative Syndrome Scale (PANSS) has been widely used in clinical trials of schizophrenia and other disorders and is considered the “gold standard” for assessment of antipsychotic treatment efficacy. The various features associated with training and calibration of this scale are complex, reflecting the intricacy and heterogeneity of the disorders that the PANSS is used to evaluate. In this article, the authors review the methods for ensuring reliability of the PANSS as well as a proposed trajectory for its use in the future. An overview of the current principles, implementation, technologies, and strategies for the best use of the PANSS; tips for how to achieve consistency among raters; and optimal training practices for this instrument are presented.

KEYWORDS: Positive and Negative Syndrome Scale, PANSS, rater, rater training, technology, clinical trials
The Positive and Negative Syndrome Scale (PANSS) has several complex features and requires a thorough and structured approach to rater training.1 Compared with rating scales developed for other disorders, the PANSS has many items; evaluates a multidimensional array of symptoms (e.g., positive, negative, neuromotor, depressive); and involves the use of data from patient reports, caregiver reports, and clinical observations. Consequently, the PANSS takes more time in training and requires more time to master than many other instruments.

Rater training helps to ensure reliability in measurement throughout the course of a clinical trial. Precision in the use of a rating scale is important primarily because statistical power to detect differences between treatment groups increases proportionally to inter-rater reliability. A related secondary objective is to ensure that when scale items or subscale score thresholds are incorporated as inclusion criteria, all raters in a study can reliably classify subjects. Rater training further enhances precision by standardizing interview procedures and codifying the principles of use for a given scale across raters, sites, and regions.1
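One standard psychometric way to make the reliability–power relationship above concrete (our framing via the classical attenuation result, not a formula from the sources cited here): if $r_{xx}$ denotes inter-rater reliability, the observed between-group effect size and the sample size needed to detect it behave as

$$d_{\mathrm{obs}} = d_{\mathrm{true}}\sqrt{r_{xx}}, \qquad N_{\mathrm{required}} \propto \frac{1}{d_{\mathrm{obs}}^{2}} = \frac{1}{r_{xx}\, d_{\mathrm{true}}^{2}},$$

so power at a fixed sample size is driven by $d_{\mathrm{obs}}^{2}N$, which grows linearly with $r_{xx}$, and a drop in reliability must be paid for with a proportional increase in sample size.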
As described in the original 1987 publication,2 each PANSS item contains three elements that must be used correctly in order to ensure that reliability and validity are maintained:1 1) the item definition describes the construct under evaluation; 2) each item contains a detailed description of the basis for rating that indicates the sources of information intended to be used for that item (these sources include observations made during the interview, the patient’s verbal report, and/or corroborative information obtained from caregivers about symptoms and behaviors during the reference period prior to the assessment); and 3) each item includes a set of carefully written anchors for each level of severity, from 1 (absent) to 7 (extreme).
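To make the three-element structure concrete, here is a minimal illustrative sketch; the field names are ours, not terminology from the scale itself:

```python
from dataclasses import dataclass

@dataclass
class PanssItem:
    """Illustrative container for the three required elements of a PANSS item."""
    code: str                # item code, e.g., "P1" or "N1"
    definition: str          # element 1: the construct under evaluation
    basis_for_rating: str    # element 2: permitted sources of information
    anchors: dict[int, str]  # element 3: severity anchors, keyed by level

    def __post_init__(self) -> None:
        # Every item carries exactly seven anchors, 1 (absent) to 7 (extreme).
        if sorted(self.anchors) != list(range(1, 8)):
            raise ValueError("anchors must cover severity levels 1 through 7")
```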
CORE PRINCIPLES IN THE USE OF THE PANSS
Several approaches to the use of the PANSS might help raters and those leading training programs to achieve a high degree of reliability. Four core principles are taken from publications and lectures given by Dr. Lewis Opler over the course of many years. We summarize them briefly here to provide guidance to individual raters and to those implementing training programs to improve reliability.

First principle—Read each item definition and all anchor points carefully and interpret each element as literally as possible. The process of rating PANSS items requires a very close reading of each required element. The item definition needs to be considered first to determine whether the item is applicable. If not, a score of 1 (absent) should be assigned. Any evidence suggesting the item is present should prompt a score of 2 (minimal) or higher. Particularly when determining the highest score that applies (see below), efforts should be made not to reinterpret the wording, and “impressionistic” scoring should be avoided. Terms involving “and” or “and/or” should be attended to closely so as to ensure that all necessary elements are present before assigning or eliminating a score from consideration.
Second principle—Always give the highest rating that applies. Very often, raters are faced with ambiguity. It might be that the answers to queries are unclear or that the available information suggests that more than one score may be applicable. A simple solution—and a “convention” frequently applied for other instruments—is to “rate up” when more than one score might be applicable. For the PANSS, a somewhat different approach is mandated: instead of arbitrarily selecting a score, raters should always give the highest score that applies based on the available information. For example, if a patient clearly meets the criteria for a score of 3 (mild) and also for 4 (moderate) on any item, then as long as all the necessary criteria for both levels are met, the patient should receive a score of 4 (moderate). In the same vein, if a patient almost meets the criteria for a score of 4 (moderate) but is clearly missing some key component, then a score of 4 (moderate) cannot be assigned.
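Expressed as a minimal sketch (our illustrative reading of the convention, not an official scoring algorithm), the rule reduces to taking the maximum over the anchor levels whose criteria are fully met:

```python
def score_item(levels_met: set[int]) -> int:
    """Return an item score following the second core principle: give the
    highest rating whose anchor criteria are fully met. `levels_met` holds
    every level (2-7) whose written criteria the available information
    clearly satisfies; an empty set means the item is absent and scores 1."""
    return max(levels_met, default=1)
```

For the example above, criteria fully met through level 4 give `score_item({2, 3, 4}) == 4`; if level 4 is only almost met, it stays out of the set and the score remains 3.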
Third principle—Always consider the reference period and time frame. Some patients are not always clear about the time frame under examination during an assessment. Typically, the PANSS is rated based on a “past week” reference period (i.e., the ratings are based on the most severe phenomenon for a given item in the past week). It is worth noting, however, that certain items rated solely on nonverbal symptoms during the interview, such as Item N1 (blunted affect), will be rated based on the presentation the rater can observe during the interview. Patients might describe a wide range of experiences during the course of an assessment—including some that occurred more than one week ago. While that might reveal beliefs or ideation that is, effectively, still present, many time-delimited phenomena might not be impacted. For example, Item P7 (hostility) would not be directly impacted by a fight that the patient had four weeks ago when using the standard past-week reference period.
Fourth principle—Use all available information for rating, as long as it meets the basis for rating. Instruments developed for other disorders sometimes assume a linear progression, with discrete sections compartmentalized by scale item. While the Structured Clinical Interview-PANSS (SCI-PANSS) does have some relatively discrete components, information relevant to rating different items may be presented at any time, possibly well after the section on an item has been completed. Patients might also give conflicting information at different points during an interview, denying a symptom initially and then endorsing it later. While it is difficult to anticipate every combination of presentations or endorsements, raters should avoid assigning item scores during the interview and should instead wait until the assessment is complete and all necessary information (including informant data) is collected. At the conclusion of the assessment, all information that is relevant and meets the basis for rating should be taken into account in the final determination.

Notably, several controversies have arisen over the years with regard to the proper use of the PANSS. While the following items do not comprise an exhaustive list, they highlight some of the challenges that raters should consider and develop techniques and strategies to address.

Is collateral (informant) information required to rate the PANSS? Two items in the PANSS (N4 and G16) are rated solely on the basis of information meant to be gathered from an informant, such as a caregiver or a treating clinician who has had significant contact with the patient during the reference period. It is sometimes challenging to obtain sufficient information to cover all of the required areas, but raters are first instructed to do their best to obtain the necessary information from a third party. In the absence of any available independent person to query, the rater may use records of various sorts in order to gain insight into behaviors during the past week.

Is adherence to the SCI-PANSS necessary, or is a general clinical psychiatric interview sufficient to obtain information for the purpose of rating? Most clinical trials now mandate the use of the SCI-PANSS. Lindström3 and others4 have demonstrated that high reliability can be generated between raters using the SCI-PANSS.1 While the SCI-PANSS could be improved upon—and could be in future iterations—it is necessary to have a standardized approach to assessment across visits, patients, and investigators so as to help improve reliability. Additionally, the SCI-PANSS is designed to help ensure that all necessary domains of inquiry are addressed. It is important, however, to remember that the SCI-PANSS is intended to be used as a semi-structured interview guide rather than a rigidly conducted script. Rewording, rephrasing, and other techniques to help improve patient comprehension can and should be employed when applicable. Additionally, there might be instances in which it is beneficial to change the order of the questions. For example, a disorganized and challenging patient might spontaneously begin talking about hallucinatory experiences. A rater might then determine that it is clinically advisable to take advantage of the opportunity to explore this symptom further rather than attempting to redirect the interview at that point.

Is it necessary to use the anchoring points if the patient is quite severe across an entire domain (e.g., positive symptoms)? Less experienced clinicians and raters are often over-impressed by psychotic symptoms and appear to rely less on the anchor points in these instances. While it is tempting to “save time” by assigning blanket scores impressionistically, such an approach fails to meet the standards for reliable use of the PANSS. Raters are urged to carefully read each item and assign the highest score that applies on the basis of the written anchors.

In cases in which the local definition of an item/concept differs from the one shown in the PANSS rating criteria, may the local alternative be substituted? Different disciplines and fields of study can define common concepts (e.g., delusions) in variable ways. In clinical practice, these approaches might have significant value to the treatment of patients in a local context; for example, if a culturally influenced explanation of a symptom that is acceptable to the patient and his or her family needs to be explored and acknowledged by the treating clinician to facilitate communication and adherence with treatment, then this is of great value to all stakeholders in that context.5 However, within the confines of a clinical trial, particularly one that is multi-site and/or global in nature, the need for standardization across visits, sites, and regions for the purposes of research necessitates that all raters adhere to the common definitions of terms without substitution.
IMPLEMENTATION OF TRAINING
Traditionally, rater training for the PANSS involved raters attending an investigator meeting for each clinical trial, where they would sit classroom style, listen to a slide-based lecture, view videotaped interviews, and rate them through an audience response system. Outlying scores were discussed with the goal of optimizing inter-rater reliability. Certification was based on scoring an agreed-upon percentage of items with fidelity to the “gold standard.” At a mid-study investigator meeting intended to prevent rater drift, raters would review a slide lecture and rate an additional videotaped interview, and they were remediated if their scores fell outside the “gold standard.”6
Limitations of traditional training. Such methodologies were capable of achieving and maintaining high levels of reliability and have effectively remained unchanged since the original Phase III studies of risperidone in the 1990s.7 However, the limitations of this methodology have become apparent: 1) raters working on multiple trials are sometimes subjected to repetitive training that does not take their individual issues in PANSS rating into account; 2) rating a videotaped interview does not address correct assessment technique or the ability to elicit information from a psychotic patient; 3) training should be relevant and individualized to the specific clinical trial; and 4) a rater’s behavior in the laboratory of an investigator meeting does not necessarily reflect the rater’s behavior while rating patients at his or her site.8

Interactive training. PANSS training is rapidly evolving to address the above issues. Increasingly, traditional, passive, classroom-style training is being replaced with interactive, case-oriented methods that require active participation from investigators. For example, in the “roundtable approach,” investigators are organized into small groups, often by site and nationality. Instead of a long, repetitive lecture, there is a short review of the basic principles of rating followed by case discussions. Within each group, raters come to a consensus with their colleagues from their sites and countries. This synchronizes rating methodology within a site and prevents “noise in the ratings” when raters cross-cover for each other. The session is moderated by an appropriately qualified trainer who is capable of synthesizing the various points of view and who is tasked with ensuring compliance with core principles and gold-standard approaches. There are many variations on this methodology, but they share the concepts of active participation and consensus-building to replace passive listening.

In the past, the centerpiece of training for both beginner and advanced raters was lengthy, item-by-item rating of full, unselected PANSS interviews. The current trend for experienced raters is to teach with shorter vignettes targeting relevant areas of study design, such as the population under study (e.g., acutely psychotic, prominent positive symptoms, predominant negative symptoms, stable, treatment resistant), change from baseline, and difficult-to-rate symptoms.

Assessment technique. Interview skill assessment and feedback have become integral to PANSS training and address the ability of the rater to probe the population under study sufficiently so as to distinguish among the anchor points of each item in a neutral manner unlikely to induce a placebo response. This is most effective when using highly trained live actors who challenge the investigator with scripted foils.

Certification procedures. In the past, certification to administer the PANSS was commonly based on the successful rating of a videotaped PANSS interview. However, this is a passive procedure that fails to assess the investigator’s ability to deliver a thorough and unbiased interview. It is critical to standardize both the interview technique and measurement skills. A newer procedure for certification is to require candidates to successfully interview and measure the symptom severity of highly synchronized actors portraying patients with psychotic disorders. The use of quantified approaches to the evaluation of interview technique has been linked with data quality and signal separation, making this “active” evaluation a more relevant and meaningful approach to certification.1 Videotaped interviews are more commonly used than actors to evaluate assessment technique and scoring, in part because video recording is less resource-intensive than training and synchronizing actors in multiple languages and bringing them to investigator meetings. For the most part, raters with sufficient credentials and experience administering the PANSS to the population under study are certified if they meet certain standards of accuracy and precision in their measurement of the individual PANSS items and the PANSS total, based on both gold standards and statistical outliers. To accelerate the rater approval process, decisions regarding success or failure of the candidate, as well as remediation, may be delivered at the investigator meeting. Like any assay, the measurement of psychotic symptoms must be periodically recalibrated. Intra-rater and inter-rater reliability should be assessed and remediated regularly.
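The certification standard described above (item-level and total-score accuracy against a gold standard) lends itself to a simple automated check. The sketch below is our illustration, with placeholder tolerances rather than values from any actual certification program:

```python
def passes_certification(candidate: dict, gold: dict,
                         item_tolerance: int = 1,
                         total_tolerance: int = 5) -> bool:
    """Hypothetical pass/fail check of a candidate's scores against a
    gold-standard rating. Both dicts map item codes (e.g., "P1") to
    scores 1-7; the tolerances are illustrative placeholders."""
    items_ok = all(abs(candidate[item] - gold[item]) <= item_tolerance
                   for item in gold)
    total_ok = abs(sum(candidate.values()) - sum(gold.values())) <= total_tolerance
    return items_ok and total_ok
```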
IMPACT OF TECHNOLOGY
Technology has provided vibrant, efficient alternatives to expensive, potentially inefficient in-person, multi-country investigator meetings. Initial training, as well as mid-study refresher training, may occur by use of “live” web conferencing, essentially recapitulating the interaction of an investigator meeting, or in an “on-demand” manner, either online or application-based.9 Adaptive and risk-based methods may be applied to individualize PANSS training, to triage a rater to more basic or more advanced curriculums, or to steer the training toward specific areas for improvement.

Avatars can be programmed with decision-tree logic to serve as subjects for interview skills training. Virtual reality may be used to create a realistic assessment environment. All these technologies, and more to come, might transform traditional training and make it more useful, practical, and effective in years to come.
Use of electronic clinical outcome assessment (eCOA). Another means by which newer technologies can bolster PANSS training and data quality is the use of eCOA. Platforms utilizing this methodology can assess ratings for logical inconsistencies among PANSS items and between the PANSS and other scales, and alert the investigator before data submission. The investigator then has the option to reevaluate the rating or to maintain the original scores. eCOA also permits additional alerts and reminders to be delivered to the rater. For example, the PANSS rater may be prompted to include informant information when appropriate or to periodically remind the subject of the reference period. Notes to support the choice of anchor point might be required. This technology was positively received by both patients and caregivers, with minimal modification requests.10
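As an illustration of the kind of logical edit check such platforms can run before data submission (a hypothetical sketch, not any vendor’s actual API), consider the informant-based items N4 and G16 discussed earlier:

```python
def ecoa_informant_alerts(ratings: dict, informant_available: bool) -> list:
    """Hypothetical pre-submission edit check. Items N4 and G16 are rated
    solely from informant-type information, so a non-absent score entered
    without an informant (or record) source triggers an alert."""
    alerts = []
    if not informant_available:
        for item in ("N4", "G16"):
            if ratings.get(item, 1) > 1:
                alerts.append(f"{item} scored above 1 without informant data; "
                              "confirm the source or reevaluate before submission.")
    return alerts
```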
The capacity for audio/video recording of SCI-PANSS interviews can be embedded in the eCOA platform to facilitate deeper independent review of visits, either through an a priori plan (e.g., evaluation of every rater’s first assessment) or via a risk-based approach that uses inconsistencies detected within PANSS data to “flag” an evaluation for review. Early detection and remediation of these data flaws are critical for study success and for preventing “rater drift.”11 Continual evaluation of the quality of a site’s interviews and ratings, with retraining as necessary, should continue throughout all phases of the trial, just as any assay would be repeatedly monitored and recalibrated.
EVALUATION OF NEWER TRAINING AND DATA MONITORING PROCEDURES
There have been a number of solutions to managing rater drift during clinical trials. Remote, independent rating;12 smaller trials with more experienced rater cohorts;13 and a number of in-study techniques that utilize the internal logic of instruments like the PANSS have gained attention in the last decade.3,14,15 The latter technique uses algorithms to generate flags for what is often referred to as a risk-based approach to monitoring in-study data. Algorithms can consist of logical binary or factorial relationships between one or more scale items, or of more sophisticated statistical techniques that leverage large clinical trial datasets with known outcome parameters. For the purposes of this article, we will limit our discussion to the sorts of binary and factorial relationships that exist within the PANSS and how these can be used to generate flags. For example, if a rater scores Item P5 (grandiosity) at the level of 7 and then scores Item P1 (delusions) at the level of 1, this would generate a flag. This is because at the level of 7 on P5 we expect significant and pervasive grandiose delusions, and, if that is the case, then P1 should receive a similarly severe rating. While this is an extreme example (and usually related to a rater’s reluctance to “double rate” the same symptom), it serves to illustrate the essential idea that the instrument’s internal relationships can show us where there is a high risk for error.
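A minimal sketch of the P5/P1 consistency flag just described, assuming ratings are held in a simple dictionary; the “near-floor” threshold for P1 is our illustrative choice:

```python
def grandiosity_delusion_flag(ratings: dict) -> bool:
    """Binary consistency check: P5 (grandiosity) at 7 implies pervasive
    grandiose delusions, so a near-floor P1 (delusions) score is
    implausible and raises a flag for review."""
    return ratings.get("P5", 1) == 7 and ratings.get("P1", 1) <= 2
```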
Another illustration comes in the form of the Marder16 five-factor model for the PANSS (though some dispute this factor solution8); in such frameworks, the expected correlations between items that represent a factor can be used to detect aberrant presentations and potential risk.5,11 For example, the negative factor includes Items N1 to N4, N6, and G7; because we expect these to be predictably correlated (within certain severity ranges), we can identify risk when one or more correlations fail to agree with the identified matrix.
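A sketch of how such a factor-based check might be automated, assuming visit-by-item scores in a NumPy array; the expected correlation matrix and tolerance are placeholders that a real monitoring program would calibrate on historical trial data:

```python
import numpy as np

# Marder negative-factor items referenced above.
NEGATIVE_FACTOR = ["N1", "N2", "N3", "N4", "N6", "G7"]

def factor_correlation_flags(scores: np.ndarray, expected: np.ndarray,
                             tolerance: float = 0.3) -> np.ndarray:
    """Compare observed pairwise correlations among the negative-factor
    items against an expected correlation matrix. `scores` holds one row
    per assessment and one column per item (ordered as NEGATIVE_FACTOR).
    Returns a boolean matrix marking item pairs whose observed correlation
    deviates from expectation by more than the tolerance."""
    observed = np.corrcoef(scores, rowvar=False)  # item-by-item correlations
    return np.abs(observed - expected) > tolerance
```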
How are these risks dealt with? Is it actually rater error that is present, or is it simply a somewhat unusual patient presentation? Intervention methods differ and depend on who is leading the data-monitoring effort, but if actual rater error is responsible, this is the point at which a targeted training event takes place. It must be emphasized here that an expert clinician with a very clear understanding of the scale and the patient population must complete the training. This in-study targeted training is essential in arresting rater drift and reducing the impact of non-informative data (i.e., data that contribute little to the goal of the study but increase variance and thus reduce the ability to detect the signal where it exists). This method has proved cost-effective, and the targeted nature of the intervention requires fewer resources than interval retraining (e.g., training done every 3–6 months) for the full cohort of raters. More importantly, the reduction in non-informative data can make the difference between a failed or negative trial and one that is positive.

Prospective, adequately controlled comparisons of methodologies for rater training or for in-study data quality monitoring coupled with remediation are rare because sponsors are reluctant to vary methodologies within a clinical trial. The comparison of methodologies across trials is complicated by multiple uncontrolled differences in trial characteristics. That said, used in parallel, the methodologies are complementary and can reinforce the four principles critical to obtaining reliable and valid data for the duration of a trial. Although the results must be evaluated carefully, comparisons of inter-rater reliability, nonspecific variance, placebo response, and drug-placebo differences across trials using different methodologies can be informative, if not definitive.15 Newer interview training and scale rule training techniques can be evaluated against in-study metrics based on error rates detected by data analytics, as well as via external expert review of recorded patient interviews. The independent review of patient interviews is highly recommended for all clinical trials. It has been demonstrated that interviews that are recorded and reviewed have PANSS scores that align better with the scale requirements.17
CONCLUSION
PANSS rater training has become a standard component of most clinical trials, but true standardization with respect to exact approaches, techniques, and standards remains elusive. For clinical trials using the PANSS, it is strongly advised that the training program incorporate the core principles described in this article and advocated by the author of the PANSS. Where possible, we further recommend the following: 1) favor active learning techniques over passive ones, particularly for experienced clinicians and raters with meaningful prior experience using the PANSS; while some raters have persistent idiosyncrasies in their approaches to the use of the scale, active approaches will be far more effective in highlighting these issues and enabling retention of new concepts and information; 2) evaluations of inter-rater reliability should include a videotaped interview or evaluation of a standardized subject/volunteer; most optimally, certification will involve an assessment of interview technique as well as inter-rater reliability to ensure that all prospective raters are capable of conducting an evaluation that strikes the proper balance between adherence to the interview guide and maintenance of flexibility and clinical research rapport; and 3) following initial training, quality assurance approaches should include ongoing evaluation of both data and assessment technique. Where possible, technology can and should be used to help facilitate these processes. Whether utilizing eCOA to replace paper with electronic forms or driving “targeted calibration” through an analysis of in-study data, a dynamic approach to ensuring inter-rater reliability will help to guarantee that core principles are applied rigorously throughout the study.

REFERENCES
1. Sajatovic M, Gaur R, Tatsuoka C, et al. Rater training for a multi-site, international clinical trial: what mood symptoms may be most difficult to rate? Psychopharmacol Bull. 2011;44(3):5–14.
2. Kay SR, Fiszbein A, Opler LA. The Positive and Negative Syndrome Scale (PANSS) for schizophrenia. Schizophr Bull. 1987;13(2):261–276.
3. Lindström E, Wieselgren IM, von Knorring L. Interrater reliability of the Structured Clinical Interview for the Positive and Negative Syndrome Scale for schizophrenia. Acta Psychiatr Scand. 1994;89(3):192–195.
4. von Knorring L, Lindström E. Principal components and further possibilities with the PANSS. Acta Psychiatr Scand. 1995;92(s388):5–10.
5. Napo F, Heinz A, Auckenthaler A. Explanatory models and concepts of West African Malian patients with psychotic symptoms. Eur Psychiatry. 2012;27 Suppl 2:S44–S49.
6. Müller MJ, Wetzel H. Improvement of inter-rater reliability of PANSS items and subscales by a standardized rater training. Acta Psychiatr Scand. 1998;98(2):135–139.
7. Leucht S, Kane JM, Kissling W, et al. What does the PANSS mean? Schizophr Res. 2005;79(2–3):231–238.
8. van der Gaag M, Cuijpers A, Hoffman T, et al. The five-factor model of the Positive and Negative Syndrome Scale I: confirmatory factor analysis fails to confirm 25 published five-factor solutions. Schizophr Res. 2006;85(1–3):273–297.
9. Kobak KA, Feiger AD, Lipsitz JD. Interview quality and signal detection in clinical trials. Am J Psychiatry. 2005;162(3):628.
10. Tolley C, Rofail D, Gater A, Lalonde JK. The feasibility of using electronic clinical outcome assessments in people with schizophrenia and their informal caregivers. Patient Relat Outcome Meas. 2015;6:91–101.
11. Kobak K, Opler M, Engelhardt N. PANSS rater training using Internet and videoconference: results from a pilot study. Schizophr Res. 2007;92(1–3):63–67.
12. Shen J, Kobak KA, Zhao Y, et al. Use of remote centralized raters via live 2-way video in a multicenter clinical trial for schizophrenia. J Clin Psychopharmacol. 2008;28(6):691–693.
13. Alphs L, Benedetti F, Fleischhacker W, Kane J. Placebo-related effects in clinical trials in schizophrenia: what is driving this phenomenon and what can be done to minimize it? Int J Neuropsychopharmacol. 2012;15(7):1003–1014.
14. Rabinowitz J, Schooler N, Anderson A, et al. Consistency checks to improve measurement with the Positive and Negative Syndrome Scale (PANSS). Schizophr Res. 2017 Mar 8. pii: S0920-9964(17)30141-X.
15. Daniel D, Kalali A, West M, et al. Data quality monitoring in clinical trials: has it been worth it? An evaluation and prediction of the future by all stakeholders. Innov Clin Neurosci. 2016;13(1–2):27–33.
16. Marder SR, Davis JM, Chouinard G. The effects of risperidone on the five dimensions of schizophrenia derived by factor analysis: combined results of the North American trials. J Clin Psychiatry. 1997;58(12):538–546.
17. Kott A, Daniel DG. Effects of PANSS audio/video recordings on the presence of identical scorings across visits. Eur Neuropsychopharmacol. 2015;25:S543–S544.