Performance-testing tools for assessing public safety and health workers' readiness for work.
By James C. Miller

THIS IS A COMPANION PIECE TO an article that appeared in the January 1995 issue of Ergonomics in Design (Gilliland & Schlegel, 1995). My focus here is on detecting occurrences of unfitness for work on a daily basis at the worksite, particularly in jobs that affect public safety, public health, or other public domain issues.
This discussion is intended to provoke reflection and discussion about the issue of daily fitness-for-duty testing in the workplace.
There is considerable concern over impairment in the workforce and the effect
this impairment has on safety and productivity. When one considers only the
everyday factors that may restrict job performance, such as fatigue and illness, the
overall effect on safety and productivity
clearly becomes significant and merits the
development of countermeasures.
Urine testing is one such countermeasure for drugs, but the collection and laboratory testing of bodily fluids is a costly and time-consuming procedure. For this reason, urine testing is typically used on a random basis. Additionally, the turnaround time is too slow to provide a true fitness-for-duty screening method. Low-cost devices or procedures are needed that quickly, conveniently, and reliably assess employees' fitness for duty and that are legally and socially acceptable.

Types of Fitness-for-Duty Testing
One may take a number of approaches
to test for fitness for duty in the workplace: performance,
neurological,
and
biochemical measurement. The three approaches are probably complementary:
biochemical and neurological measures
assess lifestyle, which is appropriate for
some occupations, and performance measures assess immediate,
job-relevant
issues. Testing relevant to the fitness-for-duty question may be categorized as shown in the table below.
TYPES OF FITNESS-FOR-DUTY TESTING

Biochemical
    Direct (i.e., brain; not used)
    Indirect (e.g., blood, expired air, urine, hair)
Behavioral
    Neurological
        Subjective (e.g., DOT Standardized Field Sobriety Test, or SFST)
        Objective (e.g., EEG evoked response, pupillometry)
    Performance
        Motor
            Abstract (e.g., finger tapping)
            Industrial (e.g., simulated driving)
        Perceptual
            Abstract (e.g., pattern comparison)
            Industrial
        Higher cognitive
            Abstract (e.g., code substitution)
            Industrial
        Mixed
            Abstract
            Industrial
Some investigators who have published
information about the effects of stressors
on performance measures in the performance category have suggested informally
that fatigue or environmental stressors may
impair the higher cognitive functions first,
perception next, and motor functions last.
For example, automated or highly overlearned tasks such as tracking may be more resistant to fatigue than mental tasks such as decision making and other higher cognitive tasks involving short-term memory and attention. Electroencephalographic data indicate that one can steer a vehicle successfully on a straight highway while the brain's cortex is asleep (O'Hanlon & Kelley, 1977).
Measurements of higher cognitive functions may provide sensitive indices for evaluating fitness for duty. In other words, a relatively large proportion of detections of impairment will occur among those who are truly impaired. Conversely, measurements of motor functions may provide specific indices for evaluating fitness for duty. In other words, a relatively large proportion of nondetections of impairment will occur among those who are truly not impaired. Unfortunately, these two kinds of detections tend to be somewhat reciprocal: One is enhanced at the other's expense. As I will show, the latter may be the higher priority for workplace testing.
Performance Tests
Performance tests may be sorted into
several subcategories: motor, perceptual,
and higher cognitive. These labels each refer
to the highest level of "thinking" (cognition) required to perform a given test. For
example, a motor test might simply require
that the hand be connected to the eye by
normally functioning neural circuits. One
reliable motor test measures finger-tapping
speed and coordination. A perceptual test
might require that one accurately compare
patterns, identify colors, or recognize tones
as high, medium, or low pitch. Examples of
higher cognitive tests include (a) simple
mathematical processing, such as adding
three-digit numbers with reasonable speed and
accuracy; (b) code substitution,
whereby
numbers (digits) are substituted rapidly and
accurately for letters (symbols) in a simple
code-solving
process; or (c) short-term
memory tasks in which, for example, a set
of letters (say, four) is briefly presented
(e.g., one second), followed by a series of
letters presented one at a time. The subject's task is to determine if the letters in
both sets match.
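To make the short-term memory example concrete, a minimal sketch of the scoring logic for such a probe task might look like this (written in Python; the letters, set size, and function name are my own illustrative choices, not taken from any particular commercial battery):

    def memory_probe_answers(memory_set, probes):
        """Return the correct yes/no answer for each probe letter:
        True if the probe appeared in the briefly presented memory set."""
        members = set(memory_set.upper())
        return [probe.upper() in members for probe in probes]

    # Hypothetical trial: a four-letter set, then probes presented one at a time
    print(memory_probe_answers("BKRT", ["k", "m", "t", "z"]))  # [True, False, True, False]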
The face validity of performance tests may be abstract, or not similar to workplace tasks. Alternatively, performance tasks may be drawn directly from the workplace, or industrial. For example, a color-naming task may use colors taken from a palette designed to screen for color blindness during a medical examination (abstract), or it may use colored indicator lamps from a real control panel (industrial).
Performance tests may also require or
not require skilled performance. For example, the finger-tapping motor test requires
only one demonstration and little practice
before an individual may be tested. On the
other hand, skilled compensatory tracking
performance may require 30 minutes to two
hours of practice.
Performance tests may use an absolute or a relative pass/fail criterion or a mixture of
the two approaches. An absolute criterion
uses a distribution of data collected from a
representative sample of the population of
interest. The individual's performance on a
given test is then compared with expectations derived from that sample's score distribution. A relative criterion is based solely
on a distribution of scores provided by the
individual.
In the mixed approach, one uses a moving window of recent performance by an
employee to calculate that employee's baseline. The pass level is set at a standard position within the individual's performance
score distribution, such as two standard deviations from the mean. Thus, all employees
have an absolute pass criterion based on a
relative comparison: Their individual performance must fall within a specified portion of their own distribution
of recent
scores.
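A minimal sketch of this mixed criterion might look like the following (in Python; the 30-score window, the two-standard-deviation cutoff, and the assumption that higher scores mean better performance follow the discussion above and are not any vendor's actual algorithm):

    from statistics import mean, stdev

    def pass_level(recent_scores, k=2.0):
        """Pass level set k standard deviations below the mean of the employee's
        own recent scores (assumes higher scores indicate better performance)."""
        return mean(recent_scores) - k * stdev(recent_scores)

    def is_ready_for_duty(todays_score, score_history, window=30, k=2.0):
        """Mixed criterion: compare today's score with a moving window of the
        employee's most recent scores rather than with population norms."""
        baseline = score_history[-window:]  # moving window of recent performance
        return todays_score >= pass_level(baseline, k)

    # Hypothetical employee: 40 prior tracking scores, then today's test
    history = [72, 75, 74, 71, 73, 76, 74, 72, 75, 73] * 4
    print(is_ready_for_duty(todays_score=70, score_history=history))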
With most of the commercial tests described in this article, the employee is tested against his or her own baseline using the
mixed approach. This procedure avoids
some problems associated with the use of
population data, including inappropriate
discriminations
on the bases of age, sex,
computer literacy, intelligence, education,
dexterity, and even test anxiety. Using the
individual as his or her own control also
gives one greater sensitivity and specificity
than do comparisons to a population norm.
For a given effect size, the sensitivity of a
screening test increases with the number of
samples because of the decreasing sampling
variance of the estimate. The effect size is
the average effect of an independent variable on the measure of interest - for example, the average effect of 0.05 blood alcohol
content on tracking performance.
One
aspect of sensitivity is the quotient of effect
size and variability. Sensitivity is enhanced
as the useful data set is increased and is diminished as the useful data set is reduced
when effect size is held constant. Thus, a
desirable test provides sensitivity within a
short test period, but not so short that it is
insensitive. Another way to increase sensitivity is to measure several performance
functions at once and then look for any of
them to fall outside their respective normal
limits.
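The relationship among effect size, variability, and the amount of data can be illustrated with a small calculation (a simplified sketch that assumes independent trials; the numbers are invented):

    import math

    def detectability(effect_size, trial_sd, n_trials):
        """Rough sensitivity index: the average effect divided by the sampling
        variability of the estimate (trial_sd / sqrt(n_trials))."""
        return effect_size / (trial_sd / math.sqrt(n_trials))

    # Invented example: a 0.4-unit performance decrement, trial-to-trial sd of 1.0
    for n in (1, 4, 16):
        print(n, round(detectability(0.4, 1.0, n), 2))
    # With effect size held constant, sensitivity grows as more trials enter the
    # estimate and shrinks as the useful data set is reduced.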
One way to increase specificity is to use
test-retest strategies. For example, Wade
Allen and Henry Jex at Systems Technology, Inc., described the "1-of-n" strategy, which I applied to the Factor 1000 test for Performance Factors, Inc. In the Factor 1000 1-of-n strategy, the test is usually passed in the first or second 20- to 30-second tracking trial. Test failure requires failing every one of eight trials in two four-trial sessions separated by a 10-minute rest
period. This strategy emphasizes the correct identification of unimpaired workers.
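A back-of-the-envelope calculation shows why the 1-of-n rule protects unimpaired workers (assuming, purely for illustration, that trials are independent and that a rested, unimpaired worker still fails an occasional trial by chance):

    def prob_false_failure(p_fail_one_trial, n_trials=8):
        """Probability that an unimpaired worker fails all n trials and is
        therefore wrongly flagged, assuming independent trials."""
        return p_fail_one_trial ** n_trials

    # If an unimpaired worker fails any single 20- to 30-second trial 10% of the
    # time, failing all eight trials is vanishingly unlikely.
    print(prob_false_failure(0.10))  # about 1e-08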
Conducting Testing
Daily performance testing seems best
suited to the immediate determination of occasions when the employee is not ready to start work.
A reliable detection of relevant
psychomotor impairment may be
made at the start of the work period. However, the manager must
remain aware of test limitations
associated with the relationship
between the nature of the test and the
nature of the job.
For example, driving a truck safely requires intact decision-making skills, auditory and visual perception,
and eye-hand
coordination, among other things. If I test
only for eye-hand coordination, I may draw
conclusions based only on test failures, not
on test passes. When the employee fails the
test, he or she discloses that eye-hand coordination, a central nervous system function
essential to the job, is not operating well.
One may conclude that the employee is
likely not to perform safely.
However, when the employee passes the
eye-hand coordination test, he or she discloses only that a necessary, but not sufficient, function is working well. I call this
exclusion testing. A failure is reason to exclude
the employee, at least temporarily, from
safety-sensitive work. But other necessary
functions have not been tested, so one may
not conclude from a test pass that the job
will be performed safely. It is difficult to
identify - much less test for - all the cognitive and neuromuscular functions required
to perform a job well. Thus, performance
testing of the nature discussed here will
allow only detections of potentially unsafe
situations.
Even though it may sharply
reduce on-the-job accident probabilities,
performance testing cannot guarantee safe
job performance.
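The asymmetric logic of exclusion testing can be summarized in a few lines (a schematic sketch of the reasoning above, not any vendor's decision rule; the function name and messages are hypothetical):

    def exclusion_decision(passed_eye_hand_test):
        """A failure excludes the worker from safety-sensitive duty; a pass shows
        only that one necessary function is intact and supports no conclusion
        about overall job safety."""
        if not passed_eye_hand_test:
            return "Exclude from safety-sensitive work, at least temporarily."
        return "Do not exclude; other necessary functions remain untested."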
Worker Acceptance
To date, only scant data have been available that specifically address worker acceptance issues. As Gilliland and Schlegel
(1995) noted, workers are more likely to
accept a screening test that has good face
validity - in other words, if they believe the
test relates closely to their job performance
capabilities.
During validation tests of the NovaScan
(Nova Technology, Inc., or NTI), test participants - employees of a major shipyard in
Norfolk, Virginia - were asked their opinions of NovaScan compared with urinalysis.
According to Bob O'Donnell of NTI, "Sixty-seven percent clearly preferred NovaScan."
My experience with drivers who
have been tested daily at their
worksite before driving suggests
that immediate feedback to management about test results should not be
the ultimate objective of fitness-for-duty testing. Professional drivers use their
own perceptions of the varying effort they
must exert from day to day to pass the test
to give themselves feedback about their fitness. Drivers may modify their drinking
(alcohol) and sleeping behaviors somewhat
to be ready for the test. They may introduce a nap into their schedule. Thus, I suspect that one highly useful application of
fitness-for-duty
testing will be private,
immediate fitness feedback to the professional driver. Most will use it; some will
ignore it.
Typically, several aspects of worker acceptance must be addressed. First, each worker
experiences the natural fear of the unknown
when faced with the prospect of daily testing before driving. With time, the worker
realizes that, given adequate rest and an
absence of depressant or otherwise psychoactive substances, the test can be passed
routinely. The worker therefore progresses
from anxiety to comfort and confidence.
This dimension is labeled comfort.
Second, the applicability of the tests to
the worker's job is of concern. Over time,
workers will perceive correlations or the
absence of correlations between the test
structures and various tasks within their
jobs. For example, in simulated driving,
they will immediately notice the obvious
general similarities and differences between
the simulation and actual driving, and as
they continue to drive and perform the simulation task, they will notice more specific similarities and differences between the two tasks. This probably happens because they become expert psychophysical observers of both tasks and are able to compare the two tasks in a meaningful manner. This dimension is labeled relevance. It is similar to the psychological construct, face validity.
[Image from the ReadyShift I test; see the ReadyShift description below.]
Third, hardware and software failures
may preclude testing on some workdays. No
objective judgment about fitness is available
to management or to the employee on these
days. This dimension is labeled availability,
and it is measured objectively. These incidents may affect workers' confidence in test
reliability, possibly changing their opinions
about the usefulness of testing. They may
reason that if the machine fails often, it is
probably not reliable even when it seems
to be operating correctly (i.e., guilt by association).
Finally, one should provide feedback about
test performance to workers. This should be
no more than a simple, volatile, brief display
on the screen at the end of a test (for example, pass or fail). Using this information and
many presentations of tests across months of
work, workers become able to provide estimates of how well the test results reflect their
fitness for duty. Over time, a worker should
increasingly perceive a test as an accurate
predictor of work performance. This dimension is labeled accuracy.
Management Acceptance
Performance testing introduces some new
problems for managers. The employee who
is unable to pass a performance test may be
impaired because of (a) job-induced fatigue
accumulated from work period to work period, (b) non-job-induced
fatigue associated with the family (e.g., a new baby in the
household) or with outside employment, (c)
circadian rhythms in human performance
abilities, (d) prescription or over-the-counter
medications,
or (e) alcohol, illicit substances, and the like. No longer may managers conclude that the impaired employee
is guilty of an illegal act, as with urine testing. This being true, what should management do with the employee who fails?
What are the patterns of performance test
failure associated with acute and chronic
employee personal problems?
Management acceptance probably depends on at least three factors. First, managers are concerned about administering
the test. Highest on the list of concerns
in this area is the management of test failures. This dimension is labeled workload.
Second, managers are concerned about hardware and software failures, a dimension that
is labeled availability. These incidents may
change managers' opinions about the practicality of testing. Finally, managers expect
to see workers modify their off-duty behaviors somewhat to be ready for the test. They
also expect to see reductions in ancillary
costs resulting from incidents, accidents,
insurance rates, and worker compensation
claims. This dimension is labeled usefulness.
Given this discussion of the nature of
fitness-for-duty
testing and worker and
management acceptance issues, what performance test resources are available at present for application in the workplace?
Behavioral Testing Resources
Consider the commercial tests that follow. Because the focus here is on actual or
emerging commercial projects, the list does
not include tests created within government
agencies, such as the Department of Transportation and Department of Defense. Gilliland and Schlegel (1995) provided a review
of 14 computer-based performance tests and batteries, many created by government agencies. This list overlaps their list only with the Delta/APTS battery.
EPS-100 Performance System (Eye Dynamics, Inc., Torrance, CA; formerly Oculokinetics, Inc., and Drug Detection Systems, Inc.). This computer system evaluates the ability of an individual's eyes to
follow a moving light and react to a dim
and bright light stimulus. The system is
nondiagnostic and uses individual baselines.
Testing takes 90 seconds, and results are
immediate.
FIT (Pulse Medical Instruments, Inc.,
Rockville, MD). The FIT is a
pupillometric and nystagmus test
that identifies changes resulting
from fatigue, illness, and intoxication from drugs and alcohol.
It tests involuntary
responses
and requires no operator. Minimal
training is required, and there are
no learning or skill effects. The system is nondiagnostic
and uses individual
baselines. Testing takes 30 seconds; results
are immediate.
Eyegaze System (LC Technologies, Inc.,
Fairfax, VA). This system determines the
eye's gaze direction using a pupil-center/
corneal reflection method. While a small
infrared light-emitting diode (LED) illuminates the subject's eye, generating a corneal
reflection and brightening the pupil, a video
camera continually observes the eye's movements and pupil dilations. The Eyegaze
System encodes, stores, and analyzes eye
movements, fixations, and pupil dilation.
These parameters are useful in characterizing vigilance, mental workload, and attention.
Delta (Essex Corp., Alexandria, VA, and
Orlando, FL). The tests in the commercial
Delta battery assess the following mental
functions: associative memory, linguistic
information integration and manipulation,
spatial information integration and manipulation, perceptual input, and output and
response execution. All of the task implementations have been studied carefully and
extensively. They are all learned quickly,
and performance is stable across individuals
within tests. Data must be extracted from
computer files and processed with computer spreadsheets and statistics packages to
make decisions about fitness for duty. The
research version is called the Automated
Performance Test System (APTS).
ART-90/Vienna Test System (Schuhfried
GmbH, Mödling, Austria). The ART-90 was
developed, built, and programmed between
1979 and 1982 based on scientific knowledge available at the time. Adaptation and
changes were completed in 1989. Hardware
and software were developed by Schuhfried
GmbH, Austria; road safety development
and research
were performed
by the
Institute for Road Safety in Vienna. The
ART-90 is used for driver testing by authorities in Austria and Germany, where driver
testing is compulsory in many instances.
More than 500,000 tests have been documented by these sources. The
ART-90 is also used by the railways in Switzerland and by institutions in Russia, Japan, Israel, the
Netherlands, and a number of other
countries.
FACTOR 1000 (Performance
Factors, Inc., Alameda, CA). Development of the Critical Tracking Task
(FACTOR 1000 is the commercial version)
dates from the 1950s. It generates an estimate of the minimum delay time from the
hand to the eye for a continuous tracking
task (as opposed to a discrete reaction time
task). It was used successfully for a National Highway Traffic Safety Administration investigation of the behavior of
convicted drunk drivers and in the 1978 and
current Federal Highway Administration
investigations of commercial truck driver
fatigue. Testing usually takes about two
minutes, and results are available immediately.
NovaScan (Nova Technology, Inc.,
Tarzana, CA). Two main tasks are performed. They may involve, for example,
spatial visualization
and tracking. They
replace each other on a random basis. This
strategy allows a measure of the individual's
ability to switch from one skill to another.
Measures of attention are obtained with the
introduction of a third task. An indicator
appears in the corner of each screen for this
third task. The net effect of this paradigm
is to partial the overall attention system into a series of separate subskills, including
monitoring, attention switching, and interference with monitoring from each of the
main tasks.
Nova has about 30 standardized and documented tasks available, covering such skills
as logical reasoning, decision making, numerical manipulation, short-term memory,
situation awareness, tracking, and memory
functions. The employee's score on any
given test session is compared with that
person's baseline performance, defined as
the average of his or her passing scores over
the past 30 test sessions. Pass/fail criteria
are determined using traditional psychometric approaches. These criteria are used to
determine whether or not the individual
is within his or her normal performance
envelope.
Personal Safety Analyzer (CAE-Link,
Binghamton, NY). This matrix of tests is
self-administered
via a touch-screen personal computer monitor. No special skills
or external support are required to operate
the equipment. The system uses personal
baselines. The matrix of tests assesses an
employee's performance in key representative work tasks: the speed with which an
employee extracts information from stimulus displays, how quickly and accurately he
or she can use that information in decision-making tasks, and his or her response patterns. This framework is representative of
the human-machine interfaces that will be
encountered in the workplace, such as driving a vehicle or operating sophisticated
equipment. The tests encompass acquisition of information
(encoding), working
memory, long-term memory, decision making, and response selection and execution.
DAVE (Atlantis
Aerospace
Corp.,
Brampton, Ontario, Canada). The Divided
Attention Visual Experiment (DAVE) was
developed to assess levels of driving impairment in individuals affected by obstructive
sleep apnea syndrome. DAVE tests subjects
by measuring their ability to perform a subcritical tracking task while responding to a
four-choice, simple reaction-time task.
Truck Operator Proficiency System
(Systems Technology,
Inc., Hawthorne,
CA). The State of Arizona, with Systems
Technology, developed a fitness-for-duty
testing device that evaluates commercial
vehicle drivers' hand-eye coordination and
ability to divide attention. The driver sits
behind a steering wheel in a simulated truck
cab and steers an imaginary vehicle down a
road displayed on a computer screen. A
boring eight-minute test is administered,
and results are available immediately.
Drivers who fail the test for whatever reason (fatigue, illness, intoxication, etc.) are
deemed unfit for driving duty and are not
permitted to drive. The Arizona Department of Public Safety uses the device to test
commercial truck drivers at weigh stations
and puts drivers out of service for failing
the test. The department believes that this
regulatory use of the technology will be
upheld by the courts.
ReadyShift (Evaluation Systems, Inc., Lakeside, CA). The TOPS device was enhanced by Evaluation Systems to include a driver's personal, nondeclining baseline of
scores for a five-minute test. This enhancement allows for the skill developed by a driver who takes the test day after day, month
after month. The driver's performance on a
given day is compared with the distribution
of the same driver's recent scores on the
test. ReadyShift I is a desktop device, and ReadyShift II is installed in the cabs of over-the-road, long-haul trucks. The test takes
either five or ten minutes, depending on
the outcome of the first five minutes.
Results are available immediately.
Psychomotor Vigilance Task (Ambulatory
Monitoring, Inc., Ardsley, NY). The PVT
is an unalerted reaction-time
task. It was
selected for use in cockpit studies of aircrew
fatigue conducted by NASA-Ames Research
Center for the Federal Aviation Administration. This task was also used in the recent
Federal Highway Administration
investigation of countermeasures
to commercial
driver fatigue. The task runs on a small
portable device and on IBM-compatible
desktop computers. The PVT is also available in a hand-held, battery-powered box.
The test takes ten minutes and requires the
use of a desktop computer with spreadsheets and statistical
packages for data
analyses.
CogScreen Aeromedical
Edition
(Psychological Assessment Resources, Inc.,
Odessa, FL). This cognitive test battery was
designed primarily for recertifying aviators.
The functions assessed include attention, spatial perception, reasoning, and response time. It allows a varied battery to be selected and compares results with pilot norms. The software runs on IBM-compatible computers. Its distribution is limited to qualified testing professionals.

In Conclusion
The use of performance measurement allows us to assess immediate, job-relevant issues. Now the question before us is, can we use this intelligent technology intelligently?
Bibliography
Allen, R. W., Stein, A. C., & Miller, J. C. (1990). Performance testing as a determinant of fitness-for-duty (Tech. Paper 901870). Warrendale, PA: Society of Automotive Engineers.
Gilliland, K., & Schlegel, R. E. (1993). Readiness to perform testing: A critical analysis of the concept and current practices (DOT/FAA/AM-93/13). Oklahoma City, OK: Civil Aeromedical Institute, Federal Aviation Administration.
Gilliland, K., & Schlegel, R. E. (1995, January). Readiness-to-perform testing and the worker. Ergonomics in Design, pp. 14-19.
Kennedy, R. S., Wilkes, R. L., Baltzley, D. R., & Fowlkes, J. E. (1990). Development of microcomputer-based mental acuity tests for repeated measures studies (NASA-CR-185607). Orlando, FL: Essex Corp.
Miller, J. C., & Beels, C. A. (1994, February). Qualitative assessment of motor carriers and test suppliers having experience with fitness-for-duty testing (ESI-TR-93-003). Lakeside, CA: Evaluation Systems.
O'Donnell, R. D. (1992). The NovaScan test paradigm: Theoretical basis and validation. Dayton, OH: Nova Technology, Inc.
O'Hanlon, J. F., & Kelley, G. R. (1977). Comparison of performance and physiological changes between drivers who perform well and poorly during prolonged vehicular operations. In R. R. Mackie (Ed.), Vigilance: Theory, operational performance, and physiological correlates (pp. 87-110). New York: Plenum.
James C. Miller is vice president, Human Factors, at
Evaluation Systems, Inc., in Lakeside, California,
and a consultant with Miller Ergonomics, 1330 5th
Street, Imperial Beach, CA 91932-3208. This work
was supported by Federal Highway Administration
Contract DTFH61-93-C-00088,
administered by
the Trucking Research Institute of the American
Trucking Associations. A longer version was published in Miller and Beels (1994). This article reflects
the opinions of the author and does not necessarily
reflect the opinions, policies, or regulations of the U.S. Department of Transportation, the Federal Highway Administration, or the American Trucking Associations.