
Fit for duty?

Ergonomics in Design, April 1996

Performance-testing tools for assessing public safety and health workers' readiness for work.

BY JAMES C. MILLER

This is a companion piece to an article that appeared in the January 1995 issue of Ergonomics in Design (Gilliland & Schlegel, 1995). My focus here is on detecting occurrences of unfitness for work on a daily basis at the worksite, particularly in jobs that affect public safety, public health, or other public domain issues. This discussion is intended to provoke reflection and discussion about the issue of daily fitness-for-duty testing in the workplace.

There is considerable concern over impairment in the workforce and the effect this impairment has on safety and productivity. When one considers only the everyday factors that may restrict job performance, such as fatigue and illness, the overall effect on safety and productivity clearly becomes significant and merits the development of countermeasures. Urine testing is one such countermeasure for drugs, but the collection and laboratory testing of bodily fluids is a costly and time-consuming procedure. For this reason, urine testing is typically used on a random basis. Additionally, the turnaround time is too slow to provide a true fitness-for-duty screening method. Low-cost devices or procedures are needed that quickly, conveniently, and reliably assess employees' fitness for duty and that are legally and socially acceptable.

Types of Fitness-for-Duty Testing

One may take a number of approaches to test for fitness for duty in the workplace: performance, neurological, and biochemical measurement. The three approaches are probably complementary: Biochemical and neurological measures assess lifestyle, which is appropriate for some occupations, and performance measures assess immediate, job-relevant issues. Testing relevant to the fitness-for-duty question may be categorized as shown in the table below.

TYPES OF FITNESS-FOR-DUTY TESTING

Biochemical
    Direct (i.e., brain; not used)
    Indirect (e.g., blood, expired air, urine, hair)
Behavioral
    Neurological
        Subjective (e.g., DOT Standardized Field Sobriety Test, or SFST)
        Objective (e.g., EEG evoked response, pupillometry)
    Performance
        Motor: abstract (e.g., finger tapping) or industrial (e.g., simulated driving)
        Perceptual: abstract (e.g., pattern comparison) or industrial
        Higher cognitive: abstract (e.g., code substitution) or industrial
        Mixed: abstract or industrial

Some investigators who have published data on the effects of stressors on measures in the performance category have suggested informally that fatigue or environmental stressors may impair the higher cognitive functions first, perception next, and motor functions last. For example, automated or highly overlearned tasks such as tracking may be more resistant to fatigue than mental tasks such as decision making and other higher cognitive tasks involving short-term memory and attention. Electroencephalographic data indicate that one can steer a vehicle successfully on a straight highway while the brain's cortex is asleep (O'Hanlon & Kelley, 1977).

Measurements of higher cognitive functions may therefore provide sensitive indices for evaluating fitness for duty: A relatively large proportion of detections of impairment will occur among those who are truly impaired. Conversely, measurements of motor functions may provide specific indices for evaluating fitness for duty: A relatively large proportion of nondetections of impairment will occur among those who are truly not impaired. Unfortunately, these two kinds of detections tend to be somewhat reciprocal; one is enhanced at the other's expense. As I will show, the latter may be the higher priority for workplace testing.
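To make the two detection rates concrete, the short sketch below computes them from the four possible outcomes of a screening decision. It is a minimal editorial illustration rather than part of any test product described in this article, and the counts in the example are hypothetical, chosen only to show the reciprocal trade-off.

```python
# Minimal illustration: sensitivity is the proportion of truly impaired workers
# the test flags; specificity is the proportion of truly unimpaired workers the
# test passes. The counts used below are hypothetical.

def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from screening-test outcome counts."""
    sensitivity = true_pos / (true_pos + false_neg)  # detections among the truly impaired
    specificity = true_neg / (true_neg + false_pos)  # nondetections among the truly not impaired
    return sensitivity, specificity

# A strict pass criterion catches more impairment (higher sensitivity) but fails
# more fit workers (lower specificity); a lenient criterion does the reverse.
print(sensitivity_specificity(true_pos=18, false_neg=2, true_neg=70, false_pos=10))
```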
Performance Tests

Performance tests may be sorted into several subcategories: motor, perceptual, and higher cognitive. These labels each refer to the highest level of "thinking" (cognition) required to perform a given test. For example, a motor test might simply require that the hand be connected to the eye by normally functioning neural circuits. One reliable motor test measures finger-tapping speed and coordination. A perceptual test might require that one accurately compare patterns, identify colors, or recognize tones as high, medium, or low in pitch. Examples of higher cognitive tests include (a) simple mathematical processing, such as adding three-digit sums with reasonable speed and accuracy; (b) code substitution, whereby numbers (digits) are substituted rapidly and accurately for letters (symbols) in a simple code-solving process; and (c) short-term memory tasks in which, for example, a set of letters (say, four) is briefly presented (e.g., for one second), followed by a series of letters presented one at a time, and the subject's task is to determine whether the letters in both sets match.

In terms of face validity, performance tests may be abstract, or not similar to workplace tasks; alternatively, performance tasks may be drawn directly from the workplace, or industrial. For example, a color-naming task may use colors taken from a palette designed to screen for color blindness during a medical examination (abstract), or it may use colored indicator lamps from a real control panel (industrial).

Performance tests may or may not require skilled performance. For example, the finger-tapping motor test requires only one demonstration and little practice before an individual may be tested. On the other hand, skilled compensatory tracking performance may require 30 minutes to two hours of practice.

Performance tests may use an absolute or a relative pass/fail criterion, or a mixture of the two approaches. An absolute criterion uses a distribution of data collected from a representative sample of the population of interest; the individual's performance on a given test is then compared with expectations derived from that sample's score distribution. A relative criterion is based solely on a distribution of scores provided by the individual. In the mixed approach, one uses a moving window of recent performance by an employee to calculate that employee's baseline. The pass level is set at a standard position within the individual's performance score distribution, such as two standard deviations from the mean. Thus, all employees have an absolute pass criterion based on a relative comparison: Their individual performance must fall within a specified portion of their own distribution of recent scores.

With most of the commercial tests described in this article, the employee is tested against his or her own baseline using the mixed approach. This procedure avoids some problems associated with the use of population data, including inappropriate discrimination on the basis of age, sex, computer literacy, intelligence, education, dexterity, and even test anxiety. Using the individual as his or her own control also gives one greater sensitivity and specificity than do comparisons to a population norm.
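As an editorial aid, here is a minimal sketch of the mixed criterion just described. The 30-score window, the two-standard-deviation pass level, the assumption that higher scores are better, and the function names are illustrative choices, not the algorithm of any particular commercial product.

```python
from statistics import mean, stdev

# Sketch of the "mixed" pass/fail criterion: each employee is compared against a
# moving window of his or her own recent scores rather than a population norm.
WINDOW = 30        # number of recent scores kept as the personal baseline (assumed)
CUTOFF_SD = 2.0    # pass level: within two standard deviations of the personal mean

def passes_today(recent_scores, todays_score):
    """Return True if today's score falls within the employee's own envelope."""
    baseline = recent_scores[-WINDOW:]            # moving window of recent performance
    if len(baseline) < 2:
        return True                               # too little history yet to judge
    mu, sd = mean(baseline), stdev(baseline)
    return todays_score >= mu - CUTOFF_SD * sd    # fail only on a markedly low score

# Hypothetical tracking scores for one driver; today's score sits inside the envelope.
history = [82, 85, 79, 88, 84, 86, 81, 83, 87, 85]
print(passes_today(history, todays_score=80))     # True
```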
For a given effect size, the sensitivity of a screening test increases with the number of samples because of the decreasing sampling variance of the estimate. The effect size is the average effect of an independent variable on the measure of interest - for example, the average effect of a 0.05 blood alcohol content on tracking performance. One aspect of sensitivity is the quotient of effect size and variability: With effect size held constant, sensitivity is enhanced as the useful data set grows and diminished as it shrinks. Thus, a desirable test provides sensitivity within a short test period, but not one so short that it is insensitive.

Another way to increase sensitivity is to measure several performance functions at once and then look for any of them to fall outside their respective normal limits. One way to increase specificity is to use test-retest strategies. For example, Wade Allen and Henry Jex at Systems Technology, Inc., described the "1-of-n" strategy, which I applied to the Factor 1000 test for Performance Factors, Inc. In the Factor 1000 1-of-n strategy, the test is usually passed in the first or second 20- to 30-second tracking trial. Test failure requires failing every one of eight trials in two four-trial sessions separated by a 10-minute rest period. This strategy emphasizes the correct identification of unimpaired workers.
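Expressed as a decision rule, the 1-of-n strategy reduces to the brief editorial sketch below; the trial outcomes are placeholders, and the two-session, rest-separated structure is collapsed into a single list.

```python
# Sketch of a 1-of-n retest strategy: the worker passes if ANY trial is passed;
# failure is declared only when all n trials are failed. In the Factor 1000
# example above, n = 8 (two four-trial sessions separated by a 10-minute rest).

def one_of_n_pass(trial_outcomes):
    """Pass overall if at least one trial was passed (True = trial passed)."""
    return any(trial_outcomes)

# A worker who fails the first seven tracking trials but passes the eighth is
# judged fit; only eight straight failures lead to exclusion from duty.
print(one_of_n_pass([False] * 7 + [True]))   # True  -> reports for work
print(one_of_n_pass([False] * 8))            # False -> excluded, at least temporarily
```

Because a single passed trial settles the matter, this rule trades some sensitivity for the specificity - the correct identification of unimpaired workers - emphasized above.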
Conducting Testing

Daily performance testing seems best suited to the immediate determination of occasions when the employee is not ready to start work. A reliable detection of relevant psychomotor impairment may be made at the start of the work period. However, the manager must remain aware of test limitations associated with the relationship between the nature of the test and the nature of the job. For example, driving a truck safely requires intact decision-making skills, auditory and visual perception, and eye-hand coordination, among other things. If I test only for eye-hand coordination, I may draw conclusions based only on test failures, not on test passes. When the employee fails the test, he or she discloses that eye-hand coordination, a central nervous system function essential to the job, is not operating well. One may conclude that the employee is likely not to perform safely. However, when the employee passes the eye-hand coordination test, he or she discloses only that a necessary, but not sufficient, function is working well. I call this exclusion testing. A failure is reason to exclude the employee, at least temporarily, from safety-sensitive work. But other necessary functions have not been tested, so one may not conclude from a test pass that the job will be performed safely. It is difficult to identify - much less test for - all the cognitive and neuromuscular functions required to perform a job well. Thus, performance testing of the nature discussed here allows only detections of potentially unsafe situations. Even though it may sharply reduce on-the-job accident probabilities, performance testing cannot guarantee safe job performance.

Worker Acceptance

To date, only scant data have been available that specifically address worker acceptance issues. As Gilliland and Schlegel (1995) noted, workers are more likely to accept a screening test that has good face validity - in other words, if they believe the test relates closely to their job performance capabilities. During validation tests of the NovaScan (Nova Technology, Inc., or NTI), test participants - employees of a major shipyard in Norfolk, Virginia - were asked their opinions of NovaScan compared with urinalysis. According to Bob O'Donnell of NTI, "Sixty-seven percent clearly preferred NovaScan."

My experience with drivers who have been tested daily at their worksite before driving suggests that immediate feedback to management about test results should not be the ultimate objective of fitness-for-duty testing. Professional drivers use their own perceptions of the varying effort they must exert from day to day to pass the test to give themselves feedback about their fitness. Drivers may modify their drinking (alcohol) and sleeping behaviors somewhat to be ready for the test. They may introduce a nap into their schedule. Thus, I suspect that one highly useful application of fitness-for-duty testing will be private, immediate fitness feedback to the professional driver. Most will use it; some will ignore it.

Typically, several aspects of worker acceptance must be addressed. First, each worker experiences the natural fear of the unknown when faced with the prospect of daily testing before driving. With time, the worker realizes that, given adequate rest and an absence of depressant or otherwise psychoactive substances, the test can be passed routinely. The worker therefore progresses from anxiety to comfort and confidence. This dimension is labeled comfort.

Second, the applicability of the tests to the worker's job is of concern. Over time, workers will perceive correlations, or the absence of correlations, between the test structures and various tasks within their jobs. For example, in simulated driving, they will immediately notice the obvious general similarities and differences between the simulation and actual driving, and as they continue to drive and perform the simulation task, they will notice more specific similarities and differences between the two tasks. This probably happens because they become expert psychophysical observers of both tasks and are able to compare the two in a meaningful manner. This dimension is labeled relevance. It is similar to the psychological construct of face validity.

[Image from the ReadyShift I test, described later in this article.]

Third, hardware and software failures may preclude testing on some workdays. No objective judgment about fitness is available to management or to the employee on these days. This dimension is labeled availability, and it is measured objectively. These incidents may affect workers' confidence in test reliability, possibly changing their opinions about the usefulness of testing. They may reason that if the machine fails often, it is probably not reliable even when it seems to be operating correctly (i.e., guilt by association).

Finally, one should provide feedback about test performance to workers. This should be no more than a simple, volatile, brief display on the screen at the end of a test (for example, pass or fail).
Using this information across many test presentations over months of work, workers become able to estimate how well the test results reflect their fitness for duty. Over time, a worker should increasingly perceive a test as an accurate predictor of work performance. This dimension is labeled accuracy.

Management Acceptance

Performance testing introduces some new problems for managers. The employee who is unable to pass a performance test may be impaired because of (a) job-induced fatigue accumulated from work period to work period, (b) non-job-induced fatigue associated with the family (e.g., a new baby in the household) or with outside employment, (c) circadian rhythms in human performance abilities, (d) prescription or over-the-counter medications, or (e) alcohol, illicit substances, and the like. No longer may managers conclude that the impaired employee is guilty of an illegal act, as with urine testing. This being true, what should management do with the employee who fails? What are the patterns of performance test failure associated with acute and chronic employee personal problems?

Management acceptance probably depends on at least three factors. First, managers are concerned about administering the test. Highest on the list of concerns in this area is the management of test failures. This dimension is labeled workload. Second, managers are concerned about hardware and software failures, a dimension that is labeled availability. These incidents may change managers' opinions about the practicality of testing. Finally, managers expect to see workers modify their off-duty behaviors somewhat to be ready for the test. They also expect to see reductions in ancillary costs resulting from incidents, accidents, insurance rates, and worker compensation claims. This dimension is labeled usefulness.

Given this discussion of the nature of fitness-for-duty testing and of worker and management acceptance issues, what performance test resources are available at present for application in the workplace?

Behavioral Testing Resources

Consider the commercial tests that follow. Because the focus here is on actual or emerging commercial projects, the list does not include tests created within government agencies, such as the Department of Transportation and the Department of Defense. Gilliland and Schlegel (1995) provided a review of 14 computer-based performance tests and batteries, many created by government agencies; this list overlaps theirs only in the Delta/APTS battery.

EPS-100 Performance System (Eye Dynamics, Inc., Torrance, CA; formerly Oculokinetics, Inc., and Drug Detection Systems, Inc.). This computer system evaluates the ability of an individual's eyes to follow a moving light and to react to dim and bright light stimuli. The system is nondiagnostic and uses individual baselines. Testing takes 90 seconds, and results are immediate.

FIT (Pulse Medical Instruments, Inc., Rockville, MD). The FIT is a pupillometric and nystagmus test that identifies changes resulting from fatigue, illness, and intoxication from drugs and alcohol. It tests involuntary responses and requires no operator. Minimal training is required, and there are no learning or skill effects. The system is nondiagnostic and uses individual baselines. Testing takes 30 seconds; results are immediate.

Eyegaze System (LC Technologies, Inc., Fairfax, VA). This system determines the eye's gaze direction using a pupil-center/corneal reflection method.
While a small infrared light-emitting diode (LED) illuminates the subject's eye, generating a corneal reflection and brightening the pupil, a video camera continually observes the eye's movements and pupil dilations. The Eyegaze System encodes, stores, and analyzes eye movements, fixations, and pupil dilation. These parameters are useful in characterizing vigilance, mental workload, and attention.

Delta (Essex Corp., Alexandria, VA, and Orlando, FL). The tests in the commercial Delta battery assess the following mental functions: associative memory, linguistic information integration and manipulation, spatial information integration and manipulation, perceptual input, and output and response execution. All of the task implementations have been studied carefully and extensively. They are all learned quickly, and performance is stable across individuals within tests. Data must be extracted from computer files and processed with spreadsheets and statistics packages to make decisions about fitness for duty. The research version is called the Automated Performance Test System (APTS).

ART-90/Vienna Test System (Schuhfried GmbH, Mödling, Austria). The ART-90 was developed, built, and programmed between 1979 and 1982 based on scientific knowledge available at the time; adaptation and changes were completed in 1989. Hardware and software were developed by Schuhfried GmbH, Austria; road safety development and research were performed by the Institute for Road Safety in Vienna. The ART-90 is used for driver testing by authorities in Austria and Germany, where driver testing is compulsory in many instances. More than 500,000 tests have been documented by these sources. The ART-90 is also used by the railways in Switzerland and by institutions in Russia, Japan, Israel, the Netherlands, and a number of other countries.

FACTOR 1000 (Performance Factors, Inc., Alameda, CA). Development of the Critical Tracking Task (FACTOR 1000 is the commercial version) dates from the 1950s. It generates an estimate of the minimum delay time from the hand to the eye for a continuous tracking task (as opposed to a discrete reaction-time task). It was used successfully for a National Highway Traffic Safety Administration investigation of the behavior of convicted drunk drivers and in the 1978 and current Federal Highway Administration investigations of commercial truck driver fatigue. Testing usually takes about two minutes, and results are available immediately.

NovaScan (Nova Technologies, Inc., Tarzana, CA). Two main tasks are performed; they may involve, for example, spatial visualization and tracking. The tasks replace each other on a random basis, a strategy that allows a measure of the individual's ability to switch from one skill to another. Measures of attention are obtained with the introduction of a third task, signaled by an indicator in the corner of each screen. The net effect of this paradigm is to partition the overall attention system into a series of separate subskills, including monitoring, attention switching, and interference with monitoring from each of the main tasks. Nova has about 30 standardized and documented tasks available, covering such skills as logical reasoning, decision making, numerical manipulation, short-term memory, situation awareness, tracking, and memory functions.
The employee's score on any given test session is compared with that person's baseline performance, defined as the average of his or her passing scores over the past 30 test sessions. Pass/fail criteria are determined using traditional psychometric approaches and are used to determine whether the individual is within his or her normal performance envelope.

Personal Safety Analyzer (CAE-Link, Binghamton, NY). This matrix of tests is self-administered via a touch-screen personal computer monitor. No special skills or external support are required to operate the equipment. The system uses personal baselines. The matrix of tests assesses an employee's performance in key representative work tasks: the speed with which the employee extracts information from stimulus displays, how quickly and accurately he or she can use that information in decision-making tasks, and his or her response patterns. This framework is representative of the human-machine interfaces encountered in the workplace, such as driving a vehicle or operating sophisticated equipment. The tests encompass acquisition of information (encoding), working memory, long-term memory, decision making, and response selection and execution.

DAVE (Atlantis Aerospace Corp., Brampton, Ontario, Canada). The Divided Attention Visual Experiment (DAVE) was developed to assess levels of driving impairment in individuals affected by obstructive sleep apnea syndrome. DAVE tests subjects by measuring their ability to perform a subcritical tracking task while responding to a four-choice, simple reaction-time task.

Truck Operator Proficiency System (Systems Technology, Inc., Hawthorne, CA). The State of Arizona, with Systems Technology, developed a fitness-for-duty testing device that evaluates commercial vehicle drivers' hand-eye coordination and ability to divide attention. The driver sits behind a steering wheel in a simulated truck cab and steers an imaginary vehicle down a road displayed on a computer screen. A boring eight-minute test is administered, and results are available immediately. Drivers who fail the test for whatever reason (fatigue, illness, intoxication, etc.) are deemed unfit for driving duty and are not permitted to drive. The Arizona Department of Public Safety uses the device to test commercial truck drivers at weigh stations and puts drivers out of service for failing the test. The department believes that this regulatory use of the technology will be upheld by the courts.

ReadyShift (Evaluation Systems, Inc., Lakeside, CA). The TOPS device was enhanced by Evaluation Systems to include a driver's personal, nondeclining baseline of scores for a five-minute test. This enhancement allows for the skill developed by a driver who takes the test day after day, month after month. The driver's performance on a given day is compared with the distribution of the same driver's recent scores on the test. ReadyShift I is a desktop device, and ReadyShift II is installed in the cabs of over-the-road, long-haul trucks. The test takes either five or ten minutes, depending on the outcome of the first five minutes. Results are available immediately.

Psychomotor Vigilance Task (Ambulatory Monitoring, Inc., Ardsley, NY). The PVT is an unalerted reaction-time task.
It was selected for use in cockpit studies of aircrew fatigue conducted by NASA-Ames Research Center for the Federal Aviation Administration, and it was also used in the recent Federal Highway Administration investigation of countermeasures to commercial driver fatigue. The task runs on a small portable device and on IBM-compatible desktop computers; the PVT is also available in a hand-held, battery-powered box. The test takes ten minutes and requires the use of a desktop computer with spreadsheets and statistical packages for data analyses.

CogScreen Aeromedical Edition (Psychological Assessment Resources, Inc., Odessa, FL). This cognitive test battery was designed primarily for recertifying aviators. The functions assessed include attention, spatial perception, reasoning, and response time. It allows a varied battery to be selected and compares results with pilot norms. The software runs on IBM-compatible computers, and its distribution is limited to qualified testing professionals.

In Conclusion

The use of performance measurement allows us to assess immediate, job-relevant issues. Now the question before us is: Can we use this intelligent technology intelligently?

Bibliography

Allen, R. W., Stein, A. C., & Miller, J. C. (1990). Performance testing as a determinant of fitness-for-duty (Tech. Paper 901870). Warrendale, PA: Society of Automotive Engineers.

Gilliland, K., & Schlegel, R. E. (1993). Readiness to perform testing: A critical analysis of the concept and current practices (DOT/FAA/AM-93/13). Oklahoma City: Civil Aeromedical Institute, Federal Aviation Administration.

Gilliland, K., & Schlegel, R. E. (1995, January). Readiness-to-perform testing and the worker. Ergonomics in Design, pp. 14-19.

Kennedy, R. S., Wilkes, R. L., Baltzley, D. R., & Fowlkes, J. E. (1990). Development of microcomputer-based mental acuity tests for repeated measures studies (NASA-CR-185607). Orlando, FL: Essex Corp.

Miller, J. C., & Beels, C. A. (1994, February). Qualitative assessment of motor carriers and test suppliers having experience with fitness-for-duty testing (ESI-TR-93-003). Lakeside, CA: Evaluation Systems.

O'Donnell, R. D. (1992). The NovaScan test paradigm: Theoretical basis and validation. Dayton, OH: Nova Technology, Inc.

O'Hanlon, J. F., & Kelley, G. R. (1977). Comparison of performance and physiological changes between drivers who perform well and poorly during prolonged vehicular operations. In R. R. Mackie (Ed.), Vigilance: Theory, operational performance, and physiological correlates (pp. 87-110). New York: Plenum.

James C. Miller is vice president, Human Factors, at Evaluation Systems, Inc., in Lakeside, California, and a consultant with Miller Ergonomics, 1330 5th Street, Imperial Beach, CA 91932-3208. This work was supported by Federal Highway Administration Contract DTFH61-93-C-00088, administered by the Trucking Research Institute of the American Trucking Associations. A longer version was published in Miller and Beels (1994). This article reflects the opinions of the author and does not necessarily reflect the opinions, policies, or regulations of the U.S. Department of Transportation, the Federal Highway Administration, or the American Trucking Associations.