1.1. Background on Risk Assessment
The practice of occupational safety and health (OSH) has undergone a 50-year transition from being a mostly rule-following practice into a multi-faceted profession blending rules and risk management processes to achieve effective and feasible protection for employees, property, environment, and other business interests [
1,
2,
3]. Risk management today involves several processes, repeated periodically, to identify hazards, evaluate the associated risks, and assess various tactics for preventing and mitigating harm from those risks [
2,
3,
4]. A tool used for assessing and evaluating risks is referred to in the OSH field as a risk table, risk grid, risk matrix, or (our preference) risk assessment matrix (RAM) [
2,
3,
5,
6,
7,
8,
9,
10,
11].
RAMs appear as a two-dimensional grid with one axis having categories of harmful consequence and the other axis with categories for likelihood or probability. The cells inside the grid are used to indicate risk. Risk-assessment teams use RAMs as part of an organization-specific risk management process [
2,
3,
5,
7,
8,
11]. Although the details differ somewhat, a risk-management process involves: (1) identifying hazards and the associated risks, (2) determining tactics for reducing/mitigating each risk, also called risk treatment, (3) assessing the risks in terms of credible harmful consequences and likelihood of occurring, (4) evaluating each hazard-specific risk in terms of the organization’s tolerance for risk, (5) communicating with those affected, (6) implementing the approved risk-reduction tactics, and (7) following up by monitoring implementation and effectiveness. RAMs are tools used in Process 3 (risk assessment) and Process 4 (risk evaluation).
A RAM can be used in Process 3 to analyze the risks of a specific hazard, document the effect of each risk-reduction tactic, and provide useful information for Process 4. Following these steps creates a record that can later be used to demonstrate due diligence or reasonable care (depending on the applicable legal system). The hazard-specific assessment process described by Jensen [
2] begins by using a RAM to establish a baseline risk, assuming that no attempt has yet been made to prevent or mitigate the harm. It involves judging the consequence of one or more foreseeable harmful events and the likelihood of occurrence. For each risk-reduction tactic added, the RAM is used to document how that tactic reduces severity or likelihood. This process is repeated each time an additional risk-reduction tactic is considered, thereby providing a documented trail of having taken safety seriously [
2]. Thus, an organization’s RAM serves as a core tool for use by risk-assessment teams to characterize risk in a systematic manner. Completed RAMs provide information in a visual format for Process 4 involving the evaluation of the risks and deciding if the organization can tolerate the remaining risks [
2,
3,
5,
6,
8,
9,
10,
11].
This paper provides background on the numerous variations in RAM designs, the means for characterizing level of risk, and options for helping the individuals who use RAMs to achieve reasonable accuracy and precision. A typical use of a RAM is to have a small team use it as a tool for assessing various hazards. In OSH, the people who serve on risk-assessment teams have varying backgrounds in education, experience with the types of hazards being assessed, and experience applying RAMs. Thus, selecting an appropriate RAM for use by an organization involves recognizing that a RAM is a tool for use by people and should, therefore, be designed for human usability. At the very least, a RAM should be designed for usability by engineers, operations personnel, and others likely to be assigned to risk-assessment teams.
The substantial body of literature about RAMs reflects articles based on reasoning, experience, and expert opinion [
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18]. Few papers on RAMs report empirical research. The authors of this paper have identified four empirical studies on RAMs. Two studies examined how health service providers conduct risk management [
19,
20]. Card, Ward, and Clarkson reported a content analysis of health services organizations in the East of England area of the British National Health Service. They found the risk management systems were weak in two main areas: (i) guidance to support risk evaluation methods, including use of a RAM, and (ii) organizational guidance to support risk control [
19]. In a second empirical study, Kaya, Ward, and Clarkson sent requests to 160 hospitals in England for descriptions of the RAMs they use [
20]. Out of 100 responses, 99 used a 5-row by 5-column matrix similar to the one in
Figure 1. The 99 RAMs filled each cell with the product of its row order number and its column order number. These numbers were used to sort cells with similar risk into bands identified by a particular color. In the study, each cell had a number ranging from 1 to 25; however, the healthcare providers differed in how cells were assigned to the colored bands of similar risk. This resulted in 28 different RAMs. The 99 hospitals used three, four, or five colored risk bands in their matrices [
20]. The number of bands and number of hospitals were as follows: three bands (23), four bands (70), and five bands (6).
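The following minimal sketch illustrates how a single 5 × 5 product matrix can lead to different RAMs once cells are grouped into colored bands. The two banding schemes and their thresholds are hypothetical, chosen only for illustration; they are not taken from any hospital in the study.

```python
def product_matrix(n_rows=5, n_cols=5):
    """Fill each cell with (row order number) x (column order number)."""
    return [[r * c for c in range(1, n_cols + 1)] for r in range(1, n_rows + 1)]


def band(score, upper_bounds, labels):
    """Return the label of the first band whose upper bound covers the score."""
    for bound, label in zip(upper_bounds, labels):
        if score <= bound:
            return label
    raise ValueError(f"score {score} exceeds the highest band")


# Hypothetical banding schemes (assumed thresholds, for illustration only).
SCHEME_A = ([5, 12, 25], ["green", "yellow", "red"])               # three bands
SCHEME_B = ([3, 6, 12, 25], ["green", "yellow", "orange", "red"])  # four bands

scores = product_matrix()
bands_a = [[band(s, *SCHEME_A) for s in row] for row in scores]
bands_b = [[band(s, *SCHEME_B) for s in row] for row in scores]

# The cell in row 2, column 2 has score 4 in both matrices,
# yet it falls in a different band under each scheme.
print(scores[1][1], bands_a[1][1], bands_b[1][1])
```

The cell scores are identical in both cases; only the assignment of scores to bands differs, which is the kind of variation the study reports.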
In a third empirical study, Ball and Watt reported a campus study of using a 5 × 5 RAM to assign a risk score to three photos of public places with unprotected edges where deadly falls could occur [
12]. Their students had received basic instruction on the use of a RAM, but no specific training on how to judge likelihood or severity [
12]. They found students had poor accuracy and precision. In a fourth study, Jensen and Hansen surveyed undergraduates studying OSH to determine how they understand various words and phrases used in RAMs [
21]. Using the results, the researchers identified sets of terms most suitable for naming the row and column categories in RAMs [
21]. This article provides background on RAMs, followed by a description of a follow-on survey of individuals with at least two years of OSH-related experience. The survey aimed to reexamine the previously recommended word sets to determine whether the prior recommendations are confirmed or whether improvements are desirable.
1.2. Diverse Options for Design
Organizations may design and use a RAM of their choosing. This has the advantage of allowing organizations to match their needs and values. There are, however, many RAMs that contain inherent pitfalls, inconsistencies, and difficulties in usability [
8,
9,
10,
11,
12,
13,
14,
15,
16]. To explain the various ways that RAMs can differ, some terms need clarification.
Figure 1 serves as a point of reference. RAMs come in different sizes, commonly described by the number of rows and number of columns. The size of the example in
Figure 1 is 5 × 5. The size of a RAM affects the resolution: more categories mean greater resolution. While high resolution appears desirable, the RAM designer should recognize that assigning categories for likelihood and severity is a subjective process that is not well suited for making fine distinctions between adjacent categories [
8,
12]. Therefore, as Baybutt advises, the number of levels “should be consistent with the ability of practitioners to discriminate between levels” [
8].
RAMs are presented in different orientations.
Figure 2 depicts possible orientations of a 3 × 3 RAM using the Cartesian coordinate system to establish the positive and negative directions of rows and columns. In each RAM, the green colored cell is the lowest risk; the red cell is the greatest risk. Panel a depicts a RAM in quadrant II. This is illustrated by MIL-STD-882E [
22] and others [
11,
14,
22]. This quadrant fits activities for which the horizontal axis applies to expected loss; the business community assigns a negative value to losses.
Figure 2b depicts a RAM in quadrant I. That is the location of RAMs emphasized in this paper and others [
6,
10,
12,
13,
16,
17,
18,
19].
Figure 2c is a location where both axes are negative. The authors did not find any examples of a RAM located in quadrant III.
Figure 2d depicts a RAM in quadrant IV. Three examples have been found [
7,
8,
23].
The columns in
Figure 1 are for amount of harm, commonly called severity or consequence. Severity and consequences may relate to either financial loss or harm to personnel or other. For OSH practice, the term severity is most conventional and is used throughout this paper. Columns are for distinguishing ordered categories of severity.
A RAM needs a key containing a text description of each severity category to explain and illustrate what makes each column different from adjacent columns. Another essential attribute of the severity categories is that they must be put in order such that each is clearly greater than the next lower category [
8,
11,
13,
15]. In addition to the text description, each column has a header term at the top. In
Figure 1, the five column headers are indicated by variables C1, C2, C3, C4, and C5. The project described in this paper explored various terms for these column headers.
The rows in
Figure 1 are for the ordered categories of how likely the hazardous event or exposure will occur. Four ways to describe the row categories were used in this paper. Probability was used for quantitative ratings with values in the range 0.0–1.0 or a multiple of 10. Likelihood refers to qualitative judgments expressed numerically or nominally (without numbers). A third dimension included in the present study is extent of exposure, a term that includes measures used to account for employees very rarely exposed to a hazard versus employees regularly exposed to the hazard. Extent of exposure is expressed by the frequency or duration of employee exposures to the hazard per a specific unit of time, e.g., three times per year, three exposure-hours per week, 80 uses per month. Extent of exposure may be used as a third dimension of a RAM or may be incorporated within the rows of a 2-dimensional RAM by inclusion in the descriptions provided in the key. A dimension not studied in this survey is frequency; it is used in the process industries to distinguish row categories in a RAM. Common uses include 1 death per 10 years, 1 death per 100 years, and 1 death per 1000 years. This project addressed sets of terms to replace the generic row headers in
Figure 1 (R1, R2, R3, R4, and R5).
For a specified hazard, the individuals participating in a risk assessment are expected to both foresee possible hazard scenarios and estimate how likely each may occur [
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17]. These projections must then be put into the column and row categories of the applicable RAM. Two aids for helping risk assessment team members select column and row categories that match their projections are, first, explicit descriptions in the RAM’s key, and second, the terms used to label each column and row category. The authors developed this project with intent to help RAM designers with the second of these aids—selecting sets of terms for both column and row headers.
The cells in a RAM indicate level of risk. Colors are often used to show groups of cells with similar risk levels, known as risk bands. In
Figure 1, red cells denote the highest risk band and green cells denote the lowest risk band. Yellow cells are those separating green and red cells. For OSH, a hazard rated in the green band is generally considered tolerable or acceptable, and a hazard in the red band is typically considered highly undesirable or not tolerable [
5,
6,
7,
8,
9,
10]. While the decisions associated with red and green cells are often stated as clear-cut rules, the preferred practice is to consider these as indicators to assist with making decisions [
8,
9,
10,
11,
12,
13,
14,
24]. Cells rated in the yellow band indicate a need for additional attention in order to reduce the risk to as low as reasonably practicable (ALARP) prior to deciding on tolerability. After achieving ALARP, the organization’s risk-assessment team uses the final RAM as a visual tool to communicate with the organization’s decision makers about tolerability [
18].
The basic definition of risk in Equation (1) provides the basis for using a table format [
2,
3,
6,
8,
9,
10,
11,
13]. According to Equation (1), the probability of a harmful event B occurring (P_B) is multiplied by the expected loss, given that B occurred.
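Equation (1) itself is not reproduced in the text above. Based on the verbal description, it presumably takes the following form, where the labels Risk and Loss are notation assumed here for illustration:

```latex
\begin{equation}
  \text{Risk} = P_B \times E(\text{Loss} \mid B) \tag{1}
\end{equation}
```

Here P_B is the probability that the harmful event B occurs, and E(Loss | B) is the expected loss given that B has occurred.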
A risk assessment matrix provides an easily understood depiction of risk being based on the product of applicable values in the row (probability or likelihood) and column (severity). Although this approach has been a tradition in the field of system safety, the OSH community has, for various reasons, sought a less quantitative approach [
5,
7,
8,
9,
10,
11,
15,
19,
20].
The risk matrices in
Figure 3 illustrate three ways to express risk within the cells. Each matrix uses rows for likelihood and columns for severity. In
Figure 3a,b, the rows are numbered 1–5 in order from lowest to highest likelihood, and the columns are numbered 1–5 in order from least to greatest severity of harm. With that start, there are two ways to assign numerical risk indicators (RI_ij) to the cells. Using the notation that subscripts i and j refer to row and column, respectively, R refers to rows, and C refers to columns, one method is to compute the RI value in each cell as RI_ij = R_i × C_j. That yields the values in the
Figure 3a matrix. The other method is to add the values using RI_ij = R_i + C_j. That yields the values in the
Figure 3b matrix [
6,
11]. The approach in
Figure 3a assumes the category-to-category increases are basically linear. The approach in
Figure 3b assumes the categories in both the rows and columns are spaced logarithmically so that each category is approximately 10 times greater than the next lower category [
6,
10,
11].
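The two numbering methods can be sketched as follows. This is a minimal illustration, not the authors' implementation; the variable names are assumptions, and the last lines show why adding order numbers is consistent with logarithmically spaced categories (adding exponents corresponds to multiplying powers of 10).

```python
import math

ROWS = range(1, 6)  # likelihood order numbers, lowest to highest
COLS = range(1, 6)  # severity order numbers, least to greatest

# Multiplicative risk indicators: RI_ij = R_i x C_j
ri_product = [[r * c for c in COLS] for r in ROWS]

# Additive risk indicators: RI_ij = R_i + C_j
ri_sum = [[r + c for c in COLS] for r in ROWS]

# If category n stands for a magnitude of 10**n, the product of the two
# magnitudes has a base-10 logarithm equal to the sum of the order numbers.
r, c = 2, 3
log_of_product = round(math.log10(10**r * 10**c))

print(ri_product[4][4], ri_sum[4][4], log_of_product)
```

With five categories per axis, the multiplicative indicators range from 1 to 25 and the additive indicators range from 2 to 10.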
The third approach to quantify a risk matrix is to take the established row and column values, normalize each to a common scale (e.g., 0–1, 0–10, or 0–100), and use the normalized row and column matrix for establishing a less complex RAM, for which
Figure 3c is an example. The row and the column categories are then defined in terms of those values. In the
Figure 3c example, a 5 × 5 matrix may have a 10-point axis divided so that five equal width categories have upper bounds at 2, 4, 6, 8 and 10. The risk indicators in each cell are the product of the mid-range value of the respective row category (1, 3, 5, 7, 9) and the mid-range value of the respective column (1, 3, 5, 7, 9). This mid-point approach corresponds to instructing a RAM assessment team to assign severity categories based on the most representative sort of harm the team members can foresee, and likelihood categories based on the reasonably foreseeable chance of occurrence.
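The mid-point calculation described above can be sketched in a few lines. The names below are assumed for illustration; the upper bounds (2, 4, 6, 8, 10) and the resulting mid-range values (1, 3, 5, 7, 9) are those given in the text.

```python
UPPER_BOUNDS = [2, 4, 6, 8, 10]           # five equal-width categories on a 10-point axis
LOWER_BOUNDS = [0] + UPPER_BOUNDS[:-1]    # each category starts where the previous one ends

# Mid-range value of each category.
midpoints = [(lo + hi) / 2 for lo, hi in zip(LOWER_BOUNDS, UPPER_BOUNDS)]

# Risk indicator in each cell: product of the row and column mid-range values.
risk_indicators = [[row_mid * col_mid for col_mid in midpoints] for row_mid in midpoints]

for row in reversed(risk_indicators):     # print the highest-likelihood row first
    print(row)
```

The smallest indicator is 1 (lowest row and column categories) and the largest is 81 (highest row and column categories).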
Several insightful papers have been positive on the approach of using the framework depicted in
Figure 3c [
8,
11,
12,
13,
17,
19,
20]. The authors of these papers expressly recognize the approach as a simplified version of an underlying quantitative matrix. Mathematical justifications for the approaches in
Figure 3b and
Figure 3c have been provided by Rausand [
6] (pp. 102–103) and Cox [
13], respectively.
The next challenge is to determine how to distinguish the cells for highest risk (colored red) from cells with lower risks (colored green). One approach is to follow the axioms developed by Cox [
13]; the other approach is to use the iso-risk contour-based method [
14,
24]. The RAM in
Figure 1 was created using the iso-risk contour method by which green cells were located below or left of the iso-risk line 20, and red cells were located above and right of the iso-risk line 45. For cells bifurcated by an iso-risk line, color was assigned based on the side of the line with the largest area of the cell.
Referring to the RAM in
Figure 1, the cells colored green have risk values per Equation (1) in the range 0–24, while the red cells have risk values in the range 36–100. The red-color band includes the upper right cell plus three adjacent cells. All cells not colored green or red are assigned the color yellow.
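The coloring rule can be sketched as follows, under one plausible reading of the majority-area criterion: a cell is green if more than half of its area lies below the iso-risk line 20, red if more than half lies above the iso-risk line 45, and yellow otherwise. The function names and the exact tie-handling are assumptions; the axis bounds and contour values come from the text.

```python
import math

BOUNDS = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]  # equal-width categories, 0-10 axes
GREEN_LINE, RED_LINE = 20, 45                        # iso-risk contours x * y = k


def area_below(x0, x1, y0, y1, k):
    """Exact area of the cell [x0, x1] x [y0, y1] where x * y <= k (k > 0)."""
    x_full = k / y1                          # left of this x, the full cell height is below
    x_none = k / y0 if y0 > 0 else math.inf  # right of this x, nothing is below
    split = min(max(x_full, x0), x1)
    area = (split - x0) * (y1 - y0)          # strip lying entirely below the contour
    end = min(max(x_none, split), x1)
    if end > split:                          # strip cut by the contour: integrate k/x - y0
        area += k * math.log(end / split) - y0 * (end - split)
    return area


def cell_color(x_bounds, y_bounds):
    cell_area = (x_bounds[1] - x_bounds[0]) * (y_bounds[1] - y_bounds[0])
    if area_below(*x_bounds, *y_bounds, GREEN_LINE) / cell_area > 0.5:
        return "green"
    if area_below(*x_bounds, *y_bounds, RED_LINE) / cell_area < 0.5:
        return "red"
    return "yellow"


colors = [[cell_color(col, row) for col in BOUNDS] for row in BOUNDS]
counts = {c: sum(r.count(c) for r in colors) for c in ("green", "yellow", "red")}
print(counts)
```

Under these assumptions the sketch produces four red cells (the upper right cell plus three adjacent cells), which agrees with the description above. The cells spanning 4–6 on one axis and 8–10 on the other fall just short of the majority needed for red and are therefore yellow.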
Breaking each axis into categories defined as portions of the full range helps with usability by the risk-assessment teams, first, by not asking assessors to understand the underlying mathematics, and, second, by not expecting them to spend countless hours discussing the precise number to use for each row and column value. Discussions of RAMs frequently include a distinction between qualitative and quantitative forms. A quantitative RAM, for example, has probability values for the row categories, monetary values for the columns, and cell values computed with Equation (1), resulting in risk values in monetary units. Qualitative RAMs have rows and columns defined nominally and cells assigned risk categories such as high, medium, and low [
2,
17]. Cox, Babayev, and Huber [
17] provide examples of regulatory agencies that use this approach. A third form of RAM, often called semi-quantitative, has each axis divided into ordered categories and assigned numerical values based on their order.
Figure 3a,b are examples. A fourth type of RAM, illustrated in
Figure 1 and
Figure 3c, consists of (i) both axes using linear scaling and the same range (e.g., 0–10), and (ii) risk indicated by the product of the respective row and column values.
Appendix A provides a conceptual explanation of how this fourth type of RAM can approximate an underlying quantitative relationship based on Equation (1).
The domain of application may, or may not, warrant different matrices. Employers using, or planning to adopt, a RAM need to ponder some things about the hazards involved [
8,
11]. In what kind of industry will the RAM be used? For what types of hazards will the RAM be used as a tool for risk assessment? Related to this issue is the temptation to have one RAM for all applications in the organization. This approach has been criticized by multiple authors who recommend different RAMs for different consequences, e.g., employee safety, property damage, environmental harm, business interruption, or community relations [
9,
14,
15]. Baybutt [
10] recognized the pitfalls of using one matrix for diverse domains and proposed a method for calibrating the matrix for different domains within an organization.
Another domain-related matter is defining the role of risk-scoring using the RAM to drive the decision on tolerability of a particular risk. Multiple authors advise against using locations on a RAM (risk band) as the decision maker for tolerability of a hazard [
8,
11,
12,
13]. The concern is that this practice extends the responsibility of risk-assessment team members to both performing the risk assessments (Process 3) and making decisions about tolerability (Process 4) without having all the information needed, such as cost-benefit information.
1.3. Usability Issues
Members of a risk-assessment team will likely have differing opinions on assigning a hazard to a specific cell in their matrix. For that reason, RAMs should be designed to help the team members decide on the most appropriate row and column category. Three matrix attributes for helping risk-assessment team members make accurate and precise assignments to row and column categories are having: (i) a clear order to categories in each axis, (ii) descriptions of each category so that categories are distinguishable, and (iii) header terms that are clearly ordered and distinguishable. The third of these attributes has been the subject of only one previous study [
21], and that was based on a survey of undergraduate OSH students. That left open an issue of how closely results of the undergraduate survey might correspond to ratings by individuals with OSH-related work experience.
Multiple usability issues involve the accuracy and precision of risk estimates based on the judgment of risk-assessment teams. These estimates of risk are used by some organizations to help set priorities for corrective actions. A second use is to help decide if the risk-reduction tactics have reduced the risk of a hazard to the level of being tolerable or acceptable. Both uses are important to employee safety and health [
9,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23]. An example opinion expressed by Ale, Burnup, and Slater [
9] is that using RAMs to prioritize risk-reduction processes may provide informative input, but should not be taken as a primary driver for prioritization. Similar opinions by other authors are that risk levels resulting from a risk-assessment team are not sufficiently accurate or precise to rely on as a sole determinant of risk tolerability [
12,
13]. Four implications of these opinions are that organizations need to make strong efforts to achieve accurate and precise entries into RAMs by (i) assigning competent individuals to risk-assessment teams, (ii) training risk-assessment team members for improving both accuracy and precision of assessments, (iii) providing team members with adequate time to do their assessments well, and (iv) adopting RAMs designed for usability.
The complexity of a RAM affects its usability. The form used in
Figure 1 of this paper was based on both axes being linear and having equal ranges. Cox [
13] presents justification for using that form of RAM for reasons including understandability, simplicity, and usability by risk assessors dealing with occupational hazards. He advises that three colored bands should be enough for RAMs designed for people estimating the row and column categories for a particular hazard. Cox also explained a rule to avoid having a green cell share an edge with a red cell. This reflects the reality that a risk-assessment team cannot be expected to reliably distinguish between adjacent categories of either scale. Having green and red cells share an edge invites misclassification errors, or what the human factors practitioners call design-induced errors.
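The rule against a green cell sharing an edge with a red cell lends itself to a simple automated check. The sketch below is illustrative only; the function name and the two 3 × 3 example grids are hypothetical and do not come from the cited work.

```python
def green_red_edge_pairs(colors):
    """Return pairs of edge-sharing cells in which one is green and the other red."""
    n_rows, n_cols = len(colors), len(colors[0])
    pairs = []
    for r in range(n_rows):
        for c in range(n_cols):
            # Checking only the right and lower neighbor visits each shared edge once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < n_rows and c2 < n_cols:
                    if {colors[r][c], colors[r2][c2]} == {"green", "red"}:
                        pairs.append(((r, c), (r2, c2)))
    return pairs


# Hypothetical grids: in the first, yellow always separates green from red.
separated = [
    ["green", "green", "yellow"],
    ["green", "yellow", "red"],
    ["yellow", "red", "red"],
]
# In the second, a green cell sits directly beside a red cell.
touching = [
    ["green", "green", "red"],
    ["green", "yellow", "red"],
    ["yellow", "red", "red"],
]

print(green_red_edge_pairs(separated))
print(green_red_edge_pairs(touching))
```

A RAM designer could run such a check whenever band thresholds are changed, since a one-category misjudgment at a flagged edge would move a hazard directly between the lowest and highest risk bands.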
The matrix format in
Figure 3c has been discussed by numerous authors in papers about the spacing of categories [
8,
9,
10,
11,
12,
13,
14]. A strength of this format is the flexibility it gives a RAM designer to define the number and width of categories on each axis. While the common practice is to make equal-width categories, unequal-width categories may be used. For example, a five-category severity axis could be grouped so that the least harm category has the narrowest range while the greatest harm category has the widest range. Another example is setting the upper bounds of five likelihood categories at 1, 3, 5, 7, 10 [
23]. Pons proposed simplifying required risk assessments by defining severity categories to align with those found in the applicable legislation [
15].
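As a minimal sketch of unequal-width categories, the following maps a value on a 0–10 likelihood axis to one of five categories with upper bounds at 1, 3, 5, 7, and 10. Treating each upper bound as inclusive is an assumption made here, as is the function name.

```python
from bisect import bisect_left

UPPER_BOUNDS = [1, 3, 5, 7, 10]  # upper bounds of the five likelihood categories


def likelihood_category(value, upper_bounds=UPPER_BOUNDS):
    """Return the 1-based category whose range contains the value.

    Assumption: a value equal to an upper bound belongs to that category.
    """
    if not 0 <= value <= upper_bounds[-1]:
        raise ValueError(f"value {value} is outside the axis range")
    return bisect_left(upper_bounds, value) + 1


for v in (0.5, 1, 2.5, 7, 7.1, 10):
    print(v, likelihood_category(v))
```

With these bounds the category widths are 1, 2, 2, 2, and 3, so the lowest category is the narrowest and the highest is the widest.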
Thus far in this article, the topic has been exclusively about two-dimensional risk matrices. These have been criticized for not including enough factors; in particular, the dimension of exposure is not included [
11,
21]. This concern may be addressed by either incorporating exposure into the likelihood dimension or adding a third dimension to account for extent of exposure. Terms for such a dimension were included in both the earlier study [
21] and this follow-on study.
Another usability issue for RAM designers—selecting the terms for row and column headers—is an important attribute of RAMs that has received little attention. Duijm [
11] commented that “the ways axis categories are defined and described” affect the subjective row and column category assignments. Baybutt [
8] states that “different terms should not be used when the same meaning is intended”. He offered as an example naming adjacent severity categories with terms having essentially the same meaning, citing as examples significant injury and major injury. Duijm [
11] pointed out the need to name categories on a single axis with clearly different descriptors and offered the following examples of misnaming adjacent categories by using terms that are listed as synonyms in a dictionary.
Improbable and seldom.
Often, frequent, and probable.
Disastrous and catastrophic.
Although Duijm’s examples were based on synonyms found in a dictionary, further support was subsequently provided by the survey of undergraduate OSH students reported by Jensen and Hansen [
21]. They found that ratings on a 100-point likelihood scale were very close for the words improbable and seldom (mean 18.7 vs. 19.7 and median 20 vs. 18) as well as for frequent and probable (mean 72.0 vs. 68.2 and median 72.5 vs. 70.0). These authors also pointed out that MIL-STD-882E [
22] uses the synonyms frequent and probable as labels for adjacent probability categories [
21].
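The comparison of mean ratings can be turned into a simple screening check for candidate header terms. In the sketch below, the mean ratings are those reported above; the 10-point minimum gap and the function name are assumptions chosen for illustration and are not thresholds proposed in the cited studies.

```python
# Mean ratings on the 100-point likelihood scale, as reported above.
MEAN_RATING = {
    "improbable": 18.7,
    "seldom": 19.7,
    "probable": 68.2,
    "frequent": 72.0,
}


def poorly_separated_pairs(ordered_terms, min_gap=10.0):
    """Flag adjacent header terms whose mean ratings differ by less than min_gap.

    min_gap is a hypothetical threshold, not a value from the survey.
    """
    flagged = []
    for lower, upper in zip(ordered_terms, ordered_terms[1:]):
        if abs(MEAN_RATING[upper] - MEAN_RATING[lower]) < min_gap:
            flagged.append((lower, upper))
    return flagged


candidate_headers = ["improbable", "seldom", "probable", "frequent"]
print(poorly_separated_pairs(candidate_headers))
```

With this assumed threshold, the pairs improbable/seldom and probable/frequent are flagged, while seldom/probable is not.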