Abstract
Assessment of mental workload plays an important role in adaptive systems that perform dynamic task allocation for teamwork onboard. In this study, workload assessment models were established based on EEG, eye movement, ECG, and performance data, respectively. The data were collected from teams performing maritime target identification and coping device allocation tasks collaboratively in a computer simulation program. Physiological measures were collected from wearable sensors, and team workload was self-assessed using the Team Workload Questionnaire (TWLQ). Mental workload models were trained with the random forests algorithm to predict team workload, with the self-reported TWLQ measure as the reference and physiological and objective performance measures as inputs. The low levels of MAPE (Mean Absolute Percent Error) suggest that these measures can provide an accurate assessment of operator mental workload in the tested type of maritime teamwork. This study demonstrates the feasibility of assessing operator status from physiological measures, which could be employed in adaptive systems.
1 Introduction
Modern work in safety-critical systems is often characterized by teamwork with a high level of informatization and simultaneous multitasking. During the work process, an operator needs to maintain a high concentration of attention for a long time, which may cause a high level of mental workload. Prolonged high mental workload can degrade the operator's status, which can further lead to degraded performance, errors, and sometimes even serious accidents. According to a report on maritime accidents released by the American Bureau of Shipping, approximately 80–85% of accidents involved human errors and about 50% were initially caused by human errors (Baker and Seah 2004). As in several other industries, human error has been identified as a major contributor to maritime accidents. To this end, accurately assessing the mental workload of operators is crucial to the implementation of an adaptive system to support maritime teamwork onboard.
Workload has been defined as a set of task demands, as effort, and as activity or accomplishment (Hartman and McKenzie 1979). Similarly, mental workload can be defined in essence as the amount of mental effort that an individual expends to perform tasks (Gao et al. 2013). Unfortunately, mental workload remains an ill-defined construct (Rizzo et al. 2016): there is still no generally accepted definition for this fifty-year-old construct. Mental workload is induced not only by the cognitive demands of tasks but also by other factors, such as stress, fatigue, and the level of motivation (Sheridan and Stassen 1979). There have been hundreds of studies on the measurement of mental workload in the past half-century, applying many different techniques, including physiological measurements, dual-task (or secondary-task) methods, primary-task measurements, attention allocation, and subjective measurements (Sheridan and Stassen 1979). Continuous and objective measurements, rather than overall subjective measurements, are desired in adaptive systems (Rusnock and Borghetti 2018), which places higher requirements on measurement techniques and data collection. Fortunately, neurological and physiological measurements have great potential for assessing workload in complex tasks (Parasuraman and Wilson 2008).
Until now, state-of-the-art computational models of mental workload have been mainly theory-driven rather than data-driven (Moustafa et al. 2017). However, as in other scientific fields, inductive research methodologies can also be applied to create mental workload models from data and produce alternative inferences. Regarding computational models, a model can be trained directly on measurable variables by finding patterns and relationships between these variables and corresponding performance measures (Dearing et al. 2019). At present, one of the most active research fields devoted to the development of inductive models is machine learning, a sub-field of Artificial Intelligence (AI). Mental workload is multifaceted: it is shaped by many non-linear variables with considerable ambiguity and uncertainty, and the difficulty of aggregating these variables into computational models motivates the use of machine learning.
With the acceleration and spread of machine learning in the past two decades (Bishop 2006), researchers began to investigate mental workload using inductive, data-driven research methodologies (Borghetti et al. 2017; Dearing et al. 2019; Lee and Tan 2006; McKendrick et al. 2019; Moustafa et al. 2017; Rusnock et al. 2015; Zhang et al. 2004). Moustafa et al. (2017) investigated the capability of a selection of supervised machine learning classification techniques to produce data-driven computational models of mental workload, which tended to outperform the NASA Task Load Index and the Workload Profile in the prediction of human performance (concurrent validity). McKendrick et al. (2019) investigated two machine learning approaches for classifying mental workload based on fNIRS signals; Rasch model labeling paired with a random forest classifier led to the best model fits and showed evidence of both cross-person and cross-task transfer. Instead of considering workload categorically, finer-grained automation decisions become possible when workload is represented numerically. Accordingly, besides using machine learning algorithms for classification, some studies have applied regression analysis. Smith et al. (2015) examined the efficacy of two regression-tree alternatives (random forests and pruned regression trees) for decreasing cross-application workload estimation error in a remotely piloted aircraft simulation. Borghetti et al. (2017) applied the random forests algorithm to electroencephalogram (EEG) input to infer operator workload based upon IMPRINT (Improved Performance Research Integration Tool) workload model estimates.
The vast majority of previous machine learning studies of mental workload have measured a single physiological variable, such as EEG (Borghetti et al. 2017; Heger et al. 2010), and the workload assessment or prediction was performed for individuals rather than teams. In this paper, we report our initial effort to induce data-driven workload models for maritime teamwork with different physiological variables as inputs. We chose the random forests algorithm based on past modeling success (Borghetti et al. 2017; Moustafa et al. 2017; Smith et al. 2015).
2 Methodology
2.1 Dataset
In this study, we utilized an existing dataset of multiple physiological measures collected as part of a previous study conducted by the authors' lab. In total, 108 male university students were recruited as participants and randomly divided into 54 two-operator teams. The teams were asked to complete maritime target identification and coping device allocation tasks collaboratively in a computer simulation program. All participants were undergraduate or graduate students in STEM programs with normal or corrected-to-normal vision.
While the participants performed the tasks in teams, physiological measures (see Table 1) were recorded, including eight channels of brain electrical activity (by the NeuroFlight Cognitive Assessment Training Analysis System at a rate of 512 Hz), eye movement (by an SMI iView X RED eye tracker), and heart measures (by the BioHarness physiology monitoring system at a rate of 250 Hz). Objective performance data were also recorded automatically during the experiment.
Each team had to complete all the required operations within a time limit; otherwise, the target disappeared and the platform recorded it as an uncompleted target. Each team performed 24 trials at each of the six scenario complexity levels under a certain level of time pressure and filled out the Team Workload Questionnaire (TWLQ) (Sellers et al. 2014) after each treatment to measure their subjective team workload. These subjective scores served as the outputs for training our machine learning models.
2.2 Workload Model Training
In this study, we sought to predict operator mental workload from physiological data using a machine learning method. The random forests method was selected due to its resistance to overfitting, its simplicity, and its ability to model nonlinear data (Breiman 2001). It is a commonly used and effective algorithm in machine learning and data analysis. A random forest is an ensemble of decision trees, which reduces the variance of the overall model. By constructing a multitude of decision trees at the training stage and outputting the mean prediction of the individual trees for regression, random forests correct decision trees' tendency to overfit their training data (Breiman 2001). Compared with a single decision tree, random forests usually generalize better. In addition, random forests are not sensitive to outliers in the data set and do not require extensive parameter tuning.
Taking into account the characteristics above, the supervised random forests algorithm was selected to train the workload models. We optimized only two tuning parameters: the number of features randomly sampled as candidates at each internal split node (mtry) and the number of trees in the forest (ntree) (Liaw and Wiener 2002). The mtry parameter sets the size of the random subset of features considered when splitting a node; the lower the number, the more the variance is reduced, but at the cost of increased bias. The ntree parameter controls how many decision trees the forest contains; adding trees generally improves accuracy but increases computation time, with diminishing returns as the number grows. Regression models can be trained and tested using the randomForest() function from the randomForest R package (Liaw and Wiener 2002). In our study, the random forests algorithm was applied to the different physiological measures as inputs to infer subjective operator workload measured by the TWLQ. All variables in the physiological data set were averaged across the trials of each scenario complexity level for each team before being used to train the models. The MAPE (Mean Absolute Percent Error) of each regression model was used to test the utility of the model (Rayer 2007; Swanson et al. 2011).
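The pipeline above (train a random forest regressor on averaged physiological features, then score it with MAPE) can be sketched as follows. The study used the randomForest R package; this is an illustrative Python/scikit-learn equivalent, and the feature matrix and TWLQ-like targets below are synthetic placeholders, not the study's data.

```python
# Illustrative sketch of the modeling pipeline (the study used R's randomForest);
# all data here are synthetic stand-ins for the physiological features and TWLQ scores.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(228, 10))                         # stand-in physiological features
y = 40 + 5 * X[:, 0] + rng.normal(scale=2, size=228)   # stand-in TWLQ workload scores

# 7:3 train/test split, as used in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# ntree -> n_estimators, mtry -> max_features
model = RandomForestRegressor(n_estimators=200, max_features=3, random_state=0)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
mape = np.mean(np.abs((y_te - pred) / y_te)) * 100     # Mean Absolute Percent Error
print(f"MAPE: {mape:.2f}%")
```

Averaging the predictions of many decorrelated trees is what gives the forest its variance reduction relative to a single decision tree.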
3 Results
3.1 EEG-Based Assessment Model
After removing missing values from the EEG data set, the data were divided into a training set and a test set at a ratio of 7:3, resulting in 158 training records and 70 test records. The first step of modeling was to find the optimal parameter mtry, that is, the optimal number of variables considered at each split node. By calculating the increase of the random forest error after each variable was removed, the value corresponding to the smallest error was found. The second step was to find the optimal parameter ntree, the number of decision trees, as shown in Fig. 1. When ntree was 200, the error of the model became stable.
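The ntree search can be sketched by tracking out-of-bag error as trees are added; the plateau in that curve is the criterion behind choosing a value such as 200. The study used R's randomForest; this Python sketch uses synthetic placeholder data of the same shape as the EEG training set.

```python
# Illustrative ntree search (the study used R's randomForest);
# data are synthetic placeholders shaped like the 158-record EEG training set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(158, 8))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=158)

# Out-of-bag MSE for increasing forest sizes; the value where the
# curve flattens is taken as the optimal ntree.
oob_mse = {}
for n in (50, 100, 200, 400):
    rf = RandomForestRegressor(n_estimators=n, max_features=3,
                               oob_score=True, random_state=0).fit(X, y)
    oob_mse[n] = float(np.mean((y - rf.oob_prediction_) ** 2))
print(oob_mse)
```

Because each tree sees only a bootstrap sample, the held-out (out-of-bag) observations provide an internal error estimate without a separate validation set.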
The importance() function reports the importance of the variables used by the random forest model. %IncMSE indicates the increase in out-of-bag error when the variable is permuted (the bootstrap cannot draw all the samples; only about 2/3 of the samples are drawn each time, and the remaining 1/3 are used as out-of-bag observations to test the model). IncNodePurity represents the total reduction in node impurity attributable to the variable. The results are shown in Fig. 2.
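The two importance measures can be illustrated as follows. The study computed them with R's importance(); in this hedged Python sketch on synthetic data, scikit-learn's impurity-based importances parallel IncNodePurity, and permutation importance parallels %IncMSE.

```python
# Illustrative sketch of the two variable-importance measures
# (computed in the study with R's importance()); data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.3, size=200)   # only feature 0 carries signal

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

mdi = rf.feature_importances_                        # impurity decrease (cf. IncNodePurity)
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)  # cf. %IncMSE

# Both measures should rank feature 0 as the most important here
print(int(np.argmax(mdi)), int(np.argmax(perm.importances_mean)))
```

Permutation importance asks how much the error grows when one feature's values are shuffled, so an uninformative feature scores near zero under both measures.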
As Fig. 2 shows, the three most important variables were Alpha1 relative power, Gamma1 relative power, and Theta relative power.
The third step of modeling was to set the model parameters and build the random forest model. According to the results above, the random forest generated 200 trees, with 3 variables randomly sampled at each split. To test the utility of the model, we applied it to the test set; the output is shown in Fig. 3. Using the regr.eval() function in the DMwR package, the MAPE of the EEG-based assessment model was 19.34%.
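The MAPE reported by regr.eval() reduces to a one-line formula; a minimal Python stand-in (with hypothetical example values, not the study's data) is:

```python
# Minimal stand-in for the MAPE reported by regr.eval() in R's DMwR package.
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percent Error, in percent; assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Hypothetical TWLQ-scale values for illustration only
print(round(mape([50, 40, 60], [45, 44, 57]), 2))  # → 8.33
```

Because each error is scaled by the actual value, MAPE is comparable across models even when the target ranges differ, which is why it serves here as the common criterion for the four models.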
3.2 Eye Movement-Based Assessment Model
After removing missing values from the eye movement data set, the data were divided into a training set and a test set at a ratio of 7:3, resulting in 65 training records and 25 test records. First, we found the optimal parameter mtry to be 5. The second step was to find the optimal ntree; when ntree was 400, the error of the model became stable. Calling the importance() function, the three most important variables were total blink time, average pupil size on the X axis, and the number of saccades. The random forest generated 400 trees, with 5 variables randomly sampled at each split. The model was applied to the test set, and the output is shown in Fig. 4. The MAPE of the eye movement-based assessment model was 16.03%.
3.3 ECG-Based Assessment Model
Similarly, after removing missing values from the collected ECG data set, the data were divided into a training set and a test set at a ratio of 7:3, giving 116 training records and 52 test records. The first step was to find the optimal parameter mtry; by calculating the increase of the random forest error after each variable was removed, the value with the smallest corresponding error was found, and mtry was 3. The second step was to find the optimal parameter ntree; when ntree was 400, the error of the model became stable. The three most important variables obtained by calling the importance() function were HF, LF, and TP. The random forest generated 400 trees, with 3 variables randomly sampled at each split. The model was applied to the test set, and the output is shown in Fig. 5. The MAPE of the ECG-based assessment model was 15.87%.
3.4 Performance-Based Assessment Model
After removing missing values from the performance data set, the data were divided into a training set and a test set at a ratio of 7:3, giving 214 training records and 91 test records. The first step was to find the optimal parameter mtry; by calculating the increase of the random forest error after each variable was removed, the value corresponding to the smallest error was found, and mtry was 8. The error of the model became stable when ntree was 300. Calling the importance() function, the three most important variables were the total completion rate, the device allocation time of operator 2, and the identity judgment time of operator 1. The random forest generated 300 trees, with 8 variables randomly sampled at each split. The model was applied to the test set, and the output is shown in Fig. 6. The MAPE of the performance-based assessment model was 17.09%.
4 Discussion
Mental workload assessment models were established using different measures: EEG, eye movement, ECG, and objective performance. By the MAPE criterion, the error of the ECG-based assessment model was the smallest, followed by the eye movement-based model and then the performance-based model; the EEG-based model had the largest error. Comparing the inputs of the models, the performance-based assessment model used the largest number of candidate variables (mtry) at each split node, while the eye movement-based assessment model used the smallest number of variables, accounting for only about 1/7. For the eye movement-based model, the original data set provided a sample size of 324, but there were many missing values and outliers; only 90 valid samples remained after data cleaning. With the 7:3 split between training and test sets, only 65 samples were left for training the model, so its MAPE was relatively large. The number of decision trees constructed by the four models did not differ much. A detailed comparison of the four models is shown in Table 2.
5 Conclusion
Focusing on the effective assessment of mental workload, researchers have carried out various work, including classifying workload levels, determining workload values, monitoring continuous workload during work, and evaluating workload levels for designated tasks. This study was conducted on data collected under different time pressure and scenario complexity levels. We constructed workload assessment models with the random forests algorithm. Three types of physiological data were measured concurrently: EEG, ECG, and eye movements. The assessment models were compared in terms of MAPE (Mean Absolute Percent Error), the optimal number of candidate variables (mtry), and the number of decision trees (ntree). We found that the MAPE of the ECG-based model was the smallest, that the performance-based assessment model used the largest mtry at each split node, and that the eye movement-based assessment model used the smallest number of variables, accounting for only about 1/7. This study demonstrates the feasibility of assessing the mental workload of operators onboard from different physiological measures in maritime teamwork.
References
Baker, C.C., Seah, A.K.: Maritime accidents and human performance: the statistical trail. In: MarTech Conference, Singapore (2004)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Borghetti, B.J., Giametta, J.J., Rusnock, C.F.: Assessing continuous operator workload with a hybrid scaffolded neuroergonomic modeling approach. Hum. Factors 59(1), 134–146 (2017)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324
Dearing, D., Novstrup, A., Goan, T.: Assessing workload in human-machine teams from psychophysiological data with sparse ground truth. In: Longo, L., Leva, M. (eds.) H-WORKLOAD 2018. CCIS, vol. 1012, pp. 13–22. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-14273-5_2
Gao, Q., Wang, Y., Song, F., Li, Z., Dong, X.: Mental workload measurement for emergency operating procedures in digital nuclear power plants. Ergonomics 56(7), 1070–1085 (2013)
Hartman, B.O., McKenzie, R.E. (eds.): Survey of Methods to Assess Workload. Advisory Group for Aerospace Research and Development (AGARD), Neuilly-sur-Seine (1979)
Heger, D., Putze, F., Schultz, T.: Online workload recognition from EEG data during cognitive tests and human-machine interaction. In: Dillmann, R., Beyerer, J., Hanebeck, U.D., Schultz, T. (eds.) KI 2010. LNCS (LNAI), vol. 6359, pp. 410–417. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16111-7_47
Lee, J., Tan, D.: Using a low-cost electroencephalograph for task classification in HCI research. In: Proceedings of the 19th Annual ACM Symposium on User Interface Software and Technology, pp. 81–90 (2006)
Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002)
McKendrick, R., Feest, B., Harwood, A., Falcone, B.: Theories and methods for labeling cognitive workload: classification and transfer learning. Front. Hum. Neurosci. 13, 295 (2019)
Moustafa, K., Luz, S., Longo, L.: Assessment of mental workload: a comparison of machine learning methods and subjective assessment techniques. In: Longo, L., Leva, M.C. (eds.) H-WORKLOAD 2017. CCIS, vol. 726, pp. 30–50. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61061-0_3
Parasuraman, R., Wilson, G.: Putting the brain to work: neuroergonomics past, present, and future. Hum. Factors: J. Hum. Factors Ergon. Soc. 50(3), 468–474 (2008)
Rayer, S.: Population forecast accuracy: does the choice of summary measure of error matter? Popul. Res. Policy Rev. 26(2), 163 (2007). https://doi.org/10.1007/s11113-007-9030-0
Rizzo, L., Dondio, P., Delany, S.J., Longo, L.: Modeling mental workload via rule-based expert system: a comparison with NASA-TLX and workload profile. In: Iliadis, L., Maglogiannis, I. (eds.) AIAI 2016. IAICT, vol. 475, pp. 215–229. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44944-9_19
Rusnock, C., Borghetti, B., McQuaid, I.: Objective-analytical measures of workload – the third pillar of workload triangulation? In: Schmorrow, D., Fidopiastis, C. (eds.) AC 2015. LNCS (LNAI), vol. 9183, pp. 124–135. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20816-9_13
Rusnock, C.F., Borghetti, B.J.: Workload profiles: a continuous measure of mental workload. Int. J. Ind. Ergon. 63, 49–64 (2018)
Sellers, J., Helton, W., Näswall, K., Funke, G., Knott, B.: Development of the team workload questionnaire (TWLQ). Proc. Hum. Factors Ergon. Soc. Ann. Meet. 58(1), 989–993 (2014)
Sheridan, T.B., Stassen, H.G.: Definitions, models and measures of human workload. In: Moray, N. (ed.) Mental Workload: Its Theory and Measurement, pp. 219–233. Plenum Press, New York (1979)
Smith, A.M., Borghetti, B.J., Rusnock, C.F.: Improving model cross-applicability for operator workload estimation. Proc. Hum. Factors Ergon. Soc. Ann. Meet. 59(1), 681–685 (2015)
Swanson, D.A., Tayman, J., Bryan, T.M.: MAPE-R: a rescaled measure of accuracy for cross-sectional subnational population forecasts. J. Popul. Res. 28(2), 225–243 (2011). https://doi.org/10.1007/s12546-011-9054-5
Zhang, Y., Owechko, Y., Zhang, J.: Driver cognitive workload estimation: a data-driven perspective. In: Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (IEEE Cat. No.04TH8749), pp. 642–647 (2004)
Acknowledgment
This study was supported by Project No. JCKY2016206A001 and No. 6141B03020604.
© 2020 Springer Nature Switzerland AG
Zhang, Y., Zhang, Y., Cui, X., Li, Z., Liu, Y. (2020). Assessment of Mental Workload Using Physiological Measures with Random Forests in Maritime Teamwork. In: Harris, D., Li, WC. (eds) Engineering Psychology and Cognitive Ergonomics. Mental Workload, Human Physiology, and Human Energy. HCII 2020. Lecture Notes in Computer Science, vol 12186. Springer, Cham. https://doi.org/10.1007/978-3-030-49044-7_10