Cognitive Effort Measures Driven by Fixation Induced Retinal Flow in Visual Scanning Behavior during Virtual Driving

Runlin Zhang, Qing Xu
College of Intelligence and Computing
Tianjin University
Tianjin
{runlin, qingxu}@tju.edu.cn
Simon Parkinson
School of Computing and Engineering
University of Huddersfield
Huddersfield
S.Parkinson@hud.ac.uk
Klaus Schoeffmann
Institute of Information Technology
Alpen-Adria Universitat Klagenfurt
Klagenfurt
ks@itec.aau.at
Yu Chen
School of Foreign Languages
Southeast University
Nanjing
Abstract

In this paper, we consider the visual scanning mechanism underpinning sensorimotor tasks, such as walking and driving, in dynamic environments. We exploit eye tracking data to offer two new cognitive effort measures for visual scanning behavior in virtual driving. Using the retinal flow induced by fixation, the two measures are defined through the importance of grids in the viewing plane and through the concept of information quantity, respectively. In psychophysical studies, the two proposed measures show significant correlations with widely used objective measurements of cognitive effort. Our results suggest that the quantitative exploitation of eye tracking data provides an effective approach for the evaluation of sensorimotor activities.

Keywords: Virtual/Augmented Reality · Information Theory · Eye Tracking

1 Introduction

Visual scanning and eye tracking are important to human life in natural surroundings [1, 2]. Visual scanning is the foundation on which humans perform everyday sensorimotor tasks, such as walking and driving. Understanding the mechanism behind visual scanning has been regarded as valuable since the late 1970s, and it supports progress from both theoretical and practical perspectives [3].

A fundamental question in understanding the visual scanning mechanism is how much visual information a human can sample through visual scanning [3]. Cognitive effort is an important concept for comprehending cognitive control and visual scanning behavior [4]. It describes the subjective engagement used to assess a human's internal state during tasks [5]. Cognitive effort therefore plays an important role in visual scanning and visuo-motor behavior, but how to measure it, and in particular how to assess it objectively, has been a major concern in both theory and practice [4, 6].

Visual motion, which always occurs during the visual scanning behavior of sensorimotor tasks [7], has rarely been considered in cognitive effort measures. In this paper, the importance of grids in the viewing plane is developed from the so-called fixation induced retinal flow [8], a quantitative description of the visual motion in the interactive visual perception of environmental stimuli. A cognitive effort measure, called the view importance based cognitive effort measure ($CEM_{VI}$), is proposed by employing the Shannon entropy based complexity [9] of the probability distribution of the view importance of grids. Also based on the fixation induced retinal flow, the amount of perception of the visual motions during sensorimotor tasks is obtained using the square root of Jensen-Shannon divergence, which is a true mathematical metric [10]. Then, in terms of the concept of information quantity [9], the perceived amount of visual motions is transformed to define the information quantity based cognitive effort measure ($CEM_{IQ}$), which is used to understand the cognitive status of humans during driving tasks. To the best of our knowledge, this is the first time that visual motion and a true mathematical metric have been exploited to define quantitative and objective evaluations of the classical, subjective notion of cognitive effort. Our proposal opens a new path for behaviometric discovery through the use of eye tracking data.

2 Related Works

An initial consideration in measuring visual scanning behavior is the selection of suitable eye tracking indices [2] that are naturally associated with cognitive processing. Fixation and saccade [2] are classic indices for this purpose. However, direct and indirect uses of these indices (for example, their rates, durations, and simple combinations) are more applicable in specific application scenarios than in the general evaluation of visual scanning behavior [11]. In addition, pupil dilation and blink rate are two widely used eye tracking indices in the study of cognition and psychology [12]. Notably, eye tracking has been expected to serve as a strong estimator of task performance in many professional fields [13].

Knowledge of cognitive effort plays a significant role in all kinds of cognitive processing [5, 6, 14]. Cognitive effort, a notion that originated in educational psychology, is a classic measure of subjective engagement [5]. From the perspective of cost-benefit decision-making, cognitive effort is regarded as the amplitude or intensity of behavior in exercising cognitive control to accomplish tasks [14]. The assessment of cognitive effort, which is used to estimate a human's internal state, has been strongly encouraged in the area of ergonomics and human factors [6]. The measurement of cognitive effort in the domain of visual scanning and visuo-motor behavior has also attracted considerable attention in both theory and practice [4]. However, visual motion, which usually appears during sensorimotor activities, has not been used in the measurement of cognitive effort. In addition, evaluating cognitive effort, particularly in an objective way, remains a big challenge [6].

The visual motion perceived during a fixation in a sensorimotor task, which is called the fixation induced retinal flow in this paper, has been introduced in the literature on visual scanning and visuo-motor behavior based on the concepts of eye tracking and optical flow [8]. Because the fixation induced retinal flow is the fundamental basis of the two proposed measures, its methodology is described separately in Section 3.1.

Additionally, previous research has demonstrated numerous eye movement measures related to cognitive effort in humans during tasks. For example, when a driver's cognitive load increases, the driver's periphery/mirror/instrument check rate (hereafter referred to as check rate) tends to decrease [15]. Stationary Gaze Entropy ($SGE$) is used to measure the level of fixation dispersion during the eye scanning process [3]. Shiferaw et al. found that during driving, if the driver is in an abnormal state such as hungover or fatigued, their $SGE$ shows significant changes [3]. Entropy rate, a concept in information theory, describes the rate at which a random process generates information. In more specific applications, it can quantify the uncertainty and complexity of a signal or data sequence. It has also been confirmed to change with variations in cognitive load [16].

3 Methods

3.1 The Retinal Flow induced by Fixation and Visual Scanning Efficiency

The retinal flow induced by a fixation is introduced based on the identification of the visual motion resulting from a fixated stimulus [8].

Figure 1: An illustration of the retinal flow $I_n$ induced by fixation $f_n$.

A fixation $f_n$ (with index $n$) of duration $\tau_n$ made by an observer $O$ in a 3D environment is shown in Fig. 1. Here, $\rho_n$ is the distance between $O$ and $f_n$ in the direction of viewing, that is, the depth of $f_n$ from the perspective of $O$. During $\tau_n$, a relative motion displacement $\vec{u_n}$ happens to $f_n$, and $\vec{\upsilon_n}$ is taken as the optical flow vector [7] for the stimulus fixated by $f_n$. In fact, $\vec{\upsilon_n}$ is the projection of $\vec{u_n}$ onto the direction of the optical flow vector, and is thus perpendicular to the direction of viewing. For computational simplicity, the length $m_n$ of the trajectory segment of the circular motion of $f_n$ centered on $O$ acts as an approximate magnitude of $\vec{\upsilon_n}$. The central angle $I_n$ subtended by $m_n$,

$$I_n = \dfrac{m_n}{\rho_n}, \tag{1}$$

is further used to define a perceived magnitude of optical flow, which characterizes the amount of visual motion perceived by the observer during the fixation $f_n$. $I_n$ is called the fixation induced retinal flow in this paper, because this quantity represents the amount of optical flow perceived in the course of a fixation. The definition of $I_n$ follows the usual practice in eye tracking of using an angle to represent a magnitude or amount [2]. Note that $I_n$ explicitly uses the depth cue $\rho_n$, enabling the fixation induced retinal flow to convey this important and special cue in 3D environments.
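As a minimal sketch of Eq. (1), the following Python snippet computes the fixation induced retinal flow from an arc length and a depth. The function name `retinal_flow` and the numeric values are illustrative assumptions and do not come from the paper; the only relation used is $I_n = m_n / \rho_n$, with the result in radians.

```python
import math


def retinal_flow(arc_length: float, depth: float) -> float:
    """Fixation induced retinal flow I_n = m_n / rho_n, in radians (Eq. 1).

    arc_length: m_n, length of the circular trajectory segment of the fixation.
    depth:      rho_n, distance from the observer to the fixation along the
                viewing direction (must be positive).
    """
    if depth <= 0:
        raise ValueError("depth rho_n must be positive")
    return arc_length / depth


# Hypothetical values: the same 0.5 m arc seen at two different depths.
near = retinal_flow(0.5, 5.0)
far = retinal_flow(0.5, 20.0)

# The nearer stimulus subtends the larger angle, so it yields more retinal flow.
print(f"near: {math.degrees(near):.2f} deg, far: {math.degrees(far):.2f} deg")
```

The example illustrates why the depth cue matters: an identical displacement produces a smaller perceived flow when the fixated stimulus is farther away.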

Visual scanning is done by a performer through a sequence of fixations, so as to meet the requirement of visually sampling the surrounding environment for the fulfillment of a sensorimotor task. The probability distribution of fixations $P_f$, namely the fixation distribution, is built from the normalized histogram of fixation locations in a 3D environment and is used as a representation of the visual scanning behavior. The fixation induced retinal flow is used to construct the retinal flow probability distribution $P_r$. Previous work measured the difference between $P_f$ and $P_r$ by the Square Root of Jensen-Shannon Divergence (SRJSD) between them,

$$SRJSD(P_f \| P_r) = \sqrt{JSD(P_f \| P_r)}, \tag{2}$$

for assessing the so-called visual scanning efficiency [8]. $SRJSD(P_f \| P_r)$ plays a basic role in understanding the mechanism of visual scanning behavior.
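The computation in Eq. (2) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the region labels, the sample values, and the helper names (`normalize`, `kl_bits`, `srjsd`) are hypothetical, and $P_r$ is assumed here to be obtained by accumulating the retinal flow per region and normalizing, which the text does not spell out. The base-2 logarithm is used so that the result lies in $[0, 1]$.

```python
import math
from collections import Counter


def normalize(weights):
    """Turn non-negative weights per region into a probability distribution."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {key: value / total for key, value in weights.items()}


def kl_bits(p, q):
    """Kullback-Leibler divergence sum p*log2(p/q), skipping zero-probability terms of p."""
    return sum(pv * math.log2(pv / q[key]) for key, pv in p.items() if pv > 0)


def srjsd(p, q):
    """Square root of the Jensen-Shannon divergence (base 2) between p and q."""
    keys = set(p) | set(q)
    p_full = {k: p.get(k, 0.0) for k in keys}
    q_full = {k: q.get(k, 0.0) for k in keys}
    mix = {k: 0.5 * (p_full[k] + q_full[k]) for k in keys}
    jsd = 0.5 * kl_bits(p_full, mix) + 0.5 * kl_bits(q_full, mix)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Hypothetical fixations: (region label, fixation induced retinal flow I_n).
fixations = [("road", 0.10), ("road", 0.05), ("mirror", 0.30), ("dash", 0.05)]

# P_f: normalized histogram of fixation locations.
p_f = normalize(Counter(region for region, _ in fixations))

# P_r (assumption): retinal flow accumulated per region, then normalized.
flow_per_region = Counter()
for region, flow in fixations:
    flow_per_region[region] += flow
p_r = normalize(flow_per_region)

print(srjsd(p_f, p_r))
```

Identical distributions give 0 and distributions with disjoint support give 1, which is the range relied on later when the value is interpreted as a probability.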

3.2 The Proposed Cognitive Effort Measure Based on the View Importance

During a sensorimotor task such as driving, the driver, for the purpose of safety and stability, usually distributes attention unevenly over regions in the viewing plane. For example, if the central viewing regions are dominated by the road, central and peripheral viewing regions receive large and small attention and/or importance, respectively, to achieve stable driving. That is to say, the importance of stimulus observation is an important factor in performing driving tasks, considering that a stimulus in the 3D environment corresponds to at least one region in the viewing plane.

In this paper, the amount of the perceived visual motion resulting from a fixated stimulus, which is characterized as the fixation induced retinal flow, is utilized to define

$$J_n = \dfrac{I_n}{|\vec{u_n}|} \tag{3}$$

as the importance of stimulus observation. Notice that the observation importance $J_n$ takes into consideration the motion displacement $|\vec{u_n}|$ of a fixation during its duration, to explicitly signal the influence of the eye yaw rotation on the observation of the fixated stimulus. This means that when $J_n$ is larger, the fixation is easier to obtain, and conversely, when $J_n$ is smaller, the fixation is more challenging to obtain. That is, from the perspective of stimulus observation, $J_n$ indicates how much effort should be exerted to observe and perceive a stimulus. From the viewpoint of the stimulus itself, $J_n$ offers a measurement of its importance for observation and perception.

Then, the view importance of a region in the viewing plane is obtained by accumulating all the values of the observation importance of the stimuli corresponding to this region. A region corresponds to a single element of $N_g \times N_g$ grids in the viewing plane (currently, $N_g = 5$ achieves good results in this paper; other choices of $N_g$ are left for future work). The normalized histogram of the view importance values of grids is created to obtain a probability distribution $P_g = \{p_g(j) \mid j = 1, \cdots, N_g\}$ of the view importance of grids. The Shannon entropy

$$CEM_{VI} = -\sum_{j=1}^{N_g} p_g(j) \log_2 p_g(j) \tag{4}$$

of this probability distribution is proposed to evaluate the degree of balance in visually scanning the various grids in the viewing plane, leading to the view importance based cognitive effort measure ($CEM_{VI}$) in our paper. This entropy based complexity evaluation of the view importance of grids takes into account the non-trivial interaction between the observations on various grids. As a result, the proposed $CEM_{VI}$ indicates the degree of a systematic perception of all the visual motions during sensorimotor driving. It also suggests an intensity or amplitude of the visual scanning behavior and thus behaves as an assessment function for cognitive effort.
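The steps leading from Eq. (3) to Eq. (4) can be sketched in Python as below. Several points are assumptions made for illustration only: fixation positions are taken to be normalized viewing-plane coordinates in $[0, 1]$, fixations with zero displacement are skipped because $J_n$ is undefined for them, the entropy is summed over every cell of the $N_g \times N_g$ grid, and the function name `cem_vi` and the sample data are hypothetical.

```python
import math


def cem_vi(fixations, n_g=5):
    """View importance based cognitive effort measure (sketch of Eqs. 3 and 4).

    fixations: iterable of (x, y, retinal_flow, displacement) where x and y are
               assumed to be normalized viewing-plane coordinates in [0, 1],
               retinal_flow is I_n and displacement is |u_n|.
    n_g:       number of grid cells per side of the viewing plane.
    """
    importance = [0.0] * (n_g * n_g)
    for x, y, flow, displacement in fixations:
        if displacement <= 0:
            continue  # J_n = I_n / |u_n| is undefined here; skipped by assumption
        col = min(int(x * n_g), n_g - 1)  # clamp x == 1.0 into the last column
        row = min(int(y * n_g), n_g - 1)
        importance[row * n_g + col] += flow / displacement  # accumulate J_n

    total = sum(importance)
    if total <= 0:
        raise ValueError("no usable fixations")
    probabilities = [value / total for value in importance]

    # Shannon entropy in bits; empty cells contribute nothing (0 * log 0 := 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical trial: scanning concentrated at the center versus spread out.
focused = [(0.50, 0.50, 0.2, 1.0), (0.52, 0.48, 0.1, 1.0)]
spread = [(0.1, 0.1, 0.2, 1.0), (0.9, 0.9, 0.2, 1.0), (0.5, 0.5, 0.2, 1.0)]
print(cem_vi(focused), cem_vi(spread))
```

Under these assumptions, a trial whose importance falls into a single grid yields zero entropy, while importance spread evenly over more grids yields a larger value.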

3.3 The Information Quantity of Perceived Visual Motion and the Corresponding Proposed Cognitive Effort Measure

The developed measure of the amount of perceived visual motions, $SRJSD(P_f \| P_r)$, can be studied from the perspective of probability and information theory [9]. Because it ranges from 0 to 1 [10], $SRJSD(P_f \| P_r)$ can be considered as the probability $p$ that an event occurs. As a result, $SRJSD(P_f \| P_r)$ gives a probability of perception of the visual motions. According to information theory, the negative logarithm of the probability of occurrence ($-\log_2 p$) represents the quantity of information conveyed by the occurrence [9]. The information quantity $-\log_2 SRJSD(P_f \| P_r)$ therefore indicates the quantified amount of perception of all the visual motions during a sensorimotor task. Notably, the amplitude of the perception of visual motions and of the completion of sensorimotor tasks reflects the meaning of cognitive effort [14]. Thus, the logarithmic transformation of the perceived visual motion is taken as the core function for the proposed information quantity based cognitive effort measure ($CEM_{IQ}$).

Because the distributions of fixations and fixation transitions reflect different aspects of visual scanning, a combination of these two distributions should provide a more complete understanding of visual scanning behavior, as has been pointed out in relevant work [3]. Indeed, the performer of visual scanning voluntarily exerts some cognitive effort to make a unidirectional switch between two neighboring fixations, and the visual motion induced by the first fixation, as perceived/cognized in the procedure of completing one fixation transition, measures this effort. Therefore, we propose to utilize the fixation transition distribution $P_{fs}$ and the retinal flow distribution $P_{rs}$ based on fixation transition to obtain the definition of $CEM_{IQ}$. The approach to obtaining $P_{fs}$ is similar to that of the fixation distribution $P_f$, but here the fixation sequence is employed. Correspondingly, $P_{rs}$ can be easily obtained based on $P_{fs}$. $-\log_2 SRJSD(P_{fs} \| P_{rs})$ gives an information quantity for the perceived visual motion during a single fixation transition. We define $CEM_{IQ}$ by the division between these two information quantities, $-\log_2 SRJSD(P_f \| P_r)$ and $-\log_2 SRJSD(P_{fs} \| P_{rs})$, as

$$CEM_{IQ} = \dfrac{-\log_2 SRJSD(P_f \| P_r)}{-\log_2 SRJSD(P_{fs} \| P_{rs})}, \tag{5}$$

for characterizing a cognitive effort during a sensorimotor task.
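A minimal Python sketch of Eq. (5) is given below. It takes the two SRJSD values as inputs, since the text does not detail how $P_{fs}$ and $P_{rs}$ are constructed. The function names and the numeric inputs are hypothetical, and the handling of the boundary values (an SRJSD of 0 or a transition SRJSD of 1) is an assumption of this sketch, not something the paper specifies.

```python
import math


def information_quantity(srjsd_value: float) -> float:
    """Information quantity -log2(p), with an SRJSD value in (0, 1] treated as p."""
    if not 0.0 < srjsd_value <= 1.0:
        raise ValueError("SRJSD must lie in (0, 1] for a finite information quantity")
    return -math.log2(srjsd_value)


def cem_iq(srjsd_fixation: float, srjsd_transition: float) -> float:
    """Information quantity based cognitive effort measure (sketch of Eq. 5).

    srjsd_fixation:   SRJSD(P_f || P_r), from fixation-based distributions.
    srjsd_transition: SRJSD(P_fs || P_rs), from transition-based distributions.
    """
    denominator = information_quantity(srjsd_transition)
    if denominator == 0.0:
        # A transition SRJSD of exactly 1 carries zero information; Eq. (5)
        # is undefined in that case, so this sketch refuses to divide.
        raise ZeroDivisionError("transition information quantity is zero")
    return information_quantity(srjsd_fixation) / denominator


# Hypothetical SRJSD values chosen as powers of two for easy checking:
# -log2(0.25) = 2 bits and -log2(0.5) = 1 bit, so the ratio is 2.
print(cem_iq(0.25, 0.5))
```

Because both terms are logarithms of values in $(0, 1]$, the ratio is non-negative whenever it is defined.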

Figure 2: A participant is performing a driving task in virtual reality.

4 Experiment

4.1 Participants

Fourteen Master's/PhD students (5 females; age range: 21-29, Mean = 21.3, SD = 2.37) with driving experience (they have held a driver's license for at least one and a half years) from our university volunteered to participate in the psychophysical studies. All participants have normal or corrected-to-normal visual acuity and normal color vision. No participant had an adverse reaction to the virtual environment set up for the studies.

4.2 Apparatus

An HTC Vive headset is used to display the virtual environment (VE) to participants. The eye-tracking equipment is a 7INVENSUN Instrument aGlass DKII, which is embedded into the HTC Vive display to capture visual scanning data at a frequency of 90 Hz with a gaze position accuracy of $0.5^{\circ}$. The driving device is a Logitech G29 steering wheel. Participants listen to the ambient traffic and car engine sounds in the VE through speakers. The visual and driving behaviors of participants are displayed on a desktop monitor for observation.

4.3 Driving Task

As discussed in Section 4.1, a task that directs focus to visual scanning and driving is adopted. Accordingly, participants are required to keep driving at a target speed of 40 km/h, for speed control. This paper takes the inverse of the mean acceleration of the vehicle to denote the driving performance. The smaller the mean acceleration, the higher the driving performance, and vice versa. This kind of performance measure has been widely used in the literature [17]. An example of performing driving tasks is presented in Fig. 2.
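The performance measure described above can be sketched in Python as follows. This is only an illustration under assumptions that the text does not state: acceleration is estimated by finite differences of uniformly sampled speed, its magnitude is averaged so that speeding up and slowing down do not cancel, and a perfectly constant speed is mapped to infinity. The function name and the speed traces are hypothetical.

```python
def driving_performance(speeds_kmh, dt_s):
    """Inverse of the mean acceleration magnitude over a trial (illustrative sketch).

    speeds_kmh: vehicle speed samples in km/h, assumed uniformly spaced in time.
    dt_s:       sampling interval in seconds.
    """
    if len(speeds_kmh) < 2 or dt_s <= 0:
        raise ValueError("need at least two samples and a positive interval")
    speeds_ms = [v / 3.6 for v in speeds_kmh]  # km/h -> m/s
    accelerations = [
        abs(later - earlier) / dt_s
        for earlier, later in zip(speeds_ms, speeds_ms[1:])
    ]
    mean_acceleration = sum(accelerations) / len(accelerations)
    if mean_acceleration == 0:
        return float("inf")  # constant speed: no acceleration at all
    return 1.0 / mean_acceleration


# Hypothetical one-second samples around the 40 km/h target speed.
steady = driving_performance([40.0, 40.5, 40.0, 39.5, 40.0], dt_s=1.0)
jerky = driving_performance([40.0, 44.0, 36.0, 45.0, 40.0], dt_s=1.0)
print(steady, jerky)
```

With these traces, the driver who holds the target speed more smoothly receives the higher score, matching the stated relation between mean acceleration and performance.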

4.4 Procedure

Each participant completes 4 test sessions with the same task requirements and the same driving routes, with a 9-point calibration of the eye tracker at the beginning of each session and an interval of one week between every two sessions. Data on visual scanning and driving behaviors are recorded during the test sessions. In this paper, a trial represents a test session, and there are $14 \times 4 = 56$ valid trials in all, which is a sufficiently large sample size [18]. Before each test session, participants take a preparation session that familiarizes them with the purpose and procedure of the studies.

Table 1: Correlation between the proposed measures and pupil size change/fixation rate. Each cell gives the correlation coefficient (CC) followed by its p-value.

| Measure | Pupil size change: Pearson | Pupil size change: Kendall | Pupil size change: Spearman | Fixation rate: Pearson | Fixation rate: Kendall | Fixation rate: Spearman |
|---|---|---|---|---|---|---|
| $CEM_{VI}$ | 0.38 (p < 0.01) | 0.27 (p < 0.01) | 0.39 (p < 0.01) | -0.19 (p > 0.05) | -0.04 (p > 0.05) | -0.04 (p > 0.05) |
| $CEM_{IQ}$ | 0.27 (p < 0.05) | 0.20 (p < 0.05) | 0.27 (p < 0.05) | -0.46 (p < 0.001) | -0.23 (p < 0.05) | -0.36 (p < 0.01) |
| Check rate | 0.15 (p > 0.05) | 0.14 (p > 0.05) | 0.21 (p > 0.05) | 0.35 (p < 0.01) | 0.32 (p < 0.01) | 0.46 (p < 0.01) |
| $SGE$ | 0.03 (p > 0.05) | 0.02 (p > 0.05) | 0.05 (p > 0.05) | 0.01 (p > 0.05) | -0.02 (p > 0.05) | -0.04 (p > 0.05) |
| Entropy rate | -0.02 (p > 0.05) | 0.01 (p > 0.05) | 0.03 (p > 0.05) | -0.07 (p > 0.05) | -0.10 (p > 0.05) | -0.18 (p > 0.05) |

5 Results and Discussions

5.1 Correlation Results

Pupil size change is widely used in the literature as a measure of cognitive effort [12, 19] and has been accepted as an autonomic and reflexive measure of it. In this paper, the standard deviation of pupil size [20, 4] during each trial is used to represent the pupil size change because of its simplicity and effectiveness. The fixation rate is also used to measure cognitive effort [21], because corresponding studies have indicated that pupil size is influenced by factors other than cognitive effort [22]. Cognitive load affects both the human pupillary response and fixation based eye movements. Therefore, both pupil size change and fixation rate are taken in this paper as the quantitative ground truth for cognitive effort. Three classic quantitative measures of cognitive effort, check rate [15], $SGE$ [3] and entropy rate [16], are used for comparison to evaluate the effectiveness of our proposed measures.
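The two reference quantities can be sketched per trial as follows. The text states only that the standard deviation of pupil size over a trial is used; the choice of the sample standard deviation, the definition of fixation rate as fixations per second, and all names and values below are assumptions made for illustration.

```python
import statistics


def pupil_size_change(pupil_samples):
    """Standard deviation of the pupil size over one trial.

    The sample (n - 1) standard deviation is used here by assumption.
    """
    return statistics.stdev(pupil_samples)


def fixation_rate(n_fixations, trial_duration_s):
    """Fixations per second over one trial (an assumed definition)."""
    if trial_duration_s <= 0:
        raise ValueError("trial duration must be positive")
    return n_fixations / trial_duration_s


# Hypothetical trial: pupil diameters in millimetres and a 60 s recording.
pupil_mm = [3.1, 3.4, 3.0, 3.6, 3.3]
print(pupil_size_change(pupil_mm), fixation_rate(150, 60.0))
```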

We validate the relationship between the two proposed measures and pupil size change/fixation rate through three commonly used correlation coefficients: the Pearson Linear Correlation Coefficient (PLCC), the Kendall Rank Order Correlation Coefficient (KROCC) and the Spearman Rank Order Correlation Coefficient (SROCC). The correlation results are listed in Table 1. We find that $CEM_{IQ}$ shows a significant correlation with both pupil size change and fixation rate, while $CEM_{VI}$ exhibits a significant correlation with pupil size change only. The check rate is not related to pupil size change, yet it has a significant correlation with fixation rate. $SGE$ and entropy rate are correlated with neither pupil size change nor fixation rate. The correlations between the eye movement measures and pupil size change/fixation rate are also shown in Fig. 3.
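For readers who want to reproduce this kind of analysis, the three coefficients can be sketched in plain Python as below. This is a textbook formulation written for illustration (Kendall's coefficient in its tau-b form, Spearman's as Pearson on average ranks); it does not compute the p-values reported in Table 1, it assumes both inputs have non-zero variance, and the per-trial values are hypothetical rather than data from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b), counting pairs in O(n^2)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from tau-b
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + only_x_tied) * (untied + only_y_tied)
    )


# Hypothetical per-trial values of a measure and of a reference quantity.
measure = [1, 2, 3, 4, 5]
reference = [1, 3, 2, 5, 4]
print(pearson(measure, reference), kendall_tau_b(measure, reference),
      spearman(measure, reference))
```

In practice a statistics library would normally be preferred, since it also returns the significance level of each coefficient.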

Figure 3: Correlation coefficients of eye movement measures with pupil size change/fixation rate (*: $p < 0.05$, **: $p < 0.01$, ***: $p < 0.001$)

5.2 General Discussions

The proposed cognitive effort measures are based on the methodology of information theory, through taking advantage of the perceived visual motion always happening in dynamic environments during sensorimotor tasks. In fact, our proposal actually satisfies the definition of cognitive effort in terms of information processing [14]. The significant positive correlation between CEMVI𝐶𝐸subscript𝑀𝑉𝐼CEM_{VI}italic_C italic_E italic_M start_POSTSUBSCRIPT italic_V italic_I end_POSTSUBSCRIPT and pupil size change suggests the principle that the more chaotic and varied the distribution of the importance of different areas in the driver’s viewing plane, the higher the corresponding cognitive effort on the driver. Indeed CEMVI𝐶𝐸subscript𝑀𝑉𝐼CEM_{VI}italic_C italic_E italic_M start_POSTSUBSCRIPT italic_V italic_I end_POSTSUBSCRIPT is designed based on this principle to assess the driver’s cognitive load. Overall, CEMIQ𝐶𝐸subscript𝑀𝐼𝑄CEM_{IQ}italic_C italic_E italic_M start_POSTSUBSCRIPT italic_I italic_Q end_POSTSUBSCRIPT achieves the best in measuring cognitive effort, from both pupillary and fixation perspectives. From the viewpoints of both pupil size and fixation, a consistent conclusion can be drawn: the larger the CEMIQ𝐶𝐸subscript𝑀𝐼𝑄CEM_{IQ}italic_C italic_E italic_M start_POSTSUBSCRIPT italic_I italic_Q end_POSTSUBSCRIPT, the greater the cognitive effort. The significant correlation between CEMIQ𝐶𝐸subscript𝑀𝐼𝑄CEM_{IQ}italic_C italic_E italic_M start_POSTSUBSCRIPT italic_I italic_Q end_POSTSUBSCRIPT and the ground truth of cognitive effort demonstrates that the higher the proportion of perceived visual motion information among all perceived potential eye movement changes, the higher the cognitive effort on the driver. The achievement by CEMIQ𝐶𝐸subscript𝑀𝐼𝑄CEM_{IQ}italic_C italic_E italic_M start_POSTSUBSCRIPT italic_I italic_Q end_POSTSUBSCRIPT is also evidenced by a comparison between it with other classic measurements of cognitive effort based on eye movement. 
Among the three measures under comparison, only the check rate has a significant correlation with the fixation rate. We believe this is because the check rate itself is specifically related to the fixation distribution for driving. In short, we consider the proposed $CEM_{IQ}$ to be the most robust: it is applicable in a broader range of scenarios and potentially yields a more accurate measurement of cognitive effort. Meanwhile, we believe that defining cognitive effort in terms of information quantity and of “physics” is worthwhile, and we will continue to explore this avenue in depth.
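
To make the notion of a “chaotic and varied” importance distribution concrete, the following minimal sketch scores a distribution of grid importance with Shannon entropy [9]. This is an illustrative assumption only: the function name `shannon_entropy` and the two example grids are hypothetical, and the sketch is not the definition of $CEM_{VI}$, which is given in the earlier sections of this paper.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, after normalization.

    Illustrative stand-in for "how spread out" grid importance is;
    NOT the paper's definition of CEM_VI.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical importance values for a 2x2 grid of the viewing plane.
concentrated = [0.97, 0.01, 0.01, 0.01]  # one dominant grid cell
spread_out = [0.25, 0.25, 0.25, 0.25]    # importance evenly spread

print(shannon_entropy(concentrated))
print(shannon_entropy(spread_out))
```

Under this assumption, the evenly spread grid receives the maximum score for four cells (2 bits), while the concentrated grid scores much lower, mirroring the direction of the principle stated above.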

Having made progress in exploiting eye tracking data, as a behaviometric, for evaluating cognitive effort, we will investigate the relationship between the two proposed measures and the performance of sensorimotor tasks in future work. The Yerkes-Dodson law [23] could offer a working path for this investigation.

Note that the findings of this paper may not apply to all cases, but they do hold in the context of our topic. Because visual scanning and visuo-motor behavior are exceptionally important in virtual and real-world sensorimotor tasks, what we have achieved on the measurement of cognitive effort in virtual driving should be helpful for ergonomic evaluation in many practical and relevant applications.

6 Conclusions and Future Work

In this paper, we take an important step toward a thorough understanding of the mechanism of visual scanning in virtual driving. We have established, in an objective and quantitative way, two new measures of subjective cognitive effort, mainly by utilizing information-theoretic tools. Our proposal is built on a methodology that exploits the perceived visual motion in a sensorimotor task. To the best of our knowledge, no research so far has reported this kind of finding to shed light on the measurement of cognitive effort for visual scanning behavior during sensorimotor tasks. Additionally, the proposed cognitive effort measures may offer a new perspective on the inherent relationship between task-directed visual scanning and eye tracking data, and thereby help the development of behaviometric discovery, from both theoretical and practical perspectives.

In the near future, we will investigate our proposed methodology and measures in real-life driving scenarios, for instance, for the crash risk problem [24]. Considering the critical role of illumination conditions in driving, we will manipulate illumination levels in a detailed quantitative way, to comprehensively understand the mechanism of cognitive effort during visual scanning behavior. Also, physiological signals such as heartbeat [25] will be investigated to understand the relationships and interplay between these signals and eye tracking data with respect to cognitive effort.

References

  • [1] Shreya Ghosh, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe, and Qiang Ji. Automatic gaze analysis: A survey of deep learning based approaches. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):61–84, 2023.
  • [2] Kenneth Holmqvist, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka, and Joost Van de Weijer. Eye tracking: A comprehensive guide to methods and measures. OUP Oxford, 2011.
  • [3] Brook Shiferaw, Luke Downey, and David Crewther. A review of gaze entropy as a measure of visual scanning efficiency. Neuroscience & Biobehavioral Reviews, 96:353–366, 2019.
  • [4] Pauline van der Wel and Henk Van Steenbergen. Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review, 25:2005–2015, 2018.
  • [5] Andrew Westbrook and Todd S Braver. Cognitive effort: A neuroeconomic approach. Cognitive, Affective, & Behavioral Neuroscience, 15:395–415, 2015.
  • [6] Luca Longo, Christopher D Wickens, Gabriella Hancock, and Peter A Hancock. Human mental workload: A survey and a novel inclusive definition. Frontiers in Psychology, 13:883321, 2022.
  • [7] S Negahdaripour. Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(9):961–979, 1998.
  • [8] Zezhong Lv, Qing Xu, Klaus Schoeffmann, and Simon Parkinson. A Jensen-Shannon divergence driven metric of visual scanning efficiency indicates performance of virtual driving. In 2021 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2021.
  • [9] Thomas M Cover and Joy A Thomas. Elements of information theory. John Wiley & Sons, 2012.
  • [10] J Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.
  • [11] Heejin Jeong, Ziho Kang, and Yili Liu. Driver glance behaviors and scanning patterns: Applying static and dynamic glance measures to the analysis of curve driving with secondary tasks. Human Factors and Ergonomics in Manufacturing & Service Industries, 29(6):437–446, 2019.
  • [12] Maria K Eckstein, Belén Guerra-Carrillo, Alison T Miller Singley, and Silvia A Bunge. Beyond eye gaze: What else can eyetracking reveal about cognition and cognitive development. Developmental Cognitive Neuroscience, 25:69–91, 2017.
  • [13] Amie C Hayley, Brook Shiferaw, and Luke A Downey. Amphetamine-induced alteration to gaze parameters: A novel conceptual pathway and implications for naturalistic behavior. Progress in Neurobiology, 199:101929, 2021.
  • [14] Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L Griffiths, Jonathan D Cohen, and Matthew M Botvinick. Toward a rational and mechanistic account of mental effort. Annual Review of Neuroscience, 40:99–124, 2017.
  • [15] Dengbo He, Ziquan Wang, Elias B Khalil, Birsen Donmez, Guangkai Qiao, and Shekhar Kumar. Classification of driver cognitive load: Exploring the benefits of fusing eye-tracking and physiological measures. Transportation Research Record, 2676(10):670–681, 2022.
  • [16] Shurong Tong and Yafei Nie. Measuring designers’ cognitive load for timely knowledge push via eye tracking. International Journal of Human–Computer Interaction, 39(6):1230–1243, 2023.
  • [17] Ankit Kumar Yadav and Nagendra R Velaga. Effect of alcohol use on accelerating and braking behaviors of drivers. Traffic Injury Prevention, 20(4):353–358, 2019.
  • [18] Erich Leo Lehmann. Elements of large-sample theory. Springer, 1999.
  • [19] Siddhartha Joshi, Yin Li, Rishi M Kalwani, and Joshua I Gold. Relationships between pupil diameter and neuronal activity in the locus coeruleus, colliculi, and cingulate cortex. Neuron, 89(1):221–234, 2016.
  • [20] Siyuan Chen, Julien Epps, Natalie Ruiz, and Fang Chen. Eye activity as a measure of human mental effort in HCI. In Proceedings of the 16th International Conference on Intelligent User Interfaces, pages 315–318, 2011.
  • [21] Alexis D Souchet, Stéphanie Philippe, Domitile Lourdeaux, and Laure Leroy. Measuring visual fatigue and cognitive load via eye tracking while learning with virtual reality head-mounted displays: A review. International Journal of Human–Computer Interaction, 38(9):801–824, 2022.
  • [22] Bernhard Petersch and Kai Dierkes. Gaze-angle dependency of pupil-size measurements in head-mounted eye tracking. Behavior Research Methods, 54(2):763–779, 2022.
  • [23] Paul A Watters, F Martin, and Zoltan Schreter. Caffeine and cognitive performance: The nonlinear Yerkes-Dodson law. Human Psychopharmacology: Clinical and Experimental, 12(3):249–257, 1997.
  • [24] Trent Victor, Marco Dozza, Jonas Bärgman, et al. Analysis of naturalistic driving study data: Safer glances, driver inattention, and crash risk. Technical report, 2015.
  • [25] Alejandro Galvez-Pol, Ruth Mcconnell, and James M. Kilner. Active sampling in visual search is coupled to the cardiac cycle. Cognition, 196, 2020.