
TOWARDS GAZE-INDEPENDENT C-VEP BCI: A PILOT STUDY
S. Narayanan1, S. Ahmadi1, P. Desain1, J. Thielen1
1Donders Institute, Radboud University, Nijmegen, the Netherlands
E-mail: jordy.thielen@donders.ru.nl

ABSTRACT: A limitation of brain-computer interface (BCI) spellers is that they require the user to be able to move the eyes to fixate on targets. This poses an issue for users who cannot voluntarily control their eye movements, for instance, people living with late-stage amyotrophic lateral sclerosis (ALS). This pilot study makes the first step towards a gaze-independent speller based on the code-modulated visual evoked potential (c-VEP). Participants were presented with two bi-laterally located stimuli, one of which was flashing, and were tasked to attend to one of these stimuli either by directly looking at the stimuli (overt condition) or by using spatial attention, eliminating the need for eye movement (covert condition). The attended stimuli were decoded from electroencephalography (EEG) and classification accuracies of 88 % and 100 % were obtained for the covert and overt conditions, respectively. These fundamental insights show the promising feasibility of utilizing the c-VEP protocol for gaze-independent BCIs that use covert spatial attention when both stimuli flash simultaneously.

INTRODUCTION

A brain-computer interface (BCI) records its users’ brain activity and translates it into a computer command, opening a novel non-muscular channel for communication and control [wolpaw2002]. Typically, a BCI records brain activity with electroencephalography (EEG) because it is affordable, practical, and non-invasive.

One of the fastest BCIs for communication uses the code-modulated visual evoked potential (c-VEP) as measured in the EEG [martnezcagigal2021]. The c-VEP is observed during visual stimulation of the user with a pseudo-random sequence of flashes. As each of the presented symbols concurrently flickers with a random but unique sequence of flashes, specific brain activity is evoked when the user attends to one of the symbols. Subsequently, machine learning algorithms infer the attended symbol from the users’ evoked brain activity. Such a visual BCI speller allows its user to select symbols or commands and as such communicate, bypassing most of the motor system [verbaarschot2021].

Unfortunately, an important limitation of a standard visual BCI speller is that users must shift their gaze towards (i.e., fixate on) a target symbol. Because BCI control is fully dependent on eye movements, this poses a major challenge and quickly renders the BCI uncontrollable for people who have lost voluntary control of their eye movements, e.g., people living with late-stage amyotrophic lateral sclerosis (ALS).

In the visual domain, several studies have attempted to develop a gaze-independent BCI. For instance, Blankertz and colleagues developed a BCI speller called the ‘Hex-o-Spell’ that used motor imagery (imagined right hand and right foot movement, i.e., $N=2$ classes) of the user to aid the selection of characters from six hexagonal fields [blankertz2006]. They reported a typing speed of 2.3–5 char/min and 4.6–7.6 char/min, for their two participants respectively. Interestingly, Treder and Blankertz showed that visual covert spatial attention can also be used to operate the ‘Hex-o-Spell’ and the ‘Matrix’ speller using the P300 event-related potential (ERP) [treder2010]. This covert ‘Hex-o-Spell’ outperformed the covert ‘Matrix’ speller, with a classification accuracy of 60 % ($N=36$ classes) and 40 % ($N=30$ classes), respectively.

Furthermore, work by Treder and colleagues compared the P300-based ‘Hex-o-Spell’, the ‘Cake Speller’, which is similar to the former, and a ‘Center Speller’, where unique geometric shapes with different colors were closely surrounded by characters, and presented centrally on the screen in a sequential fashion [treder2011a]. A classification accuracy of 91.3 %, 88.2 %, and 97.1 % was reported for the three spellers, respectively ($N=30$ classes). Similarly, Chen and colleagues [Chen2016] used an extension of the P300 oddball paradigm, namely, rapid serial visual presentation (RSVP). The authors used two versions: a colored circles paradigm (CCP), and a dummy face paradigm (DFP). The average performances obtained with the CCP and DFP paradigms were 51.6 % and 73.5 %, respectively.

Additionally, Treder and colleagues, in another instance, focused on using changes in alpha band activity induced by covert attention shifts to classify the direction in which attentional shifts occurred [treder2011b]. The authors showed that a classification accuracy of 73.65 % was obtained ($N=2$ classes). These results indicate the potential of using alpha activity as a feature for spatial attention decoding in gaze-independent BCIs.

Furthermore, Kelly and colleagues designed a gaze-independent BCI for communication by combining features from the steady-state visual evoked potential (SSVEP) and alpha band modulations to decode covert spatial attention [kelly2005]. The authors reported an average performance of 70.3 %, 72.8 % and 79.5 % when using the SSVEP, alpha band, or both features in their analysis pipeline, respectively ($N=2$ classes). Similarly, Egan and colleagues [egan2017] aimed for a hybrid gaze-independent speller using the P300 ERP and alpha in addition to the SSVEP. Importantly, adding the P300 response and alpha as additional features in their classification pipeline improved the performance by 17 % to an overall 79 % when compared to the performance using only the SSVEP, achieving 62 % ($N=2$ classes).

In this pilot study, we work towards a gaze-independent BCI. The gaze-dependent c-VEP has recently demonstrated exceptional performance, surpassing other evoked paradigms like ERP and SSVEP [shi2024]. Another study revealed the reliable decoding of c-VEP from peripheral stimulation (away from fixation) compared to direct foveal stimulation (at fixation) [waytowich2015].

Our objective is to acquire fundamental insights on the feasibility of decoding the c-VEP in a fully gaze-independent manner. Specifically, participants will use covert spatial attention to concentrate on stimuli, eliminating the need for direct eye movements to foveate on them. In this pilot work, the stimuli were presented sequentially, to assess whether the c-VEP can be decoded from the far periphery, before testing the more complex parallel stimulation case, where stimuli would be presented simultaneously. If successful, this study provides the first steps to a gaze-independent c-VEP BCI, potentially providing a high-speed neuro-technological assistive device for individuals who may not have reliable control of their eye movements.

MATERIALS AND METHODS

Participants: Five participants (all male, mean age 31 years, range 24-50 years) were included in this study after obtaining written informed consent. Two participants were authors of this study. A pre-screening procedure excluded any participants with a history of epilepsy or brain injury. All participants had normal or corrected-to-normal vision and reported no central nervous system abnormalities. This study was approved by the Ethical Committee of the Faculty of Social Sciences at the Radboud University Nijmegen.

Materials: EEG data from 64 Ag/AgCl active electrodes placed according to the international 10-10 system were recorded at 512 Hz using a Biosemi ActiveTwo amplifier. The data were preprocessed using a notch filter at 50 Hz and a bandpass filter with a lower cutoff at 1 Hz and a higher cutoff at 40 Hz. Subsequently, the data were sliced into trials starting 500 ms before stimulus onset and ending 20 s after stimulus onset. Finally, the data were downsampled to 120 Hz, and the 500 ms pre-stimulus period, which may have captured filter artefacts arising from slicing and filtering, was removed.
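The preprocessing chain described above can be sketched with SciPy as follows. This is only an illustrative sketch: the filter order, the notch quality factor, the use of zero-phase filtering, and the function name `preprocess` are assumptions that the text does not specify, and the steps simply follow the order in which they are listed. Because 512 Hz to 120 Hz is not an integer ratio, the sketch uses polyphase resampling with the rational factor 15/64.

```python
import numpy as np
from scipy.signal import butter, iirnotch, resample_poly, sosfiltfilt, tf2sos

FS_RAW = 512  # recording rate in Hz (from the text)
FS_OUT = 120  # rate after downsampling in Hz (from the text)


def preprocess(raw, onsets, fs=FS_RAW):
    """Filter, slice, and downsample continuous EEG.

    raw    : array (channels, samples), continuous recording
    onsets : stimulus-onset sample indices at the raw rate
    returns: array (trials, channels, 20 s * 120 Hz)
    """
    # 50 Hz notch; Q=30 is an assumed quality factor
    b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)
    x = sosfiltfilt(tf2sos(b, a), raw, axis=-1)

    # 1-40 Hz bandpass; 4th-order Butterworth is an assumed design
    sos = butter(4, [1.0, 40.0], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x, axis=-1)

    # Slice from 500 ms before onset until 20 s after onset
    pre, post = int(0.5 * fs), int(20 * fs)
    trials = np.stack([x[:, o - pre:o + post] for o in onsets])

    # 512 Hz * 15 / 64 = 120 Hz
    trials = resample_poly(trials, up=15, down=64, axis=-1)

    # Drop the 500 ms pre-stimulus period
    return trials[..., int(0.5 * FS_OUT):]


# Synthetic stand-in for a recording: 8 channels, 60 s of noise
rng = np.random.default_rng(0)
raw = rng.standard_normal((8, 60 * FS_RAW))
epochs = preprocess(raw, onsets=[1000, 12000])
```

With a 20 s trial at 120 Hz, each epoch holds 2400 samples per channel.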

The stimulus protocol (see Fig. 1) was displayed on a 27 in Corsair Xeneon 27QHD240 OLED screen at a $1920\times 1080$ px resolution with a 120 Hz refresh rate. The participants were seated at a 60 cm distance in front of the display. A black fixation cross was presented at the center of the screen on a mean luminance gray background. On each side of the fixation cross, at a distance of $2.1^{\circ}$, a circle with a $3^{\circ}$ diameter was presented.
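For readers who want to reproduce the layout, the visual angles can be converted to on-screen sizes from the viewing distance and the pixel density. The sketch below assumes a 16:9 panel whose full 27 in diagonal is used to show the $1920\times 1080$ px image; that assumption, and the helper names `size_px` and `offset_px`, are not taken from the text.

```python
import math

DISTANCE_CM = 60.0             # viewing distance (from the text)
DIAGONAL_CM = 27 * 2.54        # 27 in diagonal (from the text)
RES_X, RES_Y = 1920, 1080      # displayed resolution (from the text)

# Assumption: the image fills a 16:9 panel, so the width follows from the diagonal
width_cm = DIAGONAL_CM * RES_X / math.hypot(RES_X, RES_Y)
px_per_cm = RES_X / width_cm


def size_px(angle_deg):
    """Size of an object centered on the line of sight that subtends angle_deg."""
    return 2 * DISTANCE_CM * math.tan(math.radians(angle_deg) / 2) * px_per_cm


def offset_px(angle_deg):
    """Distance from fixation of a point at an eccentricity of angle_deg."""
    return DISTANCE_CM * math.tan(math.radians(angle_deg)) * px_per_cm


circle_diameter_px = size_px(3.0)    # the 3 degree stimulus circles
stimulus_offset_px = offset_px(2.1)  # the 2.1 degree distance from fixation
```

At these small angles the two formulas differ only marginally, so the choice between them matters little for this layout.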

Figure 1: Stimulus protocol. In (a), a graphical representation of the stimulus interface is depicted, featuring two stimuli positioned at $2.1^{\circ}$ on either side of a fixation cross. The stimuli took the form of circles measuring $3^{\circ}$ in both height and width. The fixation cross was $0.7^{\circ}$ for each side. The shapes presented were bound to a maximum height and width of $1.4^{\circ}$ each. The shapes’ heights and widths were as follows: green circle ($1.4^{\circ}$ diameter), inverted cyan triangle and yellow triangle ($0.9^{\circ}\times 1.4^{\circ}$), magenta hourglass ($0.9^{\circ}\times 1.4^{\circ}$) and the red rectangle ($1.5^{\circ}\times 0.5^{\circ}$, rotated by $45^{\circ}$). In (b), a graphical representation of the stimulus protocol is depicted comprising two crucial components: first, the background of the stimuli underwent alternating black-and-white transitions following a binary pseudo-random sequence; second, diverse-colored shapes were presented within the stimuli. The stimulus background could dynamically change with each frame of 16.67 ms (60 Hz), while the shapes within the stimuli changed every 250 ms (4 Hz). A trial took 20 s, within which target shapes (the magenta hourglass) appeared randomly in the sequence with at least 1 s distance. Participants engaged with the stimuli by counting the number of target shapes on the attended side. In this pilot study, we adopted a paradigm where only the background of the attended stimulus alternated, while the background of the unattended stimulus remained constant. A left-attended trial is shown in (b).

The circles’ background color was luminance modulated with binary pseudo-random noise-codes, such that ones represent a white and zeros a black background. We used 126-bit modulated Gold codes [gold1967, thielen2015], which contained only short flashes of 16.67 ms (bit sub-sequence ‘010’) and long flashes of 33.33 ms (bit sub-sequence ‘0110’) at a presentation rate of 60 Hz. From the available modulated Gold codes, we carefully selected one for the left side. For the right circle, a version of the left code phase-shifted by 61 bits was used. This was done such that the noise-codes’ properties were identical, but still had a near-zero correlation at a maximum delay.
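The property that a modulated code contains only short and long flashes can be illustrated with a small sketch. It assumes that modulation means doubling each bit and XOR-ing with a bit clock at twice the bit rate, and that the phase shift is a circular shift; the base code below is a random 63-bit stand-in, not one of the Gold codes used in the study, and the direction of the shift is likewise an assumption.

```python
import numpy as np


def modulate(code):
    """Double every bit and XOR with an alternating 0/1 bit clock."""
    code = np.asarray(code, dtype=int)
    doubled = np.repeat(code, 2)
    clock = np.tile([0, 1], code.size)
    return doubled ^ clock


def run_lengths(bits):
    """Lengths of consecutive runs of identical bits."""
    change = np.flatnonzero(np.diff(bits)) + 1
    edges = np.concatenate(([0], change, [bits.size]))
    return np.diff(edges)


rng = np.random.default_rng(0)
base = rng.integers(0, 2, size=63)  # stand-in code, NOT an actual Gold code

left = modulate(base)       # 126 bits, code for the left stimulus
right = np.roll(left, 61)   # circularly shifted copy for the right stimulus

longest_run = run_lengths(left).max()
```

Each input bit `b` becomes the pair `(b, 1 - b)`, so the output switches at least once every two bits. Runs therefore last one or two bits, which at 60 Hz corresponds to 16.67 ms or 33.33 ms.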

Inside the circles ($3^{\circ}$ diameter), five colored shapes were presented with a maximum possible height and width of $1.4^{\circ}$ each. The shapes and their colors are as follows: a green circle ($1.4^{\circ}$ diameter), magenta hourglass ($0.9^{\circ}\times 1.4^{\circ}$), cyan inverted triangle ($0.9^{\circ}\times 1.4^{\circ}$), red rectangle ($1.5^{\circ}\times 0.5^{\circ}$, rotated by $45^{\circ}$), and the yellow triangle ($0.9^{\circ}\times 1.4^{\circ}$). All shapes had the same brightness and were sequentially presented in random order at a rate of 4 Hz (see Fig. 1). Participants were asked to count the number of times that the magenta hourglass, i.e., the target shape, occurred on the cued side, to facilitate sustaining their attention and to evaluate the behavioral performance of attending to each of the sides. Within a trial, the temporal distance between the presentation of two target shapes was at least 1 s, the target shape could not be presented on both sides simultaneously, and the number of times the target shape was presented differed for the two sides within a trial.

In this pilot study, we used sequential stimulation in both overt and covert runs to make the first step towards gaze-independent c-VEP BCI. That is, only the circle on the attended (cued) side underwent alternating background changes based on the pseudo-random noise-code, while the unattended side retained a constant black background. Notably, both sides featured distinct shape sequences despite this sequential stimulation protocol.

Experiment: During the experiment, participants completed five runs: four runs required covert attention and one required overt attention, the order of which was randomized across participants. Each run consisted of 20 trials, 10 for each of the two classes, in random order. At the start of a run, a 5 s period was used to let the participant prepare for the upcoming trials. At the start of a trial, a 1 s cue was presented to indicate the to-be-attended side using an arrow. Subsequently, for a duration of 20 s, the cued circle flashed according to its bit sequence and the uncued circle remained static, while both circles showed their distinct shape sequences. At the end of a trial, participants were given a maximum of 5 s to enter the number of target shapes they counted on the attended side using a keyboard, after which they received feedback for a period of 1 s on the correctness of their response. Finally, before continuing to the next trial, a 1 s blank inter-trial interval was presented. At the end of a run, the behavioral accuracy of correct responses was shown on the screen. Participants could take self-paced breaks in between runs.

In summary, we gathered 20 trials for each participant in the overt condition, whereas the covert condition involved the recording of 80 trials per participant. In both conditions, the labels (left and right) were balanced.

Analysis: We used a template-matching classifier to predict the attended side (left or right) given the recorded brain activity. Specifically, we used the ‘reconvolution’ method [thielen2015], which assumes that the evoked response to a stimulus sequence can be described by the linear superposition of the responses to the individual flashes in that sequence. The reconvolution approach can substantially reduce the number of parameters while increasing the number of samples to train these parameters, which effectively can limit the required training data [thielen2021].

In reconvolution, the event time-series of the $i$th stimulus sequence are listed in the event matrix $\mathbf{E}_{i}\in\mathbb{R}^{E\times T}$ for $E$-many events and $T$-many samples. This matrix describes the onset of each of the events in a sequence. In this work, the events were defined as the onset of the stimulation sequence in each trial, and one event for each of the flash durations (short ‘010’ and long ‘0110’), for a total of $E=3$ events.
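As an illustration of this definition, the following sketch builds an event matrix with three rows (sequence onset, short flash, long flash) from a binary code. It works at the bit rate of the code and treats the code as non-circular; resampling to the EEG sampling rate is omitted. The function name `event_matrix` and the toy code are hypothetical.

```python
import numpy as np


def event_matrix(code):
    """Return a (3, T) matrix marking event onsets in a binary code.

    Row 0: onset of the stimulation sequence
    Row 1: onsets of short flashes (a single 1, as in '010')
    Row 2: onsets of long flashes (two consecutive 1s, as in '0110')
    """
    code = np.asarray(code, dtype=int)
    T = code.size
    E = np.zeros((3, T))
    E[0, 0] = 1  # the sequence starts at the first sample
    t = 0
    while t < T:
        if code[t] == 1:
            start = t
            while t < T and code[t] == 1:
                t += 1
            duration = t - start
            if duration == 1:
                E[1, start] = 1
            elif duration == 2:
                E[2, start] = 1
        else:
            t += 1
    return E


toy_code = [0, 1, 0, 0, 1, 1, 0, 1, 0]  # short, long, short
E = event_matrix(toy_code)
short_onsets = np.flatnonzero(E[1])
long_onsets = np.flatnonzero(E[2])
```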

The event time-series are subsequently transformed to a structure matrix that not only describes the onset, but also the modeled length and importantly the overlap of the transient responses for each of the events in the event matrix. Assuming that the transient response length can be limited to $L$ samples without losing relevant data, the structure matrix of the $i$th stimulus sequence is a Toeplitz-like matrix $\mathbf{M}_{i}\in\mathbb{R}^{M\times T}$ for $M=E\cdot L$ event time points.
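A structure matrix of this kind can be sketched by stacking delayed copies of each event time-series. Multiplying it with a response vector then amounts to convolving every event series with its own transient response and summing the results, which is the linear superposition assumed by reconvolution. The function name `structure_matrix` and the toy values are hypothetical.

```python
import numpy as np


def structure_matrix(E, L):
    """Stack L delayed copies of each event series into an (E*L, T) matrix."""
    n_events, T = E.shape
    M = np.zeros((n_events * L, T))
    for e in range(n_events):
        for lag in range(L):
            # Row e*L+lag holds event series e delayed by `lag` samples
            M[e * L + lag, lag:] = E[e, :T - lag]
    return M


# Toy example: two event types, transient responses of L = 2 samples
E_toy = np.array([[1.0, 0, 0, 1, 0],
                  [0.0, 1, 0, 0, 0]])
M_toy = structure_matrix(E_toy, L=2)

# One transient response per event type, concatenated into a single vector
r_toy = np.array([1.0, 0.5, 2.0, -1.0])
template = r_toy @ M_toy

# The same template obtained by explicit convolution and superposition
expected = (np.convolve(E_toy[0], r_toy[:2])[:5]
            + np.convolve(E_toy[1], r_toy[2:])[:5])
```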

Assume we have a training dataset $\{(\mathbf{X}_{1},y_{1}),\dots,(\mathbf{X}_{j},y_{j}),\dots,(\mathbf{X}_{J},y_{J})\}$ of labeled EEG data for $j\in\{1,\dots,J\}$ trials, with the single-trial EEG $\mathbf{X}\in\mathbb{R}^{C\times T}$ of $C$-many channels and $T$-many samples and the associated binary label $y\in\{0,1\}$. With this data, we can learn a spatial filter $\mathbf{w}\in\mathbb{R}^{C}$ and a temporal response vector $\mathbf{r}\in\mathbb{R}^{M}$ by maximizing the following correlation $\rho$ as part of a canonical correlation analysis (CCA):

$$\underset{\mathbf{w},\mathbf{r}}{\arg\max}\ \rho(\mathbf{w}^{\top}\mathbf{S},\mathbf{r}^{\top}\mathbf{D}) \tag{1}$$

where $\mathbf{S}=[\mathbf{X}_{1},\dots,\mathbf{X}_{j},\dots,\mathbf{X}_{J}]$ are the concatenated single trials and $\mathbf{D}=[\mathbf{M}_{y_{1}},\dots,\mathbf{M}_{y_{j}},\dots,\mathbf{M}_{y_{J}}]$ are the concatenated accompanying structure matrices.
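The maximization in Eq. 1 corresponds to the first canonical pair of a CCA between the concatenated EEG and the concatenated structure matrices. The sketch below solves it with a QR decomposition followed by an SVD, a standard numerical route to CCA. It is not the implementation in the linked repository, it assumes both matrices have full row rank, and the synthetic data and the names `cca_first_pair`, `a_true`, and `r_true` are made up for illustration.

```python
import numpy as np


def cca_first_pair(S, D):
    """First canonical pair between S (C, T) and D (M, T).

    Returns the spatial filter w (C,), the response vector r (M,),
    and the canonical correlation rho.
    """
    X = (S - S.mean(axis=1, keepdims=True)).T  # (T, C), centered
    Y = (D - D.mean(axis=1, keepdims=True)).T  # (T, M), centered
    Qx, Rx = np.linalg.qr(X)
    Qy, Ry = np.linalg.qr(Y)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    w = np.linalg.solve(Rx, U[:, 0])
    r = np.linalg.solve(Ry, Vt[0])
    return w, r, s[0]


# Synthetic data: one source driven by random events, mixed into 8 channels
rng = np.random.default_rng(0)
C, M, T = 8, 6, 2000
D = (rng.random((M, T)) < 0.1).astype(float)
r_true = np.array([1.0, -0.5, 0.8, 0.3, -1.0, 0.6])
a_true = np.linspace(0.5, 1.5, C)
S = np.outer(a_true, r_true @ D) + 0.5 * rng.standard_normal((C, T))

w, r, rho = cca_first_pair(S, D)
check = np.corrcoef(w @ S, r @ D)[0, 1]  # equals rho up to rounding
```

The signs of `w` and `r` are only determined jointly, so a flipped filter paired with a flipped response is an equally valid solution.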

Having learned the spatial filter and temporal response vector, we can now predict the label $\hat{y}$ of a new trial by maximizing the following Pearson’s correlation $\rho$:

$$\hat{y}=\underset{i}{\arg\max}\ \rho(\mathbf{w}^{\top}\mathbf{X},\mathbf{r}^{\top}\mathbf{M}_{i}) \tag{2}$$

Here, $\mathbf{w}^{\top}\mathbf{X}$ is the spatially filtered data and $\mathbf{r}^{\top}\mathbf{M}_{i}$ is the predicted response template for the $i$th stimulus sequence.
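The decision rule in Eq. 2 can be sketched as a template-matching step: spatially filter the trial, build one template per candidate sequence, and pick the template with the highest Pearson correlation. The data below are synthetic and the names `predict`, `pearson`, and `structure_matrices` are hypothetical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def predict(X, w, r, structure_matrices):
    """Return the index of the best-matching template and all scores."""
    filtered = w @ X  # spatially filtered trial, shape (T,)
    scores = [pearson(filtered, r @ M) for M in structure_matrices]
    return int(np.argmax(scores)), scores


# Synthetic trial generated from the second of two candidate sequences
rng = np.random.default_rng(1)
C, T = 4, 600
structure_matrices = [(rng.random((5, T)) < 0.1).astype(float) for _ in range(2)]
r = np.array([1.0, 0.5, -0.5, -1.0, 0.3])
a = np.array([1.0, 0.5, -0.3, 0.8])   # assumed forward model of the source
w = a / (a @ a)                        # filter that recovers the source
X = np.outer(a, r @ structure_matrices[1]) + 0.1 * rng.standard_normal((C, T))

y_hat, scores = predict(X, w, r, structure_matrices)
```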

To evaluate the performance of the reconvolution CCA on the overt and covert data, we used a chronological 4-fold cross-validation within each condition. The classification accuracy was averaged across folds. Note that the c-VEP stimulation was only applied on the attended side, while the unattended side kept a black background. In the decoding analysis, we therefore treated the unattended side as if it had been flashing with the other noise-code, i.e., the one not presented on the attended side.
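A chronological split keeps trials in recording order and holds out one contiguous block per fold. The sketch below shows one plausible way to generate such folds for the 20 overt trials; the helper name `chronological_folds` and the exact block boundaries are assumptions, since the text does not detail them.

```python
import numpy as np


def chronological_folds(n_trials, n_folds=4):
    """Yield (train, test) index arrays with contiguous test blocks."""
    blocks = np.array_split(np.arange(n_trials), n_folds)
    for k in range(n_folds):
        test = blocks[k]
        train = np.concatenate([b for i, b in enumerate(blocks) if i != k])
        yield train, test


splits = list(chronological_folds(20))
first_train, first_test = splits[0]
```

The accuracy of a condition is then the mean of the per-fold accuracies.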

Code for the reconvolution CCA approach is available at: https://github.com/thijor/pyntbci.

RESULTS

As this study presents the initial step to decode c-VEP from peripheral stimulation, aiming towards covert spatial attention, it is imperative to study how classification accuracy is influenced by the modeled transient response length. Given the potential for distinct transient responses between conditions, we assessed the mean accuracy across transient response lengths spanning from 0.1 to 0.9 s, for all participants (S1-S5) and both conditions (see Fig. 2).

In the covert condition, mean accuracy fluctuated from 55 % to 99 % across participants, whereas in the overt condition, mean accuracy remained consistently at 100 % for all participants across all transient response lengths.

In the covert condition, participants S3 and S4 achieved a peak accuracy of 85 % and 86 % respectively, observed at a transient response length of 200 ms. Participants S1 and S5 reached a highest accuracy of 88 % and 89 %, respectively, at a transient response length of 300 ms. Participant S2 demonstrated a peak accuracy of 99 % at 400 ms. Notably, the mean accuracy across participants in the covert condition was highest at a transient response length of 300 ms. Hence, for subsequent analysis, we use a transient response length of 300 ms.

Figure 2: Classification accuracy across modeled transient response lengths. Depicted are the participant-specific classification accuracies for both overt (solid lines) and covert (dashed lines) conditions across transient response lengths ranging from 0.1 s to 0.9 s. The grand average over participants is shown in black. Note that for the overt condition, the classification accuracy was 100 % for all transient response lengths and all participants. The dashed gray line indicates theoretical chance level (50 %).

Tab. 1 shows the classification accuracy for a transient response length of 300 ms. The scores obtained in the covert condition for S1-5 were 88 %, 98 %, 84 %, 81 %, and 89 %, respectively, leading to an average of 88 %. The overt condition performed better for all participants (100 %). All individual scores in Tab. 1 are significantly higher ($p<.001$) than chance level (50 %) as verified by a permutation test using 1000 permutations.
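A label-permutation test of this kind can be sketched as follows. The exact procedure used in the study is not described, so this is one common variant: the true labels are shuffled, the accuracy of the fixed predictions is recomputed for each shuffle, and the p-value is the fraction of shuffles that reach the observed accuracy, with the usual plus-one correction. The labels and predictions below are synthetic.

```python
import numpy as np


def permutation_p_value(y_true, y_pred, n_perm=1000, seed=0):
    """Return the observed accuracy and its permutation p-value."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    observed = np.mean(y_true == y_pred)
    null = np.array([np.mean(rng.permutation(y_true) == y_pred)
                     for _ in range(n_perm)])
    # Plus-one correction: the p-value can never be exactly zero
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p


# Synthetic example: 80 balanced trials, 10 of which are misclassified
y_true = np.repeat([0, 1], 40)
y_pred = y_true.copy()
y_pred[::8] = 1 - y_pred[::8]

accuracy, p_value = permutation_p_value(y_true, y_pred)
```

With 1000 permutations, the smallest attainable p-value under this correction is 1/1001, which lies just below .001.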

Table 1: Mean classification accuracy. The table shows the classification accuracy using a transient response length of 300 ms, for each participant and the grand average, for both overt and covert conditions. All classification results for both conditions and all participants individually were significantly higher than chance (50 %) as verified by a permutation test with 1000 permutations ($p<.001$).
Condition  S1    S2    S3    S4    S5    Avg
Overt      1.00  1.00  1.00  1.00  1.00  1.00
Covert     0.88  0.98  0.84  0.81  0.89  0.88

To investigate the differences in characteristics of the spatial activity patterns and transient responses, we computed these at a transient response length of 300 ms for both conditions. Fig. 3 shows an example of the spatial pattern and transient responses for S4. Across participants, we observed that the spatial activity pattern for the overt condition was more focally distributed, whereas it was more lateralized for the covert condition.

Figure 3: Spatial activity pattern and transient responses of participant S4. (a) and (b) show the spatial activity pattern and transient responses of S4 for the overt and covert conditions, respectively. For all participants, the spatial activity for the overt condition was more focally distributed as compared to the more lateralized distribution seen for the covert condition. The spatial pattern $\mathbf{a}\in\mathbb{R}^{C}$ was estimated as $\mathbf{a}=\mathbf{w}^{\top}\bm{\Sigma}$, where $\bm{\Sigma}\in\mathbb{R}^{C\times C}$ is the spatial covariance matrix.
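The pattern estimate in the caption of Fig. 3 can be sketched in a few lines. Because the covariance matrix is symmetric, $\mathbf{w}^{\top}\bm{\Sigma}$ and $\bm{\Sigma}\mathbf{w}$ contain the same values. The synthetic data, the deliberately crude filter, and the names `spatial_pattern` and `a_true` are illustrative assumptions.

```python
import numpy as np


def spatial_pattern(X, w):
    """Spatial activity pattern a = Sigma w for data X (C, T) and filter w (C,)."""
    sigma = np.cov(X)  # (C, C); np.cov treats rows as variables
    return sigma @ w


# Synthetic data: one source projected onto 4 channels plus weak sensor noise
rng = np.random.default_rng(0)
a_true = np.array([1.0, 2.0, -1.0, 0.5])
source = rng.standard_normal(5000)
X = np.outer(a_true, source) + 0.1 * rng.standard_normal((4, 5000))

w = np.array([1.0, 0.0, 0.0, 0.0])  # crude filter: read out channel 1 only
a = spatial_pattern(X, w)

# Cosine similarity between the estimated pattern and the true projection
similarity = a @ a_true / (np.linalg.norm(a) * np.linalg.norm(a_true))
```

Although the filter uses a single channel, the estimated pattern closely follows the true projection of the source. This is why patterns, rather than filter weights, are the quantity to inspect when interpreting topographies.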

DISCUSSION

Our pilot study provides fundamental insights into the plausibility of a c-VEP-based stimulation paradigm for decoding covert spatial attention, thereby potentially eliminating the need for the ability to make eye movements to control a c-VEP BCI. We implemented a two-class paradigm, requiring participants to attend to a stimulus either to the left or to the right of their fixation point. The stimulus backgrounds flashed following pseudo-random noise-codes, while their foreground simultaneously presented a random sequence of five distinct shapes with an infrequent target shape. Participants were tasked with counting the occurrences of the target shape amidst the shape sequence (see Fig. 1). In this pilot study, we used sequential stimulation to assess the feasibility of covert c-VEP, before moving to the more complex parallel stimulation requiring covert spatial attention.

In our experiment, participants engaged with the stimuli through either overt means, involving eye movements to foveate on the target, or covertly, relying on spatial attention to focus on a target. In the overt condition, we reached a decoding performance of 100 % for all participants. In the covert condition, we achieved an average accuracy of 88 %. To the best of the authors’ knowledge, this marks the first evaluation of a c-VEP BCI using covert attention, although here we still rely on sequential stimulation. Our study highlights the feasibility of such a design for developing gaze-independent BCIs that can be used by people with ALS.

In the overt condition, all participants achieved 100 % accuracy, likely caused by the large data availability, low number of classes, and sequential stimulation. Specifically, this study used 5.3 min of data for training and 20 s for testing, while 1 min training and 1-2 s testing would suffice [thielen2021]. In the covert condition, we employed 16 min of data for training, achieving a decoding accuracy of 88 %. This result underscores the lower SNR in the covert condition compared to the overt scenario. Nevertheless, although using sequential stimulation, the attained performance surpasses the 62 % accuracy reported in a similar SSVEP study that used parallel stimulation [egan2017], offering evidence for the potential performance of gaze-independent c-VEP.

It is essential to approach the results of our study on gaze-independent c-VEP BCI with caution and consider two important limitations. Firstly, this preliminary study involved a small cohort of five highly motivated participants. Secondly, the c-VEP protocol employed sequential stimulation, where only the stimulus on the attended side alternated its background based on the pseudo-random noise-code. In practical online usage of the BCI, simultaneous stimulation on both sides is necessary. While our study offers valuable fundamental insights into the feasibility of gaze-independent c-VEP BCI, it is imperative to acknowledge these limitations. Further research, including a larger sample size and parallel stimulation, is crucial to fully unveil the potential of this approach.

Additionally, it is important to acknowledge that stimulation paradigms outside the visual domain have been explored as well for developing independent BCIs. For instance, Schreuder and colleagues developed the P300-based auditory multi-class spatial ERP (AMUSE) interface reaching a classification accuracy of about 85 % ($N=6$ classes) [Schreuder2011]. Similarly, Brouwer and van Erp designed a P300-based BCI using vibro-tactile feedback around the waist with an accuracy of 58 % ($N=6$ classes) and 73 % ($N=2$ classes) [Brouwer2010]. Moreover, Van der Waal and colleagues [vanderWaal2012] used tactile stimulation on the finger tips reaching a classification accuracy of 82 % ($N=6$ classes). These results may also highlight the potential to explore the pseudo-random stimulation protocol in the auditory and tactile domain.

Our study’s design enables the use of two additional features in the analysis pipeline, possibly further improving the accuracy. Firstly, the stimulus protocol used in the study was designed such that the infrequent occurrence of the target events within the shape sequence could potentially evoke a P300 response. Hence, the P300 response could be used alongside the c-VEP to decode the attended side, similar to the P300 response that was used alongside the SSVEP by Egan and colleagues [egan2017]. Secondly, the alpha-band modulations are expected to be lateralized with respect to the attended side [worden2000]. Specifically, covertly attending to a stimulus on one side suppresses visual alpha-activity in the contra-lateral (task-positive) hemisphere, while it increases alpha in the ipsi-lateral (task-negative) hemisphere [jensen2010]. Hence, visual alpha oscillations can also be used as an additional feature, again similar to the alpha response used alongside the SSVEP in earlier work [egan2017]. Beyond these two features, and in line with the anticipated lateralization in the alpha-band, we also anticipate lateralization in the c-VEP itself during the covert condition. In our current application of reconvolution CCA, a single spatial filter was employed to decode the attended side. This method can be extended by incorporating distinct spatial filters for each side, a concept referred to as an ‘ensemble’ decoder [gembler2020b]. Finally, in the present study, we employed only two stimuli positioned on either side of the fixation point, using luminance modulation with two 126-bit Gold codes. Given the limited number of classes, there is potential to explore shorter codes, which could lead to faster decoding. Furthermore, alternative codes, such as the m-sequence or Golay sequence, may be considered, as they have shown promise in enhancing classification accuracy [thielen2023a].

CONCLUSION

Our study shows the feasibility and high performance of a novel covert BCI design based on c-VEP. Our design eliminates the dependence on gaze, which is an essential feature if BCIs are to be used by people who have no voluntary control over their eye movements, such as people living with late-stage ALS. Further, the design of the study makes it possible to use additional measures of brain activity to improve classification performance, which is a potentially fruitful avenue for future work to improve the efficacy of the gaze-independent c-VEP BCI. Overall, our results suggest the potential for a high-speed BCI that does not rely on any overt behavior.

ACKNOWLEDGEMENTS

This work was part of the project ‘Obtaining fast brain-computer interfacing without eye movements for communication and control’ with project number OCENW.XS23.1.127 of the research programme ‘Open Competitie ENW XS’ which is financed by the Dutch Research Council (NWO).
