
TOWARDS GAZE-INDEPENDENT C-VEP BCI: A PILOT STUDY
S. Narayanan¹, S. Ahmadi¹, P. Desain¹, J. Thielen¹
¹Donders Institute, Radboud University, Nijmegen, the Netherlands
E-mail: jordy.thielen@donders.ru.nl

ABSTRACT: A limitation of brain-computer interface (BCI) spellers is that they require users to move their eyes to fixate on targets. This is a problem for users who cannot voluntarily control their eye movements, for instance, people living with late-stage amyotrophic lateral sclerosis (ALS). This pilot study takes a first step towards a gaze-independent speller based on the code-modulated visual evoked potential (c-VEP). Participants were presented with two bilaterally located stimuli, one of which was flashing. Their task was to attend to one of these stimuli either by looking at it directly (overt condition) or by using spatial attention without moving their eyes (covert condition). The attended stimulus was decoded from the electroencephalogram (EEG), yielding classification accuracies of $88$ % in the covert condition and $100$ % in the overt condition. These fundamental insights indicate that the c-VEP protocol is a promising basis for gaze-independent BCIs that use covert spatial attention, and they motivate extending the protocol to the case in which both stimuli flash simultaneously.

INTRODUCTION

A brain-computer interface (BCI) records its users’ brain activity and translates it into a computer command, opening a novel non-muscular channel for communication and control [wolpaw2002]. Typically, a BCI records brain activity with electroencephalography (EEG) because it is affordable, practical, and non-invasive.

One of the fastest BCIs for communication uses the code-modulated visual evoked potential (c-VEP) as measured in the EEG [martnezcagigal2021]. The c-VEP is observed during visual stimulation of the user with a pseudo-random sequence of flashes. Because each presented symbol flickers concurrently with its own unique pseudo-random sequence of flashes, attending to one of the symbols evokes a specific brain response. Machine learning algorithms then infer the attended symbol from the user’s evoked brain activity. Such a visual BCI speller allows its user to select symbols or commands, and thus to communicate, while bypassing most of the motor system [verbaarschot2021].

Unfortunately, an important limitation of a standard visual BCI speller is that users must shift their gaze towards (i.e., fixate on) a target symbol. Because BCI control then depends fully on eye movements, the BCI quickly becomes unusable for people who have lost voluntary control of their eye movements, e.g., people living with late-stage amyotrophic lateral sclerosis (ALS).

In the visual domain, several studies have attempted to develop a gaze-independent BCI. For instance, Blankertz and colleagues developed a BCI speller called the ‘Hex-o-Spell’ that used motor imagery (imagined right-hand and right-foot movements, i.e., $N=2$ classes) to select characters from six hexagonal fields [blankertz2006]. They reported typing speeds of 2.3–5 char/min and 4.6–7.6 char/min for their two participants, respectively. Interestingly, Treder and Blankertz showed that covert visual spatial attention can also be used to operate the ‘Hex-o-Spell’ and the ‘Matrix’ speller using the P300 event-related potential (ERP) [treder2010]. The covert ‘Hex-o-Spell’ outperformed the covert ‘Matrix’ speller, with classification accuracies of $60$ % ($N=36$ classes) and $40$ % ($N=30$ classes), respectively.

Furthermore, Treder and colleagues compared the P300-based ‘Hex-o-Spell’, the similar ‘Cake Speller’, and a ‘Center Speller’, in which unique geometric shapes with different colors, closely surrounded by characters, were presented sequentially at the center of the screen [treder2011a]. Classification accuracies of $91.3$ %, $88.2$ %, and $97.1$ % were reported for the three spellers, respectively ($N=30$ classes). Similarly, Chen and colleagues [Chen2016] used an extension of the P300 oddball paradigm, namely rapid serial visual presentation (RSVP), in two versions: a colored circles paradigm (CCP) and a dummy face paradigm (DFP). The average accuracies obtained with the CCP and DFP were $51.6$ % and $73.5$ %, respectively.

Additionally, in another study, Treder and colleagues used changes in alpha-band activity induced by covert attention shifts to classify the direction of these shifts [treder2011b]. They obtained a classification accuracy of $73.65$ % ($N=2$ classes). These results indicate the potential of alpha activity as a feature for decoding spatial attention in gaze-independent BCIs.

Furthermore, Kelly and colleagues designed a gaze-independent BCI for communication that combined features from the steady-state visual evoked potential (SSVEP) and alpha-band modulations to decode covert spatial attention [kelly2005]. They reported average accuracies of $70.3$ %, $72.8$ %, and $79.5$ % when using the SSVEP, the alpha band, or both features in their analysis pipeline, respectively ($N=2$ classes). Similarly, Egan and colleagues [egan2017] developed a hybrid gaze-independent speller that used the P300 ERP and alpha activity in addition to the SSVEP. Importantly, adding the P300 and alpha features to their classification pipeline improved accuracy by $17$ percentage points, from $62$ % with the SSVEP alone to $79$ % overall ($N=2$ classes).

In this pilot study, we work towards a gaze-independent c-VEP BCI. The gaze-dependent c-VEP has recently shown exceptional performance, surpassing other evoked paradigms such as the ERP and SSVEP [shi2024]. Another study showed that the c-VEP can be decoded reliably not only from direct foveal stimulation (at fixation) but also from peripheral stimulation (away from fixation) [waytowich2015].

Our objective is to gain fundamental insights into the feasibility of decoding the c-VEP in a fully gaze-independent manner. Specifically, participants used covert spatial attention to concentrate on the stimuli, eliminating the need for eye movements to foveate them. In this pilot work, the stimuli were presented sequentially, i.e., only the attended stimulus flashed. This allowed us to assess whether the c-VEP can be decoded from the periphery before testing the more complex parallel case, in which both stimuli flash simultaneously. If successful, this study provides the first steps towards a gaze-independent c-VEP BCI, potentially providing a high-speed neurotechnological assistive device for individuals who lack reliable control of their eye movements.

MATERIALS AND METHODS

Participants: Five participants (all male, mean age 31 years, range 24-50 years) were included in this study after obtaining written informed consent. Two participants were authors of this study. A pre-screening procedure excluded any participants with a history of epilepsy or brain injury. All participants had normal or corrected-to-normal vision and reported no central nervous system abnormalities. This study was approved by the Ethical Committee of the Faculty of Social Sciences at the Radboud University Nijmegen.

Materials: EEG data were recorded at 512 Hz from 64 active Ag/AgCl electrodes, placed according to the international 10-10 system, using a Biosemi ActiveTwo amplifier. The data were preprocessed with a notch filter at 50 Hz and a band-pass filter with cutoffs at 1 Hz and 40 Hz. Subsequently, the data were sliced into trials from 500 ms before to 20 s after stimulus onset. Finally, the data were downsampled to 120 Hz, and the 500 ms pre-stimulus segment, which may contain filter artefacts at the trial edges, was removed.
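The preprocessing chain above can be sketched with SciPy. This is a minimal example under assumptions that the text does not state: zero-phase (forward-backward) filtering, a notch quality factor of 30, a 4th-order Butterworth band-pass, and filtering of the continuous recording before slicing. `resample_poly` with up=15 and down=64 maps 512 Hz exactly to 120 Hz (512 × 15 / 64 = 120). The function name `preprocess` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, resample_poly, sosfiltfilt

FS_RAW = 512    # Biosemi sampling rate (Hz)
FS_NEW = 120    # target rate after downsampling (Hz)
PRE = 0.5       # pre-stimulus padding (s), discarded at the end
TRIAL = 20.0    # trial length (s)

def preprocess(eeg, onsets):
    """eeg: (channels, samples) continuous data at FS_RAW; onsets: onset sample indices.
    Returns an array of shape (trials, channels, samples) at FS_NEW."""
    # 50 Hz notch; Q=30 is an assumed value, applied forward-backward (zero phase)
    b, a = iirnotch(50.0, Q=30.0, fs=FS_RAW)
    eeg = filtfilt(b, a, eeg, axis=-1)
    # 1-40 Hz band-pass; the 4th-order Butterworth design is an assumption
    sos = butter(4, [1.0, 40.0], btype="bandpass", fs=FS_RAW, output="sos")
    eeg = sosfiltfilt(sos, eeg, axis=-1)
    # slice from -0.5 s to +20 s around each stimulus onset
    pre, n = int(PRE * FS_RAW), int((PRE + TRIAL) * FS_RAW)
    trials = np.stack([eeg[:, o - pre:o - pre + n] for o in onsets])
    # 512 Hz -> 120 Hz; resample_poly applies an anti-aliasing FIR filter internally
    trials = resample_poly(trials, up=15, down=64, axis=-1)
    # drop the 0.5 s pre-stimulus segment (60 samples at 120 Hz)
    return trials[..., int(PRE * FS_NEW):]
```

Each resulting trial holds 2400 samples, i.e., exactly 20 s at 120 Hz.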

The stimulus protocol (see Fig. 1) was displayed on a 27 in Corsair Xeneon 27QHD240 OLED screen at a resolution of $1920\times 1080$ px and a refresh rate of 120 Hz. Participants were seated 60 cm in front of the display. A black fixation cross was presented at the center of the screen on a mean-luminance gray background. Two circles with a $3^{\circ}$ diameter were presented, one on either side of the fixation cross, at a distance of $2.1^{\circ}$.
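To relate the visual angles to on-screen sizes, the standard conversion $s = 2d\tan(\theta/2)$ can be applied, where $d$ is the viewing distance. The sketch below assumes square pixels and that the 27 in diagonal spans the full 16:9 area shown at $1920\times 1080$ px. Under these assumptions, the $3^{\circ}$ circles are roughly 101 px wide and the $1.4^{\circ}$ shapes roughly 47 px. The function `deg_to_px` is hypothetical.

```python
import math

def deg_to_px(deg, distance_cm=60.0, diag_in=27.0, res=(1920, 1080)):
    """Convert a visual angle (degrees) to on-screen pixels.
    Assumes square pixels and that the diagonal spans the full displayed area."""
    w_px, h_px = res
    # physical screen width from the diagonal and the aspect ratio
    width_cm = diag_in * 2.54 * w_px / math.hypot(w_px, h_px)
    cm_per_px = width_cm / w_px
    # physical size subtending `deg` degrees, centred on the line of sight
    size_cm = 2 * distance_cm * math.tan(math.radians(deg) / 2)
    return size_cm / cm_per_px

circle_px = deg_to_px(3.0)   # about 101 px
shape_px = deg_to_px(1.4)    # about 47 px
```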

The circles’ backgrounds were luminance modulated with binary pseudo-random noise-codes, such that a one represents a white background and a zero a black background. We used 126-bit modulated Gold codes [gold1967, thielen2015], which contain only short flashes of 16.67 ms (bit sub-sequence ‘010’) and long flashes of 33.33 ms (bit sub-sequence ‘0110’) at a presentation rate of 60 Hz. From the available modulated Gold codes, we carefully selected one for the left circle. For the right circle, a version of the left code phase-shifted by 61 bits was used. This ensured that both noise-codes had identical properties, while the large shift kept their correlation near zero.
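The modulation step can be illustrated with a short sketch. As a simplifying assumption, it uses a single 63-bit m-sequence (primitive polynomial $x^6+x+1$) as a stand-in for the Gold code, so it does not reproduce the code actually selected in the study. Each bit is repeated twice and XOR-ed with a 0101… clock, which maps 0 to ‘01’ and 1 to ‘10’. Runs of ones therefore last one or two bits, i.e., 16.67 ms or 33.33 ms at 60 Hz. The right-side code is a circular shift of 61 bits. All function names are hypothetical.

```python
import numpy as np

def m_sequence(n_bits=6):
    """63-bit m-sequence from the primitive polynomial x^6 + x + 1 (stand-in for a Gold code)."""
    state = [1] + [0] * (n_bits - 1)   # any non-zero seed works
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[0])
        # recurrence s[k+6] = s[k+1] XOR s[k] for x^6 + x + 1
        state = state[1:] + [state[0] ^ state[1]]
    return np.array(out, dtype=int)

def modulate(code):
    """Double the bit rate and XOR with a 0101... clock: 0 -> '01', 1 -> '10'."""
    clock = np.tile([0, 1], code.size)
    return np.repeat(code, 2) ^ clock

def runs_of_ones(code):
    """Lengths of consecutive runs of ones, i.e., flash durations in bits."""
    padded = np.concatenate(([0], code, [0]))
    edges = np.flatnonzero(np.diff(padded))   # alternating rising and falling edges
    return edges[1::2] - edges[::2]

left = modulate(m_sequence())   # 126 bits for the left circle
right = np.roll(left, 61)       # right circle: left code circularly shifted by 61 bits
rho_lr = np.corrcoef(2 * left - 1, 2 * right - 1)[0, 1]   # correlation between the two codes
```

Because the right code is only a circular shift, both codes contain the same flashes in the same proportions; only their timing differs.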

Inside each circle ($3^{\circ}$ diameter), five colored shapes were presented, each with a maximum height and width of $1.4^{\circ}$. The shapes were a green circle ($1.4^{\circ}$ diameter), a magenta hourglass ($0.9^{\circ}\times 1.4^{\circ}$), a cyan inverted triangle ($0.9^{\circ}\times 1.4^{\circ}$), a red rectangle ($1.5^{\circ}\times 0.5^{\circ}$, rotated by $45^{\circ}$), and a yellow triangle ($0.9^{\circ}\times 1.4^{\circ}$). All shapes had the same brightness and were presented sequentially in random order at a rate of 4 Hz (see Fig. 1). Participants were asked to count the number of times the magenta hourglass, i.e., the target shape, occurred on the cued side. This helped them sustain their attention and allowed us to evaluate their behavioral performance in attending to each side. Within a trial, consecutive target shapes were separated by at least 1 s, the target shape was never presented on both sides simultaneously, and the number of target presentations differed between the two sides.
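A trial's shape sequences can be generated under these constraints with rejection sampling, as sketched below. At 4 Hz, a 20 s trial contains 80 shapes, and a minimum target separation of 1 s corresponds to 4 items. The range of target counts per side (`count_range`) and all helper names are hypothetical, because the text does not specify how many targets were shown.

```python
import numpy as np

SHAPES = ["circle", "hourglass", "inv_triangle", "rectangle", "triangle"]
TARGET = "hourglass"

def target_positions(rng, n_items, n_targets, min_gap):
    """Sample sorted target indices whose onsets are at least `min_gap` items apart."""
    slack = (n_targets - 1) * (min_gap - 1)
    base = np.sort(rng.choice(n_items - slack, size=n_targets, replace=False))
    # spreading the sorted draws guarantees consecutive differences >= min_gap
    return base + np.arange(n_targets) * (min_gap - 1)

def shape_sequences(rng, n_items=80, min_gap=4, count_range=(4, 9)):
    """Left/right shape sequences for one trial (20 s at 4 Hz = 80 items).
    min_gap=4 items equals 1 s at 4 Hz; count_range is a hypothetical choice."""
    # different target counts on the two sides
    n_left, n_right = rng.choice(np.arange(*count_range), size=2, replace=False)
    while True:  # rejection sampling: targets may not co-occur on both sides
        pos_l = target_positions(rng, n_items, n_left, min_gap)
        pos_r = target_positions(rng, n_items, n_right, min_gap)
        if not np.intersect1d(pos_l, pos_r).size:
            break
    others = [s for s in SHAPES if s != TARGET]
    seqs = []
    for pos in (pos_l, pos_r):
        seq = list(rng.choice(others, size=n_items))   # random non-target shapes
        for p in pos:
            seq[p] = TARGET
        seqs.append(seq)
    return seqs
```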

In this pilot study, we used sequential stimulation in both the overt and covert runs as a first step towards a gaze-independent c-VEP BCI. That is, only the background of the circle on the attended (cued) side was modulated by its pseudo-random noise-code, while the unattended side kept a constant black background. Notably, both sides still displayed distinct shape sequences.

Experiment: During the experiment, participants completed five runs: four runs required covert attention and one required overt attention, in an order that was randomized across participants. Each run consisted of 20 trials, 10 for each of the two classes, in random order. At the start of a run, a 5 s period allowed the participant to prepare for the upcoming trials. At the start of a trial, an arrow was presented for 1 s to cue the to-be-attended side. Subsequently, for 20 s, the cued circle flashed according to its bit sequence while the uncued circle remained static, and both circles showed their distinct shape sequences. At the end of a trial, participants had a maximum of 5 s to enter, using a keyboard, the number of target shapes they had counted on the attended side, after which they received 1 s of feedback on the correctness of their response. Finally, a 1 s blank inter-trial interval was presented before the next trial. At the end of a run, the proportion of correct responses was shown on the screen. Participants could take self-paced breaks between runs.

In summary, we gathered 20 trials for each participant in the overt condition, whereas the covert condition involved the recording of 80 trials per participant. In both conditions, the labels (left and right) were balanced.

Analysis: We used a template-matching classifier to predict the attended side (left or right) from the recorded brain activity. Specifically, we used the ‘reconvolution’ method [thielen2015], which assumes that the evoked response to a stimulus sequence is the linear superposition of the responses to the individual flashes in that sequence. Reconvolution substantially reduces the number of parameters while increasing the number of samples available to estimate them, which can effectively reduce the amount of training data required [thielen2021].

In reconvolution, the event time-series of the $i$th stimulus sequence are collected in the event matrix $\mathbf{E}_{i}\in\mathbb{R}^{E\times T}$ for $E$ events and $T$ samples. This matrix describes the onsets of each of the events in a sequence. In this work, we defined $E=3$ events: the onset of the stimulation sequence in each trial, and one event for each of the two flash durations (short ‘010’ and long ‘0110’).
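A minimal sketch of building such an event matrix from a binary code follows. It assumes that the code bits run at 60 Hz and the EEG at 120 Hz (two samples per bit), and that the code is repeated cyclically to fill the trial. `event_matrix` is a hypothetical helper written for illustration, not the pyntbci API.

```python
import numpy as np

def event_matrix(code, n_samples, fs=120, pr=60):
    """Build the E x T event matrix (E=3) for one stimulus sequence.
    Rows: [sequence onset, short-flash onsets ('010'), long-flash onsets ('0110')]."""
    spb = fs // pr                       # samples per bit (2 at 120 Hz / 60 Hz)
    n_bits = int(np.ceil(n_samples / spb))
    bits = np.resize(code, n_bits)       # np.resize repeats the code cyclically
    E = np.zeros((3, n_samples))
    E[0, 0] = 1                          # onset of the stimulation sequence
    for i in range(n_bits):
        # a rising edge (0 -> 1) marks the onset of a flash
        if bits[i] == 1 and (i == 0 or bits[i - 1] == 0):
            length = 1
            while i + length < n_bits and bits[i + length] == 1:
                length += 1
            row = 1 if length == 1 else 2   # one bit = short flash, longer = long flash
            E[row, i * spb] = 1
    return E
```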

The event time-series are subsequently transformed into a structure matrix that describes not only the onset, but also the modeled length and, importantly, the overlap of the transient responses to each of the events in the event matrix. Assuming that the transient response can be limited to $L$ samples without losing relevant information, the structure matrix of the $i$th stimulus sequence is a Toeplitz-like matrix $\mathbf{M}_{i}\in\mathbb{R}^{M\times T}$ with $M=E\cdot L$ event time points.
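The structure matrix can be sketched as stacked, delayed copies of the event rows: row $e\cdot L + l$ holds event $e$ shifted by $l$ samples. With $L=36$ samples (300 ms at 120 Hz) and $E=3$, this gives $M=108$ rows. The helper below is an illustrative assumption, not the pyntbci implementation.

```python
import numpy as np

def structure_matrix(E, L):
    """Toeplitz-like structure matrix of shape (E*L, T): row e*L + l is event row e delayed by l samples."""
    n_events, T = E.shape
    M = np.zeros((n_events * L, T))
    for e in range(n_events):
        for l in range(L):
            M[e * L + l, l:] = E[e, :T - l]
    return M
```

The product $\mathbf{r}^{\top}\mathbf{M}_{i}$ then places a copy of the corresponding $L$-sample transient response at every event onset and sums overlapping copies, which is the linear-superposition assumption of reconvolution.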

Assume a training dataset $\{(\mathbf{X}_{1},y_{1}),\dots,(\mathbf{X}_{j},y_{j}),\dots,(\mathbf{X}_{J},y_{J})\}$ of $J$ labeled trials, where $\mathbf{X}_{j}\in\mathbb{R}^{C\times T}$ is the single-trial EEG with $C$ channels and $T$ samples, and $y_{j}\in\{0,1\}$ is the associated binary label. With these data, we can learn a spatial filter $\mathbf{w}\in\mathbb{R}^{C}$ and a temporal response vector $\mathbf{r}\in\mathbb{R}^{M}$ by maximizing the correlation $\rho$ in a canonical correlation analysis (CCA):

$$\underset{\mathbf{w},\mathbf{r}}{\arg\max}\;\rho\left(\mathbf{w}^{\top}\mathbf{S},\,\mathbf{r}^{\top}\mathbf{D}\right)\tag{1}$$

where $\mathbf{S}=[\mathbf{X}_{1},\dots,\mathbf{X}_{j},\dots,\mathbf{X}_{J}]$ are the single trials concatenated along time and $\mathbf{D}=[\mathbf{M}_{y_{1}},\dots,\mathbf{M}_{y_{j}},\dots,\mathbf{M}_{y_{J}}]$ are the correspondingly concatenated structure matrices.
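Eq. (1) can be solved with a textbook CCA: whiten both data sets, take the leading singular vectors of the whitened cross-covariance, and map them back. The NumPy sketch below does this for the first canonical pair. The small ridge term `reg` is an assumption added for numerical stability, the function names are hypothetical, and the pyntbci implementation may differ in its details.

```python
import numpy as np

def _inv_sqrt(C, reg):
    """Inverse matrix square root of a (regularised) symmetric covariance matrix."""
    vals, vecs = np.linalg.eigh(C + reg * np.eye(C.shape[0]))
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca_fit(S, D, reg=1e-6):
    """First canonical pair for S (C x N) and D (M x N), as in Eq. (1).
    Returns the spatial filter w (C,), temporal response r (M,), and correlation rho."""
    S = S - S.mean(axis=1, keepdims=True)
    D = D - D.mean(axis=1, keepdims=True)
    n = S.shape[1]
    Css, Cdd, Csd = S @ S.T / n, D @ D.T / n, S @ D.T / n
    Ks, Kd = _inv_sqrt(Css, reg), _inv_sqrt(Cdd, reg)
    # leading singular vectors of the whitened cross-covariance give the canonical directions
    U, sv, Vt = np.linalg.svd(Ks @ Csd @ Kd)
    w, r = Ks @ U[:, 0], Kd @ Vt[0]
    return w, r, sv[0]
```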

Having learned the spatial filter and temporal response vector, we can predict the label $\hat{y}$ of a new trial $\mathbf{X}$ by maximizing Pearson’s correlation $\rho$:

$$\hat{y}=\underset{i}{\arg\max}\;\rho\left(\mathbf{w}^{\top}\mathbf{X},\,\mathbf{r}^{\top}\mathbf{M}_{i}\right)\tag{2}$$

Here, $\mathbf{w}^{\top}\mathbf{X}$ is the spatially filtered data and $\mathbf{r}^{\top}\mathbf{M}_{i}$ is the predicted response template for the $i$ th stimulus sequence.
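The classification rule of Eq. (2) takes only a few lines of NumPy. In this sketch, `Ms` holds one structure matrix per class (here, one for the left code and one for the right code), and the function name `predict` is hypothetical.

```python
import numpy as np

def predict(X, w, r, Ms):
    """Eq. (2): correlate the spatially filtered trial with each predicted template.
    X: (C, T) trial; w: (C,); r: (M,); Ms: list of (M, T) structure matrices, one per class."""
    x = w @ X                                          # spatially filtered data, shape (T,)
    rhos = [np.corrcoef(x, r @ M)[0, 1] for M in Ms]   # Pearson correlation per template
    return int(np.argmax(rhos)), rhos
```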

To evaluate the performance of reconvolution CCA on the overt and covert data, we used chronological 4-fold cross-validation within each condition and averaged the classification accuracy across folds. Note that the c-VEP stimulation was applied only on the attended side, while the unattended side kept a black background. In the decoding analysis, we therefore simulated the unattended side as if it had been flashing with the other noise-code, i.e., the one not presented on the attended side.
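Chronological cross-validation keeps each fold contiguous in recording order rather than shuffling trials. The sketch below assumes that the trials are stored in the order in which they were recorded. `fit` and `score` are hypothetical placeholders for training and evaluating a decoder such as reconvolution CCA. With the 80 covert trials, each test fold contains 20 consecutive trials.

```python
import numpy as np

def chronological_folds(n_trials, n_folds=4):
    """Yield (train, test) index arrays for contiguous, non-shuffled folds."""
    blocks = np.array_split(np.arange(n_trials), n_folds)
    for k, test in enumerate(blocks):
        train = np.concatenate([b for i, b in enumerate(blocks) if i != k])
        yield train, test

def cross_validate(fit, score, X, y, n_folds=4):
    """Mean accuracy over chronological folds; fit and score are hypothetical callables."""
    accs = []
    for train, test in chronological_folds(len(y), n_folds):
        model = fit(X[train], y[train])
        accs.append(score(model, X[test], y[test]))
    return float(np.mean(accs))
```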

Code for the reconvolution CCA approach is available at: https://github.com/thijor/pyntbci.

RESULTS

As this study is an initial step towards decoding the c-VEP from peripheral stimulation under covert spatial attention, it is important to examine how the classification accuracy depends on the modeled transient response length. Because the transient responses may differ between conditions, we assessed the mean accuracy for transient response lengths from 0.1 to 0.9 s for all participants (S1–S5) and both conditions (see Fig. 2).

In the covert condition, mean accuracy ranged from $55$ % to $99$ % across participants and transient response lengths, whereas in the overt condition, mean accuracy was consistently $100$ % for all participants at all transient response lengths.

In the covert condition, participants S3 and S4 achieved peak accuracies of $85$ % and $86$ %, respectively, at a transient response length of 200 ms. Participants S1 and S5 reached their highest accuracies of $88$ % and $89$ %, respectively, at 300 ms. Participant S2 reached a peak accuracy of $99$ % at 400 ms. Notably, the mean accuracy across participants in the covert condition was highest at a transient response length of 300 ms. Hence, we used a transient response length of 300 ms for all subsequent analyses.

Tab. 1 shows the classification accuracies for a transient response length of 300 ms. In the covert condition, S1–S5 obtained $88$ %, $98$ %, $84$ %, $81$ %, and $89$ %, respectively, for an average of $88$ %. Performance in the overt condition was higher for all participants ($100$ %). All individual scores in Tab. 1 are significantly above chance level ($50$ %; $p<.001$), as verified by a permutation test with $1000$ permutations.
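A permutation test of this kind can be sketched as follows. This simplified variant shuffles the true labels against fixed predictions instead of retraining the classifier for every permutation, which may differ from the procedure used in this study. With 1000 permutations and the usual +1 correction, the smallest attainable p-value is 1/1001 ≈ 0.000999. The function name is hypothetical.

```python
import numpy as np

def permutation_p_value(y_true, y_pred, n_perm=1000, seed=0):
    """Simplified permutation test: compare the observed accuracy with accuracies obtained
    after shuffling the true labels (the classifier is not retrained here)."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    observed = np.mean(y_true == y_pred)
    null = np.array([np.mean(rng.permutation(y_true) == y_pred) for _ in range(n_perm)])
    # +1 correction so that the p-value is never exactly zero
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```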

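Tab. 1: Classification accuracy per participant at a transient response length of 300 ms.
Condition	S1	S2	S3	S4	S5	Avg
Overt	1.00	1.00	1.00	1.00	1.00	1.00
Covert	0.88	0.98	0.84	0.81	0.89	0.88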

To investigate differences in the spatial activity patterns and transient responses between conditions, we computed both at a transient response length of 300 ms. Fig. 3 shows an example of the spatial patterns and transient responses for S4. Across participants, the spatial activity pattern was more focal in the overt condition and more lateralized in the covert condition.
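Spatial filters are not directly interpretable as scalp maps. A common way to obtain an activity pattern from a filter $\mathbf{w}$ is the covariance-based transformation $\mathbf{a}=\boldsymbol{\Sigma}_{\mathbf{X}}\mathbf{w}/(\mathbf{w}^{\top}\boldsymbol{\Sigma}_{\mathbf{X}}\mathbf{w})$ described by Haufe and colleagues. Whether the patterns in Fig. 3 were computed this way is an assumption, and the function name is hypothetical, so the sketch is illustrative only.

```python
import numpy as np

def filter_to_pattern(w, X):
    """Activation pattern for spatial filter w on data X (channels x samples):
    a = Cov(X) w / var(w^T X)."""
    cov = np.cov(X)                      # rows of X are channels
    return cov @ w / (w @ cov @ w)
```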

DISCUSSION

Our pilot study provides fundamental insights into the feasibility of a c-VEP-based stimulation paradigm for decoding covert spatial attention, potentially eliminating the need to make eye movements to control a c-VEP BCI. We implemented a two-class paradigm in which participants attended to a stimulus either to the left or to the right of their fixation point. The stimulus backgrounds flashed following pseudo-random noise-codes, while their foregrounds simultaneously presented a random sequence of five distinct shapes, including an infrequent target shape. Participants were tasked with counting the occurrences of the target shape in the shape sequence (see Fig. 1). In this pilot study, we used sequential stimulation to assess the feasibility of covert c-VEP decoding before moving to the more complex parallel stimulation, in which both stimuli flash simultaneously while the user covertly attends to one of them.

In our experiment, participants attended to the stimuli either overtly, by moving their eyes to foveate the target, or covertly, by relying on spatial attention to focus on the target. In the overt condition, decoding accuracy was $100$ % for all participants. In the covert condition, we achieved an average accuracy of $88$ %. To the best of the authors’ knowledge, this is the first evaluation of a c-VEP BCI using covert attention, although it still relies on sequential stimulation. Our study highlights the feasibility of such a design for developing gaze-independent BCIs that could be used by people with late-stage ALS.

In the overt condition, all participants achieved 100 % accuracy, likely owing to the large amount of data, the low number of classes, and the sequential stimulation. Specifically, this study used 5.3 min of data for training and 20 s trials for testing, whereas 1 min of training data and 1–2 s of testing data would suffice [thielen2021]. In the covert condition, we used 16 min of data for training and achieved a decoding accuracy of $88$ %. This result suggests a lower signal-to-noise ratio (SNR) in the covert condition than in the overt condition. Nevertheless, although obtained with sequential stimulation, this performance exceeds the $62$ % accuracy reported in a similar SSVEP study that used parallel stimulation [egan2017], offering initial evidence for the potential of gaze-independent c-VEP.

Our results on gaze-independent c-VEP BCI should be interpreted with caution given two important limitations. Firstly, this preliminary study involved a small cohort of five highly motivated participants. Secondly, the c-VEP protocol used sequential stimulation, in which only the stimulus on the attended side modulated its background according to the pseudo-random noise-code. In practical online use of the BCI, simultaneous stimulation on both sides is necessary. Further research with a larger sample and parallel stimulation is therefore needed to establish the full potential of this approach.

Additionally, stimulation paradigms outside the visual domain have also been explored for developing gaze-independent BCIs. For instance, Schreuder and colleagues developed the P300-based auditory multi-class spatial ERP (AMUSE) interface, reaching a classification accuracy of about $85$ % ($N=6$ classes) [Schreuder2011]. Similarly, Brouwer and van Erp designed a P300-based BCI using vibrotactile stimulation around the waist, with accuracies of $58$ % ($N=6$ classes) and $73$ % ($N=2$ classes) [Brouwer2010]. Moreover, Van der Waal and colleagues [vanderWaal2012] used tactile stimulation of the fingertips, reaching a classification accuracy of $82$ % ($N=6$ classes). These results suggest that the pseudo-random stimulation protocol could also be explored in the auditory and tactile domains.

The design of our study allows for several extensions that may further improve accuracy. Firstly, the stimulus protocol was designed such that the infrequent target shapes within the shape sequence could evoke a P300 response. Hence, the P300 response could be used alongside the c-VEP to decode the attended side, similar to the P300 response used alongside the SSVEP by Egan and colleagues [egan2017]. Secondly, alpha-band modulations are expected to be lateralized with respect to the attended side [worden2000]. Specifically, covertly attending to a stimulus on one side suppresses visual alpha activity in the contralateral (task-positive) hemisphere, while it increases alpha activity in the ipsilateral (task-negative) hemisphere [jensen2010]. Hence, visual alpha oscillations could serve as an additional feature, again similar to the alpha response used alongside the SSVEP in earlier work [egan2017]. Thirdly, in line with the anticipated lateralization in the alpha band, we also anticipate lateralization of the c-VEP itself in the covert condition. In our current application of reconvolution CCA, a single spatial filter was used to decode the attended side. This method can be extended with a distinct spatial filter for each side, a concept referred to as an ‘ensemble’ decoder [gembler2020b]. Finally, in the present study, we used only two stimuli positioned on either side of the fixation point, luminance modulated with a 126-bit modulated Gold code and its phase-shifted version. Given the small number of classes, shorter codes could be explored, which could lead to faster decoding. Furthermore, alternative codes, such as m-sequences or Golay sequences, may be considered, as they have shown promise for improving classification accuracy [thielen2023a].
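The alpha lateralization feature mentioned above could, for instance, be computed as a normalized difference in alpha-band power between hemispheres. In this hypothetical sketch, the 8–12 Hz band, Welch's method with 2 s segments, and the channel groups are all assumptions. Following the account above, a negative index would be expected when attending left and a positive one when attending right.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(x, fs, band=(8.0, 12.0)):
    """Mean Welch power spectral density within the alpha band, per channel."""
    f, pxx = welch(x, fs=fs, nperseg=int(2 * fs), axis=-1)
    mask = (f >= band[0]) & (f <= band[1])
    return pxx[..., mask].mean(axis=-1)

def alpha_lateralization(X, fs, left_idx, right_idx):
    """(right - left) / (right + left) alpha power; the channel groups are hypothetical.
    Attending left should raise ipsilateral (left) alpha, giving a negative index."""
    p = alpha_power(X, fs)
    left, right = p[left_idx].mean(), p[right_idx].mean()
    return (right - left) / (right + left)
```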

CONCLUSION

Our pilot study shows the feasibility and high performance of a novel covert BCI design based on the c-VEP. The design aims to eliminate the dependence on gaze, which is essential if BCIs are to be used by people who have no voluntary control over their eye movements, such as people living with late-stage ALS. Furthermore, the design of the study makes it possible to use additional measures of brain activity to improve classification performance, a potentially fruitful avenue for future work on the gaze-independent c-VEP BCI. Overall, our results suggest the potential for a high-speed BCI that does not rely on any overt behavior.

ACKNOWLEDGEMENTS

This work was part of the project ‘Obtaining fast brain-computer interfacing without eye movements for communication and control’ with project number OCENW.XS23.1.127 of the research programme ‘Open Competitie ENW XS’ which is financed by the Dutch Research Council (NWO).

REFERENCES