Score rectification for online assessments in robot-assisted arm rehabilitation

Michael Sommerhalder; Yves Zimmermann; Manuel Knecht; Zelio Suter; Robert Riener; Peter Wolf

doi:10.1515/auto-2022-0113

Open Access Published by De Gruyter (O) November 16, 2022

Michael Sommerhalder

Michael Sommerhalder received his B.Sc. (2017) and M.Sc. (2020) in mechanical engineering at ETH Zurich, followed by doctoral studies on the development of a polymorphic control framework for neurotherapy robots at ETH Zurich under the supervision of Robert Riener. He is especially interested in system architecture, controls and computer vision.

Yves Zimmermann

Yves Zimmermann received his B.Sc. (2015) and M.Sc. (2017) in mechanical engineering at ETH Zurich, followed by doctoral studies on the development of robotics for neurotherapy at ETH Zurich under the supervision of Marco Hutter and Robert Riener. Over multiple projects he worked on developing controls and hardware for human-machine interaction, from an electric race car to a robotic exoskeleton for neurotherapy.

Manuel Knecht

Manuel Knecht completed his B.Sc. in mechanical engineering at ETH Zurich and is now pursuing his master’s degree in Robotics, Systems and Control with a focus on Rehabilitation Engineering. He is particularly interested in software, controls and mechatronics.

Zelio Suter

Zelio Suter completed his B.Sc. (2019) in mechanical engineering and is currently pursuing his master’s studies in Robotics, Systems and Control at ETH Zurich. He joined the Sensory Motor Systems Lab in August 2022. Zelio Suter is particularly interested in fields where mechanics and dynamics meet complex programming and mathematical tasks, control theory and data analysis.

Robert Riener

Robert Riener is full professor for Sensory-Motor Systems at the Department of Health Sciences and Technology, ETH Zurich, and full professor of medicine at the University Hospital Balgrist, University of Zurich. He obtained an MSc in mechanical engineering in 1993 and a PhD in biomedical engineering in 1997, both from TU München, Germany. In 2003 he became professor in Zurich. His main research focus is on rehabilitation robotics, virtual reality, and biomechanics. Riener has published more than 400 peer-reviewed articles and 36 book chapters, and has filed 26 patents. He is the initiator and head of the strategic board of the Cybathlon.

Peter Wolf

Peter Wolf has a Master’s degree in sports engineering and completed his doctorate in biomechanics at ETH Zurich. In addition to the development of sport-specific measurement platforms, his research interest lies in the optimal interaction between users and robotics in the field of motor learning.

From the journal at - Automatisierungstechnik


Abstract

Relative comparison of clinical scores to measure the effectiveness of neuro-rehabilitation therapy is possible through a series of discrete measurements during the rehabilitation period within specifically designed task environments. Robots allow quantitative, continuous measurement of data. Resulting robotic scores are also only comparable within similar context, e.g. type of task. We propose a method to decouple these scores from their respective context through functional orthogonalization and compensation of the compounding factors based on a data-driven sensitivity analysis of the user performance. The method was validated for the established accuracy score with variable arm weight support, provoked muscle fatigue and different task directions on 6 participants of our arm exoskeleton group on the ANYexo robot. In the best case, the standard deviation of the assessed score in changing context could be reduced by a factor of 3.2. Therewith, we paved the way to context-independent, quantitative online assessments, recorded autonomously with robots.

Summary (translated from the German Zusammenfassung)

Robots allow quantitative, continuous measurement of data. Continuously measured metrics are only comparable if they can be decoupled from the measurement context. We introduce a method that achieves comparability of common metrics through an orthogonalization procedure and a compensation model. The method was tested on the accuracy metric with arm weight compensation on six members of our group using the ANYexo robot. In the best case, the standard deviation of the accuracy in a changing context was reduced by a factor of 3.2. We thereby took a first step toward automated online measurement of context-independent quantitative scores.

Keywords: assistance as needed; exoskeleton; rehabilitation; robotic assessments

Keywords (translated from German): assistance as needed; exoskeletons; rehabilitation; robotic metrics

1 Introduction

Stroke is one of the leading causes of death and disability with over 12.2 million new cases each year globally [1]. Stroke survivors often suffer from neuromuscular impairments of the upper limb which are treated through long-term and high-intensity neuro-rehabilitation. To determine if their therapy is effective and to adapt the therapy if necessary, therapists assess the patient’s neuromuscular abilities quantitatively by means of well-established clinical scores (e.g. Fugl–Meyer Assessment [2], Modified Ashworth Scale [3]). These scores are collected regularly (in studies every two to four weeks). Observing the relative change in the scores (and thereby tracking the patient’s progress) is feasible with these assessments, as they are performed in dedicated measurement protocols in a controlled environment. However, these scores have to be assessed during the ordinary therapy time, whereby valuable training time is lost. To complement these discrete assessments, therapists observe the patient qualitatively throughout the sessions. Although this observation allows for a continuous estimation of the patient’s state without disturbing the training, it is prone to subjective perception. Furthermore, a therapist will only be able to record a handful of data points in their respective context for each session, limiting the potential insight into therapy progress. Robots were introduced in upper-limb neuro-rehabilitation two decades ago and relieve therapists of physical labor with the help of assistance controllers. Robots can provide quantitative, objective records about the state and progress of patient recovery [4]. As a consequence, several robotic scores such as accuracy [5], smoothness [6] or reaction time [7] have been developed. Robots could thus potentially be used to continuously estimate the therapy progress quantitatively without interfering with the training itself. However, robotic scores might be biased, e.g. by the provided assistance, movement type, or manipulability effects from reduced range of motion. Coping with these biases has not been addressed so far, which has limited the exploitation of the potential of robotic assessments.

In this work, we aimed for a method to rectify robotic scores, thereby making them comparable across multiple sessions and different settings. Our approach consists of (1) identifying the disturbances and task characteristics influencing robotic scores, (2) designing a score decomposition procedure conditioned on task characteristics, and (3) developing a disturbance rejection method trained on experimental data using regression, with a “one-parameter-set-fits-all” policy. The rectified scores are independent of task type, movement characteristics, assistance, and muscle fatigue. We validated our approach for the accuracy score with healthy participants performing various shoulder movements with different levels of assistance on an arm rehabilitation robot, and demonstrated the advantages of a task and disturbance-independent score in contrast to many incomparable scores in literature. The presented rectification method could mark a starting point toward meaningful and interpretable online robotic assessments. Future work should elaborate on other scores, validate the rectification on patients, and compare the rectified robotic scores with established clinical scores.

2 Methods

2.1 Disturbance identification

Neurological factors such as progress in learning, motivation and fatigue influence the outcome of robotic scores and bias the estimation of neuromuscular abilities (these and further essential factors are listed in Figure 1). Additionally, the robotic system itself influences the estimated score by limiting the active range of motion of the patient due to the robot topology (as observed by [8, 9]). Although recent developments such as ANYexo [10], Harmony [11], and other devices [12] could increase the allowable upper-limb range of motion significantly by including shoulder degrees of freedom, other factors such as perceived robot inertia, maximum speed, comfort of the patient, alignment and systematic measurement errors influence the resulting scores, too. The type of assessment task can also result in different scoring methods, as shown for instance in scores of movement quality [8, 13]. Here, the type of assessment task is a combination of different task characteristics (see Section 2.1.1) and their manifestations (manipulability effects, i.e. proximal vs distal movements, work against gravity, and path length [14]). Measured scores also depend on the type and level of robotic assistance. The type of assistance is defined as the combination of none, one or multiple controller characteristics such as arm weight compensation (as, for instance, presented by Just et al. [15]), guidance and tunnel controller [16], active pushing controller (as, for instance, presented by Baur et al. [5]), or audiovisual feedback [17].

Figure 1:

Identified disturbance vector d. Each entry alters a given robotic score s ^T and makes this score incomparable to previous scores for therapists.

The complexity of our disturbance model was reduced by neglecting influences that stayed constant, as it is sufficient for our rectified score to be relatively comparable (i.e. no ground truth is required). Disturbances from the robot cannot be neglected if data need to be compared between different systems and clinical environments; however, these scenarios were not considered in this work. Further, psychological influences (i.e. motivation and learning effects) on the patient were assumed to stay constant for simplicity. The disturbances considered in this work are: Arm Weight Assistance d _A,AW, Work against Gravity d _T,WG and Muscle Fatigue d _P,FA.

2.1.1 Task characteristics

In a typical rehabilitation exercise, type of task, mapped joints, scoring method, audio, visual and haptic feedback are designed in an integrated manner. Although the tasks of different types share fundamental characteristics (based on [18], e.g. task types can be distinguished in spatial, spatio-temporal or temporal type), these shared characteristics are usually not exploited for the development of context-independent scoring methods. If scoring based on these characteristics is achieved, partial scores could be compared between different task types. As an example, games like holding penalty (C), navigating a bird (I), collecting stars in given order (E) and others (tasks A–D, G–I, see Figure 2) share the characteristic that the user has to reach a specific target area without time constraints, while navigating a bird and similar types (tasks C, G–I, see Figure 2) additionally share the characteristic that the user has to be at specific positions at given times. Further, we can distinguish between tasks where the target can be reached with an arbitrary path (tasks A–C, see Figure 2), and tasks where a clear reference path is given (tasks D–G, see Figure 2). We introduce a process called Score Decomposition (see Section 2.3) and show how these characteristics can be exploited by the robotic accuracy score, exemplified for four 2D tasks (task types A, C, D and G, see Figure 2).

Figure 2:

Typical visual displays of different exercises or games. The associated tasks can be grouped as follows: classical reaching task (A), reaching task with obstacles (B), and defend goal task with moving target (C, Armeo Power Game) are all spatial tasks without reference path. Classical follow path task (D), collecting the stars in given order (E) [19], and follow the path (F) [19] are all spatial tasks with reference path, but without timing. Classical follow trajectory task (G), inverted pendulum (H) [20], and navigating a bird (I) [21] are all spatio-temporal tasks either with (G, H) or without (I) reference path.

2.1.2 Robotic platform

Exercises and tasks are typically designed for a specific robot platform to match the available degrees of freedom and range of motion of the device. As with the exercises, this integrated design often cannot be altered, although the underlying game mechanics would be similar for different devices and different joints in e.g. clinical space or Cartesian world space. To achieve independence, joints actively involved in training can be mapped into a normalized training space q = {q ₁, q ₂, …} (the joint pose in training space is further denoted as cursor position), where each axis is scaled such that joint minima in clinical or Cartesian space map to −1 and joint maxima map to 1. Circular goal areas in clinical or Cartesian space thus map to ellipsoids with principal axes v ∈ ℝ^m, where m denotes the number of dimensions mapped to the normalized training space (e.g. m = 3 for a standard end-effector training in world space). We implemented this mapping on ANYexo [22], an upper-limb rehabilitation robot designed as a research platform (see Figure 3). The exoskeleton featured six actuated joints of type ANYdrive 2.0 (ANYbotics AG, Switzerland) that were aligned with the human joints of the shoulder and arm. All actuators had a maximum torque of 70 Nm and the joint speed was limited to 6 rad/s. The user and the robot physically interacted with each other at two cushioned locations: one at the upper arm and one at the forearm close to the wrist including a handle. For the scope of this work, we mapped Plane of Elevation (POE) and Angle of Elevation (AOE) of the clinical coordinate system to the training space’s X and Y axes, respectively. Kinematic data q were recorded with a sampling frequency of 800 Hz. Tasks were executed at an average frequency of about 0.3 Hz.
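The axis scaling described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `to_training_space` is hypothetical, and the range-of-motion limits are the POE and AOE bounds listed in Table 1.

```python
import numpy as np

def to_training_space(q, q_min, q_max):
    """Scale joint values so that q_min maps to -1 and q_max maps to +1 per axis."""
    q, q_min, q_max = (np.asarray(a, dtype=float) for a in (q, q_min, q_max))
    return 2.0 * (q - q_min) / (q_max - q_min) - 1.0

# Range-of-motion limits for (POE, AOE) in degrees, taken from Table 1.
Q_MIN = np.array([-30.0, 25.0])
Q_MAX = np.array([120.0, 110.0])

# Star center p_c = (60, 75) degrees expressed as a cursor position.
center = to_training_space([60.0, 75.0], Q_MIN, Q_MAX)
```

Because each axis is scaled by its own range, a circle in clinical space becomes an ellipse in training space whenever the two ranges differ, as is the case here (150° for POE versus 85° for AOE).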

Figure 3:

A healthy participant is training with ANYexo, an upper-limb robotic platform featuring six actuated joints. Plane of Elevation (POE) and Angle of Elevation (AOE) of the clinical coordinate system were mapped to training space X and Y axis.

2.2 Score rectification procedure

Recorded kinematic data $q$ of the robot depend on the neuromuscular abilities of the patient $n_A$, the disturbances $d$, and the task type characteristics $c_T$ (e.g. of a spatial or a temporal task) (see Figure 4, data recording). For any given combination $(n_A, d, c_T)$, the resulting kinematic data are further processed by various metrics into scores $s^T$ evaluated at the end of every task. Besides being influenced by the disturbances, the scores are usually designed task-specifically so that they comply with the underlying set of characteristics. Even accuracy scores have been designed specifically for each task [8] (e.g. spatial accuracy based on the distance to goal in a reach goal task [10] versus spatio-temporal accuracy in a path following task [23]). To decouple the scores from specific task types, we introduced a score orthogonalization method in which scores are defined for each characteristic individually via a scoring function $g(q, c_T)$ and then merged according to the set of enabled task characteristics $c_T^e$ with the function $h(c_{T,i}^e, s_1^T, s_2^T, \ldots)$ (see Figure 4, score decomposition). Finally, the rectified score $\hat{s}^T$ presented to the therapist is calculated with the function $f(s^T, d)$, which minimizes the influence of the disturbances $d$ according to a model calibrated on experimental data (see Figure 4, disturbance rejection).

Figure 4:

Full balancing model with score orthogonalization functions g and h to decouple the robotic scores from task characteristics, and balancing function f that decouples the scores from the disturbances of the current task T.

2.3 Score decomposition

The score decomposition procedure consists of (1) producing a list of orthogonal scores (g), and (2) combining them to obtain a task type-independent score s ^T (h). Starting from a classification of tasks into spatial and temporal components, spatial tasks can further be dissected into longitudinal (i.e. toward a goal position) and radial (i.e. toward a reference path) components. Considering accuracy, following a reference trajectory would lead to a combination of both spatial components and the temporal one, while following a reference path would lead to a combination of the spatial components only. The resulting possibility to compare different tasks is particularly important when the number of task types with different characteristics increases, e.g. when a large number of activities of daily living are to be trained. In the scope of this work, we applied the decomposition procedure to the accuracy score. Other scores could follow a similar procedure. For example, smoothness could also be dissected spatially and temporally, but is only assessed for tasks where a path or trajectory is available. Other scores such as aiming angle [13] might already be one-dimensional, i.e. no orthogonalization procedure would be necessary.

2.3.1 Score baseline and normalization

Robotic scores need to be brought into a frame of reference that is understandable by the therapist. Often it is necessary to distinguish between good scores and bad scores, e.g. to decide if a task needs to be repeated or the patient needs training on a particular score. Accordingly, we introduced a score baseline s _b ∈ [s _min, s _max] that is adjustable by therapists. A score x is normalized to be 0 at the baseline, −1 at s _min (worst possible score) and 1 at s _max (best possible score) by means of the function f _n:

(1) $$f_n(x, s_{\min}, s_{\max}, s_b) := \frac{\beta\,(x - s_b)}{(\gamma - \delta)\,x + \delta\, s_{\max} - \gamma\, s_{\min}}, \qquad \beta = s_{\max} - s_{\min}, \quad \gamma = s_{\max} - s_b, \quad \delta = s_b - s_{\min}.$$
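As a quick check of Eq. (1), the following sketch evaluates the normalization at the three anchor points. The function name `normalize_score` and the numeric score range (worst 0, best 10, baseline 7) are hypothetical values chosen only for illustration.

```python
def normalize_score(x, s_min, s_max, s_b):
    """Eq. (1): map x to -1 at s_min, 0 at the baseline s_b, and +1 at s_max."""
    beta = s_max - s_min
    gamma = s_max - s_b
    delta = s_b - s_min
    return beta * (x - s_b) / ((gamma - delta) * x + delta * s_max - gamma * s_min)

# Hypothetical score range: worst 0, best 10, therapist-selected baseline 7.
worst = normalize_score(0.0, 0.0, 10.0, 7.0)      # -1.0
baseline = normalize_score(7.0, 0.0, 10.0, 7.0)   # 0.0
best = normalize_score(10.0, 0.0, 10.0, 7.0)      # 1.0
```

When the baseline sits exactly halfway between s_min and s_max, γ equals δ and the mapping reduces to a straight line; otherwise it bends so that the baseline still lands on 0.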

2.3.2 Radial accuracy

To formulate the radial accuracy, the reference path $p$ is defined as a series of $N-1$ path segments with the corresponding $N$ discretization points $p_1 \ldots p_N$ (see Figure 5, radial). Given a cursor position $c(t)$ at time $t$, one can define its projection on the path $\bar{c}(t)$, its radial distance $\vartheta_c(t) = |c(t) - \bar{c}(t)|$, and the current closest discretization point $p_i$. Further, $\vartheta_{r,i}$ is defined as the distance from $\bar{c}(t)$ to the intersection of the nominal target region with the line segment between $\bar{c}(t)$ and $c(t)$. This distance is scaled with the score baseline $s_b$, and the score is normalized according to Eq. (1). The corresponding partial radial accuracy for discretization point $p_i$ averages all cursor observations $O_i$ (i.e. all update steps where the cursor was projected onto segment $i$) within the corresponding path segment, and is updated at each observation as follows:

(2) $$s_{R,i}^{n+1} = \frac{O_i^{n}\, s_{R,i}^{n} + f_n\!\left(\vartheta_c(t),\, 2\vartheta_{r,i},\, 0,\, \vartheta_{r,i}\right)}{O_i^{n+1}},$$

where $s_{R,i}^{0} = 0$, $O_i^{0} = 0$ and $O_i^{n+1} = O_i^{n} + 1$. A reasonable boundary for the worst possible score $s_{\min}$ was found to be double the distance to the target area. The baseline score of 0 is achieved as soon as the cursor reaches the surface of the ellipsoid. The radial accuracy score $s_R^T$ weights all partial accuracy scores with the path length $\xi_{p,i}$ of the corresponding segment:

(3) $$s_R^T = \frac{1}{\xi_p} \sum_{i=0}^{N-1} \xi_{p,i}\, s_{R,i}.$$
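The running-mean update of Eq. (2) and the path-length weighting of Eq. (3) can be sketched as follows. The class `Segment`, the helper names, and the example distances are hypothetical; the sketch also assumes that the total path length ξ_p is the sum of the segment lengths, which the text does not state explicitly.

```python
from dataclasses import dataclass

def normalize_score(x, s_min, s_max, s_b):
    """Eq. (1): -1 at s_min, 0 at the baseline s_b, +1 at s_max."""
    beta, gamma, delta = s_max - s_min, s_max - s_b, s_b - s_min
    return beta * (x - s_b) / ((gamma - delta) * x + delta * s_max - gamma * s_min)

@dataclass
class Segment:
    length: float        # path length xi_{p,i} of this segment
    score: float = 0.0   # running partial score s_{R,i}, starts at 0
    count: int = 0       # number of observations O_i, starts at 0

    def observe(self, radial_dist, target_radius):
        """Eq. (2): fold one normalized observation into the running mean."""
        sample = normalize_score(radial_dist, 2.0 * target_radius, 0.0, target_radius)
        self.score = (self.count * self.score + sample) / (self.count + 1)
        self.count += 1

def radial_accuracy(segments):
    """Eq. (3): path-length-weighted mean of the partial scores."""
    total_length = sum(seg.length for seg in segments)  # assumed to equal xi_p
    return sum(seg.length * seg.score for seg in segments) / total_length

# Hypothetical data: two segments, target radius 0.1 in training-space units.
first, second = Segment(length=1.0), Segment(length=3.0)
first.observe(0.0, 0.1)    # on the path: sample +1
first.observe(0.1, 0.1)    # on the target surface: sample 0
second.observe(0.2, 0.1)   # twice the target radius: sample -1
score = radial_accuracy([first, second])
```

With the argument order used in Eq. (2), the normalization simplifies to 1 − ϑ_c/ϑ_r,i, so the score falls linearly with the radial distance.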

Figure 5:

Decomposition of the accuracy score into longitudinal, radial and temporal component enables comparability of these scores between tasks of different types. Orange: target, green: reference, yellow: segment of interest.

2.3.3 Longitudinal accuracy

Similar to the radial accuracy, one can define the total path length $\zeta_p$, the cursor path length $\zeta_c(t)$ as the path length from the start position up to $\bar{c}(t)$ (see Figure 5, longitudinal) and, finally, $\zeta_r$ as the path length up to the reference position $\bar{r}$, i.e. the position on the path intersecting the surface of the goal ellipsoid. The longitudinal score is calculated as follows:

(4) $$s_L^T = f_n\!\left(\zeta_p - \bar{\zeta}_c,\; 0,\; \zeta_p,\; \zeta_p - \zeta_r\right), \qquad \bar{\zeta}_c = \max_t \zeta_c(t).$$

2.3.4 Temporal accuracy

Temporal accuracy assesses the temporal synchronization of a movement with the reference point (see Figure 5, temporal). When considering trajectories, the reference position $p(t)$ is specified at every time instant. The projection of the cursor position on the path $\bar{c}(t)$ allows the determination of a reference time $\bar{t}$. The distance between $\bar{t}$ and the current time $t$, the maximal temporal distance $t_{\max}$ (beyond this limit a normalized temporal accuracy of −1 should be achieved; for healthy participants $t_{\max} = 1$ s was found to be reasonable), and the goal distance $t_{\mathrm{trg}}$ (i.e. the temporal equivalent to the directed radius $\vartheta_{r,i}$) are used to calculate the corresponding partial temporal accuracy for discretization point $c_i$. For each $c_i$, all cursor observations are considered and the partial score is updated as follows:

(5) $$s_{T,i}^{n+1} = \frac{O_i^{n}\, s_{T,i}^{n} + f_n\!\left(|\bar{t} - t|,\; t_{\max},\; 0,\; t_{\mathrm{trg}}\right)}{O_i^{n+1}},$$

where $s_{T,i}^{0} = 0$, $O_i^{0} = 0$ and $O_i^{n+1} = O_i^{n} + 1$. Similar to the radial accuracy, the final temporal accuracy score $s_T^T$ averages all partial accuracy scores.

2.3.5 Score combination

Combinations of partial decomposed scores depend on the task characteristics $c_T^e \in \{\text{Type A}, \ldots, \text{Type I}\}$. Since no prior knowledge regarding the importance of each characteristic in each task type is available, we assumed each partial score to be of equal weight:

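(6) $$s^T := h\!\left(c_T^e, s_R^T, s_L^T, s_T^T\right) = \begin{cases} \dfrac{s_L^T + s_R^T}{2}, & \text{if } c_T^e \in \{\text{D, E, F}\} \\[2ex] \dfrac{s_L^T + s_R^T + s_T^T}{3}, & \text{if } c_T^e \in \{\text{G, H, I}\} \\[2ex] v\!\left(s_L^T, s_R^T\right), & \text{if } c_T^e \in \{\text{A, B, C}\} \end{cases}$$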

(7) $$v\!\left(s_L^T, s_R^T\right) = \frac{s_L^T}{2} + \frac{\sum_{i=0}^{N-1} w_i\, s_{R,i}^T}{2 \sum_{i=0}^{N-1} w_i}, \qquad w_i = \frac{1}{N - i}.$$

The goal of Task Type D was to follow a reference path with high accuracy, Task Type G additionally constrained velocity, Task Type A was an adapted Reach Goal task where a target area had to be reached, and for Task Type C the area to be reached was moving along a trajectory. Task Types A, B and C represented a special case: Although no specific path has to be followed, the cursor had to be moved to a defined target area, i.e. radial accuracy was required. Consequently, for task types A, B and C, the scores were combined in such a way that radial accuracy was only taken into account near the target area. Thus, at the beginning of the task, radial accuracy had no impact on the combined score, but towards the end of the task its influence increased. Since this results in an altered version of the radial score, this altered version was not considered as a proper characteristic (i.e. types A, B and C did not share the radial characteristic).
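The case distinction of Eq. (6) and the end-weighted radial term of Eq. (7) can be illustrated with a short sketch. The function name `combine_scores`, its argument layout, and the example values are hypothetical, and Eq. (7) is read here as half the longitudinal score plus half the weighted mean of the partial radial scores.

```python
def combine_scores(task_type, s_long, s_radial=None, s_temporal=None, partial_radial=None):
    """Eq. (6): equal-weight combination of partial scores, selected by task type."""
    if task_type in {"D", "E", "F"}:       # reference path, no timing
        return (s_long + s_radial) / 2.0
    if task_type in {"G", "H", "I"}:       # spatio-temporal tasks
        return (s_long + s_radial + s_temporal) / 3.0
    if task_type in {"A", "B", "C"}:       # no reference path, Eq. (7)
        n = len(partial_radial)
        weights = [1.0 / (n - i) for i in range(n)]   # grows toward the target
        weighted = sum(w * s for w, s in zip(weights, partial_radial)) / sum(weights)
        return s_long / 2.0 + weighted / 2.0
    raise ValueError(f"unknown task type: {task_type!r}")

# Hypothetical scores for a path-following task (D) and a reaching task (A).
path_task = combine_scores("D", s_long=0.8, s_radial=0.4)
reach_task = combine_scores("A", s_long=1.0, partial_radial=[-1.0, 1.0])
```

In the reaching example the last segment carries weight 1 and the first only 1/2, so a poor radial score early in the movement is penalized less than one close to the target.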

2.4 Disturbance rejection

Disturbance rejection can be divided into three parts: work against gravity, fatigue and assistance. To reject work against gravity (i.e. to compensate for the direction of movement), the model can be simplified if we assume a set of distinct directions $v_i$. The function $f$ for the current task can then be approximated by a linear combination of the functions $f_i$, i.e. the disturbance rejection for direction $i$. The functions are weighted according to their angular distance to the current direction $\bar{v}$:

(8) $$f(s^T, d) = \frac{\sum_{i=1}^{N} w_i\, f_i\!\left(s^T, d_i\right)}{\sum_{i=1}^{N} w_i}, \qquad w_i = \frac{1}{2}\left(\frac{v_i \cdot \bar{v}}{|v_i|\,|\bar{v}|} + 1\right).$$

The work against gravity can then be rejected by correcting the score to the mean value $\bar{s}_i$ achieved for direction $i$. Further, we defined the rejection method as a sequential process of assistance rejection $f_{A,i}$, fatigue rejection $f_{F,i}$ and work against gravity rejection:

(9) $$f_i\!\left(s^T, d_i\right) = f_{A,i}\!\left(f_{F,i}\!\left(s^T\right)\right) - \bar{s}_i + \frac{\max_k \bar{s}_k + \min_k \bar{s}_k}{2}.$$
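The direction weighting of Eq. (8) and the gravity offset of Eq. (9) can be sketched as below. The function names, the four example directions and the per-direction mean scores are hypothetical; only the formulas follow the text.

```python
import numpy as np

def direction_weights(directions, current):
    """Eq. (8): w_i = (cos of the angle between v_i and the current direction + 1) / 2."""
    directions = np.asarray(directions, dtype=float)
    current = np.asarray(current, dtype=float)
    cosines = directions @ current / (
        np.linalg.norm(directions, axis=1) * np.linalg.norm(current)
    )
    return 0.5 * (cosines + 1.0)

def blend(per_direction_values, weights):
    """Eq. (8): weighted mean of the per-direction rejection results f_i."""
    return float(np.dot(weights, per_direction_values) / np.sum(weights))

def gravity_offset(mean_scores, i):
    """Offset term of Eq. (9): shift direction i to the mid-range of all direction means."""
    return -mean_scores[i] + 0.5 * (max(mean_scores) + min(mean_scores))

# Hypothetical calibrated directions: up, left, down, right.
V = [(0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.0)]
weights = direction_weights(V, current=(0.0, 1.0))   # movement straight up
offset_up = gravity_offset([0.2, 0.6, 0.8, 0.6], 0)  # hypothetical mean scores
```

A direction aligned with the current movement receives weight 1, an orthogonal one 0.5, and the opposite direction 0, so the blended result changes smoothly between calibrated directions.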

2.4.1 Fatigue model

Muscle fatigue (and regeneration) for a given direction v _i at time t were estimated according to the model introduced by Jaber et al. [24]:

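(10) $$C_k = \begin{cases} C_{k-1} + \left(1 - C_{k-1}\right)\left(1 - e^{-\lambda_k (t_k - t_{k-1})}\right), & \text{if } I \\[1ex] C_{k-1}\, e^{-\mu (t_k - t_{k-1})}, & \text{if } R \end{cases} \qquad I = \text{Intense Phase}, \quad R = \text{Regeneration Phase}$$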

where C _k is accumulated fatigue by time t _k. The fatigue parameters λ _k are dependent on the assistance level l _A,k at interval k. The fatigue parameter λ and regeneration parameter μ were tuned (see Section 3). The fatigue rejection function takes the following form:

(11) $$f_{F,i}\!\left(s^T, d_i\right) = \left(1 - s^T\right) C_k + s^T, \qquad \lambda_i = \left(1 - l_{A,i}\right) \lambda.$$
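A minimal sketch of the fatigue recursion in Eq. (10) and the rejection in Eq. (11) follows. The function names are hypothetical, the time unit of the interval is assumed to be seconds (the text does not state it), and Eq. (11) is read as (1 − s^T)·C_k + s^T. The rate values are the ones reported for direction 1 in Table 3.

```python
import math

def update_fatigue(c_prev, dt, intense, lam, mu, assistance=0.0):
    """Eq. (10): accumulate fatigue in an intense phase, decay it during regeneration.

    The accumulation rate is scaled by (1 - l_A), as in Eq. (11).
    """
    if intense:
        lam_k = (1.0 - assistance) * lam
        return c_prev + (1.0 - c_prev) * (1.0 - math.exp(-lam_k * dt))
    return c_prev * math.exp(-mu * dt)

def reject_fatigue(score, fatigue):
    """Eq. (11): move the score toward 1 in proportion to the accumulated fatigue."""
    return (1.0 - score) * fatigue + score

LAMBDA_1, MU_1 = 0.003, 0.004  # direction-1 values from Table 3

# Assumed timing: 100 s of unassisted work, then 300 s of rest.
c_work = update_fatigue(0.0, 100.0, True, LAMBDA_1, MU_1)
c_rest = update_fatigue(c_work, 300.0, False, LAMBDA_1, MU_1)
rectified = reject_fatigue(0.4, c_work)
```

With full assistance (l_A = 1) the effective rate is zero, so no fatigue accumulates and the score passes through unchanged.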

2.4.2 Assistance compensation

Individual neuromuscular abilities yield different scores for movements without assistance. Scores of patients with good abilities saturate faster when assistance is applied than the scores of patients with lower abilities. This saturation behaviour can be modelled by defining a transformation function (m) with tuning parameter α to map the rectified fatigue score into a linear space where the assistance level l _A can be added to the neuromuscular ability n _A:

(12) $$f_{A,i}\!\left(\hat{s}_F^T, d_i\right) = l_A\, m^{-1}\!\left(\hat{s}_F^T\right) + \left(1 - l_A\right) \hat{s}_F^T.$$

(13) $$m(x) = 2\left(\frac{1}{1 + e^{-\alpha x}} - \frac{1}{2}\right).$$
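The transformation of Eq. (13) is a rescaled logistic function, equal to tanh(αx/2), so its inverse can be written in closed form. The sketch below uses that inverse to evaluate Eq. (12); the function names are hypothetical, the closed-form inverse is derived here rather than taken from the text, and the example score of 0.64 with α = 3 mirrors values reported in the results.

```python
import math

def m(x, alpha):
    """Eq. (13): rescaled logistic function with range (-1, 1)."""
    return 2.0 * (1.0 / (1.0 + math.exp(-alpha * x)) - 0.5)

def m_inv(y, alpha):
    """Inverse of Eq. (13); only defined for -1 < y < 1."""
    return math.log((1.0 + y) / (1.0 - y)) / alpha

def compensate_assistance(score, assistance, alpha):
    """Eq. (12): blend the de-saturated score and the original score by l_A."""
    return assistance * m_inv(score, alpha) + (1.0 - assistance) * score

ALPHA = 3.0  # tuned value reported in Table 3

unassisted = compensate_assistance(0.64, 0.0, ALPHA)   # unchanged
assisted = compensate_assistance(0.64, 1.0, ALPHA)     # pulled down
```

Scores at exactly ±1 would make the inverse diverge, so a practical implementation would need to clip the input slightly inside the open interval; that clipping is not part of the equations above.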

3 Experiments

To validate our disturbance rejection method, each of the six members of our group performed eight sessions (S ₁ − S ₈) with a predefined constant level of support l _A ∈ [0, 1] for each session (see Figure 6A). The level of support scales the upward forces applied by the controller at the cuffs [15]. To increase the task difficulty for the participants (all healthy), an additional weight of 3 kg was mounted at the wrist handle, which was not compensated by the robot. The elbow joint was locked at an angle of 30° to only allow movements in the shoulder. l _A was ramped up and down from session to session to elicit fatigue and to be able to contrast it with learning effects. After each session, participants switched places with a second participant, allowing them to rest for about 5 min. Every session contained four exercises (E ₁ − E ₄) with constant conditions (see Figure 6B). Four exercises were sufficient to temporarily fatigue most participants to a point where they were no longer able to reach all goals. Each exercise involved eight tasks (T ₁ − T ₈), i.e. moving to 8 positions p _i and back to the middle point p _c while tracking a moving goal on the path. The positions were arranged in a star pattern (see Figure 6C) [9]. The positions were set in clinical joint space for Plane of Elevation (POE) and Angle of Elevation (AOE) [25]. POE and AOE were then mapped to the q ₁ (X) and q ₂ (Y) axes in training space with limits q _i, min and q _i, max respectively (see Table 1). The sequence of tasks T ₁ − T ₈ was chosen randomly for each exercise, reducing the risk of systematic errors and learning effects.

Figure 6:

Study protocol of exercise 2: (A) A total of eight sessions were performed by two participants alternately with variable assistance level l _A. (B) Each session consisted of four exercise repetitions, and each exercise was a sequence of eight tasks in random order with (C) goal positions aligned in a star pattern.

Table 1:

Goal positions p _i of tasks T ₁ − T ₈ were aligned in a star pattern with center point p _c. Plane of Elevation (POE) and Angle of Elevation (AOE) were mapped to training space. q _i, min and q _i, max denote the range of motion boundaries used to map the positions into training space.

Axis	q _i, min	q _i, max	p _c	p ₁	p ₂	p ₃	p ₄	p ₅	p ₆	p ₇	p ₈
POE (°)	−30	120	60	60	42.3	35	42.3	60	77.7	85	77.7
AOE (°)	25	110	75	100	92.7	75	57.3	50	57.3	75	92.7
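The goal positions in Table 1 are consistent with eight points spaced 45° apart on a circle of radius 25° around p_c, starting straight up and advancing toward smaller POE. The sketch below regenerates the table under that reading; the radius, the angular spacing and the function name `star_goal` are inferred from the tabulated values, not stated in the text.

```python
import math

CENTER = (60.0, 75.0)  # p_c in (POE, AOE) degrees, from Table 1
RADIUS = 25.0          # distance between p_c and p_1 in Table 1

def star_goal(i):
    """Goal p_i for i = 1..8, rounded to one decimal as in Table 1."""
    angle = math.radians(45.0 * (i - 1))
    poe = CENTER[0] - RADIUS * math.sin(angle)
    aoe = CENTER[1] + RADIUS * math.cos(angle)
    return round(poe, 1), round(aoe, 1)

goals = [star_goal(i) for i in range(1, 9)]
```

The odd-numbered goals lie on the horizontal and vertical axes through p_c and the even-numbered goals on the diagonals.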

3.1 Model tuning

To be able to validate our model and generalize the results, we split the data of the six participants into equal-sized training and test set. Since a model that requires participant-specific calibration tasks prior to training would contradict our initial goal of shifting from discrete to online assessments, we decided on comparability between different movement directions. Therefore we split the tasks into predefined training set (T ₁, T ₃, T ₅, T ₇) and test set (T ₂, T ₄, T ₆, T ₈). Since work against gravity has a significant influence on the disturbance model and is highest along the vertical axis, it was decided not to randomize the train-test split but rather assign tasks to the training set along the global horizontal and vertical axis. The parameters and models as described in the previous chapter were then tuned on the training set using regression, and applied to the test set (see Table 2). The total amount of data points for training was n = 768 (six participants with eight sessions each, containing four exercises with four task directions each).

Table 2:

List of all tuning parameters for the fatigue and assistance models. Each parameter λ _i and μ _i was tuned on training data of exercise 2 from all six participants.

Name	Symbol	Count	Description
Fatigue	λ _i	4	Fatigue accumulation rate for direction i
Regeneration	μ _i	4	Regeneration accumulation rate for direction i
Assistance	α	1	Assistance transformation function parameter

4 Results

4.1 Fatigue rejection

The fatigue and regeneration parameters $\lambda_i$ and $\mu_i$ were tuned for the directions along the horizontal and vertical axes (see Table 3) on data from all participants. After applying the accumulated fatigue model to the unrectified combined score $s^T(t)$ for direction 2 of the test set, the long-term difference in mean score between sessions $S_1$ and $S_8$ dropped from $\Delta s^T(S_1, S_8) = 0.11$ to $\Delta \hat{s}^T(S_1, S_8) = 0.02$ (see Figure 7). The difference in mean score between exercises $E_{12} = \frac{E_1 + E_2}{2}$ and $E_{34} = \frac{E_3 + E_4}{2}$ for session $S_8$ dropped significantly from $\Delta s^T(E_{12}, E_{34}) = 0.12$ to $\Delta \hat{s}^T(E_{12}, E_{34}) = -0.06$. The difference between scores for exercises with high assistance level ($l_A > 0.5$) changed insignificantly after applying the fatigue model, as the accumulated fatigue $C(t)$ depleted during sessions with high assistance. $C(t)$ was highest for sessions $S_8$ and $S_1$.

Table 3:

Resulting parameter values for fatigue (λ _i), regeneration (μ _i) and assistance rejection (α) after tuning on training data from all participants for directions i ∈ {1, 3, 5, 7}.

Symbol	λ ₁	λ ₃	λ ₅	λ ₇	μ ₁	μ ₃	μ ₅	μ ₇	α
Value	0.003	0.0025	0.0002	0.0015	0.004	0.003	0.003	0.004	3.00

Figure 7:

Fatigue rectification for T ₂. For each session S ₁ − S ₈ and exercise E ₁ − E ₄, data from all participants were aggregated. The difference in the mean score between S ₁ and S ₈ (Δs ^T(S ₁, S ₈)) as well as the difference between E ₁₂ and E ₃₄ for S ₈ (Δs ^T(E ₁₂, E ₃₄)) were significantly lower after the tuned fatigue model (C(t)) was applied to the unrectified score. The accumulated fatigue C(t) is inversely proportional to s ^T.

4.2 Full disturbance rejection pipeline

The full rejection pipeline is exemplified on participant P6 (see Figure 8). Without any rejection, scores were dependent on the movement direction. P6 achieved a score of $s^T = -0.29$ without assistance in the upward direction (T = 1, see Figure 8A). The difference in unrectified score between assistance levels ($s_{1,\,l_A=0}^T = -0.29$, $s_{1,\,l_A=1}^T = 0.64$) was pronounced. For $l_A = 1$, similar scores were observed independent of the direction of movement. Applying the fatigue model seemed to increase scores of all assistance levels for upward directions. Applying the assistance model seemed to decrease scores of high assistance level. Finally, applying the work against gravity model seemed to increase all scores in all directions. With the exception of direction $T_8$, scores in the range between $s^T = 0.65$ and $s^T = 0.74$ were observed for all directions and assistance levels after applying the full disturbance rejection pipeline.

Figure 8:

Full disturbance rejection pipeline demonstrated on participant P6: the accuracy score s ^T ∈ { − 1, 1} was averaged over all data for a given direction from sessions with equal assistance level l _A. Process steps are: A − B: fatigue rejection, B − C: assistance rejection, C − D: work against gravity rejection. Green arrows indicate regions with most significant changes in the score.
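The order of the process steps A to D can be expressed as a chain of stage functions that each take a score and its context and return a corrected score. The sketch below shows only this chaining and the bookkeeping of intermediate results per panel. The three stage functions are toy placeholders that merely mimic the direction of change described in the text; they are not the models of Section 2.4, and the context keys, gains, and clamping to the range −1 to 1 are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, float]
Stage = Callable[[float, Context], float]


def clamp(score: float, low: float = -1.0, high: float = 1.0) -> float:
    """Keep a score inside the normalized range (assumed to be -1 to 1)."""
    return max(low, min(high, score))


def run_pipeline(
    score: float, context: Context, stages: List[Tuple[str, Stage]]
) -> List[Tuple[str, float]]:
    """Apply the stages in order and record the score after each one."""
    history = [("A: unrectified", score)]
    for label, stage in stages:
        score = clamp(stage(score, context))
        history.append((label, score))
    return history


# Toy placeholder stages (hypothetical gains, not the paper's models).
def reject_fatigue(score: float, ctx: Context) -> float:
    return score + 0.2 * ctx["fatigue"]


def reject_assistance(score: float, ctx: Context) -> float:
    return score - 0.3 * ctx["assistance"]


def reject_gravity_work(score: float, ctx: Context) -> float:
    return score + 0.1 * ctx["gravity_work"]


stages = [
    ("B: after fatigue rejection", reject_fatigue),
    ("C: after assistance rejection", reject_assistance),
    ("D: after work against gravity rejection", reject_gravity_work),
]

# Hypothetical input: unassisted upward movement with moderate fatigue.
context = {"fatigue": 0.5, "assistance": 0.0, "gravity_work": 1.0}
history = run_pipeline(0.2, context, stages)

for label, value in history:
    print(f"{label}: {value:.2f}")
```

Keeping the intermediate scores makes it possible to attribute a change in the final score to a single stage, which is how the panels of Figure 8 are read.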

The rectification model was applied to all six participants (see Figure 9). The difference in unrectified scores for participants P2 and P4 in all directions and all assistance levels was small, while a significant increase in the unrectified score with increasing assistance level was observed for participants P5 and P6. For participants P1 and P3, increasing the assistance level had negative effects on the score (e.g. Δs ^T = −0.11 for P3 in direction T = 2 with Δl _A = 1). Over all participants, an increase in the average accuracy score in all directions could be observed after applying the rectification model. The difference in standard deviation of all data points (n = 32, 8 directions × 4 assistance levels) per participant between unrectified and rectified scores seemed to correlate with the assistance-dependent differences in unrectified scores (see Figure 10). For participant P5, the standard deviation could be reduced significantly from σ _s = 0.15 to σ _s = 0.05 after applying rectification. For participants where the assistance level had negative effects on the score (i.e. P1 and P3), the standard deviation of the scores increased slightly from σ _s = 0.11 to σ _s = 0.14 for P1.

Figure 9:

Comparison between unrectified normalized scores s ^T ∈ {−1, 1} (red) and rectified scores (blue) for different levels of assistance (l _A) and participants P1–P6. Rectification effects were strongest for participants with significant responses to the provided assistance (P5, P6). Opposite effects were observed for participants where applied assistance led to a decrease in scores.

Figure 10:

Standard deviation between all unrectified (red) and rectified scores (blue) over all sessions and directions for each participant. Rectification led to a significant decrease in scoring variability for participants with high responses to the provided assistance (P5, P6).
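The variability comparison can be summarized as the ratio of the standard deviation before and after rectification over the n = 32 data points per participant. The sketch below is one possible way to compute it, assuming the population standard deviation (the text does not state which estimator was used). The score grids are synthetic and the function name is hypothetical; only the σ _s values for P5 and P1 are taken from the text.

```python
from statistics import pstdev


def variability_ratio(unrectified, rectified):
    """Return sigma_unrectified / sigma_rectified.

    A ratio above 1 means the scores spread less after rectification,
    a ratio below 1 means rectification increased the spread.
    """
    sigma_u = pstdev(unrectified)
    sigma_r = pstdev(rectified)
    if sigma_r == 0:
        raise ValueError("rectified scores have zero spread")
    return sigma_u / sigma_r


# Synthetic grid: 8 directions x 4 assistance levels (not study data).
# Unrectified scores rise with assistance; rectified scores do not.
unrectified = [
    0.3 + 0.1 * level + 0.01 * direction
    for direction in range(8)
    for level in range(4)
]
rectified = [0.7 + 0.01 * direction for direction in range(8) for _ in range(4)]

ratio = variability_ratio(unrectified, rectified)
print(f"synthetic example: n = {len(unrectified)}, ratio = {ratio:.2f}")

# Ratios implied by the rounded sigma values reported in the text.
ratio_p5 = 0.15 / 0.05  # P5: variability reduced
ratio_p1 = 0.11 / 0.14  # P1: variability slightly increased
print(f"P5: {ratio_p5:.2f}, P1: {ratio_p1:.2f}")
```

Because the reported σ _s values are rounded to two decimals, ratios computed from them can differ slightly from ratios computed on the unrounded data.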

5 Discussion

5.1 Score decomposition could deepen insights into the user’s neuromotor abilities during training

Decomposition and (re-)combination of scores was applied to the accuracy score for robot-assisted rehabilitation therapy. This process could result in deeper insights into the performance of patients within the current and past tasks, and their respective contexts. While decomposed scores made it possible to extract information about the patient’s neuromotor abilities independent of the context in which the movement was performed, the combination of these orthogonal components could help the therapist to observe how the patient changes behavior upon contextual variation. For example, by looking at decomposed and combined scores, it could be determined which functional components patients optimized their movement for when provided only with the combined scores as feedback.

5.2 Score decomposition allows combination of scores with different units

Further, the problem of combining scores with different units is not new, and various approaches have been proposed in the past to solve this issue (e.g. to compare spatial and temporal errors [26]). However, these approaches were specifically designed for a predefined set of robotic scores dependent on the given task. In our approach we sought functional orthogonal components that could be normalized into a generalized training space, i.e. made independent of the type of task. Our approach has the additional advantage that therapists or automated high-level algorithms can set individualized, patient-specific baselines for each of the components. Increasing the set of tasks does not alter the decomposed scoring functions, and thus comparability of orthogonalized scores within new types of tasks is guaranteed. The generalizability of the procedure to other robotic scores such as movement smoothness, aiming angle, reaction time, or joint speed seems straightforward but needs to be proven in the future.

5.3 Fatigue rejection model could predict optimal exercise switching in therapy

We were able to show accurate modeling of fatigue throughout our experiment sessions for each healthy participant (see resulting rectification scores in Figure 9). The model rejected disturbances in scoring over a whole training (Δs ^T(S ₁, S ₈) was reduced from 0.11 to 0.001, see Figure 7) and over each session (Δs ^T(E ₁, E ₄) was reduced from 0.35 to 0.04). We are thus confident that our fatigue model could generalize well to other healthy participants. It could be investigated whether less data (e.g. only considering direction T = 0) would also be sufficient to tune this model. Generalization to patients has to be proven in future work, however, as additional influences would have to be considered, e.g. compensatory movements upon high muscle fatigue. Such compensatory movements were also observed during our experiment with healthy participants, especially for exercises 3 and 4. Muscle fatigue is an important factor in therapy planning and execution, as it deteriorates proprioception and motor control during training [20]. Usually, therapists estimate fatigue qualitatively and counteract the issue by e.g. switching to a different exercise targeting different muscle groups. With our approach, therapy systems could switch automatically from one exercise to another in an attempt to keep the patient at an optimal challenge point.

5.4 Similar balancing effects as in motor learning for healthy participants were observed

Results showed that fatigue rejection increased scores of all assistance levels for upward directions, whereby scores with lower assistance levels were increased more strongly than scores with higher assistance. In contrast, assistance rejection targeted scores with high assistance values and lowered them for participants with significant responses to the provided assistance, i.e. it balanced the scores after fatigue rejection. Interestingly, scoring errors of P1 and P3 were amplified, since assistance rejection was inverted. Similar to effects observed for motor learning [27], over-applying assistance to healthy participants hampered scoring. These findings may stimulate further research into this effect, as important conclusions could be drawn about the generalization potential of models trained on healthy participants when applying them to patients.

5.5 Comparable online assessments are technically feasible and potentially applicable to multiple areas

The combination of score decomposition and disturbance rejection worked well for healthy participants where assistance had a reasonable influence on the score. Overall, we could see clear evidence that comparable online assessments are technically feasible with an appropriate decomposition and disturbance rejection method, as the overall standard deviation of the assessed score in changing context could be reduced by a factor of 3.2 in the best case, and 0.8 in the worst case (see Figure 10). Being able to compare robotic scores online has advantages in a multitude of areas: in robot-assisted motor learning for healthy trainees (e.g. strength training combined with an accuracy task as given in our experiments), but also in neuro-rehabilitation for patients, as introduced. For the latter, further steps would be needed to transfer the method into the rehabilitation setting, by validating the method on patients and analyzing the correlation to established clinical scores such as Fugl–Meyer. Further, our approach could enable the direct comparison of the performance of developed controllers between different research groups, robotic devices, and task settings. The existing comparability issue is especially pronounced in neuro-rehabilitation, considering the large number of different controllers that were developed and tested on small patient groups with questionable generalizability. For this, the feasibility of transferring the decomposition method to various other robotic scores could also be explored.

Finally, it has to be investigated if disturbances besides fatigue, assistance, and work against gravity could also have a significant influence on the scoring, which was not covered in this work: path length, motivation, and different assistance methods such as tunnel or guidance controllers.

6 Conclusions

We aimed for a method to rectify robotic scores, thereby making them comparable across multiple sessions and different robotic settings to allow continuous assessment of therapy progress without interrupting ordinary training. Clinical scoring methods have to be assessed during ordinary therapy time within controlled environments and dedicated measurement protocols. Our score decomposition method could be a necessary step to enable online comparability between different types of tasks without the need for further insights into the context in which they were performed. Further, the fatigue rejection method was shown to be an accurate model of the observed fatigue of healthy participants. Together with assistance and work against gravity rejection, the rectification method worked well for participants that could improve their unrectified scores with the help of robotic assistance. Overall, we see clear evidence that online assessments estimated in different contexts such as different types of tasks, different patients, and different levels of assistance, are technically feasible with an appropriate rectification method, although the significance of our approach still needs to be proven on patients, and the correlation to clinical scores still has to be established.

Corresponding author: Michael Sommerhalder, Sensory-Motor Systems Lab, ETH Zurich, Zurich, Switzerland, E-mail: somichae@ethz.ch

Funding source: Innosuisse, the Swiss Innovation Agency, ID 33759.1 IP-LS (in part)

About the authors

Michael Sommerhalder

Michael Sommerhalder received his B.Sc. (2017) and M.Sc. (2020) in mechanical engineering at ETH Zurich followed by doctoral studies on the development of a polymorphic control framework for neurotherapy robots at ETH Zurich under the supervision of Robert Riener. He is especially interested in system architecture, controls and computer vision.

Yves Zimmermann

Yves Zimmermann received his B.Sc. (2015) and M.Sc. (2017) in mechanical engineering at ETH Zurich followed by doctoral studies on the development of robotics for neurotherapy at ETH Zurich under the supervision of Marco Hutter and Robert Riener. Over multiple projects he worked on developing controls and hardware for human-machine interaction from an electric race car to a robotic exoskeleton for neurotherapy.

Manuel Knecht

Manuel Knecht completed his B.Sc. in mechanical engineering at ETH Zurich and is now pursuing his master’s degree in Robotics, Systems and Control with a focus on Rehabilitation Engineering. He is particularly interested in software, controls and mechatronics.

Zelio Suter

Zelio Suter completed his B.Sc. (2019) in mechanical engineering and is currently pursuing his master’s studies in Robotics, Systems and Control at ETH Zurich. He joined the Sensory Motor Systems Lab in August 2022. Zelio Suter is particularly interested in fields where mechanics and dynamics meet complex programming and mathematical tasks, control theory and data analysis.

Robert Riener

Robert Riener is full professor for Sensory-Motor Systems at the Department of Health Sciences and Technology, ETH Zurich, and full professor of medicine at the University Hospital Balgrist, University of Zurich. He obtained an MSc in mechanical engineering in 1993 and a PhD in biomedical engineering in 1997, both from TU München, Germany. In 2003 he became professor in Zurich. His main research focus is in rehabilitation robotics, virtual reality, and biomechanics. Riener has published more than 400 peer-reviewed articles and 36 book chapters, and has filed 26 patents. He is the initiator and head of the strategic board of the Cybathlon.

Peter Wolf

Peter Wolf has a Master’s degree in sports engineering and completed his doctorate in biomechanics at ETH Zurich. In addition to the development of sport-specific measurement platforms, his research interest lies in the optimal interaction between users and robotics in the field of motor learning.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research was supported in part by Innosuisse, the Swiss Innovation Agency, ID 33759.1 IP-LS. The authors would like to thank Roland Stärk, who supported us in this project as a member of the arm exoskeleton group at the SMS lab.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

[1] V. L. Feigin, M. Brainin, B. Norrving, et al., “Global stroke fact sheet 2022,” Int. J. Stroke, vol. 17, no. 1, pp. 18–29, 2022. https://doi.org/10.1177/17474930211065917.

[2] A. R. Fugl-Meyer, L. Jääskö, I. A. Leyman, S. Olsson, and S. Steglind, “A method for evaluation of physical performance,” Scand. J. Rehabil. Med., vol. 71, pp. 13–31, 1975.

[3] R. Bohannon and M. Smith, “Interrater reliability of a modified Ashworth scale of muscle spasticity,” Phys. Ther., vol. 67, pp. 206–207, 1987. https://doi.org/10.1093/ptj/67.2.206.

[4] P. Maciejasz, J. Eschweiler, K. Gerlach-Hahn, A. Jansen-Troy, and S. Leonhardt, “A survey on robotic devices for upper limb rehabilitation,” JNER, vol. 11, 2014, Art. no. 3. https://doi.org/10.1186/1743-0003-11-3.

[5] K. Baur, V. Klamroth-Marganska, C. Giorgetti, D. Fichmann, and R. Riener, “Performance-based viscous force field adaptation in upper limb strength training for stroke patients,” in 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), 2016, pp. 864–869. https://doi.org/10.1109/BIOROB.2016.7523736.

[6] Y. Beck, T. Herman, M. Brozgol, N. Giladi, A. Mirelman, and J. M. Hausdorff, “SPARC: a new approach to quantifying gait smoothness in patients with Parkinson’s disease,” JNER, vol. 15, 2018, Art. no. 1. https://doi.org/10.1186/s12984-018-0398-3.

[7] A. M. Coderre, A. A. Zeid, S. P. Dukelow, et al., “Assessment of upper-limb sensorimotor function of subacute stroke patients using visually guided reaching,” Neurorehabil. Neural Repair, vol. 24, no. 6, pp. 528–541, 2010. https://doi.org/10.1177/1545968309356091.

[8] N. Nordin, S. Xie, and B. Wünsche, “Assessment of movement quality in robot-assisted upper limb rehabilitation after stroke: a review,” JNER, vol. 11, no. 1, p. 137, 2014. https://doi.org/10.1186/1743-0003-11-137.

[9] L. Dipietro, H. I. Krebs, S. E. Fasoli, et al., “Changing motor synergies in chronic stroke,” J. Neurophysiol., vol. 98, no. 2, pp. 757–768, 2007. https://doi.org/10.1152/jn.01295.2006.

[10] M. Alt Murphy and C. K. Häger, “Kinematic analysis of the upper extremity after stroke–how far have we reached and what have we grasped?” Phys. Ther. Rev., vol. 20, no. 3, pp. 137–155, 2015. https://doi.org/10.1179/1743288x15y.0000000002.

[11] B. Kim and A. D. Deshpande, “An upper-body rehabilitation exoskeleton harmony with an anatomical shoulder mechanism: design, modeling, control, and performance evaluation,” Int. J. Robot. Res., vol. 36, no. 4, pp. 414–435, 2017. https://doi.org/10.1177/0278364917706743.

[12] M. A. Gull, S. Bai, and T. Bak, “A review on design of upper limb exoskeletons,” Robotics, vol. 9, no. 1, p. 16, 2020. https://doi.org/10.3390/robotics9010016.

[13] L. Zollo, L. Rossini, M. Bravi, G. Magrone, S. Sterzi, and E. Guglielmelli, “Quantitative evaluation of upper-limb motor control in robot-aided rehabilitation,” Med. Biol. Eng. Comput., vol. 49, no. 10, pp. 1131–1144, 2011. https://doi.org/10.1007/s11517-011-0808-1.

[14] H. A. Abdullah, C. Tarry, C. Lambert, S. Barreca, and B. O. Allen, “Results of clinicians using a therapeutic robotic system in an inpatient stroke rehabilitation unit,” JNER, vol. 8, 2011, Art. no. 50. https://doi.org/10.1186/1743-0003-8-50.

[15] F. Just, Ö. Özen, S. Tortora, R. Riener, and G. Rauter, “Feedforward model based arm weight compensation with the rehabilitation robot ARMin,” IEEE Int. Conf. Rehabil. Robot., vol. 2017, pp. 72–77, 2017. https://doi.org/10.1109/ICORR.2017.8009224.

[16] M. Guidali, A. Duschau-Wicke, S. Broggi, V. Klamroth-Marganska, T. Nef, and R. Riener, “A robotic system to train activities of daily living in a virtual environment,” Med. Biol. Eng. Comput., vol. 49, no. 10, pp. 1213–1223, 2011. https://doi.org/10.1007/s11517-011-0809-0.

[17] R. Secoli, G. Rosati, and D. J. Reinkensmeyer, “Using sound feedback to counteract visual distractor during robot-assisted movement training,” in 2009 IEEE International Workshop on Haptic Audio Visual Environments and Games, 2009, pp. 135–140. https://doi.org/10.1109/HAVE.2009.5356119.

[18] E. Basalp, P. Wolf, and L. Marchal-Crespo, “Haptic training: which types facilitate (re)learning of which motor task and for whom? Answers by a review,” IEEE Trans. Haptics, vol. 14, no. 4, pp. 722–739, 2021. https://doi.org/10.1109/toh.2021.3104518.

[19] Á. Gutiérrez, D. Sepúlveda-Muñoz, Á. Gil-Agudo, and A. de los Reyes Guzmán, “Serious game platform with haptic feedback and EMG monitoring for upper limb rehabilitation and smoothness quantification on spinal cord injury patients,” Appl. Sci., vol. 10, no. 3, 2020. https://doi.org/10.3390/app10030963.

[20] Ö. Özen, J. Penalver-Andres, E. V. Ortega, K. A. Buetler, and L. Marchal-Crespo, “Haptic rendering modulates task performance, physical effort and movement strategy during robot-assisted training,” in International Conference for Biomedical Robotics and Biomechatronics (BioRob), 2020, pp. 1223–1228. https://doi.org/10.1109/BioRob49111.2020.9224317.

[21] M. Lyu, W. H. Chen, X. Ding, J. Wang, Z. Pei, and B. Zhang, “Development of an EMG-controlled knee exoskeleton to assist home rehabilitation in a game context,” Front. Neurorobot., vol. 13, 2019, Art. no. 67. https://doi.org/10.3389/fnbot.2019.00067.

[22] Y. Zimmermann, A. Forino, R. Riener, and M. Hutter, “ANYexo: a versatile and dynamic upper-limb rehabilitation robot,” IEEE Robot. Autom. Lett., vol. 4, no. 4, pp. 3649–3656, 2019. https://doi.org/10.1109/lra.2019.2926958.

[23] T. Cluff and S. H. Scott, “Apparent and actual trajectory control depend on the behavioral context in upper limb motor tasks,” J. Neurosci., vol. 35, no. 36, pp. 12465–12476, 2015. https://doi.org/10.1523/jneurosci.0902-15.2015.

[24] M. Jaber, Z. Givi, and W. Neumann, “Incorporating human fatigue and recovery into the learning–forgetting process,” Appl. Math. Model., vol. 37, no. 12, pp. 7287–7299, 2013. https://doi.org/10.1016/j.apm.2013.02.028.

[25] G. Wu, S. Siegler, P. Allard, et al., “ISB recommendation on definitions of joint coordinate system of various joints for the reporting of human joint motion—part I: ankle, hip, and spine,” J. Biomech., vol. 35, no. 4, pp. 543–548, 2002. https://doi.org/10.1016/s0021-9290(01)00222-6.

[26] G. Rauter, N. Gerig, R. Sigrist, R. Riener, and P. Wolf, “When a robot teaches humans: automated feedback selection accelerates motor learning,” Sci. Robot., vol. 4, no. 27, p. eaav1560, 2019. https://doi.org/10.1126/scirobotics.aav1560.

[27] Ö. Özen, K. A. Buetler, and L. Marchal-Crespo, “Towards functional robotic training: motor learning of dynamic tasks is enhanced by haptic rendering but hampered by arm weight support,” JNER, vol. 19, no. 19, 2022, Art. no. 19. https://doi.org/10.1186/s12984-022-00993-w.

Received: 2022-09-14

Accepted: 2022-10-04

Published Online: 2022-11-16

Published in Print: 2022-11-25

This work is licensed under the Creative Commons Attribution 4.0 International License.
