1 Introduction

Cardiovascular diseases (disorders of the heart and blood vessels) are the leading cause of death globally, accounting for 17.7 million deaths each year (World Health Organization 2011). Cardiac rehabilitation (CR) is a long-term programme following a cardiovascular episode, which aims to accelerate recovery and reduce the risk of recurrent cardiovascular events. CR is generally composed of three phases (Kraus and Keteyian 2007): (I) the Inpatient phase, involving a medical procedure within the 48 hours after the cardiac event, (II) the Outpatient phase, typically 18 weeks (with sessions twice per week) of exercise and education programmes to improve the health of the patient, and (III) the Maintenance phase, lasting 9 months (with one or two sessions per week) to reinforce the learned behaviour. This programme is demanding in both time and physical effort, and consequently suffers from low adherence (with drop-out rates ranging between 15 and 50%) (Maclean and Pound 2000; Carlson et al. 2000; Siegert and Taylor 2004; Bethell et al. 2009; Scane et al. 2012). However, adherence to the programme is vital to achieving a complete recovery and reducing the risk of recurrent events (Jolly et al. 2007; Suaya et al. 2009; Hammill et al. 2010).

Socially assistive robotics (SAR) aims to provide monitoring, assistance in physical and cognitive activities, and social interaction during therapy, which improves patient motivation, task performance, and clinical progress (Feil-Seifer and Matarić 2005; Ahmad et al. 2017). However, over a long period of time, repetitive robot behaviours may decrease patient interest (Süssenbach et al. 2014; Kidd and Breazeal 2008), which could reduce the frequency of use and interaction with the robot (Fernaeus et al. 2010). Personalisation, through tailoring the robot’s behaviour to each patient, creates an opportunity to break this monotony, helping to maintain motivation and facilitate trust over long-term interactions (Castellano et al. 2008; Leite et al. 2013; Irfan et al. 2019).

This work addresses the question: what is the impact of personalisation in socially assistive robotics for long-term cardiac rehabilitation? We designed and conducted a real-world clinical study at the Fundación Cardioinfantil-Instituto de Cardiología (Bogotá, Colombia) to evaluate the performance and perceptions of the patients throughout the outpatient phase (II) of CR (Lara et al. 2017; Casas et al. 2018). This long-term study explored three conditions: (a) conventional CR in which the patient is monitored using a suite of sensors, (b) SAR using a NAO robot (SoftBank Robotics Europe, Fig. 1) to continuously monitor and provide generic feedback to patients during exercise on the treadmill, based on sensory information, and (c) SAR using a NAO robot with personalised features to recognise users (Irfan et al. 2018b, 2021), recall the patients’ previous session progress, track their adherence, and give personalised feedback. The results obtained during the first two conditions (Casas et al. 2019; Céspedes et al. 2021) highlighted the benefits of a social robot in comparison with the conventional CR programme in improving adherence, motivation, and physical activity performance. However, the studies suggested a need for enhancing the sociability and social presence of the robot to further improve motivation and adherence. We suggest that personalisation of the interaction is key to achieving this. In this paper, we present the complete real-world study that lasted 2.5 years, drawing upon results from 26 patients who completed the programme out of 43 patients recruited, focusing on the benefits and challenges of personalisation in long-term interactions in the real world. This paper evaluates the physiological progress of the patients (i.e. training and recovery heart rate, exertion level, cervical posture), exercise intensity parameters of the sessions (i.e. speed and inclination of the treadmill), interactions with the robot, perceptions of the patients, and adherence to the programme for the personalised robot condition in comparison with the other two conditions. This is the first comprehensive study that explores the effects of personalisation in socially assistive robots for cardiac rehabilitation.

Fig. 1 Setup of the sensor system (image on the left), tablet interface (in the middle), and the social robot (on the right), for the cardiac rehabilitation programme at the Fundación Cardioinfantil-Instituto de Cardiología (Bogotá, Colombia)

2 Background

Socially assistive robotics (SAR) shares with assistive robotics (AR) the goal of providing assistance to patients, but additionally aims to support the user through social interaction, using emotional, cognitive, and social cues to encourage development, learning, or therapy (Feil-Seifer and Matarić 2005; Okamura et al. 2010; Breazeal 2011; Matarić and Scassellati 2016). Because SAR aims to deploy robots in real-world therapy with users who have limited robotics expertise (such as doctors, nurses, and patients), the robot needs to perform tasks with a high degree of autonomy. The robot also needs to communicate verbally and non-verbally to engage in a natural interaction with the patient (Duffy et al. 1999; Feil-Seifer and Matarić 2005; Tapus et al. 2007). Other features necessary for real-world deployment of SAR applications include automated perception of the user’s behaviour, quantitative diagnosis and assessment, mobility, sensor-based automated health data acquisition, and context-appropriate assistance through user interfaces (Okamura et al. 2010; Prescott and Caleb-Solly 2017; Johanson et al. 2020). SAR-based applications have been developed in a range of clinical areas (e.g. cognitive and developmental disorders, care for the elderly, and rehabilitation), all of which share common goals, such as providing physical, cognitive and social support, monitoring and feedback, increasing user motivation, engagement and adherence, and improving task performance and progress (Leite et al. 2013; Ahmad et al. 2017).

However, most research in SAR has been carried out under laboratory conditions, using short-term interventions that often rely on tele-operation (Leite et al. 2013; Lane et al. 2016; Vandemeulebroucke et al. 2018). This restricts their relevance to long-term therapies in real-world applications and does not address the novelty effect (Gockley et al. 2005) or the challenges faced in the adoption of technology (Riek 2017; Coninx et al. 2015). For instance, the only prior study with a socially assistive robot in cardiac rehabilitation (for spirometry exercises in inpatient phase I) (Kang et al. 2005) was evaluated under laboratory conditions for one session with a low number (5) of healthy participants, without analysing the physiological progress of the patients. Thus, despite the positive feedback of the participants, these results cannot be generalised to long-term rehabilitation. Nonetheless, real-world studies in other rehabilitation applications showed the benefit of robots over long-term interactions. For instance, a robot was used to support long-term post-stroke rehabilitation, which lasted 5 to 7 weeks (Feingold Polak and Tzedek 2020). Preliminary results (based on 4 patients) indicated that SAR helps to engage, motivate, and support upper limb rehabilitation. Another study in cognitive rehabilitation (Feng et al. 2020) developed a combined platform using augmented reality and an animal-like robot to engage dementia patients within multi-sensory stimulation sessions over four weeks, which was found to elicit positive emotions, increase social bonding, and restore communication. Broadbent et al. (2018) evaluated the usefulness of a home-based social robot as a supporting tool for patients with chronic obstructive pulmonary disease in a 4-month study, which highlighted the capability of SAR to increase adherence to medication and exercise.

However, long-term interactions can cause a considerable decrease in user interest and motivation compared to the initial interaction (Kidd and Breazeal 2008; Süssenbach et al. 2014; de Graaf et al. 2016). Thus, a number of studies focused on seeking strategies to promote long-term interactions in real-world environments. For instance, the Autom Robot (Kidd and Breazeal 2007, 2008) was developed with the aim to address obesity and assist those who want to lose or maintain weight. To achieve and maintain these goals in long-term interactions, several key features were used, such as eye contact, hand, head and arm gestures, speech recognition and synthesis, and tracking user progress. The study was conducted with 45 subjects over six weeks at the participants’ homes, and the Working Alliance Inventory (WAI) questionnaire (Horvath and Greenberg 1989) was used to evaluate the interaction. The results showed that the participants assisted by the social robot used the system for longer periods than those who used other methods (i.e. tablet and paper logging of data), had a stronger alliance with the proposed system and a higher interest in knowing calorie consumption and exercise performed (Kidd and Breazeal 2008), which supports the importance of embodiment in SAR. Other studies have shown that embodiment can increase compliance (Bainbridge et al. 2008), likeability (Fasola and Matarić 2013; Li 2015), social engagement (Lee et al. 2006; Wainer et al. 2006; Vasco et al. 2019), adherence (Bickmore and Picard 2005a; Kidd and Breazeal 2007), and task performance (Vasco et al. 2019; Deng et al. 2019), which are essential in therapy, especially over the long term.

Moreover, a variety of long-term studies on the adaptive and reactive seal-shaped robot PARO showed the positive impact on elder care, such as reducing negative emotions and behavioural symptoms of elderly residents, improving their social bonds and engagement, and promoting positive mood and quality of care experience (Hung et al. 2019). Adaptation and personalisation strategies (e.g. addressing the patient with their name, tracking their progress, and consequently adapting feedback or therapy tasks) were found to play an important role in long-term therapy (Rossi et al. 2017) to elicit and maintain user engagement over extended durations  (Tapus et al. 2009; Blanson Henkemans et al. 2013; Scassellati et al. 2018; Winkle et al. 2018; Clabaugh et al. 2019; Richardson et al. 2018; Cao et al. 2019), improve task performance  (Tapus et al. 2008; Tapus 2009; Matarić et al. 2009; Hemminahaus and Kopp 2017; Andriella et al. 2020), increase perceived familiarity and sociability (Sabelli et al. 2011; Fasola and Matarić 2013), and perceived competence and trust (Schneider and Kummert 2021). Moreover, previous studies in other domains showed the benefits of recalling user’s personal attributes (e.g. name, gender, age)  (Kanda et al. 2004; Gockley et al. 2005; Mutlu et al. 2006; Kanda et al. 2007, 2010; Sabelli et al. 2011; Fasola and Mataric 2012; Belpaeme et al. 2013; Leite et al. 2014; Kennedy et al. 2015; Churamani et al. 2017; Campos et al. 2018; Zheng et al. 2019; Irfan et al. 2020b), preferences  (Ho et al. 2010; Belpaeme et al. 2013; Churamani et al. 2017; Zheng et al. 2019; Irfan et al. 2020b), behaviour patterns (Glas et al. 2017; Zheng et al. 2019), and shared history  (Ho et al. 2010; Belpaeme et al. 2013; Matsumoto et al. 2012; Leite et al. 2014, 2017; Campos et al. 2018; Zheng et al. 2019; Ahmad et al. 2019) for improving user experience in long-term interactions.

In our previous work (Lara et al. 2017; Casas et al. 2018), we described a SAR interface for a long-term cardiac rehabilitation programme. The interface consists of two main modules: (i) the sensor module, in charge of measuring the patients’ cardiovascular and spatiotemporal gait parameters through a set of sensors, and (ii) a robot module, which consists of a social robot that monitors and provides immediate feedback and motivation to patients to increase their engagement in CR programmes. The interface was validated with laboratory (Lara et al. 2017) and clinical case studies (Casas et al. 2018, 2018a, b, 2019, 2020). The outcomes of these studies showed that the interface is robust and offers additional monitoring during therapy, and that using a robot improved motivation, trust and adherence in the CR programme. In addition, the clinicians’ perception and attitudes towards the robot improved after the demonstration of its potential benefits (Casas et al. 2019). Subsequently, we developed additional features, such as user recognition, adherence tracking, and personalised immediate and progress feedback. A case study of a patient assisted by that robot (Irfan et al. 2020a) showed that personalisation helps maintain positive perceptions and social interactions throughout the long-term programme and facilitates patient motivation and adherence. Finally, we analysed the overall benefits of the non-personalised socially assistive robot in comparison with conventional cardiac rehabilitation (Céspedes et al. 2021). The results showed that the patients assisted by the robot had a higher adherence, improved faster on their cardiovascular functioning, and showed better physical activity performance. Moreover, the clinicians acknowledged the motivational benefits of the robot for the patients and the added value of continuous monitoring in cardiac rehabilitation. 
This work draws on these conclusions and analyses whether personalisation can improve the benefits of the socially assistive robot and the perception of the patients.

3 Conventional cardiac rehabilitation

Each exercise session in the outpatient phase (II) of CR typically lasts about an hour at the Fundación Cardioinfantil-Instituto de Cardiología clinic and is generally conducted twice per week for 18 weeks. Each session starts by measuring patients’ initial parameters, such as the initial resting heart rate, weight and blood pressure, followed by a warm-up consisting of low-paced walking and low-intensity stretching exercises in a group. Afterwards, the patients attend the training session based on physical exercises using a treadmill, lasting 15 to 20 minutes. The intensity of the training session is determined by the speed and inclination of the treadmill, which are chosen by the physiatrist at the start of each session. These parameters progressively increase throughout the CR programme depending on the patient’s CR performance and progress (Simms et al. 2007). Furthermore, during training, the healthcare staff (i.e. physiatrists, occupational therapists, physical therapists, and nurses) manually measure the patients’ heart rate (HR) and request a self-reported exertion level using the Borg scale (Borg 1998) every 5 to 7 minutes, as shown in Fig. 2, to adjust the intensity if necessary. After training, patients step off the treadmill to perform low-intensity exercises for 10 to 15 minutes, referred to as cooldown, in order to gradually decrease their HR. During this step, the resting HR and blood pressure are measured.

Fig. 2 Conventional cardiac rehabilitation session during training exercises at the Fundación Cardioinfantil-Instituto de Cardiología

Fig. 3 Architecture of the personalised patient–robot interface for the cardiac rehabilitation programme

The CR sessions are conducted within a large group (e.g. 20 patients); thus, it is challenging for clinicians to provide continuous and individual monitoring of the patients (Turk-Adawi et al. 2019) due to the lack of a telemetry system in the CR unit, especially during high-intensity training exercises, which can result in critical heart rates. Moreover, clinical studies suggest that providing individual support and supervision during exercise can improve a patient’s motivation within the CR programme (Shahsavari et al. 2012). Therefore, we designed a SAR system to provide continuous monitoring and feedback during training to facilitate prompt intervention from the healthcare staff, in addition to improving patients’ motivation and adherence to the programme, as described in the next section.

4 Personalised patient–robot interface

As previously described, cardiac rehabilitation is a long-term programme, where adherence is critical to ensure a complete recovery. However, repetitive and rigorous exercises may cause a decrease in patient motivation and consequently dropouts from the programme. Thus, based on prior research in other areas of SAR and rehabilitation, we developed a personalised patient–robot interface to provide continuous monitoring and individualised support to patients during exercise for improving their motivation and engagement in the programme. All behaviours and responses of the robot were created in collaboration with medical specialists to avoid incorrect medical assessments and prevent negative perceptions of the programme and with the aim to resemble the behaviours of the clinicians towards the patients. The interface, as presented in Fig. 3, consists of a sensor interface to measure the patient’s performance, the socially assistive robot to provide immediate feedback and motivation based on sensor values, and personalisation features to track the patient’s progress throughout the CR programme and provide a tailored experience to the patients.

4.1 Sensor interface

The patient–robot interface integrates a set of sensors to measure the patients’ physiological progress regarding their cardiovascular functioning and exercise performance. Cardiovascular parameters (i.e. training and resting HR) of the patient were measured by a heart rate monitor (Zephyr HxM, Medtronic, New Zealand). Gait spatiotemporal parameters (i.e. cadence, step length, speed) were measured with a laser range finder (LRF; Hokuyo-URG 04LX-UG01, Hokuyo, Japan). The treadmill inclination was measured using an inertial measurement unit (IMU; MPU9150, Invensense, USA). Finally, to visualise the data registered, receive self-reported data from the patient, and control the sessions’ flow, a graphical user interface (GUI) was developed. A tablet (SurfacePro, Microsoft, USA) was used to display the GUI, and the tablet camera was used to measure the cervical posture of the patient using a head gaze detection algorithm (Lemaignan et al. 2016) and to take images for user recognition during the sessions.

Fig. 4 Finite-state machine presenting different behaviours of the robot during the monitoring phase

4.2 Socially assistive robot

We use a NAO (SoftBank Robotics Europe, France) robot, as shown in Fig. 1, which is the most commonly used platform for human–robot interaction (HRI) research (Lambert et al. 2020). The robot was located at the same place (in front of the patient) during the study, for the conditions that include the robot. The robot’s goal is to support and monitor the patient throughout the training (Fig. 4). At the start of the interaction, the robot will greet the patient and describe the intensity of the session (treadmill inclination and speed). During the session, the robot will periodically provide a randomly selected verbal encouragement to facilitate motivation from three pre-scripted responses for the first five minutes (e.g. “Let’s start well today!”), 13 responses until mid-session (e.g. “Let’s go! You can do it!”), and eight responses until the cooldown period (e.g. “Only a few minutes left!”). Furthermore, similar to the medical staff in conventional therapy, the robot requests the patient to report their exertion level on a Borg scale every seven minutes, using a randomly selected phrase from 10 responses. Additionally, the robot monitors the patient to ensure that their physiological parameters remain in a healthy range. For example, if the patient’s cervical posture is incorrect (i.e. when the patient is looking down instead of straight ahead), the robot reminds the patient to look straight ahead through a randomly selected feedback from six responses. If the patient’s HR is above a warning threshold set by the therapists at the start of the session, the robot asks the patient whether they feel fine or need assistance from the medical staff. If the HR is above a critical threshold, as calculated by the medical staff using the Karvonen formula (She et al. 2014), or the patient requests it, the robot directly alerts the medical staff verbally (e.g. “Your heart rate is too high, I am calling for help. Doctor, could you please come here?”), waving its hand. 
The warning behaviours have a cooldown period of three minutes to prevent overloading the patient if a warning state is maintained. More details about the robot behaviours can be found in Casas et al. (2018, 2020).
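
The heart rate monitoring behaviour described above can be illustrated with a minimal Python sketch. This is not the implementation deployed in the study: the class and function names, the action labels, and the choice to apply the three-minute cooldown only to the warning prompt (and not to the critical alert) are assumptions made for illustration. The Karvonen formula is given in its standard form, where the intensity fraction is a clinician-chosen value that the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


def karvonen_target_hr(hr_max: float, hr_rest: float, intensity: float) -> float:
    """Standard Karvonen formula: HR_rest + intensity * (HR_max - HR_rest)."""
    return hr_rest + intensity * (hr_max - hr_rest)


@dataclass
class HeartRateMonitor:
    """Hypothetical monitor that maps heart rate samples to robot actions."""

    warning_threshold: float   # set by the therapists at the start of the session
    critical_threshold: float  # assumed to come from the Karvonen formula
    cooldown_s: float = 180.0  # three-minute cooldown mentioned in the text
    _last_warning_s: Optional[float] = None

    def step(self, hr: float, now_s: float, patient_requests_help: bool = False) -> str:
        """Return the action label for one heart rate sample at time now_s."""
        # Critical state or explicit request: alert the medical staff directly.
        if hr > self.critical_threshold or patient_requests_help:
            return "alert_staff"
        if hr > self.warning_threshold:
            in_cooldown = (
                self._last_warning_s is not None
                and now_s - self._last_warning_s < self.cooldown_s
            )
            if in_cooldown:
                # Assumption: stay silent while the warning state persists.
                return "monitor"
            self._last_warning_s = now_s
            return "ask_patient"
        return "monitor"
```

In this sketch, a sustained heart rate above the warning threshold triggers a single prompt, and the prompt is repeated only once the cooldown has elapsed, which mirrors the stated goal of not overloading the patient.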

4.3 Personalisation

As discussed in Sect. 2, several studies showed the importance of personalisation for socially assistive robotics in facilitating perceived familiarity and sociability, in addition to improving user performance and engagement within long-term interactions. Personalisation in long-term interactions should not only focus on the inherent differences, such as a person’s name; it should also focus on the long-term changes, such as their therapy progress, and incrementally learn and adapt to the user for maintaining user engagement over the duration of the interaction (Tapus et al. 2007). Moreover, as discussed in the introduction, low adherence is a big concern of the CR programmes. To address these needs, we developed personalisation features focusing on users’ personal attributes (i.e. appearance and name) and behaviour patterns (i.e. adherence and rehabilitation progress) by (i) recognising users autonomously, (ii) personalising the verbal (motivational and sensory) feedback of the social robot by occasionally (every 2 to 4 minutes) using their name, (iii) tracking their progress between sessions, and (iv) referring to their attendance to improve their motivation in the programme and facilitate adherence. These personalisation features were developed based on the suggestions of the medical specialists, and they correlate with therapists’ approaches in improving the motivation, engagement and compliance of the patient, through feedback, positive reinforcement, reminders, and prompts (Winkle et al. 2018). The personalised features of the robot were validated in a pilot study under laboratory conditions before deployment in the clinic.

4.3.1 Multi-modal user recognition

In a real-world environment such as a hospital, it is necessary to have a robust system that can autonomously recognise and learn new users with minimal effort from the patients and the doctors (Reig et al. 2021). Moreover, user recognition allows more natural interaction with the robot. Thus, we applied the Multi-modal Incremental Bayesian Network (MMIBN) with online learning (Irfan et al. 2018b, 2021), which is the first method for sequential and incremental learning of users that does not require any preliminary training for user recognition. It combines face recognition with soft biometrics, which are ancillary physical or behavioural characteristics, such as gender, age, height and time of interaction, that can be used to improve the recognition performance (Jain et al. 2004; Dantcheva et al. 2016). The structure of user recognition is shown in Fig. 5.

Fig. 5 Architecture for multi-modal user recognition in CR programme using the Multi-modal Incremental Bayesian Network (MMIBN) with online learning

An image is taken from the tablet camera and transferred to the robot to obtain the face recognition similarity scores, and gender, age, and height estimations through the NAOqi proprietary software of the robot. Time of interaction is a suitable ancillary behavioural characteristic, because patients in the CR programme have set appointments twice a week on certain days and times. The biometric data are combined using MMIBN. If the quality of the estimation (i.e. the difference between the highest and the second-highest probability scores, multiplied by the number of known users) is above a certain threshold, then the identity that corresponds to the highest posterior probability score is returned; otherwise, the user is believed to be a new patient. Explicit confirmation of the identity is obtained after each recognition (e.g. “Hello PATIENT_NAME, it is nice to see you again! Could you confirm that it is you please?”) through the tablet interface, in order to avoid any errors in personalisation of the session and to improve the user recognition through online learning. Online learning of biometric data helps adapt to changes in user appearance (e.g. different hair styles and glasses) and interaction patterns (e.g. time of interaction). If the user enters an identity that is not in the system for the confirmation (i.e. the user is new), ground truth values (i.e. name, age, and height) are requested from the tablet interface to apply incremental learning on MMIBN and the face recognition database. Overall, the system is able to recognise and learn new users without the need for intrusive methods or external devices, such as QR codes or access cards, to offer a more natural interaction, and it is suitable for non-expert users. Moreover, MMIBN was found to significantly outperform NAOqi face recognition and a state-of-the-art open world recognition method (Rudd et al. 2018) on a long-term (4 weeks) HRI study in the real world with 14 participants (93.2% identification rate) and on a large artificial multi-modal dataset with 200 users (65.7% identification rate) (Irfan et al. 2018b, 2021).
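
The quality-of-estimation rule described above can be sketched in a few lines of Python. The sketch covers only the final decision step, assuming the posterior scores over known users have already been produced by the network; the function name, the dictionary representation, and the example threshold value are hypothetical and are not taken from the MMIBN implementation.

```python
from typing import Dict, Optional


def identify(posteriors: Dict[str, float], quality_threshold: float) -> Optional[str]:
    """Return the most probable known identity, or None for a presumed new user.

    posteriors maps each known user to a posterior probability score.
    """
    if not posteriors:
        # No known users yet: everyone is treated as new.
        return None
    ranked = sorted(posteriors.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    # Quality: gap between the two best scores, scaled by the number of known users.
    quality = (ranked[0] - second) * len(posteriors)
    if quality > quality_threshold:
        return max(posteriors, key=posteriors.get)
    return None
```

Scaling the gap by the number of known users compensates for the fact that posterior scores are spread over more identities as the user base grows, so a fixed threshold remains meaningful as new patients are enrolled.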

4.3.2 Progress tracking

CR aims to improve cardiovascular functioning and recovery of the patients throughout the long-term programme. The session’s intensity (as determined by the treadmill speed and inclination) progressively increases, but can be scaled back based on the patient’s progress determined by a variety of physiological factors, such as the recovery and training heart rate and exertion level, and whether these parameters stay within healthy levels. Thus, it is challenging to determine the progress of the patient from session to session. Correspondingly, the medical specialists suggested session-based feedback for progress, i.e. comparing the current session to the previous session of the patient using the alerts for critical heart rate and exertion level, and the cervical posture corrections, such that the patient can track whether they are responding well to the rehabilitation on a session per session basis, which is expected to improve their motivation.

In order to prepare and motivate the patient for the session, the relative session intensity is indicated by the personalised robot at the beginning of each session after the announcement of session intensity parameters, such as “Today, we are starting with a speed of 2.1 miles per hour with an inclination of 0.8 degrees, which will be more intense than the last time.”. The relative intensity is defined by the clinicians at Fundación Cardioinfantil-Instituto de Cardiología, to be higher (i.e. more intense) if the treadmill speed or inclination is higher than that of the previous session, and lower (i.e. less intense) if both of these parameters are lower. Subsequently, the previous session progress is mentioned, followed by a motivational phrase, such as “In the previous session, you experienced difficulty with your heart rate. I am sure it will be all fine this time!” or “I am sure that it will be as good as last time!”. At the end of the session, the performance of the patient is compared to the previous session. To avoid demotivating the patient, the relative session intensity is also noted only if it is higher. For instance, “We had a lower number of difficulties in this session than the previous one, even though the session intensity was higher. Let’s keep up the good work, PATIENT_NAME!”, or “We had a higher number of difficulties this session than the previous one. Next time will be better, PATIENT_NAME!”.
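
The relative intensity rule defined by the clinicians can be written as a short Python function. The text specifies only the "higher" and "lower" cases; the "same" label returned for all remaining combinations (for example, one parameter lower and the other unchanged) is an assumption made here for completeness, and the function name is hypothetical.

```python
def relative_intensity(speed: float, inclination: float,
                       prev_speed: float, prev_inclination: float) -> str:
    """Compare the current session intensity with the previous session."""
    # Higher if either parameter increased.
    if speed > prev_speed or inclination > prev_inclination:
        return "higher"
    # Lower only if both parameters decreased.
    if speed < prev_speed and inclination < prev_inclination:
        return "lower"
    # Assumption: every other case is treated as unchanged.
    return "same"
```

Note that the rule is asymmetric: an increase in a single parameter is enough to announce a more intense session, whereas a less intense session requires both parameters to decrease.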

4.3.3 Adherence tracking

Patients are prescribed two sessions per week for the outpatient phase of CR for a total of 18 weeks (4.5 months). However, the medical records in the Fundación Cardioinfantil-Instituto de Cardiología clinic show that patients take 5.7 months on average to finish the outpatient phase. The long duration of the programme also decreases the willingness to continue, resulting in dropouts, as mentioned in the introduction. Hence, in order to encourage patients to come to their appointed sessions and improve adherence, we tracked the patient’s attendance per week. Since a missed session can be due either to justifiable reasons, such as sickness or leaving town, or to negligence, the robot comments on the missed sessions (excluding holidays) in a positive manner, such as “You didn’t come to the last (X) session(s). I hope everything is all right!”. Moreover, to increase the sociability of the robot and familiarity, we tracked weekends and the national holidays, with comments such as “I hope you had a nice weekend/holiday!”.
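
One possible way to count missed sessions while excluding holidays is sketched below in Python. How the deployed system stores appointments is not described in the text, so the representation used here (a set of fixed appointment weekdays and an explicit list of holiday dates) and the function name are assumptions for illustration only.

```python
from datetime import date, timedelta
from typing import Iterable, Set


def missed_sessions(last_attended: date, today: date,
                    session_weekdays: Set[int],
                    holidays: Iterable[date] = ()) -> int:
    """Count scheduled sessions strictly between last_attended and today.

    session_weekdays uses date.weekday() numbering (Monday is 0).
    Scheduled days that fall on a holiday are not counted as missed.
    """
    holiday_set = set(holidays)
    missed = 0
    day = last_attended + timedelta(days=1)
    while day < today:
        if day.weekday() in session_weekdays and day not in holiday_set:
            missed += 1
        day += timedelta(days=1)
    return missed
```

For a patient with Tuesday and Thursday appointments who last attended on Tuesday 1 October 2019 and returns on Thursday 10 October 2019, the function counts two missed sessions, or one if the intervening Tuesday is listed as a holiday. The resulting count would fill the (X) slot in the robot's comment.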

5 Experimental procedure

A longitudinal study was carried out at the Fundación Cardioinfantil-Instituto de Cardiología (Bogotá, Colombia) for 2.5 years to evaluate the impact of socially assistive robots and personalisation in the outpatient phase (II) of cardiac rehabilitation, which is designed to last 18 weeks (36 sessions) per patient.

5.1 Hypotheses and predictions

In general, our study was designed to evaluate the benefits of using a socially assistive robot for long-term cardiac rehabilitation with the aim to improve user motivation and adherence to the programme. However, as previous research outlined in Sect. 2 shows, user motivation and interest towards a generic robot can wane over long-term interactions, which could be overcome by the personalisation of the interaction. Moreover, previous studies in SAR showed that personalisation leads to an improvement in task performance and user perceptions (e.g. competence, trust, sociability, familiarity). In addition, our previous work showed that using a social robot improves adherence, physical activity performance, and cardiovascular functioning in comparison with the conventional CR programme (Céspedes et al. 2021) and leads to a significant increase in the patients’ perceptions of the robot (i.e. perceived trust, utility, usefulness, and ease of use) (Casas et al. 2019). Correspondingly, this work addresses the research question of what impact personalisation has in socially assistive robotics for long-term cardiac rehabilitation, through the following hypotheses and predictions:

  • H1 Personalisation will improve patient motivation and adherence to the CR programme.

    • Prediction \(P_{1a}\): A higher ratio of patients will complete the CR programme with the personalised robot than the control or social robot.

    • Prediction \(P_{1b}\): Patients will report higher intrinsic motivation to improve in their sessions with the personalised robot than the other conditions.

  • H2 Personalisation will improve the cardiovascular performance of the patients.

    • Prediction \(P_{2a}\): Patients will have a higher gain of normalised recovery heart rate in the personalised condition than the other conditions.

    • Prediction \(P_{2b}\): The personalised robot will lead to a lower number of alerts to medical staff than the social robot.

  • H3 Interaction with the personalised robot will be maintained throughout the long-term programme.

    • Prediction \(P_{3a}\): Patients will comply with the robot’s posture correction requests throughout the CR programme.

    • Prediction \(P_{3b}\): There will be no significant decrease in the gazing behaviour to the robot throughout the programme.

    • Prediction \(P_{3c}\): Social interaction between the human and the robot will be maintained throughout the programme.

    • Prediction \(P_{3d}\): Patients will maintain their bond with the robot (as measured by the Working Alliance Inventory) throughout the programme.

  • H4 Personalisation will improve patients’ perceptions of the robot.

    • Prediction \(P_{4a}\): The personalised robot will be rated as more useful than the other conditions.

    • Prediction \(P_{4b}\): Patients will enjoy the personalised features of the robot.

5.2 Conditions

Based on our research question, we designed three conditions for the study:

  • Control The patients perform conventional CR sessions (i.e. without a robot), where they are supervised by the healthcare staff. In order to compare the physiological progress between the groups, the sensor interface (as described in Sect. 4.1) is used to measure the patients’ physiological parameters. Patients only interact with the tablet to enter their Borg scale, when it is requested through the tablet with an audible signal and a change of colour. While the health parameters (i.e. heart rate, gait speed, cadence, step length, treadmill inclination, and the self-reported Borg scale value) are visible on the tablet interface for informing the medical staff, patients do not receive any motivational or physiological verbal feedback to emulate the conventional CR sessions.

  • Social Robot The patients perform the CR sessions assisted by the robot, as described in Sect. 4.2, and the sensor interface.

  • Personalised Robot The patients perform the CR sessions assisted by the personalised robot, as described in Sect. 4.3, and the sensor interface.

The control and social robot conditions started in August 2017, and the personalised robot condition started in October 2019. The non-personalised (social) robot, the personalised robot, and the sensor interface operated fully autonomously. Nevertheless, an experimenter was present during the CR sessions for all conditions, to intervene only in the case of system failures.

5.3 Experimental criteria

  • Inclusion Criteria The study targeted the patients starting the outpatient phase (II) of the cardiac rehabilitation programme, which lasts 18 weeks with two sessions per week. Patients who are over 25 years old with acute myocardial infarction, percutaneous coronary intervention, coronary artery bypass graft, valve replacement, ischaemic heart disease and hypertension, and ejection fraction greater than 40% were recruited. Moreover, the participants should be able to perform treadmill exercises.

  • Exclusion Criteria The patient–robot interface may pose limitations for patients with visual, auditory, or cognitive impairments that may impede the manipulation and correct understanding of the system; hence, such patients were excluded from the study based on their clinical records presented upon entrance to the CR programme. Patients with a cardiovascular pathology other than those listed above were also not considered for the study.

  • Dropout and Incomplete Criteria The initial duration of the CR programme was considered to be 18 weeks, where patients would attend twice per week as appointed. However, due to some of the patients missing their sessions, this initial policy resulted in a shorter CR duration for the patients (23-33 sessions). Correspondingly, the study policy was reviewed in 2018 to last 36 sessions, in order to improve the programme offered to the patients. Thus, we define a drop-out from the study to be the case when the patient does not attend three sessions in a row without a justification. The patient, in that case, is dropped from the study, but may continue the CR programme without the robot or tablet interface. On the other hand, if the patient could not complete the programme due to a critical health condition, funding (e.g. health insurance coverage) or COVID-19 outbreak, their CR programme is considered incomplete, since these reasons are beyond their control.
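The dropout and incomplete criteria above can be illustrated with a minimal sketch. The function name, the attendance labels, and the `external_stop` flag are hypothetical and introduced only for illustration; the sketch also assumes that an attended or justified session resets the count of consecutive unjustified absences, and that 36 attended sessions mark completion (in the study, completion was determined by the clinicians).

```python
from typing import Sequence


def study_status(attendance: Sequence[str],
                 target_sessions: int = 36,
                 external_stop: bool = False) -> str:
    """Classify a patient as 'dropout', 'complete', 'incomplete' or 'ongoing'.

    attendance: one label per scheduled session, in order; each label is
        'attended', 'justified' (absence with justification) or
        'unjustified' (absence without justification). Labels are hypothetical.
    external_stop: True if the programme was halted for reasons beyond the
        patient's control (critical health condition, funding, COVID-19).
    """
    consecutive_unjustified = 0
    attended = 0
    for entry in attendance:
        if entry == "unjustified":
            consecutive_unjustified += 1
            # Three unjustified absences in a row define a dropout.
            if consecutive_unjustified >= 3:
                return "dropout"
        else:
            # Assumption: any attended or justified session resets the run.
            consecutive_unjustified = 0
            if entry == "attended":
                attended += 1
                if attended >= target_sessions:
                    return "complete"
    return "incomplete" if external_stop else "ongoing"


if __name__ == "__main__":
    print(study_status(["attended"] * 5 + ["unjustified"] * 3))
```

Returning as soon as the third consecutive unjustified absence is seen mirrors the rule that the patient is dropped from the study at that point, regardless of later attendance.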

5.4 Participants and demographic data

In total, 43 patients were recruited for the study: 15 patients each for the control and social robot conditions and 13 for the personalised robot condition. However, due to dropouts and incomplete therapies caused by critical health conditions, funding, or the COVID-19 outbreak in March 2020, only 26 patients could complete the study. The demographic data of the patients that actively participated in the rehabilitation and completed the outpatient phase are presented in Table 1.

Table 1 Demographic data of the patients that actively participated and completed the outpatient phase of the CR programme within the study

Of all patients present at a group-based CR session, only one patient was a participant of the study. This decision was made to prevent patients in the study from meeting each other, which could potentially influence their perception of the robot. However, this places additional restrictions on the scheduling; hence, only 3 to 5 patients enrolled in the study could attend a CR session per day.

5.5 Measures

To evaluate the impact of personalisation in SAR for long-term CR, we developed the following measures based on the parameters taken during a conventional CR session and our hypotheses.

5.5.1 Physiological progress

The patient’s physiological progress is assessed using the variables measured by the sensor interface.

Cardiovascular parameters These parameters reflect the patient’s cardiovascular performance during the exercise performed in cardiac rehabilitation. Primarily, two measurements are analysed: (i) average heart rate during the training phase (THR) and (ii) recovery heart rate (RHR), which represents the difference between the heart rate one minute after ending the training phase of the exercise, and the THR. RHR is normalised (\({RHR_{normalised}}\)) with the patient’s initial resting heart rate measured at the beginning of the session (IHR) to allow comparison between patients. Equation 1 shows the calculation for the RHR.

$$\begin{aligned} RHR&= THR - HR_{\text{post-training}} \\ RHR_{normalised}&= RHR / IHR \end{aligned}$$
(1)
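As a worked illustration of Eq. 1, the following minimal sketch computes the recovery heart rate and its normalised value. The function names and the example heart rates are hypothetical and are not taken from the study data.

```python
def recovery_heart_rate(thr: float, hr_post_training: float) -> float:
    """RHR = THR - HR_post-training (heart rate one minute after training)."""
    return thr - hr_post_training


def normalised_rhr(thr: float, hr_post_training: float, ihr: float) -> float:
    """RHR_normalised = RHR / IHR, with IHR the initial resting heart rate."""
    if ihr <= 0:
        raise ValueError("initial resting heart rate must be positive")
    return recovery_heart_rate(thr, hr_post_training) / ihr


if __name__ == "__main__":
    # Hypothetical session: THR 120 bpm, 100 bpm one minute after training,
    # initial resting heart rate 80 bpm.
    print(normalised_rhr(120.0, 100.0, 80.0))  # 0.25
```

Dividing by the session's own IHR makes the value dimensionless, which is what allows it to be compared between patients with different resting heart rates.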

Gait Spatiotemporal parameters Measuring the gait parameters is important to track the patient’s performance during the exercise. The main components of gait analysis can be classified into distance (spatial) and time (temporal) parameters. Within the CR programme, three of these variables are assessed: (i) cadence, which represents the total number of full cycles taken within a given period (Thompson 2002), (ii) the step length, which describes the distance between the point of initial contact of one foot and the initial contact of the opposite foot (Thompson 2002), and (iii) the patient’s gait speed, which also represents the treadmill’s speed and is used to measure exercise intensity.

Cervical Posture Because the CR sessions are performed on a treadmill, a healthy posture is essential to avoid dizziness and falls and to achieve a correct gait performance (Martin and McConahay 1972). Thus, we measure the cervical posture (i.e. head inclination) using the camera of the tablet located in front of the patient through a head gaze estimator (Lemaignan et al. 2016). The output data acquired using the estimator correspond to a binary value (i.e. ‘looking straight ahead’ or ‘not looking straight ahead’), which was used for immediate feedback and was not recorded for analysis.

Exercise Intensity parameters To measure the exercise intensity, the following parameters were acquired in addition to the treadmill speed: (i) the treadmill inclination, measured by an IMU (MPU9150) located on the treadmill’s floor (the values vary between 0 and 5 degrees), and (ii) the patient’s perceived exertion, measured using the self-reported Borg Scale (Borg 1998). The Borg Scale subjectively assesses the exertion and intensity perceived by a patient during the exercise (Aamot et al. 2014). At Fundación Cardioinfantil-Instituto de Cardiología, the Borg scale varies between 6 and 20 (6 corresponds to a very low level of perceived exertion, 20 corresponds to a very high level of exertion). The clinicians consider values between 6 and 13 as a safe (healthy) perceived exertion level.

Warnings and Alerts Count As mentioned in Sect. 4.2, the social robot provides different types of feedback. As an additional indicator of the patient’s physiological progress, call medical staff alerts and high heart rate warnings during the session were counted.

5.5.2 Long-term perception of the robot

The Unified Theory of Acceptance and Use of Technology (UTAUT) (Venkatesh et al. 2003) questionnaire and its extension, the Almere model (Heerink et al. 2010), are commonly used to evaluate key aspects of a socially assistive therapy through several concepts, such as perceived utility, trust, and adaptivity. We previously adapted UTAUT and the Almere model for a CR programme with a robot and applied it to 8 patients that completed the social robot condition, and to a baseline group of 20 patients in their early outpatient or maintenance phase, without any prior experience with robots or our system (Casas et al. 2019). The baseline group served as a baseline perception of the robot and our system; hence, we did not include the patients from the control condition (the interface only condition) to avoid biasing the results with their expectations and perceptions of the system. A debriefing was organised for the baseline group about SAR, its potential benefits, and the parameters measured by the system, in addition to a video presentation of the social robot condition. The results showed that the social robot improved the expectations and led to a significant increase in the patients’ perceived trust, utility, usefulness, and ease of use. However, the patients and clinicians highlighted that the robot needs to have more social skills, such as personalised feedback, reminders, and physical activity updates, to enhance the interaction and improve compliance. At the time of that work, the study was not completed and the personalised robot condition had not yet started. In this work, we compare the perceptions of all the patients that completed the social robot condition and evaluate how personalisation changes these perceptions. Moreover, we developed additional questions to evaluate the personalisation features, as shown in Table 10 in Appendix A. The social and personalised robot conditions are compared to the baseline group based on the same (non-personalised) questions, whereas the personalisation questions are analysed separately.

Moreover, to evaluate whether personalisation helps build a relationship with the robot and how this is affected over the long term, we applied the Working Alliance Inventory (WAI) (Horvath and Greenberg 1989) to the patients in the personalised robot condition. WAI is a 36-item self-report instrument based on Bordin’s pantheoretical tripartite conceptualisation, i.e. Bond, Task and Goal. This questionnaire was used in long-term social robotics studies to evaluate the perceived task performance and sociability of a robot (Bickmore and Picard 2005b; Hoffman and Breazeal 2010; Kidd and Breazeal 2008). The Bond construct measures the degree of trust and familiarity between the robot and the patient (e.g. “My relationship with the robot is very important to me”). The Task construct evaluates the degree to which the robot and the patient agree on therapeutic tasks (e.g. “The things that the robot is requesting from me do not make sense”). The Goal construct aims to measure the degree to which the robot and the patient agree on the goals of the CR programme (e.g. “The robot perceives accurately what my goals are”). WAI uses negative (e.g. “I disagree with the robot about what I ought to get out of therapy.”) and positive (e.g. “The robot and I are in agreement on what is important for me to work on.”) formulations to limit the bias in the results. We adapted WAI for cardiac rehabilitation to analyse the long-term perception, as presented in Table 11 in Appendix A. WAI was applied at the middle of the CR programme (18 sessions) and at the end of the programme (36 sessions).
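Questionnaires that mix positively and negatively worded items are commonly scored by reverse-coding the negative items before averaging a construct. The sketch below illustrates this under stated assumptions: the 7-point response scale, the item identifiers, and the function name are hypothetical, and the scoring procedure used for the adapted WAI in this study is not specified here.

```python
from statistics import mean
from typing import Dict, Iterable


def construct_score(responses: Dict[str, int],
                    negative_items: Iterable[str],
                    scale_max: int = 7) -> float:
    """Mean score of one construct (e.g. Bond, Task or Goal).

    responses: item identifier -> rating in 1..scale_max (assumed scale).
    negative_items: identifiers of negatively worded items, which are
        reverse-coded as (scale_max + 1 - rating) before averaging.
    """
    negative = set(negative_items)
    scored = []
    for item, rating in responses.items():
        if not 1 <= rating <= scale_max:
            raise ValueError(f"rating for {item!r} is outside 1..{scale_max}")
        scored.append(scale_max + 1 - rating if item in negative else rating)
    return mean(scored)


if __name__ == "__main__":
    # Hypothetical Task items: one positively and one negatively worded.
    task = {"task_pos": 6, "task_neg": 2}
    print(construct_score(task, negative_items=["task_neg"]))  # 6
```

After reverse-coding, a higher score consistently indicates a stronger alliance, so the mid-programme and end-of-programme scores can be compared directly.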

5.5.3 Video analysis

One of the most common measurements used in SAR is the analysis of videos (Sabanovic and Simmons 2006; Anzalone et al. 2015; Leite et al. 2012). Initially, video recordings were not considered for the control or social robot conditions, as we did not expect to observe changes that would require video analysis, and due to the lack of available resources. However, during these conditions, we observed a change of behaviour towards the robot throughout the programme in the social robot condition, which prompted the need to analyse the behaviour in detail. Correspondingly, the consent forms were modified to include video recording for the personalised robot condition, which started after the other two conditions. The sessions were recorded with a GoPro (GoPro, Inc., USA) camera installed in the CR service.

Gaze is an important factor in human–human interaction and human–robot interaction (Ruhland et al. 2015). Most of the work in HRI has focused on generating meaningful robot gaze (Mutlu et al. 2009; Mwangi et al. 2018; Admoni and Scassellati 2017); however, other studies also explored the importance of human gaze in HRI, and how to measure and interpret it (Broz et al. 2012; Lemaignan et al. 2016; Oertel et al. 2020). We draw from these works to interpret human gaze in our study and use it as a metric for engagement. Moreover, non-verbal emotional responses, such as gestures and facial expressions, and verbal social interactions with the robot, are other indicators of engagement (Clave et al. 2016; Oertel et al. 2020). In addition, analysing patients’ compliance can help determine the effectiveness of the socially assistive robot in achieving good task performance and engagement in long-term therapy (Matarić et al. 2007; McColl and Nejat 2013; Fasola and Matarić 2013). Correspondingly, the video analysis was performed by two independent coders based on the following interactions: (i) gaze of the patient to the robot, (ii) social interaction, comprising the patient’s verbal and non-verbal (e.g. positive or negative expressions, gestures) responses to the robot, and talking about the robot to other patients, (iii) medical staff interaction with the robot, such as the doctor touching the head of the robot to suppress the call medical staff alert, or interacting with it verbally and non-verbally beyond the requirements of the task, and (iv) patient compliance with the cervical posture request. Prior to the analysis, a coding session was performed with the coders to unify the measurement method. All the variables were coded as binary (e.g. 1: gaze triggered or 0: no gaze behaviour) and comments were added to specify the nature of the event (e.g. when a social interaction was triggered, the event was coded, and a comment was used to describe the situation). We set an 11.8% overlap (24 randomly selected sessions corresponding to 8.95 hours) in the coded data to validate the coding results, which is sufficient to establish inter-rater reliability (O’Connor and Joffe 2020).
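Agreement between two coders on the overlapping sessions can be quantified with Cohen’s kappa, which corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch for categorical (here binary) codes; the function name and the example codes are hypothetical and do not reproduce the study data.

```python
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two coders' labels."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of events")
    n = len(coder_a)
    # Observed agreement: proportion of events given the same code.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    labels = set(coder_a) | set(coder_b)
    p_e = sum((list(coder_a).count(lab) / n) * (list(coder_b).count(lab) / n)
              for lab in labels)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both coders use a single label")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical binary gaze codes (1: gaze triggered, 0: no gaze behaviour).
    print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```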

5.6 Statistical analysis

The patient’s performance in a session is affected by the exercise intensity, as well as external factors, such as illness and tiredness prior to the session. Thus, to decrease the intrasubject variability, the data are analysed within six stages (i.e. 6 sessions per stage), as suggested by the medical staff at Fundación Cardioinfantil-Instituto de Cardiología.
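The grouping of sessions into stages can be sketched as follows, assuming that sessions are numbered from 1 to 36 and that stage k covers sessions 6(k − 1) + 1 to 6k. The function names are hypothetical, and missing measurements (e.g. from sensor failures) are represented as `None` purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

SESSIONS_PER_STAGE = 6
N_STAGES = 6


def stage_of_session(session_number: int) -> int:
    """Map a 1-based session number (1..36) to its stage (1..6)."""
    if not 1 <= session_number <= SESSIONS_PER_STAGE * N_STAGES:
        raise ValueError("session number must be between 1 and 36")
    return (session_number - 1) // SESSIONS_PER_STAGE + 1


def group_by_stage(values_by_session: Dict[int, Optional[float]]) -> Dict[int, List[float]]:
    """Collect the available per-session values of one patient by stage."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for session in sorted(values_by_session):
        value = values_by_session[session]
        if value is not None:  # skip sessions with missing data
            grouped[stage_of_session(session)].append(value)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical normalised RHR values; session 7 has missing data.
    print(group_by_stage({1: 0.2, 6: 0.3, 7: None, 8: 0.4}))
```

Patients with fewer than 36 sessions simply contribute no values to the later stages, which is one source of the unbalanced data discussed in Sect. 5.6.1.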

5.6.1 Numerical data

Our study is a two-way mixed design, that is, it contains repeated measures for different groups. However, the data for physiological progress, cervical posture corrections, and the interactions with the robot are not normally distributed (Shapiro–Wilk test gives \(p < .001\) on residuals, and the visual inspection of the residuals shows a large deviation from linear reference lines). Moreover, the homogeneity of variances assumption of ANOVA is violated (\(p < .05\) in Levene’s test and Box’s M-test) in almost all the cases, except for exertion levels and social interactions. Hence, ANOVA cannot be applied. The nonparametric test that corresponds to a repeated-measures ANOVA is the Friedman test; however, it requires a complete block design, whereas the group sizes are not equal (unbalanced data) due to the dropouts and the incomplete CR programmes. In addition, due to the change in the experimental criteria, some of the patients completed the CR programme earlier, and the sensor failures within some of the sessions caused incomplete (missing) data. Thus, we apply Johansen’s (1980) general formulation of the Welch–James statistic (Welch 1938; James 1951) with Approximate Degrees of Freedom (Villacorta 2017; Welch 1951; Keselman et al. 2003), which is suitable for repeated measures and two-way mixed designs. We evaluate the differences between the stages and the conditions using pairwise tests, with Hochberg correction for multiple comparisons and Least-Squares Estimators (i.e. without trimming), which are the default parameters of the implementation (Villacorta 2017). In order to evaluate the consistency and the magnitude of a particular phenomenon across different studies in the literature, effect sizes are reported for pairwise tests based on Glass’s delta (Glass et al. 1981; Keselman et al. 2003; Villacorta 2017). This measure does not classify effect sizes as ‘small’, ‘medium’ or ‘large’, in contrast with Cohen (1988), because the practical importance of an effect depends on the context of the application, such as relative costs and benefits, and a small effect size can make a substantial difference. A negative effect size denotes a decrease in the mean between group 1 (e.g. social robot) and group 2 (e.g. personalised robot), whereas a positive effect size denotes an increase. Inter-rater reliability agreement on the video data is measured by Cohen’s kappa (\(\kappa \)) (Cohen 1960) and interpreted according to McHugh (2012). Furthermore, the McNemar test, which enables comparing two classification algorithms that are run only once (Dietterich 1998), is applied to compare MMIBN to NAOqi face recognition. A more detailed analysis of user recognition in comparison with a state-of-the-art open world recognition algorithm (Rudd et al. 2018) is available in (Irfan et al. 2021).
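To make the multiple-comparison correction and the sign convention of the effect size concrete, the sketch below implements the Hochberg step-up adjustment and a plain form of Glass’s delta. It is an illustration under stated assumptions, not the implementation used in the study: the function names and example values are hypothetical, and the standardiser is assumed to be the sample standard deviation of group 1, whereas the robust variant in the cited implementation may differ.

```python
from statistics import mean, stdev
from typing import List, Sequence


def hochberg_adjust(p_values: Sequence[float]) -> List[float]:
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest; the k-th largest is
    # multiplied by k, and a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for k, idx in enumerate(order, start=1):
        running_min = min(running_min, k * p_values[idx])
        adjusted[idx] = running_min
    return adjusted


def glass_delta(group1: Sequence[float], group2: Sequence[float]) -> float:
    """(mean of group 2 - mean of group 1) / SD of group 1 (assumed standardiser).

    Negative values denote a decrease from group 1 to group 2.
    """
    sd = stdev(group1)  # requires at least two observations
    if sd == 0:
        raise ValueError("group 1 has zero variance")
    return (mean(group2) - mean(group1)) / sd


if __name__ == "__main__":
    print(hochberg_adjust([0.01, 0.04, 0.03]))
    print(glass_delta([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]))  # 2.0
```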

5.6.2 Ordinal data

Likert scales are ordinal; hence, non-parametric tests should be applied to analyse the questionnaires (Jamieson 2004). Correspondingly, the Wilcoxon signed-rank test is applied to the WAI results with Bonferroni correction, because the same questionnaire is applied to the patients twice (i.e. at the middle of their CR programme and at the end). The Mann–Whitney U-test is applied to the UTAUT results to analyse the significant differences between the conditions (i.e. independent samples).

6 Results

As previously stated, this work focuses on the impact of the personalisation of the robot for the CR programme. Correspondingly, the results are analysed in that perspective, comparing the effects of the personalised robot to that of the social robot, as well as to the conventional CR programme, through various measures described earlier.

6.1 Adherence

While 43 patients participated in the study within the 2.5 year study duration, corresponding to 1050 sessions, due to reasons beyond the control of the patients (such as funding, medical condition, and the outbreak of COVID-19), 8 patients could not complete the CR programme, and an additional 9 patients dropped out of the programme. The control condition was completed in January 2019; however, the social and personalised robot conditions were halted due to the COVID-19 pandemic in March 2020; hence, 6 patients in the personalised robot condition and one patient in the social robot condition could not complete the programme. Thus, we do not have conclusive evidence on adherence to validate our initial prediction (\(P_{1a}\)) for the personalised robot condition. However, the attended sessions per condition in Fig. 6 show that the dropouts occur at earlier stages of the CR programme mostly in the control condition, which could indicate a higher tendency to continue the programme in the presence of a (social or personalised) robot.

While the intended duration of the CR programme is 18 weeks (4.5 months) with sessions twice per week, patients who attend the conventional CR sessions take on average 5.7 months to finish the outpatient phase of the programme, as previously highlighted in Sect. 4.3.3. This duration is decreased in the control condition to 4.7 months, which could be due to being part of a study. Nonetheless, both the patients assisted by the social robot and the patients assisted by the personalised robot finished their CR programme earlier, within 4.6 months on average. Although the difference between conditions is small, the findings suggest that SAR could encourage patients to attend the sessions more actively. Patients in the social robot and personalised robot conditions were closer to the intended duration, which has multiple benefits, not only for their cardiovascular response, but also for their rehabilitation process, such as reducing the risk of a new cardiovascular event, and a faster initiation of the maintenance phase (III) of the CR programme to acquire more independence and reinforce the results obtained during the outpatient phase (II).

Fig. 6

CR programme status of the users in the control, social robot, and personalised robot conditions: ‘complete’ refers to the completed cardiac rehabilitation programme as determined by the clinicians; ‘incomplete’ is when patients need to stop the programme due to reasons beyond their control (e.g. funding, medical condition, the outbreak of COVID-19), and ‘dropout’ refers to not attending 3 sessions in a row without a justification

Fig. 7

Normalised recovery heart rate (\({RHR_{normalised}}\)) throughout the cardiac rehabilitation programme for control, social robot and personalised robot conditions. Recovery heart rate is normalised with the initial resting heart rate on each session. The mean \({RHR_{normalised}}\) per stage is marked with X

6.2 Physiological progress

Table 2 Welch–James ADF results (p-value and effect size in parentheses) of comparisons between stages per condition for normalised recovery heart rate (\({RHR_{normalised}}\)). Significant differences (\(p < .05\)) are highlighted in bold. Consecutive stages and other pairwise comparisons do not exhibit any significant differences

As mentioned in the previous sections, the recovery heart rate (RHR) and the training heart rate (THR) are the most important physiological parameters of the CR programme that determine a patient’s health progress. An increase in RHR signals an improvement in the patient’s cardiovascular functioning and a healthy recovery. As previously described in Sect. 5.5.1, we analyse the normalised RHR to reduce the patient-dependent variation in the measurements and increase the homogeneity. Figure 7 shows how this parameter changes throughout the CR programme for the patients in all conditions. As expected, the normalised RHR is significantly different between the stages (\(T_{WJ}(5, 305) = 14.36, p < .001\)). Table 2 shows the percentage increments of the normalised recovery heart rate and the significance analysis between the initial stage and subsequent stages. The results show that the increments are generally greater and more rapid in the social robot condition in comparison with the other conditions, in contrast with our prediction \(P_{2a}\). Nevertheless, the patients assisted by the personalised robot present significant differences starting from stage 5, demonstrating an improvement in the RHR and a successful CR. The overall comparison between the conditions also exhibits significant differences (\(T_{WJ}(2, 359) = 19.62, p < .001\)). In accordance with this result, the pairwise comparison between conditions presents differences (control-social robot: \(p = .001, \delta = 1.83\), control-personalised robot: \(p = .02, \delta = -1.29\), and social-personalised robot: \(p < .001, \delta = -3.47\)), indicating an effect of the robot over the conventional CR sessions and of the personalisation features (when comparing the robot-assisted sessions). As mentioned before, the patients of the social robot condition presented a greater increment; however, as shown in Fig. 7, the distribution is more symmetric (as depicted with a median that is in the centre of the distribution) for the personalised robot condition, which is a positive finding signifying that the patients tend to maintain a pattern in their RHR. In contrast, the comparison between conditions per stage (Table 3) shows that most of the differences occurred between the social robot and personalised robot, in particular for stages 4, 5, and 6. The reason could be the high-intensity training applied to the patients in the personalised robot condition, as explained below, which might have had an adverse effect on the RHR.

Table 3 Welch–James ADF results (p-value and effect size in parentheses) of comparisons between conditions per stage for normalised recovery heart rate (\({RHR_{normalised}}\)). Significant differences (\(p < .05\)) are highlighted in bold. Consecutive stages and other pairwise comparisons do not exhibit any significant differences
Fig. 8

Physiological parameters of the patients in all conditions: (a) training heart rate (THR), (b) gait speed, (c) treadmill inclination, (d) exertion level of the patients based on Borg Scale (range 6-20). The mean value per stage is marked with X

Figure 8a shows the progress of the THR throughout the programme for all conditions. The comparison between the conditions does not present differences for the THR, neither in an overall approach (\(T_{WJ}(2, 402) = 1.05, p = .35\)), nor for the comparison of conditions per stage (\(T_{WJ}(10, 275) = 1.03, p = .42\)). On the other hand, the comparison between the stages shows significant differences (\(T_{WJ}(5, 290) = 9.2, p < .001\)), in correlation with the expected behaviour during the cardiovascular rehabilitation programme. The subsequent stages present significant differences from the initial stage (Table 4) for the conditions assisted by a robot, showing that the patients assisted by the social and personalised robot improved their cardiovascular functioning. In the case of the personalised robot, most of the stages present a greater increment than the social robot condition. This outcome can be due to the physical activity intensity determined by the treadmill speed (as measured by the gait speed of the patient) and inclination, which was higher for the personalised robot (Fig. 8b and c).

Table 4 Welch–James ADF results (p-value and effect size in parentheses) of comparisons between stages per condition for training heart rate (THR). Significant differences (\(p < .05\)) are highlighted in bold. Consecutive stages and other pairwise comparisons do not exhibit any significant differences
Table 5 Welch–James ADF results (p-value and effect size in parentheses) of comparisons between stages per condition for gait speed. Significant differences (\(p < .05\)) are highlighted in bold. Consecutive stages and other pairwise comparisons do not exhibit any significant differences
Table 6 Welch–James ADF results (p-value and effect size in parentheses) of comparisons between conditions per stage for gait speed. Significant differences (\(p < .05\)) are highlighted in bold. Consecutive stages and other pairwise comparisons do not exhibit any significant differences

For the gait speed, the statistical analysis shows that there are differences between the stages (\(T_{WJ}(5,330) = 17.47, p < .001\)) corresponding to the expected increase in the treadmill speed within the CR programme. However, the analysis performed for stages within each condition (Table 5) shows that the gait speed for the personalised robot only presents differences between stages 1 and 4 and stages 1 and 6, whereas the differences are significant starting from stage 2 in the other conditions. This result can suggest that the treadmill speed for the personalised robot condition was more homogeneous across time with slight differences in increments. The comparison between the conditions shows significant differences, for the overall perspective (\(T_{WJ}(2, 439) = 110.24, p < .001\)) and considering the stages (\(T_{WJ}(10, 298) = 2.39, p = .01\)). As shown in Fig. 8b, the gait speed was consistently higher for the personalised robot condition, which is confirmed with the significant differences found between the conditions (control and personalised robot: \(T_{WJ}(1,386) = 125.87, p < .001, \delta = 6.33\), social robot and personalised robot: \(T_{WJ}(1,415) = 163.36,\) \(p < .001, \delta = 6.75\)). Similarly, the comparison between conditions per stage (Table 6) shows that there are significant differences between the personalised robot with the control and social robot conditions.

Table 7 Welch–James ADF results (p-value and effect size in parentheses) of comparisons between conditions per stage for treadmill inclination. Significant differences (\(p < .05\)) are highlighted in bold. Consecutive stages and other pairwise comparisons do not exhibit any significant differences

In the case of the treadmill inclination, there are no significant differences between the stages (\(T_{WJ}(5,166) = 1.20, p = .31\)). We observed that during the study, the healthcare staff do not drastically change the inclination to minimise patient exertion and assure a safe rehabilitation programme. However, the outcomes of the Welch–James ADF test show that there are significant differences between the conditions overall (\(T_{WJ}(2,280) = 156.51, p < .001\)) and when considering the ‘interaction’ between conditions and stages (\(T_{WJ}(10,224) = 3.93, p < .001\)). This difference is due to the higher treadmill inclination applied in the personalised robot condition: control and personalised robot (\(T_{WJ}(1,220) = 232.87, p < .001, \delta = 9.67\)), social and personalised robot (\(T_{WJ}(1,192) = 303.04, p < .001, \delta = 10.89\)), control and social robot (\(T_{WJ}(1,347) = 3.07, p = .08, \delta = -0.90\)). These differences are shown in Fig. 8c and Table 7 (the analysis between stages), where the inclination in the personalised robot group is significantly different from the other conditions.

The differences in the gait speed and treadmill inclination indicate that high-intensity training is applied for the personalised robot condition. This type of training did not have a negative effect on the training heart rate because the medical team could intervene when the value reached a critical level based on the robot’s alerts. High-intensity training might have resulted from the following reasons: (i) clinicians trusted the continuous monitoring and capabilities of the robot more over time, thus applying it in the personalised robot condition, which started two years after the social robot condition, (ii) clinicians relied on the progress feedback of the personalised robot to adjust the session intensity, or (iii) the patients in the personalised robot condition had a better cardiovascular functioning initially. Hence, further study is necessary to confirm the findings and reveal the underlying reasons, which could shed clearer light on how personalisation affects cardiovascular performance and explain why \(P_{2a}\) was not validated in this study. Nonetheless, the interviews with the clinicians (Céspedes et al. 2021) provide support for trust in continuous monitoring and correlate with their initial perspectives on applying high-intensity training based on this feature (Casas et al. 2019).

There are significant differences for the Borg scale between the stages for all conditions (\(T_{WJ}(5, 255) = 2.34, p = .04\)), which indicate that the exertion level changes throughout the rehabilitation due to the physical activity intensity. However, Fig. 8d shows that the sessions mostly remained within the safe exertion level (between 6 and 13) of the patients. On the other hand, the comparison between the conditions (\(T_{WJ}(2, 377) = 0.18, p= .84\)) and the ‘interaction’ of the stage with the conditions (\(T_{WJ}(10, 269) = 1.67, p = .09\)) did not exhibit significant differences. Nonetheless, a significant difference was found in stage 5, between the control and social robot conditions (\(p= .01, \delta = -0.59\)) and between the control and personalised robot conditions (\(p= .04, \delta = -0.53\)). Furthermore, within-condition differences between stages are significant only for the control condition, between the initial stage and stage 4 (\(p = .004, \delta = 0.73\)) and between the initial stage and stage 5 (\(p < .001, \delta = 1.12\)), indicating that the control group perceived a higher level of exertion in these stages. While these differences occurred only in the control group, the patients assisted by the robot in both scenarios maintained a healthy exertion level throughout the CR programme, despite the higher intensity training in the personalised robot group.

Fig. 9 Number of high heart rate warnings and critical heart rate (call medical staff) alerts of the patients throughout the CR programme. The results show that in contrast with the low perceived exertion levels (Borg scale), warning and critical heart rate values may arise in the sessions throughout the programme. The mean value per stage is marked with X

The warning and critical heart rates detected by the robot tell a different story (Fig. 9). For instance, the high HR warning significantly differs between stages (\(T_{WJ}(5, 227) = 4.79, p < .001\)) due to the physical intensity changes. However, only the personalised robot condition presents significant differences between the initial stage and the subsequent stages (except for stage 5). Comparison between the robot conditions presents differences for the overall analysis (\(T_{WJ}(1, 408) = 5.53, p= .02\)). However, there is only a significant difference in stage 1 (\(p = .003, \delta = -0.71\)). Similar to the high HR warning, the call medical staff alert significantly differs between the stages (\(T_{WJ}(5, 119) = 3.38, p = .007\)), supporting that the heart rate changes across the CR programme due to the physical activity intensity, and may reach critical values. Our previous work showed that these critical alerts could be crucial in promptly detecting any complications and facilitating fast intervention by the medical staff for life-saving measures (Irfan et al. 2020a). In particular, the analysis between the stages per condition shows more significant differences in the social robot condition (between the initial stage and stages 3, 4 and 6) than in the personalised robot condition, where significant differences were observed only between stages 2 and 3. There were no significant differences between the conditions in the number of alerts received (\(T_{WJ}(1, 132) = 0.20, p= .66\)); however, in stage 2 (\(p = .03, \delta = -0.44\)) and stage 6 (\(p = .01, \delta = -0.70\)) the patients in the personalised robot condition had a lower number of alerts, as observed in Fig. 9. The lower number of alerts despite the higher intensity training could indicate a positive effect of personalisation on increasing patients’ cardiovascular functioning, thus validating \(P_{2b}\).

6.3 Interaction with the robot

Table 8 Welch–James ADF results (p-value and effect size in parentheses) for cervical posture correction in the social robot and personalised robot conditions, and the patients’ compliance in the personalised robot condition. Significant differences (\(p < .05\)) are highlighted in bold. Consecutive stages and other pairwise comparisons do not exhibit any significant differences

As previously highlighted, a correct cervical posture is important for the safety of the patient during exercise on a treadmill, to prevent dizziness and falls. The comparison between stages shows significant differences (\(T_{WJ}(5,209) = 3.16, p = .009\)), correlating with the expected posture behaviour of the patients depending on the physical activity intensity. On the other hand, no significant differences were found between the stages per condition, in addition to a lack of interaction between stages and condition (\(T_{WJ}(5, 209) = 1.16, p = .33\)). However, significant differences were found between the social robot and personalised robot conditions (\(T_{WJ}(1, 421) = 58.24, p < .001\)), as detailed in Table 8. Figure 10 shows that the posture corrections were fewer for the patients assisted by the personalised robot. The underlying reason could be the progress feedback given by the personalised robot to the user in an individualised manner. For instance, two features are likely to influence a patient’s cervical posture: (i) the feedback provided at the end of the sessions could positively affect the patient’s intrinsic motivation to improve their next sessions (which would validate \(P_{1b}\)) and (ii) including the patient’s name as part of the feedback could improve the patient’s perception of, and reaction to, this type of feedback. However, considering that the number of posture correction requests was lower for the patients in the personalised robot condition starting from the first stage, the result may also be due to the differing characteristics of the patients; hence, further study with a larger population is necessary to confirm the effects.

Fig. 10 Number of cervical posture correction requests by the social robot and the personalised robot conditions. The results show that the corrections were significantly fewer in the personalised robot condition. The mean value per stage is marked with X

Fig. 11 Number of cervical posture correction requests by the personalised robot and the corresponding patients’ compliance with the requests. The results show that the patients complied well with the personalised robot throughout the CR programme, except in the final stage. The mean value per stage is marked with X

The patients in the personalised robot condition complied well with the requests of the robot, i.e. no significant differences were observed between requests and patient posture corrections (\(T_{WJ}(1, 107) = 2.81, p= .10\)), as visible in Fig. 11, with a moderate agreement between raters of the video analysis (Cohen’s \(\kappa = 0.665, z = 8.87, p < .001\)). This compliance does not significantly differ with stage in general (\(T_{WJ}(5, 71) = 0.87, p= .51\)). However, the in-depth analysis showed that the patients’ corrections significantly differed from the robot requests in the last stage of the CR programme (\(p = .03, \delta = -0.77\)). This indicates that the patients mostly maintained their compliance throughout the long-term rehabilitation, validating \(P_{3a}\), with a decrease only towards the end of the programme. The patients’ corrections are higher in stage 3 (\(p = .004, \delta = 1.28\)) and stage 4 (\(p = .03, \delta = 1.16\)) in comparison with the initial stage, which may have resulted from the physical intensity of the exercise. During the sessions, the experimenters observed that the patients had more difficulty in achieving a straight posture when the physical activity intensity was higher. We could not analyse the compliance for the social robot condition, due to the lack of video data or recorded gaze direction.

Fig. 12 Gaze and social interaction of the patients with the robot over the duration of the CR programme. While gaze decreased over time, social interaction was maintained throughout the long-term rehabilitation. The mean value per stage is marked with X

Figure 12 shows the gaze and social interaction for the personalised robot scenario, as obtained from the video analysis. There is a moderate agreement between raters for gaze (Cohen’s \(\kappa = 0.712, z = 9.55, p < .001\)) and a minimal agreement for social interactions (\(\kappa = 0.319, z = 3.54, p < .001\)), which could be due to the subtle cues in facial expressions that may go unnoticed. While gaze generally decreased over the duration of the CR programme (\(T_{WJ}(5,26) = 4.43, p =.005\)), in contrast with \(P_{3b}\), a significant difference only exists between stages 2 and 5 (\(p = .02, \delta = -1.24\)). Although we expected that gaze would be maintained over time due to the personalisation features, either a decrease in the novelty effect or an increase in the physical activity intensity may have affected the gaze. The experimenters observed that as the intensity increased during the sessions, focusing on the robot became more challenging. On the other hand, social interaction was maintained throughout the programme (\(T_{WJ}(5,27) = 1.31, p = .29\)), validating \(P_{3c}\). Social interactions often occurred in response to the personalised behaviours of the robot (such as progress feedback or correct user recognition), which indicates that the personalisation features help maintain the social interaction over the long-term rehabilitation.
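The inter-rater agreement values reported for the video analysis are Cohen's \(\kappa\) coefficients. As an illustration only, the following minimal sketch shows how \(\kappa\) is obtained from two raters' labels. The function name `cohens_kappa` and the annotation lists are hypothetical and are not taken from the study's data or analysis scripts.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical annotations (1 = patient looked at the robot, 0 = did not).
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Because \(\kappa\) discounts the agreement expected by chance, it is lower than the raw proportion of matching labels, which is one reason why subtle, infrequent cues such as facial expressions can yield low values.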

The patients interacted with the personalised robot in a variety of ways, such as talking to the robot, smiling at being recognised or giving a negative response for an incorrect recognition, smiling after motivational feedback, or saying “Bye!” to the robot on their last session. They also talked to other patients about the benefits of the robot during the session. The video recordings showed that the medical staff promptly responded to the robot’s requests for assistance, and also interacted with the robot repeatedly in several sessions, such as playfully trying to capture its gaze, thanking the robot, joking with it, or talking about its benefits to other medical staff, which often elicited positive reactions (e.g. smile, laugh, nod) during the interaction from the patients in the study and those around.

6.4 Perception of the robot

Table 9 Mann–Whitney U-test results for the Unified Theory of Acceptance and the Use of Technology (UTAUT) questionnaire (non-personalised questions) for the baseline group, the social robot and personalised robot conditions. The significant differences (\(p < .05\)) are highlighted in bold

Fig. 13 Unified Theory of Acceptance and the Use of Technology (UTAUT) questionnaire results for the baseline group, the social robot, and personalised robot conditions. The patients in the baseline group did not have prior experience of the system and completed the questionnaire after the debriefing and video demonstrations of the social robot. The patients in the robot conditions completed the questionnaire after their last session of the outpatient phase of the cardiac rehabilitation programme (i.e. after completing the study). Significant differences are denoted with \(p < .05\):*, \(p < .01\):**, \(p < .001\):***, as presented in Table 9

Figure 13 and Table 9 present the Unified Theory of Acceptance and the Use of Technology (UTAUT) questionnaire results and the significant differences between the conditions. The perceptions of the patients that completed the CR programme with the social robot are significantly more positive than the expectations of the baseline group, in terms of perceived usefulness, utility, ease of use, and trust, in agreement with the preliminary results (Casas et al. 2019). On the one hand, the patients perceived the personalised robot as significantly safer and trusted it more than the baseline group did. On the other hand, the perceived utility of the personalised robot was significantly lower than that of the social robot, in contrast with \(P_{4a}\). We believe this may be due to the user recognition and recall problems that were experienced within the sessions, which may have caused negative experiences (Hancock et al. 2011). Only 38% of the known users were correctly recognised by user recognition, and 44% of the new users were correctly detected. The poor performance was due to the failures arising from face recognition, which identified most users as new, only correctly identifying 35% of known users, significantly less than the multi-modal user recognition model, MMIBN (\(p = .01\)). Nonetheless, both the social robot and the personalised robot conditions improved the expectations about the robot and the system. Moreover, the responses to the questions developed for the personalised robot condition (Table 10) showed that the patients highly enjoyed the personalisation features (perceived enjoyment (PE): \(Mdn=5\) on a scale from 1 to 5), especially because the robot used their name, recognised them, and tracked their progress, validating \(P_{4b}\). Moreover, the usefulness of the personalised robot, such as feeling engaged with the programme and motivation to come to the sessions, was highly positive (perceived usefulness, \(Mdn=5\)), validating \(P_{1b}\). Usefulness and enjoyment are important factors for long-term acceptance of the robot (de Graaf et al. 2016). In addition, most patients felt attached to the robot at the end of the programme (\(Mdn=4.5\)).
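The between-group comparisons of the questionnaire responses rely on the Mann–Whitney U-test. As a rough illustration of that test on Likert-type data, the sketch below computes the U statistic with midranks and a two-sided p-value from the tie-corrected normal approximation, without continuity correction. The function names and the two response lists are hypothetical; the exact procedure and software used in the study are not stated here, so the sketch should not be read as the authors' implementation.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical 1-5 Likert responses for one questionnaire construct.
baseline = [3, 4, 3, 2, 4, 3, 3, 4]
social_robot = [5, 4, 5, 4, 5, 5, 4, 5]
u_stat, p_value = mann_whitney_u(baseline, social_robot)
print(u_stat, p_value)
```

With small groups and many ties, as is typical for questionnaire scales, an exact or permutation-based p-value may be preferable to the normal approximation used in this sketch.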

The additional feedback (through open questions) of the patients in the personalised robot condition was similar to the previous findings for the social robot condition (Casas et al. 2019). All patients recommended the system for future patients and commented on its usefulness, personalisation, and effects on user motivation, such as:

  • “The cardiac rehabilitation with the robot will help you to recover as quickly as possible, and you will be able to progress by being linked to the robot.” - P6

  • “I feel confident in doing the rehabilitation with the robot, because I know that it is personalised and constantly monitoring my performance and progress.” - P6

  • “I really like the idea of the robot, as he was constantly monitoring. Also, I think the corrections the robot made are good, it keeps me focused on the therapy.” - P5

  • “It feels more comfortable being on a treadmill with the robot and because the robot is more aware of the patient.” - P5

  • “Working with the robot motivates me.” - P1, P6

  • “Working with the robot makes me feel happy.” - P1

  • “The robot interacts in a positive way with me, it helps me along with the medical staff, and it is also a good tool for them. I would not change anything (about the robot).” - P3

  • “I would recommend the robot, it is a great help.” - P2

Nonetheless, the patients noted the need for improving the robustness of the user recognition and sensors, and decreasing the repetitiveness of the robot phrases, which was also mentioned in the previous study with the social robot (Casas et al. 2019). In addition, one patient found the appearance and the sound of the robot to be childish. Furthermore, because the progress feedback addressed the difficulties experienced in the sessions, some of the patients had concerns that they were not recovering well.

Fig. 14 Working Alliance Inventory responses for the personalised robot condition evaluated at the middle of the CR programme (18th session) and the final session. The results suggest that the patients’ positive perception of the robot was maintained throughout the programme. A significant improvement was achieved for the perceived goal construct in positive formulation (\(p = .003\), \(V=42\))

Through the WAI questionnaire (Table 11), we can analyse how the patients’ overall perception of the robot changed over the long-term CR programme. Figure 14 shows the patients’ responses at the middle of the programme and at the final session. A Wilcoxon signed-rank test shows a significant improvement in the perceived goal construct in the positive formulation (\(p = .003\), \(V = 42\)) from the middle of the CR programme to the final session. No significant differences are found in the other constructs between the tests (\(p > .05\)). The positive formulation (on the right of the scale) of Bond, Goal, and Task shows that the robot and the CR programme were generally positively perceived, and the patients maintained their bond with the robot over the duration of the programme, validating \(P_{3d}\). Furthermore, the patients generally disagreed with the negatively formulated questions (on the left of the scale), such as “I feel uncomfortable with the robot.”, indicating that despite the negative user experience due to sensor and recognition failures, patients perceived the robot highly positively. This may support that personalisation mitigates the negative user experience, similar to the findings of Irfan et al. (2020b). Another possibility is that the benefits patients felt from the intervention mitigated their negative feelings about the errors.
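For readers unfamiliar with the \(V\) statistic reported for the Wilcoxon signed-rank test, the following minimal sketch shows one common way it is computed, assuming that \(V\) denotes the sum of the ranks of the positive paired differences and that zero differences are discarded. The function names and the paired responses are hypothetical and do not reproduce the study's data.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_v(first, second):
    """Sum of the ranks of positive (second - first) paired differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Pairs with no change carry no sign information and are dropped.
    diffs = [s - f for f, s in zip(first, second) if s != f]
    abs_ranks = midranks([abs(d) for d in diffs])
    return sum(rank for rank, d in zip(abs_ranks, diffs) if d > 0)


# Hypothetical paired 1-5 responses at the middle and final sessions.
middle_session = [3, 4, 3, 4, 2, 4]
final_session = [4, 5, 3, 5, 4, 3]
v_stat = wilcoxon_v(middle_session, final_session)
print(v_stat)
```

A value of \(V\) close to the maximum possible rank sum indicates that most non-zero changes were improvements, whereas a value near half of that maximum indicates no systematic direction of change.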

7 Discussion

7.1 Hypotheses and predictions

Previous studies in SAR (in Sect. 2) highlighted the importance of encouraging feedback and continuous monitoring during repetitive exercise for healthcare programmes to increase motivation and enhance task performance. Other research has shown the value of personalisation for long-term HRI to improve users’ motivation, task performance, engagement, and perceptions. Correspondingly, this work aimed to address how these previous findings translate to long-term cardiac rehabilitation programmes, through the use of fully autonomous generic and personalised socially assistive robots in a clinical environment with non-expert users, that is, the patients and the clinicians.

Our study showed partial support for H1 (Personalisation will improve patient motivation and adherence to the CR programme): adherence (\(P_{1a}\)) could not be evaluated properly, but an increase in motivation for the programme (\(P_{1b}\)) was validated. H2 (Personalisation will improve the cardiovascular performance of the patients) was also partially supported: while a lower gain of normalised recovery heart rate was found with the personalised robot, thus, not validating \(P_{2a}\) probably due to an imbalance of participants’ initial health levels or the higher training intensity, the lower number of alerts (\(P_{2b}\)) was validated. H3 (Interaction with the personalised robot will be maintained throughout the long-term programme) was mostly supported by our study: except for an initial decrease in gaze (in contrast with \(P_{3b}\)), patients complied with the robot (\(P_{3a}\)), and the social interaction (\(P_{3c}\)) and bond (\(P_{3d}\)) with the robot were maintained throughout the long-term programme, validating these predictions. Finally, H4 (Personalisation will improve patients’ perceptions of the robot) was partially supported by our results: while the personalised robot was rated lower in utility (in contrast with \(P_{4a}\)) due to recognition and recall problems, personalised features were acknowledged and enjoyed (validating \(P_{4b}\)).

Overall, our long-term study showed that personalisation holds promise for SAR in cardiac rehabilitation programmes. We draw on these findings and highlight our key takeaways and suggestions based on the drawbacks and limitations, to benefit future researchers exploring SAR for long-term interactions in the real world.

7.2 Benefits of personalisation

The main challenges of cardiac rehabilitation are providing close monitoring of patients within the group sessions and assuring adherence to the long-term programme to ensure that the patients recover fully and retain healthy habits. Our study showed that robots motivate patients to continue the programme and to finish it earlier. Moreover, patients assisted by the personalised robot acknowledged that the personalisation features, such as the progress feedback and adherence tracking, encouraged them to come to the CR sessions.

The goal of the outpatient phase of cardiac rehabilitation programmes is to improve cardiovascular functioning through structured exercises that progressively increase in intensity, in order to reduce the risk of suffering recurrent events and accelerate recovery. Our results indicate that the patients assisted by a (generic or personalised) robot achieved a significant improvement in their training and recovery heart rate and, thus, cardiovascular functioning. Moreover, the clinicians trusted the continuous monitoring aspects of the robots (Casas et al. 2019; Céspedes et al. 2021), which may have reinforced applying high-intensity training throughout the CR programme for the personalised robot, causing a greater increment in the patients’ training heart rate. The high-intensity training did not increase the perceived exertion level, and the patients reached critical heart rates less frequently than the patients in the social robot condition, further supporting the improvement in patients’ cardiovascular functioning. Nonetheless, the adoption of the technology by the medical staff and their immediate intervention in critical cases played an important role in achieving this improvement (Irfan et al. 2020a).

Throughout the CR programme, the patients assisted by the personalised robot maintained a better posture than the patients assisted by the social robot despite a higher exercise intensity. Moreover, the patients mostly complied with the personalised robot’s requests throughout the long-term rehabilitation. These findings show the importance of personalisation through addressing the person by their name and progress feedback such that the patients are more motivated to maintain and improve their good posture. Personalisation features also elicited gaze and social interaction with the robot, such as smiling when the robot addressed them with their name or upon correct user recognition, thanking the robot or talking to it after receiving personalised feedback. While gaze decreased after the initial session, which could be due to the fading novelty effect or the increasing exercise intensity requiring the patients to focus on the exercise, the social interaction with the personalised robot was maintained throughout the long-term programme, as intended. Future work can examine the behaviours of the patients assisted by the social robot through video analysis to compare the benefits of personalisation more thoroughly. However, the presence of a camera and videotaping may lead to a change of behaviour when observed (Irfan et al. 2018a), known as the Hawthorne effect (Roethlisberger et al. 1939), as well as selection bias on willingness to participate in the experiment, which could have affected the interaction of patients with the personalised robot, but this might have been offset during the long-term programme. Recording the gaze direction and duration through the tablet could be an alternative approach in detecting engagement (Oertel et al. 2020) that could eliminate such confounds.

Both the social robot and the personalised robot met the expectations about the system. The patients commended the use of both robots, and expressed feeling more secure due to continuous monitoring and immediate feedback. In addition, since the patients participated in group sessions, other patients that were not part of the study were able to observe its benefits, and several of them declared interest in working with a robot in their CR programme. Personalisation features were highly enjoyed by the patients, and the progress and adherence feedback were reported to be useful by the patients, suggesting improvement in motivation and increased adherence to the programme, as initially aimed. While the patients felt initially sceptical towards the robot’s role due to their lack of prior experience (as observed by the experimenters), the clinicians noted that their trust in the measurements and its feedback increased over time. These observations were supported by the WAI questionnaire results in which the perceived goal was significantly improved over time. Moreover, patients maintained their bond with the personalised robot and their perceptions over the duration of the programme. These findings are in line with the clinicians’ perceptions that the personalisation of the robot improves the quality of the interaction (Céspedes et al. 2021).

Continuous monitoring and immediate feedback aspects of both robots were highly appreciated by the patients and the clinicians in terms of the awareness of patient performance within the session. Additionally, progress tracking throughout the programme by the personalised robot helped inform both the medical staff and the patients, which further increased the awareness of the medical staff to detect complications (Irfan et al. 2020a; Céspedes et al. 2021), and provided knowledge of their recovery to the patients, thereby improving their perception of the CR programme and their motivation. While all of these aspects could be provided without the presence of a robot, i.e. through verbal feedback from the tablet interface, previous studies showed the added benefits of embodiment in improving compliance, likeability, social engagement, adherence, and task performance, as discussed in Sect. 2. However, having a robot in a rehabilitation programme is not cost-efficient, in terms of the initial investment and maintenance requirements; thus, future work can investigate whether a system without a robot could reproduce the benefits found in this study.

7.3 Drawbacks of personalisation and suggestions for future work

Long-term studies are labour- and time-intensive; thus, it is challenging to recruit subjects willing to participate, especially for a rehabilitation programme with a novel system, as the patients may be sceptical towards the approach (Casas et al. 2019). Moreover, dropouts and incomplete rehabilitation due to unforeseen reasons, such as funding, the outbreak of COVID-19, and other medical conditions, resulted in a relatively limited number of patients with skewed characteristics (i.e. gender, age, and obesity), which may have affected the overall results. Furthermore, the patients were progressively recruited over 2.5 years and the personalised robot condition was started two years after the initial two conditions, which could have affected the initial technological acceptance and perceptions of patients towards robots (Pino et al. 2015). Nevertheless, our statistical analyses and the effect sizes indicate that the patients assisted by the personalised robot improved their cardiovascular functioning in terms of their training heart rate and endurance (evident by low exertion levels and lower critical heart rate alerts) with the high-intensity training, along with a better cervical posture, in comparison with the social robot. However, personalisation was found to have drawbacks.

The recovery heart rate is one of the primary parameters that reflect the patient’s cardiovascular performance. While the personalised robot led to a higher increment from the patients’ initial recovery heart rate than the conventional CR programme, the social robot presented a greater and more rapid improvement. However, this result could be due to the higher intensity training applied to the patients assisted by the personalised robot; hence, a further balanced (i.e. same level of exercise intensity) experiment is necessary to confirm the findings.

Relying on full autonomy in a real-world environment brought about challenges such as malfunctions in the sensors and the robot, as well as connection problems with the tablet interface, which may have caused negative perceptions of the robot, in addition to missing data. In particular, relying on a user recognition system for personalisation decreased the perceived usefulness and utility of the robot and the trust in it, and the low reliability of the user recognition system was remarked upon by the patients. While the multi-modal incremental Bayesian network (MMIBN) with online learning is the only method that supports sequential and incremental recognition of previous and new users without requiring pre-training for real-time HRI, user recognition can be improved by using identifiers with lower noise, integrating additional non-intrusive modalities (e.g. voice, facial marks, eye colour) or by using other online learning methods. Another option is to remove user recognition, e.g. through requesting the patient’s name from the tablet interface either from the patient or the clinician; however, the questionnaires and video analysis showed that the patients enjoyed being recognised, and removing recognition could also decrease the naturalness of the interaction. Moreover, the acquisition of more robust medical sensors is necessary to ensure the reliability of the data. Nonetheless, the highly positive perceptions of the patients indicate that negative user experiences can be overcome with the added benefits of the robot.

Patients and clinicians remarked that both robots were repetitive; thus, a larger variety of novel robot responses should be added to improve the sociability and social presence of the robot, especially in long-term interactions. A future research direction can be to adapt these responses based on the patient’s sensory values and adapt progress feedback to reflect the overall (or stage) progress instead of comparing with the previous session, to keep the interactions engaging and interesting in the long term (Matarić and Scassellati 2016).

While speech and emotion recognition can be integrated with the robot to improve the naturalness and adaptivity of the interaction, these may not be reliable in a noisy real-world environment. For instance, Fundación Cardioinfantil-Instituto de Cardiología plays loud music to motivate the patients in their exercise, which could cause failures in speech recognition. Thus, gesture recognition could be a better alternative. Emotions can be misleading due to the patients’ exertion levels; hence, extracting fatigue levels from cameras and wearable sensors could be more reliable and provide a complementary measure to the Borg scale for detecting when to provide motivation and additional alerts (Pinto et al. 2020).

The appearance and voice of the (NAO) robot were found to be childish by one of the patients, which could have affected the underlying perceptions of the patients (Goetz et al. 2003; Pino et al. 2015). Future work can use other taller and mobile platforms, such as a Pepper robot (SoftBank Robotics Europe, France), to also address other patients in the group session. Moreover, other robotic platforms with facial expressions, such as Nexi (MIT Media Lab, USA), can be used to improve the sociability of the robot (Johanson et al. 2020).

While the architecture of our patient–robot interface and personalisation features were designed in collaboration with medical specialists, this study showed various additional features that can be changed (e.g. progress feedback structure) or added to the system based on the needs of the patients, highlighting the importance of co-design (or participatory design). Co-design is the process where users (stakeholders) are involved in designing a product from the idea generation stage (Sanders and Stappers 2008). Involving patients and medical specialists in the design of a socially assistive system not only allows adapting to their (changing) needs, but also enables them to understand the limitations of the system and actively contribute to the design of their care (Bate and Robert 2007; Šabanović 2010).

8 Conclusion

This paper presented a personalised socially assistive robot for the outpatient phase of a long-term cardiac rehabilitation programme. The aim of the socially assistive robot is to improve motivation and adherence to the programme. Personalisation features, such as recognising patients, addressing them by their name, tracking their attendance, and providing progress feedback, were developed to improve and maintain motivation over the long-term interaction. Three conditions, labelled as social robot, personalised robot, and control, were designed to evaluate the impact of socially assistive robotics and personalisation in comparison with conventional cardiac rehabilitation. Forty-three patients were recruited for the study; however, due to dropouts and external unforeseen reasons, such as funding, the outbreak of COVID-19, and other medical conditions, 26 patients (9 for control, 11 for social robot, 6 for personalised robot) actively participated and completed the outpatient phase of cardiac rehabilitation and the study.

The social and personalised robots were found to improve cardiovascular functioning in comparison with a conventional cardiac rehabilitation programme. Furthermore, the patients assisted by a robot completed their rehabilitation in a shorter duration, suggesting benefits on adherence. Moreover, the perceptions of the patients and the clinicians improved in comparison with expectations, and the robots were recommended for future use by both groups. The continuous monitoring of the robot enabled prompt detection of critical conditions, which may have reinforced the trust of the clinicians in the robot and thus led to high-intensity training in the personalised robot condition. This resulted in a higher training heart rate without an adverse effect on the endurance (low perceived exertion levels and critical heart rates), but also in a lower and slower improvement in the recovery heart rate in comparison with the patients assisted by a social robot. Moreover, relying on a fully autonomous robot for personalisation in long-term rehabilitation brought along sensor and user recognition failures, which decreased the perceived utility of the robot. On the other hand, personalised features often elicited gaze and social interaction, facilitated a bond with the user, and were highly positively perceived, and this perception was maintained in the long term. This suggests that the various benefits of personalisation can overcome its drawbacks, supporting the potential for improving conventional cardiac rehabilitation programmes and the long-term interaction with a socially assistive robot.