1. Introduction
In recent years, Human–Robot Interaction (HRI) has become a relevant topic in robotics research. HRI studies focus on analyzing the collaboration and communication between humans and robots. Some fields in robotics have used HRI solutions in complex tasks that robots or humans cannot solve individually [1]. Many solutions focus on improving task performance without reducing safety [2,3,4,5,6,7]. In surgical robotics, doctors are assisted by robots, improving their skills and reducing the risks associated with an intervention [2,3]. Collaborative robot manipulators (co-bots) are frequently used to improve productivity in industry [4,5]. Furthermore, some HRI studies have played a significant role in Search And Rescue (SAR) operations, such as developing systems that place sensors and monitor vital signs [6] or optimizing signal policies in rescue operations using game-theoretic approaches [7].
Physical Human–Robot Interaction (pHRI) is required for many applications, such as exoskeletons [8], rehabilitation [9,10], or prostheses [11]. Usually, these applications need solutions that satisfy multiple objectives, as in [12], where Genetic Algorithms (GA) are implemented. One of the ultimate goals for several scenarios involving pHRI is to achieve natural contact through touch between the participants [13]. A common definition of touch sensing is the perception of tactile and kinesthetic signals through the skin [14]. Kinesthesia is defined by two sources of information: the relative positions of the limbs and the dynamic forces produced by the muscles [15]. Haptic perception is the term used to describe the combination of tactile and kinesthetic information. Without the sensations this sense provides, we would be unable to perform numerous essential tasks, and the same necessity is present in the field of robotics [16]. Tactile sensors have played a crucial role in this regard, enhancing the capabilities of the robotic systems they are integrated into [17,18]. The most common haptic perception approaches use static tactile information only. Traditional computer vision and machine learning techniques have been used to address the problem of tactile object recognition [19,20]. Treating tactile data as dynamic information is an approach some researchers have adopted [21,22]. Other researchers have shown that haptic data can be treated as sequential data, and therefore Long Short-Term Memory (LSTM) networks provide excellent results in detecting the slip direction [23]. Kinesthetic data have also proven useful: in [24], the interaction forces between a gripper and a human are estimated using proprioceptive information only, and an estimation of the roll angle of a wrist using kinesthetic data is presented in [25]. Despite the benefits of combining several haptic-based sources, only a few studies have followed this strategy. In [26], a single and unplanned grasp is performed on multiple objects, and an approach to classify them using the proprioceptive and tactile data of the gripper is presented. Furthermore, in our previous work, we developed a fusion of haptic data to classify objects and improve on the results of the tactile and kinesthetic approaches performed separately [27].
Reacting to haptic inputs is a key component of pHRI, which typically requires a robot with tactile sensors and/or kinesthetic perception capabilities. The recognition algorithm usually needs a dataset for training purposes, and in recent years several haptic datasets have been presented. In [28], Wang et al. present "TacAct", which contains tactile data from multiple subjects and differentiates types of touch actions using a convolutional neural network. On the other hand, in [29], tactile data are recorded while grasping objects with a sensorized glove, and a neural network is trained to classify the objects in the dataset. Albini et al. presented a method to discriminate between touch from human and non-human hands, trained with the collected dataset [30]. Few datasets containing both tactile and kinesthetic information are found in the literature. In [31], both sources are recorded from the NICO humanoid robot, classifying in-hand objects using various neural network approaches. Nevertheless, no regression approaches using tactile and kinesthetic data as the input were found in the literature, primarily due to the lack of haptic datasets.
This paper presents a novel dataset of a forearm obtained with a gripper that records full haptic perception. The three-fingered gripper contains a high-resolution tactile sensor in one finger and two independent underactuated fingers with proprioceptive sensors that provide kinesthetic data. The gripper performs a squeeze-and-release process, which approximates human palpation, to obtain both tactile and kinesthetic data over time. With this procedure, characteristics such as size, stiffness, and hard inclusions can be obtained [22]. Thirteen equally spaced measurements, from the wrist to the elbow, have been recorded, with sixty experiments each (780 experiments in total). This dataset provides information about the bones and muscles, whose size and position vary along the forearm, as seen in Figure 1, and could be used as training data in pHRI applications. To illustrate the application of the recorded dataset, we present an estimation of the forearm's grasping location, using a regression approach based on LSTM neural networks. To the best of our knowledge, this is the first dataset that provides tactile and kinesthetic information about a whole human forearm. This information is relevant for safe upper-limb manipulation, since an incorrect manipulation could hurt the human. Moreover, some procedures must be performed on specific parts of the forearm, such as locating sensors to obtain optimal biomedical signal readings or performing medical assistance.
The main contributions of this work are:
A tactile-kinesthetic dataset of a whole human forearm, obtained with a gripper, for pHRI applications.
An example of the use of this dataset with a deep learning fusion-based regression approach, where both tactile and kinesthetic information are utilized to estimate the location of the gripped section on the forearm.
The performance of the proposed neural network is analyzed by providing non-trained examples to the regression approach and comparing the outputs with the ground-truth data. The dataset and code are publicly available in a GitHub repository (https://github.com/fpastorm/Forearm-tactile-kinesthetic-dataset, accessed on 8 November 2022).
This paper is organized as follows: Section 2 presents the experimental setup necessary for the dataset acquisition. Section 3 details the dataset collection process. Section 4 describes how the tactile and the kinesthetic data can be used for a regression approach. The experiments performed are described in Section 5, followed by the results obtained and their discussion in Section 6. Finally, Section 7 includes the conclusions and prospective research work.
3. Dataset Collection Process
Haptic information can be obtained through multiple approaches by performing different exploration procedures (EPs) [33]. In the case of in-hand recognition, one of the most common EPs is to measure the shape of a grasped body using the kinesthetic information provided by the fingers. Another common EP is palpation, which measures the stiffness of the body and detects internal features it might present. Both EPs can be performed at the same time by executing a squeeze-and-release procedure.
The robotic hand presented in Section 2 is used to perform these two EPs. The squeeze-and-release process is realized by holding the forearm inside the robotic hand and grasping it with the two underactuated fingers. Both motors apply a randomly increasing (during the squeeze) and decreasing (during the release) torque to simulate human palpation. An initial 10% of the maximum torque is applied, increasing by 5% every 0.5 s with a small random variation, until 90% of the maximum torque is reached. Then, the torque decreases back to 10% in the same manner as in the squeeze phase. The randomness in the applied torque is included because a human does not always perform palpation in exactly the same way. The kinesthetic information is provided by the two underactuated fingers, and the tactile information is obtained by the tactile sensor located on the fixed finger.
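As a rough illustration of this torque schedule, the following Python sketch generates one squeeze-and-release command profile. The function name, the jitter amplitude, and the clipping are assumptions made for illustration only and are not taken from the dataset acquisition code.

```python
import numpy as np

def torque_profile(step_s=0.5, base_step=5.0, start=10.0, stop=90.0,
                   jitter=2.0, rng=None):
    """Commanded torque (in % of max torque) for one squeeze-and-release EP.

    The squeeze ramps from `start` to `stop` in `base_step` increments every
    `step_s` seconds, each perturbed by a random variation; the release
    mirrors the ramp back down. `jitter` is an illustrative placeholder for
    the random variation used during data collection.
    """
    rng = np.random.default_rng() if rng is None else rng
    squeeze = np.arange(start, stop + base_step, base_step, dtype=float)
    release = squeeze[::-1][1:]                                   # ramp back down to `start`
    profile = np.concatenate([squeeze, release])
    profile += rng.uniform(-jitter, jitter, size=profile.shape)   # random variation per step
    times = np.arange(profile.size) * step_s                      # one command every 0.5 s
    return times, np.clip(profile, 0.0, 100.0)
```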
We define $l$ as the percentage of the forearm grasped, with the wrist at $l = 0\%$ and the elbow at $l = 100\%$, following a linear scale for the intermediate grasps, as seen in Figure 1. Overall, thirteen equally spaced EPs have been performed, from the wrist to the elbow of the right forearm of a subject, obtaining the following dataset vector:
$\mathcal{D} = [E_0, E_1, \dots, E_{12}]$
Each subindex $k$ is associated with the percentage of the grasped location through the relationship $l_k = \frac{100\,k}{12}\,\%$, with $k \in \{0, 1, \dots, 12\}$.
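Under this linear relationship, the subindex-to-location mapping can be written as a short Python snippet (variable names are illustrative):

```python
# Grasp-location label l_k (in % of the forearm) for the thirteen measurements:
# k = 0 is the wrist (l = 0%) and k = 12 is the elbow (l = 100%).
grasp_locations = {k: 100.0 * k / 12 for k in range(13)}
# e.g., grasp_locations[6] == 50.0  -> mid-forearm
```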
A total of 60 iterations has been carried out for each subindex, giving a total of 780 experiments. Tactile information is recorded as a time tensor, with $n_T$ being the number of tactile frames obtained in each squeeze-and-release process. An example of some of these tactile frames is shown in Figure 5a. The kinesthetic data are recorded as a time matrix, where $n_K$ represents the number of samples recorded in each squeeze-and-release EP. The time matrix records the position of both actuators and the angle between the underactuated joints, as seen in Figure 5b. We consider the roll angle to be constant, since it can be estimated and the forearm reoriented to a given angle [25]. Therefore, for this dataset, we define the wrist roll angle as 0 for each measurement.
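For readers who wish to handle the recordings programmatically, the following hypothetical container sketches how one experiment could be organized. The field names and array shapes are assumptions for illustration; the actual file layout is documented in the public repository.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SqueezeReleaseExperiment:
    """One squeeze-and-release EP at grasp location l (in % of the forearm).

    Shapes are indicative only: `tactile` stacks the pressure images captured
    by the fixed finger over time, and `kinesthetic` stacks the actuator
    positions and underactuated joint angles sampled over time.
    """
    l: float                 # grasped location, 0% (wrist) to 100% (elbow)
    tactile: np.ndarray      # shape (n_T, rows, cols)
    kinesthetic: np.ndarray  # shape (n_K, n_joint_signals)
```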
4. Tactile and Kinesthetic Data Fusion for Regression
Physical interaction between robots and human upper limbs is crucial for many applications: for instance, rescue tasks, where a triage must be performed, biomedical sensors must be placed on the specific parts of the forearm where they obtain better readings, or medicine must be injected into survivors in critical condition. Other applications, such as assistive robotics, perform robot-initiated upper-limb pHRI. Manipulating human upper limbs can be a high-risk task, in the sense that we could harm the subject if we lack information about the grasped position and the forearm roll angle. Considering these necessities, in this work we present a regression method that uses the haptic dataset to estimate the grasped forearm location $l$. Both the tactile and kinesthetic information are trained individually, and then the outputs are fused to enhance the results, similar to our previous work [27], where we fused the haptic information for classification purposes.
4.1. Neural Networks Structure
A schematic of the three regression neural networks is presented in Figure 6. Both the tactile and the kinesthetic networks are based on LSTM layers. This type of layer learns long-term dependencies between sequence data obtained in time series and is able to preserve previous information, as demonstrated in various works [27,34]. The squeeze-and-release process provides haptic information with an evident temporal structure; therefore, it is reasonable to expect LSTM networks to perform adequately.
The time series of tactile images is used to train the tactile network, which is formed by four layers. It presents a Convolutional LSTM layer [35] with a hyperbolic tangent (tanh) activation function, followed by a convolutional layer with a Rectified Linear Unit (ReLU) activation function. These are followed by two fully connected layers: one with 64 neurons and a ReLU activation function, and one with a single neuron and a linear activation function.
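A minimal Keras sketch of this tactile network, under stated assumptions, is shown below. The number of filters, the kernel sizes, the input resolution, and the added Flatten step are placeholders and are not taken from the original implementation.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_tactile_net(frames, rows, cols, kernel=(3, 3)):
    """Tactile regression sketch: ConvLSTM2D -> Conv2D -> Dense(64) -> Dense(1)."""
    return keras.Sequential([
        layers.Input(shape=(frames, rows, cols, 1)),        # tactile frames over time
        layers.ConvLSTM2D(16, kernel, activation="tanh"),   # spatio-temporal encoding
        layers.Conv2D(16, kernel, activation="relu"),
        layers.Flatten(),                                   # added for dimensional compatibility
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="linear"),               # estimated grasp location l
    ])
```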
The kinesthetic network is fed with the angle time matrix and is formed by four layers. The first three layers are LSTM layers with 1000, 500, and 100 neurons, respectively, to achieve a progressive encoding of the input matrix; all of them use tanh as the activation function. The last layer is a single-neuron fully connected layer with a linear activation function.
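A corresponding Keras sketch of the kinesthetic network follows; the input dimensions are placeholders for the number of kinesthetic samples and joint signals per experiment.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_kinesthetic_net(n_samples, n_signals):
    """Kinesthetic regression sketch: three stacked LSTMs -> Dense(1)."""
    return keras.Sequential([
        layers.Input(shape=(n_samples, n_signals)),              # joint/actuator angles over time
        layers.LSTM(1000, activation="tanh", return_sequences=True),
        layers.LSTM(500, activation="tanh", return_sequences=True),
        layers.LSTM(100, activation="tanh"),
        layers.Dense(1, activation="linear"),                    # estimated grasp location l
    ])
```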
Since the tactile and kinesthetic estimation outputs may differ, it is interesting to learn the strengths and weaknesses of each network with a new fusion neural network that uses both sources. The two estimation outputs are concatenated, creating a new input for the fusion network. The fusion network is a four-layer fully connected network with 128, 64, 32, and 1 neurons, respectively. All layers have a ReLU activation function, except the last one, which has a linear activation function.
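A sketch of the fusion network is given below, assuming the two scalar estimates are concatenated into a two-element input per example.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_fusion_net():
    """Fusion regression sketch: Dense(128) -> Dense(64) -> Dense(32) -> Dense(1)."""
    return keras.Sequential([
        layers.Input(shape=(2,)),                 # [tactile estimate, kinesthetic estimate]
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="linear"),     # fused estimate of grasp location l
    ])
```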
4.2. Training
To effectively train the tactile and kinesthetic networks, and then the fusion neural network, let us define two new subsets created from the original dataset $\mathcal{D}$: $\mathcal{D}_{odd}$ contains the elements of $\mathcal{D}$ with odd subindices, while $\mathcal{D}_{even}$ is formed by the elements with even subindices. This division of the data is not arbitrary and will be discussed in Section 5.
The tactile and kinesthetic networks are trained with one of the two subsets (hereafter, the training subset), while the other remains completely unseen. On the one hand, the tactile network estimation model is trained with 54 examples for each grasped location (90% of the subset), using 20% of the training data for validation. An Adam optimizer is used for training, with the mean squared error as the loss function and a learning rate of $8 \times 10^{-4}$ over 500 epochs. On the other hand, the kinesthetic network estimation model is trained on the same 54 samples, again using 20% of the training data for validation. An Adam optimizer is also used, with the mean squared error as the loss function and a learning rate of $1 \times 10^{-5}$ over 2000 epochs.
Finally, both the tactile and kinesthetic networks are fed with the entire training subset, and then, as described in Section 4.1, their outputs are used to train the fusion neural network estimation model. As with the tactile and kinesthetic neural networks, 90% of the data are used for training, with 20% of the training data reserved for validation. An Adam optimizer is again used for training, with the mean squared error as the loss function and a learning rate of $1 \times 10^{-6}$ over 2500 epochs.
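The shared training recipe can be summarized in a short Keras sketch; the helper function is illustrative and not part of the released code.

```python
from tensorflow import keras

def compile_and_fit(model, x_train, y_train, learning_rate, epochs):
    """Training recipe sketch: Adam optimizer, MSE loss, 20% validation split.

    Per the settings above: tactile net lr=8e-4 / 500 epochs, kinesthetic net
    lr=1e-5 / 2000 epochs, fusion net lr=1e-6 / 2500 epochs.
    """
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mean_squared_error")
    return model.fit(x_train, y_train, validation_split=0.2, epochs=epochs)
```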
5. Experiments
To effectively evaluate the performance of the three neural networks, all of them are fed with untested data from the training subset and with completely new data from the other subset. Therefore, the tactile and kinesthetic neural networks are tested with the remaining 10% of examples from each grasped location and with the entirely unknown subset. Similarly, the fusion network is tested with the remaining 10% of examples from each grasped location and with the entirely unknown subset. To obtain significant statistical performance metrics, 20 trainings and tests of each regression neural network have been performed. In each iteration, random training data and random test data are taken from the 60 experiments, but the tactile data and the kinesthetic data always belong to the same squeeze-and-release process. We evaluate the performance by analyzing the maximum and minimum estimated values, the amplitude of the 25/75 percentiles, and the median of the estimated values. Moreover, an analysis of the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) for each percentage of the grasped forearm is performed.
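The per-location statistics can be computed as in the following NumPy sketch; the function name and dictionary keys are illustrative.

```python
import numpy as np

def location_metrics(y_true, y_pred):
    """Evaluation sketch for one grasped location: RMSE, MAE, and the
    box-plot statistics (min, max, 25/75 percentiles, median) of the estimates."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "min": float(y_pred.min()),
        "max": float(y_pred.max()),
        "p25": float(np.percentile(y_pred, 25)),
        "median": float(np.median(y_pred)),
        "p75": float(np.percentile(y_pred, 75)),
    }
```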
The training and test experiments were performed using the Keras API in an Intel Core i7-8700K computer with 16 GB of RAM, equipped with an NVIDIA RTX 2080Ti graphics processing unit (GPU).
6. Results and Discussion
The results of the 20 iterations of experiments are presented in this section. The estimation output values are presented in a box plot for each neural network, as seen in Figure 7. Moreover, the RMSE and the MAE for each percentage of the grasped forearm and each neural network are shown in Figure 8, with Table 3 summarizing the results.
As can be appreciated in Figure 7, the tactile regression neural network presents the largest range between the minimum and maximum outputs, with an average of 50.87%. The kinesthetic regression neural network presents better results, with an average range of 41.56%. The fusion provides the smallest range, with an average of 28.24%. As expected, fusion is the best approach, since its results are the most consistent over the 20 iterations. Similarly, the range between the 25/75 percentiles is the largest for the tactile regression neural network, followed by the kinesthetic neural network. Again, the fusion neural network outperforms both with the smallest range.
Regarding the MAE and RMSE, as seen in Table 3, the tactile neural network presents the largest average error. Nevertheless, its results provide a fair estimation of the grasped section and could be used in applications that do not require high accuracy. As presented in Figure 8, the estimation of the tactile neural network is very precise from the wrist ($l = 0\%$) until approximately half of the forearm ($l = 50\%$); then, the accuracy decreases abruptly towards the elbow ($l = 100\%$). This occurs because, in the zone between the wrist and the middle of the forearm, the bones are closer to the skin; therefore, when performing the squeeze-and-release process, the tactile sensor obtains a large amount of internal information. However, in the area from the middle of the forearm to the elbow, the bone lies deeper below the skin surface; thus, the sensor finds less information that differentiates the forearm sections. The kinesthetic neural network outperforms the average results of the tactile neural network, so it also provides precise estimations. The kinesthetic estimation presents its best results in the zones close to the wrist and the elbow; however, it lacks the capacity to recognize the middle of the forearm. Kinesthetic data capture information about the shape of the forearm; therefore, we can assume that the shapes of the wrist and the elbow are sufficiently distinct, while the middle part of the forearm presents similar shapes, which explains why the estimation is worse in that zone. Lastly, the fusion estimator presents the best average error. The error is low at almost every grasped location, except at the locations coinciding with the highest errors of the kinesthetic and tactile neural networks, respectively. The robustness of the fusion network is remarkable: although some estimation error is present, no disparate outputs are produced.
Even though the results are satisfactory, they could be improved with a more extensive dataset, increasing both the thirteen grasp measurements and the sixty experiments performed at each measurement. However, gathering a significant amount of tactile and kinesthetic data is a large investment. Various techniques could address this challenge, such as sim-to-real approaches that pre-train the models using simulated data, or generative adversarial networks (GANs) and variational autoencoders that produce new data equivalent to those received by the real sensor. It is also important to remark that this training was performed with a dataset formed from only one subject's information; thus, satisfactory performance on other subjects is not ensured.
7. Conclusions
In this work, a haptic dataset of a human upper limb was presented. Thirteen equally spaced measurements have been obtained from the wrist to the elbow of the right forearm. The data have been recorded by a three-fingered robotic gripper, with a fixed finger containing a tactile sensor to perceive tactile information and two underactuated fingers that perform the grasping procedure and record proprioceptive information.
Moreover, an application using the collected dataset has been presented: an estimation of the grasped forearm position obtained by fusing the haptic data in a neural network approach. The tactile and kinesthetic information have been trained separately, and their outputs were fed to a new fusion neural network that enhanced the results.
Future research shall consider adding tactile sensors to the underactuated fingers to improve the sensing capabilities of the whole gripper, as well as including different forearm datasets with male and female subjects and different forearm profiles, taking into account factors such as muscle, fat percentage, or wingspan. Data augmentation procedures, such as flipping and scaling, could also be applied to enlarge the provided dataset. Retraining on such an extended dataset could enable the regression fusion neural network to estimate grasped sections of left forearms grasped symmetrically and of forearms of different subjects. However, these assumptions must be handled with caution and studied in depth.