1. Introduction
Widespread in controlled environments such as industrial plants, robots are moving towards unstructured settings like homes, schools, and hospitals, where high-level, complex, and fast reasoning is required; nevertheless, several challenges remain before robot skills reach human-level capabilities [1]. Although robots can accurately perform tasks such as walking, picking and placing objects, and understanding and communicating with people, they still lack hand dexterity. Improvements in tactile sensing for in-hand manipulation, together with a better understanding of how human perception drives action, inspire and could advance robot capabilities in the scenarios mentioned above [2].
Manipulation skills developed in the human hand and brain reach a level of ability rarely seen in other animals. Grasping and manipulating objects is a distinctive part of the human skill set, an ability that evolved from the erect posture that freed our upper limbs and turned our hands into two sophisticated sets of tools [3]. Not surprisingly, human hand dexterity and reasoning are the holy grail of bio-inspired robotic control and actuation. Hand dexterity is the ability to interact in a useful way with objects in the real world. During robotic manipulation, a robot changes an object’s state from an initial configuration to a final pose. For instance, during pick and place tasks, the goal of a robotic platform is to change the position and orientation of an object inside the manipulator’s workspace. Comparatively, in-hand manipulation is the ability to change the pose of an object, from an initial orientation to a given one, within one hand. Robots that can change an object’s orientation while maintaining a stable grasp have the potential to broaden the range of robot applications even further [4]. The robotic manipulator literature comprises a long list of robotic hands with varying levels of dexterity. Among them, underactuated hands arise as an option that achieves a reasonable level of dexterity with simplicity.
Research on haptic perception over the past decades has developed an in-depth knowledge of the psychological aspects of human touch employed to manipulate objects and perceive their characteristics. Lederman et al. [5] described the somatosensory system as divided into two subsystems: the “What” system, which carries out perception and memory, and the “Where” system, which deals with perception to action. In humans, the “What” system performs the recognition of surfaces and objects through their tactile properties [6]. Robots that implement similar recognition systems demonstrate their efficacy by identifying similar objects promptly even without visual feedback [7]. From this perspective, Reference [8] is one example of recent research in which the authors employed camera-based sensing and deformable material as input to an object recognition system that integrates grasping and object recognition. On the other hand, the “Where” system, which has a counterpart in vision, produces a description of points, surfaces, and reference frames in the world. Differently from vision, touch refers both to a location on the sensory organ, the skin itself, and to a localization in the environment. The human sensory loop that combines tactile perception with vision has attracted research interest due to the increased reliability obtained when both modalities are available [9]. Recent literature presents several approaches to this issue; significant contributions on tactile feedback were inspired by the concept of tactile flow in humans, and improvements have been studied in the analysis of tactile feedback in robots using computational tactile flow [10]. Although our work is also inspired by the human somatosensory system, we took a different approach towards a similar goal. Instead of using human tactile flow as the basis of our research, we developed pose estimation based on the human visuotactile system, also called the “Where” system. This intersensory interaction extends to situations combining vision and touch to include different object information, for interobject interaction or allocation of attention [5].
During visuotactile interaction, multiple frames of reference are simultaneously available for human haptic spatial localization [5]. In the “Where” system, two types of touch spatial localization are considered: one is the position on the body where the stimulus is applied, and the other is the location in the external world from which the stimulus comes. During a task, humans can use a single frame of reference or combine multiple frames, whose origins may be visible landmarks or a body part of the individual. Even though the haptic frame describes the contact between the skin and an external object, one can use landmark axes to specify a frame of reference external to the body (an “allocentric” frame of reference). Similarly, a local frame of reference, such as a fingertip axis, is used for localization on the body (an “egocentric” frame of reference). In summary, haptic spatial localization can be described either by a coordinate system centered on the body, an egocentric frame of reference, or by external features in the environment, such as the edges of a table, an allocentric frame of reference.
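To make the relation between the two frames concrete, the sketch below (a minimal illustration, not part of the experimental setup) expresses an object position measured in an allocentric frame, such as a table or camera frame, in the egocentric frame of a fingertip using a planar homogeneous transform; all numeric values are arbitrary placeholders.

```python
import numpy as np

def homogeneous(rotation_deg, translation):
    """Build a 2D homogeneous transform (rotation about z plus planar translation)."""
    c, s = np.cos(np.radians(rotation_deg)), np.sin(np.radians(rotation_deg))
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = translation
    return T

# Hypothetical pose of the fingertip (egocentric origin) expressed in the allocentric frame.
T_allo_ego = homogeneous(30.0, [0.10, 0.05])   # 30 deg rotation, 10 cm and 5 cm offsets

# Object position observed in the allocentric (e.g., camera) frame, in homogeneous form.
p_allo = np.array([0.15, 0.08, 1.0])

# The same point expressed in the egocentric (fingertip) frame.
p_ego = np.linalg.inv(T_allo_ego) @ p_allo
print(p_ego[:2])
```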
Investigations on tactile sensing have the potential to improve robotic in-hand manipulation, including, but not limited to, object characteristic extraction and feedback control. Tactile sensing provides essential information during object manipulation, solving problems such as object occlusion and object pose estimation under stable grasping. The form, shape, and functionality of human skin have also inspired research in the tactile sensing field. Bio-inspired approaches to tactile sensing have shown significant results in the literature, including successful works on texture classification and control feedback. The present work uses a visuotactile approach to pose estimation using data collected from bio-inspired multimodal tactile modules in conjunction with camera feedback.
Successful robotic manipulation starts with a stable object grasp; therefore, robots are expected to have robust grasping skills. Approaches that treat grasping as a control problem, in contrast to decomposing the grasping procedure into planning and execution, do not require any specific hand–object relative pose and are more robust under pose uncertainty [11]. A model-free solution with computationally inexpensive control laws allows simpler hand designs while still ensuring a stable grasp. Fuzzy logic control for grasp stability is present in the literature as a useful tool for in-hand manipulation [12,13,14,15]. Multifingered hands achieve stable grasping when no resultant forces act on a fully restrained object [16]. This work uses a fuzzy controller able to perform stable grasp tasks with controlled fingertip force in single- and dual-actuated versions.
The main contributions of this paper are: (1) Tactile information to estimate the pose of unknown objects under autonomous, stable grasp; (2) Integration of bio-inspired multimodal tactile sensing modules and visual information to describe in-hand object pose; (3) Analysis of five different machine learning algorithms for tactile pose estimation. The system uses visual information similarly to how humans allocate visual attention to determine frames of reference for objects of interest. Tactile information is used to learn vision-defined reference frames so that the vision system can be freed to perform other tasks [5]. Concretely, we used vision to extract an allocentric frame of reference in which the object pose is located in the environment. Five machine learning methods then used data from the tactile sensing system to infer the relation between egocentric and allocentric reference frames during haptic spatial localization. Post-grasp object rotations were performed to collect tactile information, exposing the learning system to object angles outside of the fingers’ actuation workspace. For this purpose, the object received external forces along two different axes, and an open-loop finger actuation experiment was also performed. A closed-loop stable-grasp fuzzy controller used the same sensory feedback signals. Among the five machine learning algorithms, the ridge regressor achieved an average mean squared error across all object sizes of 1.82°.
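As a rough illustration of the regression step (a sketch only; the feature layout, file names, and hyperparameters are assumptions, not the exact pipeline used in the experiments), tactile samples recorded in ROS bags could be flattened into feature vectors and mapped to camera-derived angles with scikit-learn’s ridge regressor:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical dataset: each row concatenates readings from the four tactile
# modules (MARG orientation/angular rate plus barometer pressure); the target
# is the in-hand object angle taken from the camera's allocentric frame.
X = np.load("tactile_features.npy")   # shape: (n_samples, n_features), assumed file
y = np.load("camera_angles_deg.npy")  # shape: (n_samples,), assumed file

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = Ridge(alpha=1.0)              # L2-regularized linear regression
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("mean squared error:", mean_squared_error(y_test, pred))
```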
In this paper,
Section 2 presents a literature review of human haptic perception, in-hand robotic manipulation, tactile sensing, fuzzy control, and regression algorithms.
Section 3 presents our prototype description, a system overview, and experimental setup.
Section 4 shows the experimental results for the two in-hand manipulation tasks performed, followed by conclusions in
Section 5.
3. Materials and Methods
This section presents the materials, methods, and experimental setup for visuotactile object pose estimation. The open-source design from [20] is the basis for the present implementation. It is a modular 3D-printed gripper in ABS plastic that provides an easily customizable platform. Our robotic hand design is a version with two independently controlled fingers mounted on top of a table. For that goal, we kept Zisimatos et al.’s [20] top plate, but a modified base accommodates the two motors needed to pull each finger separately. Compared to the human hand, these underactuated robotic fingers have intermediate and distal flexible joints made of Vitaflex© 30 (Smooth-On, Easton, PA, USA). According to Zisimatos et al. [20], this top plate, including the fingers, withstands a maximum applied (and retained) force of 8 N per fingertip during tests with a standard servo. Strings are attached to the tip phalanges, and the two fingers are pulled independently by two Dynamixel© motors (Robotis, Lake Forest, CA, USA) mounted on a modified base. From base to fingertip, the gripper is about 20 cm long.
Figure 2 shows the open gripper during experiments before a grasp attempt. At the top, the left picture shows the four tactile sensors mounted on the finger phalanges, the motors, and the pulleys. This viewpoint is the one used to place the camera, as shown in Section 3.3. The red arrows indicate the direction of the force applied by the motors. On the top right of Figure 2, a side view shows details of the tendons and flexible joints. A detail of pulley 1 appears with a red circular arrow indicating the motor actuation direction during the pulling phase. The bottom row of Figure 2 shows the steps of a single finger actuation: (a) rest position, no motor rotation; (b) initial movement with the motor pulling; (c) continuous motion brings the finger closer to the palm; and (d) around the maximum safe finger curvature.
Each finger phalange has a tactile sensor module mounted on it. The top row of Figure 2 shows the multimodal tactile modules and their placement on each finger. The modules’ compliant structure and material add flexibility to the fingers’ functionality. All tactile modules send data to microcontrollers that act as nodes of a distributed system connected to a central computer. The software developed for this prototype uses the Robot Operating System (ROS) [34]. Figure 3 presents the primary ROS nodes developed for this implementation as yellow boxes inside the “Computer (ROS)” gray box. The central node runs on a laptop that concentrates control, data collection, and pose estimation. All data were recorded to ROS bags for post-processing and pose orientation estimation. Figure 3 also shows the fuzzy control node, which receives sensor data and updates the motor controllers. MCU 0 and MCU 1 are microcontrollers attached via serial connection to the main computer and acting as ROS nodes. These microcontrollers receive data from the magnetic, angular rate, and gravity (MARG) and barometer components via the I2C protocol, represented by arrows from the “Tactile sensing” module in Figure 3, which also shows the I2C communication from the tactile modules multiplexed via MUX 0 and MUX 1. There is also a USB camera, represented by a blue box, and a USB serial connection from the “Motor Manager” yellow box used for computer control of the Dynamixel motors, represented by an orange box at the bottom of Figure 3. In the following sections, we present the tactile sensing module components and organization as well as the fuzzy control system used in this experimental setup.
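As a minimal sketch of how one of these nodes might look (the topic name, message type, serial port, and packet format are assumptions for illustration, not the exact implementation), a Python ROS node could read a tactile packet from one microcontroller over serial and republish it for the fuzzy controller and the ROS bag recorder:

```python
#!/usr/bin/env python
import rospy
import serial
from std_msgs.msg import Float32MultiArray

def tactile_bridge():
    """Bridge one MCU's serial stream to a ROS topic (illustrative only)."""
    rospy.init_node("tactile_bridge_mcu0")
    pub = rospy.Publisher("/tactile/mcu0", Float32MultiArray, queue_size=10)
    port = serial.Serial("/dev/ttyUSB0", 115200, timeout=0.1)  # assumed port/baud

    rate = rospy.Rate(100)  # assumed 100 Hz streaming rate
    while not rospy.is_shutdown():
        line = port.readline().decode(errors="ignore").strip()
        if line:
            try:
                # Assumed packet format: comma-separated floats, one group per tactile module.
                values = [float(v) for v in line.split(",")]
                pub.publish(Float32MultiArray(data=values))
            except ValueError:
                pass  # skip malformed packets
        rate.sleep()

if __name__ == "__main__":
    tactile_bridge()
```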
3.1. Tactile Sensors
Bio-inspired multimodal tactile sensing modules, inspired by the type, functionality, and organization of cutaneous tactile elements, provide tactile information during the visuotactile perception experiments. Each module contains a 9-DOF magnetic, angular rate, and gravity (MARG) sensor, a flexible, compliant structure, and a pressure sensor arranged in a structured way similar to human skin [30]. The components integrated into the tactile module were discussed in Section 2 and are shown in Figure 1. A total of four tactile modules, one for each phalange, had data collected during the experiments. Figure 3 shows four MARGs connected to microcontroller MCU 0 through multiplexer MUX 0. In a similar manner, four deep pressure sensors are connected to microcontroller MCU 1 through multiplexer MUX 1. The master ROS node running on the central computer demultiplexes the data and stores the tactile information, represented in Figure 3 by arrows from the USB serial connection to a yellow box labelled “ROS bag”. The experiments and results section presents the data provided by the tactile modules during rotations produced both by external forces and by open-loop execution of in-hand manipulation.
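The multiplexed readout could look like the following sketch, written in Python for consistency with the other examples and as if the sensors were polled from a Linux host with the smbus2 library; the multiplexer address, channel mapping, sensor addresses, and register layout are all assumptions, not the specific parts used in this prototype.

```python
from smbus2 import SMBus

MUX_ADDR = 0x70          # assumed address of an I2C multiplexer (TCA9548A-style behavior)
MARG_ADDR = 0x28         # assumed MARG sensor address
BARO_ADDR = 0x76         # assumed barometer address

def select_channel(bus, channel):
    """Enable one multiplexer channel so the module on it becomes visible on the bus."""
    bus.write_byte(MUX_ADDR, 1 << channel)

def read_module(bus, channel):
    """Read raw blocks from the MARG and barometer of one tactile module."""
    select_channel(bus, channel)
    marg_raw = bus.read_i2c_block_data(MARG_ADDR, 0x00, 12)  # assumed register/length
    baro_raw = bus.read_i2c_block_data(BARO_ADDR, 0x00, 3)   # assumed register/length
    return marg_raw, baro_raw

with SMBus(1) as bus:
    for ch in range(4):          # four tactile modules, one per phalange
        marg, baro = read_module(bus, ch)
        print(ch, marg, baro)
```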
3.2. Fuzzy Controllers
Before any manipulation took place, two fuzzy controllers based on tactile sensing information, such as microvibrations and pressure, maintained a stable grasp by sending motor signals to each actuator responsible for pulling a finger [12]. The autonomous fuzzy grasping controller in this setup provided a consistent grasp force while handling objects of different sizes.
The present implementation used a dual fuzzy controller based on pressure and microvibrations. The pressure comes from the deep pressure sensor readings, while the gyroscope provides microvibration values for a fuzzy feedback controller assigned to each finger. Angular velocity is used here to detect microvibrations, which indicate that the object under grasp is moving, while the pressure sensor indicates the degree of contact with the object. Together, angular velocity and pressure provided tactile feedback about stability and status to a second fuzzy grasp controller. Two gray boxes named Tactile module 0 and Tactile module 1 in Figure 4 show the barometer and MARG components with separate I2C signals forwarded to independent finger fuzzy feedback controllers. Figure 4 describes the complete system flow during the experiments. Data from the tactile sensing modules (top left) provided information for the fuzzy controller (bottom left), with details described in the following sections. With a grasp controller decision based on stability and status, finger actions modified the actuators’ status (middle blue box). Vision (middle blue box) provides an allocentric reference frame to the machine learning pose estimation. Data from all four tactile modules provide an egocentric frame of reference, which, together with the allocentric reference frame, is input to the pose estimation module (right gray box). The last step applies five machine learning techniques and returns the angle estimation (green box).
Each finger fuzzy feedback controller provided status and stability values to a grasp fuzzy controller, indicated by a gray box inside the fuzzy controller box receiving the status and stability inputs in Figure 4. The grasp fuzzy controller produced actions for each finger (go forward, go backward, or hold) based on the status and stability information from both fingers.
3.2.1. Finger Fuzzy Feedback Controller
The finger fuzzy feedback controller used real sensor data to provide pressure status and grasp stability information as inputs to the grasp fuzzy controller. To describe the finger fuzzy feedback controller inputs, Low and High fuzzy sets were defined for the microvibration input, and Nopressure, Lowpressure, Normalpressure, and Highpressure fuzzy sets for the pressure input. Stable and Notstable fuzzy sets describe the finger fuzzy feedback stability output, while Nottouching, Touching, Holding, and Pushing define the status output. Figure 5 and Figure 6 present the input values applied to the microvibration and pressure fuzzy sets, respectively. The tactile information used here was normalized data from the barometer part of the tactile module as pressure, and raw gyroscope data measuring variations in the angular velocity of the respective module as microvibrations. As an example of possible outputs, Figure 7 and Figure 8 show the stability and status outputs formed from their fuzzy sets, respectively. The inference system used was Mamdani with center-of-gravity defuzzification. The second fuzzy controller used the status and stability from both finger controllers to produce finger actions.
A rule book based on [12] was implemented for this tactile in-hand manipulation fuzzy feedback controller, and Table 1 summarizes the rules. Possible outputs for status are NT, not touching; T, touching; H, holding; and P, pushing. Stability has two possible outputs: S, stable, and NS, not stable.
From the example above, the microvibration input activates the “Low” and “High” sets to different degrees, each expressed by a membership function μ. In a similar way, the pressure example activates the “Low” set with a membership of 0.2 and the “Normal” set with 0.8, while the “No” and “High” sets have no participation. After inference, the stability output combines contributions from the “Stable” and “Nonstable” sets, while the status output combines contributions from the “Touching” and “Holding” sets.
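The finger-level controller could be prototyped as a Mamdani system along the lines of the sketch below; the universes, membership breakpoints, and the two example rules are illustrative assumptions only, with the complete rule book given in Table 1.

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Inputs: normalized barometer pressure and microvibration (gyroscope) magnitude.
pressure  = ctrl.Antecedent(np.linspace(0, 1, 101), "pressure")
vibration = ctrl.Antecedent(np.linspace(0, 1, 101), "vibration")
# Outputs: grasp stability and contact status (crisp values reused downstream).
stability = ctrl.Consequent(np.linspace(0, 1, 101), "stability")
status    = ctrl.Consequent(np.linspace(0, 3, 101), "status")

# Assumed membership functions (breakpoints are placeholders, not the paper's values).
pressure["no"]     = fuzz.trimf(pressure.universe, [0.0, 0.0, 0.2])
pressure["low"]    = fuzz.trimf(pressure.universe, [0.1, 0.3, 0.5])
pressure["normal"] = fuzz.trimf(pressure.universe, [0.4, 0.6, 0.8])
pressure["high"]   = fuzz.trimf(pressure.universe, [0.7, 1.0, 1.0])
vibration["low"]   = fuzz.trimf(vibration.universe, [0.0, 0.0, 0.5])
vibration["high"]  = fuzz.trimf(vibration.universe, [0.3, 1.0, 1.0])
stability["notstable"] = fuzz.trimf(stability.universe, [0.0, 0.0, 0.6])
stability["stable"]    = fuzz.trimf(stability.universe, [0.4, 1.0, 1.0])
status["nottouching"] = fuzz.trimf(status.universe, [0.0, 0.0, 1.0])
status["touching"]    = fuzz.trimf(status.universe, [0.5, 1.0, 1.5])
status["holding"]     = fuzz.trimf(status.universe, [1.5, 2.0, 2.5])
status["pushing"]     = fuzz.trimf(status.universe, [2.5, 3.0, 3.0])

# Two illustrative rules in the spirit of Table 1 (not the complete rule book).
rules = [
    ctrl.Rule(pressure["normal"] & vibration["low"],
              [stability["stable"], status["holding"]]),
    ctrl.Rule(pressure["low"] & vibration["high"],
              [stability["notstable"], status["touching"]]),
]

finger_ctrl = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
finger_ctrl.input["pressure"] = 0.55
finger_ctrl.input["vibration"] = 0.1
finger_ctrl.compute()                      # Mamdani inference, centroid defuzzification
print(finger_ctrl.output["stability"], finger_ctrl.output["status"])
```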
3.2.2. Grasp Fuzzy Controller
Based on the status and stability of both fingers, a second fuzzy controller is responsible for publishing motor values in order to maintain a stable grasp. The system overview in Figure 4 shows a box labeled grasp fuzzy controller, where the stability and status data from each finger are the inputs of this fuzzy controller, while its outputs update the motor velocities. As described above, the finger fuzzy feedback controller provided the status and stability of each finger. It is important to observe that input from both fingers is essential during the inference phase of the grasp fuzzy controller. Although both fingers are necessary for inference, for simplicity, Figure 9 and Figure 10 present an input example for only one finger, showing the sets that define the finger stability and status inputs, respectively. The inference system was also Mamdani with center-of-gravity defuzzification. Both outputs change the finger motor velocities, updating the Dynamixel controller manager node as presented in Figure 11.
Another rule book, also based on [12], was implemented for the grasp fuzzy controller and is summarized in Table 2. Possible outputs for updating the motor velocities are GF1, finger 1 go forward; GF2, finger 2 go forward; H1, finger 1 hold; H2, finger 2 hold; GB1, finger 1 go back; and GB2, finger 2 go back.
From the example above, the stability input activates the “NotStable” and “Stable” sets to different degrees. Similarly, the status input activates the Notouch and Touching sets, while Holding and Pushing are not activated. After inference, the grasp fuzzy controller produces a motor velocity update with contributions from the “Hold” and “GoForward” sets.
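A corresponding sketch of the grasp-level controller is shown below, again with assumed universes, membership functions, and only two illustrative rules written for a single finger’s velocity output; Table 2 also conditions on the other finger.

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Inputs: defuzzified stability and status delivered by one finger feedback controller.
stab_f1   = ctrl.Antecedent(np.linspace(0, 1, 101), "stability_f1")
status_f1 = ctrl.Antecedent(np.linspace(0, 3, 101), "status_f1")
# Output: velocity update for finger 1's motor (sign encodes pulling direction).
vel_f1 = ctrl.Consequent(np.linspace(-1, 1, 201), "velocity_f1")

stab_f1["notstable"] = fuzz.trimf(stab_f1.universe, [0.0, 0.0, 0.6])
stab_f1["stable"]    = fuzz.trimf(stab_f1.universe, [0.4, 1.0, 1.0])
status_f1["notouch"]  = fuzz.trimf(status_f1.universe, [0.0, 0.0, 1.0])
status_f1["touching"] = fuzz.trimf(status_f1.universe, [0.5, 1.0, 1.5])
status_f1["holding"]  = fuzz.trimf(status_f1.universe, [1.5, 2.0, 2.5])
status_f1["pushing"]  = fuzz.trimf(status_f1.universe, [2.5, 3.0, 3.0])
vel_f1["goback"]    = fuzz.trimf(vel_f1.universe, [-1.0, -1.0, 0.0])
vel_f1["hold"]      = fuzz.trimf(vel_f1.universe, [-0.2, 0.0, 0.2])
vel_f1["goforward"] = fuzz.trimf(vel_f1.universe, [0.0, 1.0, 1.0])

# Two illustrative rules only (the full Table 2 uses both fingers' inputs).
rules = [
    ctrl.Rule(status_f1["notouch"] | stab_f1["notstable"], vel_f1["goforward"]),
    ctrl.Rule(status_f1["holding"] & stab_f1["stable"], vel_f1["hold"]),
]

grasp_ctrl = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
grasp_ctrl.input["stability_f1"] = 0.7
grasp_ctrl.input["status_f1"] = 1.9
grasp_ctrl.compute()
print("velocity update:", grasp_ctrl.output["velocity_f1"])
```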
Figure 12 shows sensor data, motor positions, and velocities during an autonomous grasp attempt. The start and end arrows point to the beginning and final phases of the grasping attempt. Motor velocities and tactile sensing data change rapidly between those two points. The straight lines after the end arrow denote that the sensor data and motor positions have reached steady values and the fuzzy controller is maintaining a stable grasp within a reasonable error margin.
3.3. Camera Setup
A top view of the gripper during manipulation was used to estimate object pose variations. The middle of the USB camera image established a frame of reference fixed at the top of the setup. Inside this working space, the pixel difference between two colored paper markers fixed on the object determined its angle. Figure 13 presents the object angle obtained by a Python script using the OpenCV library [35] to capture in-hand angle changes between the object and the camera’s visual frame of reference.
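A minimal version of such a script could look like the sketch below; the HSV color ranges and the camera index are assumptions, since the actual marker colors and thresholds are not specified here.

```python
import cv2
import numpy as np

# Assumed HSV ranges for the two colored paper markers on the object.
RANGE_A = (np.array([100, 120, 70]), np.array([130, 255, 255]))  # e.g., blue marker
RANGE_B = (np.array([40, 80, 70]), np.array([80, 255, 255]))     # e.g., green marker

def centroid(mask):
    """Return the (x, y) centroid of a binary mask, or None if it is empty."""
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None
    return m["m10"] / m["m00"], m["m01"] / m["m00"]

cap = cv2.VideoCapture(0)                      # assumed camera index
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    ca = centroid(cv2.inRange(hsv, *RANGE_A))
    cb = centroid(cv2.inRange(hsv, *RANGE_B))
    if ca and cb:
        # Angle of the line joining the two markers in the camera frame.
        angle = np.degrees(np.arctan2(cb[1] - ca[1], cb[0] - ca[0]))
        print("object angle (deg):", angle)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```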
Using the setup presented in this section, stable autonomous fuzzy grasping of objects allowed angle estimation based on an initial visual inference followed by tactile excitation for further pose estimation. The next section presents the estimation results using external and internal stimuli.
5. Discussion
Underactuated hands are a useful tool for in-hand manipulation tasks due to their capability to seamlessly adapt to the surface contours of unknown objects and to keep such objects under grasp even when disturbed by external stimuli. These hands are remarkably versatile when grasping unknown objects, but after grasping, estimating the pose of in-hand objects becomes a challenge due to the flexibility of the fingers in such devices. To reduce the pose uncertainty of in-hand objects, we developed a visuotactile approach inspired by the human somatosensory system.
The human manipulation system operates with two subsystems, with the “Where” system dealing with perception to action and building egocentric and allocentric frames of reference [5]. The present paper introduced a visuotactile approach for robotic in-hand pose estimation of objects, using the “Where” system as inspiration for post-grasp object pose estimation that combines visual and tactile sensing information. During the experiments, autonomous stable grasping using a dual fuzzy controller provided consistent force even while handling different objects. Two sets of experiments achieved accurate angle estimation using tactile data.
The system proposed in this paper is an attempt to emulate the human “Where” subsystem. It explores machine learning methods to find the relationship between the data collected by the multimodal tactile sensing modules and the orientation of an object under grasp given by intermittent allocentric reference frames. The main advantage of such a data-driven approach is that it could extend to different gripper configurations with more sensing modules and to objects of different shapes.
In the first experiment, external forces were applied to the object during a stable grasp, forcing it to change its orientation, while in the second experiment autonomous open-loop actuation promoted the object rotation. External stimuli promote the learning of representations between tactile sensing inputs and object poses that sometimes cannot be reached within the robot’s finger workspace. Controlled pose disturbances during in-hand manipulation likewise expose the robot to representations between tactile sensing inputs and object poses that can be achieved through the workspace of its fingers. After an initial visual exploration, angle change estimation uses tactile data from the bio-inspired sensor modules. Post-processing with the ridge regressor achieved an average mean squared error of 1.82° during the experiments using external forces, and comparable results were obtained with an MLP neural network during the autonomous open-loop rotation of the second experiment. As robots become more complex and more present in unstructured environments, they have to deal with unexpected object manipulation scenarios. Being able to use a camera in conjunction with tactile sensing is essential to developing a sustainable robot presence in the modern world.
Future research could focus on integration with the somatosensory system for real-time object recognition and pose estimation. Robots that achieve a reasonable level of dexterity using allocation of attention will improve the efficiency with which they use their resources to perform a broader range of tasks in unstructured environments. Since the solution is model-free, the selected machine learning algorithms will improve their accuracy as more information becomes available, which leads the investigation of this approach towards a larger number of fingers. A simple pick and place task in which a table is moved from its initial position would benefit in this scenario: after exploring the object with the visuotactile system, an “egocentric” frame of reference related to the tactile sensing modules is created. The robot could then build a tactile control system centered on this “egocentric” frame of reference, while the now-free vision system finds an “allocentric” frame of reference (e.g., the axes of a table) for interobject interaction.