1. Introduction
The interception of moving objects is central to several benchmark robotic tasks such as ball-in-the-cup [1,2,3], batting [4,5,6], ball-catching [7,8,9,10,11], juggling [12,13,14] and playing table tennis [15,16,17,18]. This research focuses on demonstrating the combined performance of visual tracking systems and motion control algorithms in highly dynamic environments. In particular, catching a flying ball is a challenging task due to the demanding spatial–temporal constraints, which require coordination between the visual, planning and control systems to get the hand to the right place at the right time [19]. While humans learn to accomplish this task with relative ease, this is still not the case for robotic systems, particularly in short-distance scenarios. The robot’s actions to catch a flying ball are hampered by delays and noise in both sensors and actuators [20]. The sensory noise contributes to uncertainties in the prediction of the ball’s trajectory and complicates the control of the robot’s endpoint. Furthermore, the target catching location must be re-predicted frequently as new observations become available. From the control perspective, this progressive refinement of the desired catching point requires the online re-planning of robot motion, with object flight times that may last around one second.
Traditional robot control systems for catching objects rely on visual information about the object in flight. Given the short flight time, only robot systems equipped with fast visual detection and tracking, sophisticated control architectures and sufficient computational power can perform the task successfully. To address this issue, this paper presents the Robot Anticipation Learning System (RALS) to predict feasible catching points in advance, using observations of the thrower’s motion before the ball is released. These anticipation skills gain extra time, allowing the robot hand to start moving in the target direction before the thrower finishes throwing.
This paper is an extension of the work originally presented in [21], where the role of early anticipation in human–robot ball catching was introduced. Here, we explain the RALS methodological framework in detail and provide more comprehensive tests. Several computer simulations are conducted to demonstrate the effectiveness of the proposed solution in a scenario where a human subject throws a ball and the robot catches it. The experiments demonstrate that the proposed anticipation mechanism significantly improves the ball-catching rate compared to the baseline approach, where the predictions rely only on information acquired during the ball’s flight phase.
In summary, the contributions of this paper are three-fold:
RALS is developed, the first robot control system for catching flying objects with anticipation skills, using visual information from the thrower’s hand motion.
A learning mechanism is implemented to map the noisy vision information into a prediction of the ball’s position and velocity at the moment of release.
RALS is implemented and successfully evaluated for different levels of sensor noise and limits on the robot joint velocities.
The rest of the paper is organized as follows. Section 2 reviews related works and main concepts. The experimental setup is presented in Section 3. Section 4 presents the design of the RALS system. The ball-catching results with and without the anticipation mechanism are discussed in Section 5. Finally, conclusions are drawn in Section 6.
2. Related Work
The numerous approaches to solving the ball-catching problem differ in the complexity of the robot model, the way they predict the object’s trajectory, and the adopted online trajectory generation method. Earlier works often feature robots with few degrees-of-freedom (DOF) and simple motion generation algorithms. In the pioneering works of Slotine [22,23], a 4-DOF WAM arm (Barrett Technology) and an active vision system with output information at 60 Hz were used. In these works, the catching point corresponds to the closest point of the ball’s trajectory to the base of the robot, while the end-effector assumes a perpendicular orientation with respect to the ball’s trajectory. The trajectory used to catch the ball is planned in Cartesian space, using a 3rd-order polynomial function, which requires an inverse kinematics algorithm running in the control loop. The best performance results had a 70–80% success rate for similar launches.
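A Cartesian-space 3rd-order polynomial trajectory of this kind can be sketched as follows. This is a minimal illustration, not the cited implementation; the rest-to-rest boundary conditions and all names are assumptions.

```python
import numpy as np

def cubic_trajectory(p0, pf, T):
    """Per-axis 3rd-order polynomial p(t) from p0 to pf over T seconds,
    with zero velocity at both endpoints (illustrative boundary conditions)."""
    p0, pf = np.asarray(p0, float), np.asarray(pf, float)
    a0 = p0
    a2 = 3.0 * (pf - p0) / T**2      # from p(T) = pf and p'(T) = 0
    a3 = -2.0 * (pf - p0) / T**3
    def p(t):
        return a0 + a2 * t**2 + a3 * t**3
    return p

# Move the end-effector from rest to a (hypothetical) catching point in 0.8 s.
traj = cubic_trajectory([0.0, 0.0, 1.7], [0.5, 0.2, 1.2], T=0.8)
```

Each Cartesian sample of `traj` would then be fed to an inverse kinematics routine inside the control loop, as the text describes.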
Nishiwaki et al. [24] addressed both the falling-ball task and the ball-catching task with a humanoid robot, Saika, equipped with an active vision system consisting of two CCD cameras. In this work, the end-effector reached the catching point through an inverse kinematics model based on a three-layered neural network. The vertical trajectory of the ball was approximated by a quadratic function through a weighted least squares method, giving more importance to the most recent observations. Frese et al. [25] used a 7-DOF DLR-LWR-II arm equipped with a basket and an off-the-shelf stereo vision system (a combination of two cameras) that acquires and processes images at 50 Hz. The catching point selection considered two criteria: first, a location that is near the robot’s end-effector; second, a catching point that is far from the robot, to avoid physical constraints such as joint limits. The catching configuration was calculated to ensure a perpendicular orientation of the basket with respect to the ball’s trajectory. In their experiments, the robot succeeded in 2/3 of its attempts to catch the ball. Most of the failures occurred due to the system’s (camera and lens) limited horizontal field of view, resulting in delays in the visual system.
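A recency-weighted least squares fit of the vertical ball trajectory, in the spirit of the approach above, could look like this. The decay factor and function names are illustrative assumptions, not values from the cited work.

```python
import numpy as np

def fit_vertical_parabola(t, y, decay=0.9):
    """Weighted least-squares fit of y(t) = a*t^2 + b*t + c, giving more
    weight to the most recent observations (decay is a hypothetical choice)."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    w = decay ** np.arange(len(t) - 1, -1, -1)       # newest sample -> weight 1
    A = np.stack([t**2, t, np.ones_like(t)], axis=1)
    sw = np.sqrt(w)                                  # weight the normal equations
    coeffs, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coeffs                                    # (a, b, c)

# Noisy height observations of a ball in free flight (g ~ 9.81 m/s^2)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.5, 26)
y_true = 2.0 + 3.0 * t - 0.5 * 9.81 * t**2
a, b, c = fit_vertical_parabola(t, y_true + rng.normal(0, 0.002, t.size))
```

Re-running the fit as each new sample arrives naturally shifts the estimate towards the latest observations.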
Studies in the following years frequently used more complex robots and advanced techniques for motion planning; in particular, human imitation has attracted significant interest. Riley and Atkeson [26] explored this idea to create human-like behaviours, encoded by movement primitives based on nonlinear dynamics. These Programmable Pattern Generators were adapted to catching a ball with a 30-DOF humanoid robot equipped with a baseball glove, although the end-effector’s orientation was not considered. The trajectory estimation of the flying ball used a stereo vision system at 60 Hz. The catching point was derived from the intersection of the estimated trajectory with a horizontal plane placed at a given height. Park et al. [27] proposed an evolutionary algorithm for the ball-catching task based on a motion database, created offline through imitation learning. The database was initially filled with kinematic data extracted from human motions. Then, a data-driven evolutionary optimization was employed to provide human-like reaching with minimal torques in real time. The proposed framework was validated on a humanoid robot with a 6-DOF arm, equipped with a vision system installed on the head. Kim et al. [28] conducted their ball-catching experiments on the iCub humanoid robot by learning from human demonstrations acquired using a glove and the X-Sens motion capture suit. The work focuses on the synchronization of the robot’s movements with those of the flying ball. The proposed approach controls the timing of robot motions encoded with both Gaussian Mixture Models (GMMs) and Dynamic Movement Primitives (DMPs).
Additional challenges have been addressed in more recent works, such as the inclusion of more complex robots. For example, most of the works discussed above give little importance to hand control and the grasping strategy. In [29], Bauml et al. addressed the joint control of a 7-DOF DLR-LWR-III arm and a 12-DOF DLR-Hand-II hand. The motion control is formulated as a nonlinear optimization problem subject to constraints in terms of workspace, maximum joint velocities, and limits on joint angles. Later, the same authors extended their research to allow up to two balls to be caught simultaneously using the mobile humanoid robot Rollin Justin [7]. In addition to the degrees-of-freedom of the arms, the torso and the mobile platform itself, a 2-DOF pan-tilt unit ensured the ball remained in the field of view of the stereo vision system.
Mobile manipulator systems are of interest due to their extended workspace and dexterity in scenarios where accurate high-speed motions are required. Dong et al. [30] proposed a framework based on a hierarchical optimization scheme for real-time trajectory generation. The higher-level kinematic planner is formulated as a nonlinear optimization problem, solved through sequential quadratic programming, while the low-level kinematic planner is solved by quadratic programming. The joint control is driven by an inverse dynamics learning method, which contributes to a state-of-the-art success rate of 85.3%. The balls are thrown by a human subject from around 4 m away, resulting in a flight time of about 1 s. The experimental setup includes a 6-DOF manipulator (a UR10 arm from Universal Robots) rigidly mounted on a 3-DOF omnidirectional mobile platform and the gold-standard VICON system for ball estimation and trajectory prediction at 100 Hz.
Another recent topic of research is the possibility of catching arbitrary objects with uneven shapes. In most studies, the object in flight is a ball and the flight trajectory is approximated by a parabola, while the effects of air drag and other forces are ignored. However, uneven flying objects require dynamical modelling to obtain robust predictions of their translational and rotational behaviour. The catching of arbitrary objects with complex flying behaviour was addressed by Kim et al. [8]. A learning framework is proposed to teach a robot to catch flying objects through observation of demonstrations encoded by dynamical systems. In this way, the robotic system learns a model of the arm’s movements, as well as the dynamics of the flying object, offline, based on prior information such as mass, shape, or inertia. The prediction system is combined with a probabilistic model to obtain the distribution of optimal catching configurations. The advanced system can coordinate the motion of the arm, hand and fingers to catch a hammer, a tennis racket, a bottle or a cardboard box. Simulations were provided with the iCub humanoid robot, and experiments with the 7-DOF Kuka LWR 4+ arm. Following the same framework, Salehian et al. [9] proposed a strategy in which the robot’s hand follows the object’s trajectory for a short period of time, to allow more time to close the fingers. The control law is expressed as a linear system, whose parameters are approximated by Gaussian mixture models (GMMs). In contrast, Yu et al. [31] proposed a neural acceleration estimator to tackle the task of motion prediction for in-flight uneven objects without any prior information. The experimental results show that this was effective in terms of prediction accuracy and generalization performance for uneven objects in public datasets and in real-world experiments with a UR5 robot.
The majority of the previous works concentrate on the classical problem of visual estimation of the ball’s trajectory (model-based approach) to anticipate the catching point. However, some others [32,33,34,35] continuously used visual information in the feedback control loop (visual servoing approaches). These approaches focus on the relationship between the pose of the object in flight, the robot’s pose and the projection of visual features in the image plane. Sato et al. [35] proposed an eye-in-hand configuration for visual servoing control of high-speed ball catching using a 7-DOF robot arm. The system comprises a multi-fingered hand with 8 small cameras attached and two external fixed high-speed cameras (500 frames per second). On the one hand, the external cameras predict the 3D trajectory of the flying ball and the desired catching point. On the other hand, the multi-vision hand provides the visual information needed to correct the hand’s position and orientation using visual servoing control.
Among the most recent works are the contributions by Schill and Buss [10] and Ardakani et al. [36]. The former addresses the challenge of reliable task execution when robots catch objects in flight, such that experimental success in ballistic catching becomes predictable and robust. The authors adopted a hybrid systems framework, focusing on the Zeno behaviour of bouncing balls to enable the robotic catching of spherical objects. A dynamical system parametrization enables dynamically feasible offline motions using hybrid bouncing-ball formalisms. The success prediction and dynamic feasibility are solved through optimization-based motion planning. The authors provide solutions to the different phases of joint robot–robot throwing and catching. The experimental setup consists of two 2-DOF robots, symmetrically mounted on a vertical plane. They consider a joint robot–robot scenario, which does not require visual feedback, in which each robot can perform both the throwing and the catching task. The robustness, with respect to uncertainties in the impact model and the accuracy of the object’s state, is quantified. The latter proposes the use of model predictive control (MPC) to solve the problem of point-to-point trajectory generation in a ball-catching scenario, while respecting physical limitations and real-time requirements.
The most evident finding in the literature on robot ball-catching is the lack of works emphasizing the advantages of earlier anticipation based on observations of the thrower’s movement. The existence of great flexibility in the use of visual information by humans was reported in [37]. For this purpose, the participants’ access to earlier information regarding both the thrower’s action and the ball’s trajectory was manipulated. By recording whole-body postural control, the performance of each participant was evaluated under three conditions: only the thrower’s action is available, only the information about the flying ball is available, and all visual information is available. The study revealed that movements were initiated earlier when visual information was available prior to ball flight, resulting in improved performance. Along the same lines, the recognition of human actions and intentions is considered essential for human–robot interaction [38]. This concept of earlier anticipation was explored in the context of human–robot table tennis [18]. In the context of human–robot ball-catching, this idea was first addressed by the authors in [21] and is revisited in this article, which provides an extended analysis of the novel approach in the form of a simulation-based testbed.
3. Experimental Setup
3.1. Simulation Scenario
This section describes the main blocks involved in the simulation of the human–robot ball-catching task, as well as the various assumptions imposed in this study (see Figure 1). First, a scenario is assumed in which a human performs underarm ball throwing, and the robot (at about 4 m) tries to catch the balls entering its reachable space. Second, the projectile motion is approximated by a parabolic path, assuming that the only force acting on the flying object is gravity (i.e., air resistance is neglected). Regarding the robot catcher, this study considers a 3-DOF articulated manipulator (RRR), mounted upside down so that, in the home position, it is fully extended downwards. The world coordinate frame is fixed to the ground, with the x-axis oriented towards the human thrower, while the z-axes of the world frame and the robot base frame are vertically aligned with each other. The robot’s base, as well as the origin of the base frame, are located at a height of 1.70 m above the ground.
The robot’s link lengths, from the base to the end-effector, are the following: (base-shoulder), (shoulder-elbow) and (elbow-wrist). The robot’s reachable workspace is further reduced by the physical limits imposed on the joints, according to the following inequalities: (vertical joint), (shoulder joint) and (elbow joint). The flying ball and the robot’s end-effector are modelled as point objects, while the problems of hand orientation and grasping are not addressed. In this work, the desired catching point is determined, from all valid solutions inside the reachable workspace, as the trajectory point of the flying ball that is closest to the end-effector. This target point is constantly updated as new observations become available. At the beginning of each trial, the arm adopts an initial configuration in which the elbow joint forms a right angle, aligned with the x-direction.
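The catching-point selection rule described above can be sketched as follows; the workspace predicate, array layout and function names are assumptions for illustration.

```python
import numpy as np

def select_catching_point(ball_traj, ee_pos, reachable):
    """Among the predicted ball positions that lie inside the reachable
    workspace, return the one closest to the current end-effector position.
    `ball_traj` is an (N, 3) array of predicted positions; `reachable`
    is a predicate on a 3D point (both are illustrative assumptions)."""
    ball_traj = np.asarray(ball_traj, float)
    valid = np.array([reachable(p) for p in ball_traj])
    if not valid.any():
        return None                       # no feasible catching point
    candidates = ball_traj[valid]
    d = np.linalg.norm(candidates - np.asarray(ee_pos, float), axis=1)
    return candidates[np.argmin(d)]
```

In the simulation loop this function would be re-evaluated each time the ball's trajectory is re-estimated, yielding the progressively refined target point.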
Focusing on control, the progressive refinement of the desired catching point requires the online re-planning of the robot’s kinematic chain so that the end-effector reaches the right place in time. This work adopts so-called kinematic control for the motion control problem. It is based on an online inverse kinematic transformation that computes the reference joint velocities corresponding to an assigned end-effector velocity direction, derived as the difference between the catching point and the end-effector’s position. The solution to the inverse kinematics problem is based on the inversion of the manipulator’s analytical Jacobian matrix and the use of feedback corrections resulting from re-estimating the ball’s trajectory. The online closed-loop algorithm is implemented in discrete-time form, with a time interval of 10 ms. The objective is to find, at each control instant, a joint velocity vector that allows the end-effector to move as quickly as possible towards the desired catching point under the velocity constraints on the actuators. For that purpose, a scaling factor is applied to the joint velocity vector, such that the maximum possible values are always used, while the end-effector keeps moving in the desired direction.
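One cycle of this resolved-rate scheme can be sketched as follows. The Jacobian is assumed to be supplied by the kinematic model; the always-saturate scaling follows the description above, but all names are illustrative.

```python
import numpy as np

def scaled_joint_velocity(J, ee_pos, target, qdot_max, eps=1e-6):
    """One kinematic-control step: invert the analytical Jacobian to get
    joint velocities along the direction from the end-effector to the
    catching point, then scale them so that the fastest joint runs
    exactly at its velocity limit."""
    direction = np.asarray(target, float) - np.asarray(ee_pos, float)
    dist = np.linalg.norm(direction)
    if dist < eps:
        return np.zeros(J.shape[1])          # already at the catching point
    qdot = np.linalg.pinv(J) @ (direction / dist)   # unit Cartesian direction
    s = np.max(np.abs(qdot) / np.asarray(qdot_max, float))
    return qdot / s                          # saturate the fastest joint
```

Run in a 10 ms loop, the returned vector would be integrated into joint commands while the catching point is re-estimated from new observations.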
3.2. Human Demonstrations of Underarm Throwing
The generation of throwing trajectories was supported by an analysis of human demonstrations acquired from two subjects playing catch at a distance of about 4 m from each other. Motion capture was performed in a human motion analysis laboratory, equipped with a VICON optoelectronic system with eight infrared cameras. A standard upper-body marker set was attached to both subjects and two additional markers were attached to the ball. The 3D coordinates of each marker were collected at 100 Hz and stored for offline analyses with customized software written in Matlab (Mathworks, MA, USA).
Figure 2 illustrates the projection of the thrower’s 3D hand data onto a 2D vertical plane aligned with the projectile motion, with the x-axis horizontal and the y-axis pointing vertically up. The start marker (circles ‘o’) indicates the start of the retraction phase, in which the hand is pushed back while the arm stretches. The release marker (asterisks ‘★’) was obtained by observing the distance between the coordinates of the ball and the hand. The analysis indicates the existence of regularity in the shape of the generated trajectories, as well as small variability in the duration of the movement’s execution time. Therefore, a single demonstration that ignores the retraction phase was extracted from the real data as the basis for the implementation of the throwing generator.
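Detecting the release instant from the ball-hand distance, as described above, can be sketched as follows; the threshold value and function names are hypothetical choices.

```python
import numpy as np

def detect_release(hand_pos, ball_pos, threshold=0.12):
    """Return the first frame index where the ball-hand distance exceeds
    a threshold (a hypothetical value in metres), taken as the release
    instant; None if the ball never leaves the hand."""
    d = np.linalg.norm(np.asarray(ball_pos, float) - np.asarray(hand_pos, float), axis=1)
    idx = np.flatnonzero(d > threshold)
    return int(idx[0]) if idx.size else None
```

With 100 Hz motion capture, the returned frame index converts directly to a release time in units of 10 ms.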
3.3. Data Generation in Preparatory and Ballistic Phases
In this study, catching the ball involves observing the thrower’s action, as well as the ball’s free motion as it approaches the robot. This subsection describes the data-generation process required both to train the feedforward neural network (the core of RALS) and to simulate the complete throwing–catching task. The simulation includes the ground-truth data of the ball’s behaviour and raw data representing the effects of noisy measurements. The complete motion of the ball is divided into a preparatory phase, followed by a ballistic phase. The preparatory phase lasts from the beginning of the throw until the moment the ball is released. This is followed by the ballistic phase, in which the ball’s motion is completely determined by the laws of physics, regardless of the trajectory during the initial phase. From the mechanical perspective, the path of the ball during flight lies in a vertical plane, parallel to the gravity vector. The generation of the ball trajectory takes place over the entire simulation period, according to a given sequence.
The first step is to express the ballistic motion as a function of the following input parameters: (i) the initial position vector of the ball represented by the coordinates in the world reference frame; (ii) the final position vector of the ball when it intercepts the ground plane is represented by the coordinates in the world reference frame; (iii) the time elapsed from the initial position until the ball reaches the target final position . These parameters assume random values with uniform distribution, within the following ranges: , and ; , and ; and . Restrictions on the highest point of the object above the ground and on the possible range of the time-of-flight (i.e., from the launch to the catch) are also included. In this study, the maximum height is limited to 3 m and the time-of-flight (ToF) ranges from 0.5 to 1.2 s. A common problem in the above parametrization algorithm is that some of the generated trajectories may be invalid. Therefore, feasibility verification is considered, such that any trajectory outside the reachable workspace, or any trajectory that is unable to satisfy the imposed constraints, is discarded. The second step requires a throwing trajectory that verifies the previously established conditions in terms of the horizontal and vertical components of the ball’s position and velocity at release. For that purpose, a polynomial interpolation was used on the selected demonstration data. The algorithm provides trajectory smoothing along with the necessary requirements. Gaussian noise is added to the generated trajectories to represent the measurement noise that corrupts the observed samples.
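The two-step generation process (ballistic motion from endpoint parameters, then feasibility checking and added noise) can be sketched as follows. The sampling interval, the noise model and all names are illustrative assumptions; the 3 m height limit and the 0.5-1.2 s ToF range are taken from the text.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])   # gravity vector (world z up)

def ballistic_samples(p0, pf, T, dt=0.01, noise_std=0.0, rng=None):
    """Sample a drag-free ballistic trajectory starting at p0 and reaching
    pf after T seconds; optionally corrupt it with Gaussian measurement
    noise. Returns (times, positions)."""
    p0, pf = np.asarray(p0, float), np.asarray(pf, float)
    v0 = (pf - p0 - 0.5 * G * T**2) / T        # initial velocity from endpoints
    t = np.arange(0.0, T + dt / 2, dt)
    pos = p0 + np.outer(t, v0) + 0.5 * np.outer(t**2, G)
    if noise_std > 0:
        rng = rng or np.random.default_rng(0)
        pos = pos + rng.normal(0.0, noise_std, pos.shape)
    return t, pos

def is_feasible(pos, t, h_max=3.0, tof_range=(0.5, 1.2)):
    """Discard trajectories that exceed the maximum-height limit or whose
    time-of-flight falls outside the allowed range."""
    return pos[:, 2].max() <= h_max and tof_range[0] <= t[-1] <= tof_range[1]
```

A workspace check like the one used for catching-point selection would complete the feasibility verification described in the text.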
3.4. Release Parameters versus Catching Point
The projectile motion is revisited by including specific aspects of a ball-catching task, such as the robot’s reachable space and the target catching point. More concretely, this subsection establishes analytical relationships between the physical parameters of the projectile motion and the desired catching point, assuming negligible air resistance. For reasons of simplicity, the analysis is performed by assuming a two-dimensional projectile motion modelled by parabolic equations. In this context, the independent horizontal and vertical displacements, as a function of time $t$, can be written as:

$$x(t) = x_0 + v_{x0}\,t, \qquad y(t) = y_0 + v_{y0}\,t - \tfrac{1}{2} g t^2, \quad (1)$$

where $(x_0, y_0)$ and $(v_{x0}, v_{y0})$ are, respectively, the initial position and initial velocity of the object, and $g$ is the gravitational constant ($g \approx 9.81\ \mathrm{m/s^2}$). The maximum height $h_{max}$ that the projectile reaches above the ground is obtained from the velocity equation in the vertical direction for $v_y(t_h) = 0$, that is:

$$v_y(t_h) = v_{y0} - g\,t_h = 0, \quad (2)$$

where $v_{y0}$ is the initial vertical velocity and $t_h$ is the time taken to reach the maximum height, given by $t_h = v_{y0}/g$. From the vertical displacement in (1), we obtain,

$$h_{max} = y_0 + \frac{v_{y0}^2}{2g}. \quad (3)$$

Assuming a restriction $h_{max} \le h_{lim}$ on the highest point of the projectile, Equation (3) provides an upper limit on the initial velocity in the $y$-direction, as follows:

$$v_{y0} \le \sqrt{2g\,(h_{lim} - y_0)}. \quad (4)$$

Equation (1) can also be used to determine the set of trajectories that intercept a given target point $(x_T, y_T)$. By eliminating $t$, we obtain the following equality:

$$y_T = y_0 + \frac{v_{y0}}{v_{x0}}\,(x_T - x_0) - \frac{g}{2}\,\frac{(x_T - x_0)^2}{v_{x0}^2}. \quad (5)$$

By specifying the initial velocity in the $y$-direction, we can obtain the required velocity in the $x$-direction by solving a quadratic equation. It should be noted that, although there may be two solutions, only one is valid, since we consider that the interception with the target point occurs during the descending phase of the ballistic motion. At the same time, a discriminant of less than zero indicates that there is no solution, providing a lower limit on the initial velocity in the $y$-direction, as follows:

$$v_{y0} \ge \sqrt{2g\,(y_T - y_0)}. \quad (6)$$

Figure 3 illustrates an example of multiple ballistic trajectories that intercept the same target point. The black curve is the single one where the target point coincides with the catching point, i.e., the closest point to the end-effector. At the same time, the evaluation of the ToF for each ballistic trajectory follows a similar procedure by solving a quadratic equation. The equation used to evaluate the ToF with respect to the initial velocity in the $y$-direction, producing the same velocity limit as expressed in (6), is the following:

$$t_{ToF} = \frac{v_{y0} + \sqrt{v_{y0}^2 - 2g\,(y_T - y_0)}}{g}. \quad (7)$$
At the end of this study, we provide an intuitive insight into the relationship between the initial velocities of the ball at the moment it is launched ($v_{x0}$ and $v_{y0}$) and the feasibility of the generated movement (see Figure 4). A movement is invalid under various circumstances, namely when it falls outside the robot’s workspace, or when it exceeds the limits imposed on the maximum height of the ball or on the ToF. Movements that intercept the reachable workspace and comply with these limits are considered valid.
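The relations of this subsection can be checked numerically. The sketch below, under the same no-drag assumption, solves the descending-phase quadratic for the ToF and exposes the lower velocity limit; the function names are illustrative.

```python
import math

def time_of_flight(y0, yT, vy0, g=9.81):
    """Time for the ball to descend to the target height yT, taking the
    larger root of (g/2)*t^2 - vy0*t + (yT - y0) = 0 (descending phase).
    Returns None when vy0 is below the lower limit sqrt(2*g*(yT - y0))."""
    disc = vy0**2 - 2.0 * g * (yT - y0)   # the discriminant of the quadratic
    if disc < 0:
        return None                       # target height is never reached
    return (vy0 + math.sqrt(disc)) / g

def vx_for_target(x0, xT, tof):
    """Horizontal velocity needed to cover xT - x0 within the computed ToF."""
    return (xT - x0) / tof
```

Sweeping these functions over a grid of launch velocities reproduces the kind of valid/invalid partition that the feasibility analysis describes.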