
Assistance in Teleoperation of Redundant Robots through Predictive Joint Maneuvering

Published: 26 August 2024

Abstract

In teleoperation of redundant robotic manipulators, translating an operator’s end effector motion command to joint space can be a tool for maintaining feasible and precise robot motion. Through optimizing redundancy resolution, the control system can ensure the end effector maintains maneuverability by avoiding joint limits and kinematic singularities. In autonomous motion planning, this optimization can be done over an entire trajectory to improve performance over local optimization. However, teleoperation involves a human-in-the-loop who determines the trajectory to be executed through a dynamic sequence of motion commands. We present two systems, Predictive Kinematic Control Tree and Predictive Kinematic Control Search, for utilizing a predictive model of operator commands to accomplish this redundancy resolution in a manner that considers future expected motion during teleoperation. Using a probabilistic model of operator commands allows optimization over an expected trajectory of future motion rather than consideration of local motion alone. Evaluation through a user study demonstrates improved control outcomes from this predictive redundancy resolution over minimum joint velocity solutions and inverse kinematics-based motion controllers.

1 Introduction

For teleoperation of robotic manipulators, a common method of simplifying control for human operators is enabling an operator to issue motion commands for the end effector to follow while the teleoperation system handles the translation of these commands into actual joint motion. If the manipulator has more degrees of freedom than the space in which the operator issues these motion commands, then the choice of how to perform this translation can affect the maneuverability of the end effector for future operator commands. Using a predictive model of the operator’s command generation extends the space over which the translation system can optimize. Optimizing according to expected future trajectories of operator motions gives this system the ability to “look ahead” when translating these motion commands into joint space, potentially avoiding singularities, joint limits, or other problematic joint states that could reduce the end effector’s maneuverability.
This article presents a novel approach for extending redundancy resolution to address problems specific to teleoperation. Through sampling from a model of the operator, this system performs redundancy resolution that optimizes over multiple future timesteps of predicted motion commands to maintain an operator’s ability to maneuver the end effector as desired. Unlike existing shared control methods for improving teleoperation, this use of redundancy resolution to maintain joint maneuverability over an operator’s inferred task constrains the optimization of joint configurations to have minimal effect on local end effector motion, so as not to reduce the operator’s authority. This approach follows a sampling-based strategy with a probabilistic model of the operator, allowing multiple possible future operator actions to be considered. The consideration of future operator commands enables redundancy resolution to ensure that the teleoperated system remains maneuverable and precise along the operator’s desired command trajectory. Two specific formalizations of this approach, the Predictive Kinematic Control Tree (PrediKCT) and Predictive Kinematic Control Search (PrediKCS), illustrate methods for applying predictive modeling to redundancy resolution in teleoperation. PrediKCT, first presented in Reference [7], uses a tree search framework to evaluate joint motion according to expected future commands. PrediKCS is a novel extension to PrediKCT that addresses latency and consistency issues, creating an anytime algorithm for incorporating predictive operator models into control space translation. We provide both algorithms as open source resources for the community: A C++ implementation of the PrediKCT system is available at https://github.com/hri-ironlab/predikct, while a ROS package using PrediKCS to translate twist commands into joint velocities is provided at https://github.com/hri-ironlab/ros-predikcs.

2 Background and Related Work

Robotic manipulators may take a variety of forms with differing complexity and likeness to the human arm. As a result, planning trajectories in joint space—the space defined by the translational and angular displacements of each robot joint—presents a challenging task for human teleoperators. As discussed by Rea and Seo [41] in their survey of teleoperation systems, challenges persist in creating efficient and usable means of operating robots. To address this challenge within the context of robotic manipulation, teleoperation interfaces commonly allow operators to issue commands in task space—the real-world coordinate system—to control the position and orientation of the robot end effector. This affords human operators a potentially more natural and intuitive control method, as they no longer have to translate their desired task space motions into joint-level commands, but it comes at the expense of placing this translational burden onto the control system.
Teleoperation control systems typically have limited knowledge of the human’s task, which prevents motion optimization techniques that assume full knowledge of the trajectory to be executed from being applied when translating desired task space motion into joint space actuation. However, teleoperation does not imply that the sequence of motion commands to be executed will be independent from one another. Instead, teleoperation consists of a human performing some task mediated through the motion commands issued at each timestep. Predictive modeling of future operator commands can provide an estimation of the commands that are likely to be issued at future timesteps.
In this article, we present work on applying this predictive approach to the problem of control space translation. The remainder of this section describes the background of research on control space translation in teleoperation and utilization of operator models in human-in-the-loop control systems. Section 3 develops a novel system for implementing prediction into control space translation through teleoperation, with system performance characterized by a simulation study. Addressing potential drawbacks with this first system, Section 4 further develops the idea of applying predictive coordination to control space translation to create a low-latency method for action selection. This section also presents results from a user study evaluating operator usage of this system compared with a state-of-the-art inverse kinematics solver. Finally, Section 5 summarizes the findings from the development and evaluation of these teleoperation systems.

2.1 Utilizing Shared Control to Improve Teleoperation

PrediKCT and PrediKCS apply probabilistic models of robot operators to address redundancy resolution via informed estimates of future operator commands. These systems are informed and inspired by related research in shared control that illustrates the potential for assistive control of robotic systems. Shared control systems augment direct human control of a robotic manipulator with assistance that is meant to improve operation outcomes by accomplishing tasks such as simplifying control through constraining motion to safe paths [1, 29, 35], handling portions of the control space autonomously [21], or interrupting human control to ensure obstacle avoidance [45]. One straightforward and common method of implementing assistive teleoperation for robotic manipulators involves inferring operator intent out of a set of possible goals by observing operator commands, then generating assistive commands toward this inferred goal [3, 13, 18, 19, 26, 28, 36, 52]. A modification of this strategy involves assistance that considers the current probability distribution over the goal set to generate a command that works toward all goals, with the amount of assistance toward each goal weighted by that goal’s current probability [4, 23, 24]. Other work does not assume the operator’s goal is necessarily stationary, allowing for the possibility of a dynamic goal that changes between timesteps [22]. Further extensions of goal modeling for shared control include inferring over a continuous parameterization of a fixed set of types of goals [20] or using data-based approaches with repeated tasks to learn how to provide assistance [25]. These systems demonstrate that modeling an operator’s goal can allow a control system to adjust robot motion in a manner that improves task outcomes.
Applying this paradigm of shared control to our goal of maintaining maneuverability and feasibility of robot motion, Chipalkatty et al. [12] and Rubagotti et al. [42] both use a single predicted rollout of expected future commands to optimize smooth motion during teleoperation. Through prediction of human motion in a haptic control interface, Luo et al. [33] reduce latency in a control system by using prediction as a feedforward model for anticipation of local expected desired motion. Rakita et al. [39] use a similar guiding rationale of creating joint-level control that improves teleoperation through optimizing over multiple goals, although this system does not involve a prediction component. Their system avoids singularities and improves motion smoothness by relaxing precision when following motion commands for application to non-redundant manipulators. This approach was extended in a framework for Inverse Kinematics by applying relaxation of individual pose constraints to generate feasible motion [40]. These methods are not applied towards redundancy resolution in teleoperation, but instead use a shared control paradigm to mix operator commands with consideration of future timesteps to determine the final motion for the robot to follow, optimizing motion feasibility at the expense of operator authority over the control system.
These shared control systems for robotic manipulators assist operators in the challenging task of operating high-DOF robotic systems, demonstrating application of probabilistic models of operators in a variety of methods for assisting control. However, the shared control systems discussed utilize this model of future operator commands to provide assistance in a manner that reduces operator authority by directly altering control commands. PrediKCT and PrediKCS instead utilize this strategy of modeling operator commands within redundancy resolution to optimize motion while constraining this assistance to minimize its effect on local end effector motion.

2.2 Redundancy Resolution in Robotic Control

Through kinematic control, robot operators are able to indicate desired local movement of the robot’s end effector. Creating this type of controller requires translating a motion command from task space into the joint space of the robot. Performing this translation with minimum resulting joint velocities is accomplished by
\[\begin{gather} \dot{\mathbf {q}} = \mathbf {J}^{\dagger }(\mathbf {q})\mathbf {u}, \end{gather}\]
(1)
where \(\mathbf {u}\) is a twist command representing the desired position and rotation velocities of the robot’s end effector in task space, \(\mathbf {q}\) is a vector of robot joint positions, and \(\mathbf {J}^{\dagger }\) is the pseudoinverse of the Jacobian, allowing inversion of the Jacobian for manipulators with redundant degrees of freedom [44, 50].
To make use of the extra degrees of freedom provided by the manipulator, actuation according to the nullspace of the Jacobian can be used to select from the set of joint motions that provide the same local movement of the end effector,
\[\begin{gather} \dot{\mathbf {q}} = \mathbf {J}^{\dagger }(\mathbf {q})\mathbf {u} + (I - \mathbf {J}^{\dagger }\mathbf {J})\mathbf {z}(\mathbf {q}). \end{gather}\]
(2)
Utilizing this nullspace motion can keep the robot away from singularities, ensure movement precisely follows the directed command, and maintain maneuverability for future timesteps [11]. Specific goals to which redundancy resolution has historically been applied include singularity avoidance [5, 10], obstacle and joint limit avoidance [32, 34], minimization of power consumption [48], and ensuring motion complies with specific application requirements [46].
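To make Equation (2) concrete, the following minimal numpy sketch computes the minimum-norm solution and adds a secondary joint motion projected into the Jacobian’s nullspace. The randomly generated matrix standing in for \(\mathbf {J}(\mathbf {q})\) and the example twist are illustrative assumptions, not the released implementation; any vector projected this way leaves the commanded end effector twist unchanged.

```python
import numpy as np

def resolve_redundancy(J, u, z):
    """Translate a task space twist u into joint velocities (Eq. (2)).

    J: 6 x n Jacobian, u: 6-vector twist, z: n-vector secondary joint motion.
    """
    J_pinv = np.linalg.pinv(J)                      # Moore-Penrose pseudoinverse
    primary = J_pinv @ u                            # minimum joint velocity solution (Eq. (1))
    nullspace = np.eye(J.shape[1]) - J_pinv @ J     # projector onto the Jacobian's nullspace
    return primary + nullspace @ z

# Hypothetical 7-DOF example: a random matrix stands in for J(q).
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 7))
u = np.array([0.05, 0.0, 0.0, 0.0, 0.0, 0.1])       # desired end effector twist
z = rng.standard_normal(7)                           # arbitrary secondary objective
q_dot = resolve_redundancy(J, u, z)
# The nullspace term does not change the resulting end effector twist.
print(np.allclose(J @ q_dot, J @ np.linalg.pinv(J) @ u))
```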
Motion can be optimized according to various objectives through determination of \(\mathbf {z}(\mathbf {q})\) . A common strategy is to use the gradient of an objective function to determine \(\mathbf {z}(\mathbf {q})\) [5]. Examples of such objective functions explored in the literature include avoiding joint limits [30], avoiding singularities [10], or optimizing maneuverability [51]. Cascading projections into preceding tasks’ nullspace prioritizes a set of such objectives while performing this redundancy resolution [43].
Regardless of the objective function used, a key limitation of such approaches is that nullspace motion is only selected according to this first-order approximation of how to optimize the objective function, creating a myopic approach that does not take into account the robot’s future path. Illustrating the benefits of instead optimizing over an entire trajectory, Faroni et al. [14, 15] demonstrate improvements to this local approach by considering future states in a model-predictive framework of a fully known trajectory. Teleoperation applications prevent this optimization over a known trajectory due to the requirement for the robot to execute a dynamic trajectory affected by the goals and action choices of the operator. Accordingly, PrediKCT and PrediKCS use a probabilistic model of the human operator that predicts possible future operator actions given expected future states to determine the value of possible definitions for \(\mathbf {z}(\mathbf {q})\) .

3 PrediKCT: the Predictive Kinematic Control Tree

The PrediKCT is a tree search framework for incorporating a probabilistic user model in the translation of velocity commands into joint motion commands for robots with redundant joints. This approach builds on optimization methods in kinematic control, extending redundancy resolution to include predictive modeling of future operator commands. An overview of this approach is presented here, while additional implementation details, results from a simulated system characterization, and discussion can be found in Reference [7].

3.1 Building the Predictive Kinematic Control Tree

PrediKCT uses a tree structure to search and evaluate potential future commands. The tree is divided into two types of nodes, with each layer alternating between types: (1) motion candidate nodes and (2) user prediction nodes. Motion candidate nodes represent decisions made by the control translation system while user prediction nodes represent decisions made by the human operator.

3.1.1 Motion Candidate Nodes.

Motion candidate nodes are given a state describing the positions, velocities, and accelerations of the robot’s joints, \(s_{t}\) , and a velocity command, \(u_{t}\) . Each individual node \(MC_{s_t, u_t}\) generates a set of possible joint motion candidates \(\mathbf {Q}_u = \lbrace \dot{\mathbf {q}}^{1}, \ldots , \dot{\mathbf {q}}^{m}\rbrace\) through sampling nullspace motion around the gradient of the Yoshikawa manipulability index [51]. PrediKCT uses the gradient of this index as a central point for searching possible nullspace motion, as keeping this measure high during teleoperation can help avoid kinematic singularities that would constrain the system’s ability to follow arbitrary operator commands [37]. PrediKCT samples from a multivariate distribution with the mean defined as the average of the last chosen nullspace velocity vector and the gradient of the Yoshikawa manipulability index at the current position,
\[\begin{gather} \mathbf {z}(\mathbf {q}_{t})^\dagger = \frac{\mathbf {z}(\mathbf {q}_{t-1}) + \nabla m(\mathbf {q}_{t})}{2}, \end{gather}\]
(3)
\[\begin{gather} \mathbf {z}(\mathbf {q}_{t}) \sim \mathcal {N}(\mathbf {z}(\mathbf {q}_{t})^\dagger , k\mathbf {I}), \end{gather}\]
(4)
where \(m(\mathbf {q}_{t})\) is the Yoshikawa manipulability index of the joint positions at time t and k is a scaling factor that sizes an identity covariance according to the distance between the last chosen nullspace velocity vector and the gradient of the manipulability index. This anchoring of updates to the sampling region keeps the controller around the gradient of the manipulability index while biasing sampling toward perturbations identified as promising in the last timestep.
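A minimal sketch of the sampling rule in Equations (3) and (4) is given below, using a central finite-difference estimate of the manipulability gradient and a toy Jacobian function; the gradient estimation, the small floor on \(k\), and the toy kinematics are illustrative assumptions rather than details of the PrediKCT implementation.

```python
import numpy as np

def manipulability(jacobian_fn, q):
    """Yoshikawa manipulability index sqrt(det(J J^T))."""
    J = jacobian_fn(q)
    return np.sqrt(np.linalg.det(J @ J.T))

def manipulability_gradient(jacobian_fn, q, eps=1e-5):
    """Central finite-difference estimate of the manipulability gradient."""
    grad = np.zeros_like(q)
    for i in range(len(q)):
        dq = np.zeros_like(q)
        dq[i] = eps
        grad[i] = (manipulability(jacobian_fn, q + dq)
                   - manipulability(jacobian_fn, q - dq)) / (2 * eps)
    return grad

def sample_nullspace_motion(jacobian_fn, q, z_prev, rng):
    """Sample z(q_t) around the average of the last choice and the gradient (Eqs. (3)-(4))."""
    grad = manipulability_gradient(jacobian_fn, q)
    mean = 0.5 * (z_prev + grad)                         # Eq. (3)
    k = max(np.linalg.norm(z_prev - grad), 1e-6)         # assumed scaling of the identity covariance
    return rng.multivariate_normal(mean, k * np.eye(len(q)))  # Eq. (4)

# Toy usage with a hypothetical 2 x 3 Jacobian that depends on q.
def toy_jacobian(q):
    return np.vstack([np.cos(q), np.sin(q)])

rng = np.random.default_rng(1)
q = np.array([0.1, 0.7, -0.4])
z = sample_nullspace_motion(toy_jacobian, q, z_prev=np.zeros(3), rng=rng)
```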
A joint motion candidate leads to a new state \(s_{t+dt}\) through a forward transition model \(T(\cdot)\) that integrates the motion candidate over a fixed period of time dt. This new state is passed to a child user prediction node. Thus, a motion candidate node creates a set of \(|\mathbf {Q}|\) children, each of which is a user prediction node corresponding to a predicted future state from applying one of the joint motion candidates \(\dot{\mathbf {q}}^{i} \in \mathbf {Q}_u\) to the current joint states \(\mathbf {q} \in s_t\) : \(s_{t+dt} = T(s_t, \dot{\mathbf {q}}^{i}, dt)\) .

3.1.2 User Prediction Nodes.

User prediction nodes \(UP_{s_t}\) are given a state \(s_t\) and generate velocity primitives representing possible operator velocity commands for that state. Each user prediction node utilizes an operator model that maintains a probability distribution over possible velocity commands for a given state: \(p(\cdot | s_t)\) . To create the next layer in the tree, a set of velocity commands \(\mathbf {U_s}\) is sampled according to this operator model: p( \(u_{t}^{j} \in \mathbf {U_s}) \sim p(u_{t}^{j} | s_t)\) . For each sampled velocity command, a new child motion candidate node \(MC_{s_t, u_{t}^{j}}\) is created using this command.

3.1.3 Tree Specification.

The root node in PrediKCT is a motion candidate node defined by the current state \(s_{0}\) and most recent operator velocity command \(u_{0}\) . User prediction nodes and motion candidate nodes are created in alternating layers as described above. The tree always ends with a layer of motion candidate leaf nodes representing the possible motion candidates for the final timestep of estimated operator commands. The height of the tree h and the time period dt over which motion candidates are applied together determine the time horizon that is considered by a given tree. Two branching factors affect the shape of the tree: \(|\mathbf {Q}|\) , the number of possible joint motion candidates generated per motion candidate node, and \(|\mathbf {U}|\) , the number of possible operator velocity commands considered per user prediction node.

3.1.4 Node Value Calculation.

The value calculation for nodes in PrediKCT begins with the leaf nodes and is calculated up to the root. PrediKCT makes use of a reward function over motion candidates, \(R_A(\cdot)\) , which can be designed to select desired types of motion. For example, \(R_A\) may produce higher reward for smooth motions, negative reward for moving toward joint configurations close to singularities, and large negative reward for self-collisions. A detailed example of an implementation for \(R_A(\cdot)\) is given as part of PrediKCS’ sample implementation in Section 4.1.3.
Since PrediKCT always ends with a layer of motion candidate nodes, calculating the value from the leaf nodes begins with evaluating a set of motion candidates. The value of a leaf motion candidate node \(MC_{s_{T}, \mathbf {u}_T}\) at the final timestep T is defined according to the maximum reward value of its considered joint motion candidates,
\begin{align} val(MC_{s_{T}, \mathbf {u}_T}) = \max _{\dot{\mathbf {q}} \in \mathbf {Q_u}} R_A(\dot{\mathbf {q}}). \end{align}
(5)
For all user prediction node layers in the tree at any timestep t, the user prediction node \(UP_{s_{t}}\) uses the calculated value of each of its motion candidate child nodes to assess its expected value. User prediction node values are calculated according to a sum over all of their child nodes, weighted by the probability of an individual velocity command being chosen. A discount parameter, \(\gamma\) , where \(0 \le \gamma \le 1\) , is also included in this value calculation. Setting this parameter to \(\gamma \lt 1\) prioritizes the reward given to nodes earlier in the tree and discounts the weight of node value calculations according to how far the nodes are from the current timestep,
\begin{align} val(UP_{s_{t}}) = \gamma \sum _{\mathbf {u} \in \mathbf {U_{s_t}}} p(\mathbf {u} | s_{t}, \mathbf {U_{s_{t}}}) val(MC_{s_{t}, \mathbf {u}}). \end{align}
(6)
Finally, non-leaf motion candidate nodes also consider the value assigned to their child user prediction nodes when determining node value,
\begin{align} val(MC_{s_{t}, \mathbf {u}_t}) = \max _{\dot{\mathbf {q}} \in \mathbf {Q_u}}[ R_A(\dot{\mathbf {q}}) \ + val(UP_{T(s_{t}, \dot{\mathbf {q}}, dt)})]. \end{align}
(7)
All motion candidate layers internal to the tree use this value determination. Similarly, the root motion candidate node uses this same procedure to choose the optimal motion candidate, \(\dot{\mathbf {q}}^{*}\) ,
\begin{align} \dot{\mathbf {q}}^{*} = \arg \max _{\dot{\mathbf {q}} \in \mathbf {Q_{u_{0}}}}[ R_A(\dot{\mathbf {q}}) + val(UP_{T(s_{0}, \dot{\mathbf {q}}, dt)})]. \end{align}
(8)
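The recursion in Equations (5) through (8) can be sketched as follows. The reward function, operator model sampler, and transition model are placeholder callables, so this is a simplified illustration of the value calculation rather than the released C++ implementation; calling evaluate_motion_candidate_node at the root with depth equal to the tree height returns both the root value and the selected motion candidate \(\dot{\mathbf {q}}^{*}\) .

```python
def evaluate_motion_candidate_node(state, candidates, depth, gamma,
                                   reward_fn, sample_commands, transition, dt):
    """Value of a motion candidate node and its best candidate (Eqs. (5), (7), (8))."""
    best_value, best_candidate = float("-inf"), None
    for q_dot in candidates:
        value = reward_fn(q_dot)
        if depth > 0:                                    # leaf nodes use the reward alone (Eq. (5))
            next_state = transition(state, q_dot, dt)
            value += evaluate_user_prediction_node(next_state, depth, gamma, reward_fn,
                                                   sample_commands, transition, dt)
        if value > best_value:
            best_value, best_candidate = value, q_dot
    return best_value, best_candidate

def evaluate_user_prediction_node(state, depth, gamma,
                                  reward_fn, sample_commands, transition, dt):
    """Probability-weighted, discounted value of a user prediction node (Eq. (6)).

    sample_commands(state) yields (command, probability, candidates) triples: sampled
    operator commands, their model probabilities, and the joint motion candidates
    generated for the resulting child motion candidate node.
    """
    total = 0.0
    for _command, probability, candidates in sample_commands(state):
        child_value, _ = evaluate_motion_candidate_node(state, candidates, depth - 1, gamma,
                                                        reward_fn, sample_commands, transition, dt)
        total += probability * child_value
    return gamma * total
```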

3.2 Limitations of PrediKCT

While PrediKCT forms a complete tree to evaluate paths of a consistent length into the future, this requires substantial processing time after an operator command is received to build out this tree and evaluate possible rewards. This introduces latency in the control system as the motion optimization step is performed between the time an operator issues a command and the translation system issues the final joint motion command to the robot. Though the presented system characterization evaluates PrediKCT in simulation with a simulated operator to obtain a consistent source of commands, this latency may degrade performance with real operators.
Additionally, the branching nature of the motion candidate nodes in PrediKCT leads to evaluating local nullspace motion according to inconsistent future nullspace motion. Given a particular path down the tree to any leaf node, the definition of \(\mathbf {z}(\mathbf {q})\) is not guaranteed to be consistent between motion candidate nodes at different levels of this path due to using probabilistic sampling to generate possible variations in this nullspace motion strategy. Since each path to a leaf node represents a single rollout of possible future motion, this causes evaluation of inconsistent nullspace motion rather than consistent motion strategies during these rollouts. As rearranging the joints to more maneuverable joint states may require multiple timesteps of control, this uses computational resources on sample rollouts that may be less likely to lead to productive motion.
The next section describes a novel system, PrediKCS, that extends and modifies the basic approach of PrediKCT to an anytime algorithm that performs its computation between operator commands rather than after receiving an operator command. This allows lower latency control while still applying the idea of predictive modeling for redundancy resolution. Additionally, PrediKCS utilizes rollouts of consistent nullspace motion strategies to further explore the space of possible nullspace motion, while evaluating each motion strategy over several timesteps of consistent behavior.

4 PrediKCTing Anytime: Predictive Kinematic Control Search

The PrediKCS approach modifies PrediKCT’s tree search framework to create an anytime algorithm that searches over local nullspace motion between receipt of operator commands. This addresses two key issues of PrediKCT: latency introduced from building out a tree of a predetermined size and a lack of depth in the tree due to the exponential nature of the tree’s branching factors. PrediKCS abandons PrediKCT’s tree-based framework in favor of Monte Carlo estimation of the expected result from applying individual nullspace motions. This allows longer-term prediction without the exponential branching factor of a tree, trading off more thorough expansion of a possible future trajectory for a greater number of trajectory rollouts with each assuming consistent nullspace motion throughout.

4.1 PrediKCS System Formulation

Instead of completing a tree search, PrediKCS uses a continuous multi-armed bandit approach with a rolling window of sample rollouts. The goal of this approach is to allow rapid evaluations of each possible nullspace motion, with the search over the space of these motions widening until a new operator command is received. The result is an anytime algorithm that can operate in a rapid control update system but produces increasingly thorough searches over possible nullspace motions the more time is given. Additionally, framing the search in this manner and utilizing a model of the operator for initial command prediction creates a low-latency control system. At the receipt of a new operator command, the only computation required is to calculate the pseudoinverse of the Jacobian based on the current joint state and apply the nullspace motion from the current best-performing arm of the bandit. Finally, the rolling window of sample rollouts reuses computation done from rollouts in temporally close states to allow more thorough search over possible nullspace motions without requiring a fixed amount of computation time between motion executions.

4.1.1 Redundancy Resolution as a Continuous Multi-armed Bandit Problem.

A continuous multi-armed bandit establishes a framework for optimizing sequential decisions about exploration of possible reward-generating actions. Ultimately, the goal of this exploration is to determine the action that leads to the highest expected future reward. This setting consists of a space of infinitely many “arms” that can be sampled at any given time to receive a sample reward out of an (unknown) probability distribution of rewards generated by that arm [2, 49]. After some exploration phase in which arms are sampled, the simple regret is the difference in expected reward between the optimal arm (the arm with the highest expected reward) and the current estimate of the optimal arm [8].
PrediKCS addresses the problem of redundancy resolution in teleoperation through the lens of a continuous multi-armed bandit where the arms represent a target joint configuration and the reward from choosing an arm is given by a function of the motion resulting from moving toward that target in the nullspace of the Jacobian. Then, minimizing the simple regret results in choosing the target configuration with the highest expected reward over predicted future motion.
More formally, the arms of the bandit correspond to possible configurations \(\mathbf {q}^*\) for the desired motion,
\[\begin{gather} \mathbf {z}_{\mathbf {q}^*}(\mathbf {q}) = \alpha (\mathbf {q}^* - \mathbf {q}), \end{gather}\]
(9)
where this desired motion is applied as possible in the nullspace at each timestep (here \(\alpha\) represents a gain parameter for this desired motion). For a given operator action commanding some task space motion, u, the final joint movement \(\dot{\mathbf {q}} = f(\mathbf {q}, \mathbf {u}, \mathbf {q}^*)\) is determined by
\[\begin{gather} f(\mathbf {q}, \mathbf {u}, \mathbf {q}^*) = \mathbf {J}^{\dagger }(\mathbf {q})\mathbf {u} + (I - \mathbf {J}^{\dagger }\mathbf {J})\mathbf {z}_{\mathbf {q}^*}(\mathbf {q}). \end{gather}\]
(10)
To select the optimal arm \(\mathbf {q}^*\) , PrediKCS searches for the configuration that maximizes the expected reward according to some arbitrary objective function \(R_A(\cdot)\) , which can be designed to incentivize desired types of motion. The example implementation of this reward function used for system evaluation is given in Section 4.1.3. Using a probabilistic model of operator commands, \(p(\mathbf {u} | s)\) , and a transition function consisting of a kinematic model of joint motion, \(s_{t+dt} = T(s_t, \dot{\mathbf {q}}, dt)\) , this creates a generative model that can be used to sample possible rewards obtained from different target configurations over some finite time horizon h.
Thus, PrediKCS frames the problem of selecting a target configuration as a multi-armed bandit over continuous joint space. Each “arm,” corresponding to different possible target configurations, results in some stochastic reward consisting of the expected value of using this configuration as a target for nullspace motion over a finite horizon of predicted future operator commands. This reward is stochastic due to the user model considering a distribution of possible operator commands at each timestep within this sequence.
The basic procedure of PrediKCS consists of using the time between operator commands to explore the expected rewards of bandit arms through generating sample rewards from these arms. These sample rewards are produced using a predictive model of the operator to analyze expected future motion. This process continues until a new operator command is received, at which time motion is generated using the target joint configuration for nullspace motion corresponding to the current best-performing bandit arm. The result is an anytime algorithm that explores the space of possible target joint configurations until an operator command is received. Once a command is received, the desired end effector motion is translated into joint motion with minimal latency using the existing estimate of the optimal bandit arm.
Sample Rollout Generation
The key exploration action in multi-armed bandits consists of selecting arms to receive a sample from, providing information about the expected reward from picking that arm. To implement this process of arm selection, PrediKCS evaluates a given candidate target joint configuration, \(\mathbf {q}^c\) , through Monte Carlo estimation of the expected reward using rollouts of future joint motion and predicted user commands. These rollouts can be evaluated according to the desired reward function to produce a sample reward for selecting that arm. PrediKCS allows specification of parameters dt and h representing the granularity and total time range in rollout generation, respectively, where h is some multiple of the sampling period dt. Then, rollouts of predicted joint motion are generated h seconds into the future by utilizing the probabilistic user model to sample \(\frac{h}{dt}\) possible future user commands.
Given some starting state, \(s_t\) , defining joint states for all joints in the robot, a rollout \(v^{q^c}_{t:t+h} = \lbrace (s_t, \mathbf {u}_t, \dot{\mathbf {q}}_t),\ldots , (s_{t + h}, \mathbf {u}_{t + h}, \dot{\mathbf {q}}_{t + h})\rbrace\) is created by iteratively sampling possible user actions, determining the joint movement using target configuration \(\mathbf {q}^c\) , and applying the kinematic motion model to obtain the next state,
\begin{align} \begin{split}\mathbf {u}_{t + i(dt)} \sim p(u|s_{t + i(dt)}) \\ \dot{\mathbf {q}}_{t + i(dt)} = f(\mathbf {q}_{t + i(dt)}, \mathbf {u}_{t + i(dt)}, \mathbf {q}^c), \mathbf {q}_{t + i(dt)} \in s_{t + i(dt)} \\ s_{t + (i + 1)(dt)} = T(s_{t + i(dt)}, \dot{\mathbf {q}}_{t + i(dt)}, dt),\end{split} \end{align}
(11)
for all \(0 \le i \le \frac{h}{dt}\) , \(i \in \mathbb {Z}\) .
The sample reward for this rollout is given by the sum of time-discounted rewards,
\begin{align} val(v^{q^c}_{t:t+h}) = \sum _{i = 0}^{\frac{h}{dt}} \gamma ^i R_A(s_{t + (i)dt}, \dot{\mathbf {q}}_{t + (i)dt}, \mathbf {u}_{t + (i)dt}), \end{align}
(12)
where \(\gamma\) is a discount factor, \(0 \lt \gamma \le 1\) .
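A compact sketch of this rollout procedure (Equations (9) through (12)) is shown below; the operator model, transition function, and reward function are stand-in callables, and the joint state is abbreviated to the joint position vector for illustration.

```python
import numpy as np

def nullspace_toward_target(q, q_target, alpha=1.0):
    """Secondary joint motion toward a candidate target configuration (Eq. (9))."""
    return alpha * (q_target - q)

def joint_motion(J, q, u, q_target, alpha=1.0):
    """Commanded twist plus nullspace motion toward q_target (Eq. (10))."""
    J_pinv = np.linalg.pinv(J)
    projector = np.eye(len(q)) - J_pinv @ J
    return J_pinv @ u + projector @ nullspace_toward_target(q, q_target, alpha)

def sample_rollout_reward(q0, q_target, jacobian_fn, sample_command, transition,
                          reward_fn, dt, horizon, gamma):
    """One Monte Carlo sample of the discounted reward for a bandit arm (Eqs. (11)-(12))."""
    q, total, discount = q0, 0.0, 1.0
    for _ in range(int(horizon / dt) + 1):
        u = sample_command(q)                            # draw from the operator model p(u | s)
        q_dot = joint_motion(jacobian_fn(q), q, u, q_target)
        total += discount * reward_fn(q, q_dot, u)
        q = transition(q, q_dot, dt)                     # kinematic forward model
        discount *= gamma
    return total
```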
Rolling Window of Samples
While many bandit algorithms assume a stationary reward, this particular setting may involve shifting rewards. As the robot’s joint state changes over time, the expected reward from following a nullspace target may shift as well. For non-stationary bandit problems, one strategy extending across specific algorithms for regret minimization is to use a rolling window of arm sample rewards to estimate the value of that arm [9, 16, 47]. PrediKCS follows this approach to modify an algorithm developed in the stationary bandit setting to this non-stationary application of robotic motion.
To sufficiently search the joint space for \(\mathbf {q}^*\) while maintaining a responsive teleoperation control system, PrediKCS reuses arm rollouts from previous timesteps over some rolling window of length \(\tau\) seconds. The choice of \(\tau\) balances model accuracy against search thoroughness: A large value of \(\tau\) will generate a thorough search of the joint space for promising target configurations but will also involve estimating arm reward distributions using outdated user models and initial states. Thus, selecting a \(\tau\) value is an important part of system tuning to achieve informative and accurate knowledge of which arm should be selected at each timestep.
To reuse these rollouts after their generation time, rollouts are generated for \(h + \tau\) seconds. Then, for a rollout generated at time t, this rollout can be temporally aligned with rollouts generated at time \(t+\epsilon\) , \(0 \le \epsilon \le \tau\) , by considering rollout steps \(\lceil \frac{\epsilon }{dt}\rceil\) through \(\lceil \frac{\epsilon }{dt}\rceil + \frac{h}{dt}\) , denoted as
\begin{align} Align(v^{q^c}, t, t+\epsilon) = v^{q^c}_{t + \epsilon :t + \epsilon + h}. \end{align}
(13)
Figure 1 illustrates this alignment of rollouts generated at differing times.
In practice, PrediKCS discards rollouts whose predicted state \(\mathbf {q}_{\lceil \frac{\epsilon }{dt}\rceil } \in v^{q^c}_{t+\lceil \frac{\epsilon }{dt}\rceil }\) is sufficiently far from the actual joint configuration \(\mathbf {q}_{t+\epsilon } \in s_{t + \epsilon }\) , so that only rollouts that have modeled the current joint state sufficiently closely at the temporally aligned rollout timestep are considered.
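A small sketch of this reuse logic might look as follows; the rollout representation and the per-joint tolerance check are assumptions made for illustration.

```python
import math

def align_rollout(rollout, generated_at, now, dt, horizon_steps):
    """Temporally align a stored rollout with the current timestep (Eq. (13)).

    rollout: list of (joint_positions, command, joint_velocities) steps covering h + tau seconds.
    """
    offset = math.ceil((now - generated_at) / dt)        # steps that are already in the past
    return rollout[offset:offset + horizon_steps + 1]

def rollout_still_valid(rollout, generated_at, now, dt, q_actual, tolerance):
    """Discard rollouts whose prediction has drifted too far from the actual joint state."""
    offset = math.ceil((now - generated_at) / dt)
    if offset >= len(rollout):
        return False
    predicted_q = rollout[offset][0]
    return all(abs(p - a) <= tolerance for p, a in zip(predicted_q, q_actual))
```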
Fig. 1.
Fig. 1. The temporal alignment process for samples generated at different timesteps. Each sample rollout consists of a sequence of joint positions, a sampled user motion command, and a resulting joint velocity command. Rollouts are kept for up to \(\tau\) seconds after generation. For rollouts generated prior to the current timestep, the rollout is temporally aligned with the current timestep according to that rollout’s predicted state at this timestep, and then all rollouts are evaluated h seconds into the future.

4.1.2 Voronoi Optimistic Optimization.

With nullspace motion selection thus framed as a continuous multi-armed bandit problem, the task reduces to estimating the bandit arm with the highest expected reward through minimizing the simple regret. To do this, PrediKCS uses a state-of-the-art algorithm for sequential exploration in function optimization, Voronoi Optimistic Optimization (VOO), to sample possible arms on which to perform reward-estimating rollouts. VOO is specifically developed for function optimization over high-dimensional search spaces [27], a fitting choice for this application, in which bandit arms correspond to high-DOF joint configurations. VOO uses Voronoi partitioning, more densely sampling areas of the joint space that show more promising potential reward.
As developed by Kim et al. [27], VOO selects arms from the continuous joint space by iteratively sampling either randomly over the entire space or in a targeted manner within the Voronoi cell created by the current best-performing arm. More precisely, for each new sample to be rolled out, PrediKCS samples a random joint configuration with probability w, or with probability \(1 - w\) , it samples a joint configuration closer to the configuration with the highest expected reward than any other (where w is a system parameter specified such that \(w \in [0,1]\) ). In practice, the sampling step in VOO is often implemented by sampling from a Gaussian centered at the best performing configuration until finding a new configuration within this best configuration’s Voronoi cell [31]. PrediKCS follows this approach to sampling the Voronoi cell of the current best configuration.
While the original application for VOO assumes a deterministic transition function, this application has effectively a probabilistic state transition model due to the probabilistic model of future operator commands. To compensate for this, when a new sample is selected, Monte Carlo estimation is used to estimate this sample’s reward across k rollouts instead of evaluating based on a single rollout.
The full sampling procedure, an anytime algorithm that continues to produce samples until TimeRemaining() returns false, is presented in Algorithm 1. An example of the resulting clustering behavior of the VOO-based sampling method is illustrated in Figure 2.
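Since Algorithm 1 is not reproduced here, the following simplified sketch illustrates the VOO-style arm proposal described above; the Gaussian width, the rejection-loop cap, and the fallback behavior are illustrative choices rather than parameters of PrediKCS or the original VOO implementation.

```python
import numpy as np

def propose_arm(arms, rewards, joint_lower, joint_upper, w, rng, sigma=0.2, max_tries=50):
    """Propose the next target configuration to roll out, VOO-style.

    With probability w, sample uniformly over joint space; otherwise draw near the
    best arm so far until the sample lands inside that arm's Voronoi cell (i.e., it
    is closer to the best arm than to any other previously sampled arm).
    """
    if not arms or rng.random() < w:
        return rng.uniform(joint_lower, joint_upper)     # exploratory uniform sample
    best = arms[int(np.argmax(rewards))]
    others = [arm for arm in arms if arm is not best]
    for _ in range(max_tries):
        candidate = np.clip(best + sigma * rng.standard_normal(best.shape),
                            joint_lower, joint_upper)
        d_best = np.linalg.norm(candidate - best)
        if all(np.linalg.norm(candidate - other) >= d_best for other in others):
            return candidate                             # inside the best arm's Voronoi cell
    return best.copy()                                   # fallback if no valid draw is found
```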

4.1.3 Candidate Reward Function.

The example PrediKCS parameter implementation uses a reward function similar to PrediKCT’s. This sample reward function is a weighted combination of three terms: distance error, singularity avoidance, and joint limit avoidance.
Distance error ( \(R^1_A\) ) evaluates the distance between the position resulting from following the proposed joint motion candidate over time window dt and the position obtained by directly applying the velocity command to the end effector’s pose (given by the forward kinematics function operating on the state, \(fk(s_t)\) ) over that same time window. The singularity avoidance term ( \(R^2_A\) ) incentivizes joint configurations that increase the Yoshikawa manipulability index (using \(\mathbf {J}_{t+dt} = \mathbf {J}(\mathbf {q}_t + dt \dot{\mathbf {q}})\) , where \(\mathbf {q}_t \in s_t\) ), while the joint limit avoidance term ( \(R^3_A\) ) counts the number of joints within some \(\epsilon\) of their limit (if the joint is non-continuous),
\[\begin{gather} R^1_A(\dot{\mathbf {q}}) = dist(T(s_t, \dot{\mathbf {q}}, dt), fk(s_t) + (dt)\mathbf {u}), \end{gather}\]
(14)
\[\begin{gather} R^2_A(\dot{\mathbf {q}}) = \sqrt {det(\mathbf {J}_{t+dt}\mathbf {J}^T_{t+dt})}, \end{gather}\]
(15)
\[\begin{gather} R^3_A(\dot{\mathbf {q}}) = \sum _{q_i \in \mathbf {q}_t} {1\!\!1} \left((Lim(i) - (q_i + (dt)\dot{q}_i)) \lt \epsilon \right). \end{gather}\]
(16)
The final reward is a weighted combination of these terms:
\[\begin{gather} R_A(\dot{\mathbf {q}}) = w_1R^1_A(\dot{\mathbf {q}}) + w_2R^2_A(\dot{\mathbf {q}}) + w_3R^3_A(\dot{\mathbf {q}}). \end{gather}\]
(17)
In the example implementations evaluated through a user study, the tuned set of weights used is \(\lbrace w_1 = -50.0, w_2 = 5.0, w_3 = -0.5\rbrace\) , which keeps the order of magnitude roughly the same between reward functions. Note that \(R^2_A(\cdot)\) has a positive weight, since higher manipulability indices are desirable while the other two functions represent objectives to be minimized.
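A sketch of how these terms might be combined is shown below; the distance error term is passed in precomputed (since it depends on the system’s forward kinematics and pose distance), the joint limit check is interpreted two-sidedly against each joint’s lower and upper bound, and the default weights mirror the values reported above.

```python
import numpy as np

def joint_limit_penalty(q, q_dot, dt, limits, eps=0.05):
    """Count joints predicted to end within eps of a limit (Eq. (16)).

    limits: list of (lower, upper) tuples; continuous joints use (None, None).
    """
    q_next = q + dt * q_dot
    return sum(1 for qi, (lo, hi) in zip(q_next, limits)
               if lo is not None and min(qi - lo, hi - qi) < eps)

def manipulability_term(J_next):
    """Yoshikawa manipulability of the predicted configuration (Eq. (15))."""
    return np.sqrt(np.linalg.det(J_next @ J_next.T))

def combined_reward(distance_error, J_next, q, q_dot, dt, limits,
                    w1=-50.0, w2=5.0, w3=-0.5):
    """Weighted combination of the three terms (Eq. (17)); note the positive weight w2."""
    return (w1 * distance_error
            + w2 * manipulability_term(J_next)
            + w3 * joint_limit_penalty(q, q_dot, dt, limits))
```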
Fig. 2.
Fig. 2. Instantaneous sampling results from PrediKCS. At this timestep, the robot configuration is as shown at the top of the figure, with elbow tucked in and end effector directly in front. The user model for this sampling illustration has a single, known goal: the blue dice on the ground in front of the robot’s base. The scatterplot illustrates the samples of nullspace targets for the first three joints in the robot’s arm generated by PrediKCS. Through Voronoi sampling, PrediKCS has clustered these samples around two specific areas, corresponding to swinging the elbow out to the left (lower left cluster in the scatterplot) or swinging the elbow out to the right (upper right cluster in the scatterplot). The bottom two images show the respective results from following samples within one of these two clusters, allowing future movement of the end effector downward toward the dice.

4.2 Example Goal-based Operator Model

PrediKCS, like PrediKCT, is agnostic to the specifics of the probabilistic model of future operator commands used. However, for the completeness of the implementation, the operator model described in this section is used with PrediKCS to allow online inference of operator goals along with goal-based prediction of future commands.

4.2.1 Inferring Operator Goal from State–Action Pairs.

Given a discrete set of goals, \(\Phi\) , this operator model calculates the likelihood of an individual goal given a particular state and action as proportional to the expected reward for that goal. We define the expected reward for a given goal as the expected decrease in distance to that goal, \(R_{H, \phi }(s, \mathbf {u}) = dist(s, \phi) - dist(T(s, \mathbf {u}, dt), \phi)\) , where \(T(s, \mathbf {u}, dt)\) models the expected resulting state from applying action \(\mathbf {u}\) in state s over time dt. This likelihood is converted into the estimated probability of each goal using a softmax over the goal set,
\[\begin{gather} p(\phi | s, \mathbf {u}) = \frac{e^{R_{H, \phi }(s, \mathbf {u})}}{\sum _{\phi ^{\prime } \in \Phi }e^{R_{H, \phi ^{\prime }}(s, \mathbf {u})}}. \end{gather}\]
(18)
The operator model obtains the likelihood of each goal, \(l(\phi)\) , according to the combined probability over a trajectory of state–action pairs \(\xi = \lbrace (s_0, \mathbf {u}_0), \ldots , (s_t, \mathbf {u}_t)\rbrace\) ,
\[\begin{gather} l(\phi | \xi _t) = \prod _{i=0}^{t}p(\phi | s_i, \mathbf {u}_i). \end{gather}\]
(19)
Finally, the model creates a categorical distribution over the goal set by normalizing over the individual goal likelihoods,
\[\begin{gather} p(\phi | \xi _t) = \frac{l(\phi | \xi _t)}{\sum _{\phi ^{\prime } \in \Phi } l(\phi ^{\prime } | \xi _t)}. \end{gather}\]
(20)
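The goal inference above can be sketched as follows; accumulating the product in Equation (19) in log space is a numerical-stability choice made for this illustration, and the distance and transition models are stand-in callables.

```python
import numpy as np

def goal_posterior(trajectory, goals, dist, transition, dt):
    """Distribution over goals from observed state-action pairs (Eqs. (18)-(20)).

    trajectory: list of (state, command) pairs; dist(state, goal) and
    transition(state, command, dt) are stand-ins for the system's models.
    """
    log_likelihood = np.zeros(len(goals))
    for state, command in trajectory:
        next_state = transition(state, command, dt)
        # Expected reward toward each goal: decrease in distance to that goal.
        rewards = np.array([dist(state, goal) - dist(next_state, goal) for goal in goals])
        rewards -= rewards.max()                         # stabilize the softmax (Eq. (18))
        probs = np.exp(rewards) / np.exp(rewards).sum()
        log_likelihood += np.log(probs + 1e-12)          # product over timesteps (Eq. (19))
    posterior = np.exp(log_likelihood - log_likelihood.max())
    return posterior / posterior.sum()                   # normalization (Eq. (20))
```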

4.2.2 Generating Rollouts through Sampling Possible Operator Commands.

To generate the set of rollouts \(V^q\) for configuration q from state \(s_t\) at time t, this model first uses inverse transform sampling to pick a possible user goal \(\phi ^*\) weighted by the current estimated probability distribution over possible goals, \(\phi ^* \sim p(\Phi |\xi _t)\) . Then, at each state in this rollout, a command toward the selected goal is sampled by first creating a twist, \(\mathbf {u}^*\) , in the direction of \(\phi ^*\) from s. The sampled command \(\mathbf {u}\) is created by drawing from a Gaussian for each dimension of the twist command centered at \(\mathbf {u}^*\) and with variance equal to some fraction of the magnitude of the twist: \(\mathbf {u} \sim \mathcal {N}(\mathbf {u}^*, |\frac{\mathbf {u}^*}{c}|)\) (in practice, a variance \(\frac{1}{10}\) the magnitude of the ideal twist generates appropriately varied commands). Finally, \(\mathbf {u}\) is scaled according to the magnitude of the last recorded operator command, \(\bar{\mathbf {u}} = \frac{\mathbf {u}}{||\mathbf {u}||}||\mathbf {u}_t||\) .
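A sketch of this command sampling step is given below; the parameter c and the handling of a zero-magnitude draw are illustrative assumptions.

```python
import numpy as np

def sample_operator_command(twist_toward_goal, last_command_magnitude, rng, c=10.0):
    """Sample a noisy command toward the chosen goal and rescale its magnitude.

    twist_toward_goal: the ideal twist u* pointing from the current pose toward the goal.
    """
    variance = np.abs(twist_toward_goal) / c             # per-dimension variance |u*/c|
    noisy = rng.normal(twist_toward_goal, np.sqrt(variance))
    norm = np.linalg.norm(noisy)
    if norm == 0.0:
        return noisy                                     # degenerate draw; return as-is
    return noisy / norm * last_command_magnitude         # match the last observed command speed
```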

4.2.3 Importance Sampling for Correction of Old Samples.

Since PrediKCS utilizes a rolling window of samples spread out over \(\tau\) seconds, samples to be considered at the current timestep may be anywhere between 0 and \(\tau\) seconds old. As the operator model shifts over time, this results in old configuration rollouts having been sampled under a different probability distribution than the currently estimated probability distribution of the operator’s goal. Consequently, PrediKCS uses importance sampling to weight rollouts at old timesteps according to the probability distribution of the goal estimation at the current timestep.
More formally, consider the estimation of expected reward at timestep \(t + \epsilon\) from some target null-space configuration \(\mathbf {q}^c\) that was originally sampled at time t, where \(0 \le \epsilon \le \tau\) . The rollouts for this configuration were generated using the probability distribution over goals \(p(\Phi |\xi _{t})\) , while the current operator model gives the updated distribution \(p(\Phi |\xi _{t + \epsilon })\) . For a particular rollout \(v^{q^c}\) , let the mapping \(\phi _{v^{q^c}}\) give the sampled goal toward which actions were generated for this rollout. Then, to estimate the reward for configuration \(q^c\) using the rollout set \(V^{q^c}\) , importance sampling weights each rollout by the sampling ratio of this goal under the current and former probability distributions,
\begin{align} \frac{1}{|V^{q^c}|}\sum _{v^{q^c} \in V^{q^c}} val\left(v^{q^c}_{t+\epsilon :t+\epsilon +h}\right) \frac{p(\phi _{v^{q^c}} | \xi _{t+\epsilon })}{p(\phi _{v^{q^c}} | \xi _{t})}. \end{align}
(21)
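A minimal sketch of this importance-weighted estimate (Equation (21)) follows; the goal distributions are represented as simple dictionaries, and the small floor on the old probability is an illustrative guard against division by zero.

```python
def estimate_arm_reward(rollout_values, rollout_goals, p_goal_now, p_goal_then):
    """Importance-weighted estimate of a target configuration's reward (Eq. (21)).

    rollout_values: temporally aligned rollout values; rollout_goals: the goal sampled
    for each rollout; p_goal_now / p_goal_then: goal probabilities (e.g., dicts keyed
    by goal) under the current and original operator models.
    """
    total = 0.0
    for value, goal in zip(rollout_values, rollout_goals):
        weight = p_goal_now[goal] / max(p_goal_then[goal], 1e-12)
        total += value * weight
    return total / len(rollout_values)
```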

4.3 User Study on Teleoperation through PrediKCS

To evaluate the effectiveness of PrediKCS at translating operator motion commands into joint space motion, we conducted a user study in which users operated a robotic arm through manipulation tasks using task space motion commands. This study consisted of a within-subjects design where participants completed two tasks (one pick-and-place and one grasp-and-carry task) using four different control translation systems. The order in which participants encountered these controllers was counterbalanced through balanced Latin squares. The four controllers consisted of two versions of PrediKCS (local optimization versus longer-term optimization) with a minimum-norm joint velocity baseline and an inverse kinematics-based joint position optimizer. The details of these four controllers were as follows:
Minimum-Norm Joint Velocity Controller – This controller, called the Min Vel controller for the remainder of this section, directly applies the pseudoinverse of the Jacobian to the desired task space velocity vector, with no movement in the null-space of the Jacobian: \(\dot{\mathbf {q}} = \mathbf {J}^{\dagger }(\mathbf {q})\mathbf {u}^H\) . This translates task space commands into joint space through minimizing the norm of the joint velocity vector.
RelaxedIK Joint Optimizing Controller – This controller, called the RIK Target controller from here onward, uses RelaxedIK [39, 40], an optimization method for solving inverse kinematics with consideration of joint manipulability and smoothness of motion. The controller queries a RelaxedIK ROS node for joint solutions corresponding to the pose from following the user’s most recent twist command over some small time period and then applies joint velocities toward the resulting IK solution.
Short-Horizon PrediKCS – The third controller, denoted as Short-Horizon PrediKCS in further discussion, uses the PrediKCS algorithm with only local optimization. The full PrediKCS system is used for generating samples, but each sample is evaluated only one timestep into the future using the operator’s next predicted action, so longer rollouts are not used for reward estimation. Samples are kept for one additional timestep past generation time to retain the Voronoi sampling procedure’s ability to bias toward promising areas of the search space over multiple timesteps. Specifically, this controller uses parameter values of \(dt = 1.0\) , \(h = 0.0\) , \(\tau = 0.1\) , and \(w = 0.8\) .
Long-Horizon PrediKCS – The final controller, called Long-Horizon PrediKCS, uses PrediKCS with longer-term prediction and reuse of previously sampled configurations. This controller’s parameter values are \(dt = 1.0\) , \(h = 11.0\) , \(\tau = 3.0\) , and \(w = 0.5\) .
Both PrediKCS controllers use the operator model described previously with four possible goal locations, corresponding to the four task endpoints (the starting and goal locations of both objects). The belief state of this operator model is reset each time a gripper command is issued. No history of past belief is taken into account once this reset occurs, so there are always four possible goal configurations in the goal belief state.
Fig. 3.
Fig. 3. The setup for the PrediKCS user study. Participants complete two tasks: First, participants use the robotic arm to pick up the dice in front of the robot and place it in the blue bin next to the robot. After the arm is reset, the participant then issues motion commands to grasp the plant from the clear plastic container and transport it (upright) to the cardboard box. The arm is shown in its starting pose, the pose that it is reset to before each task.

4.3.1 Experiment Procedure.

With each of these four controllers, participants attempted to complete two tasks through controlling the arm of a Fetch robot using a PlayStation 4 controller. Participant inputs on this controller mapped to end effector twist commands, with particular button presses operating the end effector’s gripper to manipulate objects in the environment. The magnitude of twist commands was continuous using the PlayStation 4 controller’s dual joysticks to control the three axes (one joystick controlled the Z axis and the other controlled X and Y axes), while participants selected between linear and angular control modes with the bumper buttons on the back of the controller. Linear velocity commands were capped at 0.075 meters per second per axis while angular velocity commands were capped at 0.2 radians per second. While the Min Vel, Short-Horizon PrediKCS, and Long-Horizon PrediKCS controllers can all turn user commands into joint motion in only the time it takes to calculate the Jacobian pseudoinverse at the current joint configuration, the RIK Target controller demonstrated smoother motion with a slower control update that allowed time for reaching the target joint velocities after the IK solution was generated. Consequently, all controllers were set to a control update rate of 10 Hz, which we found through pre-testing to create generally smooth motion from each controller. The experimental tasks and control setup were designed to represent the operation of consumer-grade robots. While teleoperation of fixed-base, safety-critical systems such as surgical robots usually involves precise control rigs with high-frequency input, control of mobile-base, consumer-grade robotic systems is often done through video game controllers for the benefit of user familiarity. For example, video game controllers have been used for teleoperation of mobile manipulators in data collection at scale [6] and even bomb disposal [17, 38].
The first task consisted of picking up a cube on the ground in front of the robot, then placing it in a bin on the ground beside the robot. This task used a pick-and-place style of motion, with the end effector remaining pointed toward the ground in the shortest possible path. The second task involved grabbing a plant from a covered container on one side of the robot and placing it upright in another bin on the other side of the robot. This task used a grasp-and-carry type movement, with the shortest possible path keeping the end effector horizontal throughout the movement but rotating the orientation within the plane. The general experiment setup is shown in Figure 3.
Before the experiment began, participants were given instructions on how to operate the robot using the PS4 controller and given 2 minutes to get familiar with the controls through open-ended practice. During this practice period, participants used the Min Vel controller. After this, the experimenter explained the two tasks to be completed to participants before beginning with each controller in the designated order. Participants were not told details of the four controllers before operation but were simply told that the four controllers would affect how their inputs were transformed into joint motion.
Participants were given a maximum of two attempts per task for each controller. A task was classified as a failure if the task was rendered impossible during operation (e.g., from knocking an object to be grasped out of the robot’s reach) or if participants took more than 5 minutes on one portion of the task. In our analysis, we classify a single failure followed by a success as an incidental error and two consecutive failures as evidence of inability to accomplish the task. Consequently, if participants completed the task in either of the two attempts, then only the successful task completion trajectory was used for analysis. Otherwise, if both attempts with a single controller resulted in failure, then the two failed trajectories were concatenated for analysis. The experiment took approximately 30–60 minutes depending on participant control ability, and participants were compensated $10 for their participation.

4.3.2 Participants.

A total of 13 participants completed the experiment. One participant’s data were discarded for not following task instructions, providing data from 12 participants (8 male and 4 female) divided into three balanced Latin squares. On a five-point Likert-style questionnaire, participants reported medium familiarity with robots ( \(M = 2.67\) , \(SD = 1.37\) ) and medium-to-high familiarity with video games ( \(M = 3.50\) , \(SD = 1.19\) ).

4.3.3 Measures.

The effect of these four controllers on the success of the participant’s control is illustrated through analyzing both the local responses to participant commands and the entire trajectories of the task completions. During the experiment, robot joint states and participant motion commands were recorded at a rate of 20 Hz. The measures presented result from analysis of these recorded trajectories, combining trajectories between the two tasks, and ignoring timesteps at which the participants issued no commands and the robot was stationary.
The Path Distance metric gives the total distance traveled by the end effector while completing the tasks as calculated by the sum of pose distances between all consecutive end effector poses. This measure reflects how optimally the participant is able to complete the task with the given controller.
Next, the Joint Distance metric gives the total distance in radians between joint configurations across a trajectory, summed over all seven joints. This metric illustrates how much the joints of the arm actually moved across trajectory completion.
The Local Error metric gives the sum of squared pose distances over a trajectory between the actual end effector pose and the ideal pose—that is, the end effector pose that would result from applying the participant’s last twist command directly to the previous recorded end effector pose over the time between these two data points. This measure demonstrates how precisely the robot’s end effector motion follows the participant’s issued commands, rising when factors such as joint limits, joint dynamics, or kinematic singularities inhibit operator commands from being followed precisely.
Finally, the High Error Actions metric represents the number of participant commands that resulted in extremely high errors between ideal pose and actual pose. To calculate this number, the local errors for all actions across all participants and all controllers are first computed. Then, the distance value of the 95th percentile of these total errors is used as the threshold for these actions. That is, the number of High Error Actions in a trajectory is the number of actions in this trajectory that fall into the worst \(5\%\) of actions across all data points.
To calculate pose distances in these results, the linear distance in meters l and angular distance in radians a (calculated as the geodesic distance between two orientation quaternions) were combined through a weighted blending,
\begin{align} d = w_1l + w_2a, \end{align}
(22)
where \(w_1 = 2.67\) and \(w_2 = 1.0\) . This weighting compensates for the participants’ command limits: the angular velocity cap (in radians per second) was 2.67 times the linear velocity cap (in meters per second).
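For reference, this weighted pose distance can be sketched as follows; the quaternion convention and the factor-of-two geodesic formula are assumptions for illustration.

```python
import numpy as np

def pose_distance(p1, quat1, p2, quat2, w_linear=2.67, w_angular=1.0):
    """Weighted pose distance used in the trajectory metrics (Eq. (22)).

    p1, p2: positions in meters; quat1, quat2: unit orientation quaternions.
    """
    linear = np.linalg.norm(np.asarray(p1) - np.asarray(p2))
    dot = abs(float(np.dot(quat1, quat2)))
    angular = 2.0 * np.arccos(np.clip(dot, 0.0, 1.0))    # geodesic distance in radians
    return w_linear * linear + w_angular * angular
```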

4.4 User Study Results

The collected data illustrates that participants were generally able to complete the tasks with all four controllers. Only 11 failed task attempts were recorded, split across all four of the controllers: the RIK Target controller (3 isolated failures followed by successful task completions, 1 double failure resulting in an incomplete task), the Min Vel controller (3 isolated failures followed by successful task completions), the Short-Horizon PrediKCS controller (1 double failure resulting in an incomplete task), and the Long-Horizon PrediKCS controller (1 isolated failure followed by successful task completion).
We perform a one-way repeated-measures ANOVA to test the effect of controller type on each of the described measures. The results are described for each measure separately below.

4.4.1 Path Distance.

The ANOVA illustrates a significant effect of controller type on Path Distance, \(F(3, 33) = 8.36\) , \(p \lt 0.001\) . Post hoc comparisons using Tukey’s HSD test show significantly shorter paths for Long-Horizon PrediKCS ( \(M = 44.1\) , \(SD = 9.16\) ) than the Min Vel controller ( \(M = 65.9\) , \(SD = 18.3\) ), \(p \lt 0.001\) , and the RIK Target controller ( \(M = 58.5\) , \(SD = 16.2\) ), \(p = 0.021\) . The Short-Horizon PrediKCS controller ( \(M = 46.6\) , \(SD = 12.9\) ) also has significantly shorter paths on average than the Min Vel controller only, \(p \lt 0.001\) . No other pairwise comparisons are significant. Thus, both PrediKCS controllers allow operators to more directly accomplish the tasks than by using the Min Vel controller, although only the Long-Horizon PrediKCS controller with long-term prediction significantly outperforms the RIK Target controller at this metric.

4.4.2 Joint Distance.

On the Joint Distance metric, however, the analysis finds no significant effect of controller type, \(F(3, 33) = 1.95\), \(p = 0.141\). Juxtaposed with the Path Distance results, this paints an interesting picture: although the Long-Horizon PrediKCS controller allowed operators to use significantly shorter end effector paths than the Min Vel and RIK Target controllers, this difference does not appear in the actual joint movement. This reflects how PrediKCS works: through “extra” nullspace motion, future end effector maneuverability is preserved at the cost of larger local joint motion per timestep.
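This "extra" nullspace motion follows the classical velocity-level redundancy resolution: a twist command is mapped through the Jacobian pseudoinverse, and any additional joint velocity is projected into the Jacobian's nullspace so that it does not disturb the commanded end effector motion. The minimal sketch below illustrates one common form of this mapping; the gain and the specific secondary objective (driving toward a target configuration, as PrediKCS's arms do) are illustrative assumptions rather than the released implementation.

```python
# Minimal sketch of velocity-level redundancy resolution with nullspace motion;
# the gain and target-configuration objective are assumed for illustration.
import numpy as np

def resolve_redundancy(jacobian, twist, q, q_target, k_null=1.0):
    """Map a 6-DOF twist to 7 joint velocities, adding nullspace motion toward q_target."""
    j_pinv = np.linalg.pinv(jacobian)                          # 7x6 pseudoinverse
    null_proj = np.eye(jacobian.shape[1]) - j_pinv @ jacobian  # nullspace projector
    qdot_secondary = k_null * (q_target - q)                   # secondary objective velocity
    # Primary task (follow the twist) plus nullspace motion that leaves the twist unchanged.
    return j_pinv @ twist + null_proj @ qdot_secondary
```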

4.4.3 Local Error.

For Local Error, the ANOVA shows a significant effect of controller type, \(F(3, 33) = 5.83\), \(p = 0.003\). Tukey's HSD shows that the Long-Horizon PrediKCS controller (\(M = 0.365\), \(SD = 0.231\)) again has significantly lower error than both the Min Vel controller (\(M = 0.705\), \(SD = 0.295\)), \(p = 0.021\), and the RIK Target controller (\(M = 0.741\), \(SD = 0.495\)), \(p = 0.008\). Likewise, the Short-Horizon PrediKCS controller (\(M = 0.384\), \(SD = 0.218\)) significantly reduces Local Error relative to the Min Vel controller, \(p = 0.034\), and the RIK Target controller, \(p = 0.014\). Again, neither the comparison between the two PrediKCS controllers nor that between the Min Vel and RIK Target controllers is significant. These results demonstrate that the PrediKCS controllers give operators more precise execution of motion commands, on average, than the other two controllers.

4.4.4 High Error Actions.

Finally, controller type also has a significant effect on the number of High Error Actions per trajectory, \(F(3, 33) = 6.44\), \(p \lt 0.002\). While Short-Horizon PrediKCS (\(M = 168\), \(SD = 112\)) reduces the number of these actions relative to Min Vel (\(M = 352\), \(SD = 222\)), \(p = 0.032\), the Long-Horizon PrediKCS controller (\(M = 108\), \(SD = 64.9\)) significantly decreases these actions relative to both the Min Vel controller (\(p = 0.002\)) and the RIK Target controller (\(M = 333\), \(SD = 260\)), \(p = 0.005\). No other pairwise comparisons show significant differences on Tukey's HSD test. The PrediKCS control system, and particularly the use of long-horizon prediction, reduces the amount of time spent in poorly maneuverable configurations where operator twist commands cannot be properly converted into joint motion.

4.4.5 Summary.

In general, these results show that long-horizon prediction through PrediKCS can improve operator performance in teleoperation over both a baseline minimum joint velocity controller and a controller based on a state-of-the-art IK optimizer. While this version of PrediKCS may not lower overall joint movement, it uses this additional per-timestep joint motion to let the operator control the end effector more directly and accurately in task space. PrediKCS using only local prediction also shows improvements over the minimum joint velocity controller, though it demonstrates a significant improvement over the RelaxedIK-based controller only on Local Error. Generally, the two PrediKCS controllers perform similarly, but the longer-horizon PrediKCS produces more consistent improvements over the other two controllers. Figures 4 and 5 illustrate these results.
Fig. 4. The results of the two distance measures (Path Distance and Joint Distance) across controller type. Graph (a) contains the Path Distance results, with lower values representing more direct end effector paths, while Graph (b) contains the Joint Distance results, with lower values representing less overall joint movement. \(^{*}p \lt 0.05,\; ^{**}p \lt 0.01,\; ^{***} p \lt 0.001\) .
Fig. 5. The results of the two error measures (Local Error and High Error Actions) across controller type. Graph (a) contains the Local Error results, with lower values representing a smaller sum of squared errors across individual actions in a trajectory, while Graph (b) contains the High Error Actions results, with lower values representing fewer of the worst \(5\%\) of all errors occurring in the trajectories of the respective controller. \(^{*}p \lt 0.05,\; ^{**}p \lt 0.01,\; ^{***}p \lt 0.001\) .

4.5 Discussion and Limitations

PrediKCS creates a system for choosing nullspace motion in teleoperation that can “look ahead” to lower the likelihood of reaching a joint configuration that makes operation difficult. In the presented user study, participants often had difficulty with Task 1 when using the Min Vel controller, particularly because the elbow of the 7-DOF arm prevented end effector motion toward the robot's base. By using a model of what an operator is attempting to do and optimizing over likely future trajectories, PrediKCS illustrates a way to prevent these difficulties by rearranging joints according to what will allow future motion to best follow operator commands.

4.5.1 Horizon Length’s Impact on PrediKCS.

To provide more insight into how the Short- and Long-Horizon PrediKCS controllers lowered error in this user study, Figure 6 illustrates the mean and \(95\%\) confidence intervals of the squared error in every \(5\%\) increment of the trajectory, separated by task (the two double failures are not included in these data). The visualization helps elucidate how error differs between the controllers over the course of a trajectory: the Min Vel controller maintains generally larger error over most portions of both trajectories, while RIK Target appears prone to more numerous spikes in error than the PrediKCS controllers (perhaps due to occasional joint reconfigurations). The two PrediKCS controllers behave similarly, with low error for much of both trajectories and occasional spikes. However, the two controllers differ in which portions of the trajectories produced these spikes (e.g., Long-Horizon PrediKCS demonstrates higher error around the picking action at the midpoint of Task 1, while Short-Horizon PrediKCS gave users more difficulty in the placement section of Task 2).
Fig. 6. The amount of Local Error across controller type, split into a series of \(5\%\) increments of task completion. Graph (a) contains the results from Task 1, while Graph (b) contains the results from Task 2. In both graphs, lines represent the mean error value per controller across all participants with shaded regions denoting the \(95\%\) confidence intervals for each controller.
This study did not show significant differences in performance between the local PrediKCS system and the longer-horizon PrediKCS system. While both utilize prediction to some degree (and indeed some prediction is required for PrediKCS to generate samples ahead of time and run as an anytime algorithm), the longer-horizon PrediKCS might be expected to outperform purely local optimization. The data suggest that the longer-horizon PrediKCS was mainly more consistent, as demonstrated by its more uniform improvements over the other two controllers. One possible reason is that the operator model used only captures the next waypoint toward which an operator is moving. This may have made long-term planning less valuable, since considering only local linear movement toward the current waypoint would produce joint motion similar to simulating the full movement toward that waypoint. A more complex operator model that could look ahead to future waypoints according to an inferred task may allow more complex joint motion planning to accommodate the full length of a task.
As Figure 6 suggests, advantages between shorter- and longer-horizon PrediKCS may also correlate with the layout of the specific task to be completed. We expect longer horizons to be better for tasks involving hard-to-reach poses for which early maneuvering may be necessary. Conversely, shorter horizons may perform better in the absence of these poses due to the more thorough exploration of the search over target configurations afforded by faster sample rollouts. Generally, the tunability of PrediKCS through adjustment of horizon lengths and operator models is notable. Future work on PrediKCS could further quantify the impacts of these parameters to determine how task specifics can inform this parameter tuning.

4.5.2 Operator Preference in Teleoperation.

This study involved operators who were relative novices with each control system. Future work could investigate operators' adaptation to each control system over time and whether participants might learn to trust a controller like PrediKCS as operation commands are more consistently translated into proper joint motion. Additionally, since these control systems take twist commands as input, other interfaces for issuing twist commands could be evaluated. More natural (e.g., motion controllers) vs. more precise (e.g., joysticks with haptic feedback) control interfaces may produce different types of operator behavior; evaluating these control systems across further control interfaces would help elucidate the impact they may have across future interfaces for robotic control.
Furthermore, the discussed system implements one of several possible paradigms for teleoperation: task-space velocity control. Other pose-IK systems, such as RelaxedIK, implement task-space pose control, while higher-level interfaces offer semantically meaningful actions or partial trajectories from which users can select. Determining which of these paradigms is appropriate for a given context remains an open question. Future work investigating how different users react to different control paradigms could better inform when systems like PrediKCS are most appropriate and worth the computational expense.

5 Conclusion

The Predictive Kinematic Control Tree and Predictive Kinematic Control Search systems turn end effector motion commands into joint motion of redundant manipulators by sampling and evaluating possible motion in the nullspace of the Jacobian over multiple timesteps of expected future commands. PrediKCT builds a tree of possible future trajectories after each operator command is received, producing improved nullspace motion but introducing latency into this control augmentation process. PrediKCS instead frames the problem as a continuous multi-armed bandit in which arms represent possible joint configurations to work toward in the nullspace. This approach allows immediate translation of operator commands into joint motion according to the current best arm, with new samples generated to evaluate arms between receipt of commands. Using Voronoi Optimistic Optimization, PrediKCS clusters samples around promising areas of the joint space and evaluates them according to Monte Carlo rollouts.
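As a rough, hypothetical sketch of this bandit formulation (not the released PrediKCS code), the following shows how VOO-style sampling over candidate nullspace target configurations and Monte Carlo rollout evaluation might be structured; all names, the exploration probability, and the reward interface are assumptions.

```python
# Hypothetical VOO-style sampling over nullspace target configurations.
import numpy as np

def voo_sample(arms, rewards, low, high, explore_prob=0.3, rng=np.random):
    """Explore uniformly with probability explore_prob; otherwise draw a point
    inside the (approximate) Voronoi cell of the best arm found so far."""
    if not arms or rng.random() < explore_prob:
        return rng.uniform(low, high)
    best_idx = int(np.argmax(rewards))
    best = arms[best_idx]
    for _ in range(100):  # rejection sampling with a bounded number of attempts
        candidate = np.clip(best + 0.1 * (high - low) * rng.standard_normal(best.shape), low, high)
        nearest = int(np.argmin([np.linalg.norm(candidate - a) for a in arms]))
        if nearest == best_idx:          # candidate lies in the incumbent's Voronoi cell
            return candidate
    return best  # fallback: reuse the incumbent if no in-cell sample was found

def evaluate_arm(arm, q_now, sample_future_commands, simulate_step, horizon=5):
    """Monte Carlo rollout: follow sampled future operator commands while using
    `arm` as the nullspace target, returning the accumulated reward.
    sample_future_commands and simulate_step are assumed, user-supplied callables."""
    q, total = np.array(q_now, dtype=float), 0.0
    for cmd in sample_future_commands(horizon):
        q, reward = simulate_step(q, cmd, nullspace_target=arm)
        total += reward
    return total

# Between operator commands: keep sampling and evaluating arms; when a command
# arrives, translate it immediately using the current best arm as the nullspace target.
```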
Both systems apply predictive modeling of an operator to teleoperation, allowing the estimated effects of this augmentation process to be considered across a planning horizon rather than only the immediate timestep, with PrediKCS providing an improved way to perform this predictive optimization as an anytime algorithm. A user study shows that PrediKCS yields control systems that allow operators to control redundant robotic arms more directly, and with less error in translating motion commands to joint space, than non-predictive comparison systems for performing this translation.
