The Shortcomings of Force-from-Motion in Robot Learning

Elie Aljalbout¹,², Felix Frank², Patrick van der Smagt²,³, and Alexandros Paraschos²

¹ Elie Aljalbout is currently with the Robotics and Perception Group, at the Department of Informatics of the University of Zurich (UZH) and the Department of Neuroinformatics at UZH and ETH Zurich, Switzerland.
² During this work, all authors were affiliated with the Machine Learning Research Lab at the Volkswagen Group, Munich, Germany.
³ Department of Informatics, ELTE University Budapest.

Abstract

Robotic manipulation requires accurate motion and physical interaction control. However, current robot learning approaches focus on motion-centric action spaces that do not explicitly give the policy control over the interaction. In this paper, we discuss the repercussions of this choice and argue for more interaction-explicit action spaces in robot learning.

I Introduction

Learning manipulation skills can be a key enabler for general-purpose robotics. Recent work successfully demonstrated the ability to learn manipulation skills based on reinforcement and imitation learning [1, 2, 3]. Initial efforts focused on learning control policies that act directly at the lowest level of control of the robot [4]. Recently, however, in an effort to reduce the policy complexity and facilitate sim-to-real transfer [2, 5, 6, 7], novel action spaces have been introduced to abstract away low-level control particularities and platform-specific dependencies. Action spaces are implemented as control feedback loops [2, 5, 6], motion primitives [8, 9], or latent action models [10, 7, 11]. Their goal is to reduce the policy’s role to outputting simpler commands such as position or velocity targets in either the task or configuration space of the robot. In our recent work, we studied the effects of the choice of action space, different low-level feedback loops, and policy integration schemes on exploration, policy properties, and sim-to-real transfer [12]. Our results demonstrated that the choice of action space is crucial for learning a policy in simulation and for its transfer to the real world.

Action spaces that provide control over physical interactions have been proposed in [13, 6, 5, 14]. These action spaces are typically based on variable and adaptive impedance control or on force control in the low-level feedback loops. However, motion-centric action spaces continue to be used for interaction tasks, despite their limitations. In this paper, we argue that such abstractions limit the policy’s capability to perform certain manipulation tasks, we motivate the adoption of interaction-explicit representations, and we promote the design of suitable action spaces for general-purpose manipulation.

II Shortcomings of force-from-motion

Figure 1: A 1-dimensional (1D) manipulation example. The task is to push the blue cube to the target. The robot can move in the range $[q_{\mathrm{min}},q_{\mathrm{max}}]$. The exerted force $F$, generated by the policy and the low-level controller, has to overcome the friction $F_{\mathrm{fric}}$ for the cube to move. Using a motion-centric action space, such as joint positions, the robot can only apply forces indirectly through motion commands, that is, by setting the low-level controller’s target further away than the actual target position. This approach has several shortcomings, discussed in Sec. II, which an interaction-explicit action space overcomes.

Under force-from-motion, manipulation policies can only implicitly exert forces onto their environment by overshooting their actual motion targets. We illustrate the limitations of force-from-motion in a simple 1D pushing example with a prismatic joint, as shown in Fig. 1. The policy outputs joint position targets and its goal is to move the blue cube to the target position. The targets are tracked by a low-level joint impedance controller [15], which controls the joint force

$$F=K(q_{d}-q)+D(\dot{q_{d}}-\dot{q}), \tag{1}$$

where $K$ and $D$ are the stiffness and damping gains, $q$ and $\dot{q}$ the current joint position and velocity, and $q_{d}$ and $\dot{q_{d}}$ the joint position and velocity targets. Moving the cube requires applying a force with a magnitude of at least $F_{\mathrm{min}}$, which depends on the cube’s mass and the surface friction properties. We set $D=0$ to simplify our discussion and we assume the robot force limits to be higher than $F_{\mathrm{min}}$. The policy outputs are within the limits of the robot, i.e. $q_{d}\in[q_{\mathrm{min}},q_{\mathrm{max}}]$.
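The control law in Eq. (1) can be written out as a minimal Python sketch. The function names and all numerical values below (stiffness, positions, $F_{\mathrm{min}}$) are hypothetical and chosen only for illustration; they are not taken from any experiment.

```python
def impedance_force(q, q_d, K, D=0.0, dq=0.0, dq_d=0.0):
    """Joint force of the 1D impedance controller, following Eq. (1)."""
    return K * (q_d - q) + D * (dq_d - dq)


def can_move_cube(q, q_d, K, F_min):
    """True if the commanded target produces at least F_min (with D = 0)."""
    return impedance_force(q, q_d, K) >= F_min


# Hypothetical values, for illustration only.
K = 100.0      # stiffness gain
F_min = 20.0   # force needed to overcome friction
q = 0.25       # current joint position (in contact with the cube)

for q_d in (0.375, 0.5):
    F = impedance_force(q, q_d, K)
    print(f"q_d={q_d}: F={F}, cube moves: {can_move_cube(q, q_d, K, F_min)}")
```

With these assumed numbers, the nearer target yields too little force, so the policy has to command a target well beyond where it actually wants the joint to be.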

For the policy to be able to move the cube (pushing in the positive direction, $q_{d}>q$), we need

$$F\geq F_{\mathrm{min}}\implies K\geq\frac{F_{\mathrm{min}}}{q_{d}-q}, \tag{2}$$

which makes choosing $K$ problematic. Setting, naively, $K=F_{\mathrm{min}}/(q_{\mathrm{max}}-q_{\mathrm{min}})$, the policy can only push the cube from $q=q_{\mathrm{min}}$ by setting $q_{d}=q_{\mathrm{max}}$, or vice versa. For any $q\in(q_{\mathrm{min}},q_{\mathrm{max}})$, i.e. any position but the limits, the generated force satisfies $F<F_{\mathrm{min}}$ and the robot cannot move the cube. To enlarge the usable workspace, manipulate heavier objects, or handle contact surfaces with higher friction, $K$ needs to be increased. Increasing $K$ soon becomes problematic, especially when the task requires the robot to be compliant or when a human is in the loop. The constraint in Eq. (2) renders the task unsolvable under those requirements. Additionally, high values of $K$ combined with policy jitter can lead to force clipping and unstable controllers. Allowing the policy to set $q_{d}$ outside the physical joint limits $[q_{\mathrm{min}},q_{\mathrm{max}}]$ permits a lower $K$, but it leads to safety violations near the workspace limits and creates a trade-off between task feasibility and hardware safety. Constraints similar to Eq. (2) can also be derived for higher-order derivative action spaces. As the magnitude of feasible velocities is typically greater than the range of the robot joints, using them allows for more compliant control due to the larger denominator. Higher-order derivative action spaces are often adequate, despite not explicitly controlling the interaction forces [12].
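The trade-off between stiffness and usable workspace implied by Eq. (2) can be made concrete with a short Python sketch. It assumes pushing toward $q_{\mathrm{max}}$ with the most aggressive admissible target $q_{d}=q_{\mathrm{max}}$ and $D=0$; the helper names and numbers are hypothetical.

```python
def naive_stiffness(F_min, q_min, q_max):
    """Smallest K that allows pushing from q_min with q_d = q_max."""
    return F_min / (q_max - q_min)


def usable_fraction(K, F_min, q_min, q_max):
    """Fraction of [q_min, q_max] from which the cube can still be pushed.

    With q_d = q_max and D = 0, Eq. (2) gives K (q_max - q) >= F_min,
    i.e. q <= q_max - F_min / K.
    """
    q_limit = q_max - F_min / K
    usable = max(0.0, q_limit - q_min)
    return usable / (q_max - q_min)


# Hypothetical values, for illustration only.
q_min, q_max, F_min = 0.0, 1.0, 20.0
K0 = naive_stiffness(F_min, q_min, q_max)

for scale in (1, 2, 10):
    K = scale * K0
    frac = usable_fraction(K, F_min, q_min, q_max)
    print(f"K = {scale} x naive: usable workspace fraction = {frac:.2f}")
```

Under these assumptions the naive gain leaves only the single point $q=q_{\mathrm{min}}$ usable (a fraction of zero), and covering most of the joint range requires a stiffness many times larger, which is exactly what conflicts with compliance.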

Working with light objects, surfaces with low friction coefficients, and minimal human interaction alleviates the shortcomings of force-from-motion [1, 16]. Under these assumptions, such tasks can be successfully performed with almost any choice of action space [12].

These shortcomings emerge because force-from-motion action spaces are not explicitly designed for interaction control, so the policy can only apply forces indirectly through motion commands. This illustrates how the choice of action space can easily hinder task success. While our 1D example is intentionally simplified, similar conclusions can be drawn for more general settings, e.g., for robots with more degrees of freedom. Scaling robot learning to dynamic and human-robot interaction tasks would require more careful consideration of the action space.

III Discussion

There are multiple approaches to overcome the shortcomings of force-from-motion action spaces. Torque control, where the policy directly outputs joint-level torques, provides full control over the robot interactions. However, learning such policies in the real world is very challenging due to safety considerations. Training them first in simulation before deploying them on the real robot, while possible, suffers from a very large sim-to-real gap compared to other action spaces. This is due to the lack of feedback loops to compensate for dynamics mismatches between simulation and the real robot [12]. Delta action spaces, where the policy output is integrated to obtain a position or velocity target [16], provide a different approach to controlling the interaction forces, but have similar force-from-motion limitations. They also introduce additional hidden dynamics and reduce the reactivity of the robot, which further degrades sim-to-real transfer [12]. Applied to our illustrative example, the robot will move the cube only after the position target has been integrated far enough beyond the current position to generate the required force.
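The delay introduced by a delta action space in the 1D example can be sketched as follows. This is a simplified illustration, not the integration scheme of [16]: it assumes the joint is held in contact at a fixed position $q$ until the cube moves, a constant per-step increment `delta`, $D=0$, and a target clamped to $q_{\mathrm{max}}$. All names and values are hypothetical.

```python
def steps_until_push(K, F_min, q, delta, q_max, max_steps=10_000):
    """Count control steps until the integrated target yields F >= F_min.

    Returns None if the required force cannot be reached within the
    joint limit (or within max_steps).
    """
    q_d = q  # integrated position target starts at the current position
    for step in range(1, max_steps + 1):
        q_d = min(q_d + delta, q_max)   # integrate the policy output
        if K * (q_d - q) >= F_min:      # Eq. (1) with D = 0
            return step
        if q_d >= q_max:                # target saturated, force too low
            return None
    return None


# Hypothetical values, for illustration only.
print(steps_until_push(K=100.0, F_min=20.0, q=0.0, delta=0.03, q_max=1.0))
print(steps_until_push(K=15.0, F_min=20.0, q=0.0, delta=0.03, q_max=1.0))
```

In the first call the cube only starts moving after several control steps of accumulated target offset; in the second, the stiffness is too low for the force to ever be reached within the joint limits, mirroring the constraint in Eq. (2).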

To overcome the limitations of force-from-motion, we can use interaction-explicit action spaces, as for example in [13, 6, 5, 14], or develop new ones. Interaction-explicit action spaces can accurately control the interaction forces and are better suited for more dynamic manipulation tasks. However, a notable drawback of these spaces is the difficulty of collecting demonstration data for imitation learning, which could otherwise significantly speed up learning. Interaction-explicit action spaces that are trainable from imitation are currently missing in the literature.

In recent robot learning works, force-from-motion has been preferred for its simplicity and effectiveness in specific scenarios, particularly when the manipulated objects are light and the physical robot interactions are limited. However, it is inadequate for general-purpose robotics. This article demonstrates how force-from-motion limits the range of learned behaviors and often results in undesirable effects (e.g., exceeding torque limits), even in basic scenarios. We have emphasized the necessity for more flexible action spaces that can better accommodate physical interactions and dynamic real-world tasks.

Adopting interaction-explicit action spaces could mark a significant advancement towards more robust and general-purpose robotic manipulation learning. Future work should further explore this direction and develop action spaces that are applicable to a large range of real-world-relevant manipulation tasks.

References

[1] I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al., “Solving Rubik’s cube with a robot hand,” arXiv preprint arXiv:1910.07113, 2019.
[2] M. Alles and E. Aljalbout, “Learning to centralize dual-arm assembly,” Frontiers in Robotics and AI, vol. 9, 2022.
[3] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al., “OpenVLA: An open-source vision-language-action model,” arXiv preprint arXiv:2406.09246, 2024.
[4] N. Wahlström, T. B. Schön, and M. P. Deisenroth, “From pixels to torques: Policy learning with deep dynamical models,” arXiv preprint arXiv:1502.02251, 2015.
[5] R. Martín-Martín, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg, “Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2019.
[6] M. Ulmer, E. Aljalbout, S. Schwarz, and S. Haddadin, “Learning robotic manipulation skills using an adaptive force-impedance action space,” arXiv preprint arXiv:2110.09904, 2021.
[7] A. Allshire, R. Martín-Martín, C. Lin, S. Manuel, S. Savarese, and A. Garg, “LASER: Learning a latent action space for efficient reinforcement learning,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021.
[8] S. Bahl, M. Mukadam, A. Gupta, and D. Pathak, “Neural dynamic policies for end-to-end sensorimotor learning,” Advances in Neural Information Processing Systems, vol. 33, 2020.
[9] E. Aljalbout, J. Chen, K. Ritt, M. Ulmer, and S. Haddadin, “Learning vision-based reactive policies for obstacle avoidance,” in Conference on Robot Learning, PMLR, 2021.
[10] W. Zhou, S. Bajracharya, and D. Held, “PLAS: Latent action space for offline reinforcement learning,” in Conference on Robot Learning, 2020.
[11] E. Aljalbout, M. Karl, and P. van der Smagt, “CLAS: Coordinating multi-robot manipulation with central latent action spaces,” in Learning for Dynamics and Control Conference, PMLR, 2023.
[12] E. Aljalbout, F. Frank, M. Karl, and P. van der Smagt, “On the role of the action space in robot manipulation learning and sim-to-real transfer,” IEEE Robotics and Automation Letters, 2024.
[13] C. C. Beltran-Hernandez, D. Petit, I. G. Ramirez-Alpizar, T. Nishi, S. Kikuchi, T. Matsubara, and K. Harada, “Learning force control for contact-rich manipulation tasks with rigid position-controlled robots,” IEEE Robotics and Automation Letters, 2020.
[14] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar, and P. Abbeel, “Reinforcement learning on variable impedance controller for high-precision robotic assembly,” in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019.
[15] N. Hogan, “Impedance control: An approach to manipulation. II: Implementation,” Journal of Dynamic Systems, Measurement, and Control, 1985.
[16] B. Tang, M. A. Lin, I. Akinola, A. Handa, G. S. Sukhatme, F. Ramos, D. Fox, and Y. Narang, “IndustReal: Transferring contact-rich assembly tasks from simulation to reality,” arXiv preprint arXiv:2305.17110, 2023.