The Shortcomings of Force-from-Motion in Robot Learning
Abstract
Robotic manipulation requires accurate motion and physical interaction control. However, current robot learning approaches focus on motion-centric action spaces that do not explicitly give the policy control over the interaction. In this paper, we discuss the repercussions of this choice and argue for more interaction-explicit action spaces in robot learning.
I Introduction
Learning manipulation skills can be a key enabler for general-purpose robotics. Recent work has successfully demonstrated the ability to learn manipulation skills based on reinforcement and imitation learning [1, 2, 3]. Initial efforts focused on learning control policies that act directly at the lowest level of control of the robot [4]. Recently, however, in an effort to reduce policy complexity and facilitate sim-to-real transfer [2, 5, 6, 7], novel action spaces have been introduced to abstract away low-level control particularities and platform-specific dependencies. Action spaces are implemented as control feedback loops [2, 5, 6], motion primitives [8, 9], or latent action models [10, 7, 11]. Their goal is to reduce the policy's role to outputting simpler commands, such as position or velocity targets in either the task or configuration space of the robot. In our recent work, we studied the effects of the choice of action space, different low-level feedback loops, and policy integration schemes on exploration, policy properties, and sim-to-real transfer [12]. Our results demonstrated that the choice of action space is crucial for learning a policy in simulation and for its transfer to the real world.
Action spaces that provide control over physical interactions have been proposed in [13, 6, 5, 14]. These action spaces are typically based on variable and adaptive impedance control or on force control in the low-level feedback loops. However, motion-centric action spaces continue to be used for interaction tasks, despite their limitations. In this paper, we argue that such abstractions limit the policy’s capability to perform certain manipulation tasks, we motivate the adoption of interaction-explicit representations, and we promote the design of suitable action spaces for general-purpose manipulation.
II Shortcomings of force-from-motion
Under force-from-motion, manipulation policies can only exert forces on their environment implicitly, by commanding motion targets that overshoot the motion they actually intend. We illustrate the limitations of force-from-motion in a simple 1D pushing example with a prismatic joint, as shown in Fig. 1. The policy outputs joint position targets and its goal is to move the blue cube to the target position. The targets are tracked by a low-level joint impedance controller [15], which commands the joint force
$$F = K_p (q_d - q) + K_d (\dot{q}_d - \dot{q}), \qquad (1)$$
where $K_p$ and $K_d$ are the stiffness and damping gains, $q$ and $\dot{q}$ the current joint position and velocity, and $q_d$ and $\dot{q}_d$ the joint position and velocity targets. Moving the cube requires applying a force with magnitude of at least $F_{\min}$, which depends on the cube's mass and the surface friction properties. We set $K_d = 0$ to simplify our discussion and we assume the robot force limits to be higher than $F_{\min}$. The policy outputs are within the limits of the robot, i.e. $q_d \in [q_{\min}, q_{\max}]$.
For the policy to be able to move the cube, we need
$$K_p \geq \frac{F_{\min}}{|q_d - q|}, \qquad (2)$$
which makes choosing the value of $K_p$ in this setup quite problematic. Setting, naively, $K_p = F_{\min} / (q_{\max} - q_{\min})$, the policy will only be able to push the cube at $q = q_{\min}$ by setting $q_d = q_{\max}$, or vice versa. For any $q \in (q_{\min}, q_{\max})$, i.e. any position but the limits, the generated force is $|F| < F_{\min}$ and the robot will not be able to move the cube. To increase the usable workspace, manipulate heavier objects, or handle contact surfaces with higher friction, $K_p$ needs to be increased. Increasing $K_p$ soon becomes problematic, especially when the task requires the robot to be compliant or when a human is in the loop. The constraint in Eq. (2) renders the task unsolvable under those requirements. Additionally, high values of $K_p$ combined with policy jitter can lead to force clipping and unstable controllers. Allowing the policy to set $q_d$ outside the physical limits of the joint can reduce the required $K_p$, but it leads to safety violations near the workspace limits and creates a trade-off between task feasibility and hardware safety. Constraints similar to Eq. (2) can also be derived for higher-order derivative action spaces. As, typically, the magnitude of feasible velocities is greater than the range of the robot joints, using them allows for more compliant control due to the larger denominator. Higher-order derivative action spaces are often adequate, despite not explicitly controlling the interaction forces [12].
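To make the constraint in Eq. (2) concrete, the following minimal Python sketch evaluates it for the 1D pushing example. It is an illustration under stated assumptions, not the setup used in the paper: the controller is reduced to a pure stiffness term (damping ignored), and the joint limits, the minimum force, and all function names are hypothetical values chosen for this example.

```python
def impedance_force(q, q_target, stiffness):
    """Force of a pure-stiffness impedance controller (damping term ignored)."""
    return stiffness * (q_target - q)


def min_stiffness_to_push(q, q_max, f_min):
    """Smallest stiffness that produces f_min when pushing toward q_max.

    The position target is bounded by the joint limit, so the largest
    position error available at q is (q_max - q).
    """
    reach = q_max - q
    if reach <= 0.0:
        return float("inf")  # no position error left to generate any force
    return f_min / reach


# Hypothetical numbers, chosen only for illustration.
Q_MIN, Q_MAX = 0.0, 0.5  # joint limits [m]
F_MIN = 10.0             # force needed to move the cube [N]

# Naive choice: just enough stiffness to push from one limit to the other.
naive_stiffness = F_MIN / (Q_MAX - Q_MIN)

for q in (0.0, 0.25, 0.45):
    force = impedance_force(q, Q_MAX, naive_stiffness)
    needed = min_stiffness_to_push(q, Q_MAX, F_MIN)
    print(f"q={q:.2f} m: max force {force:.1f} N, "
          f"required stiffness {needed:.0f} N/m")
```

With the naive stiffness, the largest achievable force reaches the required minimum only when the joint starts at its lower limit. The stiffness needed to push grows without bound as the contact point approaches the upper limit, which is the trade-off between usable workspace and compliance described above.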
Working with light objects, surfaces with low friction coefficients, and minimal human interaction alleviates the shortcomings of force-from-motion [1, 16]. Under these assumptions, such tasks can be successfully performed with almost any choice of action space [12].
These shortcomings emerge because force-from-motion action spaces are not explicitly designed for interaction control, so the policy can only apply forces indirectly through motion commands. This illustrates how the choice of action space can easily hinder task success. While our 1D example is intentionally simplified, similar conclusions can be drawn for more general settings, e.g., for robots with more degrees of freedom. Scaling robot learning to dynamic and human-robot interaction tasks will require more careful consideration of the action space.
III Discussion
There are multiple approaches to overcome the shortcomings of force-from-motion action spaces. Torque control, where the policy directly outputs joint-level torques, provides full control over the robot's interactions. However, learning such policies in the real world is very challenging due to safety considerations. Training them first in simulation before deploying them on the real robot, while possible, suffers from a very large sim-to-real gap compared to other action spaces. This is due to the lack of feedback loops that compensate for dynamics mismatches between simulation and the real robot [12]. Delta action spaces, where the policy output is integrated to obtain a position or velocity target [16], provide a different approach to controlling the interaction forces, but have similar force-from-motion limitations. Delta action spaces also introduce additional hidden dynamics and reduce the reactivity of the robot, which further degrades sim-to-real transfer [12]. Applied to our illustrative example, the robot will move the cube only after the position target has been integrated sufficiently far beyond the current joint position to generate the required force.
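The delay introduced by a delta action space in the 1D example can be sketched in a few lines of Python. This is a simplified illustration under assumptions that are not part of the original setup: the joint stays at the contact position while the cube is static, the policy outputs a constant increment at every control step, the controller is a pure stiffness term, and all names and numbers are hypothetical.

```python
def steps_until_push(q_contact, stiffness, f_min, delta, q_max):
    """Count control steps until the integrated target generates f_min.

    Assumes the joint is held at q_contact by the still-static cube and
    that the policy outputs the same increment `delta` at every step.
    Returns None if the target saturates at q_max before reaching f_min.
    """
    q_target = q_contact
    steps = 0
    while stiffness * (q_target - q_contact) < f_min:
        if q_target >= q_max:
            return None  # target is at the joint limit, force still too low
        # Delta action space: integrate the policy output into the target.
        q_target = min(q_target + delta, q_max)
        steps += 1
    return steps


# Hypothetical numbers, chosen only for illustration.
STIFFNESS = 40.0  # [N/m]
F_MIN = 10.0      # force needed to move the cube [N]
DELTA = 0.03125   # target increment per control step [m]
Q_MAX = 0.5       # upper joint limit [m]

print(steps_until_push(0.0, STIFFNESS, F_MIN, DELTA, Q_MAX))
print(steps_until_push(0.375, STIFFNESS, F_MIN, DELTA, Q_MAX))
```

In this sketch the cube starts moving only after several control steps of integration when contact happens far from the joint limit, and it never moves when contact happens close to the limit. Both effects follow from the same stiffness constraint as in the position-target case.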
To overcome the limitations of force-from-motion, we can use interaction-explicit action spaces, for example those in [13, 6, 5, 14], or develop new ones. Interaction-explicit action spaces can accurately control the interaction forces and are better suited for more dynamic manipulation tasks. However, a notable drawback of these spaces is the difficulty of collecting demonstration data for imitation learning, which can otherwise significantly accelerate the learning process. Interaction-explicit action spaces that are trainable from imitation are currently missing in the literature.
In recent robot learning work, force-from-motion has been preferred for its simplicity and effectiveness in specific scenarios, particularly when the manipulated objects are light and the physical interactions of the robot are limited. However, it is inadequate for general-purpose robotics. This article demonstrates how force-from-motion limits the range of learned behaviors and often results in undesirable effects (e.g., exceeding torque limits), even in basic scenarios. We have emphasized the necessity for more flexible action spaces that can better accommodate physical interactions and dynamic real-world tasks.
Adopting interaction-explicit action spaces could mark a significant advancement towards more robust and general-purpose robotic manipulation learning. Future work should further explore this direction and develop action spaces that are applicable to a large range of real-world-relevant manipulation tasks.
References
- [1] I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al., “Solving Rubik’s cube with a robot hand,” arXiv preprint arXiv:1910.07113, 2019.
- [2] M. Alles and E. Aljalbout, “Learning to centralize dual-arm assembly,” Frontiers in Robotics and AI, vol. 9, 2022.
- [3] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al., “Openvla: An open-source vision-language-action model,” arXiv preprint arXiv:2406.09246, 2024.
- [4] N. Wahlström, T. B. Schön, and M. P. Deisenroth, “From pixels to torques: Policy learning with deep dynamical models,” arXiv preprint arXiv:1502.02251, 2015.
- [5] R. Martín-Martín, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg, “Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2019.
- [6] M. Ulmer, E. Aljalbout, S. Schwarz, and S. Haddadin, “Learning robotic manipulation skills using an adaptive force-impedance action space,” arXiv preprint arXiv:2110.09904, 2021.
- [7] A. Allshire, R. Martín-Martín, C. Lin, S. Manuel, S. Savarese, and A. Garg, “Laser: Learning a latent action space for efficient reinforcement learning,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021.
- [8] S. Bahl, M. Mukadam, A. Gupta, and D. Pathak, “Neural dynamic policies for end-to-end sensorimotor learning,” Advances in Neural Information Processing Systems, vol. 33, 2020.
- [9] E. Aljalbout, J. Chen, K. Ritt, M. Ulmer, and S. Haddadin, “Learning vision-based reactive policies for obstacle avoidance,” in Conference on Robot Learning, PMLR, 2021.
- [10] W. Zhou, S. Bajracharya, and D. Held, “Plas: Latent action space for offline reinforcement learning,” in Conference on Robot Learning, 2020.
- [11] E. Aljalbout, M. Karl, and P. van der Smagt, “Clas: Coordinating multi-robot manipulation with central latent action spaces,” in Learning for Dynamics and Control Conference, PMLR, 2023.
- [12] E. Aljalbout, F. Frank, M. Karl, and P. van der Smagt, “On the role of the action space in robot manipulation learning and sim-to-real transfer,” IEEE Robotics and Automation Letters, 2024.
- [13] C. C. Beltran-Hernandez, D. Petit, I. G. Ramirez-Alpizar, T. Nishi, S. Kikuchi, T. Matsubara, and K. Harada, “Learning force control for contact-rich manipulation tasks with rigid position-controlled robots,” IEEE Robotics and Automation Letters, 2020.
- [14] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar, and P. Abbeel, “Reinforcement learning on variable impedance controller for high-precision robotic assembly,” in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019.
- [15] N. Hogan, “Impedance control: An approach to manipulation. II: Implementation,” Journal of Dynamic Systems, Measurement, and Control, 1985.
- [16] B. Tang, M. A. Lin, I. Akinola, A. Handa, G. S. Sukhatme, F. Ramos, D. Fox, and Y. Narang, “Industreal: Transferring contact-rich assembly tasks from simulation to reality,” arXiv preprint arXiv:2305.17110, 2023.