
The Shortcomings of Force-from-Motion in Robot Learning

Elie Aljalbout¹,², Felix Frank², Patrick van der Smagt²,³, and Alexandros Paraschos²

¹ Elie Aljalbout is currently with the Robotics and Perception Group, at the Department of Informatics of the University of Zurich (UZH) and the Department of Neuroinformatics at UZH and ETH Zurich, Switzerland.
² During this work, all authors were affiliated with the Machine Learning Research Lab at the Volkswagen Group, Munich, Germany.
³ Department of Informatics, ELTE University Budapest.

Abstract

Robotic manipulation requires accurate motion and physical interaction control. However, current robot learning approaches focus on motion-centric action spaces that do not explicitly give the policy control over the interaction. In this paper, we discuss the repercussions of this choice and argue for more interaction-explicit action spaces in robot learning.

I Introduction

Learning manipulation skills can be a key enabler for general-purpose robotics. Recent work successfully demonstrated the ability to learn manipulation skills based on reinforcement and imitation learning [1, 2, 3]. Initial efforts focused on learning control policies that act directly at the lowest level of control of the robot [4]. Recently, however, in an effort to reduce the policy complexity and facilitate sim-to-real transfer [2, 5, 6, 7], novel action spaces have been introduced to abstract the low-level control particularities and platform-specific dependencies. Action spaces are implemented as control feedback loops [2, 5, 6], motion primitives [8, 9], or latent action models [10, 7, 11]. Their goal is to reduce the policy’s role to outputting simpler commands, such as position or velocity targets in either the task or configuration space of the robot. In our recent work, we studied the effects of choosing an action space, different low-level feedback loops, and policy integration schemes on exploration, policy properties, and sim-to-real transfer [12]. Our results demonstrated that the choice of action space is crucial for learning a policy in simulation and for its transfer to the real world.

Action spaces that provide control over physical interactions have been proposed in [13, 6, 5, 14]. These action spaces are typically based on variable and adaptive impedance control or on force control in the low-level feedback loops. However, motion-centric action spaces continue to be used for interaction tasks, despite their limitations. In this paper, we argue that such abstractions limit the policy’s capability to perform certain manipulation tasks, we motivate the adoption of interaction-explicit representations, and we promote the design of suitable action spaces for general-purpose manipulation.

II Shortcomings of force-from-motion

Figure 1: A 1-dimensional (1D) manipulation example. The task is to push the blue cube to the target. The robot can move in the range $[q_{\mathrm{min}}, q_{\mathrm{max}}]$. The exerted force $F$, generated by the policy and the low-level controller, has to overcome the friction $F_{\mathrm{fric}}$ for the cube to move. Using a motion-centric action space, such as joint positions, the robot can only apply forces indirectly through motion commands, that is, by setting the low-level controller’s target further away than the actual target position. This approach has several shortcomings, as discussed in Sec. II, and the use of an interaction-explicit action space overcomes them.

Under force-from-motion, manipulation policies can only implicitly exert forces on their environment by overshooting their actual motion targets. We illustrate the limitations of force-from-motion in a simple 1D pushing example with a prismatic joint, as shown in Fig. 1. The policy outputs joint position targets and its goal is to move the blue cube to the target position. The targets are tracked by a low-level joint impedance controller [15] that commands the joint force

$$F = K(q_d - q) + D(\dot{q}_d - \dot{q}), \qquad (1)$$

where $K$ and $D$ are the stiffness and damping gains, $q$ and $\dot{q}$ the current joint position and velocity, and $q_d$ and $\dot{q}_d$ the joint position and velocity targets. Moving the cube requires applying a force with magnitude higher than $F_{\mathrm{min}}$, depending on the cube’s mass and the surface friction properties. We set $D = 0$ to simplify our discussion and we assume the robot force limits to be higher than $F_{\mathrm{min}}$. The policy outputs are within the limits of the robot, i.e. $q_d \in [q_{\mathrm{min}}, q_{\mathrm{max}}]$.
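As a minimal sketch, the control law in Eq. (1) can be written as a single function. The function name, argument names, and the numerical values below are illustrative assumptions and do not come from the paper.

```python
def impedance_force(q, q_dot, q_des, q_dot_des, stiffness, damping=0.0):
    """Joint force from the 1D impedance law F = K (q_d - q) + D (q_d_dot - q_dot)."""
    return stiffness * (q_des - q) + damping * (q_dot_des - q_dot)


# Hypothetical numbers: joint at 0.25 m, target at 0.5 m, K = 100 N/m, D = 0.
force = impedance_force(q=0.25, q_dot=0.0, q_des=0.5, q_dot_des=0.0, stiffness=100.0)
print(f"commanded force: {force:.2f} N")
```

With $D = 0$, as assumed in the text, the commanded force depends only on the stiffness and on how far the target lies from the current position.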

For the policy to be able to move the cube, we need

$$F \geq F_{\mathrm{min}} \implies K \geq \frac{F_{\mathrm{min}}}{q_d - q}, \qquad (2)$$

which makes choosing the value of $K$ problematic in this setup. Setting, naively, $K = F_{\mathrm{min}} / (q_{\mathrm{max}} - q_{\mathrm{min}})$, the policy will only be able to push the cube at $q = q_{\mathrm{min}}$ by setting $q_d = q_{\mathrm{max}}$, or vice versa. For any $q \in (q_{\mathrm{min}}, q_{\mathrm{max}})$, i.e. any position but the limits, the generated force satisfies $F < F_{\mathrm{min}}$ and the robot will not be able to move the cube. To enlarge the usable workspace, manipulate heavier objects, or handle contact surfaces with higher friction, $K$ needs to be increased. Increasing $K$ soon becomes problematic, especially when the task requires the robot to be compliant or when a human is in the loop. The constraint in Eq. (2) renders the task unsolvable under those requirements. Additionally, high values of $K$ combined with policy jitter can lead to force clipping and unstable controllers.
Allowing the policy to set $q_d$ outside the physical limits of the joint $[q_{\mathrm{min}}, q_{\mathrm{max}}]$ can reduce the required $K$, but it leads to safety violations near the workspace limits and creates a trade-off between task feasibility and hardware safety. Constraints similar to Eq. (2) can also be derived for higher-order derivative action spaces. Since the magnitude of feasible velocities is typically greater than the range of the robot joints, using them allows for more compliant control due to the larger denominator. Higher-order derivative action spaces are often adequate, despite not explicitly controlling the interaction forces [12].
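The constraint in Eq. (2) can be checked numerically. The sketch below assumes a push in the positive direction, $D = 0$, and hypothetical values for the joint range and $F_{\mathrm{min}}$; the helper names are ours and not part of the paper.

```python
def min_stiffness(f_min, q, q_des):
    """Smallest K satisfying Eq. (2) for a push in the positive direction (q_des > q)."""
    if q_des <= q:
        raise ValueError("q_des must lie beyond q to push forward")
    return f_min / (q_des - q)


def can_push(f_min, q, stiffness, q_max):
    """True if the largest admissible target q_des = q_max yields F >= F_min (D = 0)."""
    return stiffness * (q_max - q) >= f_min


# Hypothetical values: 1 m of joint travel and 10 N needed to move the cube.
Q_MIN, Q_MAX, F_MIN = 0.0, 1.0, 10.0

# Naive choice from the text: K = F_min / (q_max - q_min).
k_naive = min_stiffness(F_MIN, Q_MIN, Q_MAX)

pushable_at_limit = can_push(F_MIN, Q_MIN, k_naive, Q_MAX)  # cube at q_min
pushable_mid = can_push(F_MIN, 0.5, k_naive, Q_MAX)         # cube mid-range
k_needed_mid = min_stiffness(F_MIN, 0.5, Q_MAX)             # stiffness needed mid-range

print(k_naive, pushable_at_limit, pushable_mid, k_needed_mid)
```

Under these assumed numbers, the naive stiffness only suffices at the joint limit, and pushing from the middle of the range already requires doubling $K$.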

Working with light objects, low-friction surfaces, and minimal human interaction alleviates the shortcomings of force-from-motion [1, 16]. Under these assumptions, tasks can be performed successfully with almost any choice of action space [12].

These shortcomings emerge because force-from-motion action spaces are not explicitly designed for interaction control and the policy applies forces only indirectly through motion commands. This illustrates how the choice of action space can easily hinder task success. While our 1D example is intentionally simplified, similar conclusions can be drawn for more general settings, e.g., for robots with more degrees of freedom. Scaling robot learning to dynamic and human-robot interaction tasks would require more careful consideration of the action space.

III Discussion

There are multiple approaches to overcome the shortcomings of force-from-motion action spaces. Torque control, where the policy directly outputs joint-level torques, provides full control over the robot interactions. However, learning such policies in the real world is very challenging due to safety considerations. Training them first in simulation before deploying them on the real robot, while possible, suffers from a very large sim-to-real gap compared to other action spaces. This is due to the lack of feedback loops to compensate for dynamic mismatches between simulation and the real robot [12]. Delta action spaces, where the policy output is integrated to obtain a position or velocity target [16], provide a different approach to controlling the interaction forces but share the same force-from-motion limitations. Delta action spaces also introduce additional hidden dynamics and reduce the reactivity of the robot, which further degrades sim-to-real transfer [12]. Applied to our illustrative example, the robot will move the cube only after the position target has been integrated sufficiently far beyond the current position to generate the required force.
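The delay introduced by a delta action space in the 1D example can be sketched as follows. The code assumes that the joint stays blocked at $q$ by the cube, that the policy outputs a constant increment per control step, that $D = 0$, and that the integrated target is not clipped; all names and values are hypothetical.

```python
def steps_until_push(q, f_min, stiffness, delta, max_steps=10_000):
    """Count control steps until the integrated target generates F >= F_min.

    The joint is assumed to stay at q (blocked by the cube) while the target
    q_des is incremented by `delta` at each step; D = 0 as in the text.
    Returns None if the force threshold is not reached within max_steps.
    """
    q_des = q
    for step in range(1, max_steps + 1):
        q_des += delta  # integrate the policy output into a position target
        if stiffness * (q_des - q) >= f_min:
            return step
    return None


# Hypothetical values: K = 32 N/m, F_min = 8 N, so the target must lead by 0.25 m.
steps_small_delta = steps_until_push(q=0.0, f_min=8.0, stiffness=32.0, delta=0.0625)
steps_large_delta = steps_until_push(q=0.0, f_min=8.0, stiffness=32.0, delta=0.125)
print(steps_small_delta, steps_large_delta)
```

In this simplified setting, smaller increments or a lower stiffness mean more control steps pass before the cube starts to move, which is one way to view the reduced reactivity mentioned above.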

To overcome the limitations of force-from-motion, we can use interaction-explicit action spaces, as for example in [13, 6, 5, 14], or develop new ones. Interaction-explicit action spaces can accurately control the interaction forces and are better suited for more dynamic manipulation tasks. However, a notable drawback of these spaces is the difficulty of collecting data to train policies using imitation learning, which can significantly boost the learning process by training policies from demonstrations. Interaction-explicit action spaces that are trainable from imitation are currently missing in the literature.
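To make the contrast concrete, the sketch below compares the motion-only command of the 1D example with a hypothetical interaction-explicit action in which the policy additionally outputs a feed-forward force. This is an illustrative assumption only, not the formulation used in [13, 6, 5, 14]; the function name, the force limit, and all values are made up for the example.

```python
def interaction_explicit_force(q, q_des, f_ff, stiffness, f_limit):
    """Impedance term plus a policy-chosen feed-forward force, clipped to the robot limit."""
    force = stiffness * (q_des - q) + f_ff
    return max(-f_limit, min(f_limit, force))


# Hypothetical values: low stiffness for compliance, cube needs 8 N to move.
F_MIN, K_LOW, F_LIMIT = 8.0, 8.0, 20.0
q, q_target = 0.5, 0.75  # the position target equals the actual task target

# Motion-only action: the force comes solely from the position error.
force_motion_only = interaction_explicit_force(q, q_target, 0.0, K_LOW, F_LIMIT)

# Interaction-explicit action: the policy also requests 8 N of feed-forward force.
force_explicit = interaction_explicit_force(q, q_target, 8.0, K_LOW, F_LIMIT)

print(force_motion_only >= F_MIN, force_explicit >= F_MIN)
```

Under these assumptions, the position target stays inside the joint limits and the stiffness stays low, while the force needed to move the cube is requested directly.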

In recent robot learning works, force-from-motion has been preferred for its simplicity and effectiveness in specific scenarios, particularly when the manipulated objects are light and the physical robot interactions are limited. However, it is inadequate for general-purpose robotics. This article demonstrates how force-from-motion limits the range of learned behaviors and often results in undesirable effects (e.g., exceeding torque limits), even in basic scenarios. We have emphasized the necessity for more flexible action spaces that can better accommodate physical interactions and dynamic real-world tasks.

Adopting interaction-explicit action spaces could mark a significant advancement towards more robust and general-purpose robotic manipulation learning. Future work should further explore this direction and develop action spaces that are applicable to a large range of real-world-relevant manipulation tasks.

References

  • [1] I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al., “Solving rubik’s cube with a robot hand,” arXiv preprint arXiv:1910.07113, 2019.
  • [2] M. Alles and E. Aljalbout, “Learning to centralize dual-arm assembly,” Frontiers in Robotics and AI, vol. 9, 2022.
  • [3] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al., “Openvla: An open-source vision-language-action model,” arXiv preprint arXiv:2406.09246, 2024.
  • [4] N. Wahlström, T. B. Schön, and M. P. Deisenroth, “From pixels to torques: Policy learning with deep dynamical models,” arXiv preprint arXiv:1502.02251, 2015.
  • [5] R. Martín-Martín, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg, “Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2019.
  • [6] M. Ulmer, E. Aljalbout, S. Schwarz, and S. Haddadin, “Learning robotic manipulation skills using an adaptive force-impedance action space,” arXiv preprint arXiv:2110.09904, 2021.
  • [7] A. Allshire, R. Martín-Martín, C. Lin, S. Manuel, S. Savarese, and A. Garg, “Laser: Learning a latent action space for efficient reinforcement learning,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2021.
  • [8] S. Bahl, M. Mukadam, A. Gupta, and D. Pathak, “Neural dynamic policies for end-to-end sensorimotor learning,” Advances in Neural Information Processing Systems, vol. 33, 2020.
  • [9] E. Aljalbout, J. Chen, K. Ritt, M. Ulmer, and S. Haddadin, “Learning vision-based reactive policies for obstacle avoidance,” in Conference on Robot Learning, PMLR, 2021.
  • [10] W. Zhou, S. Bajracharya, and D. Held, “Plas: Latent action space for offline reinforcement learning,” in Conference on Robot Learning, 2020.
  • [11] E. Aljalbout, M. Karl, and P. van der Smagt, “Clas: Coordinating multi-robot manipulation with central latent action spaces,” in Learning for Dynamics and Control Conference, PMLR, 2023.
  • [12] E. Aljalbout, F. Frank, M. Karl, and P. van der Smagt, “On the role of the action space in robot manipulation learning and sim-to-real transfer,” IEEE Robotics and Automation Letters, 2024.
  • [13] C. C. Beltran-Hernandez, D. Petit, I. G. Ramirez-Alpizar, T. Nishi, S. Kikuchi, T. Matsubara, and K. Harada, “Learning force control for contact-rich manipulation tasks with rigid position-controlled robots,” IEEE Robotics and Automation Letters, 2020.
  • [14] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar, and P. Abbeel, “Reinforcement learning on variable impedance controller for high-precision robotic assembly,” in 2019 International Conference on Robotics and Automation (ICRA), IEEE, 2019.
  • [15] N. Hogan, “Impedance control: an approach to manipulation. ii: Implementation,” Journal of dynamic systems, measurement, and control, 1985.
  • [16] B. Tang, M. A. Lin, I. Akinola, A. Handa, G. S. Sukhatme, F. Ramos, D. Fox, and Y. Narang, “Industreal: Transferring contact-rich assembly tasks from simulation to reality,” arXiv preprint arXiv:2305.17110, 2023.