DOI: 10.23919/ICCAS52745.2021.9649975

Decomposed Q-Learning for Non-Prehensile Rearrangement Problem

Published: 12 October 2021

Abstract

In this paper, we address a planar non-prehensile rearrangement task in which objects must be pushed to desired target points. We model the task as a multi-objective Markov decision process (MOMDP) and propose a method for finding policies that solves it efficiently. The proposed method learns an object-wise Q-value function for each object, capturing how that object moves in response to the robot arm's pushing actions. This increases sample efficiency and improves learning speed compared to learning a policy for multiple objects with a single Q-value function. To this end, we use the deep Q-learning framework and, since the input is visual, obtain a Q-value for each pixel with a fully convolutional network. Based on the learned object-wise Q-value functions, we determine the action of the robot arm, and we confirm that the maximum-selection strategy achieves the highest performance.
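
To illustrate the idea, the sketch below (Python/PyTorch, not the authors' code) shows one way object-wise, per-pixel Q-value maps could be produced by a fully convolutional network and combined with a maximum strategy to pick a push location. The class and function names (ObjectWiseQNet, select_push), the network layout, and the tensor shapes are assumptions made for the example.

```python
# Illustrative sketch only: object-wise per-pixel Q-maps from a fully
# convolutional network, combined with a max strategy over all objects.
import torch
import torch.nn as nn

class ObjectWiseQNet(nn.Module):
    """Fully convolutional net: image in, one Q-value map per object out."""
    def __init__(self, num_objects: int, in_channels: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # One output channel per object: a per-pixel Q-value for pushing there.
        self.head = nn.Conv2d(64, num_objects, 1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, C, H, W) -> q_maps: (B, num_objects, H, W)
        return self.head(self.backbone(image))

def select_push(q_maps: torch.Tensor):
    """Max strategy: pick the (object, pixel) pair with the highest Q-value."""
    b, n, h, w = q_maps.shape
    flat = q_maps.view(b, -1)          # (B, n*h*w)
    idx = flat.argmax(dim=1)           # best flattened index per batch element
    obj = idx // (h * w)               # which object's Q-map won
    pix = idx % (h * w)
    return obj, pix // w, pix % w      # object index, row, column of the push
```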



Information

Published In
2021 21st International Conference on Control, Automation and Systems (ICCAS)
October 2021, 1691 pages

Publisher
IEEE Press

Publication History
Published: 12 October 2021

Qualifiers
• Research-article
