Recurrent Deterministic Policy Gradient Method for Bipedal Locomotion on Rough Terrain Challenge

Song, Doo Re; Yang, Chuanyu; McGreavy, Christopher; Li, Zhibin

doi:10.1109/ICARCV.2018.8581309


arXiv:1710.02896 (cs)

[Submitted on 8 Oct 2017 (v1), last revised 15 Dec 2019 (this version, v6)]


Authors: Doo Re Song, Chuanyu Yang, Christopher McGreavy, Zhibin Li


Abstract: This paper presents a deep learning framework capable of solving partially observable locomotion tasks, based on our novel interpretation of Recurrent Deterministic Policy Gradient (RDPG). We study the bias of the sampled error measure and its variance, which are induced by the partial observability of the environment and by subtrajectory sampling, respectively. Three major improvements are introduced in our RDPG-based learning framework: tail-step bootstrap of interpolated temporal difference, initialisation of hidden state using past trajectory scanning, and injection of external experiences learned by other agents. The proposed learning framework was implemented to solve the Bipedal-Walker challenge in OpenAI's gym simulation environment, where only partial state information is available. Our simulation study shows that the autonomous behaviors generated by the RDPG agent are highly adaptive to a variety of obstacles and enable the agent to effectively traverse rugged terrain over long distances with a higher success rate than leading contenders.
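
As a rough illustration of what "initialisation of hidden state using past trajectory scanning" could look like, the sketch below replays the observations that precede a sampled subtrajectory through a recurrent cell and uses the resulting state as the starting hidden state. The abstract does not give the details, so everything here is an assumption rather than the authors' implementation: the plain tanh cell, the function names (`rnn_step`, `scan_hidden_state`, `sample_subtrajectory`), the `scan_len` parameter, and the array sizes are all hypothetical.

```python
import numpy as np

def rnn_step(h, obs, W_h, W_x, b):
    """One step of a plain tanh recurrent cell (a stand-in, not the paper's network)."""
    return np.tanh(W_h @ h + W_x @ obs + b)

def scan_hidden_state(past_obs, W_h, W_x, b):
    """Replay past observations from a zero state to estimate the hidden state."""
    h = np.zeros(W_h.shape[0])
    for obs in past_obs:
        h = rnn_step(h, obs, W_h, W_x, b)
    return h

def sample_subtrajectory(episode_obs, start, length, scan_len, W_h, W_x, b):
    """Return (h0, window).

    h0 is computed by scanning up to `scan_len` steps that precede `start`;
    window is the slice of observations that would be used for training.
    """
    scan_start = max(0, start - scan_len)
    h0 = scan_hidden_state(episode_obs[scan_start:start], W_h, W_x, b)
    return h0, episode_obs[start:start + length]

# Illustrative sizes and random weights (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
obs_dim, hid_dim = 24, 8
W_h = rng.normal(scale=0.1, size=(hid_dim, hid_dim))
W_x = rng.normal(scale=0.1, size=(hid_dim, obs_dim))
b = np.zeros(hid_dim)

# A fake 50-step episode of observations.
episode = rng.normal(size=(50, obs_dim))

# Hidden state warmed up on the 4 steps before step 10, then a 5-step window.
h0, window = sample_subtrajectory(episode, start=10, length=5, scan_len=4,
                                  W_h=W_h, W_x=W_x, b=b)

# At the start of an episode there is no history, so the state stays zero.
h_empty, _ = sample_subtrajectory(episode, start=0, length=5, scan_len=4,
                                  W_h=W_h, W_x=W_x, b=b)
```

The point of the design is that a subtrajectory sampled from the middle of an episode no longer starts from an uninformative zero state; how long the scan should be, and whether gradients flow through it, are choices this sketch leaves open.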

Comments:	Published in IEEE proceedings: 2018 15th International Conference on Control, Automation, Robotics and Vision (IEEE-ICARCV)
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:1710.02896 [cs.AI]
	(or arXiv:1710.02896v6 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1710.02896
Journal reference:	The Institute of Electrical and Electronics Engineers 2018 15th International Conference on Control, Automation, Robotics and Vision (IEEE-ICARCV)
Related DOI:	https://doi.org/10.1109/ICARCV.2018.8581309

Submission history

From: Doo Re Song
[v1] Sun, 8 Oct 2017 22:38:34 UTC (1,809 KB)
[v2] Fri, 19 Jan 2018 10:12:48 UTC (1,723 KB)
[v3] Sun, 6 May 2018 16:54:06 UTC (2,614 KB)
[v4] Sat, 11 Aug 2018 09:55:39 UTC (2,665 KB)
[v5] Thu, 13 Sep 2018 08:40:02 UTC (2,622 KB)
[v6] Sun, 15 Dec 2019 11:03:01 UTC (3,265 KB)
