End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

Zhou, Li; Small, Kevin; Rokhlenko, Oleg; Elkan, Charles

arXiv:1712.02838 (cs)

[Submitted on 7 Dec 2017]

Abstract: Learning a goal-oriented dialog policy is generally performed offline with supervised learning algorithms or online with reinforcement learning (RL). Additionally, as companies accumulate massive quantities of dialog transcripts between customers and trained human agents, encoder-decoder methods have gained popularity as agent utterances can be directly treated as supervision without the need for utterance-level annotations. However, one potential drawback of such approaches is that they myopically generate the next agent utterance without regard for dialog-level considerations. To resolve this concern, this paper describes an offline RL method for learning from unannotated corpora that can optimize a goal-oriented policy at both the utterance and dialog level. We introduce a novel reward function and use both on-policy and off-policy policy gradient to learn a policy offline without requiring online user interaction or an explicit state space definition.

Comments:	Workshop on Conversational AI, NIPS 2017, Long Beach, CA, USA
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1712.02838 [cs.AI]
	(or arXiv:1712.02838v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1712.02838

Submission history

From: Li Zhou
[v1] Thu, 7 Dec 2017 19:52:50 UTC (18 KB)
