Implementing Chatbots
using Deep Learning.
By : Rohan Chikorde
Introduction
What is a CHATBOT?
 A chat robot, a computer program that simulates human
conversation, or chat, through artificial intelligence.
 It is a service, powered by rules and artificial intelligence,
that you interact with via a chat interface.
 The service could be any number of things, ranging from
functional to fun, and it could live in any major chat product
(Facebook Messenger, Slack, Telegram, Text Messages, etc).
List of best AI Chatbots:
 Mitsuku (Loebner Prize Winner) - Prize in AI for Chatbots in 2013
 Jabberwacky
 PersonalityForge
 Botser
 Cleverbot
* http://www.techstext.com/list-of-best-chatbots-to-converse/
Types of Chatbot
 RETRIEVAL-BASED MODELS -
o Uses a repository of predefined responses and some kind of heuristic to
pick an appropriate response based on the input and context.
o The heuristic could be as simple as a rule-based expression match, or as
complex as an ensemble of Machine Learning classifiers.
 GENERATIVE MODELS-
o Does not rely on predefined responses. It generates a new response word by word, so you don’t have
to be ridiculously specific when you are talking to it. It models
language, not just commands.
o It is trained on conversation data, so it can get smarter as it learns from more conversations
with people.
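As a minimal sketch of the simplest retrieval-based heuristic mentioned above (a rule-based expression match), the Python below picks a predefined response with regular expressions. The rule table, the canned responses and the name retrieve_response are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical repository of predefined responses: (pattern, canned response).
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bcheck[- ]?out\b", re.IGNORECASE), "Check-out is at 11 am."),
    (re.compile(r"\b(wifi|wi-fi|internet)\b", re.IGNORECASE), "Wi-Fi is free in all rooms."),
]
FALLBACK = "Sorry, I did not understand that."


def retrieve_response(user_input):
    """Return the first predefined response whose pattern matches the input."""
    for pattern, response in RULES:
        if pattern.search(user_input):
            return response
    return FALLBACK


print(retrieve_response("Hey there"))
print(retrieve_response("When is checkout?"))
```

A generative model would replace the fixed table with a network that produces the response text itself.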
Open Domain vs. Closed Domain
 In an open domain setting, the user can take the
conversation anywhere. There isn’t necessarily a well-defined
goal or intention.
Ex: Conversation about refinancing one’s mortgage
 In a closed domain setting, the space of possible inputs and
outputs is somewhat limited because the system is trying to
achieve a very specific goal.
Ex : Hotel’s Customer Support or Shopping Assistants
Long vs Short Conversations
 The longer the conversation, the more difficult it is to automate, because the system needs to keep track of
what has been said.
Ex: Customer support conversations.
 Short-text conversations are easier: the goal is to create a single response to a single input.
Ex: What is your name?
Implementing a Retrieval-Based Model in TensorFlow
Architecture of AI Chatbot
Retrieval Based Model
 The vast majority of production systems today are retrieval-based, or a combination of
retrieval-based and generative models.
 Generative models are an active area of research, but they are not yet reliable enough for most production use.
 For building a hotel’s customer support bot, the best bet right now is most likely a retrieval-based
model.
The Ubuntu Dialog Corpus
 The Ubuntu Dialog Corpus (UDC) is one of the largest public dialog datasets available.
 It’s based on chat logs from the Ubuntu channels on a public IRC network.
 The training data consists of 1,000,000 examples, 50% positive (label 1) and 50% negative
(label 0).
 Each example consists of a context, the conversation up to this point, and an utterance, a
response to the context.
Data Pre-processing
 The dataset originally comes in CSV format. We could work directly with CSVs, but it’s better
to convert our data into TensorFlow’s own Example format.
 The main benefit of this format is that it allows us to load tensors directly from the input files
and let TensorFlow handle all the shuffling, batching and queuing of inputs.
 As part of the preprocessing, we also create a vocabulary. This means we map each word to an integer number, e.g. “cat” may become 2631. The
TFRecord files that we generate store these integer numbers instead of the word strings. It’s
better to save the vocabulary so that we can map back from integers to words later on.
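The vocabulary step can be sketched in plain Python as below. This is an illustration only, not the preprocessing script used for the Ubuntu corpus: the function names, the whitespace tokenizer and the reserved id 0 for unknown words are all assumptions.

```python
from collections import Counter


def build_vocabulary(texts, min_count=1):
    """Map each word to an integer id; id 0 is reserved for unknown words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending frequency, then alphabetically, so ids are reproducible.
    words = sorted((w for w, c in counts.items() if c >= min_count),
                   key=lambda w: (-counts[w], w))
    word_to_id = {"<unk>": 0}
    for word in words:
        word_to_id[word] = len(word_to_id)
    return word_to_id


def encode(text, word_to_id):
    """Turn a sentence into a list of integer ids."""
    return [word_to_id.get(w, 0) for w in text.lower().split()]


def decode(ids, word_to_id):
    """Map integer ids back to words using the saved vocabulary."""
    id_to_word = {i: w for w, i in word_to_id.items()}
    return " ".join(id_to_word[i] for i in ids)


vocab = build_vocabulary(["the cat sat", "the dog sat down"])
ids = encode("the cat ran", vocab)
print(ids, "->", decode(ids, vocab))
```

Saving word_to_id alongside the data is what makes the decode step possible later on.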
Deep Learning Model
 One deep learning model for building a chatbot is the Dual Encoder LSTM network.
 There are many other deep learning architectures; it’s an active research area.
 A seq2seq model, often used in machine translation, would probably also do well on this task.
Implementation…
 tf-idf predictor
o tf-idf stands for “term frequency – inverse document frequency” and it measures how important a
word in a document is relative to the whole corpus.
o Documents that have similar content will have similar tf-idf vectors.
o Intuitively, if a context and a response have similar words they are more likely to be a correct pair.
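A tf-idf predictor of this kind could look roughly like the sketch below, which ranks candidate responses by the cosine similarity of their tf-idf vectors to the context. It uses only the standard library. The smoothed idf formula is one common variant, and the function names and toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every corpus word."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse tf-idf vector as a dict; words outside the corpus are ignored."""
    counts = Counter(w for w in tokenize(text) if w in idf)
    return {w: tf * idf[w] for w, tf in counts.items()}


def cosine(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_responses(context, candidates, idf):
    """Return candidate indices sorted from most to least similar to the context."""
    c_vec = tfidf_vector(context, idf)
    scores = [cosine(c_vec, tfidf_vector(r, idf)) for r in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


corpus = ["how do i install the driver", "install the driver with apt",
          "my screen is black", "try rebooting the machine"]
idf = fit_idf(corpus)
candidates = ["try rebooting the machine", "install the driver with apt"]
print(rank_responses("how do i install the driver", candidates, idf))
```

Such a predictor needs no training beyond counting words, which makes it a useful baseline to compare a neural model against.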
Dual Encoder LSTM Model
Working of Dual Encoder LSTM
 Both the context and the response text are split by words, and each word is embedded into a
vector. The word embeddings are initialized with Stanford’s GloVe vectors and are fine-tuned during
training.
 Both the embedded context and response are fed into the same Recurrent Neural Network
word by word. The RNN generates a vector representation that, loosely speaking, captures the “meaning”
of the context and response (c and r).
 It then multiplies c with a matrix M to “predict” a response r’. The matrix M is learned during
training.
 It measures the similarity of the predicted response r’ and the actual response r by taking the dot
product of these two vectors. A large dot product means the vectors are similar and that the
response should receive a high score.
 Then it applies a sigmoid function to convert that score into a probability.
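The scoring steps above (predict r’ from c and M, take the dot product with r, apply a sigmoid) can be sketched without any deep learning library. The encoder that produces c and r is omitted here. The vectors and the matrix are toy values, and taking r’ = cᵀM as the multiplication order is an assumption.

```python
import math


def predict_response_vector(c, M):
    """Compute r' = c^T M, i.e. r'[j] = sum over i of c[i] * M[i][j]."""
    return [sum(c[i] * M[i][j] for i in range(len(c))) for j in range(len(M[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def match_probability(c, r, M):
    """Probability that response vector r is a correct reply to context vector c."""
    r_pred = predict_response_vector(c, M)
    return sigmoid(dot(r_pred, r))


# Toy values: in the real model c and r come from the RNN and M is learned.
M = [[1.0, 0.0], [0.0, 1.0]]
c = [1.0, 0.0]
print(match_probability(c, [1.0, 0.0], M))   # aligned response, probability above 0.5
print(match_probability(c, [-1.0, 0.0], M))  # opposed response, probability below 0.5
```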
Creating an Input Function
 In order to use TensorFlow’s built-in support for training and evaluation we need to create an
input function: a function that returns batches of our input data.
 In fact, because our training and test data have different formats, we need different input
functions for them. The input function should return a batch of features and labels.
Steps:
 On a high level, the function does the following:
o Create a feature definition that describes the fields in our Example file
o Read records from the input_files with tf.TFRecordReader
o Parse the records according to the feature definition
o Extract the training labels
o Batch multiple examples and training labels
o Return the batched examples and training labels
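The steps above can be mirrored in plain Python to show the data flow. This is an analogy, not TensorFlow code: an in-memory list of dicts stands in for the records read with tf.TFRecordReader, and FEATURE_SPEC, parse_record and input_fn are hypothetical names.

```python
# Hypothetical feature definition: field name -> expected Python type.
FEATURE_SPEC = {"context": list, "utterance": list, "label": int}


def parse_record(record, feature_spec=FEATURE_SPEC):
    """Keep only the declared fields and check their types."""
    parsed = {}
    for name, expected_type in feature_spec.items():
        value = record[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"field {name!r} should be {expected_type.__name__}")
        parsed[name] = value
    return parsed


def input_fn(records, batch_size):
    """Yield (features, labels) batches from an iterable of raw records."""
    features, labels = [], []
    for record in records:
        parsed = parse_record(record)
        labels.append(parsed.pop("label"))  # extract the training label
        features.append(parsed)
        if len(features) == batch_size:
            yield features, labels
            features, labels = [], []
    if features:  # final, possibly smaller, batch
        yield features, labels


records = [
    {"context": [2, 3], "utterance": [4], "label": 1},
    {"context": [5], "utterance": [6, 7], "label": 0},
    {"context": [8], "utterance": [9], "label": 1},
]
for batch_features, batch_labels in input_fn(records, batch_size=2):
    print(len(batch_features), batch_labels)
```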
Creating the Model
 As we have different formats of training and evaluation data we have to create a function
wrapper that takes care of bringing the data into the right format.
 It takes a model argument, which is a function that actually makes predictions.
 In our case it’s the Dual Encoder LSTM, but we could easily swap it out for some other neural
network.
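One way to picture such a wrapper is the sketch below. The two data formats are assumptions (training: one context/utterance pair per example, evaluation: one context with a list of candidate utterances), and word_overlap_model is a stand-in scorer used in place of the Dual Encoder LSTM so the example runs on its own.

```python
def create_model_fn(model_impl):
    """Wrap model_impl so callers can pass either training or evaluation batches."""

    def model_fn(features, mode="train"):
        if mode == "train":
            # Assumed training format: one (context, utterance) pair per example.
            contexts = features["context"]
            utterances = features["utterance"]
        else:
            # Assumed evaluation format: one context with several candidates.
            # Flatten it into the same pairwise layout used for training.
            contexts, utterances = [], []
            for context, candidates in zip(features["context"], features["candidates"]):
                for candidate in candidates:
                    contexts.append(context)
                    utterances.append(candidate)
        return model_impl(contexts, utterances)

    return model_fn


def word_overlap_model(contexts, utterances):
    """Stand-in scorer: fraction of utterance words that also occur in the context."""
    scores = []
    for context, utterance in zip(contexts, utterances):
        c_words, u_words = set(context.split()), utterance.split()
        scores.append(sum(w in c_words for w in u_words) / max(len(u_words), 1))
    return scores


model_fn = create_model_fn(word_overlap_model)
print(model_fn({"context": ["a b c"], "utterance": ["a b"]}, mode="train"))
print(model_fn({"context": ["a b c"], "candidates": [["a x", "y z"]]}, mode="eval"))
```

Because the wrapper only depends on the call signature of model_impl, swapping in another network means passing a different function.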
Evaluating the model & making Predictions
 After training the model we can evaluate it on the test set.
 This will run the evaluation metrics on the test set instead of the validation set.
 We will get probability scores for unseen data.
 We could imagine feeding in 100 potential responses to a context and then picking the one
with the highest score.
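Picking the highest-scoring candidate can be sketched as follows. The score function is passed in as an argument. In practice it would be the trained model, while shared_word_score here is only a toy stand-in so the example is self-contained.

```python
def best_response(context, candidates, score_fn):
    """Score every candidate against the context and return the best one."""
    scores = [score_fn(context, candidate) for candidate in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index], scores


def shared_word_score(context, candidate):
    """Toy stand-in for the model: number of candidate words found in the context."""
    context_words = set(context.lower().split())
    return sum(word in context_words for word in candidate.lower().split())


candidates = ["reboot", "install java with apt", "no idea"]
reply, scores = best_response("how do i install java", candidates, shared_word_score)
print(reply, scores)
```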
References
 The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn
Dialogue Systems
o https://arxiv.org/abs/1506.08909
 Artificial Intelligence Markup Language (AIML)
o http://alice.sunlitsurf.com/alice/aiml.html
 Intelligent Chat Bot for Banking System
o http://www.ijettcs.org/Volume4Issue5(2)/IJETTCS-2015-10-09-16.pdf
 WILDML, Deep Learning for Chatbot
o http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/
Thank You
