

e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


Volume:02/Issue:08/August-2020 Impact Factor- 5.354 www.irjmets.com

AUTOMATED CHATBOT IMPLEMENTED USING NATURAL LANGUAGE PROCESSING
Naveen S*1
*1Department of Computer Science, Anna University, Chennai, India.
E-mail : naveenatt99@gmail.com
ABSTRACT
This paper presents a Chatbot that answers customer queries automatically. Companies usually employ a
backend team to answer customers' questions, which is time-consuming and tedious work. The Chatbot
was created to solve this problem. The frequently asked customer questions and their corresponding
answers are stored in a text file. The model takes the customer's question as input, pre-processes it
with Natural Language Processing techniques that include tokenization, lemmatization, and stemming,
computes the cosine similarity between the question and each stored answer, and assigns each answer a
score. The answer with the highest score is returned as the answer to the question. The answer text file
varies from company to company, since the questions depend on the products each company offers. The
main purpose of the Chatbot is therefore to achieve high accuracy by providing a correct and satisfying
answer to each customer question for a company. This work will be useful to multinational companies by
providing a Chatbot model that outputs accurate and satisfying answers to the questions asked by their
customers.
KEYWORDS: Natural Language Processing; Tokenization; Lemmatisation; Stemming; Cosine Similarity;
Term Frequency-Inverse Document Frequency.
I. INTRODUCTION
When customers buy a product from a company, they often have many questions about the details of the
product. To answer them, companies hire a backend team that responds to customer queries. This is
generally a hectic job and requires a large team. Hence, the motive is to provide a Chatbot that will
answer the customers' questions [2]. This saves time and a large amount of money. Nowadays, companies
are replacing their backend teams with Chatbots. A Chatbot can also be useful in fields such as
academics, real estate, and marketing, where many queries need to be answered.
In this model, the answers to frequently asked questions are stored in a text file, and each answer is
assigned a score based on the question given by the user [1]. The answer with the highest score is
returned as the answer to the user's query. The main techniques used in the score calculation are
Cosine Similarity and the TF-IDF approach. With these techniques, the Chatbot aims to return
high-precision answers to the user, which is the main goal of this Chatbot model [3].
II. SYSTEM ARCHITECTURE

Fig.1: Proposed Model for converting Answer Text Sheet to a pre-processed answer text


Fig.2: Final Proposed Model of Chatbot using Natural Language Processing


III. PSEUDO CODE FOR THE MODEL
Algorithm for providing the score for the answers based on the customer's question:
1. Start
CLASS PROCESSING:
2. FUNCTION initialization():
    a. Start
    b. Open the Chatbot dataset in read mode
    c. Assign the data read to a variable Raw
    d. Raw = Raw.lower()
    e. Store the sentence tokens in sen_tok
    f. Store the word tokens in word_tok
    g. Lemmatize the tokens and remove punctuation
3. FUNCTION LemTokens(): // Lemmatization
    a. Lemmatize the tokens
    b. END
4. FUNCTION stemTokens(): // Stemming
    a. Stem the tokens
    b. END
5. FUNCTION greeting(sentence):
    a. Create GREET_INPUTS = array ("hello", "hi", "greetings", "sup", "hey").
    b. Create GREET_RESPONSES = array ("hi", "hey", "hi there", "hello", "I am glad! You are talking to me").
    c. FOR word in sentence:
        i. IF word in GREET_INPUTS:
            1. RETURN random choice in GREET_RESPONSES
    d. END
CLASS USER:
6. FUNCTION response(user_responses):
    a. APPEND user_responses to sen_tok
    b. CREATE a variable Tf-idVector to store the vectors // TF-IDF approach
    c. STORE the TF-IDF vectors of sen_tok in Tf-idVector
    d. STORE the cosine similarity between the question and each entry of sen_tok in VAL // Cosine similarity
    e. SORT the values in VAL and STORE them in FLAT
    f. STORE the index value of the best-matching answer
    g. FIND the SCORE in FLAT[-2] // FLAT[-1] is the appended question matched against itself
    h. IF SCORE == 0:
        i. Response = "I am sorry! I don't understand you"
        ii. ADD the unknown question to the question dataset
        iii. RETURN Response
    i. ELSE:
        i. Response = sen_tok[index]
        ii. RETURN Response
    j. END
7. FUNCTION Chatbot():
    a. CALL PROCESSING
    b. FUNCTION send():
        i. GET user_responses.
        ii. IF user_responses != "bye":
            1. IF user_responses = "Thank You":
                a. END
            2. ELSE IF user_responses contains a word in GREET_INPUTS:
                a. CALL greeting(user_responses).
            3. ELSE:
                a. CALL response(user_responses).
        iii. ELSE:
            1. END
    c. END FUNCTION send().
    d. CREATE an executable file for the Chatbot.
    e. RUN the user_responses on the executable file of Chatbot.
    f. END.
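
The greeting and dispatch steps of the pseudocode could be sketched in Python roughly as follows. This is only an illustrative sketch: the function name reply, the answer_fn callback (a stand-in for response()), and the two farewell strings are assumptions that do not appear in the paper.

```python
import random

# Greeting vocabulary taken from the pseudocode.
GREET_INPUTS = ("hello", "hi", "greetings", "sup", "hey")
GREET_RESPONSES = ("hi", "hey", "hi there", "hello",
                   "I am glad! You are talking to me")


def greeting(sentence):
    """Return a random greeting if any word of the sentence is a greeting, else None."""
    for word in sentence.lower().split():
        if word.strip(".,!?") in GREET_INPUTS:
            return random.choice(GREET_RESPONSES)
    return None


def reply(user_response, answer_fn):
    """Dispatch one user message and return (text, keep_running).

    answer_fn is a hypothetical stand-in for response() in the pseudocode.
    The farewell strings below are placeholders, not taken from the paper.
    """
    text = user_response.strip().lower()
    if text == "bye":
        return "Bye!", False
    if text in ("thanks", "thank you"):
        return "You are welcome.", False
    greet = greeting(text)
    if greet is not None:
        return greet, True
    return answer_fn(text), True
```

Passing the answer function as a parameter keeps the dispatch logic separate from the scoring step, so the same loop can be exercised with a dummy function before the TF-IDF scoring is wired in.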

Fig.3: Final Output after implementing the above algorithm


IV. TOKENIZATION
NLP pre-processing techniques are generally used to decrease processing time. In this Chatbot model
[1], the most important pre-processing technique is tokenization. In tokenization, each word in the
given sentence is separated out and stored in a list of words. The technique is applied to both the
question and the answers: the tokens of the question are stored in one list and the tokens of the
answers in separate lists [5], so that the lists can be compared and scored easily. Tokenization helps
the model assign accurate scores when the tokenized words of the question are compared with the
tokenized words of the answers [4]. It also speeds up the subsequent lemmatisation and stemming steps.
All of these pre-processing techniques are available in the Natural Language Toolkit (NLTK).
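
As a rough illustration of the two tokenization steps, the sketch below splits text into sentences and then into lower-case word tokens. It uses only Python's standard re module with simple regular expressions rather than NLTK's tokenizers, so the splitting rules, the function names, and the sample texts are assumptions made for this example.

```python
import re


def sentence_tokens(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_tokens(sentence):
    """Lower-case a sentence and return its words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


# Hypothetical answer file content and customer question.
answers = "The warranty lasts two years. Returns are accepted within 30 days."
question = "How long is the warranty?"

# One token list per stored answer, and one list for the question.
answer_token_lists = [word_tokens(s) for s in sentence_tokens(answers)]
question_token_list = word_tokens(question)
```

Keeping the question tokens and the answer tokens in separate lists mirrors the description above and makes the later comparison step a matter of iterating over the answer lists.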

Fig.4: The above figure shows the tokenization into words

Fig.5: The above figure shows the tokenization into sentences


V. LEMMATISATION AND STEMMING
To further decrease processing time, the model also applies lemmatization and stemming. Both
techniques serve the same purpose: they reduce the number of distinct word forms in a sentence by
mapping each word to its root. For example, if a sentence contains the words Playing, Play, and Played,
these techniques treat all three as the single word Play, which reduces the number of words stored in
the list and increases the processing speed. Stemming does this by stripping affixes, while
lemmatization looks up the dictionary form of the word. These techniques also find the root word for
adjectival forms of a word. This kind of pre-processing is commonly used in models where Natural
Language Processing plays a major role. In this Chatbot model, it helps reduce the processing time.
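
The Playing/Play/Played example can be reproduced with a toy suffix-stripping function, shown below as a minimal sketch. The rule set and the name toy_stem are assumptions for illustration; this is neither Porter's algorithm nor a lemmatizer, and a real system would more likely rely on NLTK's PorterStemmer or WordNetLemmatizer.

```python
def toy_stem(word):
    """Strip a few common English suffixes; a toy rule set, not Porter's algorithm."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words such as "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = ["Playing", "Play", "Played", "plays"]
roots = [toy_stem(t) for t in tokens]

# All four surface forms collapse to one vocabulary entry.
vocabulary = sorted(set(roots))
```

Collapsing the surface forms shrinks the vocabulary that the scoring step has to handle, which is the speed-up the section describes.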
VI. COSINE SIMILARITY AND TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
After pre-processing comes the central part of the Chatbot model. Once the question and all the
answers have been pre-processed into separate lists, they need to be compared to obtain the scores.
This is done using the TF-IDF approach together with Cosine Similarity. The first step is to rescale
the frequency of each word by how often it appears across all documents, so that words such as “the”,
which are frequent in every document, are penalized. This approach is called Term Frequency-Inverse
Document Frequency, or TF-IDF [6]. To generate a response from the Chatbot for an input question,
Cosine Similarity is then used. A function is defined that searches the user’s question for one or more
known tokens and returns one of several possible answers. Each answer is given a score based on the
occurrence of the question's tokens in it: the more often the tokens of the question occur in an answer,
the higher that answer's score. After every answer has been scored, the answer with the highest score is
displayed as the answer to the question.
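
A minimal, standard-library sketch of this scoring step is given below. It is not the code of Fig.6: the function names, the smoothed IDF formula (a common variant), the whitespace tokenization, and the sample answers are assumptions. Unlike the pseudocode, the question is compared only against the stored answers, so the top score is used directly instead of FLAT[-2].

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (dict term -> weight) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF: terms found in every document get the lowest weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_answer(question, answers):
    """Score every stored answer against the question and return (answer, score)."""
    docs = [a.lower().split() for a in answers] + [question.lower().split()]
    vectors = tfidf_vectors(docs)
    scores = [cosine(vectors[-1], vec) for vec in vectors[:-1]]
    top = max(range(len(answers)), key=scores.__getitem__)
    if scores[top] == 0:
        # No shared token with any stored answer.
        return "I am sorry! I don't understand you", 0.0
    return answers[top], scores[top]


# Hypothetical stored answers.
answers = [
    "the warranty lasts two years",
    "returns are accepted within thirty days",
]
print(best_answer("how long is the warranty", answers))
print(best_answer("hello", answers))
```

The first call selects the warranty answer because it shares the tokens "the" and "warranty" with the question; the second shares no token with any answer, so the fallback message is returned with a score of zero.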

Fig.6: Code for the cosine similarity and TF-IDF approach


VII. CONCLUSION
Technology has grown rapidly in recent years, and every field in the technology market is adapting to
newer technology. Customer satisfaction plays a major role in a company's improvement. Outdated call
centers therefore need to be replaced by Chatbots that provide highly accurate answers and good
customer satisfaction. The Chatbot reduces manual work and drastically reduces the cost a company
spends on a call center or backend team. If a question cannot be answered, the answer dataset can be
updated to cover the new question, which increases accuracy. This Chatbot model will therefore be
helpful in all fields that require complete customer satisfaction, leaving customers without any
unanswered questions about the product.
ACKNOWLEDGEMENT
I would like to thank Dr. D. Indhumathi (Mentor), PSG College of Technology, for supporting my work,
and the other faculty and staff of the Department of Computer Science and Engineering, the students,
and my colleagues who helped me in publishing my work.
VIII. REFERENCES
[1] Abu-Jbara, A., Ezra, J., and Radev, D. R., 2013. Purpose and polarity of citation: Towards nlp-based
bibliometrics. In HLT-NAACL, Atlanta, Georgia, USA, Association for Computational Linguistics,
pp. 596–606.
[2] Feldman, S. (1999). NLP Meets the Jabberwocky: Natural Language Processing in Information
Retrieval. ONLINE-WESTON THEN WILTON-, 23, 62-73.
[3] Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing (Vol.
999). Cambridge: MIT press.
[4] Menaka. Text Classification using Keyword Extraction Technique, Corpus ID: 212463857, June
2014.
[5] Saif M. Mohammad. 2020b. Nlp scholar: A dataset for examining the state of nlp research. In
Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), Marseille,
France.
[6] X. Chen, H. Xie, F. Wang, Z. Liu, J. Xu, and T. Hao, “Natural Language Processing in Medical Research:
A Bibliometric Analysis,” BMC Medical Informatics and Decision Making, vol. 18, supplement 1, no.
14, 2018.
[7] X. Schmitt, S. Kubler, J. Robert, M. Papadakis, and Y. Le Traon, “A Replicable Comparison Study of
NER Software: StanfordNLP, NLTK, OpenNLP, SpaCy, Gate,” 2019 Sixth International Conference on
Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 2019, pp. 338-343.
