Authors:
Diogo Campos (1); Rodrigo Rocha Silva (2) and Jorge Bernardino (3)
Affiliations:
(1) Polytechnic of Coimbra - ISEC, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal
(2) Centre of Informatics and Systems of University of Coimbra, Pinhal de Marrocos, 3030-290 Coimbra, Portugal; FATEC Mogi das Cruzes, São Paulo Technological College, 08773-600 Mogi das Cruzes, Brazil
(3) Polytechnic of Coimbra - ISEC, Rua Pedro Nunes, Quinta da Nora, 3030-199 Coimbra, Portugal; Centre of Informatics and Systems of University of Coimbra, Pinhal de Marrocos, 3030-290 Coimbra, Portugal
Keyword(s):
Text Mining, Sentiment Analysis, Text Cube, Machine Learning, Stemming.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Mining Text and Semi-Structured Data; Pre-Processing and Post-Processing for Data Mining; Soft Computing; Symbolic Systems
Abstract:
Text Mining is the process of extracting interesting and non-trivial patterns or knowledge from unstructured text documents. Hotel reviews are used by hotels to verify client satisfaction with their services and facilities. However, this type of large, unstructured data cannot be handled manually, so OLAP techniques and Text Cubes should be used to model and manage text data. This raises a further problem: the reviews must be separated into two classes, positive and negative, and for that we use Sentiment Analysis techniques. Nevertheless, do we really need all the words of a review to classify it correctly? In this paper, we study the impact of word restriction on text classification. To do so, we create word domains (sets of words that belong to a Hotel Domain). First, we use an algorithm that pre-processes the text, in which our created domains are used in the same way as stop words. In the experimental evaluation, we use four classifiers to classify the text: Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machine.
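The following is a minimal sketch of word restriction followed by sentiment classification, in standard-library Python. The abstract does not say whether domain words are kept or discarded during pre-processing; this sketch assumes restriction means keeping only tokens that appear in a domain word list. The word list `HOTEL_DOMAIN`, the helper `preprocess`, the class `NaiveBayes`, and the toy reviews are all hypothetical illustrations, not the authors' domains, algorithm, or data. The classifier is a plain multinomial Naïve Bayes with add-one smoothing, standing in for just one of the four classifiers named above.

```python
import math
import re
from collections import Counter

# Hypothetical hotel-domain vocabulary (assumption: not the paper's actual domains).
HOTEL_DOMAIN = {"room", "staff", "breakfast", "clean", "dirty", "friendly",
                "rude", "location", "noisy", "comfortable", "bed", "price"}


def preprocess(review, domain=HOTEL_DOMAIN):
    """Lowercase, tokenize, and keep only tokens that belong to the domain."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t in domain]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Log prior: fraction of training documents in class c.
            self.priors[c] = math.log(len(class_docs) / len(docs))
            counter = Counter(t for d in class_docs for t in d)
            self.counts[c] = counter
            self.totals[c] = sum(counter.values())
            self.vocab.update(counter)
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def score(c):
            # Sum of smoothed log-likelihoods; unseen words are ignored.
            return self.priors[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in doc if t in self.vocab)

        return max(self.classes, key=score)


train = [
    ("The staff were friendly and the room was clean", "positive"),
    ("Comfortable bed and great location", "positive"),
    ("Dirty room and rude staff", "negative"),
    ("Noisy room, the price was too high", "negative"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(docs, labels)
print(model.predict(preprocess("Friendly staff and a clean room")))  # prints "positive"
```

Restricting the vocabulary shrinks each review to the words the domain considers relevant (for example, "great" and "high" are dropped above), which is the trade-off the paper sets out to measure against using every word.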