Abstract
Most sentiment analysis research relies on supervised machine learning techniques. Analyzing the sentiment of English text reviews to gauge public perception and acceptance of a particular issue is a non-trivial task. However, few studies have analyzed the sentiment of Malay news headlines, owing to a lack of resources and tools. Malay news headlines normally consist of only a few words and are often written creatively to attract readers' attention. This paper proposes a standard framework that investigates factors affecting sentiment prediction of Malay news headlines using machine learning approaches. Investigating the factors that influence prediction performance (e.g., the type of classifier, the proximity measure, and the number of nearest neighbors, k) is important because it identifies the parameters that can be tuned to optimize that performance. Based on the results obtained, the Support Vector Machine and Naïve Bayes classifiers achieved higher accuracy than the k-Nearest Neighbors (k-NN) classifier. In terms of the proximity measure and the number of nearest neighbors, k, the k-NN classifier achieved higher prediction performance when Cosine similarity was applied with a small value of k (e.g., 3 or 5) than when the Euclidean distance was used, because the Euclidean distance can be affected by the high dimensionality of the data.
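The two proximity measures compared for the k-NN classifier can be illustrated with a minimal sketch. It is not the paper's implementation: the bag-of-words representation, the function names (`bag_of_words`, `cosine_similarity`, `euclidean_distance`, `knn_predict`), and the tiny English training set are all assumptions invented for illustration, and the example makes no claim about the accuracy figures reported in the paper.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for this sketch (not the paper's corpus).
TRAIN = [
    ("profit rises strongly", "positive"),
    ("team wins final", "positive"),
    ("economy grows again", "positive"),
    ("flood destroys homes", "negative"),
    ("prices rise sharply hurting families", "negative"),
    ("company loses money", "negative"),
]

def bag_of_words(text):
    """Represent a headline as a sparse term-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two sparse vectors (0.0 = identical)."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in terms))

def knn_predict(query, training, k=3, metric="cosine"):
    """Majority vote among the k training headlines closest to the query."""
    q = bag_of_words(query)
    scored = []
    for text, label in training:
        v = bag_of_words(text)
        if metric == "cosine":
            score = -cosine_similarity(q, v)  # negate so smaller means closer
        else:
            score = euclidean_distance(q, v)
        scored.append((score, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# A vector and its doubled copy point in the same direction, so cosine
# similarity treats them as identical while Euclidean distance does not.
short = Counter({"flood": 1, "homes": 1})
doubled = Counter({"flood": 2, "homes": 2})
print(cosine_similarity(short, doubled), euclidean_distance(short, doubled))
print(knn_predict("team wins", TRAIN, k=1, metric="cosine"))
```

Cosine similarity depends only on the direction of the term vectors, whereas Euclidean distance also reacts to their magnitude and to every term that appears in only one of the two headlines. That difference is one plausible reading of why the measures behave differently on short, sparse, high-dimensional headline vectors.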
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Alfred, R., Yee, W.W., Lim, Y., Obit, J.H. (2016). Factors Affecting Sentiment Prediction of Malay News Headlines Using Machine Learning Approaches. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_26
DOI: https://doi.org/10.1007/978-981-10-2777-2_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2776-5
Online ISBN: 978-981-10-2777-2
eBook Packages: Computer Science, Computer Science (R0)