Abstract
Financial sentiment analysis is a challenging problem because the market is influenced by many factors, such as company-specific and political news, the sentiment and opinions of users, and other regional financial markets. Good news can push the market up, while bad news can drag it down. It is therefore crucial to understand the impact of news and social media on stock market trends. Motivated by this, this paper develops an effective and efficient company-specific financial sentiment analysis model that can detect trends in a company’s stock price. More specifically, we develop a novel neural network model that transforms pretrained general word embeddings into domain-specific embeddings. In addition, we use a knowledge base to enrich the training vocabulary and thus extend the domain-specific embedding space. A major challenge for natural language processing (NLP) applications is learning representations for rare and unseen words. Another challenge for financial sentiment analysis models, also addressed in this paper, is handling words whose polarity changes with the domain in which they are used. We thoroughly evaluate the proposed model on the benchmark dataset of the SemEval-2017 shared task on financial sentiment analysis. The experimental results show that the proposed model delivers state-of-the-art performance on the Twitter and news headlines datasets, demonstrating its feasibility and effectiveness.
Cite this article
Agarwal, B. Financial sentiment analysis model utilizing knowledge-base and domain-specific representation. Multimed Tools Appl 82, 8899–8920 (2023). https://doi.org/10.1007/s11042-022-12181-y
DOI: https://doi.org/10.1007/s11042-022-12181-y