In the sentiment classifier evaluation phase, we report the results of our experiments on Thai financial news sentiment classification using the machine learning techniques naive Bayes, random forest, and SVM and the deep learning techniques CNN and LSTM. Each classifier was evaluated with four feature representations (TF-IDF, bag-of-words, Word2Vec, and BERT) in unigram and bigram settings, under both 5-fold and 10-fold cross-validation. Classifier performance is reported as overall accuracy together with the f1-score, precision, and recall of each sentiment class (positive, neutral, and negative).
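Throughout this section, the per-class scores follow the standard definitions: for a sentiment class $c$ with true positives $TP_c$, false positives $FP_c$, and false negatives $FN_c$,
$$\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad \mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \qquad \mathrm{F1}_c = \frac{2\,\mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}.$$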
4.2.1 First Experiment.
In the first experiment, we used 5-fold cross-validation for each classification model; the results are as follows.
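As a minimal sketch of this evaluation setup (not the authors' implementation), the following assumes scikit-learn and a hypothetical load_thai_news() helper returning word-segmented Thai headlines and their sentiment labels; it wires unigram TF-IDF features into a naive Bayes classifier and collects accuracy and per-class precision, recall, and f1-scores over five stratified folds:

# Minimal 5-fold evaluation sketch (assumed setup, not the authors' code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical loader: space-joined, word-segmented Thai headlines and labels.
texts, labels = load_thai_news()

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 1)),  # unigram TF-IDF representation
    MultinomialNB(),                      # naive Bayes classifier
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
predicted = cross_val_predict(model, texts, labels, cv=cv)

# Accuracy plus per-class precision, recall, and f1-score, as in Tables 7 to 11.
print(classification_report(labels, predicted, digits=2))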
For the naive Bayes classifier, the first unigram experiment combined with TF-IDF provides accuracy at 63.37%, whereas positive sentiment gives an f1-score of 0.45, a precision score of 0.36, and a recall score of 0.58. Neutral sentiment gives an f1-score of 0.65, a precision score of 0.66, and a recall score of 0.64. Negative sentiment gives an f1-score of 0.64, a precision score of 0.69, and a recall score of 0.60. The second experiment using bag-of-words provides accuracy at 62.14%, whereas positive sentiment gives an f1-score of 0.67, a precision score of 0.62, and a recall score of 0.73. Neutral sentiment gives an f1-score of 0.53, a precision score of 0.55, and a recall score of 0.51. Negative sentiment gives an f1-score of 0.68, a precision score of 0.71, and a recall score of 0.61. The third experiment using Word2Vec provides accuracy at 66.99%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.70, and a recall score of 0.80. Neutral sentiment gives an f1-score of 0.65, a precision score of 0.62, and a recall score of 0.69. Negative sentiment gives an f1-score of 0.56, a precision score of 0.60, and a recall score of 0.52. Finally, the BERT model provides accuracy at 69.72%, whereas positive sentiment gives an f1-score of 0.73, a precision score of 0.70, and a recall score of 0.75. Neutral sentiment gives an f1-score of 0.64, a precision score of 0.61, and a recall score of 0.68. Negative sentiment gives an f1-score of 0.79, a precision score of 0.78, and a recall score of 0.81.
The first bigram experiment combined with TF-IDF provides accuracy at 68.29%, whereas positive sentiment gives an f1-score of 0.62, a precision score of 0.59, and a recall score of 0.66. Neutral sentiment gives an f1-score of 0.62, a precision score of 0.60, and a recall score of 0.65. Negative sentiment gives an f1-score of 0.62, a precision score of 0.59, and a recall score of 0.65. The second experiment using bag-of-words provides accuracy at 66.32%, whereas positive sentiment gives an f1-score of 0.56, a precision score of 0.59, and a recall score of 0.53. Neutral sentiment gives an f1-score of 0.63, a precision score of 0.65, and a recall score of 0.62. Negative sentiment gives an f1-score of 0.60, a precision score of 0.66, and a recall score of 0.55. The third experiment using Word2Vec provides accuracy at 67.06%, whereas positive sentiment gives an f1-score of 0.65, a precision score of 0.65, and a recall score of 0.64. Neutral sentiment gives an f1-score of 0.63, a precision score of 0.57, and a recall score of 0.71. Negative sentiment gives an f1-score of 0.73, a precision score of 0.69, and a recall score of 0.78. Finally, the BERT model provides accuracy at 70.11%, whereas positive sentiment gives an f1-score of 0.77, a precision score of 0.82, and a recall score of 0.73. Neutral sentiment gives an f1-score of 0.67, a precision score of 0.69, and a recall score of 0.65. Negative sentiment gives an f1-score of 0.74, a precision score of 0.70, and a recall score of 0.77. The results are shown in Table 7.
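The unigram and bigram settings referred to above differ only in the n-gram range handed to the vectorizers. A brief sketch, assuming scikit-learn's vectorizers (the exact parameters used are not stated in the paper):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

unigram_tfidf = TfidfVectorizer(ngram_range=(1, 1))  # single word-segmented tokens
bigram_tfidf = TfidfVectorizer(ngram_range=(2, 2))   # consecutive token pairs
unigram_bow = CountVectorizer(ngram_range=(1, 1))    # bag-of-words term counts
bigram_bow = CountVectorizer(ngram_range=(2, 2))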
For the random forest classifier, the first unigram experiment combined with TF-IDF provides accuracy at 55.55%, whereas positive sentiment gives an f1-score of 0.57, a precision score of 0.66, and a recall score of 0.46. Neutral sentiment gives an f1-score of 0.52, a precision score of 0.73, and a recall score of 0.40. Negative sentiment gives an f1-score of 0.44, a precision score of 0.50, and a recall score of 0.39. The second experiment using bag-of-words provides accuracy at 54.10%, whereas positive sentiment gives an f1-score of 0.58, a precision score of 0.49, and a recall score of 0.73. Neutral sentiment gives an f1-score of 0.47, a precision score of 0.60, and a recall score of 0.38. Negative sentiment gives an f1-score of 0.52, a precision score of 0.47, and a recall score of 0.58. The third experiment using Word2Vec provides accuracy at 60.27%, whereas positive sentiment gives an f1-score of 0.58, a precision score of 0.57, and a recall score of 0.59. Neutral sentiment gives an f1-score of 0.59, a precision score of 0.52, and a recall score of 0.69. Negative sentiment gives an f1-score of 0.56, a precision score of 0.52, and a recall score of 0.60. Finally, the BERT model provides accuracy at 60.69%, whereas positive sentiment gives an f1-score of 0.57, a precision score of 0.50, and a recall score of 0.67. Neutral sentiment gives an f1-score of 0.65, a precision score of 0.70, and a recall score of 0.61. Negative sentiment gives an f1-score of 0.51, a precision score of 0.49, and a recall score of 0.54.
The first bigram experiment combined with TF-IDF provides accuracy at 57.56%, whereas positive sentiment gives an f1-score of 0.56, a precision score of 0.58, and a recall score of 0.54. Neutral sentiment gives an f1-score of 0.57, a precision score of 0.53, and a recall score of 0.60. Negative sentiment gives an f1-score of 0.55, a precision score of 0.61, and a recall score of 0.51. The second experiment using bag-of-words provides accuracy at 56.07%, whereas positive sentiment gives an f1-score of 0.55, a precision score of 0.63, and a recall score of 0.49. Neutral sentiment gives an f1-score of 0.61, a precision score of 0.68, and a recall score of 0.55. Negative sentiment gives an f1-score of 0.56, a precision score of 0.57, and a recall score of 0.55. The third experiment using Word2Vec provides accuracy at 60.98%, whereas positive sentiment gives an f1-score of 0.59, a precision score of 0.61, and a recall score of 0.56. Neutral sentiment gives an f1-score of 0.61, a precision score of 0.67, and a recall score of 0.56. Negative sentiment gives an f1-score of 0.57, a precision score of 0.70, and a recall score of 0.54. Finally, the BERT model provides accuracy at 61.18%, whereas positive sentiment gives an f1-score of 0.63, a precision score of 0.65, and a recall score of 0.60. Neutral sentiment gives an f1-score of 0.57, a precision score of 0.55, and a recall score of 0.59. Negative sentiment gives an f1-score of 0.61, a precision score of 0.72, and a recall score of 0.53. The results are shown in Table 8.
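Each of the classifiers above is also paired with BERT-derived features. As a sketch of one common way to obtain a fixed-size BERT representation for such classifiers, the following uses the Hugging Face transformers library with a multilingual BERT checkpoint as a placeholder; the paper does not state which pretrained model or pooling strategy was actually used:

import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint; the pretrained BERT used in the paper is not named here.
CHECKPOINT = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
encoder = AutoModel.from_pretrained(CHECKPOINT)

def bert_features(headline):
    # Mean-pool the last-layer hidden states into one fixed-size document vector.
    inputs = tokenizer(headline, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # shape: (1, tokens, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()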
For the SVM classifier, the first unigram experiment combined with TF-IDF provides accuracy at 69.85%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.70, and a recall score of 0.78. Neutral sentiment gives an f1-score of 0.66, a precision score of 0.64, and a recall score of 0.68. Negative sentiment gives an f1-score of 0.67, a precision score of 0.76, and a recall score of 0.60. The second experiment using bag-of-words provides accuracy at 67.75%, whereas positive sentiment gives an f1-score of 0.65, a precision score of 0.64, and a recall score of 0.66. Neutral sentiment gives an f1-score of 0.64, a precision score of 0.63, and a recall score of 0.65. Negative sentiment gives an f1-score of 0.67, a precision score of 0.64, and a recall score of 0.70. The third experiment using Word2Vec provides accuracy at 74.25%, whereas positive sentiment gives an f1-score of 0.71, a precision score of 0.77, and a recall score of 0.66. Neutral sentiment gives an f1-score of 0.64, a precision score of 0.63, and a recall score of 0.65. Negative sentiment gives an f1-score of 0.66, a precision score of 0.67, and a recall score of 0.63. Finally, the BERT model provides accuracy at 78.76%, whereas positive sentiment gives an f1-score of 0.75, a precision score of 0.74, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.68, a precision score of 0.75, and a recall score of 0.63. Negative sentiment gives an f1-score of 0.68, a precision score of 0.68, and a recall score of 0.68.
The first bigram experiment combined with TF-IDF provides accuracy at 70.58%, whereas positive sentiment gives an f1-score of 0.78, a precision score of 0.66, and a recall score of 0.94. Neutral sentiment gives an f1-score of 0.52, a precision score of 0.72, and a recall score of 0.41. Negative sentiment gives an f1-score of 0.65, a precision score of 1.00, and a recall score of 0.48. The second experiment using bag-of-words provides accuracy at 70.03%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.70, and a recall score of 0.80. Neutral sentiment gives an f1-score of 0.69, a precision score of 0.69, and a recall score of 0.70. Negative sentiment gives an f1-score of 0.69, a precision score of 0.69, and a recall score of 0.69. The third experiment using Word2Vec provides accuracy at 77.98%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.75, and a recall score of 0.72. Neutral sentiment gives an f1-score of 0.71, a precision score of 0.65, and a recall score of 0.78. Negative sentiment gives an f1-score of 0.67, a precision score of 0.64, and a recall score of 0.70. Finally, the BERT model provides accuracy at 78.91%, whereas positive sentiment gives an f1-score of 0.79, a precision score of 0.89, and a recall score of 0.71. Neutral sentiment gives an f1-score of 0.84, a precision score of 0.84, and a recall score of 0.84. Negative sentiment gives an f1-score of 0.80, a precision score of 0.79, and a recall score of 0.81. The results are shown in Table 9.
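As with the other classical models, the SVM consumes one fixed-length vector per headline. A minimal sketch of the Word2Vec variant, assuming gensim and scikit-learn and averaging word vectors into a document vector (the embedding size, aggregation, and kernel actually used are not stated in the paper):

import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC

# Toy word-segmented Thai headlines and labels, for illustration only.
tokenized_docs = [["หุ้น", "ขึ้น", "แรง"], ["กำไร", "ลด", "ลง"]]
labels = ["positive", "negative"]

w2v = Word2Vec(sentences=tokenized_docs, vector_size=100, min_count=1)

def doc_vector(tokens):
    # Average the Word2Vec vectors of the in-vocabulary tokens.
    vectors = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(w2v.vector_size)

X = np.vstack([doc_vector(doc) for doc in tokenized_docs])
svm = SVC(kernel="rbf").fit(X, labels)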
For the CNN classifier, the first unigram experiment combined with TF-IDF provides accuracy at 71.90%, whereas positive sentiment gives an f1-score of 0.75, a precision score of 0.77, and a recall score of 0.73. Neutral sentiment gives an f1-score of 0.59, a precision score of 0.62, and a recall score of 0.56. Negative sentiment gives an f1-score of 0.73, a precision score of 0.78, and a recall score of 0.69. The second experiment using bag-of-words provides an accuracy of 69.80%, whereas positive sentiment gives an f1-score of 0.72, a precision score of 0.72, and a recall score of 0.73. Neutral sentiment gives an f1-score of 0.51, a precision score of 0.44, and a recall score of 0.59. Negative sentiment gives an f1-score of 0.60, a precision score of 0.74, and a recall score of 0.51. The third experiment using Word2Vec provides accuracy at 77.66%, whereas positive sentiment gives an f1-score of 0.70, a precision score of 0.67, and a recall score of 0.73. Neutral sentiment gives an f1-score of 0.62, a precision score of 0.60, and a recall score of 0.65. Negative sentiment gives an f1-score of 0.71, a precision score of 0.78, and a recall score of 0.65. Finally, the BERT model provides accuracy at 78.27%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.72, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.66, a precision score of 0.64, and a recall score of 0.68. Negative sentiment gives an f1-score of 0.75, a precision score of 0.80, and a recall score of 0.70.
The first bigram experiment combined with TF-IDF provides accuracy at 72.72%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.72, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.63, a precision score of 0.64, and a recall score of 0.62. Negative sentiment gives an f1-score of 0.59, a precision score of 0.60, and a recall score of 0.57. The second experiment using bag-of-words provides an accuracy of 71.47%, whereas positive sentiment gives an f1-score of 0.75, a precision score of 0.75, and a recall score of 0.76. Neutral sentiment gives an f1-score of 0.53, a precision score of 0.55, and a recall score of 0.51. Negative sentiment gives an f1-score of 0.63, a precision score of 0.82, and a recall score of 0.52. The third experiment using Word2Vec provides accuracy at 79.89%, whereas positive sentiment gives an f1-score of 0.70, a precision score of 0.77, and a recall score of 0.64. Neutral sentiment gives an f1-score of 0.66, a precision score of 0.64, and a recall score of 0.67. Negative sentiment gives an f1-score of 0.65, a precision score of 0.70, and a recall score of 0.60. Finally, the BERT model provides accuracy at 80.64%, whereas positive sentiment gives an f1-score of 0.75, a precision score of 0.70, and a recall score of 0.81. Neutral sentiment gives an f1-score of 0.66, a precision score of 0.68, and a recall score of 0.63. Negative sentiment gives an f1-score of 0.64, a precision score of 0.62, and a recall score of 0.66. The results are shown in Table 10.
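For readers unfamiliar with the deep learning baselines, the following is an illustrative Keras sketch of a one-dimensional convolutional text classifier over the three sentiment classes. It is not the authors' architecture: the filter sizes and other hyperparameters are not restated here, and the embedding layer could, for instance, be initialized from the Word2Vec vectors described above:

import tensorflow as tf

VOCAB_SIZE = 20000  # illustrative vocabulary size
EMBED_DIM = 100     # illustrative embedding width
NUM_CLASSES = 3     # positive, neutral, negative

cnn = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),               # token embeddings
    tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu"),  # n-gram-like filters
    tf.keras.layers.GlobalMaxPooling1D(),                           # strongest response per filter
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),       # class probabilities
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])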
For the LSTM classifier, the first unigram experiment combined with TF-IDF provides accuracy at 73.40%, whereas positive sentiment gives an f1-score of 0.76, a precision score of 0.75, and a recall score of 0.76. Neutral sentiment gives an f1-score of 0.72, a precision score of 0.80, and a recall score of 0.66. Negative sentiment gives an f1-score of 0.73, a precision score of 0.66, and a recall score of 0.81. The second experiment using bag-of-words provides accuracy at 70.66%, whereas positive sentiment gives an f1-score of 0.69, a precision score of 0.68, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.70, a precision score of 0.71, and a recall score of 0.69. Negative sentiment gives an f1-score of 0.75, a precision score of 0.76, and a recall score of 0.73. The third experiment using Word2Vec provides accuracy at 76.18%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.69, and a recall score of 0.79. Neutral sentiment gives an f1-score of 0.69, a precision score of 0.70, and a recall score of 0.68. Negative sentiment gives an f1-score of 0.77, a precision score of 0.79, and a recall score of 0.74. Finally, the BERT model provides accuracy at 79.62%, whereas positive sentiment gives an f1-score of 0.78, a precision score of 0.81, and a recall score of 0.76. Neutral sentiment gives an f1-score of 0.65, a precision score of 0.61, and a recall score of 0.69. Negative sentiment gives an f1-score of 0.79, a precision score of 0.77, and a recall score of 0.81.
The first bigram experiment combined with TF-IDF provides accuracy at 75.75%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.80, and a recall score of 0.69. Neutral sentiment gives an f1-score of 0.76, a precision score of 0.84, and a recall score of 0.70. Negative sentiment gives an f1-score of 0.73, a precision score of 0.74, and a recall score of 0.72. The second experiment using bag-of-words provides accuracy at 74.04%, whereas positive sentiment gives an f1-score of 0.75, a precision score of 0.76, and a recall score of 0.74. Neutral sentiment gives an f1-score of 0.75, a precision score of 0.77, and a recall score of 0.73. Negative sentiment gives an f1-score of 0.68, a precision score of 0.71, and a recall score of 0.65. The third experiment using Word2Vec provides accuracy at 76.20%, whereas positive sentiment gives an f1-score of 0.76, a precision score of 0.82, and a recall score of 0.71. Neutral sentiment gives an f1-score of 0.72, a precision score of 0.74, and a recall score of 0.71. Negative sentiment gives an f1-score of 0.74, a precision score of 0.81, and a recall score of 0.68. Finally, the BERT model provides accuracy at 80.75%, whereas positive sentiment gives an f1-score of 0.82, a precision score of 0.83, and a recall score of 0.80. Neutral sentiment gives an f1-score of 0.72, a precision score of 0.72, and a recall score of 0.72. Negative sentiment gives an f1-score of 0.84, a precision score of 0.83, and a recall score of 0.85. The results are shown in Table 11.
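A companion sketch of the LSTM classifier, under the same assumptions as the CNN sketch above (Keras, illustrative hyperparameters only):

import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 20000, 100, 3  # illustrative values

lstm = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),          # token embeddings
    tf.keras.layers.LSTM(128),                                  # final state summarizes the headline
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),   # positive / neutral / negative
])
lstm.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])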
4.2.2 Second Experiment.
For the second experiment, we used 10-fold cross-validation for each classification model; the results are as follows.
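Relative to the 5-fold sketch shown for the first experiment, only the resampling scheme changes; assuming the same scikit-learn setup, the single difference would be:

from sklearn.model_selection import StratifiedKFold

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)  # 10 folds instead of 5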
For the naive Bayes classifier, the first unigram experiment combined with TF-IDF provides accuracy at 66.79%, whereas positive sentiment gives an f1-score of 0.71, a precision score of 0.66, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.64, a precision score of 0.65, and a recall score of 0.62. Negative sentiment gives an f1-score of 0.63, a precision score of 0.65, and a recall score of 0.62. The second experiment using bag-of-words provides an accuracy of 65.56%, whereas positive sentiment gives an f1-score of 0.69, a precision score of 0.64, and a recall score of 0.74. Neutral sentiment gives an f1-score of 0.62, a precision score of 0.64, and a recall score of 0.60. Negative sentiment gives an f1-score of 0.64, a precision score of 0.70, and a recall score of 0.58. The third experiment using Word2Vec provides accuracy at 78.70%, whereas positive sentiment gives an f1-score of 0.68, a precision score of 0.69, and a recall score of 0.67. Neutral sentiment gives an f1-score of 0.57, a precision score of 0.58, and a recall score of 0.56. Negative sentiment gives an f1-score of 0.60, a precision score of 0.62, and a recall score of 0.58. Finally, the BERT model provides accuracy at 70.72%, whereas positive sentiment gives an f1-score of 0.71, a precision score of 0.71, and a recall score of 0.71. Neutral sentiment gives an f1-score of 0.64, a precision score of 0.64, and a recall score of 0.65. Negative sentiment gives an f1-score of 0.70, a precision score of 0.71, and a recall score of 0.69.
The first bigram experiment combined with TF-IDF provides accuracy at 69.85%, whereas positive sentiment gives an f1-score of 0.77, a precision score of 0.67, and a recall score of 0.92. Neutral sentiment gives an f1-score of 0.50, a precision score of 0.64, and a recall score of 0.41. Negative sentiment gives an f1-score of 0.65, a precision score of 0.89, and a recall score of 0.52. The second experiment using bag-of-words provides an accuracy of 67.21%, whereas positive sentiment gives an f1-score of 0.72, a precision score of 0.68, and a recall score of 0.76. Neutral sentiment gives an f1-score of 0.65, a precision score of 0.66, and a recall score of 0.63. Negative sentiment gives an f1-score of 0.64, a precision score of 0.68, and a recall score of 0.60. The third experiment using Word2Vec provides accuracy at 70.01%, whereas positive sentiment gives an f1-score of 0.68, a precision score of 0.70, and a recall score of 0.65. Neutral sentiment gives an f1-score of 0.68, a precision score of 0.67, and a recall score of 0.70. Negative sentiment gives an f1-score of 0.68, a precision score of 0.66, and a recall score of 0.70. Finally, the BERT model provides accuracy at 71.29%, whereas positive sentiment gives an f1-score of 0.69, a precision score of 0.67, and a recall score of 0.71. Neutral sentiment gives an f1-score of 0.63, a precision score of 0.62, and a recall score of 0.64. Negative sentiment gives an f1-score of 0.72, a precision score of 0.74, and a recall score of 0.70. The results are shown in Table 12.
For the random forest classifier, the first unigram experiment combined with TF-IDF provides accuracy at 61.08%, whereas positive sentiment gives an f1-score of 0.62, a precision score of 0.68, and a recall score of 0.57. Neutral sentiment gives an f1-score of 0.56, a precision score of 0.49, and a recall score of 0.66. Negative sentiment gives an f1-score of 0.48, a precision score of 0.40, and a recall score of 0.59. The second experiment using bag-of-words provides accuracy at 59.53%, whereas positive sentiment gives an f1-score of 0.45, a precision score of 0.51, and a recall score of 0.40. Neutral sentiment gives an f1-score of 0.44, a precision score of 0.34, and a recall score of 0.64. Negative sentiment gives an f1-score of 0.66, a precision score of 0.63, and a recall score of 0.70. The third experiment using Word2Vec provides accuracy at 62.98%, whereas positive sentiment gives an f1-score of 0.60, a precision score of 0.60, and a recall score of 0.60. Neutral sentiment gives an f1-score of 0.51, a precision score of 0.59, and a recall score of 0.45. Negative sentiment gives an f1-score of 0.65, a precision score of 0.62, and a recall score of 0.69. Finally, the BERT model provides accuracy at 64.17%, whereas positive sentiment gives an f1-score of 0.62, a precision score of 0.67, and a recall score of 0.57. Neutral sentiment gives an f1-score of 0.55, a precision score of 0.55, and a recall score of 0.56. Negative sentiment gives an f1-score of 0.59, a precision score of 0.57, and a recall score of 0.68.
The first bigram experiment combined with TF-IDF provides accuracy at 62.69%, whereas positive sentiment gives an f1-score of 0.52, a precision score of 0.41, and a recall score of 0.70. Neutral sentiment gives an f1-score of 0.50, a precision score of 0.53, and a recall score of 0.63. Negative sentiment gives an f1-score of 0.51, a precision score of 0.64, and a recall score of 0.43. The second experiment using bag-of-words provides accuracy at 61.12%, whereas positive sentiment gives an f1-score of 0.66, a precision score of 0.61, and a recall score of 0.73. Neutral sentiment gives an f1-score of 0.57, a precision score of 0.52, and a recall score of 0.62. Negative sentiment gives an f1-score of 0.59, a precision score of 0.78, and a recall score of 0.47. The third experiment using Word2Vec provides accuracy at 63.10%, whereas positive sentiment gives an f1-score of 0.60, a precision score of 0.61, and a recall score of 0.59. Neutral sentiment gives an f1-score of 0.55, a precision score of 0.58, and a recall score of 0.53. Negative sentiment gives an f1-score of 0.59, a precision score of 0.53, and a recall score of 0.67. Finally, the BERT model provides accuracy at 66.26%, whereas positive sentiment gives an f1-score of 0.70, a precision score of 0.69, and a recall score of 0.70. Neutral sentiment gives an f1-score of 0.55, a precision score of 0.50, and a recall score of 0.69. Negative sentiment gives an f1-score of 0.70, a precision score of 0.70, and a recall score of 0.70. The results are shown in Table 13.
For the SVM classifier, the first unigram experiment combined with TF-IDF provides accuracy at 72.60%, whereas positive sentiment gives an f1-score of 0.79, a precision score of 0.93, and a recall score of 0.68. Neutral sentiment gives an f1-score of 0.67, a precision score of 0.69, and a recall score of 0.64. Negative sentiment gives an f1-score of 0.79, a precision score of 0.77, and a recall score of 0.80. The second experiment using bag-of-words provides accuracy at 69.09%, whereas positive sentiment gives an f1-score of 0.50, a precision score of 0.63, and a recall score of 0.42. Neutral sentiment gives an f1-score of 0.33, a precision score of 0.48, and a recall score of 0.25. Negative sentiment gives an f1-score of 0.81, a precision score of 0.73, and a recall score of 0.91. The third experiment using Word2Vec provides accuracy at 77.09%, whereas positive sentiment gives an f1-score of 0.78, a precision score of 0.77, and a recall score of 0.78. Neutral sentiment gives an f1-score of 0.69, a precision score of 0.73, and a recall score of 0.65. Negative sentiment gives an f1-score of 0.70, a precision score of 0.69, and a recall score of 0.71. Finally, the BERT model provides accuracy at 82.97%, whereas positive sentiment gives an f1-score of 0.83, a precision score of 0.83, and a recall score of 0.83. Neutral sentiment gives an f1-score of 0.80, a precision score of 0.80, and a recall score of 0.80. Negative sentiment gives an f1-score of 0.84, a precision score of 0.84, and a recall score of 0.84.
The first bigram experiment combined with TF-IDF provides accuracy at 76.39%, whereas positive sentiment gives an f1-score of 0.83, a precision score of 0.90, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.86, a precision score of 0.91, and a recall score of 0.80. Negative sentiment gives an f1-score of 0.60, a precision score of 0.61, and a recall score of 0.59. The second experiment using bag-of-words provides an accuracy of 74.26%, whereas positive sentiment gives an f1-score of 0.69, a precision score of 0.61, and a recall score of 0.80. Neutral sentiment gives an f1-score of 0.76, a precision score of 0.93, and a recall score of 0.79. Negative sentiment gives an f1-score of 0.59, a precision score of 0.58, and a recall score of 0.60. The third experiment using Word2Vec provides accuracy at 78.13%, whereas positive sentiment gives an f1-score of 0.78, a precision score of 0.77, and a recall score of 0.78. Neutral sentiment gives an f1-score of 0.71, a precision score of 0.72, and a recall score of 0.70. Negative sentiment gives an f1-score of 0.76, a precision score of 0.76, and a recall score of 0.76. Finally, the BERT model provides accuracy at 83.38%, whereas positive sentiment gives an f1-score of 0.84, a precision score of 0.84, and a recall score of 0.83. Neutral sentiment gives an f1-score of 0.84, a precision score of 0.83, and a recall score of 0.85. Negative sentiment gives an f1-score of 0.84, a precision score of 0.83, and a recall score of 0.84. The results are shown in Table 14.
For the CNN classifier, the first unigram experiment combined with TF-IDF provides accuracy at 72.64%, whereas positive sentiment gives an f1-score of 0.74, a precision score of 0.75, and a recall score of 0.78. Neutral sentiment gives an f1-score of 0.77, a precision score of 0.78, and a recall score of 0.78. Negative sentiment gives an f1-score of 0.65, a precision score of 0.68, and a recall score of 0.62. The second experiment using bag-of-words provides an accuracy of 70.88%, whereas positive sentiment gives an f1-score of 0.71, a precision score of 0.70, and a recall score of 0.71. Neutral sentiment gives an f1-score of 0.69, a precision score of 0.71, and a recall score of 0.68. Negative sentiment gives an f1-score of 0.73, a precision score of 0.75, and a recall score of 0.71. The third experiment using Word2Vec provides accuracy at 75.73%, whereas positive sentiment gives an f1-score of 0.71, a precision score of 0.76, and a recall score of 0.76. Neutral sentiment gives an f1-score of 0.75, a precision score of 0.79, and a recall score of 0.71. Negative sentiment gives an f1-score of 0.81, a precision score of 0.82, and a recall score of 0.80. Finally, the BERT model provides accuracy at 80.09%, whereas positive sentiment gives an f1-score of 0.81, a precision score of 0.80, and a recall score of 0.82. Neutral sentiment gives an f1-score of 0.80, a precision score of 0.79, and a recall score of 0.80. Negative sentiment gives an f1-score of 0.80, a precision score of 0.80, and a recall score of 0.79.
The first bigram experiment combined with TF-IDF provides accuracy at 77.67%, whereas positive sentiment gives an f1-score of 0.75, a precision score of 0.74, and a recall score of 0.76. Neutral sentiment gives an f1-score of 0.72, a precision score of 0.72, and a recall score of 0.72. Negative sentiment gives an f1-score of 0.73, a precision score of 0.69, and a recall score of 0.77. The second experiment using bag-of-words provides accuracy at 75.19%, whereas positive sentiment gives an f1-score of 0.70, a precision score of 0.69, and a recall score of 0.71. Neutral sentiment gives an f1-score of 0.78, a precision score of 0.77, and a recall score of 0.79. Negative sentiment gives an f1-score of 0.74, a precision score of 0.79, and a recall score of 0.70. The third experiment using Word2Vec provides accuracy at 77.20%, whereas positive sentiment gives an f1-score of 0.77, a precision score of 0.77, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.74, a precision score of 0.77, and a recall score of 0.72. Negative sentiment gives an f1-score of 0.71, a precision score of 0.65, and a recall score of 0.78. Finally, the BERT model provides accuracy at 83.86%, whereas positive sentiment gives an f1-score of 0.84, a precision score of 0.84, and a recall score of 0.84. Neutral sentiment gives an f1-score of 0.82, a precision score of 0.83, and a recall score of 0.82. Negative sentiment gives an f1-score of 0.84, a precision score of 0.79, and a recall score of 0.89. The results are shown in Table 15.
For the LSTM classifier, the first unigram experiment combined with TF-IDF provides accuracy at 76.33%, whereas positive sentiment gives an f1-score of 0.75, a precision score of 0.73, and a recall score of 0.77. Neutral sentiment gives an f1-score of 0.73, a precision score of 0.77, and a recall score of 0.69. Negative sentiment gives an f1-score of 0.71, a precision score of 0.70, and a recall score of 0.72. The second experiment using bag-of-words provides an accuracy of 74.20%, whereas positive sentiment gives an f1-score of 0.72, a precision score of 0.74, and a recall score of 0.71. Neutral sentiment gives an f1-score of 0.73, a precision score of 0.74, and a recall score of 0.72. Negative sentiment gives an f1-score of 0.73, a precision score of 0.76, and a recall score of 0.70. The third experiment using Word2Vec provides accuracy at 79.77%, whereas positive sentiment gives an f1-score of 0.78, a precision score of 0.78, and a recall score of 0.78. Neutral sentiment gives an f1-score of 0.77, a precision score of 0.79, and a recall score of 0.75. Negative sentiment gives an f1-score of 0.80, a precision score of 0.80, and a recall score of 0.79. Finally, the BERT model provides accuracy at 80.35%, whereas positive sentiment gives an f1-score of 0.81, a precision score of 0.81, and a recall score of 0.82. Neutral sentiment gives an f1-score of 0.80, a precision score of 0.77, and a recall score of 0.82. Negative sentiment gives an f1-score of 0.77, a precision score of 0.70, and a recall score of 0.84.
The first bigram experiment combined with TF-IDF provides accuracy at 78.88%, whereas positive sentiment gives an f1-score of 0.76, a precision score of 0.77, and a recall score of 0.75. Neutral sentiment gives an f1-score of 0.75, a precision score of 0.79, and a recall score of 0.70. Negative sentiment gives an f1-score of 0.77, a precision score of 0.80, and a recall score of 0.74. The second experiment using bag-of-words provides an accuracy of 77.28%, whereas positive sentiment gives an f1-score of 0.77, a precision score of 0.79, and a recall score of 0.76. Neutral sentiment gives an f1-score of 0.76, a precision score of 0.77, and a recall score of 0.74. Negative sentiment gives an f1-score of 0.73, a precision score of 0.75, and a recall score of 0.71. The third experiment using Word2Vec provides accuracy at 80.71%, whereas positive sentiment gives an f1-score of 0.83, a precision score of 0.85, and a recall score of 0.81. Neutral sentiment gives an f1-score of 0.80, a precision score of 0.77, and a recall score of 0.82. Negative sentiment gives an f1-score of 0.80, a precision score of 0.80, and a recall score of 0.81. Finally, the BERT model provides accuracy at 84.07%, whereas positive sentiment gives an f1-score of 0.83, a precision score of 0.82, and a recall score of 0.84. Neutral sentiment gives an f1-score of 0.80, a precision score of 0.76, and a recall score of 0.86. Negative sentiment gives an f1-score of 0.84, a precision score of 0.85, and a recall score of 0.83. The results are shown in Table 16.