Margin-Based Active Learning and Background Knowledge in Text Mining
Abstract. Text mining, also known as intelligent text analysis, text data mining or knowledge discovery in text, refers generally to the process of extracting interesting and non-trivial information and knowledge from text. One of the main problems with text mining and classification systems is the lack of labeled data, together with the cost of labeling unlabeled data (Kiritchenko and Matwin 2001). There is therefore a growing interest in exploring the use of unlabeled data to improve text classification performance. The ready availability of this kind of data in most applications makes it an appealing source of information.

In this work we evaluate the benefits of introducing unlabeled data into a support vector machine automatic text classifier. We further evaluate the possibility of learning actively and propose a method for choosing the samples to be learned.

Keywords: Text Mining, Support Vector Machines, Active Learning.

… testing set, or even unlabeled examples, is being investigated as a way to improve classification performance.

Seeger (Seeger 2001) presents a report on learning with unlabeled data that compares several approaches.

Our purpose is to evaluate the benefits of introducing unlabeled data into a support vector machine automatic text classifier, and the possibility of actively learning the classification task.

The rest of the paper is organized as follows. Section 2 addresses several text classification issues, setting guidelines for problem formulation. Section 3 presents Support Vector Machines and their application to text mining/classification tasks. Section 4 focuses on the issues related to the use of unlabeled data, and Section 5 presents the two proposed approaches and a comparison between them. Section 6 presents the results obtained and, finally, Section 7 presents some conclusions and future work.
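This section does not spell out how the samples to be learned are chosen. One common margin-based rule is sketched here as an assumption, not as the paper's method. The rule queries the unlabeled documents that lie closest to the SVM separating hyperplane, that is, those with the smallest |f(x)|, where f(x) = w·x + b. The sketch assumes a linear SVM that has already been trained, represented by hypothetical weights `w` and bias `b`. It also assumes that documents have already been mapped to feature vectors. The helper names `decision_value` and `select_by_margin` are illustrative only.

```python
from typing import List, Sequence

def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed output f(x) = w . x + b of a linear SVM (assumed already trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def select_by_margin(w: Sequence[float], b: float,
                     unlabeled: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Indices of the k unlabeled documents with the smallest |f(x)|.

    |f(x)| / ||w|| is the geometric distance to the hyperplane; ||w|| is the
    same for every document, so ranking by |f(x)| gives the same order.
    """
    scored = sorted((abs(decision_value(w, b, x)), i) for i, x in enumerate(unlabeled))
    return [i for _, i in scored[:k]]

# Hypothetical weights, bias and toy document vectors (e.g. TF-IDF features).
w = [0.8, -0.5, 0.3]
b = -0.1
docs = [
    [1.0, 0.0, 0.0],  # f = 0.7
    [0.2, 0.4, 0.1],  # f = -0.11
    [0.0, 1.0, 0.0],  # f = -0.6
    [0.5, 0.5, 0.5],  # f = 0.2
]
print(select_by_margin(w, b, docs, k=2))  # [1, 3]
```

Reversing the ranking (largest |f(x)| first) would instead return the documents the classifier is most confident about. Such documents are a natural candidate when unlabeled data is to be added automatically rather than sent to a human for labeling.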
(SV: number of support vectors; Acc, Prec, Rec and F1 in %.)

Category      SV       Acc    Prec   Rec    F1
Earn          1651     95.85  93.27  95.59  94.42
Acquisitions  1800     95.04  92.71  86.03  89.25
Money-fx      928      96.13  71.07  53.42  60.99
Grain         802      98.93  92.71  64.49  76.07
Crude         697      97.18  85.29  65.91  74.36
Trade         661      97.68  79.75  55.75  65.62
Interest      744      97.22  76.92  49.59  60.30
Ship          505      98.49  89.58  53.09  66.67
Wheat         490      98.77  82.98  59.09  69.03
Corn          505      99.20  95.62  71.58  81.87
Average       878.30   97.36  85.99  65.45  73.86

Category      SV       Acc    Prec   Rec    F1
Earn          1632     95.92  95.53  92.50  93.99
Acquisitions  1751     94.93  93.09  85.15  88.94
Money-fx      908      96.13  71.43  52.80  60.72
Grain         771      97.96  92.55  63.04  75.00
Crude         693      97.04  84.85  63.64  72.73
Trade         647      97.64  79.49  54.87  64.92
Interest      742      97.15  77.03  47.11  58.46
Ship          500      98.45  89.36  51.85  65.62
Wheat         487      98.77  84.44  57.58  68.47
Corn          484      99.08  93.33  53.85  68.29
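The F1 column is the harmonic mean of precision and recall: F1 = 2·Prec·Rec / (Prec + Rec). As a quick consistency check, the sketch below recomputes F1 for three rows of the table whose Earn row has 1651 support vectors. The helper name `f1_from_pr` is illustrative only.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out, here %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (category, Prec, Rec, reported F1) copied from the table with Earn SV = 1651
rows = [
    ("Earn", 93.27, 95.59, 94.42),
    ("Money-fx", 71.07, 53.42, 60.99),
    ("Corn", 95.62, 71.58, 81.87),
]
for name, p, r, reported in rows:
    print(f"{name}: computed {f1_from_pr(p, r):.2f}, reported {reported:.2f}")
```

The computed values agree with the reported ones to within rounding.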
Category      SV      Acc    Prec   Rec     F1
Earn          19      90.32  90.26  82.57   86.24
Acquisitions  19      49.77  32.07  98.29   48.36
Money-fx      18      38.33   8.11  95.65   14.95
Grain         20      81.31  16.06  67.39   25.94
Crude         18      70.50  15.52  84.66   26.23
Trade         18      79.41  15.50  93.81   26.60
Interest      18      54.10   8.02  93.39   14.77
Ship          19      32.31   3.90  96.30    7.50
Wheat         19      95.49  29.61  68.18   41.29
Corn          20      98.20  52.00  25.00   33.77
Average       18.80   68.97  27.11  80.52   32.57

Category      SV      Acc    Prec   Rec     F1
Earn          42      90.14  82.43   93.01  87.40
Acquisitions  92      40.65  28.63   99.12  44.43
Money-fx      44      99.30  99.25  100.00  99.62
Grain         23      37.35   6.74   92.75  12.57
Crude         32      37.56   8.86   97.73  16.25
Trade         36      21.89   4.81   99.12   9.17
Interest      33      22.74   5.11   97.52   9.71
Ship          86      30.83   3.82   90.30   7.33
Wheat         24       5.03   2.39  100.00   4.67
Corn          23       9.43   1.98  100.00   3.88
Average       43.50   39.49  24.40   96.96  29.50
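The Average rows appear to be unweighted (macro) means over the ten categories. For the table whose Earn row has 19 support vectors, the column means reproduce every reported average. The sketch below uses the values from that table to check this. It also shows that the macro-averaged F1 is not the same summary as an F1 computed from the averaged precision and recall, so the two should not be mixed when comparing runs.

```python
from statistics import mean

# Values copied from the table whose Earn row has 19 support vectors:
# category -> (SV, Acc, Prec, Rec, F1)
TABLE = {
    "Earn": (19, 90.32, 90.26, 82.57, 86.24),
    "Acquisitions": (19, 49.77, 32.07, 98.29, 48.36),
    "Money-fx": (18, 38.33, 8.11, 95.65, 14.95),
    "Grain": (20, 81.31, 16.06, 67.39, 25.94),
    "Crude": (18, 70.50, 15.52, 84.66, 26.23),
    "Trade": (18, 79.41, 15.50, 93.81, 26.60),
    "Interest": (18, 54.10, 8.02, 93.39, 14.77),
    "Ship": (19, 32.31, 3.90, 96.30, 7.50),
    "Wheat": (19, 95.49, 29.61, 68.18, 41.29),
    "Corn": (20, 98.20, 52.00, 25.00, 33.77),
}
COLUMNS = ("SV", "Acc", "Prec", "Rec", "F1")

# Macro-average: unweighted mean of each column over the categories.
averages = {col: mean(row[j] for row in TABLE.values())
            for j, col in enumerate(COLUMNS)}

# A different summary: F1 recomputed from the averaged precision and recall.
p, r = averages["Prec"], averages["Rec"]
f1_of_means = 2 * p * r / (p + r)

for col in COLUMNS:
    print(f"{col}: {averages[col]:.2f}")
print(f"F1 of averaged Prec/Rec: {f1_of_means:.2f}")
```

The F1 of the averaged precision and recall comes out at roughly 40.6, well above the macro-averaged F1 of about 32.6. The harmonic mean penalizes each low-precision category (for example Ship, Money-fx and Interest) individually before averaging, which pulls the macro figure down.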