Article

Effective Text Classification Through Supervised Rough Set-Based Term Weighting

Department of Computer Engineering, Faculty of Engineering, Şırnak University, 73000 Şırnak, Turkey
Symmetry 2025, 17(1), 90; https://doi.org/10.3390/sym17010090
Submission received: 16 December 2024 / Revised: 4 January 2025 / Accepted: 6 January 2025 / Published: 9 January 2025
(This article belongs to the Section Engineering and Materials)

Abstract

This research presents an innovative approach to text mining based on rough set theory. The study fundamentally utilizes the concept of symmetry from rough set theory to construct indiscernibility matrices and model uncertainties in data analysis, ensuring that both the methodological structure and the solution processes remain symmetric. The effective management and analysis of large-scale textual data rely heavily on automated text classification technologies. In this context, term weighting plays a crucial role in determining classification performance. In particular, supervised term weighting methods that utilize class information have emerged as the most effective approaches. However, the optimal representation of class–term relationships remains an area requiring further research. This study proposes the Rough Multivariate Weighting Scheme (RMWS) and presents its mathematical derivative, the Square Root Rough Multivariate Weighting Scheme (SRMWS). The RMWS model employs rough sets to identify information-carrying documents within the document–term–class space and adopts a computational methodology incorporating the α, β, and λ coefficients. Moreover, the distribution of each term across classes is effectively revealed. Comprehensive experimental studies were conducted on three datasets featuring imbalanced-multiclass, balanced-multiclass, and imbalanced-binary class structures to evaluate the model’s effectiveness. The results show that the RMWS and SRMWS methods outperform existing approaches, exhibiting superior performance on both balanced and unbalanced datasets without being affected by class imbalance or the number of classes. Furthermore, the SRMWS method is found to be the most effective for the SVM and KNN classifiers, while the RMWS method achieves the best results for the NB classifier. These results show that the proposed methods significantly improve text classification performance.

1. Introduction

The rapid advancement of technology has impacted every aspect of human life. Human–computer interaction, particularly with the development of web technologies, has reached new dimensions through information gathering and dissemination centers such as social media, online shopping sites, news, sports, and magazine platforms. This interaction has resulted in the generation of vast amounts of data. With the increasing volume of primarily textual data, including electronic documents, web pages, messages, etc., the organization, retrieval, and processing of these extensive textual data have become significant challenges. In this context, automatic text classification, often referred to as text categorization, stands out as one of the most widely-used technologies for addressing these purposes. Text classification is the process of assigning predefined categories or classes to textual data [1]. This process is carried out using machine learning algorithms to analyze the content of textual data and make decisions based on this content. Text classification helps organize, analyze, and comprehend textual data, involving various operations and processes. At the forefront of these processes is term/feature weighting [2,3,4].
Term weighting is defined as the systematic process of increasing the value of a specific term or terms to give them more importance in analysis or computations. In the weighting process, the weight of each term in the documents is calculated using a term weighting algorithm. The primary objective is to highlight the difference between terms that provide distinct and specific information for classification and those that are commonly found across all documents, carrying no specific information. In this way, a more effective and efficient classification process can be achieved. However, the challenges of high dimensionality and sparsity in text data play a significant role in the term weighting stage. In the literature, representing document contents with multidimensional feature vectors is referred to as the vector space model (VSM) [5]. Treating the terms in text data as features leads to a highly sparse structure in the vector space model, resulting in a high-dimensional term space. One of the most significant research problems in text classification is to represent the relationship between extracted features and documents as effectively as possible in the vector space model, despite this sparsity. At this point, the term weighting operation comes into play, and numerous studies have addressed this matter. For instance, Debole and Sebastiani [6] initially proposed the idea of supervised term weighting (STW), which involves weighting terms based on known categorical information in the training data. They introduced three STW schemes: TF-CHI, TF-IG, and TF-GR. These schemes replaced the global factor IDF in TF-IDF with feature selection functions such as the χ2 statistic (CHI), information gain (IG), and gain ratio (GR). Similarly, Emmanuel et al. [7] stated that the positive contribution of a feature to a category could be obtained by calculating its negative contributions to other categories and proposed the PIF (Positive Impact Factor) method for term weighting. Ren and Sohrab [8] proposed two new term weighting schemes for text classification, named TF-IDF-ICF and TF-IDF-ICSDF. These schemes utilize, in addition to the TF-IDF information of terms, the inverse class frequency (ICF) and inverse class space density frequency (ICSDF), respectively, in the weighting process. The authors emphasized that the TF-IDF-ICSDF scheme has shown promising results, particularly because it provides positive distinctiveness for both frequently and infrequently occurring terms. Moreover, one of the most advanced recently published algorithms in the field of keyword extraction is the YAKE! model [9]. The model utilizes feature-based term weighting to measure the relevance of terms and also considers the position of stop words in the document when extracting expressions.
Despite the successful term weighting schemes proposed in recent years, the ongoing introduction of new schemes in this field suggests that there is room for developing new schemes with weighting strategies that better reflect the discrimination potential of terms. Regardless of how up to date the proposed methods for term weighting are, there are instances where each of them may be insufficient in the weight calculation process, ignoring or failing to produce reasonable weights for terms in some extreme scenarios due to their weighting strategies. Therefore, this paper presents a new weighting model called Rough Multivariate Weighting Scheme (RMWS) and its mathematical derivative, Square Root Rough Multivariate Weighting Scheme (SRMWS). This study aims to provide the best representation by revealing hidden patterns in text data. For this purpose, the proposed scheme uses rough sets to reveal documents with special information. Firstly, with the help of rough sets, documents containing special information on a term basis are identified. Then, the distribution relation of the term between the classes is revealed, and discrimination for the terms is determined based on these relations. Finally, term weight values are calculated depending on the distinctiveness of the terms. After determining the most distinctive features, classification algorithms are employed to evaluate the performance of the RMWS and SRMWS methods. In this study, for this purpose, Support Vector Machines (SVMs), K-Nearest Neighbors (KNNs), and Naive Bayes (NB) classifiers were utilized. Additionally, in experimental studies, the classification performance of the proposed method was compared with existing methods in the literature using evaluation criteria such as macro-F1 and micro-F1.
The flow of the rest of this study is as follows: Section 2 offers insights into related works and explores the background of the approaches used for comparison. In Section 3, a preliminary analysis of Rough Set is provided. Section 4 introduces the RMWS method, and Section 5 details the experimental work along with the obtained results. Finally, after including discussions about the study in Section 6, the study is concluded with Section 7.

2. Related Works

Term weighting processes are crucial tools to enhance the processes of extracting meaning and retrieving information from text documents. The selection of term weighting methods may vary depending on the purpose of use and the dataset under consideration. Depending on the dataset’s utilization of class information, supervised or unsupervised methods are employed. Additionally, the preferred method may vary when dealing with a dataset that is binary or multiclass. Therefore, the literature includes weighting techniques with different working mechanisms. In this section, information about these techniques will be provided. Since there are common expressions and preliminaries in the weighting equations of these techniques, detailed information about these expressions and preliminaries is provided in Table 1 (for common expressions) and Table 2 (for preliminaries).
The term frequency (TF) and term frequency–inverse document frequency (TF-IDF) methods, known as the traditional term weighting schemes, are unsupervised weighting methods that originate from information retrieval [11]. Frequently occurring words in text data can increase computational costs. For example, the word “the” is commonly used in English and has a high frequency, meaning it is present in nearly all documents. Such words often have low discriminative power in text classification processes and may, therefore, need to be excluded. To cope with this problem, the TF-IDF method is employed. This method calculates a score by considering the frequency of a word in a particular document and its frequency across all documents. Using these scores, unique words representing important information in a specific document are identified and extracted from the text. Consequently, the IDF (inverse document frequency) value of a rarely occurring word will be high, while the IDF value of a frequently occurring word will be low. Therefore, in the TF-IDF method, by calculating the inverse document frequency values, low scores are assigned to common terms in the text collection, and high scores are assigned to rare terms. Mathematically, TF-IDF [12] is calculated as follows:
$W_{TF.IDF}(t) = TF(t, d_k) \cdot \log\left(\frac{D}{d(t)}\right)$   (1)
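As a brief illustration of Equation (1), the following Python sketch computes TF-IDF weights over a hypothetical toy corpus; the documents, tokens, and variable names are illustrative assumptions, not part of the original study.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens (illustrative only).
docs = [["price", "rise", "oil"], ["oil", "export", "oil"], ["match", "goal", "rise"]]

N = len(docs)
df = Counter()                      # d(t): number of documents containing term t
for d in docs:
    df.update(set(d))

def tf_idf(doc):
    """TF-IDF weights of one document following Eq. (1): TF(t, d_k) * log(D / d(t))."""
    tf = Counter(doc)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

print(tf_idf(docs[1]))              # e.g. 'oil' gets TF = 2 and IDF = log(3/2)
```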
The supervised weighting method TF-PB, which utilizes class-internal and class-external probability distributions, is an effective method for imbalanced datasets and binary classification [11]. This method calculates weights as expressed in Equation (2) below.
$W_{TF.PB}(t) = TF(t, d_k) \cdot \max_{i=1}^{M} \log\left(1 + \frac{t_x}{t_{\bar{x}}} \cdot \frac{t_x}{t_y}\right)$   (2)
TF-RF is a weighting method used to measure the importance of terms in a document [13]. This method is a supervised learning technique, particularly employed in binary classification problems. It focuses on the frequency of occurrence of terms in positive and negative categories. Equation (3) below illustrates the TF-RF weighting formula.
$W_{TF.RF}(t) = TF(t, d_k) \cdot \max_{i=1}^{M} \log_2\left(2 + \frac{t_x}{t_y}\right)$   (3)
TF-IDF-ICF is a supervised method that utilizes data related to the total number of documents and classes in which terms appear. Using this weighting method, the weight values of terms are obtained by multiplying the TF-IDF weight values for each term by ICF values [8]. Equation (4) represents the mathematical calculations.
$W_{TF.IDF.ICF}(t) = TF(t, d_k) \cdot \left(1 + \log\frac{D}{d(t)}\right) \cdot \left(1 + \log\frac{M}{C}\right)$   (4)
The TF-IDF-ICSDF weighting method is a supervised scheme that calculates the weight of a term by multiplying its TF-IDF value by the ICSDF factor [8]. The key difference from the previous method is that, instead of the number of classes in which the term appears, it considers, for each class, the ratio of the number of documents in that class containing the term to the total number of documents in that class. Equation (5) gives the corresponding weighting formula.
$W_{TF.IDF.ICSDF}(t) = TF(t, d_k) \cdot \left(1 + \log\frac{D}{d(t)}\right) \cdot \left(1 + \log\frac{M}{\sum_{j=1}^{M} d(t)_j / D_j}\right)$   (5)
TF-TRR is a supervised term weighting method that utilizes the distributions of positive and negative classes to accurately weigh terms for binary classification. In the TF-TRR weighting method, the TF value is used to determine how frequently a term appears in a document [14]. The TRR value is then used to assess the relevance of a term to the subject of the document. The mathematical representation is given by Equation (6).
$W_{TF.TRR}(t) = \log(TF(t, d_k)) \cdot \max_{i=1}^{M} \log\left(2 + \frac{t_x / (t_x + t_{\bar{x}})}{t_y / (t_y + t_{\bar{y}})}\right)$   (6)
TF-IGM is a recently proposed supervised weighting method for multiclass classification. Term weights are calculated using the inverse gravity moment (IGM) [15]. This method computes the IGM by counting, for each class, the number of documents in which the term occurs at least once. These counts are then sorted in descending order, from the largest to the smallest. Equation (7) shows the formula used to calculate the IGM value of a term.
$IGM(t) = \frac{f_1}{\sum_{r=1}^{M} f_r \cdot r}$   (7)
In this formula, $f_r$ ($r = 1, 2, \ldots, M$) denotes the class-specific document frequency of the term, i.e., the number of documents in the $r$-th class that contain the term $t$, with the values arranged in descending order so that $f_1$ is the largest. The TF-IGM weight of a term is calculated as shown in Equation (8).
$W_{TF.IGM}(t) = TF(t, d_k) \cdot \left(1 + \lambda \cdot IGM(t)\right)$   (8)
In this formula, λ is an adjustable constant typically set within the range 5.0 to 9.0 in previous studies; its default value is 7.0.
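The following Python sketch illustrates how the IGM score of Equation (7) and the TF-IGM weight of Equation (8) could be computed for a single term; the class-specific document frequencies and the function names are illustrative assumptions.

```python
def igm(class_doc_freqs):
    """Inverse gravity moment of a term (Eq. 7): f_1 / sum_r (f_r * r),
    with the class-specific document frequencies sorted in descending order."""
    f = sorted(class_doc_freqs, reverse=True)
    denom = sum(fr * r for r, fr in enumerate(f, start=1))
    return f[0] / denom if denom else 0.0

def tf_igm(tf, class_doc_freqs, lam=7.0):
    """TF-IGM weight of a term in a document (Eq. 8) with adjustable constant lambda."""
    return tf * (1.0 + lam * igm(class_doc_freqs))

# A term occurring in 8, 1 and 1 documents of three classes is highly class-concentrated:
print(tf_igm(tf=3, class_doc_freqs=[1, 8, 1]))   # IGM = 8 / (8*1 + 1*2 + 1*3) ≈ 0.615
```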
TF-IGMimp attempts to address certain weaknesses of the standard IGM formula by incorporating the additive term $\log_{10}(D_{total\_tmax} / D_{t\_max})$, as shown in Equation (9). Term weights are calculated using the improved inverse gravity moment (IGMimp) [16].
$IGM_{imp}(t) = \frac{f_1}{\sum_{r=1}^{M} f_r \cdot r} + \log_{10}\left(\frac{D_{total\_tmax}}{D_{t\_max}}\right)$   (9)
Here, $D_{total\_tmax}$ is the total number of documents in the class in which $t$ occurs most, and $D_{t\_max}$ is the number of documents containing $t$ in that class; $D_{t\_max}$ also corresponds to $f_1$ in TF-IGM.
A variety of modifications and enhancements were discussed earlier to improve the performance of the TF-IDF scheme [17]. These changes are generally categorized as supervised and unsupervised methods. However, beyond these categories, there are also term weighting approaches with different working principles, known as vector-based term weighting [18]. Numerous term weighting schemes based on the vector concept have been proposed. Many of these models use n-grams to enhance algorithms in terms of understanding the semantics of a document [19,20,21,22,23]. However, in such approaches, as the word tree grows, the term space also expands, resulting in the widespread issue of high dimensionality. This, in turn, requires higher computational power and increases time complexity. Reducing dimensions in the high dimensionality problem makes the system lighter and more easily usable without excessive computational requirements. Consistently considering high dimensions may not be advantageous, as it can lead to uncertain results. For these reasons, the TF-IDF modification techniques mentioned earlier are often preferred. Indeed, the method proposed in this study is also a supervised TF-IDF multiclass approach.

3. Rough Set Theory

Rough set theory (RST) [24] is a mathematical approach for effective inference from incomplete and inconsistent data, uncovering hidden patterns without requiring additional information such as membership functions. This makes it distinct from methods like Fuzzy Logic and Dempster–Shafer Theory. Widely applied in fields such as data mining, pattern recognition, and text mining, RST supports tasks like classification, rule generation, feature selection, and dimension reduction independently [25,26,27]. RST operates by organizing uncertain data into rough sets and extracting approximate values for concepts. Its foundation lies in classifying relational databases to generate concepts and rules while identifying equivalence relations for further information discovery. Unlike Fuzzy Set Theory, which depends on membership functions with inherent uncertainty, RST uses precise boundary definitions to address uncertain problems. A key concept of RST is the information system, which represents raw data collected from various fields. When the system includes decision attributes, it is termed a decision table; otherwise, it is an information table. Mathematically, let $S = (U, A, D)$ represent a decision table or information system, where $U = \{x_1, x_2, \ldots, x_n\}$ denotes the universal set consisting of objects, $A = \{a_1, a_2, \ldots, a_m\}$ denotes a conditional attribute set, and $D = \{d_1, d_2, \ldots, d_k\}$ denotes a decision attribute set. If $D \neq \emptyset$, the system $S$ is referred to as a decision table; otherwise, it is expressed as an information table. Table 3 provides an example of a decision table.
The indiscernibility (or discernibility) relation determines the similarity or difference between objects in a knowledge system based on a subset of attributes. For any conditional attribute subset $T \subseteq A$, the T-indiscernibility relation, denoted $IND(T)$, is defined as follows:
$IND(T) = \{(x_i, x_j) \in U^2 \mid \forall a \in T,\ a(x_i) = a(x_j)\}$   (10)
In this formula, the equivalence classes of the T-indiscernibility relation are represented as $[x]_T$.
Rough set theory introduces lower ($\underline{T}X$) and upper ($\overline{T}X$) approximations to analyze subsets $X \subseteq U$ using attributes $T \subseteq A$, as follows:
$\underline{T}X = \{x \mid [x]_T \subseteq X\}$: objects definitively in $X$.
$\overline{T}X = \{x \mid [x]_T \cap X \neq \emptyset\}$: objects possibly in $X$.   (11)
These approximations help identify regions within rough sets, distinguishing certain membership ($x \in \underline{T}X$) from possible membership ($x \in \overline{T}X$). For example, considering the decision table shown in Table 3, let $T = \{a_1, a_4\}$ and $X = \{x_1, x_2, x_5, x_7, x_8\}$. In this case, according to Equation (11), the pair $\langle \underline{T}X, \overline{T}X \rangle$ would be as follows:
  • $\underline{T}X = \{x_2, x_7\}$: certain members
  • $\overline{T}X = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8\}$: possible members
An accuracy measure of the set $X$ for $T \subseteq A$ is defined as follows:
$\mu_T(X) = \frac{card(\underline{T}X)}{card(\overline{T}X)}$   (12)
It reflects the determinability of set $X$ within $U$, ranging from 0 to 1. For the above example, $\mu_T(X) = 2/8 = 0.25$, indicating partial determinability.
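As a minimal sketch of how the approximations in Equations (10)–(12) can be computed, the following Python fragment builds equivalence classes from a small hypothetical decision table; the attribute values and target set below are illustrative assumptions, not the contents of Table 3.

```python
# Hypothetical decision table restricted to the chosen attribute subset T:
# object -> tuple of attribute values (illustrative only, not Table 3).
objects = {
    "x1": (0, 1), "x2": (1, 0), "x3": (0, 1), "x4": (1, 1),
    "x5": (0, 1), "x6": (1, 1), "x7": (1, 0), "x8": (0, 1),
}
X = {"x1", "x2", "x5", "x7", "x8"}   # target subset of the universe

# Equivalence classes of IND(T): objects indiscernible on T share the same value tuple.
classes = {}
for obj, vals in objects.items():
    classes.setdefault(vals, set()).add(obj)

lower = {o for c in classes.values() if c <= X for o in c}   # [x]_T entirely inside X
upper = {o for c in classes.values() if c & X for o in c}    # [x]_T overlapping X

accuracy = len(lower) / len(upper)   # Eq. (12): card(lower) / card(upper)
print(sorted(lower), sorted(upper), accuracy)
```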

4. Proposed Method: Rough Multivariate Weighting Scheme (RMWS)

In this section, a novel term weighting approach called Rough Multivariate Weighting Scheme (RMWS) is introduced, along with its mathematical derivative, the Square Root Rough Multivariate Weighting Scheme (SRMWS).
Text documents generally represent unstructured datasets. However, to enable processing by a classifier model, unstructured text data need to be transformed into a structured feature space. Creating a system that best represents the content of each document is crucial in this complex task. The vector space model is the most common method used for this purpose. The goal is to ensure that the vector space model effectively represents the dataset. Researchers are exploring various solutions to effectively represent document vectors, which is a significant challenge. In this context, when determining the relationships between the content of documents and terms, assigning appropriate weights to terms is a critical step. Therefore, there is a need for an effective term weighting scheme that assigns reasonable weights to terms based on their classification capabilities. While there are many term weighting schemes in the literature, it is challenging to claim that these schemes ideally reflect the true distinctive abilities of terms. For instance, the word “cat” may be more important in a text related to the “animal” category, while the conjunction “and” may not be as significant. Techniques that consider class information, such as “supervised” term weighting methods, can provide higher classification accuracy. These methods can better capture the importance of terms in different categories by assigning a separate weight for each term in each class. However, it can be argued that supervised weighting methods have not yet fully achieved effective representation of the class–term relationship, and ongoing research in this area indicates the need for further improvement. Therefore, effectively revealing the class–term relationship and identifying documents containing specific information related to this relationship form one of the most important foundations and motivations of this study. Thus, this study aims to provide the best representation by uncovering hidden patterns in text data, using proven rough set methods for successful extraction of hidden patterns from data. With the help of rough sets, documents containing specific information on a term-by-term basis have been identified. In text data, certain documents may contain indicative terms related to a person, field, topic, or object. The presence of these terms provides specific information. In this study, these specific patterns have been obtained using rough sets. The indiscernibility relation in rough sets has been utilized to select documents with distinctive frequencies for specific terms. The indiscernibility relation for documents related to a term is expressed as follows:
$IND(T) = \{(d_i, d_j) \in D^2 \mid \forall t_r \in T,\ t_r(d_i) = t_r(d_j)\}$   (13)
In this equation, $D$ represents the document space, where $d_i$ and $d_j$ denote the $i$-th and $j$-th documents, respectively. Similarly, $T$ represents the term space, and $t_r$ represents the $r$-th term. Using the indiscernibility relation, equivalence classes denoted by $[d]_T$ are obtained, representing sets of documents. These equivalence classes include documents that contain specific terms in the document–term space, providing significant distinctive information. To determine how much information the equivalence classes offer, a subset approach is applied to them. For a document subset $D_k \subseteq D$, the lower approximation $\underline{T}D_k$ of the document subset is expressed as follows:
$\underline{T}D_k = \{d \mid [d]_T \subseteq D_k\}$   (14)
The information value provided by the determined subset $\underline{T}D_k$ depends on its ratio within the class. This ratio is referred to as the Rough Rate (RR) in this study and is formulated as follows:
$RR = \frac{count(\underline{T}D_k)}{count(C_R)}$   (15)
In Equation (15), $C_R$ represents the documents belonging to class $R$. After determining the ratio of documents providing specific information on a class basis over the equivalence classes, the distinctiveness of each term is determined. The distinctiveness of a term depends on a few fundamental criteria:
A term that is frequently seen in a single class and not observed in other classes is considered distinctive.
A term that appears in some classes is relatively distinctive.
To reveal this information, the distribution of the term among the classes is examined. For this purpose, first, the probability that a document containing the term belongs to the class, $P(C_j \mid t_i)$, is calculated, and then the probability of the term not occurring within the class, $P(\bar{t}_i \mid C_j)$, is computed.
Some coefficients are needed to fully reveal the distinctiveness of the term. Three constant coefficients were used in this study. Two of them are as follows:
β: It is used to balance the relationship between $RR$ and $P(C_j \mid t_i)$. The calculated values of $RR$ and $P(C_j \mid t_i)$ may differ significantly, and this coefficient is employed to make the impact of $RR$ more comprehensible.
λ: The value of $P(\bar{t}_i \mid C_j)$ may be zero. In such cases, this coefficient is used to enable the calculation.
In accordance with this information, the distinctiveness of a term is calculated using Equation (16) presented in this study.
$RMWS(t_i) = \sum_{j=1}^{M} \frac{P(C_j \mid t_i) + \beta \cdot RR}{P(\bar{t}_i \mid C_j) + \lambda}$   (16)
After calculating the distinctiveness of the term, the weight calculation for the term is performed according to the proposed approach, as follows:
$W_{RMWS}(t_i) = TF(t_i, d_k) \cdot RMWS(t_i)^{\alpha}$   (17)
Taking the square root of the $TF(t_i, d_k)$ value in Equation (17) yields the SRMWS value. Accordingly, SRMWS is calculated using Equation (18) below.
$W_{SRMWS}(t_i) = \sqrt{TF(t_i, d_k)} \cdot RMWS(t_i)^{\alpha}$   (18)
In Equation (18), α represents the third coefficient.
  • α: This coefficient is used to scale the resulting value into an appropriate range for the classifier.
At this point, it is important to clarify that SRMWS is not an entirely new method; rather, it is a variation of the proposed approach, obtained by taking the square root of the term frequency (TF) values. This is analogous to the relationship between the TF-IGMimp [16] method and its derivative, SQRT_TF-IGMimp.
An Illustrative example:
The working principle of RMWS is demonstrated on the simple document collection shown in Table 4.
For the given collection of simple documents, the initial calculation involves determining the distinctiveness or importance of each term. The RMWS values for each term are computed using Equation (16) and, in this example, the values of all three coefficients ( α , β , λ ) have been set to 1.
$RMWS(t_1) = \frac{\frac{1}{3} + 1\cdot\frac{0}{2}}{\frac{1}{2} + 1} + \frac{\frac{1}{3} + 1\cdot\frac{0}{2}}{\frac{1}{2} + 1} + \frac{\frac{1}{3} + 1\cdot\frac{0}{2}}{\frac{1}{2} + 1} = 0.6667$
$RMWS(t_2) = \frac{\frac{1}{2} + 1\cdot\frac{1}{2}}{\frac{1}{2} + 1} + \frac{\frac{1}{2} + 1\cdot\frac{1}{2}}{\frac{1}{2} + 1} + \frac{\frac{0}{2} + 1\cdot\frac{0}{2}}{\frac{2}{2} + 1} = 1.3334$
$RMWS(t_3) = \frac{\frac{0}{3} + 1\cdot\frac{0}{2}}{\frac{2}{2} + 1} + \frac{\frac{2}{3} + 1\cdot\frac{1}{2}}{\frac{0}{2} + 1} + \frac{\frac{1}{3} + 1\cdot\frac{0}{2}}{\frac{1}{2} + 1} = 1.3889$
$RMWS(t_4) = \frac{\frac{0}{2} + 1\cdot\frac{0}{2}}{\frac{2}{2} + 1} + \frac{\frac{0}{2} + 1\cdot\frac{0}{2}}{\frac{2}{2} + 1} + \frac{\frac{2}{2} + 1\cdot\frac{2}{2}}{\frac{0}{2} + 1} = 2.0000$
Analyzing the distinctiveness of terms reveals that term t 4 has the highest distinctiveness, while the lowest value is assigned to t 1 . This distinction is due to t 4 being exclusively present in class c3 and occurring with a specific majority, warranting a high score. Conversely, t 1 occurs in equal amounts across all three classes, leading to a lower score. Additionally, the distinctiveness of t 2 is only slightly less than that of t 3 . While t 2 occurs in classes C 1 and C 2 , t 3 appears in classes C 2 and C 3 . The difference lies in t 3 being more concentrated in one class ( C 2 ) compared to t 2 , making it more distinctive and assigning it a higher score. Furthermore, intuitively, it is clearly evident that the distinctiveness of these terms aligns with the computation performed by RMWS. Table 5 provides a summary of these evaluations.
Now, the weight vector of each term can be calculated. For this purpose, the calculated distinctiveness values are combined with the term frequencies of the documents according to Equation (17).
$W_{RMWS}(t_1) = [0.6667,\ 0.0000,\ 0.0000,\ 0.6667,\ 0.0000,\ 0.6667]$
$W_{RMWS}(t_2) = [0.0000,\ 2.6667,\ 1.3333,\ 0.6667,\ 0.0000,\ 0.0000]$
$W_{RMWS}(t_3) = [0.0000,\ 0.0000,\ 1.3889,\ 2.7778,\ 0.0000,\ 1.3889]$
$W_{RMWS}(t_4) = [0.0000,\ 0.0000,\ 0.0000,\ 0.0000,\ 4.0000,\ 2.0000]$
After the weighting of the data is completed, the remaining step is to observe the impact of these weightings on the classifier. Finally, the illustrative representation of the working principle of the proposed approach related to RMWS is provided in Figure 1.
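To make the computational steps of Equations (13)–(18) concrete, the following Python sketch implements them for a small hypothetical document collection. The term-frequency matrix, class labels, and coefficient values below are illustrative assumptions (not the collection of Table 4), and the equivalence classes are formed by grouping documents with equal frequency of the term, which is one straightforward reading of the indiscernibility relation in Equation (13).

```python
import numpy as np

# Hypothetical toy collection: rows = documents, columns = terms, entries = term
# frequencies; labels give each document's class (all values are illustrative).
TF = np.array([
    [1, 0, 2, 0],   # d1, class 0
    [0, 2, 0, 0],   # d2, class 0
    [0, 1, 1, 0],   # d3, class 1
    [1, 0, 0, 0],   # d4, class 1
    [0, 0, 0, 2],   # d5, class 2
    [1, 0, 1, 1],   # d6, class 2
])
labels = np.array([0, 0, 1, 1, 2, 2])
classes = np.unique(labels)
alpha = beta = lam = 1.0            # as in the illustrative example of Section 4

def rmws(t):
    """Distinctiveness of term t (Eq. 16), summed over all classes."""
    col = TF[:, t]
    score = 0.0
    for c in classes:
        in_class = labels == c
        # P(C_j | t_i): share of the documents containing t that fall in class c.
        p_c_given_t = (col[in_class] > 0).sum() / max((col > 0).sum(), 1)
        # Rough Rate (Eqs. 14-15): class-c documents whose t-equivalence class
        # (documents with the same frequency of t) lies entirely inside class c.
        lower = sum((labels[col == col[d]] == c).all() for d in np.where(in_class)[0])
        rr = lower / in_class.sum()
        # P(not t | C_j): share of class-c documents that do not contain t.
        p_not_t = (col[in_class] == 0).sum() / in_class.sum()
        score += (p_c_given_t + beta * rr) / (p_not_t + lam)
    return score

scores = np.array([rmws(t) for t in range(TF.shape[1])])
W_RMWS  = TF * scores ** alpha                 # Eq. (17)
W_SRMWS = np.sqrt(TF) * scores ** alpha        # Eq. (18)
print(np.round(scores, 4))
```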

5. Experimental Works

In this section, we present the outcomes of our experimental endeavors. Initially, we provide concise details about the employed datasets, followed by an elucidation of the chosen success metrics and used classifiers. Then, various tests have been conducted to determine suitable values for constant coefficients. Subsequently, we assess the impact of terms weighted by proposed methods on the efficacy of classifiers. For this purpose, we compare the performance of proposed methods with the term weighting methods outlined in the preceding sections. The performance of classifiers using terms weighted by their corresponding term weighting algorithms is discussed in this section. Lastly, a set of statistical analyses is incorporated into the conclusion of this section to ascertain whether the performance enhancement achieved by the proposed methods is statistically significant in comparison to other methods.

5.1. Datasets

Within the scope of experimental studies, three different reference datasets have been used for text classification: Reuters-21578, 10 Mini Newsgroups, and Enron1. These collections have been preferred due to their characteristics, such as balanced and unbalanced or multiple and binary classes. In other words, the advantage of this diversity has been utilized to make a fair comparison among the proposed term weighting methods.
Reuters-21578 is a collection of documents published on the Reuters news channel in 1987. The documents have been compiled and indexed into categories. Additionally, the dataset includes the first ten classes of the well-known Reuters ModApte split [28], widely used in many text classification research studies. This dataset is termed unbalanced because it contains a different number of documents in each class and is multiclass. In the context of this study, experiments were conducted on the training and test splits of Reuters-21578. During the feature extraction process, multi-labeled documents were removed from the Reuters-21578 data, and subsequently, the two classes named ’wheat’ and ’corn’ were deleted because these two classes became empty. Further details about the Reuters-21578 dataset are presented in Table 6.
The 20 Newsgroups dataset [28] contains approximately 18,000 newsgroup documents covering 20 different topics and is divided into two subsets: one for training or development and the other for testing or performance evaluation. The division between the training and test sets is based on messages sent before and after a specific date. The 10 Mini Newsgroups dataset used in this study is a small subset of the popular 20 Newsgroups collection, containing ten of its classes. This dataset has a balanced structure, meaning the number of documents in each class is equal, and it is multiclass. In the experiments, the dataset was manually divided into training (70%) and test (30%) sections. Detailed information about the 10 Mini Newsgroups dataset is provided in Table 7.
The Enron–Spam dataset, a source described in the publication ’Spam Filtering with Naive Bayes—Which Naive Bayes?’ by V. Metsis, I. Androutsopoulos, and G. Paliouras [29], was collected by the mentioned authors. The dataset contains a total of 17,171 spam and 16,545 non-spam email messages (a total of 33,716 emails). In this study, a subset of the Enron–Spam dataset, named Enron1, was used. This dataset is imbalanced as it contains a different number of documents in each class. Additionally, it is used for binary classification as it consists of only two classes. Content information related to Enron1 is presented in Table 8.

5.2. Assessment of Performance

This study employed micro-F1 and macro-F1 scores as the key performance metrics to assess the efficacy of the term weighting methods. The F1 score, incorporating both precision and recall, was utilized in the evaluation. In the macro-averaging approach, the F1 score is individually calculated for each class, and subsequently, the mean across all classes is determined [30]. The computation of the macro-F1 score is illustrated below in Equation (19).
$Macro\text{-}F1 = \frac{\sum_{i=1}^{M} F_i}{M}, \qquad F_i = \frac{2\, p_i\, r_i}{p_i + r_i}$   (19)
In this equation, $p_i$ and $r_i$ represent the precision and recall scores of class $i$, respectively.
Conversely, the F1 score is computed in micro-averaging without considering class-specific information. Therefore, all classification decisions are taken into account across all corpora. In the evaluation of imbalanced datasets, the micro-averaging approach may result in the dominance of large classes over small ones. However, this scenario might not be applicable to balanced datasets, where the number of documents in each class is equal, and the feature counts are similar. The calculation of the micro-F1 score is illustrated below in Equation (20).
$Micro\text{-}F1 = \frac{2\, p\, r}{p + r}$   (20)
where p and r represent precision and recall values across all classes. The micro-F1 score, influenced by the prevalence of larger classes with more documents, might not ensure a fair assessment in all scenarios. Consequently, to achieve a more unbiased evaluation, the micro-F1 score is preferred for balanced datasets, whereas the macro-F1 score is employed for imbalanced datasets. In this study, a diverse range of datasets, including both balanced and imbalanced ones, were employed. Thus, the experiments utilized both micro-F1 scores and macro-F1 criteria to ensure comprehensive evaluation.
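Both averaging strategies can be obtained directly from scikit-learn, as in the short sketch below; the labels and predictions are illustrative assumptions, not results from the experiments.

```python
from sklearn.metrics import f1_score

# Hypothetical predictions over a small 3-class test set (illustrative only).
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 2, 1, 2]

# Macro-F1 (Eq. 19): F1 computed per class, then averaged - every class counts equally.
macro_f1 = f1_score(y_true, y_pred, average="macro")
# Micro-F1 (Eq. 20): precision/recall pooled over all decisions - large classes dominate.
micro_f1 = f1_score(y_true, y_pred, average="micro")
print(macro_f1, micro_f1)
```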

5.3. Classifiers

The proposed methods, RMWS and SRMWS, are not dependent on the learning model, as they are term weighting techniques. Therefore, to explore the impact of the weighted features incorporated into the final feature set on classification accuracy, three distinct classifiers (SVM, KNN, and NB) were utilized in the experimental phase of the study. Concise explanations of the classifiers employed are outlined in Table 9.

5.4. Coefficient Analysis

This study utilizes three adjustable coefficients, which act like tuning knobs to achieve specific balances. To set these coefficients to appropriate values and observe their impact on the results, micro-F1 and macro-F1 results for the SVM, KNN, and NB classifiers were obtained on the Reuters_21578 dataset with term sizes of 500, 1000, and 2000. The results obtained are presented in Table 10, Table 11 and Table 12. Furthermore, the best-performing results in the tables are emphasized in bold font to help the reader focus on the most significant outcomes.
Table 10 was created to analyze the effect of the α coefficient. In this table, the β and λ values are kept constant while the α value is varied. The α value is tested with selected values below and above 1, α = {0.8, 1, 1.7, 3}. Accordingly, the α coefficient has a negligible impact on the KNN classifier, whereas it exerts a significant influence on the NB classifier. For the SVM classifier, this coefficient has a relatively modest effect. The performance of the NB classifier is notably enhanced when the value of this coefficient exceeds 1. For instance, with α = 0.8 and 500 terms, the macro-F1 value is 57.5188%, while for α = 1.7, it increases to 87.7820%. This implies a positive enhancement in the performance of the NB classifier when the α coefficient is greater than 1. However, a careful examination of Table 10 reveals a decline in results when this value is 3. Therefore, the α value needs to remain within a certain range. The results show that α = 1.7 is a reasonable value.
Table 11 was created to analyze the β value. In this table, the other coefficients are fixed, and β takes the values β = {0.8, 1, 1.5, 2.5}. The β coefficient generally appears to influence results positively when its value is high for this dataset. This coefficient is a constant used within the method to adjust the contribution of the rough set component to uncovering hidden patterns within the dataset; a high value signifies the importance of the rough approximation in revealing specific information. In this analysis, the maximum tested value was 2.5, and optimal results were obtained for all classifiers at this value. This suggests that values of β at or slightly above 2.5 may yield further improvements.
Table 12 was created to show the effect of the λ coefficient. The values {0.1, 0.4, 0.8, 1} were tested for λ. The λ coefficient is a constant introduced to enable the calculation in cases where the denominator of the method equation would otherwise be zero; therefore, small values (at most 1) were assigned. Its most suitable value is observed to vary from classifier to classifier.
As a result, the coefficient values predicted to be ideal, determined on the Reuters_21578 dataset, are provided in Table 13 below. The fixed values listed in the table were used in the experimental section of this study.
Again, when the tables created for coefficient analyses and Table 13 are examined, the relationship between coefficients can be analyzed on a classifier basis. To perform this analysis, the binary interaction effects of coefficients were examined on a classifier basis. Accordingly, binary coefficient relationship graphs for each classifier are given in order in Figure 2, Figure 3 and Figure 4.
To verify whether the values given in Table 13 affect the results as stated, additional tests were performed. In these tests, the coefficient values given for each classifier in Table 13 were compared with the best results for the same classifier in Table 10, Table 11 and Table 12. The results obtained are given in Table 14, with the best results highlighted in bold.
Table 14 shows that the proposed methods give better results at the coefficient values determined on a classifier basis. This situation indicates both the necessity and importance of the analyses performed.
Note: Similar procedures have been applied to other datasets in this study. The determined coefficient values have yielded comparable results in these datasets as well. Therefore, the values in Table 13 have been used for all datasets and accepted as default values for the proposed methods.

5.5. Accuracy Analysis

This section presents a comprehensive comparison of the proposed term weighting methods with established approaches, namely TF-IDF, ICF, ICSDF, TRR, IGM, SIGM, IGMimp, and SIGMimp. The performance evaluation is conducted on various term dimensions: 750, 1500, 2500, 3750, 4500, 5750, 6500, and 7750. The Distinguishing Feature Selector (DFS) [31] algorithm is utilized for term selection within the dimension selection process. Term selection methods are frequently used to address the high dimensionality of term spaces and to assess the effectiveness of proposed techniques in text classification. The DFS approach is preferred in this study because it provides effective results. This study investigates the performance of SVM, KNN, and NB classifiers in terms of macro-F1 and micro-F1 criteria on the weighted term dimensions. The obtained results are presented in separate graphs for each dataset.
Figure 5, Figure 6 and Figure 7: Macro-F1 and micro-F1 results for the Reuters_21578 dataset.
Figure 8, Figure 9 and Figure 10: Macro-F1 and micro-F1 results for the 10 Mini Newsgroups dataset.
Figure 11, Figure 12 and Figure 13: Macro-F1 and micro-F1 results for the Enron1 dataset.
Figure 5 presents a comparative analysis of the term weighting methods, including RMWS, SRMWS, and eight additional approaches from the literature, based on the micro-F1 and macro-F1 criteria, using the SVM classifier on the Reuters_21578 dataset. As observed in the figure, the SRMWS method outperforms all other methods in terms of both micro-F1 and macro-F1 scores across all term dimensions. Analogously to the relationship between RMWS and SRMWS, SIGM and SIGMimp denote the square root variants of the IGM and IGMimp approaches, respectively. Another key observation from Figure 5 is that RMWS emerges as the next-best approach after SIGM and SIGMimp; notably, RMWS even surpasses these approaches in the 4500-dimensional setting according to the micro-F1 criterion. RMWS demonstrates a remarkable superiority over the non-variant methods, showcasing its potential in the field of term weighting. These findings can contribute to the development of text classification algorithms and lay a foundation for future research. They also demonstrate that the proposed term weighting model offers a more effective weighting strategy compared to existing models in the literature, and that the weight scores assigned to terms serve as a measure of their discriminative power.
Figure 6 presents the results obtained using the KNN classifier. It can be observed that the SRMWS method achieves the same result as SIGM and SIGMimp for micro-F1 at the 750th dimension, while outperforming them in all other dimensions. Similarly, for macro-F1, it achieves the best result in all dimensions, except for the 7750th dimension, where it is equal to SIGM and SIGMimp. Among the non-variant methods, RMWS shows the highest performance for macro-F1. For micro-F1, it shows the highest performance in all dimensions except for 3750 and 4500.
Figure 7 presents the performance of term weighting algorithms for the NB classifier. It can be observed that the RMWS and SRMWS methods yield the same results across all term sizes and demonstrate superior performance compared to other approaches. Additionally, it is noted that algorithms derived from the square root of TF values (variants) also produce similar results to their parent algorithms.
Figure 8 shows that the SRMWS method outperforms all other methods in all dimensions for both micro-F1 and macro-F1 scores when classifying text documents with the SVM classifier. This suggests that the term weights assigned by SRMWS are more effective and discriminative. Furthermore, RMWS closely follows SRMWS in terms of performance, demonstrating the potential benefits of incorporating variants in the term weighting process for SVM classification.
Figure 9 shows that the KNN classifier achieves the best micro-F1 and macro-F1 results with SRMWS at 1500 and 2500 dimensions. At 5750 dimensions, SRMWS performs as well as SIGM. These findings suggest that SRMWS is a flexible option for dimension selection and can perform well with the KNN classifier.
Figure 10 presents a noteworthy finding regarding the NB classifier. All approaches, except for the TF-IDF method, yielded identical results in all dimensions for both micro-F1 and macro-F1 criteria. The binary class structure of the dataset plays a pivotal role in obtaining these results. In binary class datasets, the performance of different term weighting methods may exhibit minimal variations. This stems from the fact that only a few key terms are sufficient to discriminate between the two classes. In this case, frequency-based methods such as TF-IDF cannot provide a significant advantage compared to other methods.
Figure 11 presents a comparison of the performance of the various weighting approaches when employing the SVM classifier. Excluding dimensions 1500 and 2500, the SIGM, SIGMimp, and SRMWS schemes attain the highest values in both the micro-F1 and macro-F1 metrics. Following these approaches, RMWS yields the best results in all dimensions except the 750th dimension. The results obtained with the KNN classifier are presented in Figure 12. As can be observed, SRMWS clearly achieves the highest values on both criteria in all dimensions, and RMWS achieves the next-best results after SRMWS in all dimensions except the 750th dimension. The best NB results are obtained with RMWS, which achieves the highest micro-F1 and macro-F1 scores in all dimensions except dimension 750, where the highest score is obtained with SRMWS. These cases are illustrated in Figure 13. In summary, for this dataset, the results regarding the term weighting schemes are as follows: (a) in the SVM and KNN classifiers, the SRMWS method stands out as the most effective approach; (b) the SIGM, SIGMimp, and SRMWS methods also exhibit high performance with the SVM classifier; (c) for the NB classifier, the RMWS method yields the optimal outcomes.
As a result, in this study, two novel methods, RMWS and SRMWS, have been proposed and compared with existing approaches. The comparison was conducted on imbalanced-multiclass, imbalanced-binary class, and balanced-multiclass datasets. The obtained results clearly demonstrate that the RMWS and SRMWS methods outperform existing approaches on imbalanced-multiclass, imbalanced-binary class, and balanced-multiclass datasets. It was observed that RMWS and SRMWS methods consistently exhibit superior performance without being significantly affected by class imbalances and the number of classes in the dataset. Furthermore, it was determined that the SRMWS method outperforms the RMWS method on balanced-multiclass datasets.

5.6. Statistical Analysis

This section presents a comprehensive statistical analysis to assess whether the RMWS and SRMWS approaches yield meaningful results. The initial test examines the average performance of term weighting approaches across all datasets, specifically focusing on results obtained with the same term dimension. This allows us to assess whether an approach delivers consistent results regardless of the specific dataset. Moreover, certain approaches may perform well on specific datasets but exhibit poor performance on others, indicating a lack of consistency. While an approach is not expected to yield the best results on every dataset, it should not produce excessively poor results either. Consistency is crucial for evaluating an approach’s reliability and overall performance. To facilitate this analysis, Table 15, Table 16 and Table 17 have been devised for each classifier, specifically SVM, KNN, and NB, respectively. Furthermore, the notation ’fs’ within the tables denotes the term dimensions and the best results are highlighted in bold within the table.
A close examination of Table 15 and Table 16 reveals that SRMWS clearly produces the most successful results. This finding indicates that SVM and KNN classifiers exhibit more effective performances when coupled with SRMWS, rendering this method preferable for these classifiers. Moreover, SRMWS consistently outperforms other methods across all datasets and term dimensions, establishing a statistically significant difference.
An examination of Table 17 reveals that RMWS is the most successful method when used with the NB classifier. This finding indicates that the NB classifier exhibits more effective performance when coupled with RMWS, rendering this method preferable for this classifier. While other methods achieve similar results to RMWS in some term dimensions, RMWS generally emerges as the best-performing method. This demonstrates that RMWS consistently outperforms other methods when used with the NB classifier, establishing a statistically significant difference.
A t-test was also used as a statistical analysis to demonstrate the validity of the proposed best-performing RMWS and SRMWS schemes. For this purpose, Table 18 is constructed for RMWS and Table 19 for SRMWS. The tables show the p-values obtained from one-sided, paired t-tests. If the p-value is below 0.05, the obtained results are deemed statistically significant; in particular, a p-value below 0.05 signifies statistical significance at a confidence level of 95%. Furthermore, if the p-value is less than 0.01, the results achieve statistical significance at an even higher confidence level of 99%.
The results demonstrate that the performance gains achieved with the proposed RMWS weighting scheme compared to other schemes are statistically significant, with a very high confidence level of 99% for NB. Additionally, this confidence level is also at 99% for all values except one p-value for the SVM classifier. As seen in Table 18, for the KNN classifier, almost all p-values are at 95% and 99% confidence levels, except for a few cases. When Table 19 is examined, it is observed that SRMWS provides results with a very high confidence level of 99% for all classifiers. The obtained p-values indicate that the RMWS and SRMWS schemes perform significantly better than other schemes. This implies that the possibility of random coincidence is low, and the findings are reliable.
As a result, these findings clearly validate the superiority of the proposed RMWS and SRMWS weighting schemes compared to other schemes. Both schemes provide a statistically significant performance increase when used with NB, SVM, and KNN classifiers.
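As a minimal sketch of the testing procedure described above, a one-sided paired t-test over per-dimension scores can be computed with SciPy as shown below; the score values are purely illustrative assumptions and are not taken from Table 18 or Table 19.

```python
from scipy import stats

# Hypothetical macro-F1 scores (illustrative numbers) of a proposed scheme and a
# baseline over the same eight term dimensions; a one-sided paired t-test checks
# whether the proposed scheme's mean improvement is statistically significant.
proposed = [88.1, 88.9, 89.4, 89.7, 90.0, 90.2, 90.1, 90.3]
baseline = [86.5, 87.2, 87.9, 88.4, 88.8, 89.0, 89.1, 89.2]

t_stat, p_value = stats.ttest_rel(proposed, baseline, alternative="greater")
print(p_value < 0.05, p_value < 0.01)   # significance at the 95% and 99% levels
```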

6. Discussion

The findings from our study indicate that the proposed methods, RMWS and SRMWS, significantly enhance the performance of text classification tasks. By exploring the class–term relationship using rough sets, these methods provide a novel approach to term weighting that outperforms both traditional and contemporary methods. The RMWS approach utilizes rough sets to identify terms that offer specific and distinctive information relevant to different classes. This capability allows for the identification of patterns within documents that may otherwise be overlooked by other term weighting schemes. By incorporating the coefficients α, β, and λ, RMWS can adjust the influence of these terms more accurately, leading to superior classification results. SRMWS further refines this process by taking the square root of the term frequency values, making the term weights more balanced and discriminative.
Experimental results revealed that RMWS and SRMWS consistently outperform existing term weighting schemes, regardless of dataset structure. This includes imbalanced-multiclass, balanced-multiclass, and imbalanced-binary class datasets, demonstrating the robustness of the proposed methods. Notably, SRMWS showed the highest classification performance across most scenarios, suggesting that the square root transformation adds significant value in enhancing term distinctiveness.
One critical finding is the statistical significance of the performance improvements offered by RMWS and SRMWS. The p-values from our t-tests indicate that the differences in performance are not due to random chance. This statistical validation underscores the efficacy of the proposed methods and their potential utility in practical applications.
When comparing classifiers, the NB classifier showed the greatest improvement with RMWS, while the SVM and KNN classifiers benefited more from SRMWS. This suggests that the choice between RMWS and SRMWS may depend on the specific classifier used and the nature of the dataset. Future studies could explore this relationship further, examining how different classifiers interact with these term weighting schemes under various conditions.
This study’s results also highlight the importance of proper coefficient selection for optimizing the performance of RMWS and SRMWS. The α, β, and λ coefficients have been shown to significantly impact classification outcomes, necessitating careful calibration based on the dataset and classifier used. This aspect presents an opportunity for future research to develop automated techniques for coefficient tuning, potentially using machine learning algorithms.
Overall, the incorporation of rough set theory into term weighting presents a promising direction for improving text classification. By focusing on revealing hidden patterns and specific class–term relationships, RMWS and SRMWS offer a more nuanced and effective approach to term weighting. The findings from this study contribute to the field by presenting robust, statistically validated methods that outperform existing approaches, paving the way for future advancements in automated text classification.

7. Conclusions

This study introduces a novel supervised term weighting scheme for text classification: the Rough Multivariate Weighting Scheme (RMWS), along with its derivative, the Square Root Rough Multivariate Weighting Scheme (SRMWS). The proposed scheme leverages the information extraction capabilities of rough sets to analyze and extract discriminative features from the document–term–class space. This approach surpasses traditional methods by incorporating a broader and deeper understanding of the relationships between terms, documents, and classes, enabling a more precise determination of term importance for classification tasks. Comprehensive experiments were performed on datasets with varying characteristics, including imbalanced and balanced structures as well as multiclass and binary class scenarios. The results revealed the effectiveness of RMWS and its derivative SRMWS in enhancing classification accuracy and F1 scores compared to traditional and other supervised term weighting methods. Notably, SRMWS consistently achieved the best performance across most dataset types, showcasing its robustness in modeling complex relationships within the data. Additionally, RMWS outperformed established approaches such as IGM and IGMimp, highlighting its superior capability in extracting and utilizing discriminative term information. In summary, this study contributes a single main term weighting model, RMWS, and demonstrates the potential benefits of its derivative, SRMWS, as a complementary variation. Both approaches underline the importance of leveraging rough set theory for effective feature weighting and classification in text mining. Future research could explore the integration of RMWS with other advanced rough set techniques or artificial intelligence methods, aiming to further enhance its performance and applicability to diverse text classification problems.

Funding

This research received no external funding.

Data Availability Statement

The Reuters-21578 and 10 Mini Newsgroups datasets, which support this study’s findings, are freely available on Machine Learning Repository-UCI (Reference [28]). The Enron1 dataset is available at https://www.kaggle.com/datasets/wcukierski/enron-email-dataset (accessed on 7 May 2015).

Conflicts of Interest

The author declares that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Lan, M.; Sung, S.-Y.; Low, H.-B.; Tan, C.-L. A Comparative Study on Term Weighting Schemes for Text Categorization. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 546–551. [Google Scholar]
  2. Cekik, R.; Uysal, A.K. A New Metric for Feature Selection on Short Text Datasets. Concurr. Comput. 2022, 34, e6909. [Google Scholar] [CrossRef]
  3. Parlak, B.; Uysal, A.K. A Novel Filter Feature Selection Method for Text Classification: Extensive Feature Selector. J. Inf. Sci. 2023, 49, 59–78. [Google Scholar] [CrossRef]
  4. Cekik, R.; Uysal, A.K. A Novel Filter Feature Selection Method Using Rough Set for Short Text Data. Expert Syst. Appl. 2020, 160, 113691. [Google Scholar] [CrossRef]
  5. Manning, C.D. Introduction to Information Retrieval; Syngress Publishing: Rockland, MA, USA, 2008. [Google Scholar]
  6. Debole, F.; Sebastiani, F. Supervised Term Weighting for Automated Text Categorization. In Proceedings of the 2003 ACM Symposium on Applied Computing, Melbourne, FL, USA, 9–12 March 2003; pp. 784–788. [Google Scholar]
  7. Emmanuel, M.; Khatri, S.M.; Babu, D.R.R. A Novel Scheme for Term Weighting in Text Categorization: Positive Impact Factor. In Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, UK, 13–16 October 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 2292–2297. [Google Scholar]
  8. Ren, F.; Sohrab, M.G. Class-Indexing-Based Term Weighting for Automatic Text Classification. Inf. Sci. 2013, 236, 109–125. [Google Scholar] [CrossRef]
  9. Campos, R.; Mangaravite, V.; Pasquali, A.; Jorge, A.; Nunes, C.; Jatowt, A. YAKE! Keyword Extraction from Single Documents Using Multiple Local Features. Inf. Sci. 2020, 509, 257–289. [Google Scholar] [CrossRef]
  10. Doğan, T. Term Weighting for Text Classification (Metin Sınıflandırma İçin Terim Ağırlıklandırma). Ph.D. Thesis, Eskişehir Technical University, Eskişehir, Turkey, August 2019. [Google Scholar]
  11. Liu, Y.; Loh, H.T.; Sun, A. Imbalanced Text Classification: A Term Weighting Approach. Expert Syst. Appl. 2009, 36, 690–701. [Google Scholar] [CrossRef]
  12. Spärck Jones, K. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. J. Doc. 2004, 60, 493–502. [Google Scholar] [CrossRef]
  13. Lan, M.; Tan, C.L.; Su, J.; Lu, Y. Supervised and Traditional Term Weighting Methods for Automatic Text Categorization. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 721–735. [Google Scholar] [CrossRef] [PubMed]
  14. Ko, Y. A New Term-weighting Scheme for Text Classification Using the Odds of Positive and Negative Class Probabilities. J. Assoc. Inf. Sci. Technol. 2015, 66, 2553–2565. [Google Scholar] [CrossRef]
  15. Chen, K.; Zhang, Z.; Long, J.; Zhang, H. Turning from TF-IDF to TF-IGM for Term Weighting in Text Classification. Expert Syst. Appl. 2016, 66, 245–260. [Google Scholar] [CrossRef]
  16. Dogan, T.; Uysal, A.K. Improved Inverse Gravity Moment Term Weighting for Text Classification. Expert Syst. Appl. 2019, 130, 45–59. [Google Scholar] [CrossRef]
  17. Okkalioglu, M. TF-IGM Revisited: Imbalance Text Classification with Relative Imbalance Ratio. Expert Syst. Appl. 2023, 217, 119578. [Google Scholar] [CrossRef]
  18. Rathi, R.N.; Mustafi, A. The Importance of Term Weighting in Semantic Understanding of Text: A Review of Techniques. Multimed. Tools Appl. 2023, 82, 9761–9783. [Google Scholar] [CrossRef] [PubMed]
  19. Dai, Z.; Callan, J. Context-Aware Term Weighting for First Stage Passage Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 1533–1536. [Google Scholar]
  20. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146. [Google Scholar] [CrossRef]
  21. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep Contextualized Word Representations. arXiv 2018, arXiv:1802.05365. [Google Scholar]
  22. Zhang, D.; Xu, H.; Su, Z.; Xu, Y. Chinese Comments Sentiment Classification Based on Word2vec and SVMperf. Expert Syst. Appl. 2015, 42, 1857–1863. [Google Scholar] [CrossRef]
  23. Yang, K.; Cai, Y.; Chen, Z.; Leung, H.; Lau, R. Exploring Topic Discriminating Power of Words in Latent Dirichlet Allocation. In Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 2238–2247. [Google Scholar]
  24. Pawlak, Z. Rough Set Theory and Its Applications to Data Analysis. Cybern. Syst. 1998, 29, 661–688. [Google Scholar] [CrossRef]
  25. Cekik, R.; Telceken, S. A New Classification Method Based on Rough Sets Theory. Soft Comput. 2018, 22, 1881–1889. [Google Scholar] [CrossRef]
  26. Zhao, H.; Wang, P.; Hu, Q. Cost-Sensitive Feature Selection Based on Adaptive Neighborhood Granularity with Multi-Level Confidence. Inf. Sci. 2016, 366, 134–149. [Google Scholar] [CrossRef]
  27. Zheng, L.; Diao, R.; Shen, Q. Self-Adjusting Harmony Search-Based Feature Selection. Soft Comput. 2015, 19, 1567–1579. [Google Scholar] [CrossRef]
  28. Asuncion, A.; Newman, D. UCI Machine Learning Repository 2007. Available online: https://archive.ics.uci.edu/dataset/137/reuters+21578+text+categorization+collection (accessed on 25 September 2024).
  29. Metsis, V.; Androutsopoulos, I.; Paliouras, G. Spam Filtering with Naive Bayes-Which Naive Bayes? In Proceedings of the CEAS, Mountain View, CA, USA, 27–28 July 2006; Volume 17, pp. 28–69. [Google Scholar]
  30. Çekik, R.; Kaya, M. A New Feature Selection Metric Based on Rough Sets and Information Gain in Text Classification. Gazi Univ. J. Sci. Part A Eng. Innov. 2023, 10, 472–486. [Google Scholar] [CrossRef]
  31. Uysal, A.K.; Gunal, S. A Novel Probabilistic Feature Selection Method for Text Classification. Knowl. Based Syst. 2012, 36, 226–235. [Google Scholar] [CrossRef]
Figure 1. Representative illustration of the working mechanism of RMWS.
Figure 2. Correlation between coefficients for SVM classifier: (a) α and β, (b) α and λ, and (c) λ and β.
Figure 3. Correlation between coefficients for KNN classifier: (a) α and β, (b) α and λ, and (c) λ and β.
Figure 4. Correlation between coefficients for NB classifier: (a) α and β, (b) α and λ, and (c) λ and β.
Figure 5. (a) Micro-F1 and (b) macro-F1 metric results for multiclass text classification using the SVM classifier and ten different term weighting schemes on the Reuters-21578 corpus, with different feature sizes.
Figure 6. (a) Micro-F1 and (b) macro-F1 metric results for multiclass text classification using the KNN (k = 5) classifier and ten different term weighting schemes on the Reuters-21578 corpus, with different feature sizes.
Figure 7. (a) Micro-F1 and (b) macro-F1 metric results for multiclass text classification using the NB classifier and ten different term weighting schemes on the Reuters-21578 corpus, with different feature sizes.
Figure 8. (a) Micro-F1 and (b) macro-F1 metric results for binary class text classification using the SVM classifier and ten different term weighting schemes on the Enron1 corpus, with different feature sizes.
Figure 9. (a) Micro-F1 and (b) macro-F1 metric results for binary class text classification using the KNN (k = 5) classifier and ten different term weighting schemes on the Enron1 corpus, with different feature sizes.
Figure 10. (a) Micro-F1 and (b) macro-F1 metric results for binary class text classification using the NB classifier and ten different term weighting schemes on the Enron1 corpus, with different feature sizes.
Figure 11. (a) Micro-F1 and (b) macro-F1 metric results for multiclass text classification using the SVM classifier and ten different term weighting schemes on the 10 Mini Newsgroups corpus, with different feature sizes.
Figure 12. (a) Micro-F1 and (b) macro-F1 metric results for multiclass text classification using the KNN (k = 5) classifier and ten different term weighting schemes on the 10 Mini Newsgroups corpus, with different feature sizes.
Figure 13. (a) Micro-F1 and (b) macro-F1 metric results for multiclass text classification using the NB classifier and ten different term weighting schemes on the 10 Mini Newsgroups corpus, with different feature sizes.
Table 1. Contingency table of the relationship between term and class [10].

D (number of documents)      | Containing the term | Not containing the term
C (belonging to class C)     | t_x                 | t_x̄
C̄ (not belonging to class C) | t_y                 | t_ȳ
Table 2. Preliminaries.

Notation   | Description
t          | refers to any term
d_k        | refers to document k
TF(t, d_k) | specifies the frequency of term t in document d_k
d(t)       | indicates the number of documents in which term t appears
D          | represents the total number of documents
C          | represents the number of classes in which the term appears
M          | refers to the total number of classes
D_j        | shows the total number of documents in class j
d(t)_j     | represents the total number of documents in which term t occurs in class j
Table 3. An example of a decision table.

x ∈ U | a_1 | a_2 | a_3 | a_4 | d
x_1   | 1   | 2   | 1   | 2   | 1
x_2   | 0   | 2   | 1   | 1   | 1
x_3   | 1   | 1   | 2   | 2   | 3
x_4   | 2   | 2   | 2   | 3   | 3
x_5   | 2   | 2   | 1   | 3   | 3
x_6   | 1   | 1   | 0   | 2   | 2
x_7   | 0   | 1   | 1   | 1   | 1
x_8   | 2   | 2   | 0   | 3   | 2
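Decision tables like the one in Table 3 are the raw material for the rough set constructs used throughout this work. As a minimal sketch (not the RMWS computation itself, and with illustrative function names), the standard indiscernibility classes and lower/upper approximations can be derived from Table 3 as follows:

```python
from collections import defaultdict

# Decision table from Table 3: condition attributes (a1, a2, a3, a4) and decision d.
DECISION_TABLE = {
    "x1": ((1, 2, 1, 2), 1),
    "x2": ((0, 2, 1, 1), 1),
    "x3": ((1, 1, 2, 2), 3),
    "x4": ((2, 2, 2, 3), 3),
    "x5": ((2, 2, 1, 3), 3),
    "x6": ((1, 1, 0, 2), 2),
    "x7": ((0, 1, 1, 1), 1),
    "x8": ((2, 2, 0, 3), 2),
}

def indiscernibility_classes(table, attr_idx):
    """Partition the universe into classes of objects indiscernible on attr_idx."""
    blocks = defaultdict(set)
    for obj, (conds, _) in table.items():
        blocks[tuple(conds[i] for i in attr_idx)].add(obj)
    return list(blocks.values())

def approximations(table, attr_idx, decision_value):
    """Lower and upper approximation of the objects having a given decision value."""
    target = {obj for obj, (_, d) in table.items() if d == decision_value}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attr_idx):
        if block <= target:   # block lies entirely inside the target concept
            lower |= block
        if block & target:    # block overlaps the target concept
            upper |= block
    return lower, upper

# Example: using only a1 and a2, objects x4, x5 and x8 become indiscernible,
# so the concept d = 3 can only be approximated, not described exactly.
print(indiscernibility_classes(DECISION_TABLE, (0, 1)))
print(approximations(DECISION_TABLE, (0, 1), decision_value=3))
```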
Table 4. A simple collection of documents for RMWS.

Document | Content | Class/Label | Pre-processed content | t_1 (apple) | t_2 (cat) | t_3 (dog) | t_4 (mouse)
D_1 | “It’s an apple” | C_1 | <“apple”> | 1 | 0 | 0 | 0
D_2 | “This is a cat. That one over there is a cat.” | C_1 | <“cat”, “cat”> | 0 | 2 | 0 | 0
D_3 | “This is a cat. That one over there is a dog.” | C_2 | <“cat”, “dog”> | 0 | 1 | 1 | 0
D_4 | “There is an apple here. There is also a dog. There is another dog.” | C_2 | <“apple”, “dog”, “dog”> | 1 | 0 | 2 | 0
D_5 | “It’s a mouse. That’s a mouse over there.” | C_3 | <“mouse”, “mouse”> | 0 | 0 | 0 | 2
D_6 | “Here there is an apple, a dog, and a mouse.” | C_3 | <“apple”, “dog”, “mouse”> | 1 | 0 | 1 | 1

The last four columns give the dimensions of the documents as term-frequency vectors over t_1 = apple, t_2 = cat, t_3 = dog, t_4 = mouse.
Table 5. A summary of the calculated values for the simple example given in Table 4.

Term | Doc frequencies (per document) | RMWS(t_i) value
t_1  | {1, 0, 0, 1, 0, 1}             | 0.6667
t_2  | {0, 2, 1, 0, 0, 0}             | 1.3334
t_3  | {0, 0, 1, 2, 0, 1}             | 1.3889
t_4  | {0, 0, 0, 0, 2, 1}             | 2.0000

The collection contains 3 classes and 6 documents with class labels {C_1, C_1, C_2, C_2, C_3, C_3}; sorted according to distinctiveness (intuitively): t_1 < t_2 < t_3 < t_4.
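The document vectors in Table 4 and the per-class document counts behind Table 5 can be reproduced with a short script. The sketch below makes simplifying assumptions (lowercase word matching over the fixed four-term vocabulary; helper names are illustrative); the RMWS values in the last column of Table 5 are then computed from these counts as described in the methodology section, so they are not reproduced here.

```python
import re
from collections import Counter

TERMS = ["apple", "cat", "dog", "mouse"]  # t1..t4 from Table 4

docs = {
    "D1": ("It's an apple", "C1"),
    "D2": ("This is a cat. That one over there is a cat.", "C1"),
    "D3": ("This is a cat. That one over there is a dog.", "C2"),
    "D4": ("There is an apple here. There is also a dog. There is another dog.", "C2"),
    "D5": ("It's a mouse. That's a mouse over there.", "C3"),
    "D6": ("Here there is an apple, a dog, and a mouse.", "C3"),
}

def tf_vector(text):
    """Term-frequency vector over the fixed vocabulary (pre-processing kept trivial)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[t] for t in TERMS]

# Document-term matrix, as in the right-hand side of Table 4.
dtm = {name: tf_vector(text) for name, (text, _) in docs.items()}

# d(t)_j: number of documents of class j containing term t (cf. Table 2).
doc_freq_per_class = {t: Counter() for t in TERMS}
for name, (text, label) in docs.items():
    for t, count in zip(TERMS, tf_vector(text)):
        if count > 0:
            doc_freq_per_class[t][label] += 1

print(dtm)                 # e.g. D4 -> [1, 0, 2, 0]
print(doc_freq_per_class)  # e.g. 'dog' -> {'C2': 2, 'C3': 1}
```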
Table 6. Reuters-21578 dataset.

Id | Category/Label | Training samples | Testing samples
1  | earn           | 2840             | 1083
2  | acq            | 1596             | 696
3  | money-fx       | 206              | 87
4  | grain          | 41               | 10
5  | crude          | 253              | 121
6  | trade          | 251              | 117
7  | interest       | 190              | 75
8  | ship           | 108              | 36
Table 7. The 10 Mini Newsgroups dataset.

Id | Category/Label            | Training samples | Testing samples
1  | alt.atheism               | 700              | 300
2  | comp.graphics             | 700              | 300
3  | comp.os.ms-windows.misc   | 700              | 300
4  | comp.sys.ibm.pc.hardware  | 700              | 300
5  | comp.sys.mac.hardware     | 700              | 300
6  | comp.windows.x            | 700              | 300
7  | misc.forsale              | 700              | 300
8  | rec.autos                 | 700              | 300
9  | rec.motorcycles           | 700              | 300
10 | rec.sport.baseball        | 700              | 300
Table 8. Enron1 dataset.

Id | Category/Label | Training samples | Testing samples
1  | spam           | 1000             | 500
2  | legitimate     | 2448             | 1224
Table 9. Classifiers used for investigating the performance of the proposed term weighting method.

Classifier | Description
Support Vector Machines (SVM) | The SVM is widely recognized as a potent classifier in the literature, founded on the principle of margin maximization. It offers both linear and non-linear variants, depending on the kernel applied.
K-Nearest Neighbors (KNN) | The KNN is a non-parametric method that retains all training cases and classifies new cases based on similarity measures. It has found extensive use in statistical estimation and pattern recognition.
Naïve Bayes (NB) | The NB method relies on Bayes’ theorem and makes independence assumptions among predictors.
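The exact experimental configuration is given in the experimental setup; purely as an assumed illustration, the three classifiers in Table 9 can be instantiated with scikit-learn as follows, taking an already term-weighted document–term matrix as input (the linear SVM variant is an assumption, while k = 5 follows the figure captions):

```python
# A minimal sketch, not the paper's exact configuration.
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

def build_classifiers():
    return {
        "SVM": LinearSVC(),                          # margin-maximizing linear SVM (assumed kernel)
        "KNN": KNeighborsClassifier(n_neighbors=5),  # k = 5, as in the figure captions
        "NB": MultinomialNB(),                       # Bayes' theorem with independence assumption
    }

def fit_and_predict(X_train, y_train, X_test):
    """Train each classifier on the weighted training matrix and predict the test set."""
    return {name: clf.fit(X_train, y_train).predict(X_test)
            for name, clf in build_classifiers().items()}
```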
Table 10. On the Reuters-21578 dataset, the macro-F1 and micro-F1 results of classifiers for varying α values while keeping λ and β constant.

(SVM)
Scheme | α   | β | λ | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1   | 1 | 1 | 90.9023 | 90.6015 | 91.0150 | 81.5121 | 79.8833 | 81.1063
RMWS   | 0.8 | 1 | 1 | 90.8271 | 90.9774 | 90.9398 | 81.4307 | 80.6318 | 80.7550
RMWS   | 1.7 | 1 | 1 | 90.6015 | 90.3383 | 90.7895 | 81.0764 | 79.5086 | 80.9575
RMWS   | 3   | 1 | 1 | 90.3759 | 90.1128 | 90.5263 | 80.9442 | 79.1551 | 80.5139
SRMWS  | 1   | 1 | 1 | 91.6165 | 91.5038 | 91.9549 | 82.6334 | 81.6160 | 82.8792
SRMWS  | 0.8 | 1 | 1 | 91.5414 | 91.5414 | 92.0301 | 82.5663 | 81.3795 | 82.9334
SRMWS  | 1.7 | 1 | 1 | 91.1654 | 90.9398 | 91.7669 | 81.9519 | 81.1528 | 82.6218
SRMWS  | 3   | 1 | 1 | 90.6391 | 90.6391 | 91.4662 | 81.4506 | 80.5691 | 82.0570

(KNN)
Scheme | α   | β | λ | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1   | 1 | 1 | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
RMWS   | 0.8 | 1 | 1 | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
RMWS   | 1.7 | 1 | 1 | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
RMWS   | 3   | 1 | 1 | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
SRMWS  | 1   | 1 | 1 | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915
SRMWS  | 0.8 | 1 | 1 | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915
SRMWS  | 1.7 | 1 | 1 | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915
SRMWS  | 3   | 1 | 1 | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915

(NB)
Scheme | α   | β | λ | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1   | 1 | 1 | 87.4436 | 86.6917 | 85.6391 | 78.4903 | 77.5530 | 76.5597
RMWS   | 0.8 | 1 | 1 | 83.3083 | 78.4586 | 75.8647 | 74.7645 | 69.5383 | 66.5792
RMWS   | 1.7 | 1 | 1 | 87.7820 | 86.8797 | 86.0150 | 78.9820 | 77.7272 | 77.1018
RMWS   | 3   | 1 | 1 | 87.2932 | 85.0752 | 82.9699 | 77.1535 | 75.4056 | 72.8317
SRMWS  | 1   | 1 | 1 | 72.1429 | 64.6992 | 62.7820 | 65.2112 | 59.0703 | 56.7011
SRMWS  | 0.8 | 1 | 1 | 57.5188 | 50.1880 | 47.5940 | 53.6716 | 48.6339 | 45.0508
SRMWS  | 1.7 | 1 | 1 | 87.7820 | 86.8797 | 86.0150 | 78.9820 | 77.7272 | 77.1018
SRMWS  | 3   | 1 | 1 | 87.2932 | 85.0752 | 82.9699 | 77.1535 | 75.4056 | 72.8317
Table 11. On the Reuters-21578 dataset, the macro-F1 and micro-F1 results of classifiers for varying β values while keeping α and λ constant.

(SVM)
Scheme | α | β   | λ | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1 | 1   | 1 | 90.9023 | 90.6015 | 91.0150 | 81.5121 | 79.8833 | 81.1063
RMWS   | 1 | 0.8 | 1 | 90.9398 | 90.5639 | 91.0150 | 81.5558 | 79.7980 | 81.1063
RMWS   | 1 | 1.5 | 1 | 90.9398 | 90.6015 | 90.9774 | 81.6045 | 79.8297 | 81.1115
RMWS   | 1 | 2.5 | 1 | 90.9398 | 90.6015 | 91.0526 | 81.6533 | 79.8132 | 81.2780
SRMWS  | 1 | 1   | 1 | 91.5414 | 91.5038 | 91.9549 | 82.6334 | 81.6160 | 82.8792
SRMWS  | 1 | 0.8 | 1 | 91.5414 | 91.4662 | 91.9549 | 82.6334 | 81.5750 | 82.8792
SRMWS  | 1 | 1.5 | 1 | 91.6165 | 91.5038 | 91.9549 | 82.7803 | 81.6160 | 82.8792
SRMWS  | 1 | 2.5 | 1 | 91.6541 | 91.5038 | 91.9925 | 82.8957 | 81.6160 | 83.1633

(KNN)
Scheme | α | β   | λ | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1 | 1   | 1 | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
RMWS   | 1 | 0.8 | 1 | 89.3233 | 89.3609 | 88.6090 | 78.9281 | 78.8774 | 77.8208
RMWS   | 1 | 1.5 | 1 | 89.3985 | 89.4361 | 88.6842 | 78.9990 | 78.9531 | 78.0595
RMWS   | 1 | 2.5 | 1 | 89.4361 | 89.3609 | 88.6842 | 79.1744 | 78.9561 | 77.9125
SRMWS  | 1 | 1   | 1 | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915
SRMWS  | 1 | 0.8 | 1 | 91.1278 | 91.0150 | 89.9624 | 80.9669 | 80.8545 | 79.6715
SRMWS  | 1 | 1.5 | 1 | 91.2406 | 91.0902 | 90.0000 | 81.1954 | 80.9031 | 79.7053
SRMWS  | 1 | 2.5 | 1 | 91.2406 | 91.1278 | 90.1504 | 81.2205 | 81.0138 | 79.6607

(NB)
Scheme | α | β   | λ | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1 | 1   | 1 | 87.4436 | 86.6917 | 85.6391 | 78.4903 | 77.5530 | 76.5597
RMWS   | 1 | 0.8 | 1 | 87.4436 | 86.6917 | 85.6391 | 78.4903 | 77.5530 | 76.5597
RMWS   | 1 | 1.5 | 1 | 87.4060 | 86.6917 | 85.6391 | 78.4638 | 77.5944 | 76.6018
RMWS   | 1 | 2.5 | 1 | 87.4060 | 86.6917 | 85.6391 | 78.4638 | 77.5944 | 76.6018
SRMWS  | 1 | 1   | 1 | 72.1429 | 64.6992 | 62.7820 | 65.2112 | 59.0703 | 56.7011
SRMWS  | 1 | 0.8 | 1 | 72.0677 | 64.6241 | 62.5188 | 65.1612 | 58.9988 | 56.5718
SRMWS  | 1 | 1.5 | 1 | 72.0677 | 64.8496 | 62.8571 | 65.1473 | 59.3228 | 56.9782
SRMWS  | 1 | 2.5 | 1 | 73.5714 | 65.8647 | 63.6842 | 66.1758 | 59.8109 | 57.3950
Table 12. On the Reuters-21578 dataset, the macro-F1 and micro-F1 results of classifiers for varying λ values while keeping α and β constant.

(SVM)
Scheme | α | β | λ   | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1 | 1 | 1   | 90.9023 | 90.6015 | 91.0150 | 81.5121 | 79.8833 | 81.1063
RMWS   | 1 | 1 | 0.8 | 90.9774 | 90.6015 | 91.0902 | 81.6651 | 79.7989 | 81.2997
RMWS   | 1 | 1 | 0.4 | 90.8271 | 90.5263 | 90.9774 | 81.4297 | 79.8698 | 81.1589
RMWS   | 1 | 1 | 0.1 | 90.4887 | 90.3008 | 90.6767 | 81.0101 | 79.4333 | 80.4257
SRMWS  | 1 | 1 | 1   | 91.5414 | 91.5038 | 91.9549 | 82.6334 | 81.6160 | 82.8792
SRMWS  | 1 | 1 | 0.8 | 91.5414 | 91.5789 | 92.0301 | 82.5784 | 81.5823 | 83.0536
SRMWS  | 1 | 1 | 0.4 | 91.2030 | 91.3534 | 91.9173 | 82.0896 | 81.7809 | 82.7391
SRMWS  | 1 | 1 | 0.1 | 90.9398 | 90.8647 | 91.6165 | 81.7736 | 81.0307 | 82.1836

(KNN)
Scheme | α | β | λ   | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1 | 1 | 1   | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
RMWS   | 1 | 1 | 0.8 | 89.3233 | 89.3609 | 88.4586 | 78.9246 | 78.8684 | 77.8730
RMWS   | 1 | 1 | 0.4 | 89.3985 | 89.3233 | 88.1955 | 79.3376 | 78.6851 | 77.4002
RMWS   | 1 | 1 | 0.1 | 89.0602 | 88.1579 | 87.2556 | 79.0722 | 77.4602 | 76.8887
SRMWS  | 1 | 1 | 1   | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915
SRMWS  | 1 | 1 | 0.8 | 91.0526 | 90.8647 | 90.0376 | 81.0238 | 80.6670 | 79.6909
SRMWS  | 1 | 1 | 0.4 | 90.7519 | 90.3008 | 90.2256 | 80.6432 | 79.3284 | 79.3950
SRMWS  | 1 | 1 | 0.1 | 89.0602 | 89.4361 | 89.0226 | 79.0722 | 77.7635 | 77.2477

(NB)
Scheme | α | β | λ   | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1 | 1 | 1   | 87.4436 | 86.6917 | 85.6391 | 78.4903 | 77.5530 | 76.5597
RMWS   | 1 | 1 | 0.8 | 87.4060 | 86.6917 | 85.8271 | 78.4638 | 77.6872 | 77.0840
RMWS   | 1 | 1 | 0.4 | 87.8571 | 86.8797 | 85.9774 | 79.0259 | 77.9154 | 77.0256
RMWS   | 1 | 1 | 0.1 | 87.7444 | 86.7669 | 85.9774 | 78.1979 | 77.0973 | 76.5829
SRMWS  | 1 | 1 | 1   | 72.1429 | 64.6992 | 62.7820 | 65.2112 | 59.0703 | 56.7011
SRMWS  | 1 | 1 | 0.8 | 82.5940 | 76.2782 | 73.0451 | 73.9820 | 67.3905 | 64.7270
SRMWS  | 1 | 1 | 0.4 | 87.8571 | 86.8797 | 85.9774 | 79.0259 | 77.9154 | 77.0256
SRMWS  | 1 | 1 | 0.1 | 87.7444 | 86.7669 | 85.9774 | 78.1979 | 77.0973 | 76.5829
Table 13. Appropriate coefficient values calculated as a result of the analysis of the Reuters-21578 dataset by classifier.

Classifier | α   | β   | λ
SVM        | 1   | 2.5 | < 0.8
KNN        | 1   | 2.5 | < 1
NB         | 1.7 | 2.5 | < 0.4
Table 14. Comparison of the coefficient values given for each classifier in Table 13 with the best results for the same classifier in Table 10, Table 11 and Table 12.

(SVM)
Scheme | α   | β   | λ   | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1   | 1   | 0.8 | 90.9774 | 90.6015 | 91.0902 | 81.6651 | 79.7989 | 81.2997
RMWS   | 1   | 2.5 | 1   | 90.9398 | 90.6015 | 91.0526 | 81.6533 | 79.8132 | 81.2780
RMWS   | 1   | 1   | 1   | 90.9023 | 90.6015 | 91.0150 | 81.5121 | 79.8833 | 81.1063
RMWS   | 1   | 2.5 | 0.8 | 90.9774 | 90.6767 | 91.1278 | 81.7298 | 79.8901 | 81.2650
SRMWS  | 1   | 1   | 0.8 | 91.5414 | 91.5789 | 92.0301 | 82.5784 | 81.5823 | 83.0536
SRMWS  | 1   | 2.5 | 1   | 91.6541 | 91.5038 | 91.9925 | 82.8957 | 81.6160 | 83.1633
SRMWS  | 0.8 | 1   | 1   | 91.6165 | 91.5414 | 92.0301 | 82.5663 | 81.3795 | 82.9334
SRMWS  | 1   | 2.5 | 0.8 | 91.7669 | 91.6917 | 92.0301 | 82.8982 | 81.8484 | 82.9315

(KNN)
Scheme | α | β   | λ | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1 | 1   | 1 | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
RMWS   | 1 | 2.5 | 1 | 89.4361 | 89.3609 | 88.6842 | 79.1744 | 78.9561 | 77.9125
RMWS   | 1 | 1   | 1 | 89.4361 | 89.3985 | 88.6466 | 79.0139 | 78.9318 | 77.9083
RMWS   | 1 | 2.5 | 1 | 89.4361 | 89.3609 | 88.6842 | 79.1744 | 78.9561 | 77.9125
SRMWS  | 1 | 1   | 1 | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915
SRMWS  | 1 | 2.5 | 1 | 91.2406 | 91.1278 | 90.1504 | 81.2205 | 81.0138 | 79.6607
SRMWS  | 1 | 1   | 1 | 91.1654 | 91.0902 | 89.9624 | 81.0113 | 81.0030 | 79.6915
SRMWS  | 1 | 2.5 | 1 | 91.2406 | 91.1278 | 90.1504 | 81.2205 | 81.0138 | 79.6607

(NB)
Scheme | α   | β   | λ   | Micro-F1 @500 | @1000   | @2000   | Macro-F1 @500 | @1000   | @2000
RMWS   | 1   | 1   | 0.4 | 87.8571 | 86.8797 | 85.9774 | 79.0259 | 77.9154 | 77.0256
RMWS   | 1   | 2.5 | 1   | 87.4060 | 86.6917 | 85.6391 | 78.4638 | 77.5944 | 76.6018
RMWS   | 1.7 | 1   | 1   | 87.7820 | 86.8797 | 86.0150 | 78.9820 | 77.7272 | 77.1018
RMWS   | 1.7 | 2.5 | 0.4 | 87.2932 | 85.0752 | 82.9699 | 77.1535 | 75.4056 | 72.8317
RMWS   | 1.7 | 2.5 | 1   | 87.8947 | 86.9549 | 86.1278 | 79.2004 | 77.8838 | 77.3484
SRMWS  | 1   | 1   | 0.4 | 87.8571 | 86.8797 | 85.9774 | 79.0259 | 77.9154 | 77.0256
SRMWS  | 1   | 2.5 | 1   | 73.5714 | 65.8647 | 63.6842 | 66.1758 | 59.8109 | 57.3950
SRMWS  | 1.7 | 1   | 1   | 87.7820 | 86.8797 | 86.0150 | 78.9820 | 77.7272 | 77.1018
SRMWS  | 1.7 | 2.5 | 0.4 | 87.2932 | 85.0752 | 82.9699 | 77.1535 | 75.4056 | 72.8317
SRMWS  | 1.7 | 2.5 | 1   | 87.8947 | 86.9549 | 86.1278 | 79.2004 | 77.8838 | 77.3484
Table 15. The average results for the specified term dimensions across all datasets for the SVM classifier.

Micro-F1
fs   | SIGM    | SIGMimp | TFIDF   | ICF     | ICSDF   | IGM     | IGMimp  | TRR     | RMWS    | SRMWS
750  | 93.7908 | 93.8101 | 91.5096 | 91.9243 | 90.8663 | 92.8350 | 92.8157 | 93.1297 | 93.4557 | 94.5917
1500 | 94.5506 | 94.5700 | 91.7836 | 92.2166 | 90.9812 | 92.9871 | 93.0064 | 93.6570 | 93.8365 | 94.9499
2500 | 94.0776 | 94.1162 | 90.7868 | 91.2081 | 89.6961 | 92.4552 | 92.4358 | 93.0445 | 93.6924 | 94.8632
3750 | 94.0699 | 94.0506 | 89.7364 | 90.9954 | 89.4276 | 92.2888 | 92.2888 | 92.6676 | 93.6663 | 94.8390
4500 | 94.3019 | 94.3794 | 89.8505 | 91.1076 | 89.4567 | 92.7005 | 92.6619 | 92.8745 | 93.9147 | 95.0757
5750 | 94.4256 | 94.4256 | 89.8478 | 90.7998 | 88.8123 | 92.7411 | 92.8446 | 93.1382 | 93.9244 | 95.2688
6500 | 94.3869 | 94.4642 | 89.1231 | 90.9600 | 88.5400 | 92.8909 | 92.9876 | 93.2619 | 93.9051 | 95.2358
7750 | 94.4449 | 94.4836 | 89.9500 | 91.2187 | 88.5515 | 92.8580 | 92.8194 | 93.1458 | 93.9176 | 95.2552

Macro-F1
fs   | SIGM    | SIGMimp | TFIDF   | ICF     | ICSDF   | IGM     | IGMimp  | TRR     | RMWS    | SRMWS
750  | 90.0886 | 90.1137 | 87.5302 | 87.7968 | 86.8023 | 88.8667 | 88.8416 | 89.1954 | 89.7622 | 91.1895
1500 | 91.0005 | 91.0252 | 87.8949 | 88.3457 | 87.0791 | 89.0933 | 89.1147 | 89.8781 | 90.0900 | 91.5508
2500 | 90.6013 | 90.6515 | 87.0463 | 87.5009 | 85.8759 | 88.6428 | 88.6176 | 89.2934 | 89.9944 | 91.5428
3750 | 90.5999 | 90.5782 | 85.9090 | 87.2587 | 85.6457 | 88.3795 | 88.3795 | 88.7963 | 89.8597 | 91.5128
4500 | 90.8480 | 90.9518 | 86.1045 | 87.3985 | 85.6961 | 88.8846 | 88.8379 | 89.1207 | 90.2889 | 91.8173
5750 | 91.0122 | 91.0155 | 86.1495 | 87.1757 | 85.1462 | 89.0259 | 89.1276 | 89.4637 | 90.3204 | 92.0130
6500 | 90.9406 | 91.0377 | 85.4617 | 87.3310 | 84.9126 | 89.1947 | 89.3162 | 89.6248 | 90.2561 | 91.9960
7750 | 90.9382 | 90.9952 | 86.2719 | 87.6178 | 84.7928 | 89.1252 | 89.0820 | 89.4657 | 90.2624 | 92.0085
Table 16. The average results for the specified term dimensions across all datasets for the KNN classifier.

Micro-F1
fs   | SIGM    | SIGMimp | TFIDF   | ICF     | ICSDF   | IGM     | IGMimp  | TRR     | RMWS    | SRMWS
750  | 93.6869 | 93.7836 | 84.9883 | 88.6114 | 85.4251 | 88.9011 | 88.9455 | 88.9011 | 91.9725 | 94.3198
1500 | 91.2734 | 91.4009 | 82.3093 | 85.9287 | 81.8118 | 86.6102 | 86.6614 | 86.6102 | 90.9985 | 94.1338
2500 | 90.9439 | 90.8598 | 79.3895 | 83.2394 | 79.3356 | 84.9127 | 84.9833 | 84.9127 | 90.0489 | 93.8789
3750 | 89.5275 | 89.3592 | 77.6996 | 82.3006 | 77.2309 | 83.5042 | 83.5168 | 83.5042 | 89.2144 | 93.7424
4500 | 89.2260 | 89.1168 | 77.1821 | 82.3300 | 76.9397 | 82.9321 | 82.9253 | 82.9571 | 89.4114 | 93.6751
5750 | 89.2590 | 89.1487 | 76.8376 | 81.9999 | 76.6581 | 83.6627 | 83.5660 | 83.6627 | 89.7352 | 93.7535
6500 | 89.2783 | 89.1487 | 76.8244 | 82.3983 | 76.3915 | 83.7681 | 83.6646 | 83.7681 | 89.8201 | 93.7149
7750 | 89.2734 | 89.2211 | 76.5723 | 82.2275 | 76.3186 | 83.8501 | 83.7978 | 83.8501 | 89.3177 | 93.6558

Macro-F1
fs   | SIGM    | SIGMimp | TFIDF   | ICF     | ICSDF   | IGM     | IGMimp  | TRR     | RMWS    | SRMWS
750  | 89.7579 | 89.8651 | 80.3617 | 84.0608 | 80.6923 | 84.5843 | 84.6377 | 84.5843 | 87.9419 | 90.4358
1500 | 87.0213 | 87.1750 | 77.4361 | 81.1339 | 76.7565 | 82.2521 | 82.3192 | 82.2521 | 86.8133 | 90.1866
2500 | 86.6899 | 86.5600 | 74.6089 | 78.5310 | 74.3454 | 80.4362 | 80.5078 | 80.4362 | 85.7977 | 89.8306
3750 | 85.3622 | 85.1074 | 73.0482 | 77.6690 | 72.2779 | 78.8663 | 78.8675 | 78.8663 | 84.8655 | 89.5935
4500 | 85.0886 | 84.9194 | 72.5591 | 77.7429 | 72.1370 | 78.1780 | 78.1390 | 78.1859 | 85.0414 | 89.5590
5750 | 85.0351 | 84.8785 | 72.4834 | 77.5101 | 72.1184 | 79.2184 | 79.0589 | 79.2184 | 85.6055 | 89.6769
6500 | 85.0207 | 84.8416 | 72.4263 | 77.9355 | 71.8399 | 79.3338 | 79.1812 | 79.3338 | 85.6941 | 89.5639
7750 | 85.0225 | 84.9316 | 72.3641 | 77.8472 | 71.7800 | 79.5606 | 79.4660 | 79.5606 | 85.1701 | 89.4328
Table 17. The average results for the specified term dimensions across all datasets for the NB classifier.

Micro-F1
fs   | SIGM    | SIGMimp | TFIDF   | ICF     | ICSDF   | IGM     | IGMimp  | TRR     | RMWS    | SRMWS
750  | 91.6829 | 91.6829 | 90.1807 | 91.6829 | 91.6829 | 91.6829 | 91.6829 | 91.6829 | 92.2718 | 92.3829
1500 | 90.8910 | 90.8910 | 87.7354 | 90.8910 | 90.8910 | 90.8910 | 90.8910 | 90.8910 | 92.2737 | 92.1625
2500 | 89.2538 | 89.2538 | 85.3325 | 89.2538 | 89.2538 | 89.2538 | 89.2538 | 89.2538 | 91.3315 | 90.7760
3750 | 86.8186 | 86.8186 | 82.8052 | 86.8186 | 86.8186 | 86.8186 | 86.8186 | 86.8186 | 90.2530 | 89.6974
4500 | 86.2864 | 86.2864 | 82.5233 | 86.2864 | 86.2864 | 86.2864 | 86.2864 | 86.2864 | 89.8570 | 89.3014
5750 | 85.9767 | 85.9767 | 82.0986 | 85.9767 | 85.9767 | 85.9767 | 85.9767 | 85.9767 | 89.6099 | 89.0544
6500 | 85.7866 | 85.7866 | 81.9064 | 85.7866 | 85.7866 | 85.7866 | 85.7866 | 85.7866 | 89.4215 | 88.9770
7750 | 85.2612 | 85.2612 | 80.9289 | 85.2612 | 85.2612 | 85.2612 | 85.2612 | 85.2612 | 88.9680 | 88.3013

Macro-F1
fs   | SIGM    | SIGMimp | TFIDF   | ICF     | ICSDF   | IGM     | IGMimp  | TRR     | RMWS    | SRMWS
750  | 88.0697 | 88.0697 | 86.5737 | 88.0697 | 88.0697 | 88.0697 | 88.0697 | 88.0697 | 88.9820 | 89.0977
1500 | 87.1653 | 87.1653 | 83.7215 | 87.1653 | 87.1653 | 87.1653 | 87.1653 | 87.1653 | 88.9478 | 88.8595
2500 | 85.1040 | 85.1040 | 81.1420 | 85.1040 | 85.1040 | 85.1040 | 85.1040 | 85.1040 | 87.9102 | 87.4688
3750 | 81.7852 | 81.7852 | 77.9908 | 81.7852 | 81.7852 | 81.7852 | 81.7852 | 81.7852 | 86.3751 | 85.9738
4500 | 80.9556 | 80.9556 | 77.6994 | 80.9556 | 80.9556 | 80.9556 | 80.9556 | 80.9556 | 85.8826 | 85.4848
5750 | 80.3406 | 80.3406 | 76.8429 | 80.3406 | 80.3406 | 80.3406 | 80.3406 | 80.3406 | 85.3140 | 84.9162
6500 | 79.9976 | 79.9976 | 76.3631 | 79.9976 | 79.9976 | 79.9976 | 79.9976 | 79.9976 | 84.9889 | 84.7129
7750 | 78.9263 | 78.9263 | 74.8852 | 78.9263 | 78.9263 | 78.9263 | 78.9263 | 78.9263 | 83.8351 | 83.3747
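Tables 15–17 report micro-F1 and macro-F1 values averaged over the three corpora at each feature size. As a hedged sketch of how such entries can be produced (the simple mean over corpora is an assumption consistent with the captions, and the function name is illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

def averaged_f1(per_corpus_results):
    """per_corpus_results: one (y_true, y_pred) pair per corpus, all obtained at the
    same feature size with the same weighting scheme and classifier."""
    micro = [100 * f1_score(y, p, average="micro") for y, p in per_corpus_results]
    macro = [100 * f1_score(y, p, average="macro") for y, p in per_corpus_results]
    return float(np.mean(micro)), float(np.mean(macro))
```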
Table 18. Statistical significance of performance improvements achieved from the proposed best-performing RMWS for all datasets.

Classifier | Metric   | SIGM/RMWS | SIGMimp/RMWS | TFIDF/RMWS | ICF/RMWS | ICSDF/RMWS | IGM/RMWS  | IGMimp/RMWS | TRR/RMWS
SVM        | Micro-F1 | 0.0056 *  | 0.0040 *     | 0.0001 *   | 0.0034 * | 0.0050 *   | 0.0022 *  | 0.0030 *    | 0.06904 **
SVM        | Macro-F1 | 0.0055 *  | 0.0032 *     | 0.0008 *   | 0.0006 * | 0.0023 *   | 0.0004 *  | 0.0008 *    | 0.0090 *
KNN        | Micro-F1 | 0.1959    | 0.2669       | 0.0099 *   | 0.0026 * | 0.0015 *   | 0.0141 ** | 0.0180 **   | 0.0138 **
KNN        | Macro-F1 | 0.1997    | 0.3046       | 0.0044 *   | 0.0011 * | 0.0071 *   | 0.0124 ** | 0.0175 **   | 0.0123 **
NB         | Micro-F1 | 0.0019 *  | 0.0019 *     | 0.0026 *   | 0.0019 * | 0.0019 *   | 0.0019 *  | 0.0019 *    | 0.0019 *
NB         | Macro-F1 | 0.0018 *  | 0.0018 *     | 0.0023 *   | 0.0018 * | 0.0018 *   | 0.0018 *  | 0.0018 *    | 0.0018 *
* Significance at 99%. ** Significance at 95%.
Table 19. Statistical significance of performance improvements achieved from the proposed best-performing SRMWS for all datasets.

Classifier | Metric   | SIGM/SRMWS | SIGMimp/SRMWS | TFIDF/SRMWS | ICF/SRMWS | ICSDF/SRMWS | IGM/SRMWS | IGMimp/SRMWS | TRR/SRMWS
SVM        | Micro-F1 | 0.0000 *   | 0.0001 *      | 0.0001 *    | 0.0012 *  | 0.0006 *    | 0.0003 *  | 0.0009 *     | 0.0001 *
SVM        | Macro-F1 | 0.0000 *   | 0.0005 *      | 0.0003 *    | 0.0085 *  | 0.0006 *    | 0.0002 *  | 0.0000 *     | 0.0000 *
KNN        | Micro-F1 | 0.0076 *   | 0.0001 *      | 0.0008 *    | 0.0014 *  | 0.0012 *    | 0.0091 *  | 0.0010 *     | 0.0090 *
KNN        | Macro-F1 | 0.0058 *   | 0.0082 *      | 0.0035 *    | 0.0067 *  | 0.0054 *    | 0.0070 *  | 0.0085 *     | 0.0073 *
NB         | Micro-F1 | 0.0015 *   | 0.0015 *      | 0.0017 *    | 0.0015 *  | 0.0015 *    | 0.0015 *  | 0.0015 *     | 0.0015 *
NB         | Macro-F1 | 0.0016 *   | 0.0016 *      | 0.0017 *    | 0.0016 *  | 0.0016 *    | 0.0016 *  | 0.0016 *     | 0.0016 *
* Significance at 99%.
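The test behind the p-values in Tables 18 and 19 is described in the experimental section; as an illustration only, a paired comparison over the per-feature-size scores can be run with SciPy's Wilcoxon signed-rank test (an assumption, and this simplified pairing will not reproduce the tabulated p-values exactly):

```python
from scipy.stats import wilcoxon

# Paired micro-F1 values over the eight feature sizes of Table 15
# (TRR vs. RMWS columns, SVM classifier).
trr  = [93.1297, 93.6570, 93.0445, 92.6676, 92.8745, 93.1382, 93.2619, 93.1458]
rmws = [93.4557, 93.8365, 93.6924, 93.6663, 93.9147, 93.9244, 93.9051, 93.9176]

statistic, p_value = wilcoxon(trr, rmws)
print(f"p = {p_value:.4f}")  # a small p-value indicates a systematic improvement
```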
