A distance based classification is one of the popular methods for classifying instances using a point-to-point distance based on the nearest neighbour or k-NEAREST NEIGHBOUR (k-NN). The representation of distance measure can be one of the... more
A distance based classification is one of the popular methods for classifying instances using a point-to-point distance based on the nearest neighbour or k-NEAREST NEIGHBOUR (k-NN). The representation of distance measure can be one of the various measures available (e.g. Euclidean distance, Manhattan distance, Mahalanobis distance or other specific distance measures). In this paper, we propose a modified nearest neighbour method called Nearest Neighbour Distance Matrix (NNDM) for classification based on unsupervised and supervised distance matrix. In the proposed NNDM method, an Euclidean distance method coupled with a distance loss function is used to create a distance matrix. In our approach, distances of each instance to the rest of the training instances data will be used to create the training distance matrix (TADM). Then, the TADM will be used to classify a new instance. In supervised NNDM, two instances that belong to different classes will be pushed apart from each other. This is to ensure that the instances that are located next to each other belong to the same class. Based on the experimental results, we found that the trained distance matrix yields reasonable performance in classification.
4 th International Conference on Machine Learning & Applications (CMLA 2022) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.... more
4 th International Conference on Machine Learning & Applications (CMLA 2022) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications. The aim of the conference is to provide a platform to the researchers and practitioners from both academia as well as industry to meet and share cutting-edge development in the field.
Handling numerical data stored in a relational database is different from handling those numerical data stored in a single table due to the multiple occurrences of an individual record in the non-target table and non-determinate relations... more
Handling numerical data stored in a relational database is different from handling those numerical data stored in a single table due to the multiple occurrences of an individual record in the non-target table and non-determinate relations between tables. Most traditional data mining methods only deal with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, in order to reduce the continuous domains to more manageable symbolic domains of low cardinality, and the loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered in this paper include algorithms that do not depend on the multi-relational structure of the data and also that are sensitive to this structure. In this experiment, we study the effects of taking the one-to-many association issue into consideration in the process of discretizing continuous numbers. We implement a new method of discretization, called the entropy-instance-based discretization method, and we evaluate this discretization method with respect to C4.5 on three varieties of a well-known multi-relational database (Mutagenesis), where numeric attributes play an important role. We demonstrate on the empirical results obtained that entropy-based discretization can be improved by taking into consideration the multiple-instance problem.
SVM training is usually discussed under two different algorithmic points of view. The first one is provided by decomposition methods such as SMO and SVMLight while the second one encompasses geometric methods that try to solve a Nearest... more
SVM training is usually discussed under two different algorithmic points of view. The first one is provided by decomposition methods such as SMO and SVMLight while the second one encompasses geometric methods that try to solve a Nearest Point Problem (NPP), the Gilbert–Schlesinger–Kozinec (GSK) and Mitchell–Demyanov–Malozemov (MDM) algorithms being the most representative ones. In this work we will show that, indeed, both approaches are essentially coincident. More precisely, we will show that a slight modification of SMO in which at each iteration both updating multipliers correspond to patterns in the same class solves NPP and, moreover, that this modification coincides with an extended MDM algorithm. Besides this, we also propose a new way to apply the MDM algorithm for NPP problems over reduced convex hulls.