Summary
In data mining, the k-Nearest-Neighbours (kNN) method for classification is simple and effective [3, 1]. The success of kNN in classification depends on the selection of a “good” value for k, so in this sense kNN is biased by k. However, no single value of k is universally good.
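To make the dependence on k concrete, the following is a minimal sketch (not code from the paper) of plain kNN majority voting on a small hypothetical two-class data set, where the predicted class changes as k grows:

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training
    points. `train` is a list of ((x, y), label) pairs; distance is
    squared Euclidean. Illustrative sketch only."""
    dist = lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2
    neighbours = sorted(train, key=dist)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical data: class "a" is local to the query, "b" dominates globally.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 1), 3))  # small neighbourhood -> "a"
print(knn_predict(train, (1, 1), 6))  # whole data set      -> "b"
```

The flip between k = 3 and k = 6 is exactly the bias the abstract refers to: the classifier's answer is an artefact of the chosen k.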
We propose to solve this choice-of-k issue by an alternative formalism which uses a sequence of values for k. Each value for k defines a neighbourhood for a data record — a set of k nearest neighbours, which contains some degree of support for each class with respect to the data record. Our aim is to select a set of neighbourhoods and aggregate their supports to create a classifier less biased by k. To this end we use a probability function G, which is defined in terms of a mass function for events weighted by a measurement of events. A mass function is an assignment of basic probability to events.
In the case of classification, events can be interpreted as neighbourhoods, and the mass function can be interpreted in terms of class proportions in neighbourhoods. Therefore, a mass function represents degrees of support for a class in various neighbourhoods.
We show that under this specification G is a linear function of the conditional probability of classes given a data record, which can be used directly for classification. Based on these findings we propose a new classification procedure.
Experiments show that this classification procedure is indeed less biased by k, and that it displays a saturating property as the number of neighbourhoods increases. Experiments further show that the performance of our classification procedure at saturation is comparable to the best performance of kNN.
Consequently, when we use kNN for classification we do not need to be concerned with k; instead, we need to select a set of neighbourhoods and apply the procedure presented here.
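The procedure described above can be sketched as follows. This is a simplified illustration, not the paper's exact G function: here each neighbourhood contributes its class proportions with equal weight, whereas the paper weights neighbourhoods by a measurement of events. The data set and the choice of neighbourhood sizes `ks` are hypothetical.

```python
from collections import Counter

def aggregate_predict(train, query, ks):
    """For each k in `ks`, compute the proportion of each class among the
    k nearest neighbours of `query`, average these proportions across the
    neighbourhoods, and predict the class with the largest aggregate
    support. Simplified sketch with uniform neighbourhood weights."""
    dist = lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2
    ranked = sorted(train, key=dist)
    support = Counter()
    for k in ks:
        votes = Counter(label for _, label in ranked[:k])
        for label, n in votes.items():
            support[label] += (n / k) / len(ks)  # class proportion in this neighbourhood
    return max(support, key=support.get)

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(aggregate_predict(train, (1, 1), ks=[1, 3, 5]))  # -> "a"
```

Because support is averaged over several neighbourhoods, the prediction no longer hinges on any single value of k; as the abstract notes, adding more neighbourhoods saturates rather than destabilises the result.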
References
Atkeson, C. G., Moore, A. W., and Schaal, S. (1997). Locally weighted learning. Artificial Intelligence Review, 11(1–5):11–73.
Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann.
Hand, D., Mannila, H., and Smyth, P. (2001). Principles of Data Mining. The MIT Press.
Smets, P. and Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2):191–234.
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Wang, H., Düntsch, I., Gediga, G., Guo, G. (2005). Nearest Neighbours without k. In: Monitoring, Security, and Rescue Techniques in Multiagent Systems. Advances in Soft Computing, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32370-8_12
DOI: https://doi.org/10.1007/3-540-32370-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23245-2
Online ISBN: 978-3-540-32370-9
eBook Packages: Engineering (R0)