Fuzzy Clustering - A Versatile Mean To Explore Medical Databases
$$A = \sum_{p=1}^{s} a_{ip} f_p$$
the sum of all single 'simplicities' has to be increased, i.e.
$$s^2 = \sum_{p=1}^{k} s_p^2 = \frac{1}{m} \sum_{p=1}^{k} \sum_{i=1}^{m} a_{ip}^4 - \frac{1}{m^2} \sum_{p=1}^{k} \left( \sum_{i=1}^{m} a_{ip}^2 \right)^2$$ (2)
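The simplicity criterion can be illustrated with a short sketch. It assumes the usual raw varimax reading, in which the simplicity of a factor is the variance of its squared loadings and the overall criterion is the sum over all factors. The function name `varimax_simplicity` and the loading matrices are hypothetical and are not taken from the aphasia data.

```python
def varimax_simplicity(loadings):
    """Return (per-factor simplicities, their sum).

    loadings[i][p] is the loading a_ip of variable i on factor p.
    Assumption: simplicity of factor p = mean(a_ip^4) - mean(a_ip^2)^2.
    """
    m = len(loadings)       # number of variables
    k = len(loadings[0])    # number of factors
    per_factor = []
    for p in range(k):
        squares = [loadings[i][p] ** 2 for i in range(m)]
        mean_of_fourth = sum(sq ** 2 for sq in squares) / m
        squared_mean = (sum(squares) / m) ** 2
        per_factor.append(mean_of_fourth - squared_mean)
    return per_factor, sum(per_factor)


# Hypothetical loadings: 4 variables, 2 factors.
simple = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # each variable loads on one factor
diffuse = [[0.7, 0.7], [0.7, 0.7], [0.7, 0.7], [0.7, 0.7]]  # every variable loads on both
_, s2_simple = varimax_simplicity(simple)
_, s2_diffuse = varimax_simplicity(diffuse)
```

A loading pattern in which every variable loads on exactly one factor yields a larger criterion than a pattern in which all loadings are equal, which is the sense in which a rotation increases 'simplicity'.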
The weights of the resulting 5 factors were transferred to membership functions of symptoms. In this way the symptoms
show different degrees of membership in the different aspects of language disorders. Fuzzy c-means clustering
(Bezdek, 1981) was then used to assign the symptoms to the different entities, because polarization of the five
factors results in at least 10 categories.
The algorithm comprises the following steps (cf. Zimmermann, 1996):
Step 1. Choose the number of classes c, the number of objects A, and a weighting factor m, so that $1 < m < \infty$.
Step 2. Calculate the c fuzzy cluster centers by means of the chosen parameters
$$v_i = \frac{\sum_{k=1}^{A} (\mu_{ik})^m x_k}{\sum_{k=1}^{A} (\mu_{ik})^m}, \quad i = 1, \ldots, c$$ (3)
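Step 3. Calculate the new membership of all objects to the c classes
$$\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}}}, \quad i = 1, \ldots, c; \; k = 1, \ldots, A$$ (4)
where $\mu_{ik}$ is the membership of object $x_k$ in class i and $d_{ik}$ is the distance between object $x_k$ and cluster center $v_i$.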
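Step 4. Compare the membership matrices before and after the iteration.
$$\lVert U^{(n+1)} - U^{(n)} \rVert$$ where n is the number of the current iteration (5)
If this difference between the respective membership matrices is below a predefined threshold $\varepsilon$, then stop; otherwise go back
to step 2, because the memberships in step 3 change only after the cluster centers have been recalculated.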
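The four steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the random initial membership matrix, the Euclidean distance, the largest absolute element-wise change as the matrix difference, and all names (`fuzzy_c_means`, `eps`, `max_iter`, `seed`) are assumptions, since the text does not specify them. Each pass recalculates the centers and then the memberships.

```python
import math
import random


def fuzzy_c_means(objects, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, U), where U[i][k] is the membership of object k in class i.
    """
    rng = random.Random(seed)
    n_obj = len(objects)
    dim = len(objects[0])

    # Assumed initialisation: random memberships, normalised per object.
    U = [[rng.random() for _ in range(n_obj)] for _ in range(c)]
    for k in range(n_obj):
        col = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= col

    centers = []
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n_obj)]
            total = sum(w)
            centers.append([
                sum(w[k] * objects[k][d] for k in range(n_obj)) / total
                for d in range(dim)
            ])

        # Step 3: new memberships from the distances to the centers.
        new_U = [[0.0] * n_obj for _ in range(c)]
        for k in range(n_obj):
            dist = [
                math.sqrt(sum((a - b) ** 2 for a, b in zip(objects[k], centers[i])))
                for i in range(c)
            ]
            zero = [i for i in range(c) if dist[i] == 0.0]
            if zero:
                # Object coincides with a center: split full membership there.
                for i in zero:
                    new_U[i][k] = 1.0 / len(zero)
                continue
            for i in range(c):
                denom = sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                new_U[i][k] = 1.0 / denom

        # Step 4: compare the membership matrices before and after the pass.
        delta = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n_obj))
        U = new_U
        if delta < eps:
            break
    return centers, U


# Hypothetical one-dimensional data with two well separated groups.
objects = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centers, U = fuzzy_c_means(objects, c=2)
```

With two well separated groups the centers settle close to the group means, and the memberships of every object sum to one across the classes.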
The resulting partition should separate the different clusters sufficiently. For practical reasons we
examined only two clusters in each of the passes.
ESIT 2000, 14-15 September 2000, Aachen, Germany 455
RESULTS
The resulting classes of the clustering method are presented in Figures 1 and 2. For graphical interpretation the
symptoms were ordered according to their membership in the respective feature, e.g., severe or
moderate overall severity of disturbance. The clustering procedure leads to clearly
distinguishable classes of symptoms. The clusters can be separated easily, as indicated by the small areas of
overlap between the respective features. Moreover, the description of language failures by c-means classification
of the analyzed factors corresponds in many but not in all cases to the traditional diagnostic scheme.
However, it is also visible that there are differences between the factors. The slope in the presentation of factor
III is too steep to be the basis of a clinical interpretation.
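The ordering of symptoms by membership and the size of the overlap region can be sketched as follows. The membership values, the function names and the band from 0.4 to 0.6 used to call a symptom ambiguous are hypothetical choices for illustration; they are not taken from the study.

```python
def order_by_membership(memberships):
    """Return symptom indices sorted by descending membership in the first class."""
    return sorted(range(len(memberships)), key=lambda k: memberships[k], reverse=True)


def overlap_count(memberships, low=0.4, high=0.6):
    """Count symptoms whose membership lies in the assumed ambiguous band [low, high]."""
    return sum(1 for u in memberships if low <= u <= high)


# Hypothetical memberships of six symptoms in the class "severe";
# with two classes the membership in "moderate" is 1 - u.
severe = [0.95, 0.10, 0.55, 0.80, 0.05, 0.30]
order = order_by_membership(severe)
ambiguous = overlap_count(severe)
```

A small count of ambiguous symptoms corresponds to a small area of overlap between the two curves in such a plot.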
Figure 1: Graphical presentation of the results of the c-means clustering. Factor I (left) represents
the overall severity of disturbance whereas factor II (right) indicates the more expressive or more
comprehensive character of the language disorder.
Figure 2: Graphical presentation of the results of the c-means clustering. Factors III and IV (upper row) represent
the granularity of the phonetic language disorders and the patient's awareness of the disease; factor V (below)
exposes the deficits in communication.
DISCUSSION AND CONCLUSION
Classification is a common, pragmatic tool in clinical medicine. It is the basis for diagnostic and hence for
therapeutic decisions. We used fuzzy c-means clustering for classification after feature extraction from an aphasia
database. The additional feature extraction makes it possible to ensure the statistical validity of the factors. It is obvious that
the information contained in the factors is already present in the original state, that is, before extraction. The
extracted factors have been polarized. Consequently, the factors may be transferred to membership functions of
symptoms using fuzzy c-means, and the five factors lead to at least 10 categories.
[Figure panels (Figures 1 and 2): Factor I (severe / moderate), Factor II (motoric / sensoric), Factor III (high / low), Factor IV (highly aware / not aware), Factor V (severe / moderate). Horizontal axis: symptoms; vertical axis: 0 to 1.2.]
The clustering method seems to be insufficient to distinguish the granularity of the phonetic mistakes correctly.
Nevertheless, overall severity of the disease and the character of the language disorder can be distinguished much
better. For practical reasons, these points seem to be of greater importance. The former point determines the
clinical outcome and the prognosis for the patient. The latter includes the differentiation between the different
entities of aphasia and hence is the major input for the determination of the therapy.
Fuzzy clustering of uncertain data: a model for dealing with medical ambiguities
The ambiguities inherent in the definition of the aphasic syndromes (Marshall, 1986) cannot be resolved
completely by the applied algorithm. Definitions of syndromes are probabilistic rather than crisply defined
(Marshall, 1986) and the classification features overlap between different categories. A symptom may belong to
more than one class. This is in accordance with the classical taxonomy of aphasia, which is also polytypic in
nature (cf. Axer, 2000). This taxonomy is based upon anatomical models, developed more than a century ago
(Broca, 1861; Wernicke, 1874). The design of all neuropsychological language tests is based upon the classical
classification scheme above. Despite the emphasis on standardization, the uncertainty inherent in neuropsychological
testing leads to some inconsistencies across all tests. Because these ambiguities exist, fuzzy
methods seem to be an adequate means of exploring the results of the patients' clinical investigations.
Moreover, the described ambiguities generalize to many classification problems in medicine.
The question is: What is the benefit of using methods of soft computing in this field? Does it make sense to use
artificial procedures for exploring data, when even the clinical expert cannot resolve the ambiguities of the
clinical syndromes? During the last decade much research has been focused on the advance of computational
methods to analyze large data collections. A physician who has to work with large collections of medical data
should know the possibilities and dangers of computational methods in dealing with this kind of information. On
the other hand, computer scientists working on medical software should be exposed to medical data analysis as
well as to the specific purposes of medical knowledge. Computers in medicine cannot replace the medical expert
in diagnostic or therapeutic decision making. However, computers in general, and especially fuzzy techniques,
may facilitate standardization of classification routines and hence can be important supportive tools for the
physician in practice as well as valuable tools in medical quality control and medical training. In addition, the
communication between medical scientists and computer engineers may lead to an interdisciplinary advance in
the analysis of inconsistencies in medical classifications. In this way, soft computing can be used to generate
models to be used for different medical disciplines.
REFERENCES
Axer, H., Jantzen, J., Berks, G., Südfeld, G., Keyserlingk, D.G.v., 2000, "The Aphasia Database on the Web:
Description of a Model for Problems of Classification in Medicine." Proc. ESIT 2000.
Bezdek, J.C., 1981, "Pattern Recognition with Fuzzy Objective Function Algorithms." Plenum Press, New York,
London.
Broca, P., 1861, "Remarques sur le siège de la faculté de langage articulé, suivie d'une observation d'aphémie
(perte de la parole)." Bull Soc Anat 6, pp. 330-57.
Huber, W., Poeck, K., Weniger, D., 1983, "Aachener Aphasie Test (AAT)." Hogrefe, Göttingen.
Huber, W., Poeck, K., Weniger, D., 1984, "The Aachen Aphasia Test." In: Rose, F.C., Advances in Neurology.
Vol. 42: Progress in Aphasiology. Raven, New York.
Keyserlingk, D.G.v., Jantzen, J., Berks, G., Keyserlingk, A.G.v., Axer, H., "Critical Data Analysis Precedes Soft
Computing of Medical Data." Proc. ESIT 2000.
Marshall, J.C., 1986, "The description and interpretation of aphasic language disorder." Neuropsychologia 24,
pp. 5-24.
Weber, E., 1980, "Grundriss der Biologischen Statistik." Gustav Fischer Verl., Jena.
Wernicke, C., 1874, "Der aphasische Symptomenkomplex. Eine psychologische Studie auf anatomischer Basis."
Max Cohn & Weigert, Breslau.
Zimmermann, H.-J., 1996, "Fuzzy Set Theory." 3rd Ed., Kluwer Acad. Publ., Boston/MA, USA.