Fuzzy Clustering - A Versatile Mean to Explore Medical Databases.

Georg Berks, Diedrich Graf v. Keyserlingk, Jan Jantzen*, Mariagrazia Dotoli**, Hubertus Axer
Department of Anatomy I
RWTH Aachen
Pauwelsstr. 30, D-52057 Aachen, Germany
Phone: ++49-241-8089100, Fax: ++49-241-8888431
email: {georg, keyser, hubertus }@cajal.medizin.rwth-aachen.de
*Department of Automation
Technical University of Denmark
DK-2800 Lyngby, Denmark
Phone: ++45-4525-3561, Fax: ++45-4588-1295
email: Jantzen@iau.dtu.dk
**DEE Dipartimento di Elettrotecnica ed Elettronica
Politecnico di Bari
Via Re David, 200
70125 Bari, Italy
Phone: ++39-080-5963312, Fax: ++39-080-5963410
email: dotoli@poliba.it
ABSTRACT: A clinical syndrome is a set or cluster of concurrent symptoms which together indicate the
presence and the nature of a disease. Looking for concurrent symptoms is therefore one of the main tasks in
medical diagnosis. In medicine imprecise conditions are the rule, and therefore fuzzy methods are more suitable
than crisp ones. We used fuzzy c-means clustering to assign symptoms to the different categories of aphasia.
The results were compared with the results in some subtests of the Aachen Aphasia Test (AAT). The
polarization of the five main factors leads to at least 10 different categories. The description of language failures
by c-means classification of the analyzed factors corresponds in many but not in all cases to the traditional
diagnostic scheme.
KEYWORDS: Aphasia, fuzzy c-means clustering, classification.
INTRODUCTION
A symptom is a visible or even measurable condition indicating the presence of a disease and hence can be
regarded as an aid in diagnosis. Symptoms are the smallest units indicating the existence of a disease. A
syndrome on the other hand is a collection, a set, or a cluster of concurrent symptoms, which together indicate the
presence and the nature of the disease. The history of a syndrome includes its first description, its confirmation,
and the acknowledgement of its usefulness. In many cases it is named after the author who first described it. Joining single
symptoms together into one syndrome is one of the main tasks in medical diagnosis. Classification and clustering
are therefore basic concerns in medicine. Classification depends on the definition of the classes and on the
required degree of affiliation of their elements, i.e. the symptoms of the cases. Although classification is a traditional
approach in medicine, many ambiguities exist in finding exact diagnoses. In a mathematical or statistical
environment a value may or may not belong to one class. In medicine conditions are usually imprecise, and
therefore fuzzy methods seem to be more suitable than crisp ones.
ESIT 2000, 14-15 September 2000, Aachen, Germany 453
FUZZY C-MEANS CLUSTERING
Cluster analysis is a large field, both within fuzzy sets and beyond it. Many algorithms have been developed to
obtain hard clusters from a given data set. Among those, the c-means algorithms and the ISODATA clustering
methods are probably the most widely used. Both approaches are iterative. Hard c-means algorithms assume that
the number of classes c is known, whereas c is unknown in the case of the ISODATA algorithms. Hard c-means
performs a sharp classification, in which each object is either assigned to a class or not. The membership of an
object in a class therefore amounts to either 1 or 0. The application of fuzzy sets in a classification function makes
this class membership a relative one; consequently an object can belong to several classes at the same
time, but with different degrees. The c-means algorithms are prototype-based procedures, which minimize the total
of the distances between the prototypes and the objects by constructing a target function. Both methods,
sharp and fuzzy classification, determine class centers and minimize, e.g., the sum of squared distances between
these centers and the objects, which are characterized by their features. Thus classes are developed which
are as dissimilar as possible.
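As an illustration of the target function mentioned above, the fuzzy c-means objective is commonly written in the following form (cf. Bezdek, 1981). The symbol J_m is our own label, the squared Euclidean norm is an assumed choice of distance, and the remaining symbols follow the notation of the algorithm in MATERIALS AND METHODS: memberships mu_ik, cluster centers v_i, objects x_k, number of classes c, number of objects A, and weighting factor m.

```latex
J_m(U, v) = \sum_{k=1}^{A} \sum_{i=1}^{c} (\mu_{ik})^{m} \, \lVert x_k - v_i \rVert^{2},
\qquad \text{subject to} \quad \sum_{i=1}^{c} \mu_{ik} = 1 \quad \text{for } k = 1, \dots, A
```

Hard c-means corresponds to the special case in which every membership mu_ik is restricted to the values 0 or 1.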
Fuzzy c-means clustering is a simple and well-proven tool, which has been applied in many medical fields.
However, c-means algorithms, like all other optimization procedures that look for the global minimum of
a function, run the risk of ending in a local minimum. Therefore the result of such a classification has to be
regarded as an optimum solution with a certain degree of accuracy.
MEDICAL BACKGROUND OF APHASIA
Aphasia is a disturbance in the communicative use of language, which can occur in different forms (Axer et al.,
2000). It is produced by damage to regions of the cerebral cortex which are related to language functions. In
contrast, a disturbance of articulation alone is called dysarthria. That is, in aphasia higher
neuropsychological functions are affected.
Major clinical entities of aphasia
In aphasiology, there are many inconsistencies concerning the definition and interpretation of aphasic syndromes.
In a clinical setting, the following aphasic syndromes are distinguished. These syndromes are strictly empirical
and based on a statistically reliable co-occurrence of a set of symptoms.
Broca's Aphasia (also called Motor or Expressive Aphasia) (Broca, 1861): Motor aphasia is caused by
a lesion within the third frontal convolution. The disturbances mostly affect expressive language functions. The
patients speak non-fluently and in a so-called telegraphic style.
Wernicke's Aphasia (also called Sensory or Receptive Aphasia) (Wernicke, 1874): Sensory aphasia is
caused by a lesion near the auditory center, with the consequence that the patient does not understand words
or does not notice the defects in his or her otherwise fluent speech.
Global Aphasia (also called Total Aphasia): In global aphasia, loss of expression and understanding is
caused by an extended destruction of both of the centers above. Hence global aphasia is a very severe
language disturbance. Often communication is not possible at all.
Anomic Aphasia: The spontaneous speech of anomic patients is fluent and grammatically correct, but these
patients have difficulties in retrieving words.
Conduction Aphasia: Conduction aphasia is based on damage to the connection between the sensory and
the motor center, the so-called fasciculus arcuatus. While spoken language is understood, the repetition of
spoken words is severely disturbed or even impossible.
MATERIALS AND METHODS
The 265 AAT test profiles (Huber et al., 1983, 1984) collected in the Aphasia Database since 1986 (Axer et al.,
2000) were taken as the input for a factor analysis. Factor analysis was applied to a correlation matrix of 26
symptoms of language disorders and led to five factors (Keyserlingk et al., 2000). These factors gave a
meaningful indication of the disease.
Factor No.   Meaning
I            severity of disturbance
II           expressive vs. comprehensive
III          granularity of phonetic mistakes
IV           awareness of disease
V            deficits in communication
Table I: Factors derived from the factor analysis.
After the factors have been extracted, they are usually transformed into 'simple structure' to make their
significance easier to interpret. The principle of 'simple structure' is to find, among all possible feature
configurations, however scattered they may be, the ideal configuration in which each variable has the lowest
complexity, i.e., it can be described by only one single factor. We transformed the factors with the so-called varimax
method (Weber, 1980).
If the 'simplicity' of a single factor $f_p$ is defined as the variance $s_p^2$ of its squared loadings, then this
variance has to become a maximum to increase the 'simplicity' of the respective factor.
$$ s_p^2 = \frac{1}{m} \sum_{i=1}^{m} \left( a_{ip}^2 \right)^2 - \frac{1}{m^2} \left( \sum_{i=1}^{m} a_{ip}^2 \right)^2, \qquad p = 1, \dots, k \qquad (1) $$
To increase the 'simplicity' of the complete matrix $A = \sum_{p=1}^{s} a_{ip} f_p$, the sum of all single
'simplicities' has to be increased, i.e.
$$ s^2 = \sum_{p=1}^{k} s_p^2 = \frac{1}{m} \sum_{p=1}^{k} \sum_{i=1}^{m} a_{ip}^4 - \frac{1}{m^2} \sum_{p=1}^{k} \left( \sum_{i=1}^{m} a_{ip}^2 \right)^2 \qquad (2) $$
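The simplicity criterion above can be checked numerically. The following is a minimal sketch, assuming the loadings are stored as an m x k NumPy array with one column per factor; the function names and the example loading matrices are hypothetical and are not taken from the aphasia data.

```python
import numpy as np

def factor_simplicity(loadings):
    """Per-factor simplicity s_p^2: variance of the squared loadings in each column."""
    a2 = np.asarray(loadings, dtype=float) ** 2   # squared loadings a_ip^2, shape (m, k)
    m = a2.shape[0]                               # number of variables (rows)
    return (a2 ** 2).sum(axis=0) / m - (a2.sum(axis=0) ** 2) / m ** 2

def total_simplicity(loadings):
    """Sum of the per-factor simplicities, i.e. the quantity that varimax maximizes."""
    return float(factor_simplicity(loadings).sum())

# Hypothetical loadings: every variable loads on exactly one factor ('simple structure').
simple = np.array([[1.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 1.0]])

# The same loadings rotated by 45 degrees: every variable now loads equally on both factors.
angle = np.pi / 4
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
mixed = simple @ rotation

print(total_simplicity(simple))  # 0.5 for this matrix
print(total_simplicity(mixed))   # close to 0: the rotated solution is far less 'simple'
```

A varimax rotation searches for the orthogonal rotation that makes this total as large as possible; the sketch only evaluates the criterion and does not perform the search.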
The weights of the resulting 5 factors were transferred to membership functions of symptoms. In this way the
symptoms show different memberships in the different aspects of language disorders. Fuzzy c-means clustering
(Bezdek, 1981) was then used to assign the symptoms to the different entities, because polarization of the five
factors (two opposite poles per factor) results in at least 10 categories.
The algorithm comprises the following steps (cf. Zimmermann, 1996):
Step 1. Choose the number of classes c, the number of objects A, and a weighting factor m, so that $1 < m < \infty$.
Step 2. Calculate the c fuzzy cluster centers by means of the chosen parameters
$$ v_i = \frac{\sum_{k=1}^{A} (\mu_{ik})^m \, x_k}{\sum_{k=1}^{A} (\mu_{ik})^m}, \qquad i = 1, \dots, c \qquad (3) $$
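Step 3. Calculate the new membership of all objects to the c classes
$$ \mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}}}, \qquad i = 1, \dots, c; \quad k = 1, \dots, A \qquad (4) $$
where $d_{ik}$ is the distance between object $x_k$ and cluster center $v_i$.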
Step 4. Compare the membership matrices before and after the iteration:
$$ \lVert U^{(n)} - U^{(n-1)} \rVert, \quad \text{where } n \text{ is the number of the current iteration} \qquad (5) $$
If the difference between the respective membership matrices is below a predefined threshold $\varepsilon$, then stop;
else go back to step 2.
The resulting partition should separate the different clusters sufficiently. For practical reasons we
examined only two clusters for each of the factors.
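The four steps of the algorithm can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used for the aphasia data: the function name `fuzzy_c_means`, the random initialization of the membership matrix, the Euclidean distance, the maximum-norm stopping test, and the toy data are all our own choices.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Cluster the rows of x into c fuzzy classes.

    Returns the cluster centers, shape (c, n_features), and the
    membership matrix, shape (c, A), whose columns sum to 1.
    """
    x = np.asarray(x, dtype=float)
    n_objects = x.shape[0]                      # A in the text
    rng = np.random.default_rng(seed)

    # Step 1 (assumed initialization): random memberships, normalized per object.
    u = rng.random((c, n_objects))
    u /= u.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means, equation (3).
        w = u ** m
        v = (w @ x) / w.sum(axis=1, keepdims=True)

        # Step 3: new memberships from the distances d_ik, equation (4).
        d = np.linalg.norm(x[None, :, :] - v[:, None, :], axis=2)   # shape (c, A)
        d = np.fmax(d, 1e-12)                   # avoid division by zero at a center
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)

        # Step 4: stop when the membership matrix no longer changes noticeably.
        converged = np.abs(u_new - u).max() < eps
        u = u_new
        if converged:
            break

    # v was computed from the memberships of the previous pass; at convergence
    # the difference to centers recomputed from u is negligible.
    return v, u

# Hypothetical toy data: two well-separated groups of three objects each.
x = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
v, u = fuzzy_c_means(x, c=2)
labels = u.argmax(axis=0)   # hard assignment, only for inspection
print(np.round(u, 3))
print(labels)
```

Because the result depends on the starting memberships, it is common practice to repeat the run with several values of `seed` and keep the partition with the lowest objective value, which reduces the risk of accepting a local minimum.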
RESULTS
The resulting classes of the clustering method are presented in Figures 1 and 2. For graphical interpretation the
different symptoms were put in order according to their membership to the respective feature, e.g., severe or
moderate overall severity of disturbance. It can be seen that the clustering procedure leads to clearly
distinguishable classes of symptoms. The clusters can be separated easily, as is indicated by the small areas of
overlap between the respective features. Moreover, the description of language failures by c-means classification
of the analyzed factors corresponds in many but not in all cases to the traditional diagnostic scheme.
However, it is also visible that there are differences between the factors. The slope in the presentation of factor
III is too steep to be the basis of a clinical interpretation.
Figure 1: Graphical presentation of the results of the c-means clustering. Factor I (left) represents
the overall severity of disturbance whereas factor II (right) indicates the more expressive or more
comprehensive character of the language disorder.
[Plots not reproduced. Horizontal axis: symptoms 1 to 26; vertical axis: 0 to 1.2. Legends: severe / moderate
(Factor I), motoric / sensoric (Factor II).]
Figure 2: Graphical presentation of the results of the c-means clustering. Factor III and IV (upper row) represent
the granularity of the phonetic language disorders and the patient's awareness of the disease, factor V (below)
exposes the deficits in communication.
[Plots not reproduced. Horizontal axis: symptoms; vertical axis: 0 to 1.2. Legends: high / low (Factor III),
highly aware / not aware (Factor IV), severe / moderate (Factor V).]
DISCUSSION AND CONCLUSION
Classification is a common, pragmatic tool in clinical medicine. It is the basis for diagnostic and hence for
therapeutic decisions. We used fuzzy c-means clustering for classification after feature extraction from an aphasia
database. The additional feature extraction makes it possible to ensure the statistical validity of the factors. It is
obvious that the information contained in the factors is already present in the original state, that is before
extraction. The extracted factors have been polarized. Consequently, the factors may be transferred to membership
functions of symptoms using fuzzy c-means, and the five factors lead to at least 10 categories.
The clustering method seems to be insufficient to distinguish the granularity of the phonetic mistakes correctly.
Nevertheless, overall severity of the disease and the character of the language disorder can be distinguished much
better. For practical reasons, these points seem to be of greater importance. The former point determines the
clinical outcome and the prognosis for the patient. The latter includes the differentiation between the different
entities of aphasia and hence is the major input for the determination of the therapy.
Fuzzy clustering of uncertain data: a model for dealing with medical ambiguities
The ambiguities inherent in the definition of the aphasic syndromes (Marshall, 1986) cannot be resolved
completely by the applied algorithm. Definitions of syndromes are probabilistic rather than crisply defined
(Marshall, 1986) and the classification features overlap between different categories. A symptom may belong to
more than one class. This is in accordance with the classical taxonomy of aphasia, which is also polytypic in
nature (cf. Axer et al., 2000). This taxonomy is based upon anatomical models developed more than a century ago
(Broca, 1861; Wernicke, 1874). The design of all neuropsychological language tests is based upon the classical
classification scheme above. Despite the emphasis on standardization, the uncertainty inherent in neuropsychological
testing leads to some inconsistencies across all tests. As these ambiguities exist, the application of fuzzy
methods seems to be an adequate means of exploring the results of the patients' clinical investigations.
Moreover, the described ambiguities can be generalized to many problems of classification in medicine.
The question is: What is the benefit of using methods of soft computing in this field? Does it make sense to use
artificial procedures for exploring data, when even the clinical expert cannot resolve the ambiguities of the
clinical syndromes? During the last decade much research has been focused on the advance of computational
methods to analyze large data collections. A physician who has to work with large collections of medical data
should know the possibilities and dangers of computational methods in dealing with this kind of information. On
the other hand, computer scientists working on medical software should be exposed to medical data analysis as
well as to the specific purposes of medical knowledge. Computers in medicine cannot replace the medical expert
in diagnostic or therapeutic decision making. However, computers in general, and especially fuzzy techniques,
may facilitate standardization of classification routines and hence can be important supportive tools for the
physician in practice as well as valuable tools in medical quality control and medical training. In addition, the
communication between medical scientists and computer engineers may lead to an interdisciplinary advance in
the analysis of inconsistencies in medical classifications. In this way, soft computing can be used to generate
models to be used for different medical disciplines.
REFERENCES
Axer, H., Jantzen, J., Berks, G., Südfeld, D., Keyserlingk, D.G.v., 2000, "The Aphasia Database on the Web:
Description of a Model for Problems of Classification in Medicine." Proc. ESIT 2000.
Bezdek, J.C., 1981, "Pattern Recognition with Fuzzy Objective Function Algorithms." Plenum Press, New York,
London.
Broca, P., 1861, "Remarques sur le siège de la faculté de langage articulé, suivie d'une observation d'aphémie
(perte de la parole)." Bull Soc Anat 6, pp. 330-57.
Huber, W., Poeck, K., Weniger, D., 1983, "Aachener Aphasie Test (AAT)." Hogrefe, Göttingen.
Huber, W., Poeck, K., Weniger, D., 1984, "The Aachen Aphasia Test." In: Rose, F.C., Advances in Neurology.
Vol. 42: Progress in Aphasiology. Raven, New York.
Keyserlingk, D.G.v., Jantzen, J., Berks, G., Keyserlingk, A.G.v., Axer, H., 2000, "Critical Data Analysis Precedes
Soft Computing of Medical Data." Proc. ESIT 2000.
Marshall, J.C., 1986, "The description and interpretation of aphasic language disorder." Neuropsychologia 24,
pp. 5-24.
Weber, E., 1980, "Grundriss der Biologischen Statistik." Gustav Fischer Verl., Jena.
Wernicke, C., 1874, "Der aphasische Symptomenkomplex. Eine psychologische Studie auf anatomischer Basis."
Max Cohn & Weigert, Breslau.
Zimmermann, H.-J., 1996, "Fuzzy Set Theory." 3rd Ed., Kluwer Acad. Publ., Boston/MA, USA.