A Hybrid System by Integrating Case Based Reasoning and Fuzzy Decision Tree for Financial Time Series Data
Fig. 1. The framework of CBFDT
TABLE I
FUNDAMENTAL INDICES FOR INVESTING IN STOCK
Indices | Descriptions
Stock Capital | This index is used to estimate the scale of a company. The larger the stock capital, the higher the circulating ability.
Monthly Revenue | Monthly revenue represents the operating achievements of a company. A better revenue situation shows that the company is able to make more profits.
Earnings Per Share (EPS) | EPS = Total profit / Total stock shares
Turnover Rate | The turnover rate is an observed index that represents the level of investor attention to the stock.
Net Worth and Market Value Ratio (NWMVR) | NWMVR = Stock net worth / Market price
Price-Earnings Ratio (PER) | PER = Stock price / Profit after taxes. A lower ratio means investors can buy the stock at a lower price.
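The ratio indices in Table I reduce to simple arithmetic. The sketch below is only an illustration: the function names and sample figures are hypothetical, and it assumes net worth and profit after taxes are expressed per share so that they are comparable with the per-share market price.

```python
def eps(total_profit: float, total_shares: float) -> float:
    """Earnings per share = total profit / total stock shares."""
    return total_profit / total_shares


def nwmvr(net_worth_per_share: float, market_price: float) -> float:
    """Net worth to market value ratio = stock net worth / market price.
    Assumes net worth is expressed per share (hypothetical convention)."""
    return net_worth_per_share / market_price


def per(stock_price: float, profit_after_taxes_per_share: float) -> float:
    """Price-earnings ratio = stock price / profit after taxes (per share)."""
    return stock_price / profit_after_taxes_per_share


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    print(eps(1_000_000, 500_000))  # 2.0
    print(nwmvr(20.0, 40.0))        # 0.5
    print(per(30.0, 2.0))           # 15.0
```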
2008 IEEE International Conference on Fuzzy Systems (FUZZ 2008) 77
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 23, 2009 at 11:34 from IEEE Xplore. Restrictions apply.
B. A Case Based Weighted-clustering method
A historical stock case library derived from yahoo.com.tw is used to develop the weighted distance metric and the similarity measure described below [18].
First, assume a stock library $SL = \{e_1, e_2, \ldots, e_N\}$. Each case in the library can be identified by an index of corresponding features. In addition, each stock has an associated action to be taken for its current performance, and the action is a hold, sell, or buy decision. More formally, we use a collection of features $\{F_j \mid j = 1, \ldots, n\}$ to represent the cases and a variable $V$ to denote the action. The $i$-th case $e_i$ in the library can be represented as an $(n+1)$-dimensional vector, i.e., $e_i = (x_{i1}, x_{i2}, \ldots, x_{in}, y_i)$, where $x_{ij}$ corresponds to the value of feature $F_j$ $(1 \le j \le n)$ and $y_i$ corresponds to the action $(1 \le i \le N)$.
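For two cases $e_p$ and $e_q$ and a collection of feature weights $w_j$ $(j = 1, \ldots, n)$, the weighted distance metric is defined as

$$d_{pq}^{(w)} = \left( \sum_{j=1}^{n} w_j^2 \left( x_{pj} - x_{qj} \right)^2 \right)^{1/2} = \left( \sum_{j=1}^{n} w_j^2 \chi_j^2 \right)^{1/2} \qquad (1)$$

where $\chi_j^2 = (x_{pj} - x_{qj})^2$. When all the weights are equal to 1, the distance metric defined above coincides with the Euclidean measure, denoted by $d_{pq}^{(1)}$.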
By using the weighted distance defined in equation (1), a similarity measure between two cases, $SM_{pq}^{(w)}$, can be defined as follows:

$$SM_{pq}^{(w)} = \frac{1}{1 + \alpha \, d_{pq}^{(w)}} \qquad (2)$$

where $\alpha$ is a positive parameter. When all weights take the value 1, the similarity measure is denoted by $SM_{pq}^{(1)}$.
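A minimal Python sketch of equations (1) and (2), plus the full similarity matrix that phase two builds from them. It assumes each case is passed as its feature vector $(x_{i1}, \ldots, x_{in})$ with the action $y_i$ dropped; the function names are illustrative, not taken from the original system.

```python
import math
from typing import List, Sequence


def weighted_distance(xp: Sequence[float], xq: Sequence[float],
                      w: Sequence[float]) -> float:
    """Eq. (1): d_pq^(w) = sqrt(sum_j w_j^2 * (x_pj - x_qj)^2)."""
    return math.sqrt(sum((wj * (a - b)) ** 2 for a, b, wj in zip(xp, xq, w)))


def similarity(xp: Sequence[float], xq: Sequence[float],
               w: Sequence[float], alpha: float) -> float:
    """Eq. (2): SM_pq^(w) = 1 / (1 + alpha * d_pq^(w)), with alpha > 0."""
    return 1.0 / (1.0 + alpha * weighted_distance(xp, xq, w))


def similarity_matrix(cases: Sequence[Sequence[float]], w: Sequence[float],
                      alpha: float) -> List[List[float]]:
    """N x N matrix of SM_pq^(w); the diagonal is 1 because d_pp^(w) = 0."""
    return [[similarity(p, q, w, alpha) for q in cases] for p in cases]


if __name__ == "__main__":
    print(weighted_distance((0.0, 0.0), (3.0, 4.0), (1.0, 1.0)))  # 5.0
    print(similarity((0.0, 0.0), (3.0, 4.0), (1.0, 1.0), alpha=0.6))
```

With all weights equal to 1 the distance is the Euclidean $d_{pq}^{(1)}$, so the same functions also give $SM_{pq}^{(1)}$.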
After introducing the weighted distance metric and the similarity measure, the weighted clustering method is further described in the following steps.
Phase one: Finding the Weight of Each Important Technical Index
In this phase, the gradient method is applied to find the weights of the important technical indices, and a feature evaluation function is defined. The smaller the evaluation value, the better the corresponding features. Thus, we would like to find the weights for which the evaluation function attains its minimum. The detailed process is as follows:
Step 1. Select the parameter $\alpha$ and the learning rate $\eta$.
Step 2. Initialize $w_j$ with random values in $[0, 1]$.
Step 3. Compute $\Delta w_j$ for each $j$ using equation (3):

$$\Delta w_j = -\eta \, \frac{\partial E}{\partial w_j} \qquad (3)$$
In this equation, $E$ is defined as in equation (4):

$$E(w) = \frac{2}{N(N-1)} \sum_{p} \sum_{q < p} \left[ SM_{pq}^{(w)} \left( 1 - SM_{pq}^{(1)} \right) + SM_{pq}^{(1)} \left( 1 - SM_{pq}^{(w)} \right) \right] \qquad (4)$$

where $N$ is the number of cases in the stock library SL.
Step 4. Update $w_j$ with $w_j + \Delta w_j$ for each $j$.
Step 5. Repeat Steps 3 and 4 until convergence, i.e., until the value of $E$ becomes less than or equal to a given threshold or the number of iterations exceeds a predefined limit.
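Phase one can be sketched by applying the chain rule to equations (1), (2), and (4): $\partial E / \partial w_j = \frac{2}{N(N-1)} \sum_p \sum_{q<p} (1 - 2SM_{pq}^{(1)}) \cdot (-\alpha (SM_{pq}^{(w)})^2) \cdot w_j \chi_j^2 / d_{pq}^{(w)}$. The defaults $\alpha = 0.6$, $\eta = 0.7$, and a 1000-iteration cap mirror the EPISTAR settings in Table III; the threshold `tol`, the random seed, and the rule that a pair at zero distance contributes no gradient are assumptions. No clipping of $w_j$ to [0, 1] is applied because none is described (only $w_j^2$ enters equation (1)).

```python
import math
import random
from typing import List, Sequence, Tuple


def evaluate(cases: Sequence[Sequence[float]], w: Sequence[float],
             alpha: float) -> Tuple[float, List[float]]:
    """Return E(w) from eq. (4) and its analytic gradient dE/dw_j."""
    n_cases, n_feat = len(cases), len(cases[0])
    scale = 2.0 / (n_cases * (n_cases - 1))
    energy, grad = 0.0, [0.0] * n_feat
    for p in range(n_cases):
        for q in range(p):  # every unordered pair once (q < p)
            chi = [(a - b) ** 2 for a, b in zip(cases[p], cases[q])]  # chi_j^2
            s1 = 1.0 / (1.0 + alpha * math.sqrt(sum(chi)))          # SM^(1)
            d = math.sqrt(sum(wj * wj * c for wj, c in zip(w, chi)))  # eq. (1)
            sw = 1.0 / (1.0 + alpha * d)                              # eq. (2)
            energy += sw * (1.0 - s1) + s1 * (1.0 - sw)
            if d > 0.0:  # assumption: a zero-distance pair adds no gradient
                # dE/dSM^(w) = 1 - 2 SM^(1);  dSM^(w)/dd = -alpha (SM^(w))^2
                common = (1.0 - 2.0 * s1) * (-alpha * sw * sw) / d
                for j in range(n_feat):
                    grad[j] += common * w[j] * chi[j]  # dd/dw_j = w_j chi_j^2 / d
    return scale * energy, [scale * g for g in grad]


def learn_weights(cases: Sequence[Sequence[float]], alpha: float = 0.6,
                  eta: float = 0.7, max_iter: int = 1000, tol: float = 1e-6,
                  seed: int = 0) -> List[float]:
    """Phase one: gradient descent on E(w), following Steps 1-5."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(len(cases[0]))]  # Step 2: values in [0, 1)
    for _ in range(max_iter):                         # Step 5: iteration cap
        energy, grad = evaluate(cases, w, alpha)
        if energy <= tol:                             # Step 5: threshold
            break
        # Steps 3-4: delta w_j = -eta * dE/dw_j, then w_j <- w_j + delta w_j
        w = [wj - eta * g for wj, g in zip(w, grad)]
    return w
```

Recomputing $SM_{pq}^{(1)}$ on every call keeps the sketch short; caching it per pair is the obvious optimization for a real case library.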
Phase two: Dividing the SL (Stock Library) into Several Clusters
This phase partitions the stock library into several clusters by using the weighted distance metric with the weights learned in phase one. Since the features are real-valued, many methods, such as K-means clustering [5] and Kohonen's self-organizing network [5], [15], can be used to partition the case library. However, this paper adopts the typical clustering approach of Shiu et al. [18], which uses only the similarity information between cases. This approach first transforms the similarity matrix into an equivalence matrix and then treats the cases that are equivalent to each other as one cluster. The detailed process is as follows:
Step 1. Give a significance level (threshold) $\beta \in (0, 1]$.
Step 2. Determine the similarity matrix $SM = \left( SM_{pq}^{(w)} \right)$ according to equations (1) and (2).
Step 3. Compute $SM_1 = SM \circ SM = (s_{pq})$, where $s_{pq} = \max_{k} \left( \min \left( sm_{pk}^{(w)}, sm_{kq}^{(w)} \right) \right)$.
Step 4. If $SM_1 \subseteq SM$, then go to Step 5; otherwise, replace $SM$ with $SM_1$ and go to Step 3.
Step 5. Determine the clusters based on the rule: case $p$ and case $q$ belong to the same cluster if and only if $s_{pq} \ge \beta$.
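A sketch of phase two in plain Python. It assumes `sm` is a symmetric similarity matrix with ones on the diagonal, which is what equation (2) produces since $d_{pp}^{(w)} = 0$. Under that assumption the loop of Steps 3-4 terminates (max-min composition only reuses existing entries, and the entries can only grow), and the $\beta$-cut of the resulting transitive matrix is an equivalence relation, so each still-unlabeled case can simply claim every case at or above the threshold. The function names are illustrative.

```python
from typing import List, Sequence


def max_min_compose(a: Sequence[Sequence[float]],
                    b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Step 3: s_pq = max_k min(a_pk, b_kq)."""
    n = len(a)
    return [[max(min(a[p][k], b[k][q]) for k in range(n)) for q in range(n)]
            for p in range(n)]


def transitive_closure(sm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Steps 3-4: square SM until SM1 is contained in SM (entrywise <=)."""
    sm = [list(row) for row in sm]
    n = len(sm)
    while True:
        sm1 = max_min_compose(sm, sm)
        if all(sm1[p][q] <= sm[p][q] for p in range(n) for q in range(n)):
            return sm1
        sm = sm1


def cluster(sm: Sequence[Sequence[float]], beta: float) -> List[int]:
    """Step 5: cases p and q share a label iff s_pq >= beta."""
    closure = transitive_closure(sm)
    labels = [-1] * len(closure)
    next_label = 0
    for p in range(len(closure)):
        if labels[p] == -1:  # p starts a new cluster
            for q, s in enumerate(closure[p]):
                if s >= beta:
                    labels[q] = next_label
            next_label += 1
    return labels
```

For example, with `sm = [[1, 0.8, 0.2], [0.8, 1, 0.3], [0.2, 0.3, 1]]` the closure raises the (0, 2) entry to 0.3, and `cluster(sm, 0.65)` returns `[0, 0, 1]`: cases 0 and 1 form one cluster and case 2 another.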
After the case library is clustered into smaller cases, the next section briefly introduces the GAFDT forecasting model.
C. GAFDT forecasting model
This research first uses case-based reasoning methods to cluster the stock data and then combines the result with our previous research on Genetic Algorithms and Fuzzy Decision Trees (GAFDT) [4] to develop a forecasting model for predicting stock price movement. The framework of GAFDT is described as follows.

Kosko [12] used the fuzzy entropy method to revise fuzzy theory data, and Janikow [11] replaced crisp set probability with fuzzy set probability by calculating the entropy of fuzzy set data. The membership function is one of the basic concepts of fuzzy set theory; through it, one is able to process quantitative fuzzy set data and handle fuzzy information. Finding an appropriate membership function for quantitative fuzzy set data is therefore very important in fuzzy set theory. However, no single rule suits all kinds of fuzzy set data, so researchers choose different membership functions for different problems. The most commonly used membership functions include triangular, trapezoidal, and Gaussian membership functions. This research adopts triangular membership functions as its primary membership functions. The triangular membership function is defined as follows:

$$\mu(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ \dfrac{c - x}{c - b}, & b \le x \le c \\ 0, & x > c \end{cases} \qquad (5)$$

where $a < b < c$: $a$ and $c$ are the feet of the triangle and $b$ is its peak.

The ID3 decision tree chooses the attribute at each node according to its information gain:

$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v) \qquad (6)$$

where $S$ is the total input space and $S_v$ is the subset of $S$ for which attribute $A$ has the value $v$. The entropy of $S$ over $c$ classes is given by $Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$, where $p_i$ represents the probability of class $i$. The attribute with the highest information gain, say $B$, is chosen as the root node of the tree. Next, a new decision tree is recursively constructed over each value of $B$ using the training subspace $S - \{S_B\}$. A leaf node or decision node is formed when all the instances within the available training subspace are from the same class. For detecting anomalies, the ID3 decision tree outputs a binary classification decision of '0' to indicate normal and '1' to indicate anomaly class assignments to test instances.

The fuzzy resolution concept in fuzzy set theory is applied to transform data attributes from continuous to discrete values. Then, a decision tree classification method is embedded to build a stock forecasting model. In summary, the ID3 decision tree is applied in our model as a programming tool.

3) Evolving Fuzzy Decision Tree by Genetic Algorithm
A genetic algorithm (GA) is used in this stage to improve the accuracy of the FDT (fuzzy decision tree) in financial data forecasting. The GA finds the best number of fuzzy terms for each input (technical index), and the fitness function is recalculated after each new number of fuzzy terms is generated. In this research, the fitness function is the forecasting accuracy of stock price movement, i.e., of the buy, sell, or hold decision. The GA then continues with selection, crossover, and mutation, and the process repeats iteratively until the stopping criteria are satisfied.
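The triangular membership function of equation (5) and the information gain of equation (6) can be sketched in Python as follows. This is a plain ID3-style illustration over already-discretized (fuzzified) attribute values, not the authors' GAFDT implementation; the function names and the row/label layout are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Eq. (5): triangular membership with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def entropy(labels: Sequence[Hashable]) -> float:
    """Entropy(S) = -sum_i p_i log2 p_i over the class labels in S."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(rows: Sequence[Sequence[Hashable]],
                     labels: Sequence[Hashable], attr: int) -> float:
    """Eq. (6) for the discrete (already fuzzified) attribute at index attr."""
    subsets: Dict[Hashable, List[Hashable]] = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)  # S_v for each value v
    n = len(labels)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```

An attribute that separates "buy" from "sell" perfectly, for instance, has a gain equal to the full entropy of the labels (1 bit for two balanced classes), so ID3 would pick it as the root.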
D. The judgment of output value
This research mainly applies evolutionary fuzzy decision trees to predict the trend of stock price movement. The stock price movement is judged as follows:

$$y = \frac{x_t - x_{t-1}}{x_t} \qquad (7)$$

where $x_t$ is the closing price of the individual stock in the $t$-th period and $x_{t-1}$ is the closing price of the individual stock in the $(t-1)$-th period.
The decision is to sell when $y$ is greater than 0.5 and to buy when $y$ is less than -0.5. Otherwise, when $y$ lies between -0.5 and 0.5, the position is held.
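The decision rule can be sketched as follows, taking equation (7) as $y = (x_t - x_{t-1}) / x_t$ and treating the boundary values of the band as hold, as stated. The function names and the `band` parameter are illustrative.

```python
def movement(x_t: float, x_prev: float) -> float:
    """Eq. (7), read as y = (x_t - x_{t-1}) / x_t."""
    return (x_t - x_prev) / x_t


def decision(y: float, band: float = 0.5) -> str:
    """Sell if y > band, buy if y < -band, otherwise hold."""
    if y > band:
        return "sell"
    if y < -band:
        return "buy"
    return "hold"


if __name__ == "__main__":
    print(decision(movement(10.0, 4.0)))  # prints "sell" (y = 0.6)
```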
IV. EXPERIMENTAL RESULTS
According to the criteria listed in Table I, three different stocks are selected for study: Epistar Corp. (EPISTAR), Silicon Integrated System Corp. (SiS), and UMC Corp. (UMC), which represent upward, downward, and steady-state stocks, respectively. The historical data of these stocks are observed from 2000/8/10 to 2005/9/30. The main purpose of this instance selection is to emphasize the importance of stock screening. Another purpose is to show that the proposed model performs robustly even under different types of stock trends. Different input factors are then selected for each stock by step-wise regression analysis.
A. Best Parameter Setting (Step-wise Regression)
Using the step-wise regression analysis (SRA) method, the important factors of each stock are selected from the set of input factors, which contains 24 technical indices. The statistical software SPSS is used to execute the SRA procedure, and the selected input factors are shown in Table II.
TABLE II
INPUT FACTORS SELECTED BY STEP-WISE REGRESSION ANALYSIS
Stock Name | Input Factors | Results
EPISTAR | Technical indices | 12RSI
EPISTAR | Difference of technical indices | 10BIAS difference, 6RSI difference, 12RSI difference
SiS | Technical indices | 12RSI
SiS | Difference of technical indices | 10BIAS difference, 6RSI difference, 12RSI difference
UMC | Technical indices | 12WR
UMC | Difference of technical indices | 10BIAS difference, 12RSI difference, 12WR difference
B. A Weighted fuzzy clustering method
An experimental design is used to decide the best parameter setting. The parameter settings obtained from the experimental tests, together with the best number of cases for each stock, are shown in Table III.
TABLE III
BEST PARAMETER SETTING FROM EXPERIMENTAL METHOD
Parameter Setting | EPISTAR | SIS | UMC
α | 0.6 | 0.6 | 0.6
Learning rate (η) | 0.7 | 0.7 | 0.7
β (threshold) | 0.65 | 0.65 | 0.4
Phase-one run times | 1000 | 1000 | 1000
Phase-two run times | 30 | 30 | 30
Best number of cases | 8 cases | 7 cases | 4 cases
C. Best Parameter Setting (Genetic Algorithms)
Genetic algorithms are applied in this research to evolve the fuzzy terms of each factor. Four important factors are selected in this experimental design: population size, number of generations, crossover rate, and mutation rate. The resulting GA factor settings for the three stocks are shown in Table IV.
TABLE IV
PARAMETER SETUPS OF GA FOR STOCKS EPISTAR, SIS, AND UMC
Factors | Epistar (level) | Sis (level) | Umc (level)
Population size | 20 | 20 | 20
Number of generations | 100 | 10 | 100
Crossover rate | 0.9 | 0.9 | 0.9
Mutation rate | 0.1 | 0.1 | 0.3
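The GA stage can be sketched as a search over integer chromosomes with one gene per input factor, each gene holding that factor's number of fuzzy terms. Population size, generations, crossover rate, and mutation rate default to the EPISTAR settings in Table IV. The paper does not name its operators, so binary tournament selection, one-point crossover, per-gene reset mutation, elitism, and the term range 2-9 are all assumptions; `fitness` is a caller-supplied function that, in the paper's setting, would return the forecasting accuracy of the FDT built with those term counts.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def evolve_terms(fitness: Callable[[Chromosome], float], n_factors: int,
                 lo: int = 2, hi: int = 9, pop_size: int = 20,
                 generations: int = 100, p_cross: float = 0.9,
                 p_mut: float = 0.1, seed: int = 0) -> Chromosome:
    """Evolve the number of fuzzy terms per input factor (one gene each)."""
    rng = random.Random(seed)
    pop = [[rng.randint(lo, hi) for _ in range(n_factors)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick() -> Chromosome:
        # Binary tournament selection (assumed; the paper names no scheme).
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if n_factors > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_factors - 1)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for j in range(n_factors):
                    if rng.random() < p_mut:  # reset mutation of one gene
                        child[j] = rng.randint(lo, hi)
                children.append(child)
        pop = children[:pop_size]
        best = max(pop, key=fitness)
    return best
```

Because the elite is always carried forward, the best fitness never decreases from one generation to the next. Training an FDT inside `fitness` is expensive, so a real run would cache fitness values per chromosome.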
D. Method Comparisons
After the parameters of the experiments are set up, the output of CBFDT is compared with those of the traditional FDT and GAFDT. As shown in Table V, the 5-fold cross-validation test shows that CBFDT performs much better than GAFDT and FDT in terms of hit rate.
TABLE V
HIT RATE COMPARISONS OF ALL STOCKS FROM DIFFERENT FORECASTING MODELS
Average hit rate in each cross-validation test
Stock | Model | First | Second | Third | Fourth | Fifth
EPISTAR | FDT | 0.76 | 0.67 | 0.71 | 0.71 | 0.70
EPISTAR | GAFDT | 0.85 | 0.79 | 0.87 | 0.82 | 0.82
EPISTAR | CBFDT | 0.91 | 0.90 | 0.91 | 0.91 | 0.93
SIS | FDT | 0.76 | 0.69 | 0.68 | 0.75 | 0.68
SIS | GAFDT | 0.83 | 0.81 | 0.78 | 0.81 | 0.77
SIS | CBFDT | 0.93 | 0.91 | 0.93 | 0.92 | 0.93
UMC | FDT | 0.75 | 0.72 | 0.69 | 0.71 | 0.70
UMC | GAFDT | 0.84 | 0.83 | 0.83 | 0.86 | 0.78
UMC | CBFDT | 0.93 | 0.93 | 0.92 | 0.94 | 0.95
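Averaging the five cross-validation hit rates in Table V gives a per-model summary for each stock. The short script below performs only that arithmetic on the tabulated values.

```python
from statistics import mean

# Hit rates transcribed from Table V (five cross-validation tests each).
HIT_RATES = {
    "EPISTAR": {"FDT": [0.76, 0.67, 0.71, 0.71, 0.70],
                "GAFDT": [0.85, 0.79, 0.87, 0.82, 0.82],
                "CBFDT": [0.91, 0.90, 0.91, 0.91, 0.93]},
    "SIS": {"FDT": [0.76, 0.69, 0.68, 0.75, 0.68],
            "GAFDT": [0.83, 0.81, 0.78, 0.81, 0.77],
            "CBFDT": [0.93, 0.91, 0.93, 0.92, 0.93]},
    "UMC": {"FDT": [0.75, 0.72, 0.69, 0.71, 0.70],
            "GAFDT": [0.84, 0.83, 0.83, 0.86, 0.78],
            "CBFDT": [0.93, 0.93, 0.92, 0.94, 0.95]},
}

for stock, models in HIT_RATES.items():
    averages = ", ".join(f"{name} {mean(rates):.3f}" for name, rates in models.items())
    print(f"{stock}: {averages}")
```

It prints average hit rates of 0.710, 0.830, and 0.912 (FDT, GAFDT, CBFDT) for EPISTAR; 0.712, 0.800, and 0.924 for SIS; and 0.714, 0.828, and 0.934 for UMC.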
E. Discussions
As observed in Table V, CBFDT outperforms the other FDT methods. The reasons are as follows. 1) The case-based clustering method splits the case library into smaller, more homogeneous cases in the data-preprocessing stage. Therefore, the fuzzy rules generated from each case can react more sensitively to the current stock price movement. 2) An evolving FDT can decide the number of fuzzy terms more effectively, especially when the amount of data is large. As shown in Table VI, the data amounts and numbers of fuzzy terms indicate that the more data there are, the more fuzzy terms are used. To generate effective fuzzy rules, the number of fuzzy terms should be evolved through the GA. As a result, the hit rate can be further improved over that of FDT.
TABLE VI
DATA AMOUNTS AND NUMBER OF FUZZY TERMS IN EACH TRIAL
Trial | Data amount 500 | Data amount 100 | Data amount 50
1 | 9489 | 7898 | 5766
2 | 9969 | 7849 | 6724
3 | 9629 | 7898 | 6543
4 | 9669 | 2869 | 5736
5 | 8688 | 8848 | 4778
6 | 9429 | 2839 | 4786
7 | 9649 | 7829 | 5786
8 | 9499 | 5878 | 2766
9 | 9639 | 8888 | 4777
10 | 9859 | 7849 | 2776
In addition, the data distribution is another important factor to consider, since it affects the number of fuzzy terms to be clustered. For example, 12RSIdelta is clearly divided into 9 fuzzy terms, as shown in Fig. 3, and the number of data in each term is small and evenly distributed. However, if it is divided into 3 fuzzy terms, as shown in Fig. 4, each term contains a large number of data, and the fuzzy rules generated may not be able to react to the real situation, which may lead to wrong decisions. Therefore, the number of fuzzy terms of each feature does affect the number of rules generated.
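The effect described above can be reproduced on toy data. The sketch assumes evenly spaced triangular terms over the feature range and assigns each value to the term with the highest membership; both choices, the helper names, and the data are illustrative rather than the paper's procedure.

```python
from typing import List, Sequence, Tuple

Term = Tuple[float, float, float]


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def uniform_terms(lo: float, hi: float, k: int) -> List[Term]:
    """k >= 2 evenly spaced triangular terms whose peaks span [lo, hi]."""
    step = (hi - lo) / (k - 1)
    return [(lo + i * step - step, lo + i * step, lo + i * step + step)
            for i in range(k)]


def term_counts(values: Sequence[float], terms: Sequence[Term]) -> List[int]:
    """Assign each value to its highest-membership term; count per term."""
    counts = [0] * len(terms)
    for x in values:
        grades = [triangular(x, *t) for t in terms]
        counts[grades.index(max(grades))] += 1
    return counts


if __name__ == "__main__":
    data = list(range(11))  # toy data 0..10, not 12RSIdelta values
    print(term_counts(data, uniform_terms(0, 10, 3)))  # [3, 5, 3]
    print(term_counts(data, uniform_terms(0, 10, 9)))  # [1, 1, 2, 1, 1, 1, 2, 1, 1]
```

With three terms the middle term absorbs five of the eleven values, while with nine terms no term holds more than two, mirroring the contrast between Fig. 3 and Fig. 4.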
Fig. 3. 12RSIdelta divided into 9 fuzzy terms (x-axis: fuzzy terms; y-axis: data numbers)
Fig. 4. 12RSI divided into 3 fuzzy terms (x-axis: fuzzy terms 1-3; y-axis: data numbers)
V. CONCLUSION
A considerable amount of research has been conducted to study the behavior of stock price movement. However, investors are more interested in making profits from simple trading decisions, such as buy/hold/sell, provided by the system than in a prediction of the stock price itself. Therefore, we take a different approach by applying a case-based fuzzy decision tree to predict the stock price movement. A step-wise regression analysis (SRA) method is applied to select the most important factors from the set of inputs. Next, a weighted clustering method is adopted to divide the case base into smaller cases. Within each case, more homogeneous data are grouped together; therefore, these data can react more effectively to the current stock price movement. Finally, a GA is applied to evolve the fuzzy terms of each factor in order to derive the best fuzzy decision tree from each case.
Through a series of experimental tests, CBFDT outperforms the other approaches with an average hit rate of around 91%, which is the highest among the results published in the literature to date. The hit ratio (buy or sell) for future stock price movement can help investors make better decisions when trading stocks.
In the future, the proposed system can be further investigated by incorporating other soft computing techniques or a better data mining forecasting model than the ID3 decision tree. These directions are listed as follows:
1) A different forecasting model: Numerous forecasting models other than ID3 exist in the academic literature. It is worthwhile to study the behavior of these models when applied to predicting stock price movement. Different input factors and different forecasting models, such as CART and C4.5, are possible candidates for improving the accuracy of the performance measure.
2) Different data fuzzification methods: Different kinds of fuzzy membership functions, including trapezoidal and Gaussian membership functions, can be applied to transform the original data. These functions may lead to better performance.
REFERENCES
[1] A. Abraham, N. Baikunth, and P. K. Mahanti, "Hybrid intelligent systems for stock market analysis," Lecture Notes in Computer Science, vol. 2074, pp. 337-345, 2001.
[2] Y. S. Abu-Mostafa and A. F. Atiya, "Introduction to financial forecasting," Applied Intelligence, vol. 6, pp. 205-213, 1996.
[3] N. Baba, N. Inoue, and H. Asakawa, "Utilization of neural networks & GAs for constructing reliable decision support systems to deal stocks," IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00), vol. 5, pp. 5111-5116, 2000.
[4] P. C. Chang, Chen-Hao Liu, Chin-Yuan Fan, and Wei-Hsiu Huang, "Establishing a cluster based evolving fuzzy decision tree on financial time series data," The 8th Asia Pacific Industrial Engineering & Management, Kaohsiung, 2007.
[5] P. C. Chang and C. H. Liu, "A TSK type fuzzy rule based system for stock price prediction," Expert Systems with Applications, vol. 34, no. 1, pp. 135-144, 2006.
[6] P. C. Chang and T. Warren Liao, "Combining SOM and fuzzy rule base for flow time prediction in semiconductor manufacturing factory," Applied Soft Computing, vol. 6, no. 2, pp. 198-206, 2006.
[7] S. C. Chi, H. P. Chen, and C. H. Cheng, "A forecasting approach for stock index future using grey theory and neural networks," IEEE International Joint Conference on Neural Networks, pp. 3850-3855, 1999.
[8] G. Corani and G. Guariso, "Coupling fuzzy modeling and neural networks for river flood prediction," IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, vol. 35, no. 3, pp. 382-390, 2005.
[9] G. P. Zhang, "Avoiding pitfalls in neural network research," IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 37, pp. 3-16, 2007.
[10] H. L. Larsen and R. R. Yager, "A framework for fuzzy recognition technology," IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 30, pp. 65-76, 2000.
[11] C. Z. Janikow, "Fuzzy decision trees: Issues and methods," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 1-14, 1998.
[12] B. Kosko, Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs, NJ, 1992.
[13] Mu-Chun Su, Chih-Wen Liu, and Shuenn-Shing Tsay, "Neural-network-based fuzzy model and its application to transient stability prediction in power systems," IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 29, pp. 149-157, 1999.
[14] E. M. Mugambi, A. Hunter, G. Oatley, and L. Kennedy, "Polynomial-fuzzy decision tree structures for classifying medical data," Knowledge-Based Systems, vol. 17, no. 2-4, pp. 81-87, 2004.
[15] T. Murata, H. Ishibuchi, and M. Gen, "Adjusting fuzzy partitions by genetic algorithms and histograms for pattern classification problems," Proc. of IEEE Conf. on Computational Intelligence, pp. 9-14, 1998.
[16] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, 1986.
[17] R. H. Golan and W. Ziarko, "A methodology for stock market analysis utilizing rough set theory," Proceedings of the IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering, pp. 32-40, 1995.
[18] S. C. K. Shiu, Y. Li, and X. Z. Wang, "Using fuzzy integral to model case-base competence," Proc. of Soft Computing in Case-Based Reasoning Workshop, in conjunction with the 4th Int. Conf. on Case-Based Reasoning (ICCBR 2001), Vancouver, Canada, pp. 206-212, 2001.
[19] E. H. Sorensen, K. L. Miller, and C. K. Ooi, "The decision tree approach to stock selection," Journal of Portfolio Management, Fall, pp. 42-45, 2000.
[20] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, pp. 338-353, 1965.