Research On Pattern Analysis and Data Classification Methodology For Data Mining and Knowledge Discovery
Abstract
A growing number of big data applications studied in the computer science community
require online classification and pattern recognition over huge data pools collected from
sensor networks, image and video systems, online forum platforms, medical agencies, etc.
However, because data mining is an NP-hard problem, its techniques face many
difficulties. To address them, we study a novel algorithm for data mining and knowledge
discovery based on network entropy. We first introduce the necessary data analysis
techniques, such as support vector machines, neural networks and decision trees. We then
analyze the organizational structure of network graphical patterns using machine
learning methodology and graph theory. Finally, our modified method is completed with
decision and validation steps. Simulation results on different databases show the
feasibility and effectiveness of the proposed framework. We close with our conclusions
and an outlook on future work.
1. Introduction
A growing number of big data applications studied in the computer science community
require online classification and pattern recognition over huge data pools collected from
sensor networks, image and video systems, online forum platforms, medical agencies, etc.
However, because data mining is an NP-hard problem, its techniques face many
difficulties. These difficulties fall into three groups: (1) pre-processing of data
collected in the wild; (2) choice of a proper data classification algorithm; (3)
computational time cost. This paper analyzes these issues in order to overcome them. The
applications of data mining range across many subjects, and the most essential
applications and related research can be categorized as follows. (1) Image, video and
speech processing. In [1], Anyela et al. studied the objective definition of rosette shape
variation using a combined computer vision and data mining approach. Their pipeline
provides a cost-effective and scalable way to analyze inherent variation in rosette shape.
Image acquisition requires no special equipment, and the image processing and data
analysis are implemented with open source software. In [2], Borhan et al. designed and
implemented a novel tutoring system based on data classification. They propose supervised
machine learning (SML) models for speech act classification in the context of an online
collaborative learning game environment. (2) Business-related studies. In [3], Ryan et al.
apply data mining techniques to educational analysis and the related business model. They
point out that analytics has become a trend in the past few years, reflected in the large
number of graduate programs promising lucrative jobs to those with analytics skills, and
in airport waiting lounges filled with advertisements from consulting companies promising
large increases in profits through analytics. In [4], Stavros et al. designed intelligent
e-business for online commerce, in which the combination of financial models and data
analysis enhances the performance of traditional business. Business intelligence is
becoming an important factor that can help organizations manage, develop and exchange
their valuable information and knowledge. The main goal of data mining is to find the
relations and models that exist in a data set but are "hidden" among large amounts of
data. (3) Medical assistance and auxiliary medicine. Shamsher’s group [5] reviewed
data-guided medical applications, surveying current and potential applications of data
mining techniques in health informatics through case studies from the published
literature. In [6], Feng et al. pointed out that medical information mining is crucial for
diagnosis. Their review indicates that a large number of studies have shown real-world
research to have stronger external validity than conventional randomized controlled
trials; by evaluating the effect of interventions in actual clinical practice, it opens a
new path for comprehensive medical research on coronary heart disease (CHD). However, the
clinical data in comprehensive CHD medicine are large in volume and complex in type, so
finding a suitable methodology has become a hot topic. Data analysis and knowledge
discovery from the clustered data play a significant role. More related applications of
data mining can be found in the literature [7-15].
In this paper, we study pattern analysis and data classification methodology for data
mining and knowledge discovery. The paper is organized as follows: Section 2 reviews prior
work on data classification, Section 3 discusses our proposed methodology, and Section 4
gives the experimental analysis and results. In the final part, we conclude the paper and
discuss future prospects.
A support vector machine (SVM) takes a set of input data and predicts, for each given
input, which of two classes forms the output, making it a non-probabilistic binary linear
classifier. The functional form of an SVM model is similar to that of neural networks and
radial basis functions, two popular data mining techniques. However, these algorithms
lack the well-founded theoretical approach to regularization that forms the basis of the
SVM. The generalization quality and ease of training of the SVM go far beyond the
abilities of these more traditional methods. SVM training algorithms learn classification
and regression rules from data; for example, the SVM can be used to learn polynomial,
radial basis function (RBF) and multilayer perceptron classifiers. Figure 3 shows the
basic structure of the SVM. The theoretical analysis of the SVM follows.
The target data for classification are denoted as in formula 1. The objective function
to be solved is expressed in formula 2.
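$$T = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (\mathbb{R}^n \times \mathcal{Y})^l \tag{1}$$

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i, \quad \text{s.t. } y_i\big((w \cdot x_i) + b\big) \ge 1 - \xi_i \tag{2}$$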
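To make formula 2 concrete, the following is a minimal sketch that evaluates the soft-margin objective for a given hyperplane. It assumes the usual nonnegativity constraint on the slack variables (not written out in formula 2) and picks the smallest feasible slack for each sample; the function name and the toy data are hypothetical and are not taken from the experiments in this paper.

```python
def soft_margin_objective(w, b, X, y, C):
    """Evaluate objective (2) at (w, b), using the smallest feasible slacks."""
    slacks = []
    for x_i, y_i in zip(X, y):
        # Functional margin y_i * (w . x_i + b) of sample i.
        margin = y_i * (sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b)
        # Smallest xi_i with y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0 (assumed).
        slacks.append(max(0.0, 1.0 - margin))
    norm_sq = sum(w_k * w_k for w_k in w)
    return 0.5 * norm_sq + C * sum(slacks), slacks


# Hypothetical one-dimensional toy data, two samples per class.
X = [[2.0], [3.0], [-2.0], [-0.5]]
y = [1, 1, -1, -1]
value, slacks = soft_margin_objective(w=[1.0], b=0.0, X=X, y=y, C=1.0)
print(value, slacks)
```

In this toy case only the last sample lies inside the margin, so it is the only one with a nonzero slack; a larger C would penalize that violation more heavily relative to the norm term.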
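Therefore, the Wolfe dual of expression 2 can be written as:

$$\max_{\alpha} \; \sum_{j=1}^{l} \alpha_j - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j (x_i \cdot x_j)\, \alpha_i \alpha_j, \quad \text{s.t. } \sum_{j=1}^{l} \alpha_j y_j = 0, \; 0 \le \alpha_j \le C \tag{3}$$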
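The step from expression 2 to expression 3 can be sketched through the Lagrangian. The derivation below is a standard outline, not reproduced from the original text; it assumes the constraint $\xi_i \ge 0$ and introduces multipliers $\alpha_i \ge 0$ and $\beta_i \ge 0$, where the symbol $\beta_i$ is our own notation.

```latex
\begin{aligned}
L(w, b, \xi, \alpha, \beta)
  &= \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i
     - \sum_{i=1}^{l} \alpha_i \big[ y_i\big((w \cdot x_i) + b\big) - 1 + \xi_i \big]
     - \sum_{i=1}^{l} \beta_i \xi_i, \\
\frac{\partial L}{\partial w} = 0
  &\;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i, \\
\frac{\partial L}{\partial b} = 0
  &\;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0, \\
\frac{\partial L}{\partial \xi_i} = 0
  &\;\Rightarrow\; C - \alpha_i - \beta_i = 0
   \;\Rightarrow\; 0 \le \alpha_i \le C .
\end{aligned}
```

Substituting these conditions back into $L$ removes $w$, $b$ and $\xi$ and leaves $\sum_j \alpha_j - \frac{1}{2} \sum_i \sum_j y_i y_j (x_i \cdot x_j)\, \alpha_i \alpha_j$, which is the quantity maximized in the dual.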
With this new method, we can compute the dynamical entropy of the network using a
stochastic process in which $p_{ij}$ describes the transition $i \to j$ and $\pi$ is the
stationary distribution of $p$. To jointly optimize the performance of the distributed
data mining system, we design a distributed online learning algorithm whose long-term
average reward converges to that of the best distributed solution, that is, the solution
obtainable for the online data classification problem given complete knowledge of the data
characteristics and of the accuracy and cost of each classification function applied to
the data. We define the regret as the difference between the total expected reward of the
best distributed classification scheme, given full knowledge of the accuracies of the
classification functions, and the expected total reward of the algorithm used by each
learner. We let $H(p)$ denote the dynamical entropy. Its definition is given in formula 7.
$$H(p) = \sum_{i} \pi_i H_i, \quad \text{where } H_i = -\sum_{j} p_{ij} \log p_{ij} \tag{7}$$
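As an illustration of formula 7, the sketch below computes the dynamical entropy of a small transition matrix as the average of the per-state entropies weighted by the stationary distribution. It assumes the stationary distribution can be approximated by power iteration (which requires an irreducible, aperiodic chain); the function names and the example matrix are hypothetical.

```python
import math


def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of row-stochastic P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        # One step of pi <- pi P.
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def dynamical_entropy(P):
    """Entropy of formula 7: stationary-weighted sum of per-state entropies H_i."""
    pi = stationary_distribution(P)
    total = 0.0
    for i, row in enumerate(P):
        # H_i = -sum_j p_ij log p_ij, skipping zero-probability transitions.
        h_i = -sum(p * math.log(p) for p in row if p > 0.0)
        total += pi[i] * h_i
    return total


# Hypothetical two-state chain in which every transition is equally likely.
P = [[0.5, 0.5], [0.5, 0.5]]
print(dynamical_entropy(P))
```

For this uniform two-state chain the result equals log 2, the largest value a two-state chain can reach; a chain with deterministic transitions would give zero.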
$$R(T) = \sum_{t=1}^{T} \pi_{k(x_t)}(x_t) - E\left[ \sum_{t=1}^{T} \Big( I\big(\hat{y}_t^i = y_t\big) - d_{k(x_t)} \Big) \right] \tag{8}$$
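The regret described above can be estimated from a single run. The sketch below is an empirical illustration under stated assumptions, not the expectation in formula 8: the reward at each step is taken to be 1 for a correct prediction and 0 otherwise, minus the cost of the classification function used, and the benchmark is a list of per-step expected rewards of the best scheme. All names and numbers are hypothetical.

```python
def empirical_regret(best_expected_rewards, predictions, labels, costs):
    """Single-run regret: benchmark reward minus realized (accuracy - cost) reward."""
    realized = sum(
        (1.0 if y_hat == y else 0.0) - d
        for y_hat, y, d in zip(predictions, labels, costs)
    )
    return sum(best_expected_rewards) - realized


# Hypothetical four-step run: the benchmark earns 0.9 per step in expectation.
best = [0.9, 0.9, 0.9, 0.9]
predictions = [1, 0, 1, 1]
labels = [1, 1, 1, 0]
costs = [0.1, 0.1, 0.1, 0.1]
regret = empirical_regret(best, predictions, labels, costs)
print(regret)
```

A learning algorithm is doing well when this quantity grows more slowly than the number of steps, so that the average regret per step shrinks over time.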
The data are collected and processed in a distributed manner by a set of heterogeneous
learners whose classification functions have unknown accuracies. In this setting, the
costs of communication, computation and sharing make it infeasible to use centralized data
mining techniques in which a single learner can access the entire data set. Co-training
first learns a separate classifier for each view using the labeled examples. The most
confident predictions of each classifier on the unlabeled data are then used iteratively
to construct additional labeled training data. By considering different views of the same
data set, relations between certain types of data from the predefined views may be found.
Another related technique is the committee machine, which is composed of a set of
classifiers. The description is shown in Figure 5.
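A committee machine can be illustrated with a few lines of code. The sketch below combines several classifiers by majority vote; the voting rule is an assumption, since the text does not specify how the committee members' outputs are combined, and the three threshold classifiers are hypothetical stand-ins for classifiers trained on different views.

```python
from collections import Counter


def committee_predict(classifiers, x):
    """Return the label chosen by the most committee members (first seen wins ties)."""
    votes = [clf(x) for clf in classifiers]
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical committee: each member looks at a different view of the sample.
classifiers = [
    lambda x: int(x[0] > 0),         # view 1: first feature only
    lambda x: int(x[1] > 0),         # view 2: second feature only
    lambda x: int(x[0] + x[1] > 1),  # view 3: both features combined
]
print(committee_predict(classifiers, (2.0, -1.0)))
print(committee_predict(classifiers, (2.0, 0.5)))
```

For the first sample two of the three members vote for class 0, so the committee outputs 0 even though the first member disagrees; for the second sample the vote is unanimous for class 1.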
Therefore, the project classification is the importance of class. The SVN algorithm is a
“soft” clustering method in which objects are assigned to clusters with a degree of
belief, so an object can belong to more than one cluster with different degrees of belief.
It finds a feature point in each cluster, called the center of the cluster, and then
calculates the membership of each object in that cluster. The mathematical expression is
as follows.
$$C(i, l) = \frac{\omega_{li}}{\sum_{k \in L} \omega_{ki}}, \quad \text{where } \omega_{li} = \frac{H(G_l^i)}{H(G_l)} \tag{9}$$
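The idea of soft membership described above can be sketched as follows. This example does not implement formula 9; it swaps in the membership rule of fuzzy c-means as a well-known stand-in, in which an object's degree of belief in each cluster depends on its relative distance to the cluster centers. The function name, the fuzzifier `m` and the example points are assumptions made for illustration.

```python
import math


def soft_memberships(point, centers, m=2.0):
    """Fuzzy c-means style degrees of belief of one point in each cluster."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical point lying twice as far from the second center as from the first.
memberships = soft_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(memberships)
```

The point belongs to both clusters at once, with a larger degree of belief in the nearer one, which is the behavior the text attributes to a “soft” clustering method.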
The Harvard dataset is a database containing 3 classes (Iris Setosa, Iris Versicolour,
Iris Virginica) and 150 instances, where each class refers to a kind of iris plant.
Figure 5 shows our result, from which we conclude that our method is robust. The separate
results are shown in Figures 6-8.
References
[1] A. Camargo, D. Papadopoulou, Z. Spyropoulou, K. Vlachonasios, J. H. Doonan and A. P. Gay,
“Objective definition of rosette shape variation using a combined computer vision and data mining
approach”, PloS one, vol. 9, no. 5, (2014), pp. e96889.
[2] B. Samei, H. Li, F. Keshtkar, V. Rus and A. C. Graesser, “Context-based speech act classification in
intelligent tutoring systems”, In Intelligent Tutoring Systems, Springer International Publishing, (2014),
pp. 236-241.
[3] B. Samei, H. Li, F. Keshtkar, V. Rus and A. C. Graesser, “Context-based speech act classification in
intelligent tutoring systems”, In Intelligent Tutoring Systems, Springer International Publishing, (2014),
pp. 236-241.
[4] S. Valsamidis, I. Kazanidis, S. Kontogiannis and A. Karakos, “A Proposed Methodology for E-Business
Intelligence Measurement Using Data Mining Techniques”, In Proceedings of the 18th Panhellenic
Conference on Informatics, ACM, (2014), pp. 1-6.
[5] D. P. Shukla, S. B. Patel and A. K. Sen, “A literature review in health informatics using data mining
techniques”, Int. J. Softw. Hardware Res. Eng. IJOURNALS, (2014).
[6] M. J. Berry and G. Linoff, “Data mining techniques: for marketing, sales, and customer support”, John
Wiley & Sons, Inc., (1997).
[7] L. Vaughan and Y. Chen, “Data mining from web search queries: A comparison of google trends and
baidu index”, Journal of the Association for Information Science and Technology, vol. 66, no. 1, (2015),
pp. 13-22.
[8] N. Sonawane and B. Nandwalkar, “Time Efficient Sentinel Data Mining using GPU”, In International
Journal of Engineering Research and Technology, ESRSA Publications, vol. 4, no. 02, (2015) February.