Recently, one of the most important issues in data science has been discovering actionable information, interpretable patterns and relationships in large volumes of data. This process is called data mining and is commonly used in science, engineering, business and security. One of the main methods of data mining is similarity search of time series. The approach discussed in this article is based on a Piecewise Linear Representation of time series, which implies two steps in measuring time series similarity. A new method of piecewise linear approximation of non-stationary time series is developed.
Approximation and filtering of time series are among the most important problems in real-world scientific applications. Although many methods exist, they are effective only in very specific situations; most of them assume that the time series is stationary or can be made stationary by a finite number of differencing steps. In this article we consider several time series and compare singular spectrum analysis and its low-rank tensorial approximation method with the classical wavelet decomposition approach. Keywords: DWT (discrete wavelet transform), SSA (singular spectrum analysis), low-rank tensorial approximation, Hankel data matrix, SNR (signal-to-noise ratio)
The aim of this study is to apply ensemble classification methods to stock market closing values, which can be treated as a time series, and to find their relation to economy news. To keep the background of the study clear, majority voting has been applied over three classification algorithms: k-nearest neighbors, support vector machine and the C4.5 tree. The results gathered from two different feature extraction methods are correlated using a majority voting meta-classifier (ensemble method) running over the three classifiers. The results show that success rates increase by at least 2 to 3 percent after the ensemble is applied.
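The majority voting step described in this abstract can be illustrated with a minimal sketch. The function name `majority_vote` and the example labels are hypothetical and are not taken from the paper; the sketch only assumes that each of the three classifiers has already produced one label per sample.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` holds one list of labels per classifier; all lists must
    have the same length (one label per sample).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label every sample")
    voted = []
    # zip(*predictions) yields, for each sample, the labels from all classifiers.
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of the three classifiers for four trading days.
knn_labels = ["up", "down", "up", "down"]
svm_labels = ["up", "up", "down", "down"]
tree_labels = ["down", "up", "up", "down"]
ensemble_labels = majority_vote([knn_labels, svm_labels, tree_labels])
```

With three classifiers and two classes a tie cannot occur, which is one reason an odd number of voters is a common choice.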
International Journal of Business Intelligence Research, 2013
ABSTRACT Depending on market strength and structure, it is known that there is a correlation between stock market values and the content of newspapers. The correlation increases in weak and speculative markets, while it never drops to zero even in the strongest markets. This research focuses on the correlation between the economic news published in a highly circulated newspaper in Turkey and the stock market closing values in Turkey. Several feature extraction methodologies are implemented on both data sources, namely the stock market values and the economic news. Since the economic news is in natural language, the text mining technique term frequency-inverse document frequency (TF-IDF) is applied to it. Time series analysis methods such as random walk, Bollinger band, moving average and difference are applied to the stock market values. After the feature extraction step, classification is performed with the well-known classifiers support vector machine, k-nearest neighbors and decision tree. Moreover, an ensemble classifier based on majority voting is implemented on top of these classifiers. The success rates are satisfactory enough to claim that the methods implemented in this study can be extended to future research with similar data sets from other countries.
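As an illustration of one of the time series features named in this abstract, the following is a minimal sketch of Bollinger bands computed over a rolling window. The window length, the band width of two standard deviations and the use of the population standard deviation are assumptions made for the example; the abstract does not state the settings used in the study.

```python
import statistics


def bollinger_bands(prices, window=20, width=2.0):
    """Return (lower, middle, upper) tuples for each full window of prices.

    middle is the simple moving average of the window; lower and upper lie
    `width` population standard deviations below and above it.
    """
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]
        middle = statistics.fmean(chunk)
        spread = width * statistics.pstdev(chunk)
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Hypothetical closing values, short window for readability.
closing_values = [1.0, 2.0, 3.0, 4.0, 5.0]
bands = bollinger_bands(closing_values, window=3, width=2.0)
```

The middle band is the moving average feature itself, so the same loop yields both features mentioned in the abstract.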
Are stock market speculations related to the news in the newspapers? This study mainly focuses on the correlation between economy news from one of the highest-circulation newspapers in Turkey and Istanbul stock market closing values. The data set is collected in natural language from the newspaper's web page, and the text mining technique term frequency-inverse document frequency (TF-IDF) is applied to the news. The stock market values, on the other hand, are treated as a signal processing task, and the random walk method is applied to them. The two feature vectors are correlated using several classification algorithms such as support vector machines, k-nearest neighbors and artificial neural networks. The results show a weak relation of over 43% between the news and stock market closing values. We believe this research will benefit the literature by supporting stock market estimation tools based on economy news, or market strength analysis.
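The TF-IDF weighting mentioned in this abstract can be sketched as follows. TF-IDF has several variants; this sketch assumes the plain form, with term frequency as the count divided by document length and inverse document frequency as the natural logarithm of the number of documents over the number of documents containing the term. The variant used in the study is not stated, and the function name and sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized news items.
news = [["market", "rises"], ["market", "falls"]]
news_weights = tf_idf(news)
```

A term that appears in every document gets weight zero under this variant, so only terms that distinguish one news item from another contribute to the feature vector.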
2011 5th International Conference on Application of Information and Communication Technologies (AICT), 2011
ABSTRACT An application of the Singular Spectrum Analysis (SSA) method to time series is presented, based on a newly elaborated tensorial approach to computing the singular values and the left and right singular vectors of arbitrary non-square matrices. All necessary calculations of singular values and of both types (left and right) of singular vectors are performed with this tensorial approach. It is shown that non-parametric SSA can be used efficiently as a universal filter to separate low- and high-frequency components in long signals and time series.
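For readers unfamiliar with SSA, the basic procedure can be sketched as below: embed the series in a Hankel trajectory matrix, take a singular value decomposition, keep the leading components and average along anti-diagonals. This sketch uses NumPy's standard `numpy.linalg.svd` and is not the tensorial computation presented in the paper; the function name `ssa_filter` and the `window` and `rank` parameters are assumptions for illustration.

```python
import numpy as np


def ssa_filter(series, window, rank):
    """Reconstruct a series from its `rank` leading SSA components."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    trajectory = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(trajectory, full_matrices=False)
    # Low-rank approximation built from the leading singular triples.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Diagonal averaging: entry (i, j) of the matrix estimates x[i + j].
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i]
        counts[i:i + k] += 1
    return recon / counts


# A linear trend has a rank-2 trajectory matrix, so two components suffice.
trend = np.arange(20.0)
smoothed = ssa_filter(trend, window=5, rank=2)
```

Keeping only the leading components acts as a low-pass filter, while the discarded components carry the high-frequency part of the signal.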
With the growing number of studies on big data that use text as a data source, feature hashing now plays a major role in the literature. Text mining on big data usually requires a feature hashing layer, which reduces the size of the feature vector. For example, word counting yields hundreds of thousands of features in most cases, whereas part-of-speech tagging would reduce this number to about 50 features. With feature hashing, the size of the feature vector is reduced to a reasonable level, and data mining processes such as classification, clustering or association can run faster. In some cases, executing certain algorithms is impossible on current hardware, and parallel or distributed programming has to be taken into account. Feature hashing approaches can usually be categorized into two groups. The first group deals with natural language processing (NLP) algorithms and tries to extract relatively smarter hash results, which represents the input ch...
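The size reduction described in this abstract can be illustrated with a minimal sketch of the generic hashing trick, in which each token is mapped to one of a fixed number of buckets. This is not the method proposed in the paper; the function name, the bucket count and the choice of MD5 as a deterministic hash are assumptions for the example.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map a token list to a fixed-length count vector via hashing."""
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first four bytes of the digest as the bucket index source.
        index = int.from_bytes(digest[:4], "little") % n_buckets
        vector[index] += 1
    return vector


# Hypothetical tokenized sentence.
sample_tokens = ["stock", "market", "closing", "values", "market"]
sample_vector = hash_features(sample_tokens, n_buckets=8)
```

The vector length is fixed by `n_buckets` regardless of vocabulary size, at the cost of occasional collisions between unrelated tokens.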
Received: 13 May 2013. Accepted: 14 May 2014. ABSTRACT The aim of this study is primarily to factor semiprime numbers, which are used in the RSA encryption method and are the product of two prime numbers. Within the scope of this article, frequently used and prominent factorization methods are explained and their performance is compared. A new factorization method is proposed. In addition, the factorization methods were tested on randomly generated prime numbers to assess the success of the newly proposed method. The results show that, for attacking the semiprime numbers used in the RSA method, the proposed new method is more advantageous than the existing methods. All of today's encryption technologies are built on a mathematical difficulty. For example, multiplying two numbers is easy, but factoring a number is hard. Similarly, exponentiation of the form a^b is easy, but its inverse, the logarit...
Papers by Cihan Mert