Collaborative filtering (CF) is an important approach for recommendation systems and is widely used in many aspects of our lives, especially in online commercial systems. One popular algorithm in CF is the K-nearest neighbors (KNN) algorithm. In KNN, similarity measures determine the nearest neighbors of a user and thus quantify the degree of dependency between related user/item pairs. Consequently, the CF approach is not just sensitive to the similarity measure; it depends entirely on the choice of that measure. Jaccard, one of the similarity measures commonly used for CF tasks, concerns the existence of ratings, while numerical measures such as cosine and Pearson concern the magnitude of ratings. Jaccard on its own is not a dominant measure, but it has long been shown to be an important factor in improving other measures. Therefore, as part of our continuing effort to find the most effective similarity measures for CF, this research proposes new similarity measures that combine Jaccard with several numerical measures. The combined measures take advantage of both the existence and the magnitude of ratings. Experimental results on the MovieLens dataset showed that the combined measures are preeminent, outperforming all single measures on the considered evaluation metrics.
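The abstract does not give the exact combination formula. The sketch below assumes one simple option: multiply Jaccard (existence) by cosine (magnitude) and use the product to rank KNN neighbors. The names `combined_similarity` and `top_k_neighbors` are hypothetical. Cosine here treats unrated items as 0, which is also an assumption.

```python
import math
from typing import Callable, Dict, List

Ratings = Dict[str, float]  # item id -> rating given by one user

def jaccard(u: Ratings, v: Ratings) -> float:
    """Existence-based: items rated by both users / items rated by either user."""
    union = set(u) | set(v)
    if not union:
        return 0.0
    return len(set(u) & set(v)) / len(union)

def cosine(u: Ratings, v: Ratings) -> float:
    """Magnitude-based: cosine of the rating vectors, unrated items treated as 0."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_sq = sum(r * r for r in u.values()) * sum(r * r for r in v.values())
    return dot / math.sqrt(norm_sq) if norm_sq > 0 else 0.0

def combined_similarity(u: Ratings, v: Ratings) -> float:
    """Hypothetical combination: Jaccard multiplied by cosine."""
    return jaccard(u, v) * cosine(u, v)

def top_k_neighbors(target: Ratings, others: Dict[str, Ratings], k: int,
                    sim: Callable[[Ratings, Ratings], float] = combined_similarity) -> List[str]:
    """Return the ids of the k users most similar to `target` under `sim`."""
    ranked = sorted(others.items(), key=lambda kv: sim(target, kv[1]), reverse=True)
    return [user_id for user_id, _ in ranked[:k]]

u = {"a": 5.0, "b": 3.0, "c": 4.0}
v = {"a": 4.0, "b": 3.0, "d": 5.0}
print(jaccard(u, v), cosine(u, v), combined_similarity(u, v))
```

In this example, u and v share 2 of 4 rated items, so Jaccard is 0.5. The cosine is 29/50 = 0.58, and the product is 0.29. Because Jaccard multiplies the result, a pair with high rating agreement but little overlap in rated items is down-weighted.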
CAPTCHA (Completely Automated Public Turing test to Tell Computers and Humans Apart) can be used to protect data from automated bots. Countless kinds of CAPTCHAs have therefore been designed, but text-based schemes are the most frequently used because they are the most convenient and user-friendly [1]. Each type of CAPTCHA currently requires its own segmentation method to isolate single characters, because segmentation approaches differ so widely. Our goal is to defeat CAPTCHAs, so the CAPTCHAs must first be split character by character. No general segmentation algorithm can extract the individual characters from every kind of example, so segmentation has to be handled separately for each type. In this paper, we build a complete system that defeats CAPTCHAs and achieves state-of-the-art performance. Specifically, we present a self-adaptive algorithm that segments different kinds of characters optimally. We then use both existing methods and our own convolutional neural network as an extra classifier. The results show how well our system works at defeating these CAPTCHAs.
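The self-adaptive segmentation algorithm is not described in the abstract. For contrast, here is a minimal sketch of a common baseline, vertical-projection segmentation. It is swapped in for illustration and is not the authors' method. It assumes the CAPTCHA is already binarized into rows of 0 (background) and 1 (foreground) and cuts wherever a column contains no foreground pixels. The helper names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 0 (background) / 1 (foreground), already binarized

def column_profile(img: Image) -> List[int]:
    """Number of foreground pixels in each column."""
    return [sum(row[c] for row in img) for c in range(len(img[0]))]

def segment_columns(img: Image, min_width: int = 1) -> List[Tuple[int, int]]:
    """Split at empty columns; return half-open (start, end) column ranges."""
    profile = column_profile(img)
    segments: List[Tuple[int, int]] = []
    start = None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c  # entering a run of inked columns
        elif count == 0 and start is not None:
            if c - start >= min_width:  # drop runs narrower than min_width (specks)
                segments.append((start, c))
            start = None
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))  # run touching the right edge
    return segments

def crop(img: Image, segment: Tuple[int, int]) -> Image:
    """Cut one character-candidate image out of the full CAPTCHA."""
    start, end = segment
    return [row[start:end] for row in img]

img = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0],
]
print(segment_columns(img))
```

This baseline breaks down when characters touch or overlap, since no empty column separates them. That limitation illustrates why segmentation must be tailored to each CAPTCHA type.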
When organ failure is mentioned, people immediately think of kidney disease. By contrast, there is no comparable awareness of liver disease and liver failure, even though liver disease is one of the leading causes of mortality worldwide. Effective diagnosis and timely treatment of patients are therefore paramount. Accordingly, this study aims to construct an intelligent diagnosis system that integrates principal component analysis (PCA) and k-nearest neighbor (KNN) methods to examine the liver patient dataset. In the model, PCA performs feature extraction and KNN performs classification. The prediction results of the proposed system are compared using statistical parameters that include accuracy, sensitivity, specificity, positive predictive value and negative predictive value. In addition to higher accuracy rates, the model also attained remarkable sensitivity and specificity, which were a challenging task given an uneven variance among a...
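The pipeline described above (PCA for feature extraction, then KNN for classification, scored with accuracy, sensitivity, specificity, PPV and NPV) can be sketched with NumPy. This is a minimal sketch only. The number of components, the value of k, the preprocessing, and the toy data below are assumptions and do not come from the study. The helper names are hypothetical.

```python
import numpy as np

def pca_fit(X: np.ndarray, n_components: int):
    """Return the training mean and the top principal axes (as columns)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)       # features are columns
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    return (X - mean) @ components

def knn_predict(Z_train: np.ndarray, y_train: np.ndarray, Z_query: np.ndarray, k: int = 3) -> np.ndarray:
    """Majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for z in Z_query:
        nearest = np.argsort(np.linalg.norm(Z_train - z, axis=1))[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")

def binary_metrics(y_true: np.ndarray, y_pred: np.ndarray, positive: int = 1) -> dict:
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }

# Toy data invented for illustration: two well-separated groups, 3 features.
X_train = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.2, 0.1, 0.0],
                    [5.0, 5.0, 5.0], [5.1, 4.9, 5.0], [4.9, 5.2, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05, 0.05, 0.0], [5.0, 5.0, 4.9]])
y_test = np.array([0, 1])

mean, components = pca_fit(X_train, n_components=2)  # fit PCA on training data only
preds = knn_predict(pca_transform(X_train, mean, components), y_train,
                    pca_transform(X_test, mean, components), k=3)
metrics = binary_metrics(y_test, preds)
print(preds, metrics)
```

On real clinical features with different units, the features would normally be standardized before PCA so that no single feature dominates the covariance. The sketch omits this step.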
Stock price prediction is an interesting and challenging research topic. The economies of developed countries are measured according to their economic power. Currently, stock markets are considered an illustrious trading field because in many cases they offer easy profits with a low-risk rate of return. With its huge and dynamic information sources, the stock market is considered a suitable environment for data mining and business researchers. In this paper, we applied the k-nearest neighbor algorithm and a non-linear regression approach to predict stock prices for a sample of six major companies listed on the Jordanian stock exchange. The goal is to help investors, management, decision makers and users make correct and informed investment decisions. According to the results, the kNN algorithm is robust, with a small error ratio; consequently, the results were rational and reasonable. In addition, depending on the actual stock price data, the prediction results were close and almost par...
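The abstract does not say how the price series was turned into kNN inputs. One common setup, assumed here, uses a sliding window of the last `window` closing prices as features and the next price as the target. The prediction is then the mean target of the k nearest windows. The function names and the toy price series are hypothetical.

```python
from typing import List, Sequence, Tuple

def make_windows(prices: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Features: `window` consecutive prices; target: the price that follows them."""
    X, y = [], []
    for t in range(len(prices) - window):
        X.append(list(prices[t:t + window]))
        y.append(prices[t + window])
    return X, y

def knn_regress(X: List[List[float]], y: List[float], query: Sequence[float], k: int = 3) -> float:
    """Average the targets of the k training windows closest to `query` (Euclidean)."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, target)
        for x, target in zip(X, y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Toy, invented price series with a repeating pattern.
prices = [10.0, 11.0, 12.0, 11.0, 10.0, 11.0, 12.0, 11.0, 10.0, 11.0]
X, y = make_windows(prices, window=3)
next_price = knn_regress(X, y, prices[-3:], k=3)
print(next_price)
```

Here the last window [11, 10, 11] matches one historical window exactly (followed by 12) and is at distance √3 from several others (followed by 11). With k=3 the forecast is (12 + 11 + 11) / 3. Larger k smooths the forecast at the cost of blending in less similar history.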
Cosine similarity is an important measure for comparing two vectors in much research in data mining and information retrieval. In this research, the cosine measure and its advanced variants for collaborative filtering (CF) are evaluated. The cosine measure is effective, but it has a drawback: the end points of two vectors may be far apart in Euclidean distance while their cosine is still high. This negative effect of Euclidean distance decreases the accuracy of cosine similarity. Therefore, a so-called triangle area (TA) measure is proposed as an improved version of the cosine measure. The TA measure uses the ratio of the basic triangle area to the whole triangle area as a reinforcing factor for Euclidean distance. This alleviates the negative effect of Euclidean distance while keeping the simplicity and effectiveness of both the cosine measure and Euclidean distance in computing the similarity of two vectors. TA is considered an advanced cosine measure. TA and other advanced cosine measures are tested against other similarity measures. The experimental results show that TA is not a preeminent measure, but it is better than traditional cosine measures in most cases and is also adequate for real-time applications. Moreover, its formula is simple.
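The drawback described above can be checked numerically with a small sketch. It does not implement the TA measure, whose formula is not given in the abstract. It only shows that cosine ignores vector length. Vector b points in the same direction as a but lies far away, while c lies close to a but points in a slightly different direction. The vectors are invented for illustration.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (depends on direction only)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between the end points of a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 1.0]
b = [5.0, 5.0]  # same direction as a, but its end point is far away
c = [1.2, 0.9]  # slightly different direction, but its end point is close to a

print(cosine(a, b), euclidean(a, b))
print(cosine(a, c), euclidean(a, c))
```

Cosine rates b as a perfect match (1.0) and ranks it above c, even though b's end point is about 5.66 units from a and c's is about 0.22. A length-aware factor such as the one TA adds is meant to correct this kind of ranking.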
Similarity searching has a vast range of applications in various fields of computer science. Many methods have been proposed for exact search, but they all suffer from the curse of dimensionality and are thus not applicable to high-dimensional spaces. Approximate search methods are considerably more efficient in high-dimensional spaces. Unfortunately, there are few theoretical results regarding the complexity of these methods, and there are no comprehensive empirical evaluations, especially for non-metric spaces. To fill this gap, we present an empirical analysis of data structures for approximate nearest neighbor search in high-dimensional spaces. We provide a comparison with recently published algorithms on several data sets. Our results show that small-world approaches provide some of the best trade-offs between efficiency and effectiveness in both metric and non-metric spaces.
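Small-world methods answer queries by walking a proximity graph greedily toward the query. The sketch below shows that core routine in a deliberately simplified form. It builds a brute-force k-NN graph (made undirected) and repeatedly moves to whichever neighbor is closer to the query until none is. This is not the graph construction of any specific published small-world method. The helper names are hypothetical, and on realistic data a plain greedy walk can stop at a local minimum, so practical methods add restarts or a candidate list.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_knn_graph(points: List[Vector], k: int) -> List[List[int]]:
    """Brute-force k-NN graph with undirected links (hypothetical helper)."""
    links = [set() for _ in points]
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: euclidean(p, points[j]))
        for j in others[:k]:
            links[i].add(j)
            links[j].add(i)  # reverse links make the graph easier to navigate
    return [sorted(s) for s in links]

def greedy_search(points: List[Vector], graph: List[List[int]],
                  query: Vector, entry: int = 0) -> int:
    """Move to the closest neighbor until no neighbor is closer (approximate NN)."""
    current, current_dist = entry, euclidean(points[entry], query)
    while True:
        best, best_dist = current, current_dist
        for nb in graph[current]:
            d = euclidean(points[nb], query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:
            return current  # local minimum: no neighbor is closer to the query
        current, current_dist = best, best_dist

points = [(float(i),) for i in range(10)]  # toy 1-D data set
graph = build_knn_graph(points, k=2)
print(greedy_search(points, graph, (7.2,)))
```

Each step inspects only the current node's neighbors, not the whole data set. That locality is where graph-based methods gain their efficiency, and the risk of local minima is the trade-off against effectiveness.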