Stream analytics, a new paradigm in data analytics, has gained momentum due to the generation of voluminous stream data. With the huge increase in edits to Wikipedia topics, it is tedious for digital knowledge discovery users to find updates in their domain immediately. Users must go through a large amount of information and spend considerable time to find the relevant data. There is a need to retrieve Wikipedia edits based on the metadata of the article edits for later retrieval. Hence, a clustering technique may be employed to group Wikipedia article edits by domain. In this paper, hierarchical stream clustering is applied to retrieve edits based on user interest. Data from Wikipedia was collected over a period of one month and used as the dataset. Our method is compared with the state-of-the-art clustering system WikiAutoCat, and the accuracy is observed to improve by 10% while the clustering time is reduced by 20%.
Dogs are domesticated mammals, not natural wild animals. They have been bred by humans for a long time. Today, some dogs are kept as pets, while others help humans do their work. Caring for and maintaining a pet dog is a significant task for owners. To train the dog and treat its diseases, they need to know its breed. This paper addresses a fine-grained image recognition problem: identifying the breed of a dog in a given image using convolutional neural networks. The network is trained and evaluated on the Stanford Dogs Dataset. Data collected from various websites by web scraping are rendered in the application.
Proceedings of the International Conference on Advances in Information Communication Technology & Computing - AICTC '16, 2016
Social networking applications are prominent among internet user communities. Many social media websites are used for sharing information instantly. Twitter is one of the vibrant social networking websites for sharing short textual information within a short span of time. It is essential to identify the type of information shared on these websites. Sentiment analysis is the process of analyzing the opinion content present in text. Millions of tweets are posted each day on various topics. Twitter sentiment analysis mainly involves identifying the polarity-oriented terms mentioned in a tweet. Most Twitter sentiment analysis works have concentrated on identifying sentiment polarity. The literature shows that the calculation of a tweet's sentiment score still needs further contributions from researchers. Hence, in this work, the sentiment score is calculated with a sentiment-corpus-oriented approach. In addition, the grammatical type of each word used in a tweet and the relationships between the words are identified. The use of a tweet tagger and corpus-based sentiment score assignment distinguishes this work from previous works. The experimental results show that sentiment-score-based tweet identification yields the top tweets among a large collection of tweets.
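As a rough illustration of corpus-based score assignment and top-tweet selection, the following minimal sketch sums per-word scores from a lexicon and ranks tweets by the total. The lexicon contents, the function names `tweet_score` and `top_tweets`, and the simple whitespace tokenization are all assumptions made for illustration; the abstract does not specify the sentiment corpus, the tagger, or the scoring formula actually used.

```python
import heapq

# Hypothetical toy lexicon; the sentiment corpus used in the paper is not given here.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}


def tweet_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of the words in a tweet (unknown words score 0)."""
    words = text.lower().split()
    # Strip common punctuation and Twitter markers before the lookup.
    return sum(lexicon.get(w.strip(".,!?#@"), 0.0) for w in words)


def top_tweets(tweets, n, lexicon=LEXICON):
    """Return the n tweets with the highest sentiment score."""
    return heapq.nlargest(n, tweets, key=lambda t: tweet_score(t, lexicon))
```

A real system along the lines of the abstract would also use part-of-speech tags and word relationships, which this sketch leaves out.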
2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2016
Data are generated very rapidly from different information sources. This generation of data is increasing day by day from sources such as automated data collection tools, database systems, e-commerce and social media websites. There is an explosive growth of data from terabytes to petabytes. Since such a large amount of data is available, it is essential to extract valuable knowledge from it. Several mining algorithms are used to extract interesting patterns from the data stored in a repository. Traditional data sources are static in nature, in that their content is not generated very rapidly. Data streams are streams of information that are generated at a very rapid rate. With the evolution of data streams, the need arises for new algorithms to process them. Various data stream algorithms are used for mining data streams with different requirements. In this work, an ensemble-of-classifiers model has been developed for mining data streams by combining stream mining classifiers such as the Similarity-based Data Stream Classifier (SimC) and the Online Genetic Algorithm (OGA) classifier. The ensemble-based classifiers show improved classification accuracy and a lower classification error rate under various circumstances.
Advances in Intelligent Systems and Computing, 2015
Sentiment analysis is the process of finding the opinions present in textual content. This paper proposes a tweet analyzer to perform sentiment analysis on Twitter data. The work mainly involves sentiment analysis using various trained machine learning classifiers applied to a large collection of tweets. The classifiers have been trained using the maximum number of polarity-oriented words in order to classify the tweets effectively. The classifiers trained at the sentence level outperformed the keyword-based classification method. The classified tweets are further analyzed to identify the top N tweets. The experimental results show that the sentiment analyzer system predicted the polarities of tweets and effectively identified the top N tweets.
Proceedings of IEEE International Conference on Computer Communication and Systems ICCCS14, 2014
The advancement of the web has empowered e-commerce facilities and has gradually threatened physical stores. The online shopping space is growing all over the world. Before purchasing products or services, consumers wish to check the product reviews written by other consumers. Therefore, analyzing consumer reviews of online shopping is important to ease future consumers' purchases. Generally, review analysis involves determining the opinion polarity and then presenting the results of the analysis to users by suggesting better products. Though various existing research methods are available for performing product analysis, it is important to reveal the weight of each individual opinion by predicting the strength of each review, and to assess the overall rank of the product by consolidating the predicted review strengths. Online consumers can therefore quickly find the reviews they are looking for without searching through all the reviews. The experimental results show that the proposed work yields better product recommendations.
2012 International Conference on Recent Advances in Computing and Software Systems, 2012
E-transactions have become prominent and highly convenient due to the widespread usage of the internet. The number of consumer reviews on various products is increasing day by day. This vast number of reviews is beneficial to manufacturers and consumers alike. It is a challenging task for a potential consumer to read all reviews to make a better purchase decision. It is beneficial
Data stream mining is the process of extracting knowledge from continuously generated data. Since data stream processing is not a trivial task, the streams have to be analyzed with proper stream mining techniques. When processing large volumes of stream data, stream clustering helps to find valuable hidden information. Many works have concentrated on clustering data streams using various methods, but most of those approaches lack some core tasks needed to improve cluster accuracy and to process data streams quickly. To improve cluster quality and reduce the data stream processing time in cluster generation, the partition-based DBStream clustering method is proposed. The results have been compared with various data stream clustering methods, and the experiments show that the purity of clusters improves by 5% and the time taken is reduced by 10% relative to the average time taken by other methods for clustering the data streams.
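For readers unfamiliar with the purity measure mentioned above, here is a minimal sketch of the standard definition: each cluster is credited with its most frequent true label, and purity is the fraction of points covered by those majority labels. The abstract does not state which exact formulation the authors used, so this is an assumption, and the function name `purity` is illustrative.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of points whose true label is the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one point is required")

    # Count the true labels seen inside each cluster.
    by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)
```

A purity of 1.0 means every cluster contains points of a single class; note that the measure rewards producing many small clusters, so it is usually reported alongside the cluster count or another metric.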
Papers by Arun Manicka Raja M