The widespread use of digital images has created a new challenge in digital image forensics. Digital images can be presented in court as evidence in criminal cases; however, they are easily manipulated, which creates the need for methods to verify their authenticity. One such method is identifying the source camera. However, the process takes a large amount of time on traditional desktop computers. To tackle this problem, we aim to increase the performance of the process by implementing it in a distributed computing environment. We evaluate the camera identification process using conditional probability features and Apache Hadoop. The evaluation used 6000 images from six mobile phones of different models and classified them using Apache Mahout, a scalable machine learning tool that runs on Hadoop. We ran the source camera identification process on a cluster of up to 19 computing nodes. The experimental results demonstrate an exponential decrease in processing time and a slight decrease in accuracy as the process is distributed across the cluster. Our prediction accuracies were recorded between 85% and 95% across varying numbers of mappers.
DataFlair's Big Data Hadoop Tutorial PPT for Beginners takes you through various concepts of Hadoop. This Hadoop tutorial PPT covers:
1. Introduction to Hadoop
2. What is Hadoop
3. Hadoop History
4. Why Hadoop
5. Hadoop Nodes
6. Hadoop Architecture
7. Hadoop data flow
8. Hadoop components – HDFS, MapReduce, Yarn
9. Hadoop Daemons
10. Hadoop characteristics & features
Related blog: Hadoop Introduction – A Comprehensive Guide: https://goo.gl/QadBS4
Wish to learn Hadoop and carve your career in Big Data? Contact us: info@data-flair.training, +91-7718877477, +91-9111133369, or visit our website: https://data-flair.training/
Data outsourcing allows data owners to keep their data at untrusted clouds that do not ensure the privacy of the data and/or computations. One useful framework for fault-tolerant distributed data processing is MapReduce, which was developed for trusted private clouds. This paper presents algorithms for data outsourcing based on Shamir's secret-sharing scheme and for executing privacy-preserving SQL queries such as count, selection (including range selection), projection, and join, using MapReduce as the underlying programming model. The proposed algorithms prevent the untrusted cloud from learning the database or the query, while also preventing output-size and access-pattern attacks. Interestingly, our algorithms do not require the involvement of the database owner, which only creates and distributes secret-shares once, in answering any query; hence, the database owner also cannot learn the query. We evaluate the efficiency of the algorithms on three parameters: (i) the number of communication rounds (between a user and a cloud), (ii) the total amount of bit flow (between a user and a cloud), and (iii) the computational load at the user side and the cloud side.
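As a rough illustration of the building block this paper starts from, the following Python sketch implements Shamir's (k, n) secret sharing over a prime field. The prime, threshold, and share count are illustrative choices, not parameters from the paper, and the paper's MapReduce query algorithms are not reproduced here.

```python
# A minimal sketch of Shamir's secret sharing over a prime field.
import random

PRIME = 2**31 - 1  # Mersenne prime used as the field modulus (illustrative)

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # den^(PRIME-2) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total

shares = make_shares(42, k=3, n=5)
assert reconstruct(shares[:3]) == 42   # any 3 of the 5 shares suffice
```

The untrusted parties each hold one share and can compute on shares without ever seeing the secret, which is the property the paper's query algorithms build on.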
IEEE Hadoop Big Data Project Titles 2017 | 2018 IEEE Big Data Projects
A Scalable Data Chunk Similarity based Compression Approach for Efficient Big Sensing Data Processing on Cloud
A Systematic Approach Toward Description and Classification of Cybercrime Incidents
Achieving Efficient and Privacy-Preserving Cross-Domain Big Data Deduplication in Cloud
aHDFS: An Erasure-Coded Data Archival System for Hadoop Clusters
Big Data Privacy in Biomedical Research
Cost-Aware Big Data Processing across Geo-distributed Datacenters
Disease Prediction by Machine Learning over Big Data from Healthcare Communities
DyScale: a MapReduce Job Scheduler for Heterogeneous Multicore Processors
Efficient Processing of Skyline Queries Using MapReduce
FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters
Hadoop MapReduce for Mobile Clouds
Mining Human Activity Patterns from Smart Home Big Data for Healthcare Applications
PPHOPCM: Privacy-preserving High-order Possibilistic c-Means Algorithm for Big Data Clustering with Cloud Computing
Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset
Public Interest Analysis Based on Implicit Feedback of IPTV Users
Ring: Real-Time Emerging Anomaly Monitoring System over Text Streams
Robust Big Data Analytics for Electricity Price Forecasting in the Smart Grid
Scalable Uncertainty-Aware Truth Discovery in Big Data Social Sensing Applications for Cyber-Physical Systems
Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters
Service Rating Prediction by Exploring Social Mobile Users’ Geographical Locations
In recent years, many efficient and scalable frequent itemset mining algorithms for big data analytics have been proposed. Initially, MapReduce-based frequent itemset mining algorithms on Hadoop clusters were proposed. Although Hadoop was developed as a cluster computing system for handling and processing big data, its performance does not meet expectations for iterative data mining algorithms, due to its high I/O cost of writing intermediate results to disk and then reading them back. Consequently, Spark was developed as another cluster computing infrastructure that is much faster than Hadoop due to its in-memory computation. It is highly suitable for iterative algorithms and supports batch, interactive, iterative, and stream processing of data. Many frequent itemset mining algorithms have been re-designed on Spark, and most of them are Apriori-based. All these Spark-based Apriori algorithms use a Hash Tree as the underlying data structure. This paper investigates the efficiency of various data structures for Spark-based Apriori. Although the data structure perspective has been investigated previously, that was for MapReduce-based Apriori, and it must be re-investigated in the distributed computing environment of Spark. The data structures considered are the Hash Tree, the Trie, and the Hash Table Trie. Experimental results on benchmark datasets show that Spark-based Apriori with the Trie and the Hash Table Trie perform almost identically, but both perform many times better than the Hash Tree in the distributed computing environment of Spark.
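To make the data-structure comparison concrete, here is a minimal sketch of a Trie storing Apriori candidate itemsets and counting their support in one pass per (sorted) transaction, which is the role it plays inside Apriori. The structure and names are illustrative, not the paper's implementation.

```python
# Trie of candidate itemsets: each root-to-marked-node path is one candidate.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0          # support counter, meaningful at candidate ends
        self.is_candidate = False

def insert(root, itemset):
    node = root
    for item in sorted(itemset):
        node = node.children.setdefault(item, TrieNode())
    node.is_candidate = True

def count_transaction(node, transaction, start=0):
    """Recursively match every candidate itemset contained in `transaction`."""
    if node.is_candidate:
        node.count += 1
    for i in range(start, len(transaction)):
        child = node.children.get(transaction[i])
        if child:
            count_transaction(child, transaction, i + 1)

root = TrieNode()
for cand in [("a", "b"), ("a", "c"), ("b", "c")]:
    insert(root, cand)
for txn in [("a", "b", "c"), ("a", "c")]:
    count_transaction(root, txn)
# ("a", "c") is contained in both transactions
assert root.children["a"].children["c"].count == 2
```

Because many candidates share prefixes, the trie visits each shared prefix once per transaction, which is the source of its advantage over a hash tree in the distributed setting the paper measures.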
Big Data is a popular term encompassing the use of techniques to capture, analyze, process, and visualize potentially large datasets in a reasonable timeframe, something not achievable with standard IT technologies; the platforms, tools, and software used for this purpose are therefore collectively called Big Data technologies. This paper includes the basic concept of big data with its benefits and workings, the types of data, and an introduction to Apache Hadoop and its important components (HDFS and MapReduce). Further, the paper introduces NoSQL and NewSQL and their characteristics, and analyzes how to handle big data through Apache Hadoop, NoSQL, and NewSQL.
In present times, updated information and knowledge have become readily accessible to researchers, enthusiasts, developers, and academics through the Internet, on many different subjects and for a wide range of applications. The underlying framework facilitating such possibilities is the networking of servers, nodes, and personal computers. However, such setups, comprising mainframes, servers, and networking devices, are inaccessible to many, costly, and not portable. In addition, students and lab-level enthusiasts do not have the access required to modify functionality for specific purposes. The Raspberry Pi (R-Pi) is a small device capable of many functionalities akin to supercomputing while being portable, economical, and flexible. It runs open-source Linux, making it a preferred choice for lab-level research and study. Users have started using its embedded networking capability to design portable clusters that replace costlier machines. This paper introduces new users to the most commonly used frameworks and some recent developments that best exploit the capabilities of the R-Pi when used in clusters. The paper also introduces some of the tools and measures that rate the efficiency of clusters, to help users assess the quality of a cluster design, and aims to make users aware of the various parameters in a cluster environment.
Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of website visits). Other algorithms are designed for finding association rules in data having no transactions, or having no timestamps (e.g., DNA sequencing). Each transaction is seen as a set of items (an itemset). Given a minimum support threshold, the Apriori algorithm identifies the itemsets that are subsets of at least that many transactions in the database. Apriori uses a "bottom up" approach, in which frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
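The bottom-up loop described above can be sketched in a few lines of Python; the transactions and threshold below are invented for illustration.

```python
# Compact Apriori sketch: extend frequent k-itemsets to (k+1)-candidates,
# then count candidates against the transactions until no extension survives.
def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # frequent 1-itemsets
    frequent = {frozenset([i]) for i in items
                if sum(i in t for t in transactions) >= min_support}
    result = set(frequent)
    k = 2
    while frequent:
        # candidate generation: join frequent (k-1)-itemsets
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # support counting: keep candidates contained in enough transactions
        frequent = {c for c in candidates
                    if sum(c <= t for t in transactions) >= min_support}
        result |= frequent
        k += 1
    return result

txns = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread", "butter"}]
freq = apriori(txns, min_support=2)
assert frozenset({"milk", "bread"}) in freq       # appears in 2 transactions
assert frozenset({"milk", "butter"}) not in freq  # appears in only 1
```

The downward-closure property (every subset of a frequent itemset is frequent) is what justifies generating level k + 1 only from the survivors of level k.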
Log analysis is a critical procedure in most system and network activities, where log data is used for various purposes, for example performance monitoring, security auditing, or even reporting and profiling. However, over the years the volume of log data has increased along with the size of systems and the number of users involved. Traditional or existing log analyser tools are not able to handle this huge amount of data; Big Data techniques are the solution to this problem. The main purpose of this paper is to present a review of log file analysis in Big Data environments, based on previous research. The paper also highlights the characteristics of Big Data as well as the Hadoop framework, which has been widely used in Big Data applications. Results from the papers reviewed show that the majority of researchers applied MapReduce, the main component of Hadoop, for analysing log files, with HDFS as the data storage. Previous researchers have also used various tools and algorithms together with the Hadoop framework for analysis purposes. The findings of this paper provide a coherent review of how Hadoop has performed in analysing different kinds of log files, and recommend understandable outcomes for end users to apply in future work.
The adoption of new technologies in the electrical energy infrastructure enables the development of novel energy efficiency services. The introduction of smart meters into residential households allows the collection of granular energy usage measurements at frequent intervals. Analysis of such data could bring ample and detailed insights into the consumption behavior of households, allowing more accurate prediction of future loads. Given the data-intensive nature of these technologies, recent big data solutions allow the enormous amounts of data being generated to be harnessed. We present a novel, scalable, distributed Gaussian mean clustering algorithm for analyzing the energy consumption behavior of households in relation to different contributing factors such as weather conditions, type of day, and time of day. Based on forecasts of such contributing factors, we were able to predict a household's future energy usage much more accurately than other standard regression methods used for load forecasting.
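As a rough, non-distributed illustration of mean-based clustering of consumption readings: the paper's algorithm is distributed and Gaussian-based, so this simple one-dimensional k-means sketch with invented readings only conveys the basic idea of grouping usage levels around cluster means.

```python
# Toy 1-D k-means: group hourly consumption readings (kWh) around k means.
def kmeans_1d(values, k, iters=20):
    # spread initial centers across the sorted range
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# invented readings: low night-time load vs. high evening-peak load
readings = [0.2, 0.3, 0.25, 2.1, 2.4, 2.2]
centers = kmeans_1d(readings, k=2)
assert abs(centers[0] - 0.25) < 0.1   # night-time cluster mean
assert abs(centers[1] - 2.23) < 0.1   # evening-peak cluster mean
```

Conditioning such clusters on weather, day type, and time of day, as the paper describes, is what turns the cluster means into usable load forecasts.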
Big Data introduces new technologies, skills, and processes to your information architecture and to the people who design, operate, and use them. Big data describes a holistic information management approach that incorporates and integrates many new types of data and data management alongside conventional data. Hadoop is an open-source software framework licensed under the Apache Software Foundation, intended to support data-intensive applications running on large grids and clusters and to offer scalable, reliable, distributed computing. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. In this paper, we discuss a taxonomy for big data and the Hadoop technology. Ultimately, big data technologies are necessary for providing more accurate analysis, which may lead to more concrete decision-making, resulting in greater operational efficiency, cost reduction, and reduced risk for the business. In this paper, we discuss the taxonomy of big data and the components of Hadoop.
Aging-in-place solutions are becoming increasingly prevalent in our society. New big data technologies can harness the enormous amounts of data generated by sensors in smart homes to provide enabling services. Added care and preventive services can be furnished through interoperability and bidirectional data flow across the value chain. However, although the nature of the problem domain allows better care to be established through the sharing of information, it also risks disclosing the complete living behavior of individuals. In this paper, we introduce and evaluate a novel scalable k-anonymization solution, based on the distributed map-reduce paradigm, for preserving the privacy of data shared in a welfare intercloud. Our evaluation benchmarks both information loss and data quality metrics and demonstrates better scalability and performance than other available solutions.
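A minimal, non-distributed sketch of the k-anonymization idea: generalize one quasi-identifier (age) into bands until every group contains at least k records. The records and band widths are invented for illustration; the paper's map-reduce solution is far more elaborate.

```python
# Generalize exact ages into bands; widen the bands until k-anonymous.
from collections import Counter

def generalize_ages(records, k, band=10):
    """records: (age, sensitive_value) pairs; returns (age_band, value) pairs."""
    while True:
        banded = [(f"{(age // band) * band}-{(age // band) * band + band - 1}",
                   rest)
                  for age, rest in records]
        groups = Counter(b for b, _ in banded)
        if all(c >= k for c in groups.values()):
            return banded          # every band hides at least k individuals
        band *= 2                  # coarsen further if any group is too small

rows = [(23, "flu"), (27, "cold"), (25, "flu"), (34, "cold"), (38, "flu")]
anon = generalize_ages(rows, k=2)
assert all(c >= 2 for c in Counter(b for b, _ in anon).values())
```

Wider bands mean more information loss, which is exactly the trade-off against data quality that the paper's evaluation benchmarks.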
Big data is a term now used ubiquitously in the distributed paradigm of the web. As the name suggests, it refers to collections of very large data sets (petabytes, exabytes, etc.), the related systems, and the algorithms used to analyze this enormous data. Hadoop, as a big data processing technology, has proven to be the go-to solution for processing enormous data sets. MapReduce is a good solution for computations that require only one pass over the data, but it is not very efficient for use cases that need multi-pass computations and algorithms: the output data of each job stage has to be stored in the file system before the next stage can begin, so the method is slow due to disk input/output operations and replication. Additionally, the Hadoop ecosystem does not have every component needed to complete a big data use case. If you want to run an iterative job, you have to stitch together a sequence of MapReduce jobs and execute them in order; each of these jobs has high latency, and each depends on the completion of the previous stage. Apache Spark is one of the most widely used open-source processing engines for big data, with rich language-integrated APIs and an extensive range of libraries. It is a general framework for distributed computing that offers high performance for both batch and interactive processing. In this paper, we aim to give a close-up view of Apache Spark, its features, and working with Spark using Hadoop. We briefly discuss Resilient Distributed Datasets (RDDs), RDD operations, features, and limitations. Spark can be used along with MapReduce in the same Hadoop cluster or on its own as a processing framework. The paper closes with a comparative analysis of Spark and Hadoop MapReduce.
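To illustrate the RDD programming style the paper surveys without requiring a Spark installation, here is a toy in-memory stand-in that mimics the shape of a few RDD transformations and actions (map, filter, collect, reduce). It only imitates the API surface; it is not Spark, and real RDDs are partitioned, distributed, and fault-tolerant.

```python
# Toy "RDD": transformations are recorded lazily; actions trigger evaluation.
from functools import reduce as _reduce

class ToyRDD:
    def __init__(self, data):
        self._data = list(data)
        self._ops = []              # lazily recorded transformations

    def map(self, f):               # transformation: nothing computed yet
        self._ops.append(("map", f)); return self

    def filter(self, f):            # transformation: nothing computed yet
        self._ops.append(("filter", f)); return self

    def _materialize(self):
        data = self._data
        for kind, f in self._ops:
            data = ([f(x) for x in data] if kind == "map"
                    else [x for x in data if f(x)])
        return data

    def collect(self):              # action: evaluates the recorded pipeline
        return self._materialize()

    def reduce(self, f):            # action: evaluates, then folds
        return _reduce(f, self._materialize())

rdd = ToyRDD(range(1, 6)).map(lambda x: x * x).filter(lambda x: x % 2 == 1)
assert rdd.collect() == [1, 9, 25]
assert ToyRDD(range(1, 6)).map(lambda x: x * x).reduce(lambda a, b: a + b) == 55
```

The lazy transformation/action split is what lets Spark keep intermediate results in memory across an iterative pipeline instead of writing them to disk between stages, as MapReduce must.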
Many people who want to earn a Hadoop Big Data training certification have more than a few questions regarding the roles and responsibilities of a Hadoop developer. How much Java skill is essential to learn Hadoop? What are the characteristic tasks of a Hadoop developer? What activities does a Hadoop developer carry out on a regular basis?
Monte Carlo simulations employed for the analysis of portfolios of catastrophic risk process large volumes of data. Often these simulations are not performed in real-time scenarios because they are slow and consume large amounts of data. Such simulations can benefit from a framework that exploits parallelism to address the computational challenge and provides a distributed file system to address the data challenge. To this end, the Apache Hadoop framework is chosen for the simulation reported in this paper, so that the computational challenge can be tackled using the MapReduce model and the data challenge can be addressed using the Hadoop Distributed File System. A parallel algorithm for the analysis of aggregate risk is proposed and implemented using the MapReduce model. An evaluation of the performance of the algorithm indicates that the Hadoop MapReduce model offers a viable framework for processing large data in aggregate risk analysis. A simulation of aggregate risk employing 100,000 trials with 1000 catastrophic events per trial, on a typical exposure set and contract structure, is performed on multiple worker nodes in less than 6 minutes. The result indicates the scope and feasibility of MapReduce for tackling the computational and data challenges in the analysis of aggregate risk for real-time use.
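A much-reduced sketch of the trial structure described above: each map-style trial samples event losses and applies a simple per-occurrence deductible and limit, and the reduce step averages across trials. All distributions and figures below are invented, not the paper's exposure set or contract structure.

```python
# Monte Carlo aggregate risk sketch in map/reduce shape.
import random

def run_trial(rng, n_events, deductible, limit):
    """Map step: simulate one trial's contract-adjusted aggregate loss."""
    total = 0.0
    for _ in range(n_events):
        ground_up = rng.expovariate(1 / 50_000)   # sampled event loss
        # apply per-occurrence deductible, then cap at the limit
        total += min(max(ground_up - deductible, 0.0), limit)
    return total

def aggregate_risk(n_trials=1000, n_events=10, deductible=10_000, limit=100_000):
    rng = random.Random(7)                        # seeded for reproducibility
    losses = [run_trial(rng, n_events, deductible, limit)
              for _ in range(n_trials)]
    return sum(losses) / n_trials                 # reduce step: expected loss

mean_loss = aggregate_risk()
# each event payout is capped at `limit`, so the aggregate is bounded
assert 0 < mean_loss < 10 * 100_000
```

In the paper's setting the trials are distributed across mappers and the averaging happens in the reducer, which is what makes the 100,000-trial run feasible in minutes.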
In today’s digital world, a large volume of data is being generated from various sources including social media, healthcare, transportation, industry, and sensors. This data includes structured, semi-structured, and unstructured data. This huge volume of data cannot be stored and processed using traditional systems, and is thus termed big data. Storing and analyzing this type of data requires parallel storage and analysis, which can be achieved using big data analytics. With Apache Hadoop, such huge volumes of data can be analyzed efficiently. In this paper, a case study is performed on the requirement and availability of different fertilizers in different states of India over the three years from 2012-2013 to 2014-2015, using Apache Hadoop.
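The kind of MapReduce aggregation such a case study would run can be sketched as a map phase emitting (state, shortfall) pairs and a reduce phase summing them per state. All records below are invented for illustration, not the study's actual fertilizer data.

```python
# Word-count-style MapReduce sketch over fertilizer records.
from collections import defaultdict

records = [
    ("Punjab", "2012-13", 500, 450),  # (state, year, required, available) in kt
    ("Punjab", "2013-14", 520, 510),
    ("Bihar",  "2012-13", 300, 280),
]

def map_phase(recs):
    """Emit (key, value) pairs: here, per-record fertilizer shortfall by state."""
    for state, _year, required, available in recs:
        yield state, required - available

def reduce_phase(pairs):
    """Sum the values for each key, as a reducer would per state."""
    totals = defaultdict(int)
    for state, shortfall in pairs:
        totals[state] += shortfall
    return dict(totals)

shortfalls = reduce_phase(map_phase(records))
assert shortfalls == {"Punjab": 60, "Bihar": 20}
```

On a Hadoop cluster the same two functions become the Mapper and Reducer, with the framework handling partitioning the records and shuffling the pairs by state.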
This paper proposes a framework, namely HSMF (Hadoop Secure Messaging Framework). Nowadays, efficiency is required in processing billions of bytes of (binary) data, particularly in business processes, and it has become expensive to ensure reliability in every application that processes large datasets (complex or structured). The Hadoop platform was designed to address these issues, mainly for enterprise solutions. Our goal is to establish secure communication between Hadoop and smart devices or applications, so that applications on smart devices can send queries to Hadoop and display the respective results. HSMF is based on XML (Extensible Markup Language) messages over TCP/IP, which gives Hadoop the flexibility to communicate not only with Java-based applications but also with applications written in other languages such as C/C++, C#, Python, and the .NET framework. Since the XML is generated dynamically, a client application may choose the DML (Data Manipulation Language) of Hadoop using HSMF. The client application does not need to know the design of the dataset; it only performs queries on a particular dataset based on user requirements. During communication via TCP/IP, every XML message header is checked and matched by a message-header checking mechanism; this is what makes HSMF secure. Our results show successful communication between Hadoop and smart devices or applications, where the smart devices and applications send queries to Hadoop and display the results received from Hadoop.
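A sketch of the kind of XML query message and header check HSMF describes. The element names and the expected header value here are assumptions for illustration, since the paper's actual schema is not given; the TCP/IP transport is also omitted so the sketch stays self-contained.

```python
# Build an XML query message and verify its header on the receiving side.
import xml.etree.ElementTree as ET

EXPECTED_HEADER = "HSMF/1.0"   # assumed header value, not from the paper

def build_query(dataset, query):
    """Client side: wrap a dataset name and query in an XML message."""
    msg = ET.Element("message")
    ET.SubElement(msg, "header").text = EXPECTED_HEADER
    body = ET.SubElement(msg, "body")
    ET.SubElement(body, "dataset").text = dataset
    ET.SubElement(body, "query").text = query
    return ET.tostring(msg, encoding="unicode")

def accept(raw):
    """Server side: reject any message whose header does not match."""
    root = ET.fromstring(raw)
    return root.findtext("header") == EXPECTED_HEADER

msg = build_query("sales", "SELECT count(*) FROM sales")
assert accept(msg)
assert not accept("<message><header>bogus</header></message>")
```

In the full framework this exchange runs over a TCP/IP socket, and the dynamically generated XML is what lets non-Java clients talk to Hadoop.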
This paper gives complete guidelines on Big Data, including different views of Big Data, how Big Data is useful to us, and the factors affecting Big Data. The paper also covers machine learning techniques for Big Data and how Hadoop comes into the picture, as well as the importance of Big Data security. The paper covers most of the main points that affect Big Data and Machine Learning.
The design and implementation of an extensible framework for performing exploratory analysis of complex property portfolios of catastrophe insurance treaties on the MapReduce model is presented in this paper. The framework implements Aggregate Risk Analysis, a Monte Carlo simulation technique, which is at the heart of the analytical pipeline of the modern quantitative insurance/reinsurance industry. A key feature of the framework is its support for layering advanced types of analysis, such as portfolio- or program-level aggregate risk analysis with secondary uncertainty (i.e., computing Probable Maximum Loss (PML) based on a distribution rather than on mean values). Such in-depth analysis is not supported by production risk management systems, since they are constrained by hard response-time requirements. This paper reports preliminary experimental results demonstrating that in-depth aggregate risk analysis can be realized using a framework based on the MapReduce model.
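The distribution-based PML computation mentioned above amounts to reading a quantile off the simulated annual-loss distribution rather than using a single mean value. A minimal sketch with an invented loss sample:

```python
# PML at a return period = the (1 - 1/return_period) quantile of annual losses.
def pml(losses, return_period):
    ordered = sorted(losses)
    idx = min(len(ordered) - 1,
              int((1 - 1 / return_period) * len(ordered)))
    return ordered[idx]

# invented simulated annual losses, e.g. in $m, one per simulation year
annual_losses = [10, 25, 5, 80, 40, 15, 60, 200, 30, 55]
assert pml(annual_losses, return_period=10) == 200  # 1-in-10-year loss level
assert pml(annual_losses, return_period=2) == 40    # median annual loss
```

Because the whole loss distribution is retained rather than collapsed to a mean, the framework can also attach secondary uncertainty to each event, which is the layered analysis the paper highlights.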