Apache Spark
Recent papers in Apache Spark
Sentiment analysis on Twitter data has attracted much attention recently. One of the system’s key features is the immediacy of communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express...
Due to the exponential increase in network traffic in the data centers, thousands of servers interconnected with high bandwidth switches are required. Field Programmable Gate Arrays (FPGAs) with Cloud ecosystem offer high performance in...
Recommender systems aim to personalize the shopping experience of a user by suggesting related products, or products that are found to be in the general interests of the user. The information available for users and products is...
1. What is Apache Spark 2. How Spark executes your program 3. Spark’s performance optimization 4. Memory Management Overview 5. Determining Memory Consumption 6. Partitions and Concurrency 7. Serialized RDD Storage 8. Garbage...
Data outsourcing allows data owners to keep their data at untrusted clouds that do not ensure the privacy of data and/or computations. One useful framework for fault-tolerant data processing in a distributed fashion is MapReduce, which...
There has been a strong growth in the global demand for energy over the past decade, especially in relation to renewable energy. The accelerating use of energy by household appliances, rapidly rising sales of air conditioners especially...
Apache Spark is an engine for large-scale data processing, best described as a more flexible version of the venerable MapReduce framework with the capability to perform in-memory processing to speed up computation. BLAST (Basic Local...
In recent years, researchers have proposed a number of efficient and scalable frequent itemset mining algorithms for big data analytics. Initially, MapReduce-based frequent itemset mining algorithms on Hadoop cluster...
(Alternative download link for the book Analisis Big Data: http://bit.ly/2x8ta9S) Praise be to Allah, Lord of the worlds; we give thanks to Allah SWT for all His mercy and blessings upon the completion of this book, titled...
Nowadays, data processing requirements are growing exponentially, and a relational database is not always the best solution for every big data situation, such as rapidly growing data volumes. Thus, NoSQL databases emerged to overcome the...
VPN access to a distributed system connects a remote user to a distributed LAN server while protecting the client's identity and the confidentiality of the processed data. This distributed...
The Predictive Retailer is a retail company that utilizes the latest technological developments to connect with its customers to deliver an exceptional personalized experience to each and every one of them. Today, technology such as AI,...
The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark defines the state of the art in big...
The goal of this paper is to determine which Big Data architecture is best suited to processing, in real time, the large amount of data provided by the Spanish Notaries. Taking as a starting point the Hadoop...
Big Data is currently a hot topic in the fields of Computer Science and Business Intelligence, and in this context a huge amount of information waits to be documented properly, with emphasis on the...
In this presentation I discuss the emerging field of data science, answering the fundamental questions of what, why and how. This paper aims to provide a simple overview of the data science field, various learning and career...
In recent years, new technologies have produced large quantities of data every day. Companies face problems in collecting, storing, analyzing and exploiting these large volumes of data in order to create added value....
Machine Learning (ML), a subset of Artificial Intelligence (AI), enhances the ability of a computer to learn from data without being explicitly programmed end-to-end. As ML and AI learn, they acquire the ability to carry out...
I discuss the Big Data concept and the general tools used for it. Because I covered NoSQL separately, I do not address that topic here.
In the Big Data space, Apache Hadoop and Spark are gaining prominence in handling Big Data and analytics. Similarly, MapReduce has been seen as one of the key enabling approaches for large-scale query processing. These...
Current database engines that provide support for JSON data usually offer a non-standardized and semantically incomplete language. On the other hand, relational databases follow a common language (SQL), but fail to...
Nowadays one often hears about Machine Learning and algorithms capable of adding value to companies' business. One sector where such algorithms are gaining ground is telecommunications, but providers often...
Big Data is one of the concepts of computing that has gained notoriety in several business organizations. The amount of unstructured data is huge, especially on e-commerce sites. In this work, a study and implementation of environments...
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream...
A smart grid is a complete automation system in which a large pool of sensors is embedded in the existing power grid system to control and monitor it using modern information technologies. The data collected from these sensors...
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage...
With the recent boom in big data analytics, many application areas require optimization algorithms that work at massive scale. In this note, we consider a distributed method for solving large-scale optimization problems called alternating...
Semantic Web Mining is the combination of two fast-developing research areas, the Semantic Web and Web Mining. The immense increase in the amount of Semantic Web data has made it an ideal target for some...
Vehicular traffic has become an important research area because of its specific features and applications such as road safety and efficient traffic management. Vehicles should carry sufficient communication systems, onboard computing...
In the process of knowledge discovery and representation in large datasets using formal concept analysis, complexity plays a major role in identifying the formal concepts and constructing the concept lattice (digraph of the concepts). For...
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream...
Boosted by Big Data popularity, new languages and frameworks for data analytics are appearing at an increasing pace. Each of them introduces its own concepts and terminology and advocates a (real or alleged) superiority in terms of...
Big Data is a term now used ubiquitously in the distributed paradigm on the web. As the name indicates, it refers to collections of very large amounts of data, measured in petabytes, exabytes, etc., along with the related systems as well as the algorithms...
Classification of imbalanced big data has attracted extensive attention from many researchers during the last decade. Standard classification methods poorly diagnose the minority class samples. Several approaches have been...
SQL-on-Hadoop systems have been gaining popularity in recent years. One popular example is Apache Hive, the pioneer of SQL-on-Hadoop systems. Hive sits at the top of the big data stack as an application layer....
Vehicular traffic has become an important research area because of its specific features and applications such as road safety and efficient traffic management. Vehicles are expected to carry relatively more communication systems, onboard...
The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal...