Data analytics has the potential to transform both scientific research and data-driven business decision making. Effective analysis of huge volumes of data can shift scientific research from hypothesis-driven to data-driven, with patterns discovered in vast quantities of data aiding the formation of scientific hypotheses. For most technology companies that operate at Web scale, analyzing customer data can provide insights into customer behavior and inform critical business decisions.
Cloud computing has emerged as a cost-effective and elastic computing paradigm. Cloud infrastructures scale to massive numbers of commodity computing nodes and provide adaptive provisioning without prohibitive initial investments. Data analytics has the potential to be a significant cloud application, and to constitute a large fraction of the workload of modern data centers. Designing the infrastructures and systems for data management in the new computing environments remains an open challenge.
Proceeding Downloads
Don't match twice: redundancy-free similarity computation with MapReduce
To improve the effectiveness of pair-wise similarity computation, state-of-the-art approaches assign objects to multiple overlapping clusters. This introduces redundant pair comparisons when similar objects share more than one cluster. We propose an ...
Multi-objective optimization of data flows in a multi-cloud environment
As cloud-based solutions have become one of the main choices for intensive data analysis both for business decision making and scientific purposes, users face the problem of choosing among different cloud providers. In this work, we deal with data ...
ScyPer: elastic OLAP throughput on transactional data
Ever increasing main memory sizes and the advent of multi-core parallel processing have fostered the development of in-core databases. Even the transactional data of large enterprises can be retained in-memory on a single server. Modern in-core ...
Scalable I/O-bound parallel incremental gradient descent for big data analytics in GLADE
Incremental gradient descent is a general technique to solve a large class of convex optimization problems arising in many machine learning tasks. GLADE is a parallel infrastructure for big data analytics providing a generic task specification ...
A vision for personalized service level agreements in the cloud
Public Clouds today provide a variety of services for data analysis such as Amazon Elastic MapReduce and Google BigQuery. Each service comes with a pricing model and service level agreement (SLA). Today's pricing models and SLAs are described at the ...
Towards a workload for evolutionary analytics
Emerging data analysis involves the ingestion and exploration of new data sets, application of complex functions, and frequent query revisions based on observing prior query answers. We call this new type of analysis evolutionary analytics and identify ...
GPText: Greenplum parallel statistical text analysis framework
Many companies keep large amounts of text data inside of relational databases. Several challenges exist in using state-of-the-art systems to perform analysis on such datasets. First, expensive big data transfer cost must be paid up front to move data ...
Enabling secure query processing in the cloud using fully homomorphic encryption
The database community, at least for the last decade, has been grappling with querying encrypted data, which would enable secure database as a service solutions. A recent breakthrough in the cryptographic community (in 2009) related to fully homomorphic ...
A case for dynamic memory partitioning in data centers
Leveraging distributed main memory is becoming an increasingly popular approach to speed up large-scale data-intensive cluster applications. However, despite the growing number of possible performance benefits, recent studies indicate that the static ...