Changing engines in midstream: a Java stream computational model for big data processing

Authors:

Paul Sandoz

Proceedings of the VLDB Endowment, Volume 7, Issue 13

Pages 1343 - 1354

https://doi.org/10.14778/2733004.2733007

Published: 01 August 2014

Abstract

With the addition of lambda expressions and the Stream API in Java 8, Java has gained a powerful and expressive query language that operates over in-memory collections of Java objects, making the transformation and analysis of data more convenient, scalable, and efficient. In this paper, we build on the Java 8 Stream API and add a DistributableStream abstraction that supports federated query execution over an extensible set of distributed compute engines. Each query eventually creates a materialized result that is returned either as a local object or as an engine-defined distributed Java Collection that can be saved and/or used as a source for future queries. Distinctively, DistributableStream supports changing compute engines both between and within a query, allowing different parts of a computation to be executed on different platforms. At execution time, the query is organized as a sequence of pipelined stages, each stage potentially running on a different engine. Each node that is part of a stage executes its portion of the computation on the data available locally or produced by the previous stage of the computation. This approach allows computations to be assigned to engines based on pricing, data locality, and resource availability. Coupled with the inherent laziness of stream operations, this brings great flexibility to query planning and separates the semantics of the query from the details of the engine used to execute it. We currently support three engines: Local, Apache Hadoop MapReduce, and Oracle Coherence, and we illustrate how new engines and data sources can be added.
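The laziness of stream operations that the abstract relies on can be observed with the standard `java.util.stream` API alone. The following is a minimal sketch, not code from the paper: the class name `LazyPipelineSketch`, the sample word list, and the `inspected` counter are illustrative assumptions, and the pipeline runs only over a local in-memory collection. It does not use `DistributableStream`, whose API is not shown on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipelineSketch {
    // Hypothetical counter: how many elements the filter stage has inspected.
    static final AtomicInteger inspected = new AtomicInteger();

    // Describes the query (filter, then map). No element is processed here,
    // because intermediate stream operations are lazy.
    static Stream<String> buildQuery(List<String> words) {
        return words.stream()
                .filter(w -> {
                    inspected.incrementAndGet();
                    return w.length() > 3;
                })
                .map(String::toUpperCase);
    }

    // Attaches a terminal operation, which is what triggers execution
    // and materializes the result as a local List.
    static List<String> run(Stream<String> query) {
        return query.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "filter", "zip");
        Stream<String> query = buildQuery(words);
        System.out.println(inspected.get()); // prints 0: nothing has run yet
        System.out.println(run(query));      // prints [REDUCE, FILTER]
        System.out.println(inspected.get()); // prints 4: every word was inspected once
    }
}
```

Because the pipeline is only a description until the terminal operation runs, a planner is free to decide where and how it executes before any data is touched, which is the property the abstract credits for separating query semantics from engine details.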

Published In

Proceedings of the VLDB Endowment, Volume 7, Issue 13

August 2014, 466 pages

ISSN: 2150-8097

Editors: H. V. Jagadish (University of Michigan), Aoying Zhou (East China Normal University, China)

Publisher: VLDB Endowment
