High-performance XML modeling of parallel queries based on MapReduce framework

Published: 01 December 2016

Abstract

As data volumes grow at an incredible rate, the development of cloud computing technologies is critically important to advancing research. MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. Traditional parallel XML parsing and indexing approaches are inadequate for processing large-scale XML datasets on clusters; therefore, we propose an approach that exploits data parallelism in XML processing using MapReduce on Hadoop. Our solution seamlessly integrates data storage, labeling, indexing, and parallel queries to process massive amounts of XML data. Specifically, we introduce an SDN labeling algorithm and a distributed hierarchical index using DHTs. More importantly, we design an advanced two-phase MapReduce solution that efficiently addresses labeling, indexing, and query processing on big XML data. The first MapReduce phase applies filtering, labeling, and index-building techniques: each DataNode labels elements using a map function, and a reduce function merges the results and builds the indexes. In the second phase, local XML queries over multiple partitions are performed in parallel using index-table-enabled B-SLCA. Our experimental results demonstrate the efficiency and effectiveness of the proposed parallel XML processing approach using the MapReduce framework.
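The two-phase flow described in the abstract can be illustrated with a small single-process sketch. It is not the paper's implementation: the abstract does not specify the SDN labeling algorithm or B-SLCA, so this sketch substitutes plain Dewey-style path labels and a brute-force SLCA (smallest lowest common ancestor) computation, and it runs in ordinary Python rather than on Hadoop. The function names `map_label`, `reduce_index`, and `slca` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import product


def map_label(partition_id, xml_text):
    """Phase 1 map (assumed form): label each element with a Dewey-style
    path and emit (keyword, label) pairs for its tag and text words."""
    root = ET.fromstring(xml_text)
    stack = [(root, (partition_id,))]
    while stack:
        node, label = stack.pop()
        for word in [node.tag] + (node.text or "").split():
            yield word.lower(), label
        for i, child in enumerate(node):
            stack.append((child, label + (i,)))


def reduce_index(pairs):
    """Phase 1 reduce (assumed form): merge pairs into keyword -> sorted labels."""
    index = defaultdict(set)
    for keyword, label in pairs:
        index[keyword].add(label)
    return {k: sorted(v) for k, v in index.items()}


def common_prefix(a, b):
    """Longest shared prefix of two labels, i.e. the label of their LCA."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def slca(index, keywords):
    """Phase 2 (brute force, not B-SLCA): find nodes whose subtree contains
    every keyword and that have no descendant with the same property."""
    lists = [index.get(k.lower(), []) for k in keywords]
    if not all(lists):
        return []
    lcas = set()
    for combo in product(*lists):
        lca = combo[0]
        for label in combo[1:]:
            lca = common_prefix(lca, label)
        if lca:  # an empty prefix means the labels are in different partitions
            lcas.add(lca)
    # Discard any LCA that is a proper ancestor of another LCA.
    return sorted(l for l in lcas
                  if not any(o != l and o[:len(l)] == l for o in lcas))


if __name__ == "__main__":
    doc = ("<lib><book><title>XML</title><author>Dean</author></book>"
           "<book><title>Hadoop</title><author>Dean</author></book></lib>")
    index = reduce_index(map_label(0, doc))
    print(slca(index, ["xml", "dean"]))  # [(0, 0)], the first <book>
```

Prefixing every label with a partition identifier keeps labels from different partitions disjoint, so each partition can be queried locally and the per-partition results concatenated. The brute-force step enumerates every combination of keyword occurrences and is only suitable for illustration.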

Cited By

  • (2019) Big data and rule-based recommendation system in Internet of Things. Cluster Computing 22:1 (1837-1846). DOI: 10.1007/s10586-017-1078-y. Online publication date: 1-Jan-2019

Published In

Cluster Computing, Volume 19, Issue 4
December 2016
625 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. B-SLCA
  2. Big XML
  3. Distributed programming
  4. MapReduce
  5. Parallel programming
