Indexing multi-dimensional data in a cloud system

Authors:

Beng Chin Ooi

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Pages 591 - 602

https://doi.org/10.1145/1807167.1807232

Published: 06 June 2010

Abstract

Providing scalable database services is an essential requirement for extending many existing applications of the Cloud platform. Because applications are diverse, database services on the Cloud must support both large-scale data analytics jobs and highly concurrent OLTP queries. Most existing work focuses on one specific type of application. To provide an integrated framework, we are designing a new system, epiC, as our solution for next-generation database systems. In epiC, indexes play an important role in improving overall performance. Different types of indexes are built to provide efficient query processing for different applications.

In this paper, we propose RT-CAN, a multi-dimensional indexing scheme in epiC. RT-CAN integrates a CAN-based [23] routing protocol with an R-tree-based indexing scheme to support efficient multi-dimensional query processing in a Cloud system. RT-CAN organizes storage and compute nodes into an overlay structure based on an extended CAN protocol. We make the simple assumption that each compute node uses an R-tree-like structure to index its locally stored data. We propose a query-conscious cost model that selects beneficial local R-tree nodes for publishing. By keeping the number of persistently connected nodes small and maintaining a global multi-dimensional search index, we can locate the compute nodes that may contain the answer within a few hops, making the scheme scalable in terms of data volume and number of compute nodes. Experiments on Amazon's EC2 show that our proposed routing protocol and indexing scheme are robust, efficient and scalable.
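
The idea of publishing local R-tree nodes into a global overlay and then routing a query to the nodes that may hold answers can be illustrated with a minimal Python sketch. This is not the RT-CAN algorithm itself. It assumes a 2-D unit square split into a fixed k x k grid as a stand-in for CAN zones, and it publishes each bounding rectangle to every zone it overlaps, a simplification chosen only for this illustration. The names GridOverlay, PublishedNode, publish and candidates are hypothetical, and the query-conscious cost model that decides which R-tree nodes to publish is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Zone = Tuple[int, int]                    # grid coordinates of an overlay zone


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True when two axis-aligned rectangles intersect (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass(frozen=True)
class PublishedNode:
    """A local R-tree node advertised in the global index (hypothetical type)."""
    owner: str  # compute node that stores the underlying data
    mbr: Rect   # minimum bounding rectangle of the R-tree node


class GridOverlay:
    """Toy stand-in for a CAN overlay: the unit square split into k x k zones."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.zones: Dict[Zone, List[PublishedNode]] = {
            (i, j): [] for i in range(k) for j in range(k)
        }

    def zone_rect(self, zone: Zone) -> Rect:
        """Rectangle covered by one zone of the grid."""
        i, j = zone
        s = 1.0 / self.k
        return (i * s, j * s, (i + 1) * s, (j + 1) * s)

    def zones_for(self, rect: Rect) -> List[Zone]:
        """All zones whose area intersects the given rectangle."""
        return [z for z in self.zones if overlaps(self.zone_rect(z), rect)]

    def publish(self, node: PublishedNode) -> None:
        """Assumption: replicate the entry to every zone its MBR overlaps."""
        for z in self.zones_for(node.mbr):
            self.zones[z].append(node)

    def candidates(self, query: Rect) -> List[str]:
        """Compute nodes that may contain answers to a range query."""
        owners = set()
        for z in self.zones_for(query):
            for node in self.zones[z]:
                if overlaps(node.mbr, query):
                    owners.add(node.owner)
        return sorted(owners)


overlay = GridOverlay(4)
overlay.publish(PublishedNode("n1", (0.1, 0.1, 0.3, 0.2)))
overlay.publish(PublishedNode("n2", (0.6, 0.6, 0.9, 0.9)))
overlay.publish(PublishedNode("n3", (0.2, 0.15, 0.55, 0.4)))
print(overlay.candidates((0.0, 0.0, 0.25, 0.25)))
```

Only the zones that intersect the query rectangle are visited, so the query touches a small part of the overlay and then contacts only the compute nodes whose published rectangles intersect it. A real CAN overlay would route between neighboring zones instead of scanning a dictionary, and zones would be split and merged as nodes join and leave.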

References

[1]

http://hadoop.apache.org/.

[2]

http://www.comp.nus.edu.sg/~epic.

[3]

http://www.fh-oow.de/institute/iapg/personen/brinkhoff/generator/.

[4]

K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt. P-grid: A self-organizing structured p2p system. In SIGMOD 2003.

[5]

A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. Hadoopdb: An architectural hybrid of mapreduce and dbms technologies for analytical workloads. PVLDB, 2(1):922--933, 2009.

[6]

M. K. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: A new paradigm for building scalable distributed systems. In SOSP 2007.

[7]

S. A. Weil, S. A. Brandt, E. L. Miller, and D. D. E. Long. Ceph: A scalable, high-performance distributed file system. In OSDI 2006.

[8]

V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. SIGPLAN Not., 35(5):1--12, 2000.

[9]

E. Bertino, B. C. Ooi, R. Sacks-Davis, K. Tan, J. Zobel, B. Shidlovsky, and B. Catania. Indexing Techniques for Advanced Database Applications. Monograph series, Kluwer Academic, 1997.

[10]

B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. In VLDB 2008.

[11]

W. Cai, S. Zhou, W. Qian, L. Xu, K. Tan, and A. Zhou. C2: a new overlay network based on can and chord. Int. J. High Perform. Comput. Netw., 3(4):248--261, 2005.

[12]

R. Chaiken, B. Jenkins, P.-A. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. Scope: easy and efficient parallel processing of massive data sets. Proc. VLDB Endow., 1(2):1265--1276, 2008.

[13]

F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI 2006.

[14]

A. Crainiceanu, P. Linga, J. Gehrke, and J. Shanmugasundaram. Querying peer-to-peer networks using p-trees. In WebDB 2004.

[15]

A. Crainiceanu, P. Linga, A. Machanavajjhala, J. Gehrke, and J. Shanmugasundaram. P-ring: an efficient and robust p2p range index structure. In SIGMOD 2007.

[16]

J. Dean and S. Ghemawat. Mapreduce: a flexible data processing tool. Commun. ACM, 53(1):72--77, 2010.

[17]

G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In SIGOPS 2007.

[18]

S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP 2003.

[19]

W. Gilks, S. Richardson, and D. Spiegelhalter. Markov chain monte carlo in practice. 1996.

[20]

H. V. Jagadish, B. C. Ooi, K.-L. Tan, Q. H. Vu, and R. Zhang. Speeding up search in peer-to-peer networks with a multi-way tree structure. In SIGMOD, pages 1--12, 2006.

[21]

H. V. Jagadish, B. C. Ooi, and Q. H. Vu. Baton: A balanced tree structure for peer-to-peer networks. In VLDB 2005.

[22]

J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI 2004.

[23]

S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In SIGCOMM 2001.

[24]

A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In International Conference on Distributed Systems Platforms 2001.

[25]

A. Silberstein, B. F. Cooper, U. Srivastava, E. Vee, R. Yerneni, and R. Ramakrishnan. Efficient bulk insertion into a distributed ordered table. In SIGMOD 2008.

[26]

I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM 2001.

[27]

Y. Tao, J. Zhang, D. Papadias, and N. Mamoulis. An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces. IEEE Trans. on Knowl. and Data Eng., 16(10):1169--1184, 2004.

[28]

Q. H. Vu, M. Lupu, and B. C. Ooi. Peer-To-Peer Computing: Principles And Applications. Springer, November 2009.

[29]

S. Wu and K.-L. Wu. An indexing framework for efficient retrieval on the cloud. IEEE Data Engineering Bulletin, 32(1):77--84, 2009.

[30]

H.-c. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-reduce-merge: simplified relational data processing on large clusters. In SIGMOD, pages 1029--1040, 2007.

Index Terms

Indexing multi-dimensional data in a cloud system
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs

Information

Published In

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

June 2010

1286 pages

ISBN:9781450300322

DOI:10.1145/1807167

General Chair: Ahmed Elmagarmid, Purdue University, USA

Program Chair: Divyakant Agrawal, University of California at Santa Barbara, USA

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS '10: International Conference on Management of Data

Sponsor: SIGMOD

June 6 - 10, 2010

Indianapolis, Indiana, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 125
Total Downloads: 2,077
Downloads (Last 12 months): 27
Downloads (Last 6 weeks): 3

Reflects downloads up to 23 Dec 2024

Cited By

De Capitani di Vimercati S, Facchinetti D, Foresti S, Oldani G, Paraboschi S, Rossi M, Samarati P (2024). Multi-Dimensional Flat Indexing for Encrypted Data. IEEE Transactions on Cloud Computing, 12:3 (928-941). https://doi.org/10.1109/TCC.2024.3408905. Online publication date: Jul-2024
Tao L, Ma K, Tian M, Hui Z, Zheng S, Liu J, Xie Z, Qiu Q (2023). Developing a Base Domain Ontology from Geoscience Report Collection to Aid in Information Retrieval towards Spatiotemporal and Topic Association. ISPRS International Journal of Geo-Information, 13:1 (14). https://doi.org/10.3390/ijgi13010014. Online publication date: 30-Dec-2023
Khettabi K, Kouahla Z, Farou B, Seridi H, Ferrag M (2023). Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level. Big Data and Cognitive Computing, 7:2 (119). https://doi.org/10.3390/bdcc7020119. Online publication date: 14-Jun-2023
Miao R, Zhang Y, Qu G, Yang K, Yang T, Cui B, Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J (2023). Hyper-USS: Answering Subset Query Over Multi-Attribute Data Stream. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1698-1709). https://doi.org/10.1145/3580305.3599383. Online publication date: 6-Aug-2023
Singh B, Ye Q, Hu H, Xiao B (2023). Efficient and lightweight indexing approach for multi-dimensional historical data in blockchain. Future Generation Computer Systems, 139 (210-223). https://doi.org/10.1016/j.future.2022.09.002. Online publication date: Feb-2023
Ouchaou L, Nacer H, Labba C (2022). Towards a distributed SaaS management system in a multi-cloud environment. Cluster Computing, 25:6 (4051-4071). https://doi.org/10.1007/s10586-022-03619-x. Online publication date: 20-Jun-2022
de Oliveira D, Liu J, Pacitti E (2022). Data-Intensive Workflow Management. Online publication date: 26-Feb-2022
Kolisetty V, Rajput D (2021). Big data integration enhancement based on attributes conditional dependency and similarity index method. Mathematical Biosciences and Engineering, 18:6 (8661-8682). https://doi.org/10.3934/mbe.2021429. Online publication date: 2021
Li J, Liu Y (2021). An Efficient Data Analysis Framework for Online Security Processing. Journal of Computer Networks and Communications, 2021. https://doi.org/10.1155/2021/9290853. Online publication date: 1-Jan-2021
Song J, Bi Y, Han G, Li T (2021). FacetsBase: A Key-Value Store Optimized for Querying on Scholarly Data. IEEE Transactions on Emerging Topics in Computing, 9:1 (302-315). https://doi.org/10.1109/TETC.2018.2844313. Online publication date: 1-Jan-2021
