DOI: 10.1145/2723372.2723741
research-article
Open access

SEMROD: Secure and Efficient MapReduce Over HybriD Clouds

Published: 27 May 2015

Abstract

This paper describes SEMROD, a sensitive-data-aware MapReduce (MR) framework for hybrid clouds. SEMROD steers data and computation through public and private machines in such a way that no knowledge about sensitive data is leaked to public machines. For this purpose, SEMROD keeps track of intermediate keys (generated during MR execution) that become sensitive, and uses this information to make dynamic task scheduling decisions. SEMROD guarantees that adversaries (viz. public machines) cannot gain any "additional" information about sensitive data from either the data stored on public machines or the communication between public and private machines during job execution. SEMROD extends naturally from a single MR job to multi-phase MR jobs that result, for instance, from compiling Hive queries into MR jobs. Using SEMROD, computation that may involve sensitive data can exploit public machines, bringing significant performance benefits; such computation would otherwise be restricted to private clouds. Our experiments demonstrate the performance advantages of SEMROD over other secure alternatives, even when the percentage of sensitive data is as high as 50%.
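
The idea of tracking intermediate keys that become sensitive and scheduling on that basis can be illustrated with a toy, single-process simulation. This is only a sketch of the general idea stated in the abstract, not SEMROD's actual scheduling algorithm, and it does not model the leakage guarantee. The function `run_job`, the `(value, is_sensitive)` record layout, and the `"private"`/`"public"` labels are all assumptions made for illustration.

```python
from collections import defaultdict


def run_job(records, map_fn, reduce_fn):
    """Toy MapReduce that decides reduce placement per intermediate key.

    records: iterable of (value, is_sensitive) pairs (hypothetical layout).
    Returns (results, placement); placement maps each intermediate key
    to "private" or "public".
    """
    groups = defaultdict(list)  # intermediate key -> emitted values
    sensitive_keys = set()      # keys emitted by at least one sensitive record

    # Map phase: any key emitted from a sensitive record becomes sensitive.
    for value, is_sensitive in records:
        for key, out in map_fn(value):
            groups[key].append(out)
            if is_sensitive:
                sensitive_keys.add(key)

    # Reduce phase: placement is chosen only after the map phase has
    # revealed which keys became sensitive (a dynamic decision).
    results, placement = {}, {}
    for key, values in groups.items():
        placement[key] = "private" if key in sensitive_keys else "public"
        results[key] = reduce_fn(key, values)
    return results, placement


# Word count over one non-sensitive and one sensitive line.
records = [("a b", False), ("b c", True)]
results, placement = run_job(
    records,
    lambda line: [(word, 1) for word in line.split()],
    lambda key, values: sum(values),
)
print(results)
print(placement)
```

In this sketch the key `b` is routed to a private machine even though one of its values came from a non-sensitive record, because a single sensitive contribution is enough to mark the key.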

    Published In

    SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
    May 2015
    2110 pages
    ISBN:9781450327589
    DOI:10.1145/2723372

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data processing
    2. hybrid cloud
    3. mapreduce
    4. secure

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Melbourne, Victoria, Australia

    Acceptance Rates

    SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) Preventing Inferences Through Data Dependencies on Sensitive Data. IEEE Transactions on Knowledge and Data Engineering 36:10 (5308-5327). DOI: 10.1109/TKDE.2023.3336630. Online publication date: Oct-2024
    • (2023) Flare: A Fast, Secure, and Memory-Efficient Distributed Analytics Framework. Proceedings of the VLDB Endowment 16:6 (1439-1452). DOI: 10.14778/3583140.3583158. Online publication date: 20-Apr-2023
    • (2023) Modeling and Simulating a Process Mining-Influenced Load-Balancer for the Hybrid Cloud. IEEE Transactions on Cloud Computing 11:2 (1999-2010). DOI: 10.1109/TCC.2022.3177668. Online publication date: 1-Apr-2023
    • (2022) Don't be a tattle-tale. Proceedings of the VLDB Endowment 15:11 (2437-2449). DOI: 10.14778/3551793.3551805. Online publication date: 1-Jul-2022
    • (2020) PANDA. ACM Transactions on Management Information Systems 11:4 (1-41). DOI: 10.1145/3397521. Online publication date: 12-Oct-2020
    • (2020) Advances in Cryptography and Secure Hardware for Data Outsourcing. 2020 IEEE 36th International Conference on Data Engineering (ICDE) (1798-1801). DOI: 10.1109/ICDE48307.2020.00173. Online publication date: Apr-2020
    • (2020) A Framework for Fast MapReduce Processing Considering Sensitive Data on Hybrid Clouds. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC) (1357-1362). DOI: 10.1109/COMPSAC48688.2020.00-67. Online publication date: Jul-2020
    • (2020) A Big Data Provenance Model for Data Security Supervision Based on PROV-DM Model. IEEE Access 8 (38742-38752). DOI: 10.1109/ACCESS.2020.2975820. Online publication date: 2020
    • (2020) Process mining-constrained scheduling in the hybrid cloud. Concurrency and Computation: Practice and Experience 33:4. DOI: 10.1002/cpe.6025. Online publication date: 24-Sep-2020
    • (2019) YARN. Big Data Processing With Hadoop (90-124). DOI: 10.4018/978-1-5225-3790-8.ch006. Online publication date: 2019
