research-article

ForkBase: an efficient storage engine for blockchain and forkable applications

Published: 01 June 2018

Abstract

Existing data storage systems offer a wide range of functionalities to accommodate an equally diverse range of applications. However, new classes of applications have emerged, e.g., blockchain and collaborative analytics, that feature data versioning, fork semantics, tamper evidence, or any combination thereof. They present new opportunities for storage systems to support such applications efficiently by embedding these requirements into the storage layer.
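The three requirements named above (versioning, forks, tamper evidence) can be made concrete with a small sketch. The following Python code is a hypothetical illustration, not ForkBase's API: every name in it (`Version`, `version_id`, `VersionStore`, `put`, `fork`, `verify`) is an assumption made for this example. Each version's identifier is a SHA-256 hash over its value and its parents' identifiers, so an identifier commits to the full history behind it. Given a trusted head identifier, a later modification of any older version is detectable, and a fork is simply a second branch head pointing at an existing version. The sketch models only explicit, named forks; it does not attempt to reproduce the two fork variants the paper describes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: bytes
    parents: tuple  # hex ids of parent versions (empty for a root version)


def version_id(value: bytes, parents: tuple) -> str:
    """Hash the parent ids and the value, so an id commits to its whole history."""
    h = hashlib.sha256()
    h.update(len(parents).to_bytes(4, "big"))  # unambiguous encoding of the parent list
    for p in parents:
        h.update(bytes.fromhex(p))
    h.update(value)
    return h.hexdigest()


class VersionStore:
    """Hypothetical multiversion store: immutable versions plus mutable branch heads."""

    def __init__(self):
        self.versions = {}  # version id -> Version
        self.branches = {}  # (key, branch name) -> head version id

    def put(self, key: str, branch: str, value: bytes) -> str:
        """Append a new version on top of the branch head and advance the head."""
        head = self.branches.get((key, branch))
        parents = (head,) if head is not None else ()
        vid = version_id(value, parents)
        self.versions[vid] = Version(value, parents)
        self.branches[(key, branch)] = vid
        return vid

    def fork(self, key: str, src: str, dst: str) -> None:
        """Create branch dst at src's head; only a pointer is copied, not data."""
        self.branches[(key, dst)] = self.branches[(key, src)]

    def get(self, key: str, branch: str) -> bytes:
        return self.versions[self.branches[(key, branch)]].value

    def verify(self, vid: str) -> bool:
        """Recompute every id reachable from vid; a tampered version breaks the chain."""
        stack, seen = [vid], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            v = self.versions[cur]
            if version_id(v.value, v.parents) != cur:
                return False
            stack.extend(v.parents)
        return True


if __name__ == "__main__":
    store = VersionStore()
    v1 = store.put("doc", "master", b"hello")
    store.fork("doc", "master", "dev")
    v2 = store.put("doc", "dev", b"hello world")
    print(store.get("doc", "master"), store.get("doc", "dev"), store.verify(v2))
```

Keeping versions immutable and hash-addressed, with only the branch pointers mutable, keeps forking cheap: `fork` copies a single identifier regardless of how large the value is.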
In this paper, we present ForkBase, a storage engine designed for blockchain and forkable applications. By integrating core application properties into the storage, ForkBase not only delivers high performance but also reduces development effort. The storage manages multiversion data and supports two variants of fork semantics, which enable different fork workflows. ForkBase is fast and space-efficient due to a novel index class that supports efficient queries as well as effective detection of duplicate content across data objects, branches, and versions. We demonstrate ForkBase's performance using three applications: a blockchain platform, a wiki engine, and a collaborative analytics application. We conduct an extensive experimental evaluation against the respective state-of-the-art solutions. The results show that ForkBase achieves superior performance while significantly lowering the development effort.
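The abstract does not describe how the index detects duplicate content. A common technique in deduplicating storage is content-defined chunking, sketched below purely as an assumed illustration rather than as ForkBase's actual index: a rolling hash over a small sliding window chooses chunk boundaries from the bytes themselves, and each chunk is stored once under its SHA-256 digest. The polynomial rolling hash here is a simple stand-in for the Rabin fingerprints used by systems such as LBFS, and the window size, boundary mask, chunk-size bounds, and the `chunk` and `ChunkStore` names are all hypothetical.

```python
import hashlib
import random

WINDOW = 16                      # bytes covered by the rolling hash (assumed value)
BASE = 257                       # polynomial base
MOD = (1 << 61) - 1              # Mersenne prime modulus
MASK = (1 << 8) - 1              # cut where the low 8 bits are all ones (~1 in 256 positions)
MIN_CHUNK, MAX_CHUNK = 64, 2048  # bounds on chunk size (assumed values)


def chunk(data: bytes) -> list:
    """Split data at content-defined boundaries found by a polynomial rolling hash."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in a full window
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # slide out the oldest byte
        h = (h * BASE + b) % MOD                     # slide in the newest byte
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0                      # restart the window after a cut
    if start < len(data):
        chunks.append(data[start:])                  # trailing partial chunk
    return chunks


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept once, keyed by SHA-256."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store data and return its recipe, the ordered list of chunk keys."""
        recipe = []
        for c in chunk(data):
            key = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(key, c)  # duplicate chunks are not stored again
            recipe.append(key)
        return recipe

    def get(self, recipe: list) -> bytes:
        return b"".join(self.chunks[k] for k in recipe)


if __name__ == "__main__":
    rng = random.Random(7)
    v1 = bytes(rng.randrange(256) for _ in range(50_000))
    v2 = v1[:25_000] + b"a small edit" + v1[25_000:]  # new version with an insertion
    store = ChunkStore()
    r1, r2 = store.put(v1), store.put(v2)
    assert store.get(r2) == v2
    print(f"chunks in v1: {len(r1)}, in v2: {len(r2)}, shared: {len(set(r1) & set(r2))}")
    print(f"unique chunks stored: {len(store.chunks)}")
```

Because a cut point depends only on the bytes inside the window once the minimum chunk size is reached, inserting bytes into a value changes only the chunks near the edit. Chunks before the edit are identical, chunks after it realign at the next shared boundary, and the two versions end up sharing most of their stored chunks.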




Published In

Proceedings of the VLDB Endowment, Volume 11, Issue 10
June 2018
248 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 June 2018
Published in PVLDB Volume 11, Issue 10


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Feb 2025

Cited By

  • (2025) EASL: Enhanced append-only skip list index for agile block data retrieval on blockchain. Future Generation Computer Systems 164 (107554). DOI: 10.1016/j.future.2024.107554. Online publication date: Mar-2025
  • (2024) Data Deduplication Based on Content Locality of Transactions to Enhance Blockchain Scalability. ACM Transactions on Architecture and Code Optimization 21:4 (1-24). DOI: 10.1145/3680547. Online publication date: 19-Nov-2024
  • (2024) LETUS: A Log-Structured Efficient Trusted Universal BlockChain Storage. Companion of the 2024 International Conference on Management of Data (161-174). DOI: 10.1145/3626246.3653390. Online publication date: 9-Jun-2024
  • (2024) Dynamic Optimization for Trade-off in Hyperledger Fabric towards Latency-Sensitive IoT Services. 2024 IEEE International Conference on Web Services (ICWS) (899-909). DOI: 10.1109/ICWS62655.2024.00108. Online publication date: 7-Jul-2024
  • (2024) Adaptive Shrink and Shard Architecture Design for Blockchain Storage Efficiency. IET Computers & Digital Techniques 2024. DOI: 10.1049/2024/2280828. Online publication date: 21-Feb-2024
  • (2024) SolsDB. Future Generation Computer Systems 160:C (295-304). DOI: 10.1016/j.future.2024.05.050. Online publication date: 18-Oct-2024
  • (2024) NeurDB: an AI-powered autonomous data system. Science China Information Sciences 67:10. DOI: 10.1007/s11432-024-4125-9. Online publication date: 13-Sep-2024
  • (2024) Improving query processing in blockchain systems by using a multi-level sharding mechanism. The Journal of Supercomputing 80:10 (15066-15096). DOI: 10.1007/s11227-024-06037-5. Online publication date: 29-Mar-2024
  • (2024) An Efficient Data Reduction Method for DAG Blockchain. Proceedings of the 13th International Conference on Computer Engineering and Networks (356-365). DOI: 10.1007/978-981-99-9247-8_35. Online publication date: 4-Jan-2024
  • (2024) DELTA. International Journal of Network Management 34:5. DOI: 10.1002/nem.2293. Online publication date: 5-Aug-2024
