Optimized Data Storage Method For Sharding-Based Blockchain



Received March 26, 2021, accepted May 2, 2021, date of publication May 5, 2021, date of current version May 13, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3077650

Optimized Data Storage Method for Sharding-Based Blockchain
DAYU JIA (1), JUNCHANG XIN (1, 2), ZHIQIONG WANG (3, 4), AND GUOREN WANG (5)
(1) School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
(2) Key Laboratory of Big Data Management and Analytics, Northeastern University, Shenyang 110169, China
(3) College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
(4) Neusoft Research of Intelligent Healthcare Technology Company Ltd., Shenyang 110167, China
(5) School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Corresponding author: Junchang Xin (xinjunchang@mail.neu.edu.cn)

This work was supported in part by the National Natural Science Foundation of China under Grant 62072089; and in part by the
Fundamental Research Funds for the Central Universities under Grant N2116016, Grant N161602003, Grant N180408019, and
Grant N180101028.

ABSTRACT The COVID-19 virus is raging across the planet. In countries where the epidemic is under control, the main mode of virus transmission is the transport of imported refrigerated food from epidemic areas. Blockchain offers governments an effective way to trace every piece of food. However, the high performance that blockchain systems demand of their nodes limits their wide application. Several sharding-based blockchain systems have been proposed to remove this limitation, but they raise a new problem: which blocks should each node in a sharding-based blockchain system save? To solve this problem, this paper proposes an optimized data storage method. Five features of block popularity are presented: the objective feature of a block, the objective feature of the block associated with the node, the historical popularity, the hidden popularity and the storage requirements. An extreme learning machine (ELM) classifier is then used in the optimized model because of its high training and classification performance. Finally, experimental results on synthetic data demonstrate the accuracy and efficiency of the optimized data storage model.

INDEX TERMS Blockchain, hot block, classification, sharding technology, extreme learning machine.
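The abstract names an extreme learning machine (ELM) as the block classifier. To illustrate why ELM training is cheap, the sketch below trains a single-hidden-layer ELM in plain Python. The input weights are drawn at random and never updated, and only the output weights are fitted, with one regularized least-squares solve. This is a minimal sketch, not the authors' implementation. Each block's five inputs stand in for the five popularity features, but the synthetic data, the rule that labels a block hot, the hidden-layer size and the ridge term are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


class ELM:
    """Single-hidden-layer extreme learning machine for labels +1 (hot) / -1 (cold)."""

    def __init__(self, n_hidden=30, ridge=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # small diagonal term that keeps H^T H well conditioned
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Hidden-layer outputs; self.w and self.b are random and never trained.
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        d = len(xs[0])
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]  # hidden-layer output matrix H
        k = self.n_hidden
        # The only "training": beta = (H^T H + ridge * I)^-1 H^T y, one linear solve.
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, x):
        score = sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))
        return 1 if score >= 0 else -1


def synthetic_blocks(n, rng):
    """Hypothetical data: 5 values in [0, 1] per block, one per popularity feature;
    a block is labelled hot (+1) when its values average above 0.5."""
    xs = [[rng.random() for _ in range(5)] for _ in range(n)]
    ys = [1 if sum(x) / 5 > 0.5 else -1 for x in xs]
    return xs, ys


rng = random.Random(42)
train_x, train_y = synthetic_blocks(400, rng)
test_x, test_y = synthetic_blocks(200, rng)
model = ELM().fit(train_x, train_y)
accuracy = sum(model.predict(x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the hidden layer is fixed, fitting reduces to a k-by-k linear system (k = 30 here), with no iterative back-propagation. This is the property the abstract refers to as high training performance.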

The associate editor coordinating the review of this manuscript and approving it for publication was Alberto Cano.

I. INTRODUCTION
Blockchain and artificial intelligence are regarded as the two innovative technologies most likely to increase the productivity of human society in the next ten years [1]. Together with cloud computing and data science, they are collectively referred to as "ABCD", the future direction of information technology research. Blockchain technology is already widely used in fields such as finance, databases, medical treatment and government work.
However, the original blockchain technology has two pain points. First, all full nodes have to jointly maintain the same ledger, which results in low throughput that cannot meet the needs of real-world applications such as banking transactions. Second, the blockchain system requires each full node to keep a complete copy of the blockchain, which severely limits the participation of resource-constrained nodes. Much research has addressed these two pain points, and sharding is one of the most widely used methods [2]–[7].
Current sharding technology mainly takes one of two structures, as shown in Figure 1. In the first structure, a group of resource-constrained nodes forms an organization that completes all tasks of a transaction. The members of an organization store the same consensus transaction data, but each organization stores different shard data; examples include OmniLedger [2], RapidChain [3], Elastico [4] and another sharding chain [5]. In the second structure, used by Consensus Unit [6] and Elasticchain [7], a group of resource-constrained nodes jointly performs the work of a single full node in the blockchain system. These nodes reach consensus with other full nodes or composed full nodes, and every full node (or composed full node) stores the same data.
Both structures have their own advantages. The first structure can achieve Visa-level throughput, whereas transaction data in the second structure is more secure because it is stored and maintained by all full nodes. The research work in this paper is based on the second sharding structure.
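In the second structure, a composed full node must hold the whole ledger collectively even though each member stores only part of it. In the first structure, by contrast, each organization holds a different shard, so no single organization has every block. The sketch below models only the second structure, under simplifying assumptions: blocks are identified by height, each member has a hypothetical block capacity, and blocks are dealt out round-robin. Round-robin is only a placeholder policy. Deciding which blocks each member should keep is exactly the problem this paper addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A resource-constrained node inside a composed full node (hypothetical model)."""
    name: str
    capacity: int  # maximum number of blocks this member can store
    blocks: set = field(default_factory=set)


def compose_full_node(chain_length, members):
    """Deal blocks 0..chain_length-1 out round-robin, skipping full members,
    so that the group jointly stores the whole ledger like one full node."""
    if sum(m.capacity for m in members) < chain_length:
        raise ValueError("members cannot jointly store the whole ledger")
    n = len(members)
    i = 0
    for height in range(chain_length):
        while len(members[i % n].blocks) >= members[i % n].capacity:
            i += 1  # this member is full; try the next one
        members[i % n].blocks.add(height)
        i += 1
    return members


def covers_ledger(members, chain_length):
    """True if the members together hold every block of the chain."""
    held = set().union(*(m.blocks for m in members))
    return held == set(range(chain_length))


members = [Member("a", 4), Member("b", 4), Member("c", 2)]
compose_full_node(10, members)
print({m.name: sorted(m.blocks) for m in members})
# {'a': [0, 3, 6, 8], 'b': [1, 4, 7, 9], 'c': [2, 5]}
print(covers_ledger(members, 10))  # True
```

Note that member "c" cannot serve a read of block 0 from its own storage. Any such read must be forwarded to "a", which is where the storage decision starts to affect query cost.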
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 9, 2021, page 67890
D. Jia et al.: Optimized Data Storage Method for Sharding-Based Blockchain

FIGURE 1. Sharding technology in blockchain.
In sharding-based blockchain applications, each resource-constrained node saves only part of the blockchain data because of the huge amount of data in a ledger. When a node needs to read data that is not stored locally, it must send a query request to other nodes. However, none of the current sharding methods offers an effective way to tell nodes which shard data they should save. For example, in [7] (details in the related work), provided that the total number of shards in the system is sufficient, each resource-constrained node saves shard data at random. The method proposed in [6] for deciding which shard data a node saves considers few parameters: only the storage space of the node and the frequency with which the node accesses data. In current blockchain systems, when a node must contact other nodes to read data, query efficiency is seriously degraded in scenarios with poor network environments and frequent queries.
FIGURE 2. A supply chain scenario.
Take the supply chain as an example application scenario. As shown in Figure 2, the supply chain has two characteristics: (1) There are a large number of independent participants, including cross-country production factories, logistics service providers and dealerships. (2) The participants are closely related: any change in one link causes fluctuations in other links, and the impact is amplified step by step.
Given these characteristics, the current pain points of the supply chain are as follows: (1) The cost of information exchange between enterprises is high because data sharing is hindered by data islands and privacy protection. (2) Commodity traceability and anti-counterfeiting are difficult to achieve because there is no guarantee that the data provided by a party in the supply chain is true and reliable.
Pain point 2 has become prominent while the COVID-19 virus ravages the world. In some countries with better epidemic prevention, such as China, the main mode of virus transmission is infection and spread during the transport of imported refrigerated food from epidemic areas. At present, China's cold chain (a kind of supply chain application) information system is imperfect, and many resources are wasted searching for the source of the virus. Therefore, the pain points of the supply chain not only cause huge economic losses but also affect people's health.
The combination of blockchain and Internet of Things (IoT) technology can solve the current pain points of the supply chain. The data detected by edge devices in the supply chain are stored in the blockchain through IoT technology. Through privacy protection mechanisms such as information encryption and decryption and zero-knowledge proofs, blockchain can remove the obstacles that data privacy poses to data sharing. At the same time, the data structure that the blockchain constructs in a peer-to-peer network environment provides data traceability and anti-forgery.
Some blockchain (BC) systems for IoT scenarios have been proposed, such as the optimized BC in [21]. It employs a hierarchical architecture that uses a centralized private Immutable Ledger (IL) at the local IoT network level to reduce overhead, and a decentralized public BC at higher-end devices for stronger trust. The optimized BC eliminates the overhead associated with the classic blockchain while retaining its security and privacy benefits.
As shown by the red line in Figure 2, the government can quickly trace all locations and persons (red circles) related to the virus based on the relevant data in the blockchain (red blocks).
Moreover, most IoT edge devices are resource-constrained and cannot join the blockchain system as full nodes. Therefore, a blockchain system using sharding technology is suitable for supply chains.
In some supply chain scenarios, the edge devices are in a poor network environment while the query demand is high. For example, in cold chain transportation, trucks are often located in remote areas, and the devices in the cold box monitor the temperature of the food in real time. Truck drivers, shippers and consignees all need to check the food temperature frequently. If a query cannot be answered in time, a food temperature above zero degrees Celsius will not be discovered in time.
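A small simulation makes the storage problem described above concrete. A node that can hold only part of the ledger serves a read locally when it stores the block. Otherwise, it must send a query to another node, which is the costly step in poor network environments such as a truck in a remote area. The sketch compares two placement policies under a hypothetical skewed read workload (Zipf-like, exponent 1). The first is random selection, as the text describes for [7]. The second keeps the most frequently read blocks, a simple policy in the spirit of the access-frequency parameter attributed to [6]; it is not that paper's actual algorithm. The block count, capacity and workload are all assumed for illustration.

```python
import random
from collections import Counter


def zipf_reads(n_blocks, n_reads, s, rng):
    """Hypothetical skewed workload: block k (0-based) is read with probability
    proportional to 1 / (k + 1) ** s, so low block ids are 'hot'."""
    weights = [1.0 / (k + 1) ** s for k in range(n_blocks)]
    return rng.choices(range(n_blocks), weights=weights, k=n_reads)


def random_placement(n_blocks, capacity, rng):
    """Keep `capacity` blocks chosen uniformly at random."""
    return set(rng.sample(range(n_blocks), capacity))


def frequency_placement(history, capacity):
    """Keep the `capacity` blocks this node read most often in the past."""
    return {block for block, _ in Counter(history).most_common(capacity)}


def serve(stored, reads):
    """Return (local_hits, remote_queries) for a sequence of block reads."""
    local = sum(1 for block in reads if block in stored)
    return local, len(reads) - local


rng = random.Random(7)
N_BLOCKS, CAPACITY = 1000, 100
history = zipf_reads(N_BLOCKS, 5000, 1.0, rng)  # past reads, used to choose blocks
future = zipf_reads(N_BLOCKS, 5000, 1.0, rng)   # later reads that must be served

results = {}
for name, stored in [("random", random_placement(N_BLOCKS, CAPACITY, rng)),
                     ("frequency", frequency_placement(history, CAPACITY))]:
    results[name] = serve(stored, future)
    local, remote = results[name]
    print(f"{name:9s} local={local} remote queries={remote}")
```

Under such a skewed workload, frequency-based placement should serve far more reads locally than random placement, which means far fewer remote queries. Access frequency is only one signal, though. The method in this paper combines five popularity features in a classifier.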