Optimized Data Storage Method For Sharding-Based Blockchain
13, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3077650
ABSTRACT The COVID-19 virus is raging across the planet. In countries where the epidemic is under control,
the main mode of virus transmission is the transport of imported refrigerated food from epidemic
areas. Blockchain gives governments a way to trace every piece of food. However, the high performance
requirements that a blockchain system places on its nodes limit its wide application. Several
sharding-based blockchain systems have been proposed to address this limitation, which raises a new
problem: which blocks should each node in a sharding-based blockchain system store? To solve this problem,
this paper proposes an optimized data storage method. Five features of block popularity are presented:
the objective feature of a block, the objective feature of the block associated with the node, the historical
popularity, the hidden popularity, and the storage requirements. An ELM classifier is then used in the
optimized model because of its high training and classification performance. Finally, experimental results
on synthetic data demonstrate the accuracy and efficiency of the optimized data storage model.
INDEX TERMS Blockchain, hot block, classification, sharding technology, extreme learning machine.
The consignees cannot quickly take countermeasures, which causes additional losses.

Therefore, it is necessary to improve the query efficiency of the sharding-based blockchain. This paper first proposes an optimized data storage model based on blockchain sharding technology. For each node in the optimization model, blocks are divided into hot blocks and non-hot blocks by the Extreme Learning Machine (ELM) method. After classification, each node saves its most relevant hot blocks. When a node initiates a query, it can query locally instead of frequently sending query requests to other nodes.

Then, we define a hot block for a node. A hot block is comprehensively evaluated from five aspects: the objective evaluation of the block, the objective evaluation of the block related to the node, the historical popularity of the block being used by the node, the hidden popularity of the block, and the storage requirements of the block in the system, so that the node can accurately find its most relevant hot blocks and store them locally.

Specifically, the major contributions of this paper are as follows:
• We propose an optimized data storage model based on blockchain sharding technology. The ELM method is used as the classifier in this model in order to improve classification efficiency.
• We design an evaluation method of hot blocks for a node. According to this evaluation standard, nodes can identify their most relevant hot blocks and store them locally.
• We conduct a set of experiments on synthetic data to demonstrate the accuracy of the optimized data storage model and the query efficiency of the blockchain system using the new model.

The remainder of the paper is organized as follows. Section 2 reviews the related work on blockchain technology in IoT and on sharding-based blockchain technologies. Section 3 introduces the background of ELM. Section 4 introduces the architecture of the optimized data storage model and the strategies of feature selection. Section 5 reports the experimental evaluation. Finally, conclusions are presented in Section 6.

II. RELATED WORK

A. BLOCKCHAIN SOLUTIONS FOR IoT
Currently, many researchers are working on blockchain applications in IoT. For example, [8] proposes a blockchain-based framework for data integrity service that does not rely on any Third Party Auditor. Reference [9] proposes CreditCoin, a blockchain-based announcement network, which implements reliable vehicular announcements by users who do not reveal their identity. Reference [10] implements a fully distributed access control system based on blockchain to manage billions of IoT devices in a unified manner. This system frees up a large amount of space and performance on edge devices. Reference [22] proposes a Memory Optimized and Flexible BC (MOF-BC) that enables IoT users and service providers to remove or summarize their transactions and age their data. MOF-BC introduces the notion of a Generator Verifier (GV), which effectively decreases BC memory consumption. Reference [23] explores key benefits and design challenges for blockchain technologies, and potential applications of blockchain technologies for IoT. One of the most important challenges is that the scalability of blockchain technologies does not meet the requirements of IoT applications.

B. SHARDING-BASED BLOCKCHAIN SYSTEMS
Consensus Unit [6] is proposed to address the high storage requirement that hinders the wide usage of blockchain on devices such as mobile phones or low-end PCs. A Consensus Unit organizes different nodes into one unit and lets them store at least one copy of the blockchain data in the system together. Based on this structure, [6] proposed a block allocation method to make full use of storage space and minimize query costs. Its definition of query cost only considers how often the block is queried. However, many features affect the relevance of a block to a node, such as the number of transactions contained in the block, the transaction value recorded in the block, the number of transactions in the block that are related to the node, etc. In this paper, the query cost is evaluated more comprehensively and accurately. According to the evaluation results, more relevant data will be stored in the node, and query response time will be reduced.

ElasticChain [7] is another sharding-based blockchain system. Nodes in ElasticChain store shards of the complete chain based on the duplicate ratio regulation algorithm. Meanwhile, a node reliability verification method is used to increase the stability of full nodes and to reduce the risk of imperfect data recovery caused by the reduced number of duplicates. However, the duplicate ratio regulation algorithm considers only data security. The algorithm gives the minimum number of copies stored for each block shard, and nodes are randomly selected to store these shards. In this case, the data stored in a node is very likely to be irrelevant to that node. When the node initiates a query request, more time will be spent accessing other nodes. The optimized data storage method proposed in this paper alleviates this problem.

III. PRELIMINARIES
In this paper, ELM is used as a classifier to determine whether a block is a hot block for a node. In this section, we give some preliminaries of this work, including the theory and advantages of ELM, and then we present the problem definition.

A. THE THEORY OF ELM
ELM was originally developed for single hidden-layer feedforward neural networks (SLFNs) and then extended to the "generalized" SLFNs where the hidden layer need not be neuron alike [11], [12]. ELM first randomly assigns the input weights and the hidden layer biases, and then analytically determines the output weights of SLFNs. ELM can achieve better generalization performance than other conventional
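The two-step ELM procedure described above (a randomly assigned hidden layer followed by an analytic solve for the output weights) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the uniform initialization range, the small ridge term added for numerical stability, the 0.5 decision threshold, and the names `train_elm` and `predict_elm` are all choices made for this example.

```python
import math
import random

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def _hidden(X, W, b):
    """Hidden-layer output matrix H: one row per sample, one column per hidden node."""
    return [[_sigmoid(sum(w * x for w, x in zip(W[j], row)) + b[j])
             for j in range(len(W))] for row in X]

def _solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [v[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def train_elm(X, y, n_hidden=10, ridge=1e-6, seed=0):
    """Fit a binary ELM; y holds 0/1 labels. Returns (W, b, beta)."""
    rng = random.Random(seed)
    d = len(X[0])
    # Step 1: input weights and hidden biases are drawn at random and never trained.
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = _hidden(X, W, b)
    # Step 2: output weights solve (H^T H + ridge * I) beta = H^T y analytically.
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    Hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    return W, b, _solve(HtH, Hty)

def predict_elm(model, X):
    """Return 1 where the network output reaches 0.5, else 0."""
    W, b, beta = model
    return [1 if sum(h * w for h, w in zip(row, beta)) >= 0.5 else 0
            for row in _hidden(X, W, b)]
```

Because only the output weights are solved for, training reduces to a single linear solve, which is the source of the training speed usually attributed to ELM. With a numerical library available, the solve would normally be done with a pseudoinverse or least-squares routine instead of the hand-written elimination used here.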
number of transactions in the block are related to a node, and the block includes a large number of transactions of other nodes that are closely related to the node, etc.

With the current evaluation method for the correlation between a block and a node, two situations may arise. In the first, some blocks are currently accessed frequently by a node but may rarely be accessed in the future. In the second, the node currently has no high query requirement for a certain block but may need to visit this block frequently in future applications. When either situation occurs, the blockchain system will frequently update the locally stored fragmented data, causing query delays. Therefore, we propose an optimized data storage model in this paper, which can comprehensively describe the correlation between blocks and nodes, and accurately classify the hot blocks for a node.

IV. THE OPTIMIZED DATA STORAGE MODEL
In this section, we first describe the architecture of the optimized data storage model. Then we introduce the features used to classify hot blocks. After that, we propose an algorithm to describe the data distribution process.

A. ARCHITECTURE
Figure 3 shows the architecture of the optimized data storage model based on ELM. It mainly consists of three modules: the sharding-based blockchain module on the left, the feature extraction module on the upper right, and the classifier module on the lower right.

In the sharding-based blockchain module, the lower part is the ElasticChain model. There are three roles for nodes [7]: the user node, the storage node, and the verification node. User nodes are participants in the blockchain system. Blockchain operations, such as transactions, are completed between user nodes. The sharded blockchain data is stored in storage nodes. The verification nodes provide reliable storage nodes for the user nodes. They visit and check the reliability of storage nodes at fixed intervals and return two inspection results: the integrity of the data in the storage nodes and the number of successful verifications by storage nodes.

In our optimized data storage model, the verification nodes also read the historical query records of the storage nodes. Meanwhile, the number of transactions, the transaction contents in a block, and the security of the block are detected by verification nodes, as shown in the feature extraction module. Then, five features (the objective feature of a block, the objective feature of the block associated with the node, the historical popularity, the hidden popularity, and the storage requirements) are evaluated based on these four pieces of knowledge. These features can describe the popularity of a block for a node completely.

The upper part of the sharding-based blockchain module is the structure of the storage nodes of ElasticChain. The structure of other sharding-based blockchain systems, such as Consensus Unit [6], is similar to the upper part. The sharded data is uniformly distributed by the system. Therefore, the classification process is directly completed by the system.

Finally, in the classifier module, the hot blocks of a node are classified based on these five features by using ELM. Then, nodes store the hot blocks as their sharding data. In the ELM classifier, some of the blocks are sampled as training data. The sample blocks are the input of the classifier module. They consist of two kinds of blocks: hot blocks and non-hot blocks. The way to create and sample training data is introduced in Section 4.3.

Next, we describe the feature selection process in detail.

B. FEATURE SELECTION
In sharding-based blockchain systems, multiple features will affect the popularity of a block for a node, and we choose
five important features among them in this paper. The chosen features are the objective feature of a block (OF), the objective feature of the block associated with the node (OFN), the historical popularity (HIS), the hidden popularity (HID), and the storage requirements (SR). In most cases, the popularity of a block can be evaluated correctly by using these five features.

Admittedly, other features may also affect the popularity of blocks in some special scenarios. Such features can be added to the feature extraction module without changing the structure of the optimized model.

Each of the five proposed features is calculated from two to four underlying features. We input the five summarized features into the ELM classifier in order to reduce the dimension of the classifier and increase the speed of classification.

The blocks $B$ ($B = \{b_1, b_2, \ldots, b_I\}$) are inspected at a fixed interval. The five features are updated after each inspection. When $I$ blocks are produced, $I$ sets of feature data are generated. Each set has five features, so the data set ($DS$) of blocks can be expressed as an $I \times 5$ matrix with one row per block:

$$
DS = \begin{bmatrix}
OF_1 & OFN_1 & HIS_1 & HID_1 & SR_1 \\
OF_2 & OFN_2 & HIS_2 & HID_2 & SR_2 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
OF_I & OFN_I & HIS_I & HID_I & SR_I
\end{bmatrix}_{I \times 5} \tag{9}
$$

Then, we define five block popularity features for a node.

Definition 1 (Objective Feature of a Block (OF)): The objective feature is the evaluation of a block based on its basic characteristics. A block $b_i$ ($1 \le i \le I$) consists of a block header ($h_i$) and a block body ($d_i$), i.e. $b_i = \{h_i, d_i\}$. Many transactions ($t$) exist in a block body ($d_i = \{t_1, t_2, \ldots, t_j\}$). A transaction $t_j$ includes the initiator of the transaction ($IT_j$), the receiver of the transaction ($RT_j$) and the transaction value ($TV_j$), i.e. $t_j = \{IT_j, RT_j, TV_j\}$.

First, let the number of transactions in a block be $N_t$, and the total number of users involved in the transactions be $N_u$ ($N_u$ is the sum of $IT$ and $RT$). If the $N_t$ and $N_u$ of a block are large, the block is likely to be accessed and queried by users in the future. Then, the greater the total value ($TV$) of all transactions in a block ($TV$ is the sum of each $TV_j$), the more important the block is. Moreover, the location of the block also affects the objective popularity feature of a block. In most cases, old data is accessed less often than new data.

The objective feature of a block (OF) can be expressed as follows:

$$
OF = (N_t + N_u) \times \sum_{k=1}^{j} TV_k \times a^{-(I-i)} \tag{10}
$$

where $a^{-(I-i)}$ is the coefficient of block location ($a > 1$), $I$ is the number of the latest block, and $i$ ($1 \le i \le I$) is the number of the evaluated block. Therefore, the OF of a block decreases as new blocks are added.

Definition 2 (Objective Feature of the Block Associated With the Node (OFN)): The objective feature of a block (OF) provides a unified, macroscopic evaluation of all blocks. In a sharded blockchain system, each node keeps a number of blocks whose data is commonly used by that node. Therefore, it is also necessary to objectively evaluate the block from a micro perspective, that is, in relation to the node. If the value of the objective feature of a block is not very large, but the block is strongly correlated with a node, the node still has a high probability of querying the data in this block in the future.

The objective feature of a block associated with the node (OFN) can be expressed as follows:

$$
OFN = N_n \times \sum_{k=1}^{j} TV_{nk} \times b^{-(I-i)} \tag{11}
$$

Here, when a node stores a block $b_i$, $N_n$ is the number of transactions in this block whose initiator or receiver is this node. $TV_n$ is the total value of these $N_n$ transactions. $b^{-(I-i)}$ is the coefficient of block location ($b > 1$), which plays the same role as in formula (10).

Definition 3 (Historical Popularity (HIS)): Besides the objective features of blocks, the historical record of a block being read by nodes also has a great impact on the block popularity. By analyzing historical query records, we evaluate the historical popularity of a block for a node from three aspects: the usage frequency of the block, the time since the block was used, and the response time of each query. When a block is frequently used by a node, the node should store the block locally to reduce the time cost of the next query. However, if some blocks do not have many total visits but have been visited many times recently, these blocks can also be regarded as hot blocks. Moreover, if the query response time of some blocks is long, it means that the node that saves this block is in a poor network environment. We also need to consider storing the blocks with slower query response locally. The historical popularity (HIS) can be expressed as follows:

$$
HIS = \sum_{p=1}^{P} \left( \frac{1}{TI_p \times QT_p} \right) \tag{12}
$$

Here, $P$ is the total number of times the block has been accessed, and $p$ ($1 \le p \le P$) denotes the $p$th access. $TI_p$ is the time interval between the $p$th access and the current time. $QT_p$ is the query time spent in the $p$th access.

Definition 4 (Hidden Popularity (HID)): Node A stores the block $b_i$, and $b_i$ records many transactions $\{t_1, t_2, \ldots, t_j\}$, which include the initiators of the transactions ($\{IT_1, \ldots, IT_j\}$), the receivers of the transactions ($\{RT_1, \ldots, RT_j\}$) and the transaction values ($\{TV_1, \ldots, TV_j\}$). In some cases, the objective feature value and historical popularity value of this block are not very large due to the small transaction values ($\{TV_1, \ldots, TV_j\}$). However, if the transaction data of these nodes ($\{IT_1, \ldots, IT_j\}$ and $\{RT_1, \ldots, RT_j\}$) stored in other blocks is accessed frequently in the recent past
by node A, although block $b_i$ is not a hot block currently, it has great potential to become a hot block in the future. The reason is that node A is likely to access the data related to these nodes ($\{IT_1, \ldots, IT_j\}$ and $\{RT_1, \ldots, RT_j\}$) in block $b_i$. We define the hidden popularity of a block to evaluate the potential of the block to become a hot block. The hidden popularity of a block can be evaluated by formula (13).

$$
HID = \sum_{q=1}^{Q} DT_q \times (\eta + \xi) \tag{13}
$$

Here, we first set a fixed time $T_f$, and the queries within time $T_f$ are considered to be the recent queries. If the processing power is limited or the hidden popularity needs to be obtained quickly, we can reduce the setting of $T_f$. There are $Q$ transactions in the recent time $T_f$. $DT_q$ is the distance between the time when the $q$th transaction was queried and the present. If the initiator of the $q$th ($1 \le q \le Q$) transaction also has a related transaction in block $b_i$, then $\eta = 1$, otherwise $\eta = 0$. Similarly, if the receiver of the $q$th transaction also has a related transaction in block $b_i$, then $\xi = 1$, otherwise $\xi = 0$.

Definition 5 (Storage Requirements (SR)): Most sharding-based blockchain systems have a minimum requirement for the number of copies of a block to be stored. For example, in [6], the system requires that each Consensus Unit save a complete copy of the blockchain data. Therefore, as a member of the Consensus Unit, a node cannot store only blocks with high objective feature values. The nodes in the Consensus Unit must meet the requirements of the system. As another example, ElasticChain [7] proposes the Duplicate Ratio Regulation algorithm, which analyzes the security of each block and sets the minimum number of copies stored for each block. The number of copies of a block stored by the nodes in the ElasticChain system needs to reach this minimum value, and the nodes cannot blindly save blocks with higher popularity.

Therefore, we define storage requirements (SR) to describe this feature of blocks. For example, for blocks whose number of copies does not meet the requirements of the sharding-based blockchain system, the value of the storage requirements increases, and so does the popularity of these blocks. In this way, nodes in the blockchain system will store these blocks locally and meet the system requirements. Here, we do not give a specific calculation formula for storage requirements, because the requirements of each sharding-based blockchain system are different.

C. TRAINING ELM
After the feature selection, ELM is selected as the classifier to learn the five features: the objective feature, the objective feature associated with the node, the historical popularity, the hidden popularity and the storage requirements. The blockchain system detects and records the four pieces of knowledge (number of transactions, transaction contents, historical query records and security analysis) of each block, and calculates the feature values to form a feature array. The detection and recording are performed by the verification nodes in ElasticChain [7]. Then, the feature arrays are used as inputs to train the ELM model. In the ELM-based classifier, each block is classified into the "hot" class or the "non-hot" class.

D. THE OPTIMIZED DATA STORAGE MODEL
We take ElasticChain as an example to illustrate the building process of the optimized data storage model, as shown in Algorithm 2.

Algorithm 2 The Optimized Data Storage Model
Input: parameters $a$, $b$ and $T_f$
Output: blockchain data storage scheme
1 verification nodes ($V$) visit blocks $\{b_1, b_2, \ldots, b_i\}$ at fixed intervals;
2 $V$ record the data {$N_t$, $N_u$, $TV$, $N_n$, $TV_n$, $TI$, $QT$, $DT$, and security requirement};
3 $V$ calculate the feature values ($OF$, $OFN$, $HIS$, $HID$, $SR$) of each block according to the recorded data and parameters;
4 $V$ train the ELM classifier by using feature values;
5 $V$ classify the new block (hot block or non-hot block);
6 $V$ record the classification results;
7 $V$ provide hot blocks for user nodes;
8 user nodes store hot blocks locally;

First, the values of parameters $a$, $b$ and $T_f$ are determined according to system requirements. Second, the verification nodes of the ElasticChain system visit blocks $\{b_1, b_2, \ldots, b_i\}$ at fixed intervals and record their knowledge (each $N_t$, $N_u$ and $TV$; each $N_n$ and $TV_n$; each $TI$ and $QT$; each $DT$; each security requirement). Then, the optimized data storage model calculates the five feature values of each block based on this knowledge. The five features are the objective feature of a block (OF), the objective feature of the block associated with the node (OFN), the historical popularity (HIS), the hidden popularity (HID) and the storage requirements (SR).

Next, the ELM classifier is trained on these feature values. The trained model is used to classify new blocks as they are generated. Verification nodes record the classification results (hot block or non-hot block). Finally, user nodes store the hot blocks locally for quick retrieval.

V. EVALUATION
The setup of the evaluation is introduced in Section 5.1. Then we evaluate the classification performance of the optimized data storage model in Section 5.2. Section 5.3 evaluates the query performance of the blockchain system when using the optimized data storage model.

A. EXPERIMENT SETTINGS
All experiments are conducted on a PC with a 3.2-GHz Core i5 CPU and 16 GB of memory running the Windows 7 operating system. Each node in the blockchain system is created by VMware Workstation. Each node is based on Ubuntu 16.04 and is configured with 300 MB of memory and 1 GB of hard disk space.

Synthetic data is used as the experimental data set in this paper. We refer to the characteristics of real blockchain systems and use the controlled variable method to assign the parameters of 50 blocks. The parameters include $N_t$, $N_u$, $TV$, $N_n$, $TV_n$, $TI$, $QT$ and $DT$. In addition, $a = 1$ and $b = 1$. Then, 30 nodes are used in the experiment. Each block generates a group of parameters for each node. For each group of parameters, four feature values ($OF$, $OFN$, $HIS$ and $HID$) are calculated by formulas (10), (11), (12) and (13), respectively. We ended up with 1,500 sets of data. Each group of data is artificially labeled as a hot block or a non-hot block. The storage requirements feature is not considered in the experiment, because each sharding-based blockchain system has different evaluation criteria for the storage requirements of the blocks.

A real dataset is not used in the experiment because no real dataset meets the experimental requirements. For example, supply chain and cold chain food traceability platforms (e.g. the Beijing cold chain food traceability platform [18]) are suitable application scenarios for our model, but most platforms have been established and applied only recently, and their data cannot be obtained. In some public blockchain systems (e.g. Bitcoin and Ethereum), we can get the objective feature of each block, but other features such as OFN, HIS, and HID are not available. Therefore, synthetic data is used.

B. EVALUATION OF CLASSIFICATION PERFORMANCE
In the experiment, the 1500 groups of features of the 50 blocks are divided into 3 datasets. Dataset 1 contains the feature values of 40 blocks (1200 groups of features). Dataset 2 and Dataset 3 each contain the feature values of 5 blocks (150 groups of features). 50% of the data in Dataset 2 are hot blocks for nodes, and 70% of the data in Dataset 3 are hot blocks. Dataset 1 is used as the training set and input to the optimization model based on ELM. Dataset 2 and Dataset 3 are used as the test sets of the model. The number of hidden layer nodes in the ELM classifier is 10.

Then, the SVM method is compared with the ELM method in the experiment. We replace the ELM classifier in the optimization model with an SVM classifier for training and testing. We choose a sigmoid kernel function and set the penalty parameter to 10 for the SVM-based classifier.

We measured the accuracy, precision, recall and F1-measure of the block popularity evaluation. The accuracy of a classifier can be expressed as follows:

$$
Accuracy = \frac{TP + TN}{TP + FN + FP + TN} \tag{14}
$$

where $TP$ is True Positive, $FP$ is False Positive, $TN$ is True Negative, and $FN$ is False Negative. The precision of a classifier can be expressed as follows:

$$
Precision = \frac{TP}{TP + FP} \tag{15}
$$

The recall of a classifier can be expressed as follows:

$$
Recall = \frac{TP}{TP + FN} \tag{16}
$$
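Equations (14) to (16) translate directly into code. The following Python sketch computes the three measures from confusion-matrix counts. The function name `classification_metrics`, the convention of returning 0.0 when a denominator is zero, and the example counts are assumptions made for illustration; they are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) following Eqs. (14)-(16)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts for a 150-sample test set, for illustration only.
acc, prec, rec = classification_metrics(tp=60, fp=10, tn=70, fn=10)
```

Here a "positive" corresponds to a block labeled hot, so precision measures how many blocks predicted hot are truly hot, and recall measures how many truly hot blocks the classifier finds.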
The F1-measure of a classifier can be expressed as follows:

$$
F1\text{-}measure = 2 \times \frac{Precision \cdot Recall}{Precision + Recall} \tag{17}
$$

The optimized data storage model (ELM-model), the SVM-based optimized model (SVM-model) and the ElasticChain model are tested on Dataset 2 and Dataset 3 with 4, 12, 20 and 30 nodes in the models. Experimental results on Dataset 2 are shown in Figure 4, and Figure 5 shows the results on Dataset 3.

We can draw the following conclusions from Figure 4 and Figure 5.

(1) The four evaluation indexes (accuracy, precision, recall and F1-measure) of the optimized models (ELM-based and SVM-based) are clearly higher than those of the ElasticChain model. The reason is that the optimized data storage model is adopted in the ELM-based and SVM-based models to classify the hot blocks. The optimized models give a more comprehensive evaluation of the popularity of a block. Meanwhile, the evaluation indexes of the ELM-based optimized data storage model are all the highest, and the indexes of the SVM-based model are slightly lower than those of the ELM-based model. This is because the performance of the ELM classifier is slightly better than that of the SVM.

(2) On the same dataset, the accuracy, precision, recall and F1-measure of the block popularity evaluation show an upward trend as the number of nodes increases. This is because there is less feature data for blocks when the number of nodes is small. As the number of nodes increases, the data of block popularity increases continuously. When the amount of test data is large, the classification results are more convincing.

(3) By comparing Dataset 2 and Dataset 3, we can find that the accuracy, precision, recall and F1-measure of the block popularity evaluation based on Dataset 3 are slightly higher than those based on Dataset 2. However, the difference is so small that it is negligible. Therefore, different datasets have little effect on the performance of the optimization model.

C. EVALUATION OF QUERY PERFORMANCE
Then, we test the query performance of sharding-based blockchain systems when using the optimized data storage model. We build the ElasticChain system by using Hyperledger Fabric v0.6, because Fabric v0.6 is one of the earliest widely used blockchain systems. Three ElasticChain systems are deployed in the experiment. The first system adopts the ELM-based optimized data storage model. The second system adopts the SVM-based optimized model, and the third system is the original ElasticChain system. According to the benchmark work of Blockbench [19], the maximum number of nodes is 16 when the system is running normally. Thus, 4, 8, 12 and 16 nodes are established.

We operate the chaincode called example02.go [20], and every time a transaction is completed, a 5.39 KB broadcast message is generated. The block size is set to 100. In other words, each block contains 100 completed transactions. When 2000 transactions are completed, 20 blocks will be created and stored by nodes according to the duplicate ratio regulation algorithm [7]. We randomly select 10 transactions for query and record the query time.
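The bookkeeping behind this setup (how many blocks result from a given number of transactions, and how an average query time over a random sample can be measured) can be sketched as follows. This is a hedged illustration only: `blocks_created`, `average_query_time` and the dictionary standing in for the ledger are hypothetical names, and `query_fn` is a placeholder for whatever call actually retrieves a transaction from the blockchain system.

```python
import random
import time

def blocks_created(n_transactions, block_size):
    """Number of full blocks produced when each block holds block_size transactions."""
    return n_transactions // block_size

def average_query_time(query_fn, tx_ids, sample_size=10, seed=0):
    """Time query_fn on a random sample of transaction ids; return the mean in seconds."""
    rng = random.Random(seed)
    sample = rng.sample(tx_ids, sample_size)
    elapsed = []
    for tx_id in sample:
        start = time.perf_counter()
        query_fn(tx_id)  # hypothetical stand-in for a real ledger query
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

# A plain dictionary plays the role of the ledger in this sketch.
ledger = {tx_id: f"tx-{tx_id}" for tx_id in range(2000)}
n_blocks = blocks_created(len(ledger), block_size=100)  # 2000 / 100 = 20 blocks
mean_seconds = average_query_time(ledger.get, list(ledger))
```

In a real deployment the measured time would be dominated by whether the queried block is stored locally or must be fetched from another node, which is the effect the experiment is designed to expose.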
The average time for querying a transaction based on Dataset 2 and Dataset 3 is shown in Figure 6, and we can draw the following conclusions.

(1) The average query time for a transaction when there are 16 nodes in the three systems (ELM-based optimized model, SVM-based optimized model and ElasticChain) is much larger than the query time when there are 4 or 8 nodes. The reason is that we set the minimum number of block duplicates to 8 in the duplicate ratio regulation algorithm. When the system has 8 or fewer nodes, each node maintains a complete blockchain copy, and the queries in the three models are similar to local queries. However, when there are more than eight nodes in the system, sharded blockchain data is stored in nodes. Nodes may need to visit other nodes to retrieve the target data, and the response times increase significantly.

(2) When 4 or 8 nodes exist in the three systems, the average query time of the three systems is almost the same because they all use the local query method. However, when there are 16 nodes in the systems, the average query time of the ELM-based and SVM-based optimized models is lower than that of the ElasticChain system. The reason is that the optimized models adopt the feature extraction method proposed in this paper, and the blocks with high popularity are saved in the nodes. The optimized models reduce the number of cross-node queries and increase the number of local queries. Therefore, the query time is reduced. Meanwhile, the query response time of the ELM-based optimized model is less than that of the SVM-based optimized model because of the high performance of the ELM model.

VI. CONCLUSION
In our study, we presented the optimized data storage method for sharding-based blockchain. The optimized method combines blockchain and artificial intelligence and solves the current problem that cold chain information is difficult and inefficient to trace. Five features are proposed in this paper to evaluate the popularity of a block: the objective feature of a block, the objective feature of the block associated with the node, the historical popularity, the hidden popularity and the storage requirements. Then the ELM method is used as the classifier. The experimental results on synthetic data demonstrate the accuracy and efficiency of the optimized data storage model.

In the future, more sharding-based blockchain systems will be designed. We need to analyze the different characteristics of each new system and propose suitable data storage methods for each system. At the same time, machine learning techniques are being researched by many experts and large companies, and novel and efficient machine learning methods are constantly being proposed. It will therefore be an attractive direction to replace the ELM method with other, more effective machine learning methods in sharding-based blockchain systems.

REFERENCES
[1] B. H. Yang and C. Chen, Blockchain Principle, Design and Application, 1st ed. Beijing, China: China Machine Press, 2020.
[2] E. Kokoris-Kogias, P. Jovanovic, L. Gasser, N. Gailly, E. Syta, and B. Ford, "OmniLedger: A secure, scale-out, decentralized ledger via sharding," in Proc. IEEE Symp. Secur. Privacy, San Francisco, CA, USA, May 2018, pp. 583–598.
[3] M. Zamani, M. Movahedi, and M. Raykova, "RapidChain: Scaling blockchain via full sharding," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Toronto, ON, Canada, Oct. 2018, pp. 931–948.
[4] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, and P. Saxena, "A secure sharding protocol for open blockchains," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., Vienna, Austria, Oct. 2016, pp. 17–30.
[5] H. Dang, T. T. A. Dinh, D. Loghin, E.-C. Chang, Q. Lin, and B. C. Ooi, "Towards scaling blockchain systems via sharding," in Proc. SIGMOD, Amsterdam, The Netherlands, Jun. 2019, pp. 123–140.
[6] Z. Xu, S. Han, and L. Chen, "CUB, a consensus unit-based storage scheme for blockchain system," in Proc. IEEE Int. Conf. Data Eng. (ICDE), Paris, France, Apr. 2018, pp. 173–184.
[7] D. Jia, J. Xin, Z. Wang, W. Guo, and G. Wang, "ElasticChain: Support very large blockchain by reducing data redundancy," in Proc. APWeb-WAIM, Macau, China, 2018, pp. 440–454.
[8] B. Liu, X. L. Yu, S. Chen, X. Xu, and L. Zhu, "Blockchain based data integrity service framework for IoT data," in Proc. IEEE Int. Conf. Web Services (ICWS), Honolulu, HI, USA, Jun. 2017, pp. 468–475.
[9] L. Li, J. Liu, L. Cheng, S. Qiu, W. Wang, X. Zhang, and Z. Zhang, "CreditCoin: A privacy-preserving blockchain-based incentive announcement network for communications of smart vehicles," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 7, pp. 2204–2220, Jul. 2018.
[10] O. Novo, "Blockchain meets IoT: An architecture for scalable access management in IoT," IEEE Internet Things J., vol. 5, no. 2, pp. 1184–1195, Apr. 2018.
[11] C. Li, C. Deng, S. Zhou, B. Zhao, and G.-B. Huang, “Conditional random mapping for effective ELM feature representation,” Cogn. Comput., vol. 10, no. 5, pp. 827–847, Oct. 2018.
[12] D. Cui, G.-B. Huang, and T. Liu, “ELM based smile detection using distance vector,” Pattern Recognit., vol. 79, pp. 356–369, Jul. 2018.
[13] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomputing, vol. 70, nos. 1–3, pp. 489–501, Dec. 2006.
[14] G.-B. Huang and C. K. Siew, “Extreme learning machine: RBF network case,” in Proc. Int. Conf. Control, Automat., Robot. Vis. (ICARCV), Kunming, China, 2004, pp. 1029–1036.
[15] Extreme Learning Machine. Accessed: Jan. 23, 2021. [Online]. Available: https://personal.ntu.edu.sg/egbhuang/
[16] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, “A fast and accurate online sequential learning algorithm for feedforward networks,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1411–1423, Nov. 2006.
[17] J. Tang, C. Deng, and G.-B. Huang, “Extreme learning machine for multilayer perceptron,” IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 4, pp. 809–821, Apr. 2016.
[18] Beijing Cold Chain Food Traceability Platform. Accessed: May 5, 2021. [Online]. Available: https://sp.scjgj.beijing.gov.cn/cctp/login
[19] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan, “BLOCKBENCH: Framework for analyzing private blockchains,” in Proc. ACM Int. Conf. Manage. Data (SIGMOD), Chicago, IL, USA, 2017, pp. 1085–1100.
[20] Beijing Cold Chain Food Traceability Platform. Accessed: May 5, 2021. [Online]. Available: https://github.com/hyperledger/fabric/tree/v0.6
[21] A. Dorri, S. S. Kanhere, and R. Jurdak, “Towards an optimized blockchain for IoT,” in Proc. 2nd Int. Conf. Internet-of-Things Design Implement. (IoTDI), Pittsburgh, PA, USA, 2017, pp. 173–178.
[22] A. Dorri, S. S. Kanhere, and R. Jurdak, “MOF-BC: A memory optimized and flexible blockchain for large scale networks,” Future Gener. Comput. Syst., vol. 92, pp. 357–373, Mar. 2019.
[23] V. Dedeoglu, R. Jurdak, A. Dorri, R. Lunardi, R. Michelin, A. F. Zorzo, and S. Kanhere, “Blockchain technologies for IoT,” in Advanced Applications of Blockchain Technology. Singapore: Springer, 2020, pp. 55–89.

JUNCHANG XIN received the B.Sc., M.Sc., and Ph.D. degrees in computer science and technology from Northeastern University, China, in 2002, 2005, and 2008, respectively. He visited the National University of Singapore as a Postdoctoral Visitor, from 2010 to 2011. He is currently an Associate Professor with the School of Computer Science and Engineering, Northeastern University. He has published more than 60 research articles. His research interests include big data, uncertain data, bioinformatics, and blockchain database. He has served as PI or Co-PI for more than ten national research grants from NSFC, the 863 Program, the Project 908 under the State Oceanic Administration, and so on.

ZHIQIONG WANG received the M.Sc. degree in computer applications technology and the Ph.D. degree in computer software and theory from Northeastern University, China, in 2008 and 2014, respectively. She visited the National University of Singapore, in 2010, and The Chinese University of Hong Kong, in 2013, as an Academic Visitor. She is currently an Associate Professor with the College of Medicine and Biological Information Engineering, Northeastern University. She has published more than 30 articles. Her current research interests include biomedical, biological data processing, cloud computing, and machine learning. She has served as PI or Co-PI for more than ten national research grants from NSFC, the Natural Science Foundation of Liaoning Province, and so on.