research-article
Open access

Securing the Ethereum from Smart Ponzi Schemes: Identification Using Static Features

Published: 22 July 2023

Abstract

Malware detection approaches have been extensively studied for traditional software systems. However, the development of blockchain technology has given rise to a new type of software system: decentralized applications. Among these, applications composed of smart contracts that implement Ponzi scheme logic (called smart Ponzi schemes) have caused irreversible losses and hindered the development of blockchain technology. These smart contracts generally have a short life but involve a large amount of money. While identifying these Ponzi schemes before they cause financial loss is significantly important, existing methods suffer from three main deficiencies, i.e., insufficient datasets, reliance on transaction records, and low accuracy. In this study, we first build a larger dataset. Then, a large number of features are extracted from multiple views, including bytecode, semantics, and developers. These features are independent of transaction records. Furthermore, we leverage machine learning methods to build our identification model, i.e., the Multi-view Cascade Ensemble model (MulCas). The experimental results show that MulCas achieves higher performance and robustness within the scope of our dataset. Most importantly, the proposed method can identify smart Ponzi schemes at creation time.

1 Introduction

1.1 Blockchain and Decentralized Application

In recent years, blockchain technology has received extensive attention from the industry and academia [28, 37]. Simply, a blockchain is a continuously growing chain of blocks (i.e., a ledger) maintained in a distributed network (i.e., peer-to-peer network), where each peer contains a complete copy of the chain [43]. Each block in the ledger contains a certain number of transactions and a corresponding timestamp, and is linked to its previous block by including a hash of it. This decentralized maintenance and hash connection make the blockchain almost immutable because modifying any one block requires simultaneously generating new hash values for all subsequent blocks and getting the approval of more than 50% of the peers.
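The hash-linking described above can be sketched in a few lines of Python. This is an illustrative toy only; real blockchains use different block encodings, Merkle trees, and consensus rules:

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(transactions, prev_hash):
    """A block stores its transactions and a link to its predecessor."""
    return {"transactions": transactions, "prev_hash": prev_hash}

# Build a three-block chain: each block embeds the previous block's hash.
genesis = make_block(["tx0"], prev_hash="0" * 64)
b1 = make_block(["tx1", "tx2"], prev_hash=block_hash(genesis))
b2 = make_block(["tx3"], prev_hash=block_hash(b1))

# Tampering with an earlier block invalidates every later link, which is
# why rewriting history requires regenerating all subsequent hashes.
genesis["transactions"] = ["forged"]
print(b1["prev_hash"] == block_hash(genesis))  # False
```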
Due to its immutable nature, blockchain technology has become the cornerstone of many new forms of applications. Bitcoin and the cryptocurrencies that followed it are an important class of applications built on blockchain. As research and discussion on blockchain technology have deepened, various blockchain-empowered applications have been emerging. Due to the decentralized nature of blockchain, these new types of applications are called decentralized applications (DApps) [10, 49]. Composed of smart contracts [53, 59], which are executable programs written into the blocks, DApps have many properties different from traditional software (i.e., centralized or distributed software systems [49]), such as open-source licensing, internal cryptocurrency support, decentralized consensus, and no central point of failure [10]. According to the statistics of dapp.review, about 5,000 DApps are currently running on blockchain platforms, with applications ranging from gambling and lotteries to social and financial fields, involving 12,750 smart contracts and more than 250,000 active users. These facts show that decentralized applications have become a new kind of software system that cannot be ignored.

1.2 Smart Ponzi Schemes and Identification

Running on an immutable blockchain, DApps provide users with a sense of trustworthiness. Yet this feature has also been exploited by criminals to create a new type of scam application. These applications create no value themselves; their operators rely mainly on continuously attracting new users in order to obtain fees and other income. Early players may reap some benefits. However, this mechanism of wealth redistribution inevitably causes irreversible losses to most users and hinders the development of the blockchain ecosystem. These applications are called smart Ponzi schemes [2, 17].
Ponzi schemes are a classic type of scam whose core mechanism is to use the investments of new investors to compensate earlier ones. Operators keep the scam alive by continuously attracting new investors (and charging fees) through promises of high profits. A Ponzi scheme eventually collapses because it is hard to keep attracting new investors. Built on blockchain and smart contracts, smart Ponzi schemes exhibit many new characteristics. On the one hand, the proceeds of the fraud are some kind of cryptocurrency, and the anonymous and immutable nature of the blockchain makes many investors' losses irreparable. On the other hand, a large number of ordinary investors are vulnerable to fraud under the veil of this emerging technology. For example, Fomo3D,1 a game widely regarded as a Ponzi scheme, soon surpassed Cryptokitties2 and became one of the most popular games on the Ethereum platform at the time. Reportedly, by the end of the first round of the game, the final winner was awarded more than $3 million, while most ordinary users lost their investments [45].
The existence of smart Ponzi schemes causes the loss of cryptocurrency (and money) for a large number of participants, which is not conducive to the development of the blockchain ecosystem. Thus, just as it is extremely important to detect malicious apps in Android markets [65], smart Ponzi scheme detection is an important measure for maintaining the DApp ecosystem. Note that "DApp" is more of a user-oriented concept. A DApp is commonly composed of basic smart contracts that run the application and a web client through which the user interacts. Since smart contracts are the building blocks of DApps, the identification of smart Ponzi schemes is mainly aimed at individual smart contracts that implement the logic of Ponzi schemes. Therefore, when Ponzi logic is detected in a specific smart contract, we infer only that this specific part of the DApp adopts the Ponzi scheme. The detection method can provide a warning message to participants if and only if their transactions interact with this specific address. In this way, it can protect users from becoming involved in Ponzi schemes while minimizing the impact on the other benign functionalities of the DApp.
While many participants in the previous example (i.e., Fomo3D) may have been aware of the potential risks, many other smart Ponzi schemes lure ordinary investors in the guise of high-profit investment plans (see [17] for an example). In addition, some smart Ponzi schemes do not provide source code (i.e., the hidden smart Ponzi schemes in [17]); in such cases, even professionals cannot judge whether a contract is a smart Ponzi scheme. Considering the rapid development of blockchain technology, the lack of professional knowledge among most users involved, and the relatively weak supervision, studying identification methods for smart Ponzi schemes is urgent.

1.3 Current Methods and Limitations

Fortunately, there have been studies on the problem of smart Ponzi schemes. Bartoletti et al. discuss the classification of smart Ponzi schemes and their influence [1, 2]. They collected samples of smart Ponzi schemes in two ways: (1) reading source code to collect some samples; (2) obtaining a small number of hidden smart Ponzi scheme samples by evaluating the bytecode similarity of contracts. Furthermore, Chen et al. formulated the identification of smart Ponzi schemes as a classification problem and achieved automatic recognition by applying machine learning to extracted bytecode and account behavior features [17, 18]. For a decentralized ecosystem, automatic identification of smart Ponzi schemes makes more sense. However, the current approaches have four main limitations:
Insufficient samples: To build an automatic identification model, sufficient samples are key. In previous studies (i.e., [2, 17, 18]), the sample size of smart Ponzi schemes is less than 200, and the number of non-Ponzi samples is also small. This is a complete mismatch with the current number of smart contracts. (Currently, there are millions of smart contracts on the Ethereum platform, including more than 100,000 open-source contracts.)
Reliance on transaction records: Features extracted from the transaction records of contracts have made the main contribution to the accuracy of existing methods. However, when investigating the Ponzi schemes on Ethereum, we noticed that most of our collected Ponzi schemes have a short lifetime: the median lifetime is only about 2.5 days for contracts with non-zero transactions. This short lifetime is a weakness of transaction-based methods, because the time needed to collect sufficient transactions is long enough for most Ponzi contracts to complete their lifecycle (in this paper, we use the terms "Ponzi contract" and "smart Ponzi scheme" interchangeably).
Flawed model evaluation method: In studies [17] and [18], the model is trained and tested mainly on randomly divided sample sets. However, as the technology for developing smart contracts iterates over time [66], later smart Ponzi schemes are often more complex in their technical approach and logic. Randomly dividing the training set may allow information from newer contracts to be used when training a model that predicts earlier contracts, thus yielding an inflated estimate of model performance. When such models are applied to future judgments, they may perform worse than expected.
Low accuracy: For machine learning based detectors, the single feature construction method and simple models used in [17, 18] are insufficient to distinguish smart Ponzi schemes. At the same time, the flawed evaluation method exaggerates the real effectiveness of the model. In addition, smart Ponzi schemes are always a minority among the large number of applications, but the sample imbalance problem was not considered in previous studies. On the other hand, detectors based on symbolic execution [15] rely on expert rules to recognize Ponzi schemes. However, with the explosive growth in the number of smart contracts, it is hard for experts to examine emerging Ponzi schemes and keep updating detection rules against evolving attack strategies. The scalability of detailed rules is limited by the proliferation of smart contracts, while general rules may lead to high false positive and false negative rates in practice [31].

1.4 Our Methods and Contributions

To tackle these challenges, we collected more samples and propose a Multi-view Cascade Ensemble method named MulCas for automatic smart Ponzi scheme identification with the following salient features:
High accuracy: To achieve high accuracy of smart Ponzi scheme detection, we extract multiple layers of rich features from the bytecode of the contract, including bytecode features and semantic features. Similar to [17, 18], we do not use any source code information from the contract, only the bytecode deployed on the blockchain. In addition, we extract developer features by parsing the contract creation records on the blockchain. Based on these features, we obtain a better detection effect by using the proposed multi-view cascade ensemble method.
Identification at creation time: Our model uses only the information that must be provided when the contract is created (i.e., bytecode and a deployment transaction record), rather than relying on the interaction information after the contract is created. Thus, the model can make a decision on whether it is a smart Ponzi scheme at its creation time.
Better robustness: By combining the discriminant results from different perspectives, our model has better robustness. Compared with the baseline model, on the one hand, the effectiveness of our model declines less as class imbalance increases; on the other hand, its prediction performance is relatively stable over time.
Figure 1 shows the framework of our study, which consists of four steps. First, after collecting and verifying more samples, we convert the bytecode of each contract into opcodes, which are an important source of subsequent features. In addition, by parsing the blockchain data, we extract all the contract creation transactions to form the creation graph, which is another source of our features. Second, from these two sources, we extract three types of features, namely bytecode features, semantic features, and developer features. Next, we train identification models (i.e., viewers) based on the different features. Finally, we present the Multi-view Cascade Ensemble method. The final result of the model includes two parts. The first part consists of the samples classified with sufficient confidence into one of the two classes (i.e., smart Ponzi scheme or non-Ponzi contract) by cascading the different viewers. The other part is the voting result of all viewers on the remaining, unconfident samples.
Fig. 1. The framework of our approach.
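The cascade-then-vote decision rule described above can be sketched as follows. This is a simplified sketch, not the paper's exact algorithm: the confidence thresholds `hi` and `lo` are hypothetical values, and each viewer is modeled as a function returning the probability that a contract is a Ponzi scheme:

```python
from statistics import mode

def mulcas_predict(sample, viewers, hi=0.9, lo=0.1):
    """Cascade the viewers; fall back to voting when none is confident.
    `hi`/`lo` are illustrative thresholds, not the paper's settings."""
    probs = []
    for viewer in viewers:
        p = viewer(sample)
        if p >= hi:            # confident Ponzi: stop cascading
            return 1
        if p <= lo:            # confident non-Ponzi: stop cascading
            return 0
        probs.append(p)
    # No viewer was confident enough: majority vote over all viewers.
    votes = [1 if p >= 0.5 else 0 for p in probs]
    return mode(votes)

# Toy stand-ins for the bytecode, semantic, and developer viewers.
viewers = [lambda s: 0.6, lambda s: 0.55, lambda s: 0.7]
print(mulcas_predict("0xabc...", viewers))  # all unconfident -> vote -> 1
```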
By applying MulCas to the constructed dataset, we find that it exhibits an impressive performance of smart Ponzi scheme detection. In summary, our major contributions include:
By reading source code (similar to [2, 17]), we collect more samples of smart Ponzi schemes and construct a whole new dataset, which will promote the study of smart Ponzi scheme identification. The dataset and associated analysis code are available at http://xblock.pro/#/dataset/PonziContractDataset.
We extract rich features from multiple views. These features do not depend on the runtime behavior of the smart contract, enabling identification at the moment of creation. Among these features, the extracted developer features have a significant recognition effect.
We propose and develop MulCas, a model for smart Ponzi scheme detection by using supervised learning methods and the extracted features.
We conduct extensive experiments to evaluate MulCas on the new dataset. The results show that, as compared with the baseline results in [17], MulCas has greatly improved the recall and F1 values. In addition, MulCas performs better in terms of robustness.
The rest of this paper is organized as follows. After providing some background in Section 2, we introduce the dataset construction method and problem definition in Section 3. Section 4 details the proposed method and Section 5 reports the experimental results. After introducing the related work in Section 7, we conclude the paper in Section 8.

2 Background

Similar to [2, 17, 18], our study is also based on Ethereum. Below is a brief introduction to Ethereum and smart contracts.
Ethereum. Ethereum is the second-largest cryptocurrency platform (in terms of market capitalization) based on blockchain technology. Compared with Bitcoin, Ethereum introduced the Ethereum Virtual Machine (EVM) to support smart contracts [53]. EVM is a stack-based virtual machine that can execute scripts composed of instructions in its own bytecode instruction set [7].
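To illustrate what "stack-based" means, the toy interpreter below executes a tiny subset of EVM-style instructions. It is purely illustrative: gas metering, memory, storage, and most of the instruction set are omitted:

```python
def run(opcodes):
    """Execute a minimal subset of EVM-style opcodes on a stack."""
    stack = []
    for op in opcodes:
        if op[0] == "PUSH1":
            stack.append(op[1])             # push a 1-byte immediate
        elif op[0] == "ADD":
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % 2**256)  # EVM words wrap at 256 bits
        elif op[0] == "POP":
            stack.pop()
    return stack

# PUSH1 0x02, PUSH1 0x03, ADD leaves 5 on the stack.
print(run([("PUSH1", 2), ("PUSH1", 3), ("ADD",)]))  # [5]
```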
Smart contract. A smart contract is essentially a piece of executable code deployed on Ethereum. Generally, the creation of a smart contract takes several steps: (1) write smart contracts in high-level languages such as Solidity;3 (2) compile source code into bytecode; (3) release the bytecode to Ethereum by sending a transaction. Once a smart contract is deployed to the blockchain, the corresponding bytecode and the creation transaction are permanently stored on the blockchain.
Account. There are two types of accounts on Ethereum: Externally Owned Accounts (EOAs) and smart contract accounts. The EOAs are controlled by private keys and smart contract accounts contain the associated bytecode.
Transaction. Technically, a transaction on Ethereum is a cryptographically signed instruction. There are two types of transactions on Ethereum [59]: (1) those resulting in message calls; (2) those resulting in the creation of new smart contract accounts. The former can be used to transfer Ether or call specific functions of a smart contract; the latter can be used to deploy a smart contract.
Gas. Gas is the name of the execution fee that accounts need to pay for running transactions on Ethereum. To deploy a smart contract, the creator pays a deployment fee related to the length of the contract and the storage space it occupies. To call a smart contract, the caller pays gas related to the number of opcodes executed and the storage space modified by the call.
EVM bytecode and opcode. Once a smart contract is created, the EVM bytecode becomes the only code information of the contract stored on blockchain. EVM bytecode consists of three EVM-code fragments: initialization code, contract code, and auxiliary data.
Initialization code: the EVM-code fragment for the account initialization procedure. This part of EVM bytecode is executed only once when the contract is being created.
Contract code: the main fragment of code which is executed whenever the contract account receives a message call.
Auxiliary data: an optional fragment that plays the role of a cryptographic fingerprint of the source code and is never executed.
However, the binary-form bytecode is obscure for developers to analyze. To interpret the binary values, Ethereum provides an instruction set that maps each bytecode instruction to a mnemonic form called an "opcode" [59]. Figure 2 illustrates an example of an opcode file converted from the bytecode of a smart contract.
Fig. 2. Converted opcode file fragment of a smart contract.

3 Dataset Building AND Problem Definition

3.1 Ponzi Contract: A Case

One of the main contributions of our work is to manually check source code to determine whether a smart contract deploys Ponzi scheme logic. Recall that the key to Ponzi scheme logic is to use the investments of new investors to compensate earlier investors. Hence, the source code of Ponzi contracts must reflect this logic. In practice, Ponzi scheme logic can be implemented through various approaches, such as the array-based and tree-based schemes introduced in [1].
The following example illustrates how we recognize Ponzi contracts, in other words, how we determine from source code whether a smart contract is a Ponzi contract. The example contract, named Daily12, is a verified Ponzi smart contract with source code available on etherscan.io (one of the most famous Ethereum explorers).
We introduce some interesting characteristics of Daily12. As shown in Figure 3, Daily12 was swift: the lifecycle of the contract was only 10 days. The balance of Daily12 rapidly increased to about 146.59 ETH within 3 days but dropped to 0 in the following 7 days. We then downloaded all the transactions of the contract to gain insight into this Ponzi contract. According to our statistics, 171.04 ETH was involved in Daily12 (now about $219,782.98), a considerable amount of money. These characteristics underscore our motivation to identify Ponzi contracts at creation time. To the best of our knowledge, the key component of existing methods is the transaction record of a contract. However, the time needed to collect sufficient records may be long enough for a swift Ponzi contract like Daily12 to complete its lifecycle. Therefore, a detection result based on transactions is not as meaningful as it seems.
Fig. 3. Balance of Daily12.
The contract has made a seductive claim to attract participants. It promotes itself by claiming that the participants would gain 12% of their investments every day and they could withdraw the profits at any time:
Propaganda
Easy investment contract
- GAIN 12% PER 24 HOURS (every 5900 blocks)
- NO COMMISSION on your investment (every ether stays on contract’s balance)
- NO FEES are collected by the owner, in fact, there is no owner at all (just look at the code)
How to use:
1. Send any amount of ether to make an investment
2a. Claim your profit by sending 0 ether transaction (every day, every week, i don’t care unless you’re spending too much on GAS)
OR
2b. Send more ether to reinvest AND get your profit at the same time
However, the propaganda does not reflect the nature of Daily12. The contract made its source code available on etherscan.io to win the trust of participants. In this light, we reveal how the Ponzi logic is implemented in the Daily12 source code shown in Figure 4.
Fig. 4. Source code of Daily12.
The source code of Daily12 is brief and concise. Like most traditional Ponzi schemes, the contract records the investments of participants (line 2). The mapping variable in line 3 records the time at which each participant last retrieved profits. Next, the function in lines 4–11 reflects the critical logic of the Ponzi contract. Once the contract receives a transaction, it computes 12% of the sender's investment scaled by the time interval (line 6) and transfers the profit back to the sender (line 7), where the so-called profit comes from the investments of the participants.
It can be seen that the payback window is about 8.33 days, since 100%/12% ≈ 8.33. A participant was able to earn the investment back after 8.33 days, as long as the contract had enough balance. If there was still a remaining balance, i.e., there were later investments, an investor could earn from those investments.
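The payback arithmetic can be made concrete with a short sketch. It mirrors only the 12%-per-day rule described above, not the exact Solidity code of Daily12:

```python
DAILY_RATE = 0.12  # "GAIN 12% PER 24 HOURS"

def payout(invested_eth, days_elapsed):
    """Cumulative profit paid out under the 12%-per-day rule (simplified)."""
    return invested_eth * DAILY_RATE * days_elapsed

# A participant breaks even once cumulative payouts equal the investment.
print(round(1 / DAILY_RATE, 2))    # 8.33 -- the payback window in days

# After 10 days, a 1 ETH investor has received 1.2 ETH in total; the extra
# 0.2 ETH can only come from the deposits of later participants.
print(round(payout(1.0, 10), 2))   # 1.2
```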
Winning or losing in the Ponzi contract depends on whether there are later investors; the key is to continuously attract new ones. Daily12 adopted the "12%" trick. In the first 8.33 days, Daily12 uses 12% of a participant's investment per day to pay him/her back, creating an illusion that the payback is stable and continuous. Due to this seemingly long life cycle, the contract even received about 0.11 ETH on its last day. However, this means that the contract had to use the investments of later participants to pay profits after exhausting the investments of earlier ones.
This mechanism leads to the fact that only the earlier participants could win in the Ponzi contract. Figure 5 shows the ether flow of the smart contract. The ether flow graph is introduced to visualize the transactions involved in a smart contract. Three types of transactions are encoded in the graph: investment, payback, and profit, denoted by blue circles, green squares, and orange triangles, respectively. Only those who earned from the contract could receive profit transactions (i.e., orange triangles in the graph). The amount of ether involved in a transaction is reflected by the size of its marker. The x-axis represents the timeline, while the y-axis represents individual participants. There are also two lines in the figure. The solid line is the regression line of investments and the dotted line is the regression line of the first profit transaction of each investor. Several insights come from the two lines: (1) The dotted line roughly splits the payback area (green) and the profit area (orange). Investors can only profit in the orange area below the dotted line. A number of investors have no transactions below the dotted line, meaning that they are victims of the Ponzi contract. (2) The two lines are parallel, reflecting the payback window of 8.33 days, and the left margin of the orange area composes another parallel line of payback. Notably, most of the participants who earned had made their first investment in the early stage of the contract. In fact, only investments before block height 6547454 (the contract lifetime minus the payback window size) had a chance to earn from the contract. Unfortunately, some early investors still lost their money: they reinvested in the contract several times after their first investment. Apparently, they did not understand the degenerating nature behind the seemingly high and stable profit.
Fig. 5. The ether flow graph of Daily12.
The Daily12 example shows that a Ponzi contract can be swift yet involve a considerable amount of money. Detection methods based on transaction records have weak constraints on such contracts, since the contracts may have completed their life cycle before sufficient transaction records are collected. Therefore, it is important to develop a detection method independent of transaction records.

3.2 Dataset Building

As is presented in Figure 6, the dataset is built in the following steps:
Fig. 6. Dataset building procedure.
Crawl the contracts with accessible source code from etherscan.io. Smart contracts are not required to provide source code on Ethereum. There are millions of smart contracts deployed on Ethereum, but only a small fraction of them have verified source code on etherscan.io. The study [44] found that only 0.05% of smart contracts are the target of over 80% of the transactions sent to contracts, and 73.1% of those contracts have available source code. Therefore, these open-source contracts make a major contribution to transactions on Ethereum.
Manually check the source code. Similar to the source code inspection in Section 3.1, we manually read the source code of these smart contracts and check whether each one reflects Ponzi scheme logic. Once we find code that uses the investments of new investors to compensate earlier ones, we label the contract as a Ponzi scheme. We then cross-check our results to make them more precise.
Extract the contract bytecode and developer information from the blockchain. We use OpenEthereum4 to download the bytecode and developer information. The auxiliary bytecode fragment, introduced in Section 2, is discarded since it will never be executed by the EVM.
We make several notes for our dataset collecting procedure:
While we collect our dataset from open-source smart contracts and have learned that the majority of transactions come from or target these contracts, we develop our Ponzi contract detector at the bytecode level. This design stems from the consideration that a considerable number of latent Ponzi schemes run on Ethereum (as found in the studies of Bartoletti et al. [2] and Chen et al. [17]); a bytecode-level detector can extend its application scenarios to all smart contracts deployed on Ethereum. Therefore, a warning can be provided for users before they interact with bytecode-only smart contracts, or for e-wallets when adversaries attempt to deploy Ponzi contracts through them. We discuss the potential threats to validity introduced by the biased distribution of our dataset compared with the entire smart contract population on Ethereum in Section 6.1.
To ensure the validity of our dataset, a cross-check procedure is adopted when we manually classify the source code of smart contracts. Seven researchers were involved in this process. In the first step, we employed five researchers to label the smart contracts. Generally speaking, recognizing Ponzi contracts from source code is relatively easy: we only need to recognize the ether flow relationship and decide whether it falls into Ponzi logic. Each smart contract is read by two researchers, and a label is considered reliable only if both researchers assign the same label. If the two labels disagree, the smart contract is delivered to a more experienced researcher for discussion. If no agreement is reached, the smart contract is discarded. In the second step, one researcher randomly challenges the labeled smart contracts to strengthen the reliability of the dataset.
After collecting and checking the smart contracts with open source code, we obtain 6,498 smart contracts, among which 314 are smart Ponzi schemes (i.e., Ponzi contracts). These smart contracts range from block height 0 to height 7,500,000. Statistics of the lifetime of these Ponzi contracts are shown in Table 1. It is not surprising that Ponzi contracts usually have a short life, since they do not create any value. As explained in Section 3.1, this phenomenon supports our motivation to identify Ponzi contracts at creation time. Since Ponzi contracts usually do not last long, transaction-based methods cannot alleviate the damage they cause on Ethereum.
Table 1.
lifetime (days)                         mean     median   max
without zero-transaction contracts      34.484   2.549    384.795
with zero-transaction contracts         21.688   0.027    384.795
Table 1. Statistics of the Collected Ponzi Contracts’ Lifetime

3.3 Problem Definition

Since smart contracts are the building blocks of DApps, identifying a smart Ponzi scheme is essentially identifying whether any of the contracts a DApp contains are Ponzi schemes. Let \(C = \lbrace c_1, c_2, \ldots ,c_n\rbrace\) be a set of smart contracts with labels \(L = \lbrace l_1, l_2,\ldots , l_n\rbrace\), where n is the number of contracts. Contracts labeled \(-1\) (i.e., negative) are non-Ponzi contracts, while contracts labeled 1 (i.e., positive) are Ponzi contracts. Let \(D = \lbrace d_1, d_2,\ldots , d_n\rbrace\) be the set of developer information of the corresponding contracts, where the developer information of a contract is composed of its creator address and the height of its creation block. Let \(B = \lbrace b_1, b_2, \ldots , b_n\rbrace\) be the bytecode of the corresponding contracts. The task of our work is to construct a high-performance classification model that classifies given contracts into non-Ponzi contracts or Ponzi contracts, using only the bytecode B and developer information D.

4 Methodology

Figure 1 illustrates the overall framework of our analysis, which consists of four main stages: preprocessing, feature extraction, viewer training, and viewer ensembling.

4.1 Preprocessing

With the dataset built in Section 3.2, we have the following information: (1) labeled bytecodes of contracts; (2) contract creation records. In this stage, bytecodes are converted to opcodes and the developer features are constructed.
For bytecode, the Ethereum yellow paper [59] provides a complete table for converting binary instructions to mnemonic forms, i.e., opcodes. Therefore, the conversion is done in the following steps: (1) tokenize the bytecode into tokens; (2) convert these tokens into opcodes according to the instruction set. A disassembler5 can be used to convert bytecode to operation code. The operation code is composed of two parts: opcodes and operands. For example, the mnemonic form "PUSH1" is an opcode, and the hexadecimal number "0x80" following this opcode is an operand. In our preprocessing stage, the operands are discarded because they are pure hexadecimal numbers that are highly dependent on individual smart contracts, and introducing them would increase the model complexity dramatically. Insight into these operands remains future work.
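As a sketch of steps (1) and (2), the toy disassembler below maps a small fragment of the EVM instruction table and skips PUSH operands, as our preprocessing does. The full instruction table is given in the yellow paper [59]; the opcode values shown here are real, but the table is deliberately incomplete:

```python
# A small fragment of the EVM instruction table.
OPCODES = {0x00: "STOP", 0x01: "ADD", 0x50: "POP", 0x52: "MSTORE",
           0x5B: "JUMPDEST", 0x80: "DUP1", 0x90: "SWAP1"}
OPCODES.update({0x60 + n: "PUSH%d" % (n + 1) for n in range(32)})  # PUSH1..PUSH32

def disassemble(bytecode_hex):
    """Convert hex bytecode to a list of mnemonics, discarding operands."""
    h = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    code = bytes.fromhex(h)
    ops, i = [], 0
    while i < len(code):
        byte = code[i]
        ops.append(OPCODES.get(byte, "INVALID"))
        if 0x60 <= byte <= 0x7F:       # PUSHn carries an n-byte immediate
            i += 1 + (byte - 0x5F)     # skip the opcode byte plus its operand
        else:
            i += 1
    return ops

# The standard Solidity prologue "PUSH1 0x80, PUSH1 0x40, MSTORE":
print(disassemble("0x6080604052"))  # ['PUSH1', 'PUSH1', 'MSTORE']
```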
For contract creation records, we construct a creator vector as the developer feature. The detailed construction method will be introduced in Section 4.2.

4.2 Feature Extraction

In this stage, we use the opcodes and contract developer information as inputs to extract features from different views. Note that the opcodes are sequences of tokenized mnemonic words, so they can be viewed as documents containing tokenized opcodes as words. Thus, feature extraction methods from natural language processing can be applied to the opcode data.
To investigate which features could be leveraged to build an effective model, statistical analyses are first conducted before selecting which features to use. Consequently, several feature extraction methods are applied from different views. First, term frequency counts and TF-IDF values are calculated based on the Bag-of-Words model [47]; these methods view documents as unordered sets of words. In addition, N-grams and TF-IDF over N-grams are calculated to capture the locality of words in contiguous sequences. Moreover, the word2vec [36, 40] algorithm is used to embed opcodes into vectors from a semantic view. Finally, we present a novel feature extracted from developer information, called the creator vector, and demonstrate its effectiveness.
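As one example among the methods listed, TF-IDF over opcode documents can be computed directly. The sketch below uses the common tf · log(N/df) weighting; the exact variant used in our pipeline may differ:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF for a list of tokenized documents,
    using the common tf * log(N / df) weighting."""
    n = len(docs)
    df = Counter()                 # document frequency of each word
    for doc in docs:
        df.update(set(doc))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(doc).items()}
            for doc in docs]

docs = [["PUSH1", "ADD"], ["PUSH1", "JUMPI"]]
weights = tf_idf(docs)
# "PUSH1" appears in every document, so its idf is log(2/2) = 0.
print(weights[0]["PUSH1"])  # 0.0
```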

4.2.1 Term Count and N-gram Count.

The term count feature is computed based on the Bag-of-Words model [47], which is the same feature (i.e., word frequency) used in [17, 18]. From the view of this model, a document is considered to be a bag of the words in it, discarding the order of the words but keeping the multiplicity information.
In contrast, the N-gram model is based on an \((N-1)\)-order Markov model, which supposes that the occurrence of a word depends only on the preceding \(N-1\) words. Generally speaking, an N-gram is a contiguous sequence of N items from a given document. When N is 1, 2, or 3, the N-gram is called a unigram, bigram, or trigram, respectively.
To learn whether the two features are valuable, we counted the top 10 occurrences of words, bigrams, and trigrams for both Ponzi contracts and non-Ponzi contracts. The result is shown in Table 2. It can be seen that stack operations, such as PUSH and POP, dominate the top 10 statistics for both kinds of contracts. On the other hand, Ponzi contracts seem to favor jump operations more and use fewer algorithmic opcodes such as ADD and AND. This result comes from the fact that deploying a Ponzi scheme is not difficult and does not necessarily require many algorithmic operations, whereas contracts implementing relatively complex applications rely more on algorithmic operations.
Table 2.
Ponzi Contracts    non-Ponzi Contracts
PUSH1              PUSH1
SWAP1              PUSH2
PUSH2              SWAP1
JUMPDEST           POP
DUP1               JUMPDEST
POP                DUP1
DUP2               DUP2
ADD                ADD
PUSH1 PUSH1        JUMPI
AND                PUSH2 JUMPI
Table 2. Top 10 Occurrences of Words, Bigrams, and Trigrams
Therefore, we take (1) the term count and (2) the counts of unigrams, bigrams, and trigrams as our features.
The term count of a word w in document d is calculated by counting its occurrences in that document. Without a normalization step, words in long documents would have higher counts than those in shorter documents. However, when we examine the opcode lengths of the two kinds of contracts, we find that the length of a contract is itself potentially beneficial to our classification. As shown in Figure 7, the opcode length of Ponzi contracts is generally smaller than that of non-Ponzi contracts, due to the simplicity of implementing a Ponzi scheme. In general, the term count feature under this model provides a view of quantity information while retaining the document length as potential information.
Fig. 7.
Fig. 7. The length of opcodes for Ponzi and non-Ponzi contracts.
Similarly, the N-gram feature is extracted by counting the occurrences of all three kinds of N-grams in a document. However, the number of distinct N-gram sequences grows exponentially with N. To control the complexity of our model, N-gram sequences with a document frequency less than 0.1 are discarded, where the document frequency denotes the reciprocal of the IDF before taking the logarithm. Table 3 shows the number of features before and after this selection.
Table 3.
Category                 Before Selection   After Selection
unigram                  141                77
unigram+bigram           7,211              767
unigram+bigram+trigram   52,532             2,292
Table 3. Number of Features Before and After Selection
Compared with feature extraction based on the Bag-of-Words model, the N-gram model introduces locality into the feature, since it views the opcode sequences as blocks of adjacent opcodes. The same consideration regarding document length in extracting the term count feature also applies to the N-gram count feature. In general, the N-gram count feature provides a view of the dataset combining quantity and locality information while retaining the document length as extra information.
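The counting and document-frequency filtering described above can be sketched as follows. This is a simplified illustration rather than our exact implementation; the function names and the toy opcode documents are ours.

```python
from collections import Counter

def ngrams(doc, n):
    """All contiguous n-grams of an opcode sequence, as space-joined strings."""
    return [" ".join(doc[i:i + n]) for i in range(len(doc) - n + 1)]

def count_features(docs, max_n=3, min_df=0.1):
    """Unigram/bigram/trigram counts, dropping items with document frequency < min_df."""
    per_doc = [Counter(g for n in range(1, max_n + 1) for g in ngrams(d, n))
               for d in docs]
    # Document frequency = fraction of documents containing each item.
    df = Counter()
    for counts in per_doc:
        df.update(counts.keys())
    vocab = sorted(g for g, c in df.items() if c / len(docs) >= min_df)
    # One count vector per document over the filtered vocabulary.
    return vocab, [[counts[g] for g in vocab] for counts in per_doc]

docs = [["PUSH1", "PUSH1", "MSTORE"], ["PUSH1", "POP"]]
vocab, X = count_features(docs)
```

Because no length normalization is applied, the raw counts implicitly retain the document-length signal discussed above.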

4.2.2 TF-IDF and TF-IDF for N-gram.

While the two kinds of smart contracts differ in their use of jump and algorithmic operations, they agree on the massive use of stack operations. All smart contracts must use stack operations because the EVM is stack-based. A similar situation exists in natural language processing: some items, such as “of”, occur dramatically often in documents but usually carry little information. To handle this problem, the TF-IDF method is introduced.
TF-IDF, short for Term Frequency-Inverse Document Frequency, is a statistical measure used to evaluate the importance of a word to a document in a collection of documents [48]. The calculation of TF-IDF is composed of two parts: (1) calculating the TF value; (2) calculating the IDF value. Similar to the term count, the Term Frequency captures the multiplicity of a word in a particular document. The difference is that term frequency introduces a normalization step to exclude the influence of document size. The IDF value describes how frequent a word is across all documents: the more frequently a word appears in all documents, the lower its IDF value. This negative correlation is introduced to reduce the weights of meaningless but frequent words, such as “of” in sentences. In our study, it may reduce the weights of operations such as “PUSH” and “POP”.
With the term frequency and inverse document frequency calculated in the above steps, the TF-IDF value is obtained by multiplying the two. The TF-IDF value evaluates the importance of a word with respect to a document and to the collection of documents. It takes quantity information into account while ignoring word order.
The extraction of TF-IDF values for N-gram sequences is similar to that for opcode sequences, except that we regard N-gram sequences as words. Combining the ideas of N-grams and TF-IDF, this feature evaluates the importance of an item by weighted frequency from a macro view while keeping locality information within the document.
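The TF-IDF computation described above can be sketched as follows. This is a minimal illustration assuming length-normalized TF and IDF = log(N/df); real implementations differ in log base and smoothing, and the function name is ours.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF matrix: TF is the length-normalized count, IDF = log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document contributes once per word
    idf = {w: math.log(n_docs / df[w]) for w in df}
    vocab = sorted(df)
    matrix = []
    for doc in docs:
        tf = Counter(doc)
        matrix.append([tf[w] / len(doc) * idf[w] for w in vocab])
    return vocab, matrix

docs = [["PUSH1", "POP", "CALLER"], ["PUSH1", "POP", "JUMPI"]]
vocab, X = tfidf(docs)
# "PUSH1" and "POP" appear in every document, so their IDF and TF-IDF are 0.
```

Note how the ubiquitous stack operations receive zero weight, which is exactly the down-weighting effect motivated above.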

4.2.3 Word2vec Embedding.

Word2vec [27] is a more sophisticated technique for producing word embeddings in the field of natural language processing. The algorithm uses a neural network model to embed words from a given corpus into low-dimensional vectors; the dimension is set to 100 by default in our study. These vectors are learned to represent the semantic information of the corresponding words, and the cosine similarity between vectors indicates the level of semantic similarity between words. Given the word2vec embedding of each word, a document is represented by the weighted sum of all its word vectors, with the IDF value as the weight of each word. In this sense, the word2vec embedding provides a semantic view of the opcodes.
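The IDF-weighted document embedding step can be sketched as follows. The toy 3-dimensional vectors and IDF values stand in for the 100-dimensional word2vec embeddings trained in the study; the function name is ours.

```python
def doc_embedding(doc, vectors, idf):
    """IDF-weighted sum of the word vectors of an opcode document."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for word in doc:
        weight = idf.get(word, 0.0)  # frequent opcodes get low or zero weight
        for i, v in enumerate(vectors[word]):
            out[i] += weight * v
    return out

# Hypothetical embeddings and IDF weights for two opcodes.
vectors = {"PUSH1": [0.1, 0.2, 0.0], "JUMPI": [0.0, 0.5, 0.3]}
idf = {"PUSH1": 0.0, "JUMPI": 1.2}
emb = doc_embedding(["PUSH1", "JUMPI"], vectors, idf)
```

Because "PUSH1" has IDF 0 in this toy example, the document vector is determined entirely by the rarer "JUMPI" embedding.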

4.2.4 Developer Feature.

A novel developer feature called “creator vector” is introduced in this paper. The main idea of using developer information is based on the following assumption:
Assumption 4.1.
The information that whether a smart contract is created by a tainted creator can be useful in the Ponzi scheme detection task.
In this work, we try to explain this assumption from two different views: human behavior and code logic.
From the view of human behavior, the idea is inspired by the discovery in phishing scam detection research that it is common for a phishing scam creator to create multiple phishing scams [14]. Therefore, we assume that someone who has created a Ponzi contract is also more likely to create further Ponzi contracts. To explore this assumption, we perform statistical analyses on our dataset; the results are shown in Figures 8 and 9.
Fig. 8.
Fig. 8. Distribution of Ponzi contract creators grouped by the number of Ponzi contracts they created.
Fig. 9.
Fig. 9. Distribution of Ponzi contracts grouped by creators.
As illustrated in Figure 8, although the majority of Ponzi contract creators have created exactly one Ponzi contract, a small proportion have created more than one. Considering the high contract deployment fee on Ethereum, we can reasonably suppose that this group of creators has profited from their previous Ponzi contracts. Looking into these creators, Figure 9 shows that the Ponzi contracts they created account for a considerable share of the total. This follows naturally, since the total number of contracts from this group is the sum over these creators of the number of contracts each created. As a result, 61% of all Ponzi contracts come from these experienced creators. Therefore, the assumption is supported by the statistical results in the scope of our dataset.
From the view of code logic, developer information has been found to correlate positively with code similarity. Huang et al. [63] and Chen et al. [19] found that smart contract developers tend to reuse their code when deploying new smart contracts with similar functionality. The code similarity between smart contracts from the same developer is significantly higher than that between independent smart contracts [63]. In this sense, Ponzi contracts from the same creator tend to have higher similarity in code logic, which can be beneficial for the Ponzi scheme detection task.
In summary, the developer information plays an auxiliary role, amplifying both the code similarity among contracts from the same developer and the natural motivation of a malicious developer. It helps the model identify Ponzi contracts created by a tainted developer, but developer information alone cannot recognize Ponzi logic.
To embed this newly-found information into our model, we propose an algorithm to construct a structured feature from the given contract information, as shown in Algorithm 1.
The construction converts the developer information into an \((n \times 1)\) boolean vector, where n refers to the number of contracts. The vector is constructed in two steps. First, Algorithm 1 collects all creators who have created a Ponzi contract (lines 8–10). Second, when a new contract arrives, the algorithm checks whether its creator is in this collection and sets the corresponding value of CreatorVector to True or False accordingly (lines 5–7). In other words, if the corresponding element of a contract in this vector is True, the contract’s creator has created a Ponzi contract before.
Once the creator vector is retrieved by the algorithm, the feature is concatenated with the opcode features presented above so as to add a view of the dataset from developer information. The effectiveness of the feature is evaluated by experiments in Section 5. It is worth noting that the creator vector is built from the collection of Ponzi contracts’ creator information, meaning that it can be expanded with upcoming labeled Ponzi contracts. In other words, the model can improve itself as new contracts are appended to the dataset.
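The two steps of Algorithm 1 can be sketched compactly as follows. The record format and function name are ours; in practice the tainted-creator set would be built from labeled training data only and expanded as new Ponzi contracts are labeled.

```python
def creator_vector(contracts):
    """contracts: list of (creator_address, is_ponzi) in chronological order.

    Step 1: collect all creators of a labeled Ponzi contract (the tainted set).
    Step 2: for each contract, mark True iff its creator is in that set.
    """
    tainted = {creator for creator, is_ponzi in contracts if is_ponzi}
    return [creator in tainted for creator, _ in contracts]

records = [("0xaaa", True), ("0xbbb", False), ("0xaaa", False)]
print(creator_vector(records))  # [True, False, True]
```

The third contract is flagged True even though it is benign, because its creator "0xaaa" previously deployed a Ponzi contract; this is exactly the auxiliary signal the feature is meant to carry.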

4.3 Viewer Training

In this stage, the input consists of five types of features, each concatenated with the developer feature and each providing a distinctive view of the dataset. This stage aims to train models based on each of the features. Here, the term “viewer” refers to a machine learning model fitted with the features from one of these views.
A number of classification models have proved useful in many fields, including linear models such as SVM [52] and ridge regression [34], and tree-based models such as the random forest classifier [38] and XGBoost [13]. The five features retrieved in the previous stage are fitted to the above models, and the performances of these models are grouped by their training features. We selected the model with the best performance on the testset for each kind of feature. In the end, five final models are kept, each of which comprehends the data from a distinctive view determined by the features fitted to it. These models include XGBoost models trained with the term count, N-gram count, and N-gram TF-IDF features, and a ridge regression model trained with the TF-IDF feature.

4.4 Ensemble the Viewers

To achieve a comprehensive view of the data, model ensemble techniques are effective for merging the models trained in the previous stage. Given the models with distinctive views of the dataset, we propose a Multi-view Cascade Ensemble algorithm to obtain a comprehensive model. The pseudocode is shown in Algorithm 2.
The ensemble is achieved in two main steps: a cascade ensemble in the training step and a vote-based ensemble in the prediction step.
Algorithm 2 takes as input a collection of viewers V, two threshold values \(\theta\) and \(\lambda\), a complexity control parameter N, the trainset, and the testset. The complexity control parameter N restricts the maximum number of iterations in the cascade ensemble step. The \(\theta\) denotes the confidence threshold of predictions: if the predicted probability is larger than \(1-\theta\) or smaller than \(\theta\), the model is considered confident in this prediction. The other parameter \(\lambda\) serves the voting mechanism, where a sample receiving more than \(\lambda\) votes is considered positive.
At the beginning of Algorithm 2, the cascade ensemble training step is performed (lines 3–10). Specifically, the first viewer is fitted on the trainset and then makes predictions on both the trainset and the testset. According to the prediction results, each set is split into a confidently-predicted subset, where the predicted probability is larger than \(1-\theta\) or smaller than \(\theta\), and an unconfidently-predicted subset otherwise. The threshold is commonly chosen as 0.05, a widely used significance level in statistics, but it can be set to other values for specific purposes. For example, the threshold measuring confident positive predictions could be set to 0.51 so as to trade precision for recall, where precision and recall are two commonly used classification metrics. This trade-off is reasonable because missing a Ponzi contract is more harmful than misclassifying a normal contract. Next, the unconfident trainset and unconfident testset are passed to the second viewer, and a similar procedure is repeated \(N-1\) times.
In the end, the testset is split into two subsets: a confidently-predicted subset and an unconfidently-predicted subset. Predictions for the confidently-predicted subset are those made by the cascade-ensemble predictors. For the unconfidently-predicted subset, a voting mechanism is introduced to make predictions (lines 12–25).
The voting mechanism synthesizes all the viewers as voters. In this step, the viewers are trained with all data in the trainset, because the remaining unconfidently-predicted trainset is relatively small and may carry inadequate information for predicting the remaining testset.
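The prediction-side logic of the two ensemble steps can be sketched as follows. This is a condensed illustration only: the training-side cascade, which refits later viewers on the unconfidently-predicted subsets, is omitted, and the function assumes each viewer has already produced a Ponzi probability per test sample. The interface and names are ours.

```python
def cascade_predict(probas_test, theta=0.05, lam=0):
    """probas_test[v][i]: probability that sample i is Ponzi under viewer v."""
    n = len(probas_test[0])
    preds = [None] * n
    remaining = set(range(n))
    # Cascade step: each viewer in turn claims the samples it is
    # confident on (probability >= 1 - theta or <= theta).
    for v_probs in probas_test:
        for i in list(remaining):
            p = v_probs[i]
            if p >= 1 - theta or p <= theta:
                preds[i] = p >= 0.5
                remaining.remove(i)
    # Voting step: a remaining sample is positive if it receives
    # more than `lam` positive votes across all viewers.
    for i in remaining:
        votes = sum(v[i] >= 0.5 for v in probas_test)
        preds[i] = votes > lam
    return preds

probas = [[0.97, 0.40, 0.60], [0.02, 0.30, 0.45]]
print(cascade_predict(probas))  # [True, False, True]
```

With \(\lambda = 0\), a single positive vote suffices, which implements the recall-emphasis condition: sample 2 is flagged although only one viewer leans positive.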
Through the above steps, we have built a Ponzi scheme detection model based on smart contract bytecode and developer information. Since the detection procedure needs no human intervention, the model could be integrated into a local platform or client to monitor the creation of smart Ponzi schemes. The detection pipeline could be divided into two modules: data collection and detection. For data collection, the same procedure as in Section 3.2 could be leveraged, i.e., using an Ethereum client such as OpenEthereum6 to collect smart contract bytecode and extract developer information from historical transactions. Once a new smart contract is deployed, the corresponding bytecode and developer information can be passed to the detection module, where the features in Section 4.2 are automatically extracted for MulCas to classify the target smart contract. Once a smart contract is detected as a Ponzi contract, the monitor client could label the contract address and update the developer information. In this way, the client could warn users before any of their transactions become involved in a smart Ponzi scheme.

5 Evaluations

5.1 Study Setup

5.1.1 Dataset.

We evaluate the Multi-view Cascade Ensemble model on the dataset collected in Section 3.2. As described in that section, the dataset consists of 6,498 smart contracts in chronological order, among which 314 are Ponzi contracts.

5.1.2 Metrics.

The popular Recall, Precision, and F1 classification metrics are used in our work. However, in contrast to traditional supervised learning problems, which compute these metrics via cross-validation, there are chronological relationships between the contracts in our case of study.
In this sense, cross-validation is not appropriate for our dataset, because it assumes that each sample is independent, and this assumption does not hold in our case. For example, given that a large proportion of Ponzi contracts are created by a small group of creators, the coding style or deployment techniques of Ponzi schemes may vary over time. Thus, using future information to predict contracts in the past is likely to be easier than using past information to predict contracts in the future.
Therefore, we split the dataset in chronological order. By default, the first 80% of the data is used for training and the rest for testing. Moreover, to avoid contingency caused by sampling, performances under different train-test ratios are also evaluated in our experiments. The baseline models and the Multi-view Cascade Ensemble model are evaluated on these splits.
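The chronological split can be sketched in two lines; the function name is ours, and the samples are assumed to be pre-sorted by creation block height.

```python
def chrono_split(samples, train_ratio=0.8):
    """Split pre-sorted samples: the earliest train_ratio fraction is the trainset."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

train, test = chrono_split(list(range(10)))
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Unlike a random split, no contract created after the cut point ever leaks into the trainset, which matches the argument above.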

5.1.3 Parameters.

Three parameters play significant roles in our model, i.e., the complexity control parameter N, the confidence threshold \(\theta\), and the voting parameter \(\lambda\). In the evaluation, N is set to 2, so only two viewers are used in the cascade ensemble step to keep complexity low. \(\lambda\) is set to 0, so that a sample is predicted positive once any viewer makes a positive prediction. This follows the motivation discussed in Section 4.4 of trading precision for recall. For convenience, such trades are referred to as recall-emphasis conditions in this paper, meaning that we apply the condition to gain a higher recall score at an acceptable cost in precision. For \(\theta\), we use 0.05 in the first iteration of the cascade ensemble, a commonly used significance level in statistics, while the threshold measuring confident positive predictions is set to 0.51 in the second iteration, again to apply the recall-emphasis condition.

5.1.4 Research Questions.

We designed the experiments to answer the following research questions:
RQ1: How does MulCas perform in terms of sustainability? (Section 5.2.1)
RQ2: Does MulCas outperform baseline approaches? (Section 5.2.2)
RQ3: Are the newly-introduced developer features effective? (Section 5.3.1)
RQ4: How does the model perform against different data imbalance ratios? (Section 5.4.1)
RQs 1 and 2 examine the performance of MulCas, while RQ 3 investigates the features extracted from the dataset. The robustness of the model is investigated in RQ 4.

5.2 Performance of MulCas

5.2.1 RQ1: How Does MulCas Perform in Terms of Sustainability?.

While ML-based methods have been widely applied to malware detection tasks, an emerging problem known as model aging has attracted extensive attention in recent years [8, 32, 46, 64]. The performance of these detectors may significantly degrade over time due to the rapid evolution of malware: malware authors may rapidly improve their attack strategies, and a detection model built from early data may not be sustainable. Consequently, malware can intentionally or unintentionally avoid detection by earlier malware detectors. Research has found that the performance of early Android malware detectors may degrade to 75% after 2 years and below 50% after 3 years [8]. A similar problem exists in the context of smart Ponzi scheme detection. For example, early Ponzi contracts were implemented based on four Ponzi schemes [1], while a new mixed scheme has appeared in a recent study [15]. Fortunately, machine learning-based classifiers can refresh aged models through retraining. In this section, we evaluate the sustainability of MulCas by training on and predicting smart contracts in incremental time intervals.
The distribution of the Ponzi contracts we collected is highly non-uniform. For example, there are 121 Ponzi contracts whose creation block heights fall between 5,000,000 and 5,500,000, but only 9 whose creation block heights fall between 3,500,000 and 4,000,000. For the evaluation metrics used in this study, the results better reflect the performance of detection tools if each prediction interval contains a minimum number of positive samples. Therefore, we slice the dataset according to the occurrence of Ponzi contracts rather than into equal-length time intervals. Specifically, we split the dataset into 6 parts (P0 to P5) based on the creation block heights of Ponzi contracts. P0 consists of the first 50 Ponzi contracts and the benign contracts whose creation block heights are lower than that of the 50th Ponzi contract. P1 contains the 51st to 100th Ponzi contracts and the benign contracts whose creation block heights lie between those of the 51st and the 100th Ponzi contracts. P2, P3, and P4 are obtained by a similar procedure for the 101st to 150th, 151st to 200th, and 201st to 250th Ponzi contracts, ordered by their creation block heights. Finally, P5 is composed of the 251st to the last (314th) Ponzi contracts and the benign contracts whose creation block heights fall amid these Ponzi contracts.
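The occurrence-based slicing can be sketched as follows. This is a simplification (boundary benign contracts are assigned to the part whose closing Ponzi contract they precede), and the record format and function name are ours.

```python
def slice_by_ponzi(samples, per_part=50):
    """samples: (contract_id, is_ponzi) pairs sorted by creation block height.

    Close a part each time it has accumulated `per_part` Ponzi contracts;
    benign contracts fall into whichever part is open when they appear.
    """
    parts, current, ponzi_seen = [], [], 0
    for sample in samples:
        current.append(sample)
        if sample[1]:
            ponzi_seen += 1
            if ponzi_seen == per_part:
                parts.append(current)
                current, ponzi_seen = [], 0
    if current:
        parts.append(current)  # trailing part may hold fewer Ponzi contracts
    return parts
```

With `per_part=50` this yields the parts P0 to P5 described above, the last part absorbing the remaining 64 Ponzi contracts.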
In this section, we first use P0 and P1 to train the detection model and use the model to predict the smart contracts in P2. Then we retrain the model with P0, P1, and P2 (to simulate the retraining procedure in practice) and use the retrained model to predict the smart contracts in P3. Similar retraining and prediction procedures are conducted for P4 and P5. In this way, we collect the performances of the tested models over incremental time intervals. The results are shown in Figure 10. Due to space limitations, only the performances of four typical tools are shown in the figure: MulCas, XGBoost trained with the Term Count feature (TCXGB) [17], XGBoost trained with the Term Count and Developer features (TCXGB-DF), and the Ridge classifier trained with TF-IDF for N-grams and the Developer feature (TINRIDGE-DF). From the results, we can see that the performance of each detection tool tends to increase with the number of training samples before predicting P5. However, the recall scores of the tested tools drop significantly when predicting the smart contracts in P5. Given the definition of recall, this drop means that more positive samples are misclassified by the detection models. We found that the drop is mainly caused by the prevalence of a new type of Ponzi contract, derived from a new Ponzi contract demo7, after block height 5,000,000. The baseline tools perform poorly on this type of Ponzi contract, while models trained with developer features perform more stably on new-coming Ponzi contracts: of the 34 new-type Ponzi contracts, 15 are created by tainted creators. Overall, MulCas shows the best and most robust performance in predicting smart contracts over incremental time intervals.
Fig. 10.
Fig. 10. Recall and F1 scores under different prediction intervals.
For comparison with program analysis methods, we also tested SadPonzi [15] on our dataset. The comparison results are shown in Table 4. Since SadPonzi is built on symbolic execution, it can perform Ponzi contract detection without a prior learning procedure. We tested SadPonzi on all 6,498 smart contracts in the dataset, of which 4,881 were successfully analyzed. In the failure cases, the tool mostly encountered timeouts due to the path explosion problem of symbolic execution, or referred to precompiled contracts that the tool does not implement. The performance of SadPonzi on all successful cases is shown in the “all” column. The corresponding performance of MulCas is N/A because learning-based methods need a trainset to train the detection model. We compared the performance of SadPonzi and MulCas on P2, P3, P4, and P5. MulCas shows comparable performance on the early sets P2 and P3, where SadPonzi tends to have fewer false positives and MulCas tends to have fewer false negatives. However, the performance of SadPonzi drops on P4 and P5. The reason is that SadPonzi relies on four expert rules to capture the Ether transfer and redistribution behaviors of Ponzi schemes, and these rules may not adapt to newly emerging scam patterns. As discussed in the previous evaluation, a new type of Ponzi contract has recently appeared. This new type of Ponzi scheme is deployed as an ERC-20 token trading contract, where the Ponzi reward manifests as the increasing value of the Ponzi token. Therefore, the expert rules proposed by SadPonzi fail to detect this new type of Ponzi contract.
Table 4.
Method     Metric      all    P2     P3     P4     P5
SadPonzi   Precision   0.51   0.33   0.42   0.18   0.24
           Recall      0.59   1.0    0.71   0.25   0.18
           F1          0.55   0.5    0.53   0.21   0.20
MulCas     Precision   N/A    0.88   0.96   0.81   0.95
           Recall      N/A    0.38   0.32   0.94   0.67
           F1          N/A    0.53   0.48   0.87   0.79
Table 4. Comparison between MulCas and Symbolic Execution based Method SadPonzi [15]
Answer to RQ1: MulCas performs better than baseline models in terms of sustainability.

5.2.2 RQ2: Does MulCas Outperform Baseline Approaches?.

To answer this question, we take the model described in [17] and derive several other baseline models based on the extracted features. First, we use an XGBoost model trained with term count features, as presented in [17]. Further baseline models are constructed by fitting the features in Section 4.2 to linear classification models, i.e., SVM and the ridge classifier, and a tree-based classification model, i.e., XGBoost. For simplicity, we name these models “Feature-Classifier” in the following sections. For example, “NCXGB”, “TCXGB”, “W2VXGB”, “TINXGB”, and “TIXGB” are short for the XGBoost models trained with the N-gram Count, Term Count, Word2Vec embedding, TF-IDF for N-gram, and TF-IDF features, respectively. In addition, the symbolic execution method for detecting Ponzi contracts, SadPonzi [15], is also considered. Comparisons are made between these models and our Multi-view Cascade Ensemble model.
For the evaluation of the machine learning based detectors, we use the dataset splitting method described in the last section. Specifically, the trainset consists of the 1st to 250th Ponzi contracts and the benign contracts within the corresponding range of creation block heights; the testset consists of the remaining smart contracts. As a result, the trainset contains 5,990 smart contracts and the testset contains 508 smart contracts. For the symbolic execution based detector SadPonzi [15], we report its performance on all successfully analyzed smart contracts. The results are shown in Table 5, from which the following conclusions can be drawn.
Table 5.
Model      Feature            Recall   Precision   F1
Ridge      Ngram Count        0.453    0.829       0.586
           Term Count         0.406    0.839       0.547
           Word2Vec           0.000    0.000       0.000
           TF-IDF for Ngram   0.328    0.913       0.483
           TF-IDF             0.250    0.889       0.390
SVM        Ngram Count        0.375    0.923       0.533
           Term Count         0.328    0.913       0.483
           Word2Vec           0.266    0.895       0.410
           TF-IDF for Ngram   0.281    0.900       0.429
           TF-IDF             0.281    0.900       0.429
XGBoost    Ngram Count        0.219    0.875       0.350
           Term Count [17]    0.219    0.875       0.350
           Word2Vec           0.219    0.875       0.350
           TF-IDF for Ngram   0.234    0.882       0.370
           TF-IDF             0.234    0.882       0.370
SadPonzi   N/A                0.52     0.59        0.55
MulCas     hybrid             0.674    0.951       0.789
Table 5. Overall Evaluation Results on the Benchmark
MulCas achieves a considerably higher recall score than the baseline models, increasing recall by about 0.3 over the average recall of the baselines. Moreover, MulCas achieves the best F1 score of 0.789, meaning that the model performs best when considering both recall and precision.
The precision scores of all models are higher than their recall scores due to the class imbalance problem. Referring to the definitions of precision and recall, this tells us that the detection models tend to misclassify positive samples as negative rather than negative samples as positive. This phenomenon is expected given the class imbalance inherent in malware detection: in the real world, there are far more benign smart contracts than malicious Ponzi contracts. In the scope of our dataset, the class imbalance ratio is about 1:20. Machine learning models trained on such imbalanced datasets tend to classify samples of the minority class into the majority class. As a result, the precision scores are significantly higher than the recall scores.
Answer to RQ 2: MulCas achieves a significantly higher recall score while maintaining the precision score, thus improving the F1 score. It outperforms all the baseline models.

5.3 Feature Importance

5.3.1 RQ3: Are the Newly-introduced Developer Features Effective?.

The effectiveness of the developer features is evaluated with the baseline models, since they are the most basic models and do not introduce the additional variables of ensemble methods. The results are presented in Table 6.
Table 6.
Metric        TFXGB   TCXGB   W2VSVM   TINRIDGE
Without Developer Features
  Recall      0.234   0.219   0.266    0.328
  Precision   0.882   0.875   0.895    0.913
  F1          0.370   0.350   0.410    0.483
With Developer Features
  Recall      0.547   0.563   0.266    0.531
  Precision   0.946   0.947   0.895    0.971
  F1          0.693   0.706   0.410    0.596
Table 6. The Effectiveness of Developer Features
As shown in the table, all XGBoost and Ridge models achieve better F1 scores after training with the developer feature. As discussed in RQ1, among all 64 Ponzi contracts in the testset, 34 are variants of a newly proposed Ponzi token demo8. From the developer information, we found that 14 of these new-type Ponzi contracts come from tainted creators. Therefore, the developer information significantly improves the performance of the detection models on Ponzi contracts created by tainted creators, resulting in the increased recall score.
Answer to RQ 3: The developer features improve performance when classifying the new type of Ponzi contracts.

5.4 Robustness of MulCas

5.4.1 RQ4: How Does the Model Perform against Different Data Imbalance Ratios?.

Classification on imbalanced datasets has been an issue of interest in many research fields. In our case, the imbalance ratio is about 1:20 (positive sample size vs. negative sample size), indicating a highly imbalanced dataset. To illustrate how MulCas performs on datasets with different imbalance ratios, we apply random under-sampling to our dataset with ratios in {1:1, 1:2, 1:5, 1:10, 1:20}. Comparisons are made between MulCas and the same baseline models as in RQ1, i.e., TCXGB, TCXGB-DF, and TINRidge-DF. The results are presented in Figure 11. Note that the recall-emphasis condition is not applied in this experiment, because this condition mainly targets highly imbalanced datasets.
Fig. 11.
Fig. 11. Comparison between models under different imbalance ratios.
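The random under-sampling used to construct these imbalance-ratio splits can be sketched as follows; the function name, seed handling, and toy data are ours.

```python
import random

def undersample(positives, negatives, ratio, seed=42):
    """Keep all positives and at most `ratio` negatives per positive."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    kept = rng.sample(negatives, min(len(negatives), ratio * len(positives)))
    return positives + kept

pos = ["p1", "p2"]
neg = [f"n{i}" for i in range(100)]
balanced = undersample(pos, neg, ratio=1)  # 1:1 split -> 2 + 2 samples
print(len(balanced))  # 4
```

Raising `ratio` toward the natural 1:20 of our dataset recovers increasingly imbalanced splits, which is how the x-axis of Figure 11 is produced.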
As illustrated in the figure, the performances of all models tend to decrease as the dataset becomes more imbalanced. A sharp decrease can be seen in both metrics for the baseline models. Although a similar decrease appears in the performance of MulCas, its decreasing trend is much smoother. Moreover, MulCas achieves the best recall and F1 scores throughout the experiment.
Answer to RQ4: MulCas shows relatively robust performance as the dataset becomes increasingly imbalanced.

6 Threats to Validity

6.1 Dataset Distribution

Our results face common validity threats due to the biased distribution of the dataset. The dataset in Section 3.2 is built from open-source smart contracts under two main concerns: feasibility and reliability. Since Ponzi logic is a high-level semantic hidden behind the direct control flow or data flow of a contract, manually recognizing Ponzi logic from bytecode is neither feasible nor reliable. Decompilation tools [25] can mitigate this difficulty, but the errors introduced both by these tools and by recognizing Ponzi logic from the decompiled code can still add imprecision to our dataset. Thus, while our detection model works at the bytecode level so that it can analyze all smart contracts deployed on Ethereum, our evaluation is based purely on smart contracts with available source code. The distribution of open-source smart contracts may differ from that of the entire smart contract population on Ethereum, so our results are best interpreted with respect to the benchmarks we analyzed.
From a practical point of view, however, building an automated detection tool for Ponzi schemes is necessary for two reasons: (1) Blockchain users may not be sophisticated in programming and smart contracts; their interaction with some DApps may be done through a Web interface. Thus, users may not be experienced enough to tell whether a smart contract implements Ponzi logic by reading the contract code. (2) As described in the Introduction, one of the main motivations of this work is to detect latent Ponzi contracts based only on bytecode. It is unrealistic to identify these latent Ponzi contracts by reading the bytecode or opcode. Our method can warn users before they interact with bytecode-only contracts. Considering that plenty of latent Ponzi contracts are running on Ethereum (as found in the previous studies by Bartoletti et al. [1], Chen et al. [17], and SadPonzi [15]), our method can benefit Ponzi scheme detection across the whole Ethereum ecosystem. Therefore, although ML methods have weaknesses such as dataset bias, using them for the Ponzi scheme detection task has practical value.

6.2 Robustness against Adversarial Attacks

While the experiments have studied the robustness of Ponzi contract detection methods, we essentially treated adversarial attacks as an orthogonal problem. However, resilience against adversarial attacks remains a practical concern. Since our method mainly treats bytecode as natural-language tokens, adversaries may try to deceive the model through purposeful code injection. Most of our captured features (i.e., word frequency, TF-IDF, N-grams) focus on quantitative information about opcode tokens, so one possible adversarial attack is to inject dead code that distorts this quantitative information. However, junk code injection on Ethereum introduces the overhead cost of extra gas. Gas is a mechanism introduced by Ethereum to measure the consumption of computational resources on the Ethereum network. A smart contract costs gas in two phases: (1) contract deployment and (2) contract calls. Generally speaking, the longer a smart contract is, the more gas it costs [12, 26]. This economic cost may therefore hinder an effective junk code injection.
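To make the attack surface concrete, the quantitative opcode features referred to above (word frequency, TF-IDF, N-grams) can be sketched roughly as follows. This is a simplified illustration, not the exact MulCas feature pipeline: the helper names and the toy opcode sequences are ours.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Sliding-window N-grams over an opcode token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """docs: list of opcode-token lists. Returns one TF-IDF dict per doc."""
    n_docs = len(docs)
    df = Counter()                      # document frequency per token
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency
        total = len(doc)
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors

# Toy opcode sequences standing in for two disassembled contracts
c1 = "PUSH1 PUSH1 MSTORE CALLVALUE DUP1".split()
c2 = "PUSH1 MLOAD JUMPDEST STOP".split()
print(ngrams(c1, 2)[:2])   # ['PUSH1 PUSH1', 'PUSH1 MSTORE']
vectors = tf_idf([c1, c2])
# Injecting dead opcodes into c1 would shift every count above --
# exactly the quantitative information an adversary could target.
```

Since every injected opcode shifts these counts, an attacker would need enough padding to move the features across the decision boundary, and each padded byte adds to the deployment gas bill, which is the economic deterrent noted above.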
On the other hand, we note that the Ponzi scheme detection problem on Ethereum is still at an early stage: (1) most recently reported Ponzi contracts have not yet adopted any schemes to evade existing detection methods, such as those proposed in [12]; and (2) to the best of our knowledge, no platform integrates existing Ponzi contract detection tools, whereas plenty of smart contract audit platforms have adopted tools developed in the field of smart contract vulnerability detection [6, 54]. Therefore, developing a Ponzi scheme detector robust against adversarial attacks is left for future work.

6.3 Update of the Solidity Compiler Version

An additional limitation of MulCas comes from the rapid evolution of the Solidity compiler. As the compiler of a relatively young programming language, the Solidity compiler has gone through several important optimizations to fix vulnerabilities and introduce compile options such as dead code elimination. Due to this rapid change, a smart contract may be compiled into slightly different bytecode under different compiler versions. A more detailed study of the differences introduced by compiler versions can be found in the work of Huang et al. [29]. Along with compiler evolution, unpredictable contract evolution may be another challenge for our code-based detector. For example, after The DAO attack brought the reentrancy vulnerability to public attention, developers adopted various approaches to exclude reentrancy vulnerabilities from their smart contracts. Such continuous and unpredictable contract evolution may be another trigger for retraining the MulCas detector.

6.4 Detection for Inter-contract Ponzi Schemes

The proposed detection method operates on individual smart contracts. The model ignores interactions between multiple smart contracts, even when they belong to the same DApp. Therefore, the detection model may miss Ponzi schemes deployed across multiple smart contracts that collaborate toward malicious behavior. A smart contract deployed on Ethereum does not necessarily indicate which DApp it belongs to. There has been research on modeling inter-contract interactions by performing backward data-flow analysis to recover contract invocation relationships [5]. Through contract invocation recovery, multiple smart contracts could be manually incorporated into an integrated smart contract for analysis.

7 Related Work

In recent years, blockchain technology has begun to receive extensive attention from researchers. On the one hand, the identification of smart Ponzi schemes belongs to the field of general analysis for smart contracts; on the other hand, it can also be regarded as the detection of malicious software. Next, we summarize relevant research results from two aspects: smart contract analysis and malware detection.

7.1 Smart Contract Analysis

Smart contracts are a key component of blockchain 2.0 and the building blocks of DApps. Since smart contracts are essentially collections of bytecode running on Ethereum, vulnerability concerns also exist in their programming. Vulnerabilities have been extensively studied in traditional software: combinatorial testing [24] and combinations of dynamic and static features [35] are used for fault localization, and Niu et al. [42] further studied minimal failure-causing schemas with respect to multiple faults. When it comes to smart contracts, however, the decentralized nature has brought new vulnerability concerns. To ensure contract security, various vulnerability identification tools have been proposed [4, 33, 39]. Yu et al. [62] proposed a general-purpose approach to search for and repair bugs in smart contracts. Security properties have also been studied as reachability properties based on an abstraction of EVM bytecode [51]. To understand the potential impact of smart contract vulnerabilities, Sayeed et al. [50] analyzed the 7 most important attack techniques and revealed that even when the 10 most widely used vulnerability detection tools are adopted, known vulnerabilities remain. This fact makes the development of secure smart contracts from a software engineering perspective an important research direction [20, 58].

7.2 Malware Detection

There have been plenty of studies on the malware detection problem in the scenario of traditional software. Part of these studies leverage machine learning to build classification models. Cai et al. [9] used a diverse set of dynamic features for Android app classification. Xue et al. [61] deployed convolutional neural networks to analyze static features and variable n-grams as dynamic features. Other works studied the sustainability of these learning-based detectors and proposed classification systems based on a new behavioral profile for Android apps [8]. Java bytecode has also been used to detect Android malware families [11]. On the other hand, malicious scams are also popular in the blockchain ecosystem due to its lack of regulation and decentralized nature, for example, money laundering [22, 41, 55], Ponzi schemes [3, 56], and market manipulation [16, 23]; a summary of this problem can be found in [57]. To identify these scams, models based on machine learning and data mining are widely used. For Ponzi schemes in the Bitcoin ecosystem, Bartoletti et al. built a machine learning model by extracting transaction records and constructing features [3]. For smart Ponzi schemes on Ethereum, Chen et al. [17, 18] first proposed a machine-learning-based identification framework, which is also the baseline of this study. Ibba et al. [30] and Fan et al. [21] have explored the Ponzi scheme detection problem using other machine learning models such as decision trees, support vector machines, and gradient boosting algorithms. Apart from machine-learning-based methods, static program analysis methods such as symbolic execution have also been leveraged to detect Ponzi schemes on Ethereum [15]. For phishing scam identification, graph-based methods are widely used in recent studies [14, 60].

8 Conclusion

We construct a new dataset for smart Ponzi scheme detection, larger than the dataset used in [17], and extract many features from different perspectives. Based on this, we propose MulCas, a Multi-view Cascade Ensemble method for detecting smart Ponzi schemes on Ethereum at the creation moment. Our extensive evaluation shows that MulCas outperforms the state-of-the-art approaches and many other baseline methods in terms of F1 score and robustness. With no reliance on the interaction information of contracts, our model can give identification results at the creation moment. Given the short lifetime of most Ponzi contracts, identification using static features can significantly alleviate the damage caused by Ponzi contracts on Ethereum.

References

[1]
Massimo Bartoletti, Salvatore Carta, Tiziana Cimoli, and Roberto Saia. 2017. Dissecting ponzi schemes on ethereum: Identification, analysis, and impact. arXiv:1703.03779 (2017).
[2]
Massimo Bartoletti, Salvatore Carta, Tiziana Cimoli, and Roberto Saia. 2020. Dissecting ponzi schemes on ethereum: Identification, analysis, and impact. Future Generation Computer Systems 102 (2020), 259–277.
[3]
Massimo Bartoletti, Barbara Pes, and Sergio Serusi. 2018. Data mining for detecting bitcoin ponzi schemes. In 2018 Crypto Valley Conference on Blockchain Technology (CVCBT’18). IEEE, 75–84.
[4]
Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Anitha Gollamudi, Georges Gonthier, Nadim Kobeissi, Natalia Kulatova, Aseem Rastogi, Thomas Sibut-Pinote, Nikhil Swamy, et al. 2016. Formal verification of smart contracts: Short paper. In Proceedings of ACM Workshop on Programming Languages and Analysis for Security. 91–96.
[5]
Priyanka Bose, Dipanjan Das, Yanju Chen, Yu Feng, Christopher Kruegel, and Giovanni Vigna. 2022. SAILFISH: Vetting smart contract state-inconsistency bugs in seconds. In 2022 IEEE Symposium on Security and Privacy (SP’22). 161–178.
[6]
Lexi Brent, Neville Grech, Sifis Lagouvardos, Bernhard Scholz, and Yannis Smaragdakis. 2020. Ethainter: A smart contract security analyzer for composite vulnerabilities. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 454–469.
[7]
Vitalik Buterin et al. 2014. A next-generation smart contract and decentralized application platform. Ethereum White Paper (2014).
[8]
Haipeng Cai. 2020. Assessing and improving malware detection sustainability through app evolution studies. ACM Trans. Softw. Eng. Methodol. 29, 2, Article 8 (March 2020), 28 pages.
[9]
Haipeng Cai, Na Meng, Barbara Ryder, and Daphne Yao. 2018. Droidcat: Effective android malware detection and categorization via app-level profiling. IEEE Transactions on Information Forensics and Security 14, 6 (2018), 1455–1470.
[10]
Wei Cai, Zehua Wang, Jason B. Ernst, Zhen Hong, Chen Feng, and Victor C. M. Leung. 2018. Decentralized applications: The blockchain-empowered software system. IEEE Access 6 (2018), 53019–53033.
[11]
Gerardo Canfora, Fabio Martinelli, Francesco Mercaldo, Vittoria Nardone, Antonella Santone, and Corrado Aaron Visaggio. 2018. Leila: Formal tool for identifying mobile malicious behaviour. IEEE Transactions on Software Engineering 45, 12 (2018), 1230–1252.
[12]
Ting Chen, Youzheng Feng, Zihao Li, Hao Zhou, Xiapu Luo, Xiaoqi Li, Xiuzhuo Xiao, Jiachi Chen, and Xiaosong Zhang. 2020. Gaschecker: Scalable analysis for discovering gas-inefficient smart contracts. IEEE Transactions on Emerging Topics in Computing (2020).
[13]
Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. In Proceedings of ACM SIGKKD International Conference on Knowledge Discovery and Data Mining. 785–794.
[14]
Weili Chen, Xiongfeng Guo, Zhiguang Chen, Zibin Zheng, and Yutong Lu. 2020. Phishing scam detection on ethereum: Towards financial security for blockchain ecosystem. In Proceedings of International Joint Conference on Artificial Intelligence Special Track on AI in FinTech. 4506–4512.
[15]
Weimin Chen, Xinran Li, Yuting Sui, Ningyu He, Haoyu Wang, Lei Wu, and Xiapu Luo. 2021. Sadponzi: Detecting and characterizing ponzi schemes in ethereum smart contracts. Proceedings of the ACM on Measurement and Analysis of Computing Systems 5, 2 (2021), 1–30.
[16]
Weili Chen, Jun Wu, Zibin Zheng, Chuan Chen, and Yuren Zhou. 2019. Market manipulation of bitcoin: Evidence from mining the Mt. Gox transaction network. In Proceedings of IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 964–972.
[17]
Weili Chen, Zibin Zheng, Jiahui Cui, Edith Ngai, Peilin Zheng, and Yuren Zhou. 2018. Detecting ponzi schemes on ethereum: Towards healthier blockchain technology. In Proceedings of World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1409–1418.
[18]
Weili Chen, Zibin Zheng, Edith C.-H. Ngai, Peilin Zheng, and Yuren Zhou. 2019. Exploiting blockchain data to detect smart ponzi schemes on ethereum. IEEE Access 7 (2019), 37575–37586.
[19]
Xiangping Chen, Peiyong Liao, Yixin Zhang, Yuan Huang, and Zibin Zheng. 2021. Understanding code reuse in smart contracts. In 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER’21). IEEE, 470–479.
[20]
Giuseppe Destefanis, Michele Marchesi, Marco Ortu, Roberto Tonelli, Andrea Bracciali, and Robert Hierons. 2018. Smart contracts vulnerabilities: A call for blockchain software engineering?. In Proceedings of International Workshop on Blockchain Oriented Software Engineering (IWBOSE’18). IEEE, 19–25.
[21]
Shuhui Fan, Shaojing Fu, Haoran Xu, and Xiaochun Cheng. 2021. Al-SPSD: Anti-leakage smart Ponzi schemes detection in blockchain. Information Processing & Management 58, 4 (2021), 102587.
[22]
Yaya Fanusie and Tom Robinson. 2018. Bitcoin laundering: An analysis of illicit flows into digital currency services. Center on Sanctions and Illicit Finance Memorandum, January (2018).
[23]
Neil Gandal, J. T. Hamrick, Tyler Moore, and Tali Oberman. 2018. Price manipulation in the bitcoin ecosystem. Journal of Monetary Economics 95 (2018), 86–96.
[24]
Laleh Sh Ghandehari, Yu Lei, Raghu Kacker, D. Richard Rick Kuhn, David Kung, and Tao Xie. 2018. A combinatorial testing-based approach to fault localization. IEEE Transactions on Software Engineering (2018).
[25]
Neville Grech, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. 2019. Gigahorse: Thorough, declarative decompilation of smart contracts. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE’19). IEEE, 1176–1186.
[26]
Neville Grech, Michael Kong, Anton Jurisevic, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. 2018. Madmax: Surviving out-of-gas conditions in ethereum smart contracts. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1–27.
[27]
Martin Grohe. 2020. Word2vec, Node2vec, Graph2vec, X2vec: Towards a theory of vector embeddings of structured data. In Proceedings of ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 1–16.
[28]
Huawei Huang, Wei Kong, Sicong Zhou, Zibin Zheng, and Song Guo. 2021. A survey of state-of-the-art on blockchains: Theories, modelings, and tools. ACM Comput. Surv. 54, 2, Article 44 (March 2021), 42 pages.
[29]
Jianjun Huang, Songming Han, Wei You, Wenchang Shi, Bin Liang, Jingzheng Wu, and Yanjun Wu. 2021. Hunting vulnerable smart contracts via graph embedding based bytecode matching. IEEE Transactions on Information Forensics and Security 16 (2021), 2144–2156.
[30]
Giacomo Ibba, Giuseppe Antonio Pierro, and Marco Di Francesco. 2021. Evaluating machine-learning techniques for detecting smart ponzi schemes. In 2021 IEEE/ACM 4th International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB’21). IEEE, 34–40.
[31]
Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge. 2013. Why don’t software developers use static analysis tools to find bugs?. In 2013 35th International Conference on Software Engineering (ICSE’13). IEEE, 672–681.
[32]
Roberto Jordaney, Kumar Sharad, Santanu K. Dash, Zhi Wang, Davide Papini, Ilia Nouretdinov, and Lorenzo Cavallaro. 2017. Transcend: Detecting concept drift in malware classification models. In 26th USENIX Security Symposium (USENIX Security 17). USENIX Association, Vancouver, BC, 625–642. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/jordaney.
[33]
Sukrit Kalra, Seep Goel, Mohan Dhawan, and Subodh Sharma. 2018. ZEUS: Analyzing safety of smart contracts. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18–21, 2018. The Internet Society.
[34]
B. M. Kibria and Shipra Banik. 2016. Some ridge regression estimators and their performances. Journal of Modern Applied Statistical Methods 15, 1 (2016), 12.
[35]
Yunho Kim, Seokhyeon Mun, Shin Yoo, and Moonzoo Kim. 2019. Precise learn-to-rank fault localization using dynamic and static features of target programs. ACM Transactions on Software Engineering and Methodology (TOSEM) 28, 4 (2019), 1–34.
[36]
Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of International Conference on Machine Learning. 1188–1196.
[37]
Dun Li, Dezhi Han, Tien-Hsiung Weng, Zibin Zheng, Hongzhi Li, Han Liu, Arcangelo Castiglione, and Kuan-Ching Li. 2022. Blockchain for federated learning toward secure distributed machine learning systems: A systemic survey. Soft Comput. 26, 9 (2022), 4423–4440.
[38]
Andy Liaw, Matthew Wiener, et al. 2002. Classification and regression by RandomForest. R news 2, 3 (2002), 18–22.
[39]
Chao Liu, Han Liu, Zhao Cao, Zhong Chen, Bangdao Chen, and Bill Roscoe. 2018. Reguard: Finding reentrancy bugs in smart contracts. In Proceedings of International Conference on Software Engineering: Companion (ICSE-Companion). IEEE, 65–68.
[40]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[41]
Malte Moser, Rainer Bohme, and Dominic Breuker. 2013. An inquiry into money laundering tools in the bitcoin ecosystem. In Proceedings of eCrime Researchers Summit (eCRS’13). IEEE, 1–14.
[42]
Xintao Niu, Nie Changhai, Yu Lei, Hareton K. N. Leung, and Xiaoyin Wang. 2018. Identifying failure-causing schemas in the presence of multiple faults. IEEE Transactions on Software Engineering (2018).
[43]
Michael Nofer, Peter Gomber, Oliver Hinz, and Dirk Schiereck. 2017. Blockchain. Business & Information Systems Engineering 59, 3 (2017), 183–187.
[44]
Gustavo A. Oliva, Ahmed E. Hassan, and Zhen Ming Jack Jiang. 2020. An exploratory study of smart contracts in the Ethereum blockchain platform. Empirical Software Engineering (2020), 1–41.
[45]
Crypto Panda. 2018. The $3 Million Winner of Fomo3D Is Still Playing to Win. Retrieved Aug 25, 2020 from https://en.longhash.com/news/the-3-million-winner-of-fomo3d-is-still-playing-to-win.
[46]
Feargus Pendlebury, Fabio Pierazzi, Roberto Jordaney, Johannes Kinder, and Lorenzo Cavallaro. 2019. TESSERACT: Eliminating experimental bias in malware classification across space and time. In 28th USENIX Security Symposium (USENIX Security 19). USENIX Association, Santa Clara, CA, 729–746. https://www.usenix.org/conference/usenixsecurity19/presentation/pendlebury.
[47]
Wisam A. Qader, Musa M. Ameen, and Bilal I. Ahmed. 2019. An overview of bag of words; Importance, implementation, applications, and challenges. In Proceedings of International Engineering Conference (IEC’19). IEEE, 200–204.
[48]
Shahzad Qaiser and Ramsha Ali. 2018. Text mining: Use of TF-IDF to examine the relevance of words to documents. International Journal of Computer Applications 181, 1 (2018), 25–29.
[49]
Siraj Raval. 2016. Decentralized Applications: Harnessing Bitcoin’s Blockchain Technology. O’Reilly Media, Inc.
[50]
Sarwar Sayeed, Hector Marco-Gisbert, and Tom Caira. 2020. Smart contract: Attacks and protections. IEEE Access 8 (2020), 24416–24427.
[51]
Clara Schneidewind, Ilya Grishchenko, Markus Scherer, and Matteo Maffei. 2020. eThor: Practical and provably sound static analysis of ethereum smart contracts. In CCS’20: 2020 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, USA, November 9–13, 2020, Jay Ligatti, Xinming Ou, Jonathan Katz, and Giovanni Vigna (Eds.). ACM, 621–640.
[52]
Johan A. K. Suykens and Joos Vandewalle. 1999. Least squares support vector machine classifiers. Neural Processing Letters 9, 3 (1999), 293–300.
[53]
Nick Szabo. 1996. Smart contracts: Building blocks for digital markets. EXTROPY: The Journal of Transhumanist Thought, (16) 18, 2 (1996).
[54]
Petar Tsankov, Andrei Dan, Dana Drachsler-Cohen, Arthur Gervais, Florian Bünzli, and Martin Vechev. 2018. Securify: Practical security analysis of smart contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS’18). Association for Computing Machinery, New York, NY, USA, 67–82.
[55]
Rolf van Wegberg, Jan-Jaap Oerlemans, and Oskar van Deventer. 2018. Bitcoin money laundering: Mixed results? Journal of Financial Crime (2018).
[56]
Marie Vasek and Tyler Moore. 2018. Analyzing the bitcoin ponzi scheme ecosystem. In Proceedings of International Conference on Financial Cryptography and Data Security. Springer Berlin Heidelberg, Berlin, Heidelberg, 101–112.
[57]
Chen Weili and Zheng Zibin. 2018. Blockchain data analysis: A review of status, trends and challenges. Journal of Computer Research and Development 55, 9 (2018), 1853–1870.
[58]
Maximilian Wohrer and Uwe Zdun. 2018. Smart contracts: Security patterns in the ethereum ecosystem and solidity. In Proceedings of Workshop on Blockchain Oriented Software Engineering (IWBOSE’18). IEEE, 2–8.
[59]
Gavin Wood et al. 2014. Ethereum: A Secure Decentralised Generalised Transaction Ledger. Retrieved August 28, 2020 from https://github.com/ethereum/yellowpaper.
[60]
Jiajing Wu, Qi Yuan, Dan Lin, Wei You, Weili Chen, Chuan Chen, and Zibin Zheng. 2022. Who are the phishers? Phishing scam detection on ethereum via network embedding. IEEE Transactions on Systems, Man, and Cybernetics: Systems 52, 2 (2022), 1156–1166.
[61]
Di Xue, Jingmei Li, Tu Lv, Weifei Wu, and Jiaxiang Wang. 2019. Malware classification using probability scoring and machine learning. IEEE Access 7 (2019), 91641–91656.
[62]
Xiao Liang Yu, Omar Al-Bataineh, David Lo, and Abhik Roychoudhury. 2020. Smart contract repair. ACM Transactions on Software Engineering and Methodology (TOSEM) 29, 4 (2020), 1–32.
[63]
Huang Yuan, Queping Kong, Nan Jia, Xiangping Chen, and Zibin Zheng. 2019. Recommending differentiated code to support smart contract update. In 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC’19). IEEE, 260–270.
[64]
Xiaohan Zhang, Yuan Zhang, Ming Zhong, Daizong Ding, Yinzhi Cao, Yukun Zhang, Mi Zhang, and Min Yang. 2020. Enhancing state-of-the-art classifiers with API semantics to detect evolved android malware. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS’20). Association for Computing Machinery, New York, NY, USA, 757–770.
[65]
Yajin Zhou, Zhi Wang, Wu Zhou, and Xuxian Jiang. 2012. Hey, you, get off of my market: Detecting malicious apps in official and alternative android markets. In NDSS, Vol. 25. 50–52.
[66]
Weiqin Zou, David Lo, Pavneet Singh Kochhar, Xuan-Bach D. Le, Xin Xia, Yang Feng, Zhenyu Chen, and Baowen Xu. 2019. Smart contract development: Challenges and opportunities. IEEE Transactions on Software Engineering (2019).



Published In

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 5
September 2023
905 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3610417
Editor: Mauro Pezzè

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 July 2023
      Online AM: 24 November 2022
      Accepted: 28 October 2022
      Revised: 22 October 2022
      Received: 10 February 2021
      Published in TOSEM Volume 32, Issue 5


      Author Tags

      1. Blockchain
      2. Ethereum
      3. Ponzi schemes
      4. malware detection

      Qualifiers

      • Research-article

      Funding Sources

      • National Key R&D Program of China
      • National Natural Science Foundation of China
      • Guangdong Basic and Applied Basic Research Foundation
      • Youth Innovation Talent Program in Universities and Colleges of Guangdong Province
      • Technology Program of Guangzhou, China


