Proof-of-work based new encoding scheme for information hiding purposes

Pawel Rajba, University of Wroclaw, Poland, pawel@cs.uni.wroc.pl

Joerg Keller, FernUniversitaet in Hagen, Germany, joerg.keller@fernuni-hagen.de

Wojciech Mazurczyk, Warsaw University of Technology, Poland, wojciech.mazurczyk@pw.edu.pl

DOI: https://doi.org/10.1145/3600160.3605085
ARES 2023: The 18th International Conference on Availability, Reliability and Security, Benevento, Italy, August 2023

Steganography techniques often assume that the secret message looks random or is encrypted. If encryption is required, it leads to a random-looking message, but key exchange may be problematic and jeopardize covert communication. If encryption is not required, then the question arises of whether other cryptographic solutions that are “cheaper” than encryption can provide the same level of randomness. In this paper, we investigate both questions. First, we propose a proof-of-work-inspired approach to securely transfer the key with the encrypted message, avoiding a previous key exchange. Second, we introduce a scheme that uses T-functions to substitute symmetric encryption algorithms. We implement both proposed solutions, measure the entropy of the resulting messages, and apply the Kolmogorov-Smirnoff tests. The results obtained indicate that both schemes are feasible.

CCS Concepts: • Security and privacy → Web application security; • Security and privacy → Web protocol security;

Keywords: information hiding, steganography, randomness, encryption, key exchange, proof-of-work

ACM Reference Format:
Pawel Rajba, Joerg Keller, and Wojciech Mazurczyk. 2023. Proof-of-work based new encoding scheme for information hiding purposes. In The 18th International Conference on Availability, Reliability and Security (ARES 2023), August 29--September 01, 2023, Benevento, Italy. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3600160.3605085

1 INTRODUCTION

Information hiding, particularly steganography, deals with techniques for stealthily storing or transmitting secret messages [10]. So far, many different techniques have been proposed in the literature [9, 12, 13, 18], and it must be noted that malware creators increasingly utilize such methods to cover their tracks (e.g., exfiltrate confidential data or download further malware modules) [2, 3]. In many schemes presented so far, to make steganalysis more challenging, it is desired to make the secret messages look random [10]. Sometimes, an even distribution of byte values is to be achieved, or a given entropy must be matched, for example, to remain unsuspicious. Although this randomization can be achieved by encrypting the secret message, which additionally provides confidentiality of the secret message even if its existence is revealed, the two properties are different. If encryption is used, it requires shared knowledge of the key by covert sender and receiver, e.g., by a key exchange, which may not always be possible to achieve in a secure manner. Moreover, encryption and decryption incur some effort on the side of the covert sender and receiver, respectively, which in turn might compromise the stealthiness of the scheme, although the secret message itself is well hidden. Considering the above, two questions arise:

If confidentiality is needed, how can key exchange prior to message transfer be avoided?
If confidentiality is not needed, can encryption algorithms be substituted by algorithms that still provide a suitable level of randomness, but require less effort than encryption?

In this research, we address both questions.

Regarding the key, we propose a proof-of-work-inspired approach that allows secure transmission of the key with the encrypted message, thus avoiding the need for a previous key exchange. This comes at the cost of “key mining” by the covert sender or receiver, but not both. Obviously, in this case, we need to keep in mind that we provide confidentiality of the message only at a minimal level: if the steganalyst discovers that a hidden transmission is taking place, then it is relatively easy to decrypt the secret message. Hence, the main goal of our approach is to make it harder for detectors inspecting network traffic to discover secret communication.

Regarding the lightweight scheme, we propose to substitute encryption by T-functions which have been used previously in pseudo-random number generators [8]. We implement both schemes to measure the effort invested by the covert sender and the covert receiver and to measure the entropy of the resulting message to check for suitable randomness. Our preliminary experimental results, presented in this paper, indicate that both schemes are feasible.

Both schemes should be seen as promising examples of a large field of possible solutions that may range from ID-based encryption, where the receiver's identity serves as a key to encrypt a session key that, in turn, is used to encrypt the secret message, to lossless compression, where entropy is increased by removing statistical redundancy in the original message, and which might look random, too. However, it should be emphasized that exploring the full spectrum of solutions is beyond the scope of this work. Still, we discuss some related approaches in Section 2.5.

Considering the above, the main contributions of this paper are summarized as follows:

We investigate the randomization of messages and encryption of messages as separate, yet partly related entities,
We propose a scheme that avoids key exchange before the actual transmission of the secret message when encryption is needed. The scheme is inspired by proof-of-work approaches and comes at the cost of “key mining” on either the covert sender or covert receiver side.
We propose a scheme that substitutes symmetric block encryption algorithms by T-functions to lower computational effort without compromising the randomness of the resulting message.
We implement both schemes and perform experiments to demonstrate that randomization without encryption can indeed be provided at a smaller cost and that, in the case of encryption, the key exchange can be avoided by transmitting a hashed key, where the effort on the side of the covert sender to generate that key might be traded for effort on the receiver side to recover the key.

The remainder of this paper is structured as follows. Section 2 summarizes the techniques needed for our investigation and describes the most notable published works related to the topic described in this paper. Then in Section 3, we present two proposals on how to randomize a secret message if either confidentiality is needed or not. Section 4 presents the concrete implementations of our proposals and describes the preliminary experiments we performed for evaluation, while Section 5 presents the results of the analysis of our experiments. Finally, Section 6 presents conclusions and an outlook for future work.

2 BACKGROUND AND RELATED WORK

In this section, we first present relevant background in encryption and its modes, pseudo-random number generation, as well as cryptographic hash functions and proof-of-work. Then we focus on describing the most notable related work in this area.

2.1 Encryption and Encryption Modes¹

With symmetric encryption, a sender applies an encryption function E, parameterized by a key k, onto a plaintext message m to obtain a ciphertext message c = E_k(m) to be transmitted. Without the key, the ciphertext may not give any hint about the plaintext to an eavesdropper, and thus normally looks random. With the key, the receiver can apply the corresponding decryption function D on the ciphertext and recovers the plaintext, i.e., m = D_k(c). We do not go into details of asymmetric or even ID-based encryption here, as they are much more computationally intensive than symmetric encryption. However, they avoid the problem of key exchange. With a symmetric encryption algorithm, the sender and the receiver must securely agree on or exchange the key prior to the actual message transmission.

Symmetric encryption algorithms are classified into stream ciphers and block ciphers. The former encrypt each bit or symbol of the plaintext by producing a key stream of bits or symbols and combining those with the plaintext by a linear function such as exclusive or. The latter encrypt a block of a fixed bit length, where multiples of 64 are usual. To ensure proper decryption, the encryption function E_k must be injective for each possible value of the key k, which means that it is bijective in the usual case that the block size for plaintext and ciphertext blocks is identical. The block decryption function is the inverse function of the encryption function, when both use the same key.

If the message is longer than a block, an encryption mode must be used to avoid equal plaintext blocks being transformed into equal ciphertext blocks that would allow statistical analysis.

In Cipher Block Chaining (CBC) mode, a plaintext message m is split into blocks m₁m₂…, which are subsequently encrypted after being combined with the previous ciphertext block, i.e., c_i = E_k(m_i⊕c_{i − 1}) for i = 1, 2, …, and c = c₁c₂…. To start the scheme, an initialization vector (IV) is needed, which serves as c₀. Decryption is non-recursive, i.e., m_i = D_k(c_i)⊕c_{i − 1}.
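To make the chaining concrete, the following minimal sketch implements the two CBC formulas above on 64-bit blocks. The functions `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins for E_k and D_k (a simple bijection that offers no security); they are assumptions made for illustration and are not part of the proposed schemes.

```python
MASK = (1 << 64) - 1  # we work on 64-bit blocks

def toy_encrypt(k: int, x: int) -> int:
    """Hypothetical stand-in for E_k: a bijection on 64-bit blocks (NOT secure)."""
    return ((x ^ k) + k) & MASK

def toy_decrypt(k: int, y: int) -> int:
    """Inverse of toy_encrypt, standing in for D_k."""
    return ((y - k) & MASK) ^ k

def cbc_encrypt(k: int, iv: int, blocks: list[int]) -> list[int]:
    out, prev = [], iv                      # the IV serves as c_0
    for m_i in blocks:
        prev = toy_encrypt(k, m_i ^ prev)   # c_i = E_k(m_i XOR c_{i-1})
        out.append(prev)
    return out

def cbc_decrypt(k: int, iv: int, blocks: list[int]) -> list[int]:
    out, prev = [], iv
    for c_i in blocks:
        out.append(toy_decrypt(k, c_i) ^ prev)  # m_i = D_k(c_i) XOR c_{i-1}
        prev = c_i
    return out

key, iv = 0x0123456789ABCDEF, 0x0F1E2D3C4B5A6978
plain = [42, 42, 42]                        # three equal plaintext blocks
cipher = cbc_encrypt(key, iv, plain)
recovered = cbc_decrypt(key, iv, cipher)
```

Because each block is combined with the previous ciphertext block before encryption, the three equal plaintext blocks yield three different ciphertext blocks, while decryption only needs the ciphertext and the IV.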

In the Counter mode, values of a counter are subsequently encrypted and serve as a kind of key stream for the plaintext blocks, that is, c_i = m_i⊕E_k(ctr_i) for i = 1, 2, …. The start value ctr₁ of the counter is normally a random value or nonce appended by a sufficient number of zero bits, and ctr_{i + 1} = ctr_i + 1 as the name counter suggests. Also here, the key might serve as the nonce. Decryption occurs via m_i = c_i⊕E_k(ctr_i), i.e., only the encryption function is used in counter mode, which may be advantageous if decryption incurs more computation than encryption.

2.2 Pseudo-Random Number Generators

A pseudo-random number generator (PRNG) is a finite state machine, which gets initialized into a state by seeding and from then, on each call, produces an output value that depends on the current state and transitions into a follow-up state. Thus, a PRNG is structurally similar to a stream cipher, however, with partly different security requirements if used in a cryptographic setting. The outputs produced should be uniformly distributed over their range and pass statistical tests of randomness [1]. For example, if only bits are output, then the sequence 0, 1, 0, 1, … has a uniform distribution over {0, 1} but will not pass the tests, as it is too regular. In order to avoid regularity, the period, i.e., the number of transitions until an inner state is invoked again, should be as long as possible. This is best achieved if the transition function is bijective, that is, a permutation on the set of the inner states, as such a function can consist of only one cycle that includes all states. A randomly chosen permutation has an expected longest cycle length of about 0.62N if N is the number of inner states [15]. In contrast, a randomly chosen arbitrary function on the set of states (most likely non-bijective) has an expected cycle length of about $1.5\sqrt {N}$ [5].

T-functions [7] are functions that can be expressed by simple formulas, and for which classes are known whose members are bijective and comprise only one cycle. Such a function, $f(x)=x+(x\cdot x \vee 5) \bmod 2^{64}$, has been used as part of the transition function in the AKARI-X PRNG [8]. Here, ∨ denotes the bitwise OR of unsigned integers in binary representation.
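As an illustration, the T-function above can be written in a few lines. The sketch below is parameterized by the word size (the `bits` argument is our own addition for illustration) so that the single-cycle property can be checked exhaustively on small words; for 64 bits such a check is of course infeasible.

```python
def t_function(x: int, bits: int = 64) -> int:
    """f(x) = x + (x*x OR 5) mod 2^bits."""
    mask = (1 << bits) - 1
    return (x + ((x * x) | 5)) & mask

def cycle_length(start: int, bits: int) -> int:
    """Number of applications of f needed to return to `start`."""
    x, steps = t_function(start, bits), 1
    while x != start:
        x = t_function(x, bits)
        steps += 1
    return steps

# Exhaustive check on 8-bit words: f is a permutation with one cycle.
images = {t_function(x, 8) for x in range(256)}
print(len(images), cycle_length(0, 8))
```

On 8-bit words, the set of images contains all 256 values and the cycle starting at 0 visits every state before returning, which is the behavior desired for a PRNG transition function.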

2.3 Cryptographic Hash Functions and Proof-of-Work

A hash function is a function h: U → R, where mainly U = {0, 1}^* and R = {0, 1}ⁿ ⊂ U. A cryptographic hash function is a hash function with one-way property (non-invertible) and collision resistance (in practice injective) [11]. For a good hash function, we assume that for each input value, the probability of each bit of the hash value being equal to 1 is 50%.

A proof-of-work is a protocol in which one participant demonstrates to another participant that they have invested some amount of work [6]. A typical form appears in cryptocurrencies, where a given bit string s must be completed by a second bit string s′ of fixed length such that h(s|s′) starts with a predefined number of zeros. Although finding such an s′ is computationally intensive, checking that h(s|s′) has the required property when s and s′ are known only requires one evaluation of the hash function. If j bits must be zero, then the chance that a randomly chosen s′ leads to an appropriate hash value is p = 2^{− j}. Thus, the number of trials until a hit follows a geometric distribution, and its expected value is 1/p = 2^j [4].
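The asymmetry between searching and checking can be sketched as follows. This is a minimal illustration that assumes SHA-256 as h and an 8-byte suffix enumerated in increasing order; the function names and the suffix length are our own choices and are not prescribed by any particular cryptocurrency.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mine(s: bytes, j: int, suffix_len: int = 8) -> tuple[bytes, int]:
    """Enumerate suffixes s' until h(s|s') starts with at least j zero bits."""
    for trials, n in enumerate(count(), start=1):
        s2 = n.to_bytes(suffix_len, "big")
        if leading_zero_bits(hashlib.sha256(s + s2).digest()) >= j:
            return s2, trials

def verify(s: bytes, s2: bytes, j: int) -> bool:
    """Checking a solution costs a single hash evaluation."""
    return leading_zero_bits(hashlib.sha256(s + s2).digest()) >= j

suffix, trials = mine(b"covert", j=12)
print(trials, "trials; expected about", 2 ** 12)
```

The number of trials varies from run to run with the input string, but on average it is close to 2^j, whereas `verify` always needs exactly one hash evaluation.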

2.4 Network Steganography

Network steganography, as a prominent example of information hiding, is the art of hiding the existence of a secret message within an innocent communication called a carrier, and it is often used for criminal purposes such as botnet communication [2]. If the arbitrary content of a message is replaced by a secret message, then the secret message might be randomized or encrypted to avoid unusual or repeating patterns, as a warden might sit on the communication path. The magic triangle of network steganography is steganographic bandwidth, stealthiness, and robustness (against modifications that the carrier may undergo). In order to maximize one parameter, normally the others are decreased. Stealthiness is not only related to unsuspicious message content but often also connected with resource usage on the sender or receiver side. If, for instance, a botnet-infected machine is notably slower than usual because of computationally intensive encryption when leaking data, it might be re-installed. Thus, techniques to reduce suspicion of messages should also take into account computations for preparing and injecting the secret message. Therefore, encryption might be too heavy.

2.5 Related Work

In experiments of all kinds (ranging from medicine to social science and physics), and sometimes in algorithmics, a random order of a set of n objects is sought. This is achieved by applying a permutation, randomly chosen among the set of all permutations on n objects. Yet, here, the permutation is explicit, i.e., stored in a kind of table (explicitly or implicitly, e.g., by storing the objects in the random order, yet maintaining their original order). Finding such a random permutation can be reduced to drawing a sequence of random numbers [15]. In contrast, we need a random bijective function, i.e., a random permutation of all possible values in a block of bits, and this function must be expressed as a piece of code.

More closely related is the field of pseudo-random number generators, where the outputs must pass a number of statistical tests that range from simple ones, such as tests whether all possible output values appear with similar frequencies, to quite complex tests, cf. e.g. the NIST or DIEHARD suites of tests [1]. This is achieved at least in part by using a state transition function that computes the follow-up state in a random-looking way. If the state transition function is bijective, then it is a random permutation on the set of states. T-functions [7] have been proposed for such a purpose and have been used in the AKARI-X pseudo-random number generator [8]. We will borrow their idea and transfer the use of T-functions to our use of randomization, acknowledging that T-functions, even if parameterized (the parameter serves as a key), might be attacked when used as symmetric block ciphers, e.g. cf. [17].

Symmetric block encryption algorithms must possess the property that their output looks random to resist attacks such as linear and differential cryptanalysis [11]. Thus, after choosing a key, the encryption algorithm realizes a random-looking permutation on the set of possible block values.

Cryptographic hash functions also produce random-looking outputs in order to realize the one-way property. Yet, they do not have the injectivity property, even if the domain is restricted to the range of hash values. However, if the function does not need to be inverted, such as in counter mode, it can still be used. Cryptographic hash functions have been used in the past to replace encryption algorithms in message authentication codes (HMAC vs. classic MAC) [11].

Lossless compression seeks to eliminate statistical redundancy in an information representation, with the aim of providing a shorter representation of the same information [14]. As a by-product, the compressed information typically looks random because the entropy is increased. However, this is not guaranteed as it is not a design goal.

3 PROOF-OF-WORK-INSPIRED RANDOMIZATION AND ENCRYPTION

As a first scenario, we consider a covert sender that randomizes a secret message m = m₁m₂… by encrypting it symmetrically with a key k in CBC mode using an initialization vector IV = c where c is a randomly chosen salt or nonce. To ensure that the covert receiver can decrypt the secret message, the covert sender must also transmit the key k and the nonce c. The covert sender can simply precede the encrypted message E_k(m) with c and k, but then confidentiality is not provided anymore, and there may be randomization approaches that require less computation on the side of the covert receiver, and thus may be more stealthy. Instead, the covert sender can transmit a hashed version of the key. As hash block sizes are normally longer than key sizes, and to be able to control the effort to be invested by the covert receiver, the covert sender chooses a key k where the first j bits are set to zero and ultimately transmits c|H(c|k)|E_k(m).

Upon receiving this message, the covert receiver performs a kind of proof-of-work scheme. It repeatedly chooses a candidate key k′ with the first j bits set to zero and computes H(c|k′), until it equals H(c|k). As H is assumed to be resistant to collisions, the covert receiver can assume k′ = k, and it can derive the secret message by decrypting: $m=D_{k^{\prime }}(E_k(m))$. If the key size is n bits, then there are 2^{n − j} possible keys, and when enumerating them the covert receiver will, on average, try out half of them before obtaining a hit. The general flow is presented in Figure 1.
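The key recovery on the receiver side can be sketched as below. To keep the search short, the sketch assumes a toy key size of n = 24 bits (the constant `KEY_BITS` and all function names are hypothetical) and SHA-256 as H; it covers only the key search, not the encryption of the message itself.

```python
import hashlib
import secrets

KEY_BITS = 24  # toy key size n (assumption); real keys are much longer

def make_key(j: int) -> bytes:
    """Sender side: random key whose first j bits are zero."""
    return secrets.randbits(KEY_BITS - j).to_bytes(KEY_BITS // 8, "big")

def hashed_key(c: bytes, k: bytes) -> bytes:
    """H(c|k), the value transmitted instead of the key."""
    return hashlib.sha256(c + k).digest()

def recover_key(c: bytes, target: bytes, j: int) -> tuple[bytes, int]:
    """Receiver side: enumerate the 2^(n-j) candidates until H(c|k') matches."""
    for candidate in range(1 << (KEY_BITS - j)):
        k2 = candidate.to_bytes(KEY_BITS // 8, "big")
        if hashed_key(c, k2) == target:
            return k2, candidate + 1
    raise ValueError("no key with the agreed zero prefix matches")

j = 8
c = secrets.token_bytes(8)            # nonce chosen by the sender
k = make_key(j)
found, trials = recover_key(c, hashed_key(c, k), j)
print(found == k, trials)
```

Increasing j shrinks the search space and thus the receiver's effort, which is how the two parties can tune the work factor in advance.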

Figure 1: The overall flow for PoW-based messages encoding

The covert sender and the covert receiver must agree in advance on the encryption function E, the hash function H, and the number j of bits set to zero in the key.

The rationale behind this approach is that, in many cases, the covert sender may be in greater danger of being detected (e.g., a dissident in an oppressive regime) than the covert receiver, and therefore must be careful to lie low and perform computations only sparingly. The covert receiver, on the other hand, often can either freely do computations, because that computer is under the attacker's control, or it can forward the encrypted message into a cloud where it can be decrypted even with high computational effort.

The confidentiality of the secret message is not absolute, as a warden (an entity that in an information hiding area is typically responsible for hidden communication detection and prevention) could, in principle, apply the computations that the covert receiver does. However, since a warden normally does not know in advance if a particular transmission contains an encrypted secret message, it would be forced to perform such a key recovery on many strings, which might not be possible anymore with its computational capabilities. Moreover, the warden may be more interested in interrupting this transmission than decrypting a particular secret message.

If confidentiality is not needed, then an encryption algorithm is not necessarily needed in the scheme above either; any bijective function will do. This reduces effort and does not require transmitting a key. Thus, in this case, a T-function could be used. However, the inverse of a T-function, which is needed for decryption, is often notably slower than the T-function itself [7], so this could create problems for the covert receiver if it also needs to lie low and can do computations only sparingly.

Instead of using the CBC mode, the Counter mode can also be used. In this case, both the covert sender and the covert receiver apply the same encryption algorithm, and no decryption algorithm is needed. If confidentiality is not needed, we propose to replace the encryption algorithm with a T-function, as it seems a suitable and work-efficient alternative in this setting.

The transmission overhead is the nonce and the hashed key if confidentiality is needed. The nonce is chosen randomly, and the hash value will look random so that the whole data for transmission is randomized. The nonce also ensures that if a key is used twice, it will result in different hash values. If this is not an issue, then also the key might be used as a nonce, and only the hashed key must be transmitted.

If confidentiality is not needed, overhead consists of the nonce.

Transmission overhead can be notable for short messages, yet will be marginal for longer data transfers.

Note that there might be situations where additional effort is expected from the covert sender rather than from the covert receiver. To address this, it is possible to introduce the proof-of-work effort on the sender side. This is out of the scope of this paper, but we plan to explore such scenarios in further research.

4 ENCODING TECHNIQUES

Each encoding technique will be classified according to the following criteria:

Additional intentional effort required by the covert receiver,
Need for key exchange,
Confidentiality,
Integrity checks.

First, let us discuss the asymmetry in the expected effort between the covert sender and the covert receiver. Although the covert sender wants to prepare the secret message without unnecessary delay, additional effort on the receiver side is introduced to make the task of detecting and discovering the message online much more difficult and time-consuming for any attacker. Let us consider an environment where there is a huge amount of data transferred each hour and only a minuscule portion is our covert message. In such a case, even a small additional effort per message might pose a significant difficulty for detection and analysis. The technique itself is also recognized and applied in many other areas, e.g., against brute-force attacks on password hashes by introducing so-called slow hashing functions.

It is also worth mentioning that the level of computational effort can be managed by how the encryption key is created, and we can influence the following two aspects:

Alphabet. The search space will be much smaller (and the search faster) if we agree only on digits, it will be bigger if we agree on digits plus lower- and upper-case letters, and it will be the biggest if we agree on all possible characters. In our considerations, we applied digits only as a simple and good enough approach for the purpose.
Key pattern. Agreeing on certain patterns, e.g., a number of leading 0’s in case of digits only, can reduce the search space. Moreover, we can control that by agreeing on a number of leading 0’s. Obviously, the pattern can be anything like a fixed part, different parts of the key encoded with different alphabets, etc. In our consideration, keeping in mind that we use digits only as the alphabet, we applied the leading 0’s pattern as, again, a simple and good enough approach for the purpose.
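The effect of both choices on the search space can be estimated with a short calculation. The sketch below is an illustration only: the key length of 8 symbols and the alphabet sizes are assumptions chosen for the example, and the factor 1/2 reflects the average case when candidates are enumerated until a hit.

```python
def search_space(key_length: int, leading_zeros: int, alphabet_size: int = 10) -> int:
    """Number of candidate keys when the first `leading_zeros` symbols are fixed to 0."""
    return alphabet_size ** (key_length - leading_zeros)

def expected_trials(key_length: int, leading_zeros: int, alphabet_size: int = 10) -> float:
    """Average number of candidates tried when enumerating until a hit."""
    return search_space(key_length, leading_zeros, alphabet_size) / 2

# Hypothetical 8-symbol keys: digits only vs. digits plus letters.
for name, size in (("digits", 10), ("digits+letters", 62)):
    for zeros in (0, 2, 4):
        print(name, zeros, search_space(8, zeros, size))
```

Each additional leading zero divides the search space by the alphabet size, so the pattern gives fine-grained control over the receiver's effort.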

Now let us introduce basic terms and notation:

s₁|s₂ - a concatenation of strings s₁ and s₂
c - random salt
iv - random initialization vector required by encryption CBC mode
H(·) - hash function, in our consideration it will be SHA256
E_k(·) - encryption function with the key k, in our consideration it will be DES or AKARI²
D_k(·) - decryption function with the key k, as above.
m - secret open text message
S - secret encoded message
CS - covert sender
CR - covert receiver

In the following sections, we will introduce the proposed schemes.

4.1 Scheme with DES and intentional CR effort (Sch-1)

4.1.1 Encoding and sending by CS. Input:

m - secret message
c - 8 bytes random salt
iv = c
k - 7 bytes of key with z zero bits prefix (z adjusted to the expected security level)
E(·) - DES in CBC mode
H(·) - SHA256

Output: S = c|H(c|k)|E_{k, iv}(m)

4.1.2 Decoding and receiving by CR. Input:

S = c|H(c|k)|E_{k, iv}(m)

Output: m, obtained as follows:

split S into: c, H(c|k), E_{k, iv}(m)
generate k′ by enumeration or repeated random choice (assuming a prefix of z zero bits) until: H(c|k′) = H(c|k)
under the assumption of collision resistance of H, we obtain k = k′
decrypt E_{k, iv}(m) using k and iv = c and obtain m
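The message layout of Sch-1 can be sketched as follows. Since DES is not part of the Python standard library, the ciphertext is treated here as an opaque byte string produced elsewhere; the helper names are hypothetical and the sketch only covers key generation with a zero-bit prefix, assembling S, and splitting it again.

```python
import hashlib
import secrets

SALT_LEN, HASH_LEN, KEY_LEN = 8, 32, 7   # sizes in bytes, as listed for Sch-1

def random_key(z: int) -> bytes:
    """7-byte key whose first z bits are zero."""
    return secrets.randbits(KEY_LEN * 8 - z).to_bytes(KEY_LEN, "big")

def pack(c: bytes, k: bytes, ciphertext: bytes) -> bytes:
    """S = c | H(c|k) | ciphertext, with H = SHA-256."""
    return c + hashlib.sha256(c + k).digest() + ciphertext

def split(s: bytes) -> tuple[bytes, bytes, bytes]:
    """Recover the salt, the hashed key, and the ciphertext from S."""
    return (s[:SALT_LEN],
            s[SALT_LEN:SALT_LEN + HASH_LEN],
            s[SALT_LEN + HASH_LEN:])

c = secrets.token_bytes(SALT_LEN)        # salt, also used as IV
k = random_key(z=16)
ciphertext = bytes(16)                   # placeholder for DES-CBC output
S = pack(c, k, ciphertext)
salt, digest, body = split(S)
```

The receiver then searches for k′ with H(c|k′) equal to `digest` and decrypts `body` with the recovered key and the salt as IV.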

4.2 Scheme with AKARI-1 or T-function and no intentional CR effort (Sch-2)

4.2.1 Encoding and sending by CS. Input:

m - secret message
c - 12 bytes random nonce
E(·) - AKARI-1 in CNT mode or T-function (defined as $f(x)=x+(x\cdot x \vee 5) \bmod 2^{64}$) in CNT mode

Output: The T-function is defined above, but let us quickly recap how the AKARI-1 function works: it takes 16 bytes of seed (8 left and 8 right bytes) and produces 16 bytes of output, of which the left 8 bytes are considered random. Thus, both the T-function and the AKARI-1 function return 8 bytes of random output. Given that, we split the message m into 8-byte blocks m₁, m₂, … and then calculate n₁, n₂, … in the following way:

\begin{eqnarray*}
n_1 & = & \mathrm{AKARI1\_OR\_T}(c|0000) \oplus m_1 \\
n_2 & = & \mathrm{AKARI1\_OR\_T}(c|0001) \oplus m_2 \\
& \ldots &
\end{eqnarray*}

Finally, the following output is created and sent to the receiver: S = c|n₁|n₂|…

4.2.2 Decoding and receiving by CR. Input:

S = c|n₁|n₂|…

Output: m, obtained as follows:

Split S into: c, n₁, n₂, etc.
Decrypt n_i into m_i following the same scheme as before:

\begin{eqnarray*}
m_1 & = & \mathrm{AKARI1\_OR\_T}(c|0000) \oplus n_1 \\
m_2 & = & \mathrm{AKARI1\_OR\_T}(c|0001) \oplus n_2 \\
& \ldots &
\end{eqnarray*}
reconstitute m = m₁|m₂|…
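The following sketch illustrates Sch-2 with the T-function in counter mode. How the 16-byte value c|counter is mapped to the 64-bit input of the T-function is not specified above; the sketch assumes, purely for illustration, that the low 8 bytes are used. The function names are hypothetical as well.

```python
import secrets

MASK64 = (1 << 64) - 1
NONCE_LEN, BLOCK = 12, 8

def t_function(x: int) -> int:
    """f(x) = x + (x*x OR 5) mod 2^64."""
    return (x + ((x * x) | 5)) & MASK64

def keystream_block(c: bytes, i: int) -> bytes:
    seed = c + i.to_bytes(4, "big")          # c | 4-byte counter = 16 bytes
    x = int.from_bytes(seed[-8:], "big")     # ASSUMPTION: use the low 8 bytes
    return t_function(x).to_bytes(BLOCK, "big")

def xor_with_keystream(c: bytes, data: bytes) -> bytes:
    out = bytearray()
    for off in range(0, len(data), BLOCK):
        ks = keystream_block(c, off // BLOCK)
        out += bytes(a ^ b for a, b in zip(data[off:off + BLOCK], ks))
    return bytes(out)

def encode(m: bytes) -> bytes:
    c = secrets.token_bytes(NONCE_LEN)       # 12-byte random nonce
    return c + xor_with_keystream(c, m)      # S = c | n_1 | n_2 | ...

def decode(s: bytes) -> bytes:
    return xor_with_keystream(s[:NONCE_LEN], s[NONCE_LEN:])

message = b"sixteen byte msg"
S = encode(message)
```

Encoding and decoding use the same keystream function, so no inverse of the T-function is ever computed, which matches the motivation for choosing counter mode.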

5 ANALYSIS

In this section, we analyze several properties of all proposed schemes:

Bandwidth (a.k.a. data hiding capacity),
Stealthiness (a.k.a. how random the message is and how difficult it would be to discover embedded hidden data),
Robustness.

The test bed for analysis is created as follows. We take a sample text from the literature, i.e., the first chapter of The Master and Margarita by Mikhail Bulgakov, and we split it into equal-size messages M₁, …, M_n. In our considerations, we take messages of size 1000 characters, which resulted in 25 full files (the last, shorter one was discarded).

Having the messages, we executed both operations, i.e., encoding and decoding, for all the messages and, based on that, performed the below analysis.

In the following sections, we analyze each of the proposed schemes.

5.1 Scheme with DES and intentional CR effort (Sch-1)

5.1.1 Bandwidth. Calculations are quite straightforward here: given a message of length n, our scheme produces an output of length n + 40, where 8 bytes are for the salt (also playing the role of IV for encryption in CBC mode), 32 bytes are for SHA256, and n bytes are for the ciphertext. However, this is valid only if n is a multiple of 8 (if not, the message must be padded to the next multiple of 8).
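As a quick check of this arithmetic, the output length including padding can be computed as in the sketch below; the function name and parameter names are our own.

```python
def sch1_output_len(n: int, block: int = 8, salt: int = 8, digest: int = 32) -> int:
    """Length of S for a message of n bytes: salt + SHA-256 digest + padded ciphertext."""
    padded = -(-n // block) * block   # round n up to a multiple of the block size
    return salt + digest + padded

print(sch1_output_len(1000))   # message size used in the test bed
print(sch1_output_len(1001))   # one extra byte costs a full additional block
```

For the 1000-byte messages of the test bed, the overhead is 40 bytes, i.e., 4% of the message length.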

5.1.2 Stealthiness. We analyze stealthiness by looking at the following aspects. On the one hand, we can refer to papers that conclude that both DES and SHA256 results are indistinguishable from the randomly generated bits stream ([1], [16]), so if we also add randomly and independently generated nonce (serving for IV and salt), the final covert message should also keep the property.

On the other hand, despite the above, we also applied some simple tests to ensure that in case some tool like IDS/IPS is continuously scanning the network stream, by making an initial analysis, it is not possible to get suspicious of the content.

Entropy. We consider the entropy based on the distribution of characters in the array of considered bytes. For each message, we calculate the entropy for the following:

the plain message itself,
the covert message,
randomly generated bytes.

On top of that, we calculated the maximum possible entropy for a given string length as a reference. The calculated results can be found in Table 1.

Table 1: Entropy analysis for DES for all 25 considered messages

Message type	Mean	Std. Dev.
Plain message	4.3538	0.0595
Covert message	4.2358	0.0439
Random string	4.2345	0.0407
Maximum entropy	10.0223	—

Based on the presented results, we can observe that entropy for the covert message and a random string are almost indistinguishable. Moreover, as the deviation is almost negligible, the chance that the IDS/IPS tool will classify the traffic as suspicious and start making any further deep analysis is very low.
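An entropy measure of this kind can be sketched as follows. The sketch assumes Shannon entropy with base-2 logarithm over the empirical distribution of byte values; the exact symbol encoding and logarithm base behind the reported figures may differ, so the numbers it prints are not meant to reproduce Table 1.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per symbol of the empirical byte distribution."""
    n = len(data)
    return -sum((cnt / n) * math.log2(cnt / n) for cnt in Counter(data).values())

plain = b"It was a hot spring evening at the Patriarch's Ponds. " * 10
random_bytes = os.urandom(len(plain))
print(shannon_entropy(plain), shannon_entropy(random_bytes))
```

A detector would compare the value measured on observed traffic with the value expected for random data of the same length; a covert message is unsuspicious in this respect if the two are close.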

K-S test

The other test we applied is the Kolmogorov-Smirnoff test for two samples to check the similarity between the underlying distributions. The null hypothesis is that the two samples were drawn from the same distribution. We apply a significance level of 5% (i.e., a confidence level of 95%); in other words, we reject the null hypothesis in favor of the alternative if the p-value is less than 0.05.

In our consideration, one sample is the covert message, and the other one is the randomly generated string using the uniform distribution. The summary of the results obtained is given in Table 2.

Table 2: Kolmogorov-Smirnoff test analysis for DES for all 25 considered messages

p-value (mean)	p-value (std. dev.)
0.5319	0.3327

As the mean p-value of 0.5319 is well above our threshold of 0.05 (even considering the rather large standard deviation), we cannot reject the null hypothesis. This means that an IDS/IPS tool applying this test has no reason to consider the traffic suspicious and start any further deep analysis.
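For illustration, a two-sample Kolmogorov-Smirnoff test can be sketched in pure Python as below. The p-value uses the common asymptotic series approximation of the Kolmogorov distribution, which is an assumption on our part; statistical packages may use exact methods for small samples and can therefore return slightly different p-values.

```python
import math
import os
from bisect import bisect_right

def ks_two_sample(a, b):
    """Return (D, approximate p-value) for the two-sample K-S test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical distribution functions.
    d = max(abs(bisect_right(a, v) / n - bisect_right(b, v) / m)
            for v in set(a) | set(b))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:            # the series converges slowly here; p is ~1
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

covert = os.urandom(1040)       # stand-in for a covert message
reference = os.urandom(1040)    # uniformly random reference sample
d, p = ks_two_sample(covert, reference)
print(d, p)
```

The null hypothesis is rejected when p falls below 0.05; for two samples drawn from the same distribution this happens in roughly 5% of runs by construction of the test.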

5.1.3 Robustness. This encoding scheme does not provide any robustness, i.e., there is no protection against accidental or malicious message tampering.

5.2 Scheme with AKARI-1 or T-function and no intentional CR effort (Sch-2)

5.2.1 Bandwidth. Calculations are quite straightforward here: given a message of length n, our scheme produces an output of length n + 12, where 12 bytes are for the nonce (not 16 bytes, because the remaining 4 bytes are dedicated to a counter to support counter encryption mode) and n bytes are for the ciphertext. However, this is valid only if n is a multiple of 8 (if not, the message must be padded to the next multiple of 8).

5.2.2 Stealthiness. We analyze stealthiness by applying the same approach as in 5.1.

Entropy. We consider the entropy based on the distribution of characters in the array of considered bytes. For each message, we calculate the entropy for the following:

the plain message itself,
the covert message,
randomly generated bytes.

On top of that, we calculated the maximum possible entropy for a given string length as a reference. The calculated results can be found in:

Table 3 for T-function, and
Table 4 for AKARI-1 function.

Table 3: Entropy analysis for T-function for all 25 considered messages

Message type	Mean	Std. Dev.
Plain message	4.3538	0.0595
Covert message	4.2438	0.0347
Random string	4.2462	0.0359
Maximum entropy	10.0223	—

Table 4: Entropy analysis for AKARI-1 function for all 25 considered messages

Message type	Mean	Std. Dev.
Plain message	4.3538	0.0595
Covert message	4.2488	0.0361
Random string	4.2466	0.0383
Maximum entropy	10.0223	—

From the results presented, it is easily visible that the entropies of the covert message and of a random string are almost indistinguishable both for the T-function and AKARI-1 schemes. Moreover, as the standard deviation is almost negligible, the chance that an IDS/IPS tool will classify the traffic as suspicious and start any further deep analysis is very low.

K-S test

As previously, we also applied the Kolmogorov-Smirnoff test for two samples to check the similarity between the underlying distributions under the same assumptions.

In our consideration, one sample is the covert message, and the other one is the randomly generated string using the uniform distribution. The summary of the results obtained is presented in Tables 5 (for the T-function) and 6 (for AKARI-1).

Table 5: Kolmogorov-Smirnoff test analysis for T-function for all 25 considered messages

Mean p-value	Std. Dev.
0.5929	0.3461

Table 6: Kolmogorov-Smirnoff test analysis for AKARI-1 function for all 25 considered messages

Mean p-value	Std. Dev.
0.6603	0.2381

It is worth noting that neither mean p-value is below our threshold of 0.05 (even considering the quite large deviations): the mean p-value is 0.5929 for the T-function and 0.6603 for AKARI-1. Thus, we cannot reject the null hypothesis that the covert message and the random string are drawn from the same distribution. This means that an IDS/IPS tool should not consider the inspected traffic suspicious.
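The two-sample test described above can be sketched without external libraries. This is a minimal illustration, assuming the samples are sequences of byte values and using the standard asymptotic approximation of the p-value, not necessarily the exact routine used in our experiments; all function names are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value (tail of the Kolmogorov distribution)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the tail probability is indistinguishable from 1 here
    tail = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
               for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * tail))


def same_distribution_plausible(a, b, alpha=0.05):
    """True if the null hypothesis is NOT rejected at level alpha."""
    d = ks_statistic(a, b)
    return ks_pvalue(d, len(a), len(b)) >= alpha
```

A call such as `same_distribution_plausible(list(covert), list(os.urandom(len(covert))))` would correspond to one of the 25 comparisons, with `covert` standing for a covert message as a bytes object.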

5.2.3 Robustness. This encoding scheme provides no robustness, i.e., there is no protection against accidental or malicious message tampering.

5.3 Proof-of-Work consideration

In schema Sch-1, we introduce the so-called intentional covert receiver effort, an idea based on the Proof-of-Work concept from Blockchain. The reasoning behind bringing this concept into the information hiding domain is as follows. Network traffic is scanned for potential anomalies such as malware, attackers' activities, and many others, including hidden messages that are part of steganographic communication, e.g., to control a botnet. The inspecting solutions (e.g., IDS/IPS systems) must scan huge amounts of data in real time. If we introduce a mechanism that forces the IDS/IPS to spend a certain amount of time on each portion of data to make it meaningful, then identifying suspicious packets becomes much more difficult, as the software can no longer operate in real time.

In our schema, we introduce a mechanism based on key search: to restore the original message, a specific decryption key needs to be found. As discussed at the beginning of Section 4, the decryption key can be drawn from a certain alphabet and can follow a certain pattern; in our consideration, we applied (a) a digits-only alphabet and (b) a pattern with a certain number of leading 0's. In Bitcoin, each block is supposed to be added (i.e., the proof-of-work task completed) in about 10 minutes, and the task difficulty is adjusted over time to keep up with technology evolution (including the use of GPUs or ASICs). Obviously, in our scenario, there is no need for such a sophisticated adjustment. Still, we can make a rough time estimate assuming a single core of a modern processor produced in the last 2-3 years. On that basis, we made a quick evaluation to see which key pattern gives which delay. The results are summarized in Table 7.

Table 7: Time needed to calculate the specific number of hashes

Hashes number	Average time [s]	StdDev [s]
10⁴	0.0005	0.0030
10⁵	0.1460	0.0112
10⁶	1.5463	0.0202
10⁷	15.4655	0.1569
10⁸	156.2532	0.6384
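A rough way to obtain measurements of this kind is sketched below. The hash function (SHA-256) and the 8-digit candidate format are assumptions made for illustration; the absolute timings depend on the machine and the Python runtime and are not expected to match Table 7.

```python
import hashlib
import time


def time_hashes(count: int) -> float:
    """Return the seconds needed to hash `count` 8-digit candidate strings."""
    start = time.perf_counter()
    for i in range(count):
        # Each candidate is a zero-padded decimal string, e.g. "00000042".
        hashlib.sha256(f"{i:08d}".encode("ascii")).digest()
    return time.perf_counter() - start


if __name__ == "__main__":
    for exponent in range(4, 7):  # 10^4 .. 10^6 keeps the run short
        elapsed = time_hashes(10 ** exponent)
        print(f"10^{exponent} hashes: {elapsed:.4f} s")
```

Repeating each measurement several times and averaging would give the mean and standard deviation columns.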

The results show that up to 10⁴ hashes the time is close to 0, whereas from 10⁵ hashes onward it increases linearly with very small deviations. From our key search perspective, this yields the time ranges summarized in Table 8.

Table 8: Time needed to find a key of a specific pattern (X - any digit, Y - any digit except 0)

Name	Key pattern	Min time [s]	Max time [s]
A	000YXXXX	0.00	0.14
B	00YXXXXX	0.14	1.54 - 0.14 = 1.40
C	0YXXXXXX	1.54	15.46 - 1.54 = 13.92
D	YXXXXXXX	15.46	156.25 - 15.46 = 140.79

The results presented in Table 8 show that we have considerable flexibility in controlling the time the receiver needs to restore the message, keeping in mind that detection software will need the same time to discover the message. In most cases, pattern B will be good enough to reduce the risk of detection to a value close to 0, and pattern C gives almost certainty (assuming an oblivious scenario where the attacker does not expect a secret message in the traffic). To show the magnitude of the different ranges, a visual representation is presented in Figure 2.

Figure 2: Time ranges to find a key of a specific pattern

Finally, since the ranges 1.5 s to 13.9 s (C) and 15.4 s to 140.8 s (D) can be seen to be quite wide, we can introduce subclasses like 0YZXXXXX, where Z can be only certain digits, e.g., 5-9, and by that control the time even more strictly.
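The pattern classes and subclasses discussed above can be sketched as a key search over all candidates matching a pattern. This is a minimal illustration under stated assumptions: candidates are enumerated in increasing numeric order, the digit set for Z is a parameter, and the way a candidate is verified (which depends on the scheme in Section 4) is represented by a caller-supplied predicate. All names are hypothetical.

```python
import hashlib
from itertools import product

DIGITS = "0123456789"


def candidates(pattern: str, z_digits: str = "56789"):
    """Yield all keys matching the pattern in increasing order.

    '0' is a literal zero, 'X' any digit, 'Y' any digit except 0,
    'Z' a digit from z_digits (the subclass restriction).
    """
    choices = {
        "0": "0",
        "X": DIGITS,
        "Y": DIGITS[1:],
        "Z": "".join(sorted(z_digits)),
    }
    pools = [choices[symbol] for symbol in pattern]
    for digits in product(*pools):
        yield "".join(digits)


def find_key(pattern, is_valid, z_digits="56789"):
    """Return (key, attempts) for the first candidate accepted by is_valid."""
    attempts = 0
    for key in candidates(pattern, z_digits):
        attempts += 1
        if is_valid(key):
            return key, attempts
    return None, attempts


if __name__ == "__main__":
    secret = "00000321"  # hypothetical short key so the demo finishes quickly
    target = hashlib.sha256(secret.encode("ascii")).hexdigest()

    def check(key: str) -> bool:
        # Assumed verification: compare a SHA-256 digest of the candidate.
        return hashlib.sha256(key.encode("ascii")).hexdigest() == target

    print(find_key("00000YXX", check))
```

Restricting Z shrinks the candidate set for a class, which is how a subclass narrows the time range the receiver (and any inspecting software) has to spend.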

5.4 Results summary

In the analysis results, we can observe that the main properties have been delivered at a similar level for all the considered schemas:

bandwidth which is n + b where n is the size of the plain message and b is a small constant (12 or 40) depending on the key size and the scenario (PoW vs. non-PoW scenario),
stealthiness which is achieved on an acceptable level following the entropy calculation and K-S test results,
robustness which is simply not considered at this stage of research.

Besides analyzing the main properties, for one scenario, we also introduce intentional effort on the receiver's side based on the proof-of-work concept. The proposed mechanism works as expected and also provides a flexible way of controlling the time required by the receiver to restore the message. Finally, both schemas, whether based on standard encryption (DES) or on a lightweight PRNG, meet the expectations.

In summary, the current results are very promising and give many possibilities for further investigation. Some of our future research plans are described in the next section.

6 CONCLUSIONS

In this paper, we analyzed randomization and encryption of messages as separate but similar concepts and considered them in the context of information hiding. Moreover, we propose a scheme that avoids key exchange, which is inspired by proof-of-work approaches and comes at the cost of “key mining” on the covert sender or receiver side. To reduce computational effort without compromising the randomness of the resulting message, we also introduce a mechanism that substitutes symmetric block encryption algorithms with T-functions. Finally, we implement both solutions, and based on the results of the performance evaluation, we show that randomization without encryption can be achieved at a smaller cost.

Our future work will be focused on the following areas:

exploring more schemes with proof-of-work both on the covert sender and covert receiver side,
exploring other Blockchain-related techniques (e.g., Smart Contracts) including leveraging Blockchain platforms themselves,
exploring other available hashing, cryptographic, and PRNG functions for this purpose,
performing more experiments to better characterize performance differences,
presenting a more in-depth analysis of the obtained output leveraging, e.g., AI techniques.

REFERENCES

[1] Mohammed M. Alani. 2010. Testing randomness in ciphertext of block-ciphers using DieHard tests. Int. J. Comput. Sci. Netw. Secur. 10, 4 (2010), 53–57.
[2] Krzysztof Cabaj, Luca Caviglione, Wojciech Mazurczyk, Steffen Wendzel, Alan Woodward, and Sebastian Zander. 2018. The New Threats of Information Hiding: The Road Ahead. IT Professional 20, 3 (2018), 31–39. https://doi.org/10.1109/MITP.2018.032501746
[3] Luca Caviglione and Wojciech Mazurczyk. 2022. Never Mind the Malware, Here's the Stegomalware. IEEE Security & Privacy 20, 5 (2022), 101–106. https://doi.org/10.1109/MSEC.2022.3178205
[4] William Feller. 1968. An Introduction to Probability Theory and Its Applications vol. 1 (3rd ed.). John Wiley & Sons, Hoboken, NJ.
[5] Philippe Flajolet and Andrew M. Odlyzko. 1989. Random Mapping Statistics. In Advances in Cryptology - EUROCRYPT ’89, Workshop on the Theory and Application of Cryptographic Techniques, Houthalen, Belgium, April 10-13, 1989, Proceedings (Lecture Notes in Computer Science, Vol. 434), Jean-Jacques Quisquater and Joos Vandewalle (Eds.). Springer, Berlin, 329–354. https://doi.org/10.1007/3-540-46885-4_34
[6] Markus Jakobsson and Ari Juels. 1999. Proofs of Work and Bread Pudding Protocols (Extended Abstract). In Secure Information Networks: Communications and Multimedia Security IFIP TC6/TC11 Joint Working Conference on Communications and Multimedia Security (CMS’99) September 20–21, 1999, Leuven, Belgium, Bart Preneel (Ed.). Springer US, Boston, MA, 258–272. https://doi.org/10.1007/978-0-387-35568-9_18
[7] Alexander Klimov and Adi Shamir. 2002. A New Class of Invertible Mappings. In Cryptographic Hardware and Embedded Systems - CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, August 13-15, 2002, Revised Papers (Lecture Notes in Computer Science, Vol. 2523), Burton S. Kaliski Jr., Çetin Kaya Koç, and Christof Paar (Eds.). Springer, Berlin, 470–483. https://doi.org/10.1007/3-540-36400-5_34
[8] Honorio Martín, Enrique San Millán, Luis Entrena, Julio César Hernández Castro, and Pedro Peris-Lopez. 2011. AKARI-X: A pseudorandom number generator for secure lightweight systems. In 17th IEEE International On-Line Testing Symposium (IOLTS 2011), 13-15 July, 2011, Athens, Greece. IEEE Computer Society, New York, NY, 228–233. https://doi.org/10.1109/IOLTS.2011.5994534
[9] Wojciech Mazurczyk, Krystian Powójski, and Luca Caviglione. 2019. IPv6 Covert Channels in the Wild. In Proceedings of the Third Central European Cybersecurity Conference (Munich, Germany) (CECC 2019). Association for Computing Machinery, New York, NY, USA, Article 10, 6 pages. https://doi.org/10.1145/3360664.3360674
[10] Wojciech Mazurczyk, Steffen Wendzel, Sebastian Zander, Amir Houmansadr, and Krzysztof Szczypiorski. 2016. Information Hiding in Communication Networks: Fundamentals, Mechanisms, Applications, and Countermeasures (1st ed.). Wiley-IEEE Press, Hoboken, NJ.
[11] Alfred Menezes, Paul C. van Oorschot, and Scott A. Vanstone. 1996. Handbook of Applied Cryptography. CRC Press, Boca Raton, FL. https://doi.org/10.1201/9781439821916
[12] Aleksandra Mileva and Boris Panajotov. 2014. Covert channels in TCP/IP protocol stack - extended version-. Open Computer Science 4, 2 (2014), 45–66. https://doi.org/10.2478/s13537-014-0205-6
[13] Paweł Rajba and Wojciech Mazurczyk. 2021. Information Hiding Using Minification. IEEE Access 9 (2021), 66436–66449. https://doi.org/10.1109/ACCESS.2021.3077197
[14] David Salomon. 2007. Data compression - The Complete Reference, 4th Edition. Springer, Berlin.
[15] Robert Sedgewick and Philippe Flajolet. 1996. An introduction to the analysis of algorithms. Addison-Wesley-Longman, Upper Saddle River, NJ.
[16] Juan Soto. 1999. Randomness testing of the advanced encryption standard candidate algorithms. US Department of Commerce, Technology Administration, National Institute of Standards and Technology, Washington, D.C.
[17] Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, and Tomoyasu Suzaki. 2007. Cryptanalysis of Mir-1: A T-Function-Based Stream Cipher. IEEE Trans. Inf. Theory 53, 11 (2007), 4377–4383. https://doi.org/10.1109/TIT.2007.907340
[18] Sebastian Zander, Grenville Armitage, and Philip Branch. 2007. A survey of covert channels and countermeasures in computer network protocols. IEEE Communications Surveys & Tutorials 9, 3 (2007), 44–57. https://doi.org/10.1109/COMST.2007.4317620

FOOTNOTE

¹This subsection is based on [11].

²AKARI-X [8] is a pseudo-random number generator with a bijective state-transition function, which can be considered as an encryption function with a fixed key.

This work is licensed under a Creative Commons Attribution International 4.0 License.

ARES 2023, August 29–September 01, 2023, Benevento, Italy