4.2 Packet Structure Inference
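The official document [22] provides a formula to calculate the number of symbols in a packet:
\[ n_{sym} = 8 + \max \left( \left\lceil \frac{2PL - SF + 7 + 4CRC - 5IH}{SF - 2DE} \right\rceil \cdot \frac{4}{CR},\ 0 \right), \tag{6} \]
where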
\(PL\) is the number of payload bytes,
\(SF\) is spreading factor,
\(CRC\) is 1 if CRC check is enabled and otherwise 0,
\(IH\) is 1 if implicit header mode (without header) is enabled and 0 if explicit mode (with header) is enabled, and
\(DE\) is 1 if LDRO is enabled and otherwise 0.
We can derive the following properties from Equation (6):
• A LoRa packet has at least eight data symbols, as \(n_{sym} \ge 8\).
• \(2PL + 4CRC\): The \(n_{sym}\) of a packet with \(PL\) payload bytes and CRC check enabled is equal to the \(n_{sym}\) of a packet with \(PL+2\) payload bytes and CRC check disabled. We can infer that the CRC takes 2 bytes in the packet.
• \(2PL + 4CRC - 5IH\): As with the CRC, we can infer that the header takes 2.5 bytes.
• \(SF - 2DE\): SF is the number of bits in a data symbol. Enabling LDRO reduces each data symbol by two bits.
• \(\frac{4}{CR}\): The number of data symbols increases in units of \(\frac{4}{CR}\) (CR=\(\frac{4}{5}, \frac{4}{6}, \frac{4}{7}\), or \(\frac{4}{8}\)). For example, \(CR=\frac{4}{7}\) means the packet applies the (7,4) Hamming code, and \(n_{sym}\) has the form \(\frac{4}{CR}n+8 = 7n+8 (n=0, 1, 2,\dots)\).
Based on these properties, we then manipulate the data packets to infer the packet structure. We first gradually increase the packet size and observe a stair-like increase in the number of data symbols in real signals. If \(CR=\frac{4}{7}\), then the number of symbols \(n_{sym}\) increases following an arithmetic sequence (i.e., \(8,15,22,\dots\)). We also observe that when the packet size keeps increasing, the newly added bytes are encoded in the last \(\frac{4}{CR}\) symbols, and the first \(\frac{4}{CR}(n-1)+8\) symbols do not change. This reveals that data bytes are encoded in blocks of \(\frac{4}{CR}\) symbols.
Further, we find that the first eight symbols are treated specially. In both explicit and implicit header mode, the values of the first eight symbols always have the form \(4k+1(k=0,1,2,\dots)\), which means the last two bits of data are discarded and each symbol carries \(SF - 2\) bits of data. We can infer that the first eight symbols are in LDRO mode, i.e., \(DE=1\). LDRO gives better protection, and the header may be in the first eight symbols. We know that LoRa uses coding rate \(CR\) to protect the payload, and the payload bytes are encoded in blocks of \(\frac{4}{CR}\) symbols. Similarly, the first eight symbols form an eight-symbol block, and we guess that it uses coding rate \(\frac{4}{8}\) to give the header the highest protection (4 parity bits for a nibble). We also observe that the first eight symbols encode payload bits in implicit header mode. Such a design can reduce the complexity of the decoding hardware, because the explicit and implicit header modes can be processed with the same procedure.
Figure 8 shows the inferred packet structure of a packet with LDRO mode disabled/enabled. In LDRO mode, the last two bits in each symbol are discarded. Regardless of whether LDRO mode is enabled or disabled, the first eight symbols are in LDRO mode and use coding rate \(\frac{4}{8}\). The following symbols are in blocks containing \(\frac{4}{CR}\) symbols. The first four symbols encode data bits while the remaining symbols encode parity bits.
Packet Structure Verification. We verify the above packet structure by comparing Equation (6) with the symbol count calculated from the inferred packet structure. For \(SF\ge 7\), the first eight symbols contain \(8\cdot (SF-2) \cdot \frac{4}{8} = 4SF - 8 \ge 20\) data bits. Therefore, the first eight symbols can always hold all header information. Apart from the header, the first eight symbols can contain \(4(SF - 2) - 20(1-IH) = 4SF +20IH -28\) payload bits. Suppose there are \(PL\) bytes (\(8PL\) bits) of payload in total. If CRC is enabled, then an additional 16 bits are needed. Excluding the first eight symbols, the total number of data bits required in the remaining symbols is \(8PL + 16CRC - 20IH - 4SF + 28\). Whitening and Gray decoding do not affect the total number of bits or the total number of symbols. The interleaving operation only requires us to pad redundant bytes when the remaining payload bytes cannot fill a block, and thus does not affect the total number of symbol blocks. Each block has \(\frac{4}{CR}\) symbols, and each symbol in the block can encode \(SF-2DE\) bits. Because of the parity bits, only \(4(SF-2DE)\) bits in a block are actual data bits. Then the total number of symbols in the packet is
\[ n_{sym} = 8 + \max \left( \left\lceil \frac{8PL + 16CRC - 20IH - 4SF + 28}{4(SF - 2DE)} \right\rceil \cdot \frac{4}{CR},\ 0 \right). \tag{7} \]
Dividing the numerator and the denominator of formula (7) by 4 shows that it is the same as Equation (6), which supports the inferred packet structure.
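The agreement between the two symbol counts can also be checked numerically. The following is a minimal sketch and not part of the original analysis: the function names are hypothetical, the closed form is written in the shape that the terms discussed above imply, and the coding rate is passed as the integer `cr_den` (5 to 8) so that \(\frac{4}{CR}\) equals `cr_den`.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for an integer a and a positive integer b."""
    return -(-a // b)


def n_sym_formula(pl, sf, cr_den, crc=1, ih=0, de=0):
    """Symbol count from the closed form (terms 2PL + 4CRC - 5IH and SF - 2DE)."""
    blocks = max(ceil_div(2 * pl - sf + 7 + 4 * crc - 5 * ih, sf - 2 * de), 0)
    return 8 + blocks * cr_den


def n_sym_structure(pl, sf, cr_den, crc=1, ih=0, de=0):
    """Symbol count rebuilt from the inferred packet structure."""
    first_block_bits = 4 * (sf - 2)   # 8 symbols, SF-2 bits each, rate 4/8
    header_bits = 20 * (1 - ih)       # 2.5-byte header in explicit mode only
    remaining = 8 * pl + 16 * crc - (first_block_bits - header_bits)
    blocks = max(ceil_div(remaining, 4 * (sf - 2 * de)), 0)
    return 8 + blocks * cr_den


def counts_agree() -> bool:
    """Compare both counts over all parameter combinations."""
    return all(
        n_sym_formula(pl, sf, cr, crc, ih, de)
        == n_sym_structure(pl, sf, cr, crc, ih, de)
        for sf in range(7, 13)
        for cr in (5, 6, 7, 8)
        for crc in (0, 1)
        for ih in (0, 1)
        for de in (0, 1)
        for pl in range(256)
    )


# Staircase for SF = 7, CR = 4/7, implicit header, no CRC.
staircase = sorted({n_sym_formula(pl, 7, 7, crc=0, ih=1) for pl in range(1, 8)})
print(counts_agree(), staircase)  # True [8, 15, 22]
```

The same helper reproduces the CRC property listed earlier: `n_sym_formula(pl, sf, cr_den, crc=1)` equals `n_sym_formula(pl + 2, sf, cr_den, crc=0)` for any payload length.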
4.3 Order of Decoding Operations
In this section, we reveal the order of the four main processes in LoRa decoding: Gray coding (G), deinterleaving (I), Hamming decoding (H), and dewhitening (W). Gray coding should be the first step, because it is intended to solve the adjacent symbol drift problem (see Gray Coding in Section 4.4). Hamming decoding operates on a byte stream, while a LoRa symbol contains \(SF-2DE\) bits, so it relies on the byte stream produced by deinterleaving. Thus, the order of “G, I, H” should be “G\(\rightarrow\)I\(\rightarrow\)H.” Hence, dewhitening has three possible positions, i.e., after “G,” “I,” or “H.” The implementations of BR, RPP0, and TAPP assume these three different dewhitening positions, respectively.
We prove that the position of dewhitening does not affect the decoding results. Because the encoding order is the reverse of the decoding order, we use the encoding operations to show the influence of the position of “W.” Consider the data as a bit vector \(D\); interleaving is then an operation that rearranges the positions of bits. We can represent interleaving as a matrix \(I\) in which each row and each column contains exactly one “1.” Hamming coding, as a linear coding method, can be represented as a matrix \(H\). We have \(H\cdot D = [P\cdot D ~~ D]\), where \(P\) is the parity matrix. Whitening is \(W \oplus D\), where \(W\) is a random bit vector and \(\oplus\) is exclusive-or (XOR). The following are the three possible orders of decoding and the corresponding encoding operations:
\[ \text{G} \rightarrow \text{W} \rightarrow \text{I} \rightarrow \text{H}: \quad W_1 \oplus (I \cdot H \cdot D), \tag{8a} \]
\[ \text{G} \rightarrow \text{I} \rightarrow \text{W} \rightarrow \text{H}: \quad I \cdot (W_2 \oplus H \cdot D), \tag{8b} \]
\[ \text{G} \rightarrow \text{I} \rightarrow \text{H} \rightarrow \text{W}: \quad I \cdot H \cdot (W_3 \oplus D), \tag{8c} \]
where \(W_1\), \(W_2\), and \(W_3\) are three different random bit vectors. Formula (8b) can be rewritten as
\[ (I \cdot W_2) \oplus (I \cdot H \cdot D). \tag{9} \]
Therefore, formula (8a) and formula (9) are equivalent (let \(W_1 = I\cdot W_2\)). Formula (8c) can be rewritten as
\[ I \cdot (H \cdot W_3 \oplus H \cdot D) = I \cdot ([P\cdot W_3 ~~ W_3] \oplus H \cdot D). \tag{10} \]
Formula (10) is a special case of formula (8b) (let \(W_2 = [P\cdot W_3 ~~ W_3]\)). Therefore, both “I\(\rightarrow\)W\(\rightarrow\)H” and “I\(\rightarrow\)H\(\rightarrow\)W” can be represented by “W\(\rightarrow\)I\(\rightarrow\)H,” which means the position of “W” does not affect the final decoding results. We will show the correct position for “W” in Section 4.4.
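The two identities used in this argument can be checked on random bit vectors: XOR commutes with a bit permutation, and a linear code maps the XOR of two inputs to the XOR of their codewords. The sketch below is only an illustration; the parity relations, the permutation, and all names are arbitrary choices of ours rather than the LoRa configuration.

```python
import random


def hamming_encode(d):
    """Linear code [P*D  D]: four parity bits followed by the data bits.
    The parity relations are an arbitrary linear choice for illustration."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4, d1 ^ d2 ^ d3,
            d1, d2, d3, d4]


def permute(bits, perm):
    """Interleaving as a permutation: output[k] = bits[perm[k]]."""
    return [bits[p] for p in perm]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def whitening_position_is_equivalent(trials=1000, seed=0):
    rng = random.Random(seed)
    perm = list(range(8))
    rng.shuffle(perm)
    for _ in range(trials):
        d = [rng.randint(0, 1) for _ in range(4)]
        w2 = [rng.randint(0, 1) for _ in range(8)]
        w3 = [rng.randint(0, 1) for _ in range(4)]
        code = hamming_encode(d)
        # Whitening before interleaving equals whitening after interleaving
        # with the permuted sequence.
        if permute(xor(w2, code), perm) != xor(permute(w2, perm), permute(code, perm)):
            return False
        # Whitening the data before encoding equals whitening the codeword
        # with the encoded whitening vector.
        if hamming_encode(xor(w3, d)) != xor(hamming_encode(w3), code):
            return False
    return True


print(whitening_position_is_equivalent())  # True
```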
For the convenience of analysis, we first assume the decoding order is “G\(\rightarrow\)W\(\rightarrow\)I\(\rightarrow\)H.” Because the even parity of all zeros is zero and interleaving in LoRa is just a diagonal realignment of all bits, the codewords output by “H” and “I” in the encoder are all zeros when the input bits are all zeros. Here we assume that LoRa adopts the commonly used even-parity check, and the final results show that this assumption is correct. “W” is the XOR of the data values and a pseudo-random sequence. When the decoding order is “G\(\rightarrow\)W\(\rightarrow\)I\(\rightarrow\)H,” as \(x \oplus 0 = x\), if we set all transmitted bits to zeros, then the output values after “G” are the dewhitening sequence. We can then use the derived dewhitening sequence to recover the intermediate results before “I” and “H” for further analysis.
4.4 Configuration of Each Operation
This section reveals the key configurations of the four operations in LoRa decoding: Gray coding, deinterleaving, Hamming decoding, and dewhitening. We also show how to reveal the header structure and CRC polynomial.
Gray Coding. Gray coding, a mapping from a bit vector to a binary representation, is widely used in wireless communication systems. Adjacent Gray code representations differ in only one bit. In wireless communication systems, a symbol is more likely to be misidentified as an adjacent symbol than as a random one. For example, in LoRa modulation, adjacent bin drift is common, as described in Section 3.2. With Gray coding, the bit error caused by adjacent misidentification is reduced to one bit per symbol, which is very likely to be corrected by the error correction mechanism. The standard Gray coding can be expressed as
\[ v = v_0 \oplus (v_0 \gg 1), \tag{11} \]
where \(v_0\) represents the raw demodulated bin value, \(\gg\) is the bitwise right shift operation, and \(v\) is the Gray coding output. However, when we continue our analysis based on natural symbol mapping and standard Gray coding, we cannot decode packets with a 100% success rate. Project RPP0 suffers from this problem and still contains an unfixed bug.\(^{7}\) To understand the reason, let us recall the LDRO mode of LoRa. When LDRO is enabled, the encoder puts data into the high \(SF-2\) bits of a symbol and then adds “1” to reduce the influence of bin drift. From the hardware perspective, it is reasonable to add “1” under any condition, because this saves the circuit cost of judging whether LDRO is enabled. We guess that all symbols output by the encoder have a one-bin shift. If this is true, then we should subtract “1” from the demodulation results before applying Gray coding. Our guess is supported by the decoding results. The process of “G” in LoRa decoding can be modified as
\[ v = (v_0 - 1) \oplus ((v_0 - 1) \gg 1). \tag{12} \]
Note that TAPP applies a similar operation on Gray coding, but explains it as a different Gray coding mechanism and uses a brute-force algorithm to derive the additional “one.” However, it seems unnatural to use a non-standard Gray coding, and our explanation is more reasonable.
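The Gray step with the one-bin correction can be written in a few lines. This is a minimal sketch with hypothetical function names; wrapping the subtraction modulo \(2^{SF}\) for a demodulated bin value of 0 is our assumption and is not stated in the text.

```python
def gray(v: int) -> int:
    """Standard Gray coding: v XOR (v >> 1)."""
    return v ^ (v >> 1)


def lora_gray(v0: int, sf: int) -> int:
    """Remove the one-bin shift, then apply standard Gray coding.
    The modulo 2**sf wrap-around for v0 == 0 is an assumption of this sketch."""
    return gray((v0 - 1) % (1 << sf))


def popcount(x: int) -> int:
    return bin(x).count("1")


# A drift to the adjacent bin flips exactly one bit after Gray coding.
assert all(popcount(gray(v) ^ gray(v + 1)) == 1 for v in range(255))

# Symbols of the form 4k + 1 keep their two low bits at zero after the correction.
assert all(((4 * k + 1) - 1) % 4 == 0 for k in range(64))

print(lora_gray(1, 8), lora_gray(2, 8), lora_gray(3, 8))  # 0 1 3
```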
Deinterleaving. Interleaving is used to reduce the impact of burst errors. With interleaving, the errors are distributed over multiple bit groups and can be corrected by forward error correction (FEC). It is mentioned [26] that LoRa applies diagonal interleaving instead of conventional row-line interleaving. Figure 9(a) shows the column-major order row-line interleaving of eight symbols. For row-line interleaving, the LSBs (\(b_{i,1}\)) of the eight symbols are assembled into a byte. From Section 3, we know that the LSBs of a symbol are more fragile than the most significant bits (MSBs) of the symbol. From the perspective of FEC, it is a bad design to group fragile bits together. Figure 9(b) shows the diagonal interleaving used in LoRa. Diagonal interleaving distributes the fragile LSBs into different bytes and is more robust. We then manipulate the transmitted packets to derive the detailed diagonal mapping. As an example, we send packets with \(SF=8\) and \(CR = \frac{4}{8}\) in implicit header mode. Thus, the interleaving block is an 8 \(\times\) 8 block as shown in Figure 9(b). First, we assume the FEC used in \(CR=\frac{4}{8}\) is the standard (7, 4) Hamming code with a one-bit extension. Therefore, after “G” and “W,” the codeword for nibble “0000” is “00000000,” and the codeword for “1111” is “11111111.”\(^{8}\) Suppose the transmitted bytes are all zeros except that the fourth byte is 0x0F; we then observe \(b_{11}=b_{22}=\cdots =b_{88}=1\) in Figure 9(b). Therefore, the main diagonal represents the fourth byte 0x0F. The one-bin shift problem mentioned in Gray coding shows up here: we cannot always get eight ones in a block if we directly apply the standard Gray coding. Shifting the mapping by one solves this problem and perfectly matches our subsequent decoding process. By changing the position of the all-1 data bits in the transmitted packet, we can derive the entire mapping for interleaving as shown in Figure 9(b). For other parameters, the deinterleaving process is similar. The only difference is that the block size becomes \(\frac{4}{CR}\times (SF-2DE)\). As a result, we summarize the deinterleaving process as
where \(c_{i,j}\) is the \(j{\rm th}\) bit of the \(i{\rm th}\) codeword after deinterleaving, \(b_{j,i}\) is the \(i{\rm th}\) bit of the \(j{\rm th}\) symbol after Gray coding, and \(i \in \lbrace 1, 2, \ldots , \frac{4}{CR}\rbrace , j \in \lbrace 1, 2, \ldots , SF-2DE\rbrace\).
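To illustrate why a diagonal layout spreads the fragile LSBs across codewords, the sketch below implements one possible diagonal interleaver and its inverse. It is a hedged illustration only: indices are 0-based, and the rotation direction and the choice of which codeword lands on the main diagonal are our assumptions, not the mapping of Figure 9(b).

```python
import random


def interleave(codewords):
    """codewords: n_cw rows (SF - 2DE of them), each with cw_len bits (4/CR).
    Returns cw_len symbols of n_cw bits; each codeword lies along a diagonal."""
    n_cw, cw_len = len(codewords), len(codewords[0])
    return [[codewords[(k - j) % n_cw][j] for k in range(n_cw)]
            for j in range(cw_len)]


def deinterleave(symbols, n_cw):
    """Inverse of interleave: bit j of codeword i is bit (i + j) mod n_cw of symbol j."""
    cw_len = len(symbols)
    return [[symbols[j][(i + j) % n_cw] for j in range(cw_len)]
            for i in range(n_cw)]


# 8 x 8 block (SF = 8, CR = 4/8): one all-ones codeword, all others zero.
block = [[1] * 8] + [[0] * 8 for _ in range(7)]
symbols = interleave(block)
on_diagonal = all(symbols[j][k] == (1 if j == k else 0)
                  for j in range(8) for k in range(8))

# Bit 0 of each of the eight symbols comes from a different codeword.
lsb_sources = {(0 - j) % 8 for j in range(8)}

rng = random.Random(1)
random_block = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
round_trip = deinterleave(interleave(random_block), 8) == random_block

print(on_diagonal, len(lsb_sources), round_trip)  # True 8 True
```

With this layout, a burst that corrupts the same bit position in consecutive symbols damages at most one bit per codeword in the 8 \(\times\) 8 case, which is the property the text attributes to diagonal interleaving.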
Hamming Decoding. After deinterleaving, we get the encoded data in the form of codewords, but the positions of data bits and parity bits in a codeword are still unknown. LoRa provides four valid CR values (\(\frac{4}{5}, \frac{4}{6}, \frac{4}{7}, ~\text{and}~ \frac{4}{8}\)), which determine the codeword length and thus the FEC strength. When CR is set to \(\frac{4}{7} ~\text{or}~ \frac{4}{8}\), LoRa applies the Hamming \((7,4)\) code or the extended Hamming code with one additional parity bit. When CR is set to \(\frac{4}{5} ~\text{or}~ \frac{4}{6}\), LoRa can detect bit errors but cannot correct them. First, we assume that a nibble with code rate \(\frac{4}{8}\) is protected by the standard Hamming \((8,4)\) code. However, our final analysis results show that this assumption requires some modifications. To determine the position of each data/parity bit in the codeword, we vary the transmitted bytes while keeping one specific bit fixed. For example, to test the position of the LSB of the fourth byte, we set the transmitted bytes to 0x01, 0x03, 0x05, \(\cdots\), 0x0F.\(^{9}\) Then the bit that is always 1 in the interleaving block is our target, i.e., the LSB in this case. We find that the four LSBs in the codeword are data bits while the four MSBs are parity bits, which differs from the standard Hamming \((8,4)\) code \(p_1p_2d_1p_3d_2d_3d_4p_4\), where \(d_i\) is the \(i\)th data bit and \(p_i\) is the \(i\)th parity bit. Equation set (14) shows the relation between data bits and parity bits in standard Hamming \((8,4)\) code,
We denote the four LSBs as \(d_1, d_2, d_3, d_4\) sequentially. To find which bit in the MSBs is \(p_i\), we vary the data bits while keeping \(p_i = 1\). For example, when selecting the codewords with \(d_1 \oplus d_2 \oplus d_4 = 1\), the parity bit that always equals “1” is the parity bit \(p_1\). The positions of \(p_2\) and \(p_3\) are derived similarly. However, the remaining parity bit does not fit the definition of \(p_4\). After careful observation, we find that it is a parity covering \(d_1, d_2\), and \(d_3\), i.e.,
\[ p_4 = d_1 \oplus d_2 \oplus d_3. \tag{15} \]
For other code rates, we can derive the bit positions similarly. When \(CR = \frac{4}{7}\) is used, parity bit \(p_1\) is abandoned. When \(CR = \frac{4}{6}\) is used, parity bits \(p_1\) and \(p_2\) are abandoned. When \(CR = \frac{4}{5}\) is used, there is only one parity bit, and it is natural to use \(p_4\) to cover all bits. The conclusion does not change when SF varies. Figure 10 shows the bit positions for different code rates. Note that because LoRa applies a non-standard Hamming code, the numbering of the parity bits is arbitrary, and our naming is just one possible choice.
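The parity relations described above can be turned into a small encoder and a brute-force decoder. This is a sketch under stated assumptions: \(p_1\) and the three-bit parity \(p_4\) follow the text, \(p_2\) and \(p_3\) are assumed to be the usual Hamming relations, the bit order inside the codeword (data first, then the kept parity bits) is illustrative rather than the layout of Figure 10, and `cr_den` is a hypothetical parameter holding the denominator of CR.

```python
from itertools import product


def encode(nibble, cr_den):
    """Return the data bits followed by the parity bits kept for CR = 4/cr_den."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4          # assumed standard relation
    p3 = d2 ^ d3 ^ d4          # assumed standard relation
    p4 = d1 ^ d2 ^ d3          # the parity over d1, d2, d3 observed in the text
    if cr_den == 5:
        parity = [d1 ^ d2 ^ d3 ^ d4]             # single parity over all data bits
    else:
        parity = [p1, p2, p3, p4][8 - cr_den:]   # 4/7 drops p1, 4/6 drops p1 and p2
    return list(nibble) + parity


def decode(codeword, cr_den):
    """Nearest-codeword decoding by exhaustive search over the 16 nibbles."""
    def distance(nibble):
        return sum(a != b for a, b in zip(encode(nibble, cr_den), codeword))
    return min(product((0, 1), repeat=4), key=distance)


def corrects_single_errors(cr_den):
    """True if every single-bit error in every codeword is corrected."""
    for nibble in product((0, 1), repeat=4):
        clean = encode(nibble, cr_den)
        for pos in range(len(clean)):
            noisy = clean[:]
            noisy[pos] ^= 1
            if decode(noisy, cr_den) != nibble:
                return False
    return True


print(corrects_single_errors(8), corrects_single_errors(7))  # True True
```

The exhaustive search stands in for a syndrome decoder to keep the sketch short. For \(CR = \frac{4}{5}\) and \(CR = \frac{4}{6}\) the same `encode` function applies, but a single-bit error can only be detected, not corrected.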
Dewhitening. In Section 4.3, we assumed that the dewhitening operation happens after “G” and proved that the dewhitening position does not affect the final results. Here we discuss the “correct” position for dewhitening. The processes of deinterleaving and Hamming decoding have been interpreted clearly above. We move “W” to the other two positions and send all-zero bytes to derive the whitening sequence under the different order selections. We find that “G\(\rightarrow\)W\(\rightarrow\)I\(\rightarrow\)H” gives different whitening sequences for various combinations of SF and CR, while the whitening sequences of “G\(\rightarrow\)I\(\rightarrow\)W\(\rightarrow\)H” and “G\(\rightarrow\)I\(\rightarrow\)H\(\rightarrow\)W” remain the same. Using the same sequence for all packets is more reasonable. Thus, we first exclude “G\(\rightarrow\)W\(\rightarrow\)I\(\rightarrow\)H.” Then we need to choose the correct order from “G\(\rightarrow\)I\(\rightarrow\)W\(\rightarrow\)H” and “G\(\rightarrow\)I\(\rightarrow\)H\(\rightarrow\)W.” The LoRa chip datasheet [22] mentions that the whitening sequence in FSK mode is generated by a Linear Feedback Shift Register (LFSR). We guess that an LFSR also generates the whitening sequence for LoRa mode. We apply the Berlekamp–Massey algorithm [38] to the whitening sequences of the two decoding orders to obtain the corresponding LFSR. For “G\(\rightarrow\)I\(\rightarrow\)W\(\rightarrow\)H,” no matter how we change the input order of the bits (e.g., LSB first or MSB first), the minimal LFSR size we get from the Berlekamp–Massey algorithm is at least 64. However, any sequence of \(2n\) bits can always be generated by an LFSR of at most \(n\) bits, so any 128-bit sequence can be represented as the output of a 64-bit LFSR. The LFSR of the “G\(\rightarrow\)I\(\rightarrow\)W\(\rightarrow\)H” sequence is therefore too long to be meaningful. Carefully checking the whitening sequence of “G\(\rightarrow\)I\(\rightarrow\)H\(\rightarrow\)W” as shown in Figure 11, we find that it does not look like typical random bits, and each byte seems to be the state of an LFSR. We collect the MSBs of the sequence bytes, as shown in the red box of Figure 11, and run the Berlekamp–Massey algorithm on them. The LFSR polynomial that generates this sequence is \(x^8+x^6+x^5+x^4+1\). According to the literature [39], it is a maximal-length polynomial for an 8-bit shift register. Figure 12 shows the structure of the LFSR: all eight bits of the register compose a byte of the whitening sequence, and each bit of the register is used to whiten one data bit. Since the whitening sequence of “G\(\rightarrow\)I\(\rightarrow\)H\(\rightarrow\)W” can be generated by an LFSR with a much smaller length, we consider it the correct order.
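The LFSR search can be reproduced on synthetic data. The sketch below generates a bit sequence from the recurrence associated with \(x^8+x^6+x^5+x^4+1\) and recovers its length with a textbook GF(2) Berlekamp–Massey routine. The seed and the bit ordering are arbitrary choices of ours; the sketch does not reproduce the actual LoRa whitening bytes or the register-to-byte mapping of Figure 12.

```python
def lfsr_bits(n, seed=(1, 0, 0, 0, 0, 0, 0, 0)):
    """Bits satisfying s[i] = s[i-2] ^ s[i-3] ^ s[i-4] ^ s[i-8], the recurrence
    with characteristic polynomial x^8 + x^6 + x^5 + x^4 + 1."""
    s = list(seed)
    while len(s) < n:
        s.append(s[-2] ^ s[-3] ^ s[-4] ^ s[-8])
    return s[:n]


def berlekamp_massey(s):
    """Return (L, c): the shortest LFSR length for bit list s and its connection
    coefficients, with s[i] = XOR of c[j] & s[i-j] for j = 1..L."""
    n = len(s)
    c = [0] * n
    b = [0] * n
    c[0] = b[0] = 1
    length, m = 0, -1
    for i in range(n):
        # Discrepancy between the predicted and the observed bit.
        d = s[i]
        for j in range(1, length + 1):
            d ^= c[j] & s[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, m, b = i + 1 - length, i, previous
    return length, c[:length + 1]


bits = lfsr_bits(64)
length, coeffs = berlekamp_massey(bits)
print(length, coeffs)  # 8 [1, 0, 1, 1, 1, 0, 0, 0, 1]
```

An eight-stage register recovered from 64 bits is far below the generic bound of half the sequence length, which is the kind of gap the text uses to prefer one dewhitening position over the other.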
4.5 CRC
After the four steps of Gray coding, deinterleaving, Hamming decoding, and dewhitening, the raw LoRa PHY packets are recovered. In previous sections, we sent packets in implicit header mode with the CRC check disabled. We next analyze the CRC algorithm used for the LoRa payload. From our observation in Section 4.2, the CRC checksum occupies 16 bits at the tail of a packet. We first send explicit header and implicit header packets with the same data content to check the coverage of the CRC. The results show that the CRC is computed only on the payload. Calculating the CRC checksum of a series of bits means, in theory, treating it as a large binary number and calculating the remainder with respect to a “divisor.” The divisor is usually represented as a polynomial over \(GF(2)\) (the Galois field of two elements), e.g., \(x^3+x+1\). Analyzing the CRC part of LoRa PHY therefore amounts to finding the polynomial used. A feature of CRC is that for a polynomial of degree \(n\), if the original dividend is 1, then the remainder is the polynomial itself. If we construct a packet in which only the last bit equals 1, then the CRC checksum in the last two bytes tells us exactly which polynomial is used. Meanwhile, the CRC polynomial cannot be selected arbitrarily; it must follow certain principles to meet specific requirements. Therefore, it is natural to choose the polynomial from the standard CRC polynomials. By comparing the polynomial we derive with the standard polynomial sets, it takes little effort to find the actual polynomial used. Based on the above analysis, we enable the payload CRC in implicit header mode and set the last two bytes of the payload to 0x0001, 0x0080, 0x0100, and 0x8000, respectively (other bytes are set to zero). Since we are not sure about the endianness and the LSB-MSB order in the CRC hardware implementation, we test the four possible combinations. The result, surprisingly, is beyond our expectation. Whatever we set in the last two payload bytes, the CRC checksum is exactly the same as the last two payload bytes. A common CRC implementation pads \(n\) zero bits after the data before doing the remainder calculation, to ensure that the last \(n\) bits are also under full CRC protection.\(^{10}\) The phenomenon we observed means that the zero-padding step is not implemented in LoRa PHY. Therefore, we send 0x0001, 0x0080, 0x0100, and 0x8000 at the third and fourth bytes from the end to derive the polynomial. The received CRC bytes are 0x1021, 0x9188, 0x3331, and 0x1B98, respectively. 0x1021 refers to the polynomial named CCITT-16, i.e., \(x^{16}+x^{12}+x^5+1\). We apply the CRC check with this polynomial and test packets with random bytes. The CRC checksums calculated by our implementation are consistent with the CRC bytes in all samples. Due to the characteristics of CRC, it is hard for a wrong CRC implementation to produce the same checksum as the correct one even in a small group of samples, which means our result is correct.
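A CRC without the zero-padding step is the remainder of the payload itself, which can be computed as an ordinary CRC over all but the last two bytes, XORed with those two bytes. The following sketch uses hypothetical function names and assumes that the two-byte values quoted above are written most significant byte first; it reproduces the four checksums reported in the text.

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021) -> int:
    """Bitwise CRC-16 with polynomial x^16 + x^12 + x^5 + 1, MSB first, zero init."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def payload_crc_without_padding(payload: bytes) -> int:
    """Remainder of the payload itself, i.e. without 16 appended zero bits.
    Treating the last two bytes as big-endian is an assumption of this sketch."""
    head, tail = payload[:-2], payload[-2:]
    return crc16_ccitt(head) ^ int.from_bytes(tail, "big")


# Test pattern in the third and fourth bytes from the end, zeros elsewhere.
for pattern in (0x0001, 0x0080, 0x0100, 0x8000):
    payload = bytes(4) + pattern.to_bytes(2, "big") + bytes(2)
    print(hex(payload_crc_without_padding(payload)))
# 0x1021, 0x9188, 0x3331, 0x1b98

# Pattern in the last two bytes: the checksum equals those bytes.
print(hex(payload_crc_without_padding(bytes(6) + b"\x80\x00")))  # 0x8000
```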
4.6 Header
The remaining unknown part of LoRa PHY is the header. In Section 4.2, we learned that the header in explicit mode has 20 bits. Our goal is to find the organization and meaning of these 20 bits.
Payload Length (PL) is said to be encoded in the header and occupies at least 8 bits, because the maximal payload length is 255. We vary the payload length and try to decode the packet using the same decoding process as in implicit header mode, assuming the header is also whitened. Unfortunately, we are not able to recover the data bytes in this way. The problem comes from the whitening sequence used for the header. Nonetheless, by observing the bits that change with the payload length, we can identify the position of PL. Note that in the first 2.5 bytes, only the bits of PL and the header CRC change when the packet payload length varies. We observe that the bits in positions 1–8 and 16–20 are possible PL bits. Additionally, the intermediate results show that the header is not whitened and that the first byte of a header before dewhitening is exactly PL. Our derived LFSR can only generate 255 possible whitening bytes, so there are no additional whitening bytes left for header dewhitening. Hence, it is reasonable not to whiten the packet header. Our subsequent tests for finding the other bits strengthen this assumption. Therefore, we do not apply the dewhitening operation to the header in the following. By changing one parameter and fixing the others, the positions of the code rate bits and the payload-CRC-enable bit are determined similarly. The code rate bits are \(001, 010, 011, 100\) for \(CR=\frac{4}{5}, \frac{4}{6}, \frac{4}{7}, \frac{4}{8}\), respectively. The payload-CRC-enable bit is set to 1 if the payload CRC is turned on. In our experiments, five bits change rapidly when different parameters are set. We assume these five bits are the header CRC. Three bits remain zero whatever the parameter setting is, and we consider them reserved bits. Figure 13 illustrates the header structure in the first three bytes.
We failed to find a standard CRC5 polynomial satisfying the header CRC results using the method in Section 4.5. However, thanks to the linearity of the CRC algorithm, we can use a CRC matrix to equivalently represent the CRC calculation.\(^{11}\) We denote the 12 parameter bits (PL, CR, and the payload-CRC-enable bit) as a bit vector \(v_1\). We denote the five header CRC bits as a bit vector \(v_2\). Our target is to find a matrix \(M\) satisfying \(v_2 = M\cdot v_1\). Since the value of \(v_1\) is under our control, we can design a series of \(v_1\), say, \(v_1^{(1)}, v_1^{(2)}, \ldots , v_1^{(12)}\). They form a matrix \(V_1\). The corresponding \(v_2\) series is \(v_2^{(1)}, v_2^{(2)}, \ldots , v_2^{(12)}\). They form a matrix \(V_2\). Therefore, \(V_2 = M \cdot V_1\). If \(V_1\) is an identity matrix, then \(M = V_2\).
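The matrix recovery can be sketched with a stand-in for the chip. In the code below, a hidden random \(5 \times 12\) matrix plays the role of the unknown header CRC, and all names are hypothetical; the second helper shows how linearity yields a column even when a unit vector cannot be sent directly, by XORing the responses to two inputs that differ in a single bit.

```python
import random


def mat_vec(m, v):
    """GF(2) product M . v for M given as a list of rows."""
    return [sum(a & b for a, b in zip(row, v)) & 1 for row in m]


def recover_matrix(oracle, n_in):
    """Probe with the unit vectors; the k-th response is column k of M."""
    cols = [oracle([int(i == k) for i in range(n_in)]) for k in range(n_in)]
    return [list(row) for row in zip(*cols)]


def column_by_difference(oracle, base, k):
    """Column k of M from two inputs that differ only in bit k."""
    flipped = base[:]
    flipped[k] ^= 1
    return [a ^ b for a, b in zip(oracle(base), oracle(flipped))]


rng = random.Random(42)
hidden = [[rng.randint(0, 1) for _ in range(12)] for _ in range(5)]   # stand-in for M


def oracle(v1):
    """Plays the role of sending a header with bits v1 and reading back v2."""
    return mat_vec(hidden, v1)


recovered = recover_matrix(oracle, 12)
base = [rng.randint(0, 1) for _ in range(12)]
last_column = column_by_difference(oracle, base, 11)

print(recovered == hidden, last_column == [row[11] for row in hidden])  # True True
```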
How can we make \(V_1\) an identity matrix? Although we can control the header content, we cannot finely control the state of each bit. In our case, it seems impossible to send a packet with \(v_1\) containing only a single “1.” However, to our surprise, the LoRa chip supports sending a packet with zero payload length. The eight PL bits then become all zeros. If we disable the payload CRC and set \(CR=\frac{4}{5}\), \(CR=\frac{4}{6}\), or \(CR=\frac{4}{8}\), then only one code rate bit in \(v_1\) is 1. Is it possible that only the payload-CRC-enable bit is “1”? The answer again lies in the linearity of CRC. Suppose we send two packets with zero payload and \(CR=\frac{4}{5}\). Packet A has a payload CRC but packet B does not. Then the corresponding bit vectors are \(v_1^A = 000000000011, v_2^A = 01100, v_1^B = 000000000010, v_2^B = 00111\). Therefore,
\[ M \cdot 000000000001 = M \cdot (v_1^A \oplus v_1^B) = v_2^A \oplus v_2^B = 01011. \tag{16} \]
A similar process can be applied for the PL bits. Finally, we summarize the header CRC calculation as
\(v_2 = M \cdot v_1\), where