Similar to any modern radio system, GNSS receivers are composed of an analog stage, used for the recovery of Radio Frequency (RF) signals (e.g., antenna, amplifiers, Analog-to-Digital Converters (ADCs)), and a digital stage, used for DSP and extraction of the signal data. SyDR mainly focuses on the digital stage’s DSP algorithms. As detailed in our previous work [9], the goal of SyDR is not only to propose a software implementation of a GNSS receiver in a high-level programming language—Python—but also to render it modular enough to enable benchmarking of algorithms within a realistic GNSS receiver architecture. The benefits for researchers include, among others, avoiding redundant development and focusing research efforts on the underlying algorithms. Using a well-defined framework also ensures any changes are isolated to the algorithms, thus enhancing their comparability. We start by recalling the typical GNSS processing stages before diving into the motivations to move closer to the hardware through the PYNQ platform.
2.1. GNSS Processing Stages and Computational Complexity
The processing chain embedded in a GNSS receiver can be simplified into the following four main processing stages, which are summarized in Figure 1 and described as follows [5]:
- 1.
Acquisition: Any new signal scheduled for tracking must first pass through an acquisition stage, which identifies the satellites visible in the sky at the location of interest and searches for the initial Doppler and code shifts of each of them. The size of the acquisition search space depends on the current status of the GNSS receiver (i.e., cold/warm/hot start) and on the possible access to Assisted-GNSS data (e.g., cellular radio, internet). If this stage fails due to the signal not being present or being suppressed by noise, the receiver will not proceed to the next stage.
- 2.
Tracking: Once the coarse parameters for the Doppler and code shifts have been found, their values must be fine-tuned and continuously tracked to precisely align the signal replica with the incoming signal and retrieve the navigation message (i.e., data bits). If tracking is lost, the receiver reverts to the acquisition stage, though in most cases only a very limited search space is needed. Otherwise, the receiver remains in this stage until shutdown.
- 3.
Decoding: Recovery of the navigation data bits becomes possible when the replica perfectly aligns with the incoming signal. These bits encode the timestamp necessary to compute the measurements used for user positioning.
- 4.
Positioning: Using the tracking measurements and the decoded navigation data from multiple satellites, the receiver can recompose the pseudoranges and compute a position (e.g., Standard Point Positioning (SPP), Differential GNSS (DGNSS), Precise Point Positioning (PPP)).
Steps 1, 2, and 3 should be performed concurrently for tracking each signal. In this paper, we encapsulate these functions within the concept of channels, where each channel is responsible for tracking the signal from one satellite.
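To make the tracking workload concrete, the following minimal numpy sketch shows the early/prompt/late correlation at the heart of a tracking channel. The function name, signature, and replica handling are illustrative and do not reflect SyDR’s actual API.

```python
import numpy as np

def epl_correlate(samples, early, prompt, late, doppler, fs):
    """One epoch of Early/Prompt/Late correlation (illustrative sketch).

    samples             : complex baseband samples for one 1 ms epoch
    early, prompt, late : code replicas shifted by -tau, 0, +tau chips
    doppler             : current carrier/Doppler estimate [Hz]
    fs                  : sampling frequency [Hz]
    """
    n = np.arange(len(samples))
    # Wipe off the carrier using the current Doppler estimate
    wiped = samples * np.exp(-2j * np.pi * doppler * n / fs)
    # Each correlator is a single dot product, i.e., O(N) per epoch
    return (np.vdot(early, wiped),
            np.vdot(prompt, wiped),
            np.vdot(late, wiped))
```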
The acquisition is often highlighted as the most computationally complex part [1], as it might involve a significant number of arithmetic operations for performing convolutions. While the use of the FFT in modern receivers dramatically reduces this complexity, from $\mathcal{O}(N^2)$ (Serial Search) to $\mathcal{O}(N \log N)$ (Parallel Code Phase Search (PCPS)) [1], the acquisition stage still requires more operations than the tracking stage. Moreover, using the FFT in hardware has proven challenging, as discovered in our experiments reviewed in Section 5. As tracking “only” requires a few correlation operations, its computational complexity will in any case be lower than that of the acquisition stage. The actual complexities of the DSP operations of a GNSS receiver are highly dependent on the scenario assumptions (e.g., algorithms used, environment, Assisted-GNSS, user dynamics) [1].
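To illustrate where the $\mathcal{O}(N \log N)$ figure comes from, the sketch below implements a textbook PCPS loop in numpy: one FFT-based circular correlation per Doppler bin replaces the per-code-phase correlations of a serial search. Names and parameters are illustrative; this is not SyDR’s implementation.

```python
import numpy as np

def pcps_acquire(signal, code, fs, doppler_bins):
    """Parallel Code Phase Search over one coherent period (illustrative)."""
    n = np.arange(len(signal))
    code_fft_conj = np.conj(np.fft.fft(code))  # replica spectrum, computed once
    best = (0.0, None, None)                   # (peak power, Doppler, code phase)
    for fd in doppler_bins:
        # Remove the Doppler candidate, then test all code phases at once
        wiped = signal * np.exp(-2j * np.pi * fd * n / fs)
        corr = np.fft.ifft(np.fft.fft(wiped) * code_fft_conj)
        power = np.abs(corr) ** 2
        k = int(np.argmax(power))
        if power[k] > best[0]:
            best = (float(power[k]), fd, k)
    return best  # peak metric, Doppler [Hz], code phase [samples]
```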
Nevertheless, using the SyDR platform, it is possible to list these assumptions and gain a better idea of the actual complexity in terms of processing time. Absolute processing times are irrelevant, as they are affected by many factors (e.g., hardware technology, operating system, programming language). However, relative processing times enable comparisons between different algorithms of the same processing stage or between processing stages. The former allows for identifying an optimal algorithm based on predefined KPIs (e.g., complexity, quality, reliability); the latter helps expose critical points of the processing chain with high computational complexity. Optimizing these points can lead to large energy savings with minimal negative impact on the results.
In
Figure 2, we review the processing time needed for the different functions of SyDR. The processing is based on real data recorded in our lab and replayed in post-processing within SyDR. The details of the RF signal, algorithms, and parameters used are provided in
Table 1. Execution time is measured using the time.perf_counter_ns function in Python, which is recommended in the Python PEP guidelines for performance comparison [25].
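For reference, the measurement pattern is roughly the following; the wrapper and the bookkeeping dictionary are illustrative, not SyDR’s exact instrumentation.

```python
import time

timings = {}  # function name -> list of per-epoch durations [ns]

def timed(name, func, *args, **kwargs):
    """Run func once, recording its duration with a monotonic ns clock."""
    start = time.perf_counter_ns()
    result = func(*args, **kwargs)
    timings.setdefault(name, []).append(time.perf_counter_ns() - start)
    return result
```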
In these plots, we separate the processing stages into a number of functions within SyDR to achieve a more fine-grained resolution of the processing costs in and between the stages. We evaluate this processing cost over a scenario with 15 s of RF data (details in
Table 1). We also propose two metrics to evaluate the processing cost: the average processing time per epoch (left), normalized by the highest-cost process (i.e., Signal Search), and the total processing time per function (right), normalized by the overall total processing time. An “epoch” is defined here as each millisecond of RF data fed to the channel. Normalization of the results is performed to ease comparison. We do not include the positioning stage in our comparison, as it is performed outside the processing flow of a single channel. However, we assume it is much less computationally expensive than both the acquisition and tracking stages, given the low rate at which the positioning is executed (∼1–10 Hz) compared with the other stages (∼1000 Hz) and the few arithmetic operations required.
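Assuming per-epoch samples are collected as in the timing sketch above, the two normalized metrics can be obtained as follows (illustrative).

```python
import numpy as np

avg = {name: np.mean(t) for name, t in timings.items()}  # per-epoch average [ns]
tot = {name: np.sum(t) for name, t in timings.items()}   # total over the run [ns]

# Left plot: average time per epoch, normalized by the costliest process
avg_norm = {name: v / max(avg.values()) for name, v in avg.items()}
# Right plot: per-function total, normalized by the overall total
tot_norm = {name: v / sum(tot.values()) for name, v in tot.items()}
```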
While subject to the aforementioned limitations and pitfalls of comparing software processing times, the results presented can still provide coarse insights into promising regions for optimization. For instance, the assumption that acquisition is more costly than tracking holds and is clearly visible in the left graph. In fact, the difference is greater than two orders of magnitude when compared to correlator operations. However, when integrating this processing time over several seconds of data, tracking has the highest total processing time by far. This is expected, as the acquisition is performed only once at the beginning of this scenario, and the channel spends most of the processing time in the tracking stage. Yet, the important conclusion is that while acquisition might be more power-hungry than tracking, minor optimizations in the tracking stage could ultimately lead to considerably larger energy savings. Thus, while research over the last years has focused on reducing acquisition complexity [1], greater energy savings could be achieved with improvements in the correlator complexity.
Notably, besides the parameters listed in Table 1, several assumptions need to be taken into account for an enlightened interpretation of these results.
From this discussion, it is clear that SyDR can help assess the cost of each stage in the GNSS processing chain. However, given the high abstraction level and associated overheads of Python, benchmarking in SyDR is restricted to the software level. To push our benchmarks toward a better assessment of energy consumption, we propose moving the processing to a more constrained environment, namely, a dedicated hardware platform. We denote this version of our tool “Hard SyDR”.
2.2. Moving Closer to Hardware
Thanks to the improvements in computers and microelectronics over the last 25 years, algorithm development in GNSS processing has slowly transitioned from hardware to software implementation [9]. While this has greatly simplified the development and verification of new algorithms, it has made it harder to realistically estimate the cost of using a specific algorithm within the processing chain. Relative comparisons between algorithms can still be achieved, assuming they are developed with a similar quality (e.g., optimized resource usage) and benchmarked over identical datasets. However, absolute estimates of their energy consumption are not feasible, given the potential interference of concurrent operations running on the same processor.
Consequently, we consider moving the algorithms back to hardware for more realistic estimates of their costs. Indeed, doing so not only permits the collection of additional KPIs related to resource consumption and timing, but it can also help pinpoint potential challenges in the hardware implementation of an algorithm that are not immediately visible from the software perspective. While one may argue that these metrics are better collected on the type of platform the algorithms are ultimately expected to be implemented in, i.e., ASICs, we choose FPGAs as the target hardware platform for ease of use. We acknowledge that, owing to their reconfigurability, FPGAs are known to be slower, and less area- and energy-efficient, than ASICs. Yet, the differences are well-established in the literature [27]. Hardware simulations of ASICs and FPGAs are feasible but unattractive due to long execution times and occasionally doubtful accuracy [28,29].
Using FPGAs for GNSS receiver development is not new and has been performed in several SDRs. We summarized examples of such receivers in a previous work [9]. However, all previous implementations focus on real-time processing and, therefore, use FPGAs for hardware acceleration (e.g., GNSS-SDR [6], Tiira [30]). We take a different approach and propose to review the applicability of reconfigurable platforms like FPGAs for power estimation and benchmarking.
In the context of benchmarking, real-time processing is not a necessity, as it is more interesting to replay and post-process well-defined datasets to accurately compare algorithms. While post-processing can be carried out by a real-time processing system (i.e., using a Universal Software Radio Peripheral (USRP) to replay an RF signal), this approach greatly over-complicates the software and the overall system with no added value to our analyses. This is the prime reason for developing SyDR instead of using a more advanced open-source tool, such as GNSS-SDR. However, our approach is not to be confused with that of GNSS-SDR [6], which used GNU Radio to further replicate the behavior of a USRP using Python code. In our applications, we only target evaluating the energy efficiency of GNSS algorithms at the hardware layer and do not review any aspect of the hardware used for recording.
Interoperability and usability have been major drivers for the development of SyDR from the start, as defined in [
9]. Consequently, with Hard SyDR, we look into a way to move from software to hardware while retaining modularity. The present work is a proof-of-concept that probes the feasibility of such developments.
2.3. The PYNQ Flow
To the best of our knowledge, no straightforward way to transform Python code into hardware designs exists. We are aware of several Python-embedded Hardware Description Languages (HDLs), including Amaranth [
31], MyHDL [
32], and Mamba [
33], but their use in SyDR would require vast re-development efforts. Moreover, as we only aim for a partial conversion of SyDR, we need a software framework and a hardware platform that support low-overhead communication between each other and with which a synthesizable hardware design can be attained with minimal re-development efforts.
These characteristics exactly match the purpose of recent MPSoC and RFSoC systems proposed by the leading FPGA manufacturers, Intel and AMD, which integrate Central Processing Units (CPUs) with an FPGA on a single chip. These platforms enable previously unseen degrees of freedom for implementing processing in the CPUs and FPGA concurrently. For our purposes, the AMD Zynq platform is a relevant example, as it is easily accessible through the associated Python-embedded PYNQ flow [
12,
34]. Thus, we chose the AMD KV260 development board [
35] to perform the first implementation of our system. This board is used as our target platform throughout the paper.
Zynq chips are logically split into two parts: the Processing System (PS), comprising general-purpose (ARM) CPU cores, caches, I/O devices, etc., and the Programmable Logic (PL), comprising an FPGA [34]. An Advanced eXtensible Interface (AXI)-based Network on Chip (NoC) serves to interconnect the PS’s devices with the PL. The PS–PL interface in particular comprises a total of nine master or subordinate interfaces with bit-widths of either 32 or 64 bits available from within the PYNQ environment [36]; it is specified in more detail in [37], Chapter 35. The PYNQ flow abstracts reasoning about data movement over these interfaces, as well as the dynamic loading of different designs (denoted overlays) in the PL, from Python code running in the PS [12]. These features render PYNQ an ideal fit for our proof-of-concept implementation and directly enable our partial conversion of SyDR with only the processing algorithms ported to an FPGA.
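For illustration, loading an overlay and streaming a buffer across the PS–PL interface from Python takes only a few lines with the PYNQ API. The bitstream file and the DMA instance name below are placeholders that depend on the actual block design.

```python
import numpy as np
from pynq import Overlay, allocate

# Load a (hypothetical) bitstream into the PL; IP cores appear as attributes
overlay = Overlay("hard_sydr.bit")
dma = overlay.axi_dma_0  # instance name depends on the block design

# Physically contiguous buffers visible to both the PS and the PL
in_buf = allocate(shape=(4096,), dtype=np.int16)
out_buf = allocate(shape=(4096,), dtype=np.int16)
in_buf[:] = 0  # fill with the samples to process

# Stream the input through the PL and wait for the result
dma.sendchannel.transfer(in_buf)
dma.recvchannel.transfer(out_buf)
dma.sendchannel.wait()
dma.recvchannel.wait()
```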
However, to be implemented on an FPGA, a hardware design must be
synthesized into a
bitstream using AMD Vivado (version 2022.2). This requires algorithms to be written in an HDL like VHDL or (System)Verilog. This implies a step of design conversion either directly from Python to HDL, or from Python via a lower-level language to HDL. Although the first option is more direct, it generally involves long development cycles for design and verification. Instead, we opt for the second option and choose C++ as the intermediate language, as it permits us to shortcut the hardware design process by using HLS to generate HDL code [
38]. Both conversion strategies are illustrated in
Figure 3.
Using C++ and HLS has multiple benefits: (1) it reduces the required development efforts; (2) it enables using the AMD Vitis HLS (version 2022.2) flow to automatically configure the interfaces needed between the PS and the PL [
12]; (3) it permits testing algorithms without an FPGA board using a library for interfacing external functions with Python, e.g.,
ctypes [
39]; and (4) in the future, it may allow us to explore compiler-enabled AxC techniques such as precision tuning [
40]. Moreover, HLS simplifies design space exploration over dataflows, pipelining, and the initiation intervals of functions and loops via precompiler annotations (pragmas) [38]. This renders it an ideal fit for our proof-of-concept, benchmarking-focused use case.
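As an example of benefit (3), the HLS C++ sources can be compiled into a shared library and exercised directly from Python via ctypes. The library name and function signature below are hypothetical.

```python
import ctypes
import numpy as np

# Hypothetical shared library built from the HLS C++ sources, e.g.,
#   g++ -shared -fPIC correlator.cpp -o libcorrelator.so
lib = ctypes.CDLL("./libcorrelator.so")
lib.correlate.restype = ctypes.c_float
lib.correlate.argtypes = [
    ctypes.POINTER(ctypes.c_float),  # signal samples
    ctypes.POINTER(ctypes.c_float),  # code replica
    ctypes.c_int,                    # number of samples
]

signal = np.random.randn(4096).astype(np.float32)
replica = np.sign(np.random.randn(4096)).astype(np.float32)
result = lib.correlate(
    signal.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
    replica.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
    len(signal),
)
```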
We are aware that the typical HLS flow tends to produce hardware designs with lower performance and higher resource consumption than expert-written HDL equivalents [
41]. Yet, as the latter would require intricate knowledge of both GNSS processing and HDL development, we see HLS as a reasonable shortcut to obtain coarse, yet indicative, hardware results. More importantly, we note that the HLS step
can be circumvented by manual development, assuming the handwritten design remains compatible with the defined PYNQ interface.