Modeling Results and Baseline Design for an RF-SoC-Based Readout System for Microwave Kinetic Inductance Detectors

Colm Bracken

arXiv (Cornell University), 2022

arXiv:2212.07938v3 [astro-ph.IM] 9 Feb 2023

Colm Bracken a,b,*, Eoin Baldwin b,c, Gerhard Ulbricht b, Mario de Lucia b,c, and Tom Ray b

a Maynooth University, Department of Experimental Physics, Maynooth, Co Kildare, Ireland
b Dublin Institute for Advanced Studies, Astronomy & Astrophysics, 31 Fitzwilliam Place, Dublin 2, Ireland
c Trinity College Dublin and CRANN, School of Physics, Dublin 2, Ireland

ABSTRACT

Building upon existing signal processing techniques and open-source software, this paper presents a baseline design for an RF System-on-Chip Frequency Division Multiplexed readout for a spatio-spectral focal plane instrument based on low temperature detectors. A trade-off analysis of different FPGA carrier boards is presented in an attempt to find an optimum next-generation solution for reading out larger arrays of Microwave Kinetic Inductance Detectors (MKIDs). The ZCU111 RF SoC FPGA board from Xilinx [1] was selected, and it is shown how this integrated system promises to increase the number of pixels that can be read out per board. This enables a reduction in the readout cost per pixel, the mass and volume, and the power consumption, all of which are important in making MKID instruments more feasible for both ground-based and space-based astrophysics. The on-chip logic capacity is shown to form a primary constraint on the number of MKIDs which can be read, channelised, and processed with this new system. As such, novel signal processing techniques are analysed, including Digitally Down Converted (DDC)-corrected sub-maximally decimated sampling, in an effort to reduce logic requirements without compromising signal-to-noise ratio.
It is also shown how combining the ZCU111 board with a secondary FPGA board will allow all 8 ADCs and 8 DACs to be utilised, providing enough bandwidth to read up to 8,000 MKIDs per board-set, an eight-fold improvement over the state-of-the-art, and important in pursuing 100,000 pixel arrays. Finally, the feasibility of extending the operational frequency range of MKIDs to the 5-10 GHz regime (or possibly beyond) is investigated, and some benefits and consequences of doing so are presented.

Keywords: Frequency-Division Multiplexing, Microwave Kinetic Inductance Detector, System-on-Chip, Superconductivity

1 Introduction

As part of the development of a new spatio-spectral focal plane instrument being designed, fabricated, and tested by a new low-temperature detector research group in Ireland, a new Frequency Division Multiplexed (FDM) readout electronics system was studied, employing an RF System-on-Chip (SoC). The MKID [2] fabrication work is done at Trinity College Dublin (TCD) and the Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN). The cryogenic testing and characterisation is carried out at the Dublin Institute for Advanced Studies (DIAS), and the readout electronics research and numerical modelling is done by Maynooth University Experimental Physics and DIAS.

Low-temperature, superconducting microresonators are naturally suited for readout in the frequency domain. If a multi-gigahertz frequency range is utilised, this opens up a large space for multiplexing huge numbers of extremely high-quality (high Q) resonators for a range of scientific applications. In contrast to, for example, Transition Edge Sensors (TESs) [3], which have to combine the sensor with additional microresonators and Superconducting Quantum Interference Devices (SQUIDs) [4] to reach higher pixel numbers, multiplexing MKIDs is much more straightforward. It is difficult to put a number on the increase in pixel number possible with MKIDs compared to TES arrays.
It would depend on many factors, such as available volume in the cryostat, cooling power, and feedlines in/out of the system. What is more relevant is the reduction in complexity of reading MKID arrays compared to TES arrays. The removal of circuitry (for the SQUIDs) makes scaling up to large pixel numbers far easier with MKIDs.

In the case of kinetic inductance detectors, the microresonator itself often serves as the radiation sensor, where the inductor of the thin-film LC tank circuit acts as the sensitive area for the absorption of UV, optical, or even infrared radiation. By using an array of optical focusing elements such as microlenses [5–7], the radiation can be focused onto the inductor portion of the resonator, effectively maximising the array fill-factor. MKIDs have also been successfully used to detect photons at far-infrared and submillimeter wavelengths, with alternative coupling schemes and quasioptical focusing elements such as feed horns [8–10]. Digital FDM readout electronics, based on Software Defined Radio (SDR), have already been shown to be a good solution for the readout of large-format arrays of optical/near-infrared MKIDs [11–14].

MKIDs, first demonstrated in 2003 [2], are a specific class of cryogenic superconducting detector that show the most promise for scalability up to pixel numbers of > 100,000 (Fig. 1). MKIDs are inherently multiplexable through FDM as each detector/resonator can be lithographically fabricated to have a unique resonant frequency in the GHz range. The basic principle for operating MKIDs requires cooling the microresonators below the critical temperature for superconductivity of their material ($T_c$), where the conduction electrons combine into so-called Cooper pairs [15]. Typically we aim to maintain the resonators at a temperature of less than 1/8 of $T_c$. Thus, for a material with $T_c$ ≈ 800 mK, we cool the MKID device to 100 mK.
Any photon with an energy of more than twice the superconducting band gap that is incident on the material of a given resonator can break Cooper pairs, increasing the total inductance, which reduces the resonant frequency [16]. An in-depth explanation of the kinetic inductance effect and superconductivity in general can be found in Zmuidzinas [16] and Gao [17], and some relevant aspects of the theory of superconductivity are covered by Mattis and Bardeen [18] and Kaplan et al. [19].

Figure 1: Close-up image of an MKID array, clearly showing the geometry of a typical lumped element MKID. The inductor (smaller meandering section marked with red boundary for clarity) and the larger inter-digitated capacitor (IDC) can be seen. It is the geometry of the capacitor that is typically varied from pixel to pixel, defining the unique resonant frequency for each pixel. Resonator design was based on work reported in [20].

Figure 2: Magnification of the inductor section of a single resonator (marked off by the red box in Fig. 1). This inductor geometry is typically kept constant for each resonator in an array, so that each pixel has the same sensitivity to the incident photons. Resonator design was based on work reported in [20].

2 Requirements and Methodology

In this section we lay out the requirements we have set for our MKIDs, and how these requirements translate to readout specifications in terms of acceptable noise, channel isolation and flatness, sampling rates for the data converters, and bandwidth of the FFT bins. Fig. 3 illustrates the relatively complex manner in which some of these specifications and requirements of our MKIDs can define and constrain the specifications and cost-per-pixel of a readout system.

Figure 3: A diagram illustrating how the various MKID specifications, quality factors, and time-resolution requirements translate to readout specifications.
2.1 Time Resolution and ADC Sampling

An important point to state at the outset of the following sections is that a relatively basic (and possibly naive) approach is taken in determining our acceptable noise levels. Based on the available literature to date for MKIDs designed for optical/near-IR photon detection, the assumption appears to be that the HEMT amplifier is the component in the signal line which sets the lower limit for noise contribution. Based on this, we aim to achieve a noise floor equivalent to the measured noise temperature of our HEMT, namely ≈ 2.5 K.

The particular frequency regime targeted with any MKID-based instrument will play an important role in determining many of the DSP parameters. Since we are developing a relatively large-format array of MKIDs intended for photon counting in the optical to near-infrared range, and we desire an ability to time-stamp each photon to a precision of roughly 1 microsecond, our channelisation firmware will have quite strict requirements. The microsecond time resolution requires that we effectively sample each pixel one million times per second. Since we intend to implement an FDM technique, which involves buffering a series of time-sampled data before applying an FFT, we must ensure that each FFT ‘loop’ repeats with a period of ≈ 1 µs. A simulated time-stream data ‘chunk’ is shown in Fig. 4.

Figure 4: Simulated time-stream data for a raw data stream truncated with a standard top-hat, as well as weighted data corresponding to two different filters. In each case the sampling rate was 4 GSPS, with 1 microsecond of non-zero values defined, and padded by zeroes either side.

Based on the required time resolution, for each GHz of bandwidth from a single ADC (or, 1 Giga-Samples Per Second (GSPS) available), we can expect to achieve a number of frequency bins ($N_B$) of roughly:

$$N_B = \frac{1 \times 10^{9}\,\mathrm{s}^{-1}}{2 \times 10^{6}\,\mathrm{s}^{-1}} \approx 500 \qquad (1)$$

where the factor 2 in the denominator is to satisfy the Nyquist criterion.
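The arithmetic of Eq. (1) can be checked with a few lines of Python. This is only an illustrative sketch: the function name `bins_per_adc` and its arguments are hypothetical and are not taken from the authors' simulator.

```python
def bins_per_adc(sample_rate_hz: float, time_resolution_s: float) -> int:
    """Frequency bins available from one real-sampled ADC, as in Eq. (1)."""
    frames_per_second = 1.0 / time_resolution_s  # FFT frames needed per second
    # Factor 2: Nyquist criterion for a real-valued sample stream.
    return round(sample_rate_hz / (2.0 * frames_per_second))


n_bins_1gsps = bins_per_adc(1e9, 1e-6)  # the case worked in Eq. (1)
n_bins_4gsps = bins_per_adc(4e9, 1e-6)  # the 4 GSPS rate used for Fig. 4
print(n_bins_1gsps, n_bins_4gsps)
```

For a fixed 1 µs frame period the bin count scales linearly with the sampling rate, while the bin width stays at 1 MHz.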
Each of these 500 frequency bins will be sensitive to 1 MHz of bandwidth, ranging from [0-1, 1-2, ..., 498-499, 499-500] MHz. However, since we will synthesise our output tones using two DACs (one for In-phase (I) waveforms and one for Quadrature (Q) waveforms), and we will use two ADCs for digitising the complex signals before processing, we can also define negative frequency baseband probe signals and FFT bins, respectively. These negative frequency bins will range from -500 MHz to 0 MHz. The positive DAC baseband tones (rotating counter-clockwise in the I/Q plane), when up-mixed to higher frequencies with an LO and I/Q mixer, will yield the RF tones for driving the MKIDs in the upper side-band of the up-mixed output. The negative baseband tones (rotating clockwise in the I/Q plane) will yield the RF tones in the lower side-band of the up-mixed output. A plot of an example broadband frequency comb comprising a few hundred tones uniformly separated by 1 MHz, after up-mixing with an LO of 5.1 GHz, is shown in Fig. 5. Correspondingly, after the RF tones exit the MKID array and are down-mixed with the same LO frequency, they can be channelised in both positive and negative FFT bins. Considering the above constraints, a pair of 1 GSPS ADCs will allow just over 1,000 frequency bins, each sensitive to 1 MHz bandwidth.

Figure 5: A typical broadband frequency comb waveform, measured with a VNA.

2.2 Sensitivity and Pixel Cross-Talk

While a raw FFT could be employed for some FDM applications, there are a number of potential issues that arise when employing the FDM readout technique for MKIDs, particularly if we wish to pack the MKIDs close together in frequency space. So-called clashing pixels, or pixel cross-talk, where two or more pixels are too close in frequency, have been a primary cause of low pixel yield in previous optical/near-IR MKID instruments such as ARCONS and DARKNESS [7, 13].
The clashing typically results when the measured resonant frequencies shift significantly from the design frequencies. These resonance shifts can be due to minor fabrication inaccuracies, or to resonators which exhibit a drift in resonant frequency over time. Ideally, we would like each pixel’s resonant frequency to be precisely centred on one of the frequency-domain bins, with a 2 MHz spacing between neighbouring pixels, but this is difficult to achieve. It would also be desirable for each resonator to be optimally driven with an equal power level, but again, this is not typically observed. Considering these non-ideal frequency spacings and driving powers, a raw FFT would appear to offer insufficient isolation from pixel cross-talk or clashing. A raw FFT will typically exhibit sidelobe structure with sensitivities approaching -20 dB (relative to bin-centre sensitivity), as shown in Fig. 6. This means that if a pixel happens to resonate at a frequency corresponding to the side-lobe of a neighbouring pixel, any significant change in signal throughput (say, from an incident photon causing the pixel to shift off-resonance) may appear as a signal increase in the neighbouring pixel too.

We aim to drive our resonators with a relative power of up to 20 dB, where relative power refers to the change in power throughput of a probe tone when a resonator is switched from being driven on-resonance to off-resonance. Further, the actual power required to optimally drive each resonator, typically around -100 dBm, can vary from pixel to pixel by up to ±10 dB. The most obvious way to reduce the levels of these cross-talk effects is to drive down the side-lobe structure in the FFT bins. This is best achieved by implementing a digital window function. The significant difference this can make is illustrated in Fig. 6. For robustness, we aim to ensure that the highest side-lobe in our FFT bins is no larger than -40 dB (relative to bin-centre sensitivity).
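The effect of a window function on side-lobe level can be explored with a short NumPy sketch. It compares an unweighted (top-hat) frame with a Hamming-weighted one. The frame length of 1024 samples and the helper name `peak_sidelobe_db` are assumptions made for illustration, and the sketch is not intended to reproduce Fig. 6, whose exact levels depend on how the response was sampled.

```python
import numpy as np


def peak_sidelobe_db(window: np.ndarray, pad: int = 64) -> float:
    """Highest side-lobe of a window's response, in dB relative to its peak."""
    # Zero-pad so the response is sampled finely between FFT bin centres.
    spectrum = np.abs(np.fft.rfft(window, len(window) * pad))
    response_db = 20.0 * np.log10(spectrum / spectrum[0] + 1e-20)
    # Walk down the main lobe until the response stops falling (first null).
    i = 0
    while response_db[i + 1] < response_db[i]:
        i += 1
    return float(response_db[i:].max())


n_samples = 1024  # assumed frame length, roughly 1 us of data at 1 GSPS
sidelobe_rect = peak_sidelobe_db(np.ones(n_samples))
sidelobe_hamming = peak_sidelobe_db(np.hamming(n_samples))
print(f"top-hat: {sidelobe_rect:.1f} dB, Hamming: {sidelobe_hamming:.1f} dB")
```

With these assumptions the top-hat result lands near the textbook first side-lobe level of about -13 dB, while the Hamming window falls below the -40 dB target, at the cost of a wider main lobe.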
An additional benefit of this side-lobe suppression is a reduced sensitivity to higher frequency noise. This is also very important for our application, since we aim to maximise energy resolution, ∆E, and this is directly affected by signal-to-noise ratio (SNR).

Figure 6: Power attributed by a discrete FFT of a perfectly sharp input signal to a single, 1 MHz wide frequency bin, plotted over the deviation of input signal frequency from bin centre, using the different time-stream data shown in Fig. 4.

2.3 Signal-to-Noise Requirements, and Scalloping Losses

Another consideration given to the structure of our FFT bins is the flatness of response of each bin. Given that we aim to drive our resonators at relatively low powers of around -100 dBm, and that achieving good SNR at these low power levels is challenging, we want to ensure that we are fully sensitive to the signal power of each probe tone. Since we cannot expect that each of our resonators will always fall precisely on a bin centre, as per design, the non-flat response of the FFT bins, even with an appropriate window function applied, is a concern. We thus desire a perfectly flat response for each FFT bin. Another issue, related to the non-flat bin response, is so-called scalloping loss. Small gaps between FFT bins are typically present if additional DSP steps are not implemented. We correct for these potential scalloping losses and the non-flat bin response by employing a polyphase filterbank (PFB). Unfortunately there is a correlation between the effectiveness of the PFB and the amount of FPGA resources required. More importantly, there is a direct correlation between the effectiveness of the PFB and the time resolution in the final data. These questions and trade-offs are discussed now.

There are a number of techniques we could apply to approximate the perfectly flat, perfectly isolated bin response shown in Fig. 7. One approach would be to use a large number of taps in the PFB. The FFT of a top-hat function is of course the sinc function. Therefore, we should be weighting or ‘windowing’ each group of time-stream data points $g(t_n)$ by coefficients ($h_m$) corresponding to a digital filter, essentially applying a sinc-type normalisation. As such, in the frequency domain we would like to see the effect of $\mathcal{F}\{g(t_n)\} \cdot \mathcal{F}\{h_m\}$. However, the convolution theorem tells us that if we want to apply this in the time domain, the problem becomes a discrete convolution as shown in Eq. (2).

$$\mathcal{F}\{g(t_n)\} \cdot \mathcal{F}\{h_m\} = \mathcal{F}\{g(t_n) \circledast h_m\} \qquad (2)$$

Figure 7: The ideal frequency response of our DSP, with perfect isolation and no truncation of signals.

Looking at the right-hand side of Eq. (2), it is clearly a discrete convolution if n = m. The authors have reviewed the literature, and have not found it explained as such before. This is most likely due to the fact that in practice, m is usually set to a value much smaller than n, and this would then describe a traditional PFB. For the system requirements described above, this would mean n = m = 1024 for each 1 GSPS of sampling. This would require buffering 1024 blocks of data time-streams (each 1 µs in length and containing ≈ 1000 samples) and weighting each block with its corresponding coefficient $h_m$ of the FIR window function. The weighted blocks would then be summed, and the result would undergo the FFT.

The implication of such PFB systems most discussed in the literature is the increased amount of logic and memory for buffering, which we are indeed trying to minimise on our chip. Nonetheless, a moderately large PFB was simulated with m = 128, and the result is shown in Fig. 8. As can be seen, the FFT bins are already quite uniform and isolated even with m = 128. A more interesting implication of using a large number of PFB taps is the effect that this has on the final time resolution.
The buffering and summing of the large number of data-streams essentially washes out the time information which a simpler windowed FFT would yield. Given that we are aiming for roughly microsecond time resolution, the number of taps we apply in the processing should be considered carefully. The indirect effects on MKID energy resolution will need to be examined also. This work will form the basis of a later paper. For now, we settle for a much lower number of taps, and only crudely approximate the top-hat shape. Fig. 9 shows the frequency bin response of a 4-tap PFB with the commonly used Hamming FIR window. For the reader unfamiliar with these DSP techniques, there are a number of sources available in the literature; [21], for example.

Figure 8: Simulated response of a 128-tap PFB (with a Hamming FIR window before the FFT). Three adjacent channels are shown, indicating the expected levels of spectral leakage, and losses due to scalloping.

Figure 9: Frequency response of a 4-tap PFB which employs a Hamming FIR window before the FFT. Three adjacent channels are shown, indicating the expected levels of spectral leakage and losses due to scalloping. The 4 dashed vertical lines indicate the frequencies of 4 test tones that were measured, as shown in Fig. 13.

The scalloping losses due to the sensitivity dips between neighbouring FFT bins are a main concern for our application, as the signals driving the MKID resonators are typically very low power, as previously mentioned. The scalloping losses could be taken care of in a number of ways, including:

1. Simply use a standard, critically Nyquist-sampled PFB, and calibrate/normalise for the lower signal levels, since the level of scalloping loss will be determinable.
2. Implement overlapping FFT bins, covering over the dips between FFT bins, and thereby removing scalloping losses.
This is achieved by doubling the FIR window function width, and then using two overlapping (but out of phase) FFTs, effectively increasing the sampling rate back up to satisfy the Nyquist criterion.
3. Implement a partially oversampled (or so-called ‘non-maximally decimated’) PFB, with a partially widened FIR window function.

Of the three solutions above, option 1) is the easiest to implement, and the most computationally efficient. However, the signal-to-noise ratio will still be affected, and this is unacceptable for our application. Option 2) is also relatively straightforward to implement, and has been the standard approach used for optical/near-IR MKID readouts [12, 22]. However, it is very demanding in terms of FPGA logic. Fig. 10 shows an implementation of option 2) for three adjacent channels, where the overlapped channels are now 2 MHz wide but still separated by 1 MHz, yielding a 50% overlap. However, as shown by the shaded grey areas, there is significant redundant overlap: roughly 0.3 MHz of bandwidth with uniform response is covered twice in each channel. This area of uniform overlap increases slightly as the number of taps of the PFB is increased, and it is deemed overkill and wasteful of FPGA resources. As such, an alternative, more economic approach was sought.

Option 3) is the most difficult technique to implement in firmware, but it requires fewer logic resources than option 2), while still providing a flat response across the full band. Tuthill et al. (2015) [23], for example, found that a non-maximally decimated (or partially oversampled) PFB is feasible to implement. If a time-stream longer than the critically sampled points is fed into the FFT in combination with a widened FIR window function, this widens the FFT bin response by approximately 20% and can sufficiently ‘fill the gaps’ between FFT bins, ensuring no scalloping losses occur [23, 24].
Then, an oversampling ratio of 32/27 satisfies the Nyquist criterion for the now broader frequency response of the FFT bins. Fig. 11 shows a simulation of the 32/27 non-maximally decimated PFB, with minimal redundancy in uniformity of response shown by the very narrow shaded regions.

Figure 10: Frequency response of an 8-tap PFB which employs a x2 window length, and would require two staggered FFT stages to double up the sampling if aliasing is to be avoided. A Hamming FIR window function was used.

Figure 11: Frequency response of an 8-tap PFB which employs a slightly increased window length (32/27 x critical). This approach only requires partial oversampling (or non-maximally decimated sampling). A Hamming FIR window function was again used.

Using the non-maximally decimated approach should save significant logic resources on the FPGA. However, while this non-maximally decimated PFB does indeed flatten the full passband, it has also been identified that this partial oversampling introduces so-called phase rotations [23]. If a signal analyzed with a discrete FFT with a finite data sampling rate is not at an integer multiple of that sampling rate, the resulting power in an FFT bin will oscillate with the difference between the bin center frequency and the signal frequency. This oscillation is removed in the last DSP step, the low pass filtering (see below). A non-maximally decimated PFB slightly shifts the FFT bin centers, thereby introducing a periodic phase rotation for any ‘would-be’ bin-centred signal. The velocity (or frequency) of this rotation will depend on the particular FFT bin and where it lies within the full band. The phase velocity will equal the difference between the FFT bin centre and the shift introduced by the non-maximal decimation (Eq. (3)). A detailed explanation of the cause of this phase rotation, and a solution to correct for it, is discussed in section 4.3.
$$\phi_f = f_0 - f_{\mathrm{shift}} \qquad (3)$$

2.4 Basic Experimental Verification of Models

A primary objective of this work was to optimise the efficiency of the DSP, so that we might take full advantage of available FPGA resources, and thus maximise the number of MKIDs to be read per board set. However, editing firmware in the Xilinx Vivado tools, then compiling the design into bit-stream files, and finally loading the bit-streams onto the FPGAs is very time consuming. For this reason a Python DSP simulator was developed, so that new DSP designs and ideas could be quickly iterated through, purely in software. While most of this simulation work was based on this in-house DSP simulator, additional open-source code was also used for comparison of our PFB results [21].

An experimental verification was carried out for the predicted losses due to scalloping effects on non-bin-centred tones. The measurements also served to check the inter-channel leakage predicted by the models. The FPGA was programmed with a critically sampled, 4-tap, 256 frequency-bin PFB (using a Hamming FIR window). Using a signal generator, a series of tones were then fed to the ADC, where the tones were given frequencies centred on two FFT bins, and then stepped up in increments of ∆/4, where ∆ = 1 MHz (the frequency spacing between bins). The system was stable and reproducible to within ≈ ±1.1 dB, which was considered sufficiently stable (relative to the somewhat approximate requirements laid out in sections 2.2 and 2.4). Fig. 12 shows the raw data from the measurements. Although the raw data shown has not yet been normalised and corrected, the effects of the roughly -6 dB edge-of-frequency-bin scalloping predicted by the model (Fig. 9) can be seen to clearly reduce the signal for tones not centred on a given bin. Fig. 13 shows a more detailed analysis of the measurements, with the PFB bins overlaid for comparison.

Figure 12: A number of narrow-band tones were channelised with a 4-tap PFB.
Tones around 38 MHz and 39 MHz were used, increasing in frequency in steps of ∆/4 = 0.25 MHz. A 1 second integration is shown here, ten of which were used and averaged.

In Fig. 13 the bin-centred tone (red) is clearly picked up perfectly by the 38 MHz bin, and partially detected by the bin to the right. The 38.25 MHz (f0 + ∆/4) tone (black) is very slightly attenuated due to the non-uniform response across the sub-channel, with the neighbouring bin again picking up the signal (a bit more strongly this time). The 38.5 MHz (f0 + ∆/2) tone (purple) is attenuated by ≈ 6 dB and is detected at identical levels in the two adjacent bins, since it falls exactly between them. Finally, the 38.75 MHz (f0 + 3∆/4) tone (yellow) is the mirror image of the 38.25 MHz case, namely a slightly attenuated signal in the 39 MHz bin and a heavily attenuated signal in the 38 MHz bin. This makes sense, since this tone is equivalent to 39 MHz - ∆/4.

There is nothing too surprising about these results, but the good agreement between measurement and simulation verifies that our Python DSP simulator is working correctly. This verification provided confidence to push forward with new DSP designs. Writing DSP firmware, transferring it to FPGAs and other hardware, and then testing it is very time consuming. Simulating new prototype DSP designs in software is far faster, but it is of course important to verify these simulations regularly. The only measured data point that is significantly higher than the simulated value is the inter-channel leakage of the bin-centred tone (red) into the neighbouring 39 MHz bin. This indicates that the first side-lobe of the channels may be a bit higher than that predicted by the model.
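The scalloping behaviour described above can be reproduced qualitatively in a few lines of Python. The following is a minimal sketch and not the in-house simulator used for this work. It assumes a windowed-sinc prototype filter (a sinc with its cut-off at half the bin spacing, multiplied by a Hamming window), which is a common choice for a critically sampled PFB but is an assumption here. The function names `prototype_filter` and `channel_response_db` are hypothetical. The sketch evaluates the response of a single channel of a 4-tap, 256-bin PFB for tones offset by 0, ∆/4 and ∆/2 from the bin centre.

```python
import numpy as np

N_BINS = 256   # number of frequency bins, as in the test firmware
N_TAPS = 4     # number of PFB taps


def prototype_filter(n_bins, n_taps):
    """Assumed prototype: sinc with cut-off at half a bin, shaped by a Hamming window."""
    length = n_bins * n_taps
    n = np.arange(length) - (length - 1) / 2.0
    h = np.sinc(n / n_bins) * np.hamming(length)
    return h / h.sum()  # unit gain for a bin-centred tone


def channel_response_db(h, offset_in_bins, n_bins):
    """Magnitude response (dB) of one channel for a tone offset from the bin centre."""
    n = np.arange(len(h))
    freq = offset_in_bins / n_bins  # offset in cycles per input sample
    gain = np.abs(np.sum(h * np.exp(-2j * np.pi * freq * n)))
    return 20.0 * np.log10(gain)


h = prototype_filter(N_BINS, N_TAPS)
response = {off: channel_response_db(h, off, N_BINS) for off in (0.0, 0.25, 0.5)}
for off, level_db in response.items():
    print(f"offset {off:4.2f} bin: {level_db:6.2f} dB")
```

Under these assumptions the loss at ∆/4 should be small, while the loss at the bin edge (∆/2) should be close to the roughly 6 dB seen in the measurements. The exact values depend on the prototype filter actually loaded onto the FPGA.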
In fact, through these simulations it was noticed that when a 4-tap PFB is implemented, the first side-lobe of each sub-channel falls directly on the centre of the neighbouring sub-channel, regardless of the FIR window function that is used (Hamming, Hann, Blackman-Harris, etc.). Nonetheless, the leakage was still recorded to be below the required -40 dB level, so it was considered acceptable. Furthermore, the measured spectral leakage between frequency bins is not a significant concern, as it will be removed automatically by subsequent steps in the DSP pipeline (low-pass filtering after digital down conversion).

Figure 13: Zoomed-in data from Fig. 12, after calibration and normalisation.

3 Trade-off Comparison of Different Hardware Solutions

Due to the recent successes of Field Programmable Gate Array (FPGA) carrier boards being used for optical/near-IR MKID readout [12, 22], we decided to look for a similar (but improved) solution for our next-generation board set. A board set, as defined here, comprises analogue-to-digital and digital-to-analogue data converters (ADCs and DACs), processing units (FPGAs, CPUs or GPUs), and IF/RF mixers used for up/down frequency conversion. The primary aims for a next-generation board set are:

1. An increase in the number of pixels that can be read out and processed (per board set).
2. A reduction in mass and volume of a full board set.
3. A reduction in the cost of a full board set (which then translates to a reduction in the cost per pixel, based on point (1) above).
4. A reduction in power consumption of each board set.

The successfully commissioned ARCONS MKID instrument [5], and more recently the DARKNESS MKID instrument [7] for optical/near-IR wavelengths, used the ROACH 1 and ROACH 2 boards [25], respectively, as the core of their readout systems. These ROACH boards had originally been designed for radio astronomy by the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER) group [26].
As such, the most obvious boards to examine first were the current cutting-edge bespoke FPGA carrier boards being used by the radio astronomy community. The SKARAB board [27] by Peralex (often nicknamed the ROACH 3) is currently the most powerful CASPER-supported FPGA system in use in radio astronomy. Although the SKARAB offers impressive processing capacity in terms of FPGA logic, a number of issues were identified in its application as an MKID readout system. First, there are no DAC units available to fit the customised Mezzanine port, so an adapter unit would be required to connect commercially available DACs. Currently, the most commonly used connection for high-throughput data transfer for industry-standard data converters is the FPGA Mezzanine Card (FMC). Adapting high-speed FMC-based DACs to the SKARAB's Mezzanine port would require non-trivial hardware development, with additional costs and potential compatibility complexities. Furthermore, the SKARAB and its custom ADCs were very expensive, resulting in a very high cost per pixel, which ultimately directed us away from this option.

Other bespoke systems designed with radio astronomy in mind were examined and compared, but all required at least some level of hardware alteration, thereby driving up the readout cost per pixel. This should have been expected, as even the ROACH 2-based DARKNESS readout required additional FPGAs and hardware alterations.

Having exhausted the bespoke electronics solutions available, we next looked to 'commercial off-the-shelf' (COTS) options. The readout electronics for MKID instruments continually benefit from the regular advances in high-speed data converters and signal processing capabilities being demanded by industry, 5G technology being the latest and most demanding example. To satisfy some of this demand, in early 2019 Xilinx Inc. released their range of System-on-Chip (SoC) FPGA series, including the Zynq UltraScale+ RF-SoC [28].
These chips, which contain generous logic and Digital Signal Processing (DSP) resources as well as on-chip data converters, lend themselves perfectly to the needs of MKID readout hardware. The XCZU28DR [28], for example, hosts 8 x 4.0 GSPS ADCs and 8 x 6.4 GSPS DACs, as well as 930,000 system logic cells and 4,272 DSP slices. During the comparison analysis of potential readout boards, Xilinx Inc. announced the release of their ZCU111 board [1], which hosts the XCZU28DR chip. This announcement by Xilinx was followed by a number of related announcements from electronics companies stating that they would soon be releasing FPGA carrier boards housing these SoCs, with high-speed interconnect to the chip among other features very useful for MKID readout requirements. As such, a number of these commercial systems were examined and compared against the more bespoke radio astronomy boards.

Table 1 shows a comparison of a representative sample of the systems which were examined. The third column, system cost, accounts for the additional components that would be required for each system. For some systems this includes DACs and ADCs, but the RF SoC systems do not require additional data converters. I/Q mixer boards are included for all systems, with an estimated cost of €5 k per board, and 1 board per 2,000 pixels. It should be noted that the stated costs are based on best estimates rounded to the nearest 1,000 Euro (as of December 2019).

Table 1: FPGA Board Comparison. It should be noted that the costs for the systems are approximate, typically rounded to the nearest 1,000 Euro.

| Board | Unit Price (€) | System Cost (€) | Pixel Count | Cost/Pixel (€/pixel) | System-on-Chip (Y/N) |
|---|---|---|---|---|---|
| SKARAB | 11 k | 40 k | 1,500 | 26.67 | NO |
| UniBoard II | 20 k | 40 k | 2,000 | 20.00 | NO |
| AASL Board | 15 k | 35 k | 2,000 | 17.50 | NO |
| Abaco VP880 | 15 k | 35 k | 2,000 | 17.50 | NO |
| Xilinx VCU118 | 6 k | 25 k | 2,000 | 12.50 | NO |
| Xilinx ZCU111 | 9 k | 19 k | 4,000 | 4.75 | YES |
| Abaco VP430 | 25 k | 67 k | 4,000 | 16.75 | YES |
| VT AMC599 | 20 k | 28 k | 3,000 | 9.33 | NO |
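The cost-per-pixel column of Table 1 is simply the system cost divided by the pixel count. As a small worked check, the sketch below recomputes that column from the system costs and pixel counts quoted in Table 1. The dictionary layout and variable names are illustrative only.

```python
# System cost (euro) and pixel count for each board, copied from Table 1.
boards = {
    "SKARAB": (40_000, 1_500),
    "UniBoard II": (40_000, 2_000),
    "AASL Board": (35_000, 2_000),
    "Abaco VP880": (35_000, 2_000),
    "Xilinx VCU118": (25_000, 2_000),
    "Xilinx ZCU111": (19_000, 4_000),
    "Abaco VP430": (67_000, 4_000),
    "VT AMC599": (28_000, 3_000),
}

# Cost per pixel = system cost / pixel count, rounded to the nearest cent.
cost_per_pixel = {
    name: round(cost / pixels, 2) for name, (cost, pixels) in boards.items()
}

cheapest = min(cost_per_pixel, key=cost_per_pixel.get)

for name, value in sorted(cost_per_pixel.items(), key=lambda item: item[1]):
    print(f"{name:15s} {value:6.2f} euro/pixel")
print("Lowest cost per pixel:", cheapest)
```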
The fourth column in Table 1, quantifying the number of pixels each system is capable of reading out, is not necessarily based on the logic resources of the processing chip. For example, for the SKARAB board the 1,500 pixels is constrained by the bandwidth of the Peralex ADC, which samples at a speed of 3 GSPS. Similarly, the Xilinx VCU118 board (which has huge logic capacity) was paired with a high-speed ADC with a sampling speed of 4 GSPS. The cost of this ADC is included in the 'System Cost' column.

The Xilinx ZCU111 RF SoC clearly stands out from the rest of the systems in Table 1, at least based on cost per pixel. This cost of ≈ €4.75 per pixel assumes that 4,000 analogue probe tones can be generated, re-digitised, channelised and processed with the ZCU111. This assumption will be justified in Section 4. In fact, there is enough bandwidth available from the 8 high-speed on-chip DACs and ADCs on the ZCU111 to read out over 8,000 MKID pixels. However, the logic resources on the chip/board would almost certainly be exhausted beyond 4,000 pixels. It was desirable to include an expected power consumption trade-off in Table 1, but obtaining or calculating the power demands for each system was not feasible during the current work. Instead, we simply state whether each system employs an integrated system-on-chip, since this is expected to significantly affect power consumption.

As well as offering a very attractive cost per pixel, the ZCU111 also fulfils the other main aims of our next-generation readout system. 4,000 pixels per board is roughly a factor of 16 improvement on our current ROACH 1 system, and a factor of 4 improvement on the state-of-the-art ROACH 2-based system. In fact, the eight high-speed DACs could be used in pairs to synthesise 4 x 2,000 I/Q base-band tones with 2 MHz spacing over four parallel -2 to +2 GHz complex I/Q channels.
Each of these parallel base-band waveforms could be up-mixed in frequency to the 4 - 8 GHz band with a 6 GHz LO, where each 4 - 8 GHz waveform would feed one of four separate feedlines. In this manner, up to 8,000 detectors could potentially be driven with just 4 feedlines (2,000 per feedline). In a similar manner, 4 pairs of the high-speed ADCs could be used to re-digitise the 8,000 tones in I/Q. However, it is a near certainty that the logic resources available on the FPGA portion of the RFSoC will not be capable of processing this amount of data. For the near term (until 2022), we will continue to work with programming the ZCU111 for just 4 DAC/ADC pairs, toward reading, channelising, and processing 4,000 resonators over two feedlines. Looking to the future, with the aim of utilising the full available bandwidth from this board, a brief discussion on combining the ZCU111 board with additional FPGAs is presented in section 5.1.

As well as the reduced cost per pixel and the increased number of pixels per board, the mass and volume are also significantly reduced compared to both ROACH 1 and ROACH 2 systems, simply due to the reduction in the number of boards needed for a given number of pixels and the integrated format of the data converters. Finally, the power consumption is also dramatically reduced. Estimates made by Xilinx claim that as much as half of the system power of a conventional FPGA/ADC/DAC board set is dissipated as heat in the Serialise/De-serialise (SERDES) lines connecting the ADCs and DACs to the FPGA. This power dissipation is significantly reduced in the SoC format because chip-to-chip transmission of high-speed digital data is eliminated. Another major factor is the use of a more advanced CMOS node for the RFSoC vs. older FPGAs.

Table 2: FPGA Resources Comparison

| Board | DSP Slices | Logic Blocks |
|---|---|---|
| ROACH 1 | 640 | 7360 |
| ROACH 2 | 2016 | 476160 |
| ZCU111 | 4272 | 930000 |
Since at first sight the ZCU111 appears to meet all of our requirements, this board will form the basis of the remainder of this paper.

4 System Design

4.1 Estimation of Usable Bandwidth on ZCU111

Based on our requirements laid out earlier in this paper, this section outlines the broad structure of our DSP signal chain. Before any firmware or software was written for the new ZCU111-based system, an estimate was made for how much of the bandwidth (and thus how many pixels) could be generated, digitised, and processed. As a first-order approximation, the logic resources of the ROACH 1 and ROACH 2 FPGAs were considered, and the number of MKIDs they could process was extrapolated based on the logic capacity of the RF SoC. The ROACH 1 based readout system used for the ARCONS array was capable of reading out 1024 pixels across 4 ROACH boards, achieving 256 pixels per board [12]. Moreover, the ROACH 2 based system used for the DARKNESS array could read out 10,240 pixels using 10 boards, achieving 1024 pixels per ROACH 2 board [7]. The ROACH 1's Virtex 5 FPGA has 640 DSP slices and 7,360 logic blocks [29], while the ROACH 2's Virtex 6 FPGA boasts resources of 2,016 DSP slices and 476,160 logic blocks [30]. In comparison, the Xilinx ZCU111 has 4,272 DSP slices and 930,000 logic cells [31]. Table 2 summarises this. Fig. 14 shows the extrapolation applied when both Configurable Logic Blocks (CLBs) and DSP Slices are considered. It should be noted that, to compare like with like, a time resolution of 1 µs was assumed, along with an assumed 2 MHz spacing between resonators in frequency space.

Figure 14: A very rough approximation of the number of MKID pixels that will be possible to read with the ZCU111 board.

Now, Fig. 14 seems to show that the ZCU111 will only be able to read 2,300 pixels if we base the extrapolation on DSP slices.
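The extrapolation can be approximated with a straight line through the two ROACH data points. The sketch below assumes that Fig. 14 is such a two-point linear extrapolation, which is an assumption since the fitting method is not stated. It uses the per-board pixel counts and resource numbers quoted above, and the helper name `extrapolate` is hypothetical.

```python
def extrapolate(x1, y1, x2, y2, x):
    """Linear extrapolation through (x1, y1) and (x2, y2), evaluated at x."""
    slope = (y2 - y1) / (x2 - x1)
    return y2 + slope * (x - x2)


# Pixels read per board: ROACH 1 = 256, ROACH 2 = 1024 (values from the text).
ROACH1_PIXELS, ROACH2_PIXELS = 256, 1024

# Extrapolate on DSP slices (640 -> 2016 -> 4272).
pixels_from_dsp = extrapolate(640, ROACH1_PIXELS, 2016, ROACH2_PIXELS, 4272)

# Extrapolate on logic blocks (7360 -> 476160 -> 930000).
pixels_from_clb = extrapolate(7360, ROACH1_PIXELS, 476160, ROACH2_PIXELS, 930000)

print(f"DSP-slice extrapolation:   {pixels_from_dsp:.0f} pixels")
print(f"Logic-block extrapolation: {pixels_from_clb:.0f} pixels")
```

Under this assumption the DSP-slice line gives roughly 2,280 pixels and the logic-block line roughly 1,770 pixels, in line with the approximate values read from Fig. 14.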
In fact it only gets worse if we instead base the extrapolation on CLBs, resulting in only about 1,800 pixels; such a small multiplex (MUX) factor would make the development of the ZCU111 as an MKID readout hardly worthwhile. Fortunately, there are a number of factors that lead us to be confident that this number is an overly conservative estimate, and that the actual MUX factor is at least double that shown in the extrapolations.

Table 3: DSP Slice and CLB Structure Comparison

| Component / Device | Virtex 5 | Virtex 6 | XCZU28DR |
|---|---|---|---|
| CLB LUTs | 4 | 4 | 8 |
| CLB Flip-Flops | 4 | 8 | 16 |
| DSP Slice | DSP48E | DSP48E1 | DSP48E2 |
| Multiplier | 25 x 18 | 25 x 18 | 27 x 18 |
| Accumulator | 48-bit | 48-bit | 48-bit |

These factors are as follows:

1. The number of MKIDs read out by the ROACH 2-based systems was limited by the sampling rates of the compatible data converters available at the time. In fact, less than half of the Virtex 6's logic resources were utilised in the DARKNESS ROACH 2 readout for channelisation [22] and pulse detection.
2. Our simple approach of extrapolating FPGA capabilities in the manner shown in Fig. 14 is typically very conservative. DSP algorithms such as Polyphase Filter Banks (PFBs) are not usually quite so linear, and if anything the resources scale in a more logarithmic manner. For example, the number of points in an FFT does increase linearly with bandwidth, but the number of taps in the PFB can in fact stay the same. Thus, there is no increase in the number of buffers and multiplies required; only the size of each buffer is extended [32]. On-chip memory does tend to scale linearly though.
3. We shall implement a more resource-efficient signal processing chain in firmware, significantly reducing the demands on FPGA logic resources. The firmware design is laid out in Section 4.2, and significantly reduces the processing required, compared to a twice-oversampled PFB.
4. The DSP48E2 DSP slices, which are specifically designed to carry the most intensive signal processing, have significantly improved since the Virtex 5 FPGA standard. Most noticeable is the additional pre-adder in each DSP slice, which significantly improves performance in densely packed DSP designs and reduces the DSP slices required (for an FIR window filter for example, see below) by up to 50% [31]. The configurable logic blocks (CLBs) also have extended capacity, and have increased routing and connectivity compared to Virtex CLBs. They also have additional control signals to enable superior register packing, resulting in overall higher device utilisation. Table 3 shows how the DSP slices and configurable logic blocks have improved over chip generations.
5. Finally, the FPGA fabric clock on the ZCU111 will run at 500 MHz, twice the speed of the Virtex 6 fabric clock for the DARKNESS firmware design (250 MHz). This clock signal will be generated with the ZCU111's I2C programmable SI570 low-jitter differential oscillator (10 to 810 MHz).

Memory should not be an issue, as additional memory can be added, such as a Hybrid Memory Cube (HMC) card or High Bandwidth Memory (HBM), via the FPGA Mezzanine Card plus (FMC+) port on the ZCU111.

4.2 Coarse Channelisation

Assuming that roughly 4,000 pixels can be read and processed with the ZCU111 board, Fig. 15 shows how the design will be implemented in hardware, over two independent I/Q channels using 4 DACs and 4 ADCs. An I/Q up-mixer and down-mixer board with an LO frequency of 6 GHz will be required for each of the two channels, but a common Rubidium frequency standard will keep both channels and all components synchronised. Considering the complex signal in just one of the two independent feedlines, a broadband complex 'frequency-comb' (-2 to +2 GHz) will be generated by two DACs (I and Q).
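The frequency plan for one feedline can be sketched as follows. This is a minimal illustration rather than the planned firmware. The placement of the tones (offset by half a spacing so that 2,000 tones fit symmetrically about 0 Hz), the 4 GSPS complex sample rate used for synthesis, the random tone phases, and the names `synthesise_comb`, `baseband_hz` and `rf_hz` are all assumptions made for the example.

```python
import numpy as np

N_TONES = 2000          # tones per feedline (from the text)
SPACING_HZ = 2e6        # resonator spacing (from the text)
LO_HZ = 6e9             # LO frequency of the I/Q mixer (from the text)
SAMPLE_RATE_HZ = 4e9    # assumed complex baseband sample rate for this sketch

# Tones placed symmetrically about 0 Hz, offset by half a spacing (assumption).
baseband_hz = (np.arange(N_TONES) - N_TONES / 2 + 0.5) * SPACING_HZ
rf_hz = LO_HZ + baseband_hz  # tone frequencies after up-mixing


def synthesise_comb(freqs_hz, n_samples, sample_rate_hz, seed=0):
    """Return I and Q waveforms of a comb of unit-amplitude complex tones."""
    rng = np.random.default_rng(seed)
    # Random starting phases (illustrative choice) keep the peaks of the sum lower.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(freqs_hz))
    t = np.arange(n_samples) / sample_rate_hz
    comb = np.exp(1j * (2.0 * np.pi * np.outer(freqs_hz, t) + phases[:, None]))
    total = comb.sum(axis=0)
    return total.real, total.imag


# Synthesise a short waveform for the first 16 tones only, to keep the example light.
i_wave, q_wave = synthesise_comb(baseband_hz[:16], 4096, SAMPLE_RATE_HZ)

print(f"baseband: {baseband_hz[0] / 1e9:+.3f} to {baseband_hz[-1] / 1e9:+.3f} GHz")
print(f"RF:       {rf_hz[0] / 1e9:.3f} to {rf_hz[-1] / 1e9:.3f} GHz")
```

With these assumptions the 2,000 tones span -1.999 to +1.999 GHz at baseband and 4.001 to 7.999 GHz after up-mixing with the 6 GHz LO.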
After these baseband signals are up-mixed to the 4 - 8 GHz band and sent through the MKID array, they will subsequently be down-mixed back to the -2 to +2 GHz band before being re-digitised by two of the ADCs at sample rates of 4 GSPS (again in I/Q format). We desire frequency bins with ≈1 MHz spacing, so we will apply a looped 4,096 point FFT to the time-stream data, resulting in FFT bins with just under 1 MHz spacing (Fig. 17). As previously mentioned in section 2.4, in order to ensure there are no scalloping losses in this coarse channelisation stage, while also trying to make our DSP as efficient as possible, we will implement a partially oversampled PFB, with an oversampling ratio of 32/27. This partial overlapping (or staggering) of the data time-streams as they are fed into the buffers of the PFB causes a partial breakdown in the synchronisation of the sample timing in the sub-channels, which is discussed in the following section 4.3.

Figure 15: A block diagram showing the hardware components of our planned RF SoC-based readout system. The blue shell encompasses all electronic components housed on the ZCU111 board. The red shell is where the mixer components shall be housed: I and Q Inputs and Outputs, 6 GHz LO synthesiser, I/Q mixers, Ref Clock inputs, fixed-gain amplifiers and programmable attenuators. The green shell encloses the 3-stage cryogenic system, where the MKIDs will be cooled to a base temperature of 100 mK.

4.3 Fine Channelisation and Phase-Rotation Correction

After performing the discrete FFT, the next step in signal processing is bin selection and refinement of those chosen FFT bins (often called fine channelisation). The useful frequency bins (those sensitive to a probe tone) are selected, and a time-stream for each one is generated; we shall call these chosen bins 'channels' now.
Because of the partially oversampled approach used during the coarse channelisation stage, there will now be an observed oscillation in power in both the I and Q data time-streams for a tone that should be expected to be centred in a given channel. This is simply due to the slightly out-of-step sampling of each coarse FFT bin because of the fractional oversampling. However, since I/Q data will always be mutually 90° out of phase, by definition, the asynchronous effect will be observed in the complex signal as a rotation of phase in the I/Q plane. So, since it is the phase of each probe tone we will be monitoring in each channel, we must correct for this. A number of correction techniques are discussed in [23], for example.

Additionally, if the probe tone's frequency itself (ftone) is not defined to be perfectly centred on its corresponding channel, a similar oscillatory phase rotation effect will be observed in the I/Q plane. This oscillation will occur at a frequency equal to its spectral distance from the channel centre. However, given that the tones are user-generated with precisely defined frequency values, these phase rotation frequencies are known a priori and can thus be corrected for. A Digital Down Conversion (DDC) step will be used for this correction, where each channel is digitally multiplied by a sinusoidally varying value. A benefit of using this DDC step is that we can define the frequencies of the DDC signals to also account for the phase rotation introduced by the partial oversampling discussed above. The DDC signal for each frequency bin, sampled at ≈ 1 MHz, is used to effectively shift each FFT bin to be centred on a particular tone, so that the measured power and phase remain constant over time. This shift is necessary, as we require each measured signal to be at 0 Hz when being monitored for evidence of a photon event.
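The off-centre phase rotation and its removal by a DDC can be illustrated with a short simulation. The sketch below is a simplified stand-in for the real signal chain: it uses a plain critically sampled 256-point FFT rather than the planned 32/27 oversampled PFB, so it only models the rotation caused by a tone sitting 200 kHz away from its bin centre. The sample rate, bin index and variable names are illustrative assumptions. For the oversampled PFB, the DDC frequency would additionally include the f_shift term of Eq. (4), which is not modelled here.

```python
import numpy as np

N_BINS = 256
SAMPLE_RATE_HZ = 256e6                      # assumed complex sample rate
BIN_SPACING_HZ = SAMPLE_RATE_HZ / N_BINS    # 1 MHz bins
N_BLOCKS = 200                              # number of consecutive FFT frames

bin_index = 38
tone_hz = bin_index * BIN_SPACING_HZ + 200e3   # tone 200 kHz above bin centre

# Complex probe tone at the ADC.
n = np.arange(N_BINS * N_BLOCKS)
x = np.exp(2j * np.pi * tone_hz * n / SAMPLE_RATE_HZ)

# Coarse channelisation: one FFT per block, then keep the bin of interest.
# The resulting channel time-stream is sampled at the bin spacing (1 MHz).
channel = np.fft.fft(x.reshape(N_BLOCKS, N_BINS), axis=1)[:, bin_index]

# Phase advance between consecutive channel samples gives the rotation frequency.
step_rad = np.angle(channel[1:] * np.conj(channel[:-1]))
rotation_hz = np.mean(step_rad) / (2.0 * np.pi) * BIN_SPACING_HZ

# DDC: multiply by a complex sinusoid at the known offset to bring the tone to DC.
ddc_hz = tone_hz - bin_index * BIN_SPACING_HZ
m = np.arange(N_BLOCKS)
corrected = channel * np.exp(-2j * np.pi * ddc_hz * m / BIN_SPACING_HZ)

print(f"rotation before DDC: {rotation_hz / 1e3:.1f} kHz")
print(f"phase spread after DDC: {np.ptp(np.angle(corrected)):.2e} rad")
```

Before the DDC the channel phase rotates at the 200 kHz offset. After the DDC the complex value in the channel is constant, so a later change in phase can be attributed to the resonator rather than to the sampling.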
Once the additional phase rotation introduced by the 32/27 oversampling is calculated, it can simply be combined in the DDC calculation as shown in Eq. (4) and illustrated in Fig. 16. The DDC for each sub-channel will be achieved through a digital sine wave with frequency given by Eq. (4). Once the DDCs have been applied, the time-stream for each frequency bin with a probe tone will effectively be at DC. Now that we have corrected for the difference between the sampling of each probe tone and that tone's frequency, a low-pass filter (LPF) can be applied to each frequency bin to also attenuate any other signals close to (or within) that bin.

Figure 16: A simulated tone, if it were finely sampled within one of the 1 MHz frequency bins. The DDC must be given two frequencies to account for both the off-centre nature of the tone (200 kHz in this instance (blue arrow)), and the additional shift due to the non-maximally decimated sampling (133 kHz here (green arrow)).

In a similar manner to that described by [22], a channel selection stage will then finally be applied, whereby any frequency bins with no tones within them will be discarded, and any frequency bins capturing more than one tone will be copied and treated in a similar manner to that just explained, with a DDC and LPF. In this manner we will go from having 4,096 evenly spaced, partially overlapped frequency bins, to roughly 2,000 uniquely spaced channels, where each unique channel will contain a single tone corresponding to a single MKID. Fig. 17 shows a block diagram of the main steps in the DSP. Much of the design is based on scaling up the firmware of other MKID instruments [12, 22], but uses the DDC in a novel way for correction of the unconventional sampling in the PFB.
$$\phi_f = f_{tone} - f_0 - f_{shift} \tag{4}$$

5 Alternative Implementations

5.1 Expanding to Secondary FPGA Carrier Boards

Given that the logic and memory resources appear to constrain the number of pixels that can be read and processed with the ZCU111 board, a brief investigation was carried out into how the ZCU111's FMC+ port might be used to connect the RF SoC chip to external FPGAs. To minimise development costs and complexities, only COTS components were considered. The FMC+ port on the ZCU111 provides direct high-speed interconnect to the FPGA Programmable Logic (PL), and could serve as a link to external FPGAs for this additional processing capacity. The main disadvantage of this approach is the loss of one of the primary attractions of the RF SoC, namely the power saving due to the removal of SERDES connections between the FPGA and the data converters. Although it is difficult to quantify the saving in power for the SoC design, a first-order approximation shows a somewhat significant power saving, simply based on the large heat load typically generated in the lines between FPGA and data converters in a standard non-SoC system.

The FMC+ port has 16 GTY transceivers (12 directly configured, plus 4 general I/O channels that can also be configured as GTYs), where GTY denotes Gigabit Transceiver, version 'Y'. Each of the ZCU111's GTY transceivers can, in principle, transfer up to 32.75 Gbps (gigabits per second) from chip to chip. So, in principle, an FMC+ port can transfer over 524 Gbps. Now, each of the ZCU111's 8 ADCs generates 4 GSPS, with each sample having 12 bits. This results in each ADC generating 48 gigabits of data every second. Each GTY has two transceiver pins and can run at a maximum PLL frequency of 16.375 GHz. The best approach is to run the internal GTY QPLL0 at 12 GHz (it can be clocked from 9.8 GHz to 16.375 GHz).
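The link budget implied by these numbers can be checked with some simple arithmetic. The sketch below uses only the figures quoted in the text and follows the text's own accounting of one bit per Tx pin per GTY clock cycle, with two Tx pins per GTY. It ignores any line-coding or protocol overhead a real serial link would add, and the variable names are illustrative.

```python
import math

N_GTY = 16               # GTY transceivers on the FMC+ port
GTY_MAX_GBPS = 32.75     # maximum rate per GTY
N_ADC = 8                # on-chip ADCs
ADC_RATE_GSPS = 4.0      # ADC sample rate
ADC_BITS = 12            # bits per sample
QPLL_GHZ = 12.0          # chosen GTY QPLL0 frequency
TX_PINS_PER_GTY = 2      # Tx pins per GTY, as stated in the text

# Upper limit of the FMC+ port and the raw ADC data rates.
port_capacity_gbps = N_GTY * GTY_MAX_GBPS     # 524 Gbps
adc_stream_gbps = ADC_RATE_GSPS * ADC_BITS    # 48 Gbps per ADC
total_adc_gbps = N_ADC * adc_stream_gbps      # 384 Gbps for all 8 ADCs

# One bit per Tx pin per GTY clock cycle (the accounting used in the text).
gbps_per_gty = TX_PINS_PER_GTY * QPLL_GHZ     # 24 Gbps per GTY at 12 GHz
gtys_per_adc = math.ceil(adc_stream_gbps / gbps_per_gty)
gtys_needed = N_ADC * gtys_per_adc

print(f"FMC+ capacity: {port_capacity_gbps:.0f} Gbps, ADC total: {total_adc_gbps:.0f} Gbps")
print(f"GTYs per ADC: {gtys_per_adc}, GTYs needed: {gtys_needed} of {N_GTY}")
```

On this accounting each ADC needs 2 GTYs, so the 8 ADCs use all 16 GTY channels while staying below the maximum GTY clock rate.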
Then, clocking the ADCs at 4 GHz, on each ADC clock cycle the 12-bit ADC sample will be split in 4, then serialised and sent across 2 GTY transceivers (4 Tx (transmit) pins in total, since there are 2 Tx pins per GTY). So, we can send 1 ADC sample (from 1 ADC) over 2 GTYs per ADC clock cycle. In other words, on each GTY clock cycle we will transmit 1 bit on each Tx pin, and do this over 2 GTYs. Since the GTY clock will be exactly 3 times faster than the ADC clock, we can transmit 3 bits per GTY pin per ADC clock cycle, for a total of 12 bits (or 1 sample) per ADC cycle. We can duplicate this 8 times, for the 8 ADCs, using up the full 16 available GTY channels (2 GTYs for each ADC), and we do not even have to clock the GTYs at their maximum PLL speed. As such, all data from the 8 ADCs can be transferred continuously through this interface, and the bulk of the computationally intensive DSP can then be performed on the additional FPGA board.

Figure 17: Block diagram showing a possible new DSP design, implementing a non-maximally decimated PFB for more efficient use of FPGA logic.

The HTG-940 from Hitech Global, hosting the Virtex UltraScale+ and QUAD FMC+ ports, would appear to be a sufficient solution for this format. Two of the four FMC+ ports may even allow two ZCU111 boards to be connected to it, since the UltraScale+ FPGA on the board has immense logic and DSP capacity (3.78 million system logic cells and 12,288 DSP Slices). The remaining two FMC+ ports could be used for HMC (hybrid memory cube) cards if needed. Fig. 18 shows a similar plot to the extrapolation that was shown earlier, but now showing what would be possible if the ZCU111 were to be combined with the HTG-940 board, for example. The processing capacity of this system would approach 10,000 pixels, surpassing the 8,000 pixel bandwidth available. However, as shown in Table 4, the calculated cost per pixel is only marginally smaller with the addition of the HTG-940 board.
Considering this, and the fact that exporting the ADC data off the ZCU111 chip to the external board will clearly consume more power due to dissipation, this option is not as attractive as first thought. In fact, it would seem to make more sense to simply use two ZCU111 boards to read out 8,000 resonators (4,000 per board), rather than a single ZCU111 connected to an additional FPGA carrier board. The mass/volume footprint and cost per pixel are quite similar, but the double ZCU111 approach would require less power per pixel to operate. Additionally, using two ZCU111 boards would allow the firmware and software to be cloned from one board to the next, avoiding the additional firmware and software development for the HTG-940 FPGA board. The HTG board hosts an FPGA from a different series, and the architecture is significantly different to that of the RF SoC chips.

Figure 18: A very rough approximation of the number of MKID pixels that will be possible to read out if the ZCU111 board were to be combined with a secondary FPGA carrier board such as the HTG-940.

Table 4: Optical/Near-IR MKID Readout System Comparison. It should be noted that the costs for the systems are approximate, typically rounded to the nearest 1,000 Euro. A cost of €5 k has been included for each I/Q mixer board required by each system, assuming one mixer board per feedline running 2,000 pixels.

| Board | Unit Price (€) | System Cost (€) | Pixel Count | Cost/Pixel (€/pixel) | System-on-Chip (Y/N) |
|---|---|---|---|---|---|
| ROACH I | 5 k | 10 k | 250 | 40.00 | NO |
| ROACH II | 5 k | 10 k | 1,000 | 10.00 | NO |
| Xilinx ZCU111 | 9 k | 19 k | 4,000 | 4.75 | YES |
| ZCU111 Plus HTG940 | 17 k | 37 k | 8,000 | 4.63 | NO |

5.2 Expanding to Higher Frequency Bands

Although the current RF-SoC-based readout system is only now being developed, it is never too early to start thinking to the future. In fact Xilinx Inc. have already released their so-called Gen 3 version of the RF SoC chips (our ZCU111 uses a Gen 2 version).
An example of a carrier board that hosts one of these Gen 3 RF SoC chips is the Xilinx Zynq UltraScale+ RF SoC ZCU208 ES1 [33]. Similar in design to the board used for our current system, it again hosts 8 on-chip ADCs and 8 DACs, but the data converters are even faster. The ADCs clock at speeds up to 5 GSPS while the DACs run at speeds up to 10 GSPS. Both the ADCs and DACs are 14 bit, but most other specs on the board are essentially the same as the ZCU111. So, the main advantage this Gen 3 chip and board would offer the low temperature detector community is a higher frequency octave to work within, namely 5 - 10 GHz. The question then becomes whether there are any sufficiently performing HEMTs on the market for this band (again pointing out that we are, possibly naively, demanding a < 3 K noise temperature for our HEMTs). No doubt, there are exceptional HEMT amplifiers available for many frequency bands, from radio through high-GHz to THz. However, we currently operate with the (possibly naive) assumption that our HEMT noise temperature will set a hard limit for our optical/near-IR detectors. This assumed baseline may be naive because it is based on work carried out by McHugh et al. (2012) [12]. The HEMT was shown to be the highest contributor of noise for the first UVOIR MKID instrument, ARCONS [13], where the HEMT noise was measured as roughly 40 dB above the ADC noise [12]. Whilst it was demonstrated that the HEMTs available at the time did appear to form a hard lower limit of achievable noise, the quality of HEMTs has increased somewhat since 2012. Furthermore, the significantly broader bandwidths being demanded from our currently used ADCs may very well prove this assumption incorrect. If this proves to be true, then it would make sense to examine alternative HEMTs suitable for higher octaves. An analysis of measuring and determining the highest contributing component of noise in our system will form the basis of a publication in the very near future.
For the moment, assuming that HEMT noise is our target baseline, it can be pointed out that significant progress has been made over the past 2 to 3 years in lowering the noise temperature of some cryogenic HEMTs that are operational over higher frequency ranges. Fig. 19 shows the noise temperature data for three HEMTs that operate over three different frequency ranges. They are all manufactured by Low Noise Factory in Sweden, and the data are accurate as of July 20th, 2020.

Figure 19: Noise temperature data for a number of LNF HEMT amplifiers. The gain levels of each of the three HEMTs shown are roughly 40 dB, 35 dB, and 32 dB across the frequency ranges shown. Data credit: Low Noise Factory, Sweden [34–36].

With new boards such as the above-mentioned ZCU208 ES1 from Xilinx, the 5 - 10 GHz band should certainly be considered for future MKID instruments, especially seeing the excellent performance of the LNC4-16-B HEMT across this frequency band [35]. Again though, the noise floor of the on-chip components of these boards will need to be characterised and compared to what is achievable with current HEMT technology, and to acceptable noise levels. Assuming further increases in the frequency range of Xilinx RF SoC devices, the 6 - 12 GHz band should also be kept in mind. Assuming an equivalent spacing between resonators, these increased operational frequency ranges would correspond to a 25% and a 50% increase, respectively, in the number of pixels per feedline. As Fig. 19 shows, HEMT noise temperatures even at frequencies beyond 14 GHz [36] can now be kept well below the levels of the HEMTs on previous instruments.

The noise spectral density (NSD) contributed by the HEMT noise temperature can be calculated using Eq. (5) [12].

$$\mathrm{NSD} = \frac{k_B T_n}{P_{in}} \tag{5}$$

The HEMT used in any cryogenic measurement work described here has a noise temperature of roughly 2.3 - 2.7 K across the 4 - 8 GHz band.
Assuming T_n = 2.7 K, and aiming to drive our resonators at a power of around P_in = -75 dBm, this results in a phase noise of -89.3 dBc/Hz when the signal is fully off-resonance (i.e., illuminated by a photon). However, since the resonator Q-factors can vary significantly, and this can affect the power with which they can be optimally driven, we should also consider the cases of P_in ± 15 dB, namely -60 dBm and -90 dBm, which yield HEMT phase noise values of -104.3 dBc/Hz and -74.3 dBc/Hz, respectively. As such, we aim to keep all other sources of noise below the limit of -104.3 dBc/Hz. Detailed performance characterisation of the ZCU111, ZCU208, and HEMT amplifiers will continue as we implement systems.

6 Summary of Conclusions

With the aim of achieving a compact, low-power readout system for MKID arrays for optical/near-IR astronomy, this work describes a baseline design to be built around the Xilinx ZCU111 RF SoC system. Based on the extrapolations shown and the application of more efficient channelisation methods, it is expected that this new system will be capable of reading out 4,000 resonators with 2 MHz spacing. Good agreement was shown between measurements and simulation for scalloping losses for the critically-sampled 4-tap PFB, with up to 6 dB of signal loss for tones at the edge of FFT frequency bins. A logic-efficient solution to prevent these losses was shown to be a non-maximally decimated PFB, whereby a flat response can be achieved across the full frequency bin without doubling-up on the PFB. If more processing capacity is required, the ZCU111's on-board FMC+ connector was shown to provide enough high-speed interconnect to transfer the full data from all 8 on-chip ADCs of the ZCU111 to an external FPGA board via GTY transceivers. However, it was shown that, provided the ZCU111 itself can fully process 4,000 channels, it is more sensible to simply use multiple ZCU111 boards to read out larger pixel numbers rather than exporting data to external FPGAs.
It was also concluded that if TLS noise cannot be kept below that of the HEMT, it will make most sense to extend our operational frequency range to a higher octave. Both the 5 - 10 GHz and the 6 - 12 GHz octaves should be investigated further as a route to increasing the MUX factor.

7 Acknowledgments

This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number 15/IA/2880. The authors would like to thank University of California Santa Barbara (UCSB), Mazin Labs, and the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER) at University of California Berkeley for their assistance. The MKID array in Fig. 1 was fabricated by G.U. and is based on a design by Benjamin A. Mazin, UCSB. Thanks are also given to Xilinx and Cathal McCabe for donated FPGAs for the current ROACH 1 readout system and Vivado/ISE software under the Xilinx DUP programme. This work also benefited from remote discussions with an RF-SoC collaboration led by Gustavo Cancelo at Fermi National Accelerator Laboratory (Fermilab), Batavia, United States.

Data Availability Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

[1] “Xilinx-ZCU111 product.”
[2] P. Day, H. LeDuc, B. Mazin, et al., “A broadband superconducting detector suitable for use in large arrays,” Nature 425, 817–821 (2003).
[3] K. D. Irwin and G. C. Hilton, “Transition-edge sensors,” Topics in Applied Physics 99(Cryogenic Particle Detection), 63–150 (2005).
[4] Y. Nakashima, F. Hirayama, S. Kohjiro, et al., “Adjustable SQUID-resonator direct coupling in microwave SQUID multiplexer for TES microcalorimeter array,” IEICE Electronics Express 14 (2017).
[5] B. Mazin, S. Meeker, M. Strader, et al., “ARCONS: A 2024 pixel optical through near-IR cryogenic imaging spectrophotometer,” Publications of the Astronomical Society of the Pacific 125 (2013).
[6] K. Mitsui, T. Nitta, N. Okada, et al., “Fabrication of 721-pixel silicon lens array of an MKID camera,” in Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy VII, W. S. Holland and J. Zmuidzinas, Eds., 9153, 903–912, International Society for Optics and Photonics, SPIE (2014).
[7] S. R. Meeker, B. A. Mazin, A. B. Walter, et al., “DARKNESS: A microwave kinetic inductance detector integral field spectrograph for high-contrast astronomy,” Publications of the Astronomical Society of the Pacific 130, 065001 (2018).
[8] B. Johnson, D. Flanigan, M. Abitbol, et al., “Development of multi-chroic MKIDs for next-generation CMB polarization studies,” Journal of Low Temperature Physics (2017).
[9] B. R. Johnson, D. Flanigan, M. H. Abitbol, et al., “Polarization sensitive multi-chroic MKIDs,” in Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy VIII, W. S. Holland and J. Zmuidzinas, Eds., 9914, 216–227, International Society for Optics and Photonics, SPIE (2016).
[10] M. R. Vissers, J. E. Austermann, M. Malnou, et al., “Ultrastable millimeter-wave kinetic inductance detectors,” Applied Physics Letters 116(3), 032601 (2020).
[11] B. A. Mazin, P. K. Day, K. D. Irwin, et al., “Digital readouts for large microwave low-temperature detector arrays,” Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 559(2), 799–801 (2006). Proceedings of the 11th International Workshop on Low Temperature Detectors.
[12] S. McHugh et al., “A readout for large arrays of microwave kinetic inductance detectors,” Review of Scientific Instruments 83 (2012).
[13] B. A. Mazin, K. O’Brien, S. McHugh, et al., “ARCONS: a highly multiplexed superconducting optical to near-IR camera,” in Ground-based and Airborne Instrumentation for Astronomy III, 7735, 773518 (2010).
[14] B. A. Mazin, B. Bumble, S. R. Meeker, et al., “A superconducting focal plane array for ultraviolet, optical, and near-infrared astrophysics,” Opt. Express 20, 1503–1511 (2012).
[15] L. N. Cooper, “Bound electron pairs in a degenerate Fermi gas,” Phys. Rev. 104, 1189–1190 (1956).
[16] J. Zmuidzinas, “Superconducting microresonators: Physics and applications,” Annual Review of Condensed Matter Physics 3, 169–214 (2012).
[17] J. Gao, The Physics of Superconducting Microwave Resonators. PhD thesis, California Institute of Technology, California, U.S.A. (2008).
[18] D. C. Mattis and J. Bardeen, “Theory of the anomalous skin effect in normal and superconducting metals,” Phys. Rev. 111, 412–417 (1958).
[19] S. B. Kaplan, C. C. Chi, D. N. Langenberg, et al., “Quasiparticle and phonon lifetimes in superconductors,” Phys. Rev. B 14, 4854–4873 (1976).
[20] P. Szypryt, S. R. Meeker, G. Coiffard, et al., “Large-format platinum silicide microwave kinetic inductance detectors for optical to near-IR astronomy,” Opt. Express 25, 25894–25909 (2017).
[21] D. C. Price, “Spectrometers and Polyphase Filterbanks in Radio Astronomy,” arXiv e-prints, arXiv:1607.03579 (2016).
[22] M. Strader, Digital Readout for Microwave Kinetic Inductance Detectors and Applications in High Time Resolution Astronomy. PhD thesis, University of California Santa Barbara, California, U.S.A. (2016).
[23] J. Tuthill, G. Hampson, J. D. Bunton, et al., “Compensating for oversampling effects in polyphase channelizers: A radio astronomy application,” in IEEE Signal Processing and Signal Processing Education Workshop (SP/SPE), 255–260 (2015).
[24] J. Tuthill, G. Hampson, J. Bunton, et al., “Development of multi-stage filter banks for ASKAP,” in International Conference on Electromagnetics in Advanced Applications, 1067–1070 (2012).
[25] J. Hickish et al., “A decade of developing radio-astronomy instrumentation using CASPER open-source technology,” Journal of Astronomical Instrumentation 5 (2016).
[26] J. Hickish and D. Werthimer, “The Collaboration for Astronomy Signal Processing and Electronics Research (CASPER): Lowering barriers to entry in radio astronomy research with open-source, community-driven digital signal processing hardware and software,” in Progress in Electromagnetic Research Symposium (PIERS), 1670–1670 (2016).
[27] “SKARAB.”
[28] “Xilinx-RF-SoC data sheet.”
[29] “Virtex5 family overview.”
[30] “Virtex6 family overview.”
[31] “Zynq UltraScale plus RF SoC data sheet.”
[32] C. Chang, Design and Applications of a Reconfigurable Computing System for High Performance Digital Signal Processing. PhD thesis, University of California, Berkeley, California, U.S.A. (2005).
[33] “Xilinx-RF-SoC ZCU208 ES1 data sheet.”
[34] “LNF-LNC4-8C data sheet.”
[35] “LNF-LNC4-16B data sheet.”
[36] “LNF-LNC6-20C data sheet.”

List of Figures

1. Close-up image of an MKID array, clearly showing the geometry of a typical lumped element MKID. The inductor (smaller meandering section marked with red boundary for clarity) and the larger interdigitated capacitor (IDC) can be seen. It is the geometry of the capacitor that is typically varied from pixel to pixel, defining the unique resonant frequency for each pixel. Resonator design was based on work reported in [20].
2. Magnification of the inductor section of a single resonator (marked off by the red box in Fig. 1). This inductor geometry is typically kept constant for each resonator in an array, so that each pixel should have the same sensitivity to the incident photons. Resonator design was based on work reported in [20].
3. A diagram illustrating how the various MKID specifications, quality factors, and time-resolution requirements translate to readout specifications.
4. Simulated time-stream data for a raw data stream truncated with a standard top-hat, as well as weighted data corresponding to two different filters. In each case the sampling rate was 4 GSPS, with 1 microsecond of non-zero values defined, and padded by zeroes either side.
5. A typical broadband frequency comb waveform, measured with a VNA.
6. Power attributed by a discrete FFT of a perfectly sharp input signal to a single, 1 MHz wide frequency bin, plotted over the deviation of input signal frequency from bin centre, using the different time-stream data shown in Fig. 4.
7. The ideal frequency response of our DSP, with perfect isolation and no truncation of signals.
8. Simulated response of a 128-tap PFB (with a Hamming FIR window before the FFT). Three adjacent channels are shown, indicating the expected levels of spectral leakage, and losses due to scalloping.
9. Frequency response of a 4-tap PFB which employs a Hamming FIR window before the FFT. Three adjacent channels are shown, indicating the expected levels of spectral leakage and losses due to scalloping. The 4 dashed vertical lines indicate the frequencies of 4 test tones that were measured, as shown in Fig. 13.
10. Frequency response of an 8-tap PFB which employs a x2 window length, and would require two staggered FFT stages to double up the sampling if aliasing is to be avoided. A Hamming FIR window function was used.
11. Frequency response of an 8-tap PFB which employs a slightly increased window length (32/27 x critical). This approach only requires partial oversampling (or non-maximally decimated sampling). A Hamming FIR window function was again used.
12. A number of narrow-band tones were channelised with a 4-tap PFB. Tones around 38 MHz and 39 MHz were used, increasing in frequency in steps of ∆/4 = 0.25 MHz. A 1 second integration is shown here, ten of which were used and averaged.
13. Zoomed-in data from Fig. 12, after calibration and normalisation.
14. A very rough approximation of the number of MKID pixels that will be possible to read with the ZCU111 board.
15. A block diagram showing the hardware components of our planned RF SoC-based readout system. The blue shell encompasses all electronic components housed on the ZCU111 board. The red shell is where the mixer components shall be housed: I and Q inputs and outputs, 6 GHz LO synthesiser, I/Q mixers, Ref Clock inputs, fixed-gain amplifiers and programmable attenuators. The green shell encloses the 3-stage cryogenic system, where the MKIDs will be cooled to a base temperature of 100 mK.
16. A simulated tone, if it were finely sampled within one of the 1 MHz frequency bins. The DDC must be given two frequencies to account for both the off-centre nature of the tone (200 kHz in this instance (blue arrow)), and the additional shift due to the non-maximally decimated sampling (133 kHz here (green arrow)).
17. Block diagram showing a possible new DSP design, implementing a non-maximally decimated PFB for more efficient use of FPGA logic.
18. A very rough approximation of the number of MKID pixels that will be possible to read out if the ZCU111 board were to be combined with a secondary FPGA carrier board such as the HTG-940.
19. Noise temperature data for a number of LNF HEMT amplifiers. The gain levels of the three HEMTs shown are roughly 40 dB, 35 dB, and 32 dB across the frequency ranges shown. Data credit: Low Noise Factory, Sweden [34–36].

List of Tables

1. FPGA Board Comparison. It should be noted that the costs for the systems are approximate, typically rounded to the nearest 1,000 Euro.
2. FPGA Resources Comparison
3. DSP Slice and CLB Structure Comparison
4. Optical/Near-IR MKID Readout System Comparison. It should be noted that the costs for the systems are approximate, typically rounded to the nearest 1,000 Euro. A cost of £5 k has been included for each I/Q mixer board required by each system, assuming one mixer board per feedline running 2,000 pixels.
