A 128×128 Single-Photon Imager with on-Chip Column-Level 10b Time-to-Digital Converter Array Capable of 97ps Resolution

Cristiano Niclass; Claudio Favi; Theo Kluter; Marek Gersbach; Edoardo Charbon


IEEE International Solid-State Circuits Conference (ISSCC), 2008

We present an array of 128×128 highly miniaturized SPAD (single-photon avalanche diode) pixels with a bank of 32 time-to-digital converters (TDCs) on chip. A decoder selects a 128-pixel row. Every group of 4 pixels in the row shares a TDC based on an event-driven mechanism. As a result, row-wise parallel acquisition is obtained with a low number of TDCs. Because...

ISSCC 2008 / SESSION 2 / IMAGE SENSORS & TECHNOLOGY / 2.1

Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Time-resolved optical imaging has many uses in physics, molecular biology, medical sciences and computer vision, to name just a few. Deep sub-nanosecond timing resolution, in combination with high sensitivity, is becoming increasingly important in a number of imaging methods. Non-solid-state devices enabling picosecond time resolution, such as photomultiplier tubes and microchannel plates, have existed for decades. However, cost and size have limited their use to low-scale and scientific applications. In solid-state technology, single-photon avalanche diodes (SPADs) have become the alternative of choice [1]. Recently, SPADs have become even more compelling with the emergence of CMOS implementations [2] and the appearance of multi-pixel designs [3]-[6].

With the growth of array size, however, it has become increasingly hard to process massive volumes of data from SPAD pixels. To address this issue, hybrid systems have been proposed that combine advanced CMOS technologies with processes designed to optimize SPAD performance [7]. The main limitation of this approach is the increased complexity of fabrication and, possibly, higher costs. Analog design techniques have also been used to evaluate the photon’s time-of-arrival (TOA) on-chip [5]. However, increased pixel size and potentially complex schemes to compensate for temperature and technological variability are often needed.

We present an array of 128×128 highly miniaturized SPAD pixels with a bank of 32 time-to-digital converters (TDCs) on chip. The block diagram of the system is shown in Fig. 2.1.1.
A decoder selects a 128-pixel row. Every group of 4 pixels in the row shares a TDC based on an event-driven mechanism similar to [4]. As a result, row-wise parallel acquisition is obtained with a low number of TDCs. Thanks to the outstanding timing precision of SPADs and an optimized TDC design, a typical resolution of 97ps is achieved within a range of 100ns (10b) at a maximum rate of 10MS/s per TDC. The TDC bank exhibits a DNL of 0.08LSB and an INL of 1.89LSB.

Figure 2.1.2 (a) shows the pixel schematics based on a configuration with seven NMOS transistors. The SPAD, implemented as a p+/p-well/deep n-well junction, is based on [4], where detailed device characterization is reported. The breakdown voltage VBD of the SPAD in this design is 17.7V. At its cathode, a bias voltage of 21V is applied in order to operate with an excess bias voltage VE of 3.3V. A row selection transistor (M1) decouples the SPAD anode from pixels in the columns that are not selected. At the selected row, the SPAD is charged as a result of its anode being connected to ground via quenching/recharge transistor M2. Transistors M3, M4, and M6 operate as switches to set VGS of M2 to either VQCH or VRCH in order to quench the avalanche or recharge the SPAD. M5 is used as a capacitor to reduce the effect of switching noise on VQCH caused by charge injection from the gate of M2. M7 is the pixel output transistor. It operates as a pull-down for the column line when a photon is detected. The column line potential is kept high by pull-up transistors at the bottom of the column.

Figure 2.1.2 (b) shows a simplified block diagram of the TDC. Time-to-digital conversion is obtained as a result of an interpolation of three delay measurements: coarse, medium, and fine. The main TDC structure is similar to [8]. Nonetheless, further improvements have been implemented to reduce the silicon area and perform flash conversion, thus increasing throughput.
Each TDC has an independent controller that is used as a time interpolator, to generate internal signals and to control its operating mode. Each controller also manages the interface with the global readout circuit and the column circuitry. A global master DLL generates 16 uniformly spaced phases PHI[15:0] based on a global CLK/START signal. Example waveforms for CLK/START and PHI are shown in Fig. 2.1.2(c). The input frequency of the master DLL is typically 40MHz, thus τC is 25ns for time interval measurements. The time separation between two successive phases sets τM to 1.5625ns.

The TDC supports two main operating modes: (i) a measurement mode and (ii) a calibration mode. In measurement mode, the STOP signal originated by the first of four SPADs that detects a photon is mapped to signal TRG in column-level TDC. A 2b counter clocked by signal CLK delivers coarse resolution τC for a measurement range of 4τC. Medium resolution τM is achieved by finding that pair of global phases PHI which delimits the transition of TRG. The register used for τM also generates a synchronization signal SYNC that is precisely asserted on the second phase transition following the rising edge of TRG. The time delay between TRG and SYNC is measured by means of a 32-tap delay line and register, which are designed based on the TDC core of [9]. In the TDC interpolator, the three measurements, delivering 2, 4, and 4 bits of resolution, are combined into the total time delay code. Only 16 delay cells (4b) of the fine delay line are used for the final result. The remaining delay cells are added to accommodate timing shifts due to process, voltage or temperature variations.

In calibration mode, the TDC utilizes the full 32-tap fine delay line as a local DLL that locks to two non-successive phases (PHI[i] and PHI[i+2]) with a total duration of 2τM, thus generating a fine resolution τF of 97.66ps. The analog control voltage of the DLL is stored on a local capacitor.
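The three resolutions quoted above follow from the 40MHz clock by simple division, and the 2b, 4b and 4b fields together span the 100ns range. The following is a minimal sketch of that arithmetic, not the authors' interpolator logic. The function names are hypothetical, and the plain binary concatenation of the three fields (fine bits as LSBs) is an assumption: the actual interpolator must also account for SYNC being asserted on the second phase transition after TRG, which is not modelled here.

```python
# Timing constants taken from the text.
F_CLK_HZ = 40e6        # typical input frequency of the master DLL
N_PHASES = 16          # PHI[15:0]
N_FINE_TAPS = 32       # fine delay line, locked across 2 * tau_M in calibration

TAU_C = 1.0 / F_CLK_HZ             # coarse step: 25 ns
TAU_M = TAU_C / N_PHASES           # medium step: 1.5625 ns
TAU_F = 2 * TAU_M / N_FINE_TAPS    # fine step: about 97.66 ps


def combine_fields(coarse: int, medium: int, fine: int) -> int:
    """Concatenate 2b coarse, 4b medium and 4b fine fields into a 10b code.

    Assumption: straight binary concatenation with the fine field as LSBs.
    """
    if not (0 <= coarse < 4 and 0 <= medium < 16 and 0 <= fine < 16):
        raise ValueError("field out of range")
    return (coarse << 8) | (medium << 4) | fine


def code_to_seconds(code: int) -> float:
    """Convert a 10b code to a time interval, one LSB being TAU_F."""
    return code * TAU_F


if __name__ == "__main__":
    print(f"tau_C = {TAU_C * 1e9:.4f} ns")
    print(f"tau_M = {TAU_M * 1e9:.4f} ns")
    print(f"tau_F = {TAU_F * 1e12:.2f} ps")
    print(f"full range = {code_to_seconds(1 << 10) * 1e9:.1f} ns")
```

Under these assumptions the 16 fine cells used for the result cover exactly one τM, 16 medium steps cover one τC, and four coarse counts cover the full range, so the three fields tile the 100ns interval without gaps.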
Since calibration is performed individually in each TDC, matching requirements between TDCs may be relaxed.

The output of all TDCs is transferred off-chip via a fast global readout circuit consisting of 32 TDC interface blocks, a configuration/testing JTAG controller and a pipelined time-multiplexer readout chain. The readout circuit controls 8 digital 12b output buses. Each bus provides the 10b TDC data and 2b column address to identify the originator of the STOP signal among four pixels. In order to maximize data rate, the readout circuit operates four times faster than the TDC frequency. To reduce power consumption, IO pads only change state when valid data are available in a readout cycle. The readout circuit also provides configuration/testing measures to read and modify most TDCs and readout circuit registers via an integrated JTAG controller.

The chip micrograph is shown in Fig. 2.1.3 along with a detail of the pixel. The pixel measures 25×25μm².

The sensor was tested in three steps. First, the TDCs were characterized separately. Second, the TDC array was operated in measurement mode and connected with the SPAD array when exposed to ambient light. The performance of the TDC bank is summarized in Fig. 2.1.4. The figure shows the worst-case DNL and INL measured over the entire bank over several days, to verify the effectiveness of calibration over temperature and technological variations. Finally, the chip was exposed to direct pulsed laser illumination generated by a 637nm solid-state laser source. The pulses were 80ps wide with a repetition rate of 40MHz. The power of the laser was adjusted to minimize pile-up distortion and a TOA histogram was built for each pixel. The resulting jitter measurements, along with the dark count rate (DCR), are shown in Fig. 2.1.5.

The chip’s overall performance was tested in a breadboard system based on an FPGA.
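As an illustration of how the 12b readout words and the per-pixel TOA histograms described above might be handled off-chip, here is a minimal sketch. The bit layout (2b address in the MSBs, 10b TDC code in the LSBs), the column mapping `tdc_index * 4 + addr`, and the function names `decode_word` and `accumulate` are all assumptions made for the example; the text specifies only the field widths.

```python
from collections import Counter, defaultdict

TDC_BITS = 10        # width of the TDC code, from the text
ADDR_BITS = 2        # width of the column address, from the text
PIXELS_PER_TDC = 4   # four pixels share one TDC


def decode_word(word: int) -> tuple[int, int]:
    """Split a 12b word into (tdc_code, pixel_addr).

    Assumed layout: address in the two MSBs, TDC code in the ten LSBs.
    """
    if not 0 <= word < (1 << (TDC_BITS + ADDR_BITS)):
        raise ValueError("not a 12-bit word")
    return word & ((1 << TDC_BITS) - 1), word >> TDC_BITS


def accumulate(events):
    """Build TOA histograms from (tdc_index, word) pairs.

    Returns a dict mapping column number to a Counter of code -> count.
    The column mapping is a hypothetical choice, not taken from the paper.
    """
    histograms = defaultdict(Counter)
    for tdc_index, word in events:
        code, addr = decode_word(word)
        column = tdc_index * PIXELS_PER_TDC + addr
        histograms[column][code] += 1
    return histograms


if __name__ == "__main__":
    sample = [(0, (1 << 10) | 5), (0, (1 << 10) | 5), (31, (3 << 10) | 1023)]
    for column, hist in sorted(accumulate(sample).items()):
        print(column, dict(hist))
```

A histogram per column is enough here because only one row is selected at a time; a full-frame version would also key on the row currently addressed by the decoder.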
The breadboard was designed to provide all the digital interface signals and memory support for the imager output. The sensor imaged a 3D scene illuminated by a pulsed laser. Figure 2.1.6 shows the 3D image obtained using the same techniques as in [3], whereby the total integration time for one frame was 1s, with a worst-case distance error of 1.4mm. Fig. 2.1.7 is a performance summary of the sensor chip and its various components.

This research was supported by a grant from the Swiss National Science Foundation. The authors are grateful to Maximilian Sergio for help during the IC tape-out.

References:
[1] S. Cova, A. Longoni and A. Andreoni, “Towards Picosecond Resolution with Single-Photon Avalanche Diodes,” Rev. Sci. Instrum., 52 (3), pp. 408-412, 1981.
[2] A. Rochas, M. Gani, B. Furrer et al., “Single Photon Detector Fabricated in a Complementary Metal-Oxide-Semiconductor High-Voltage Technology,” Rev. Sci. Instrum., Vol. 74, N. 7, pp. 3263-3270, July 2003.
[3] C. Niclass and E. Charbon, “A Single Photon Detector Array with 64x64 Resolution and Millimetric Depth Accuracy for 3D Imaging,” ISSCC Dig. Tech. Papers, pp. 364-365, Feb. 2005.
[4] C. Niclass, M. Sergio and E. Charbon, “A Single-Photon Avalanche Diode Array Fabricated in 0.35μm CMOS and Based on an Event-Driven Readout for TCSPC Experiments,” APCT Conference, SPIE Optics East, Oct. 2006.
[5] D. Stoppa, L. Pancheri, M. Scandiuzzo et al., “A CMOS 3-D Imager Based on Single Photon Avalanche Diode,” IEEE T. CAS I, pp. 4-12, Jan. 2007.
[6] M. Sergio, C. Niclass and E. Charbon, “A 128×2 CMOS Single-Photon Streak Camera with Timing-Preserving Latchless Pipeline Readout,” ISSCC Dig. Tech. Papers, pp. 394-395, Feb. 2007.
[7] B. Aull, J. Burns, C. Chen et al., “Laser Radar Imager Based on 3D Integration of Geiger-Mode Avalanche Photodiodes with Two SOI Timing Circuit Layers,” ISSCC Dig. Tech. Papers, pp. 238-239, Feb. 2006.
[8] A. Mantyniemi, T. Rahkonen and J. Kostamovaara, “An Integrated 9-Channel Time Digitizer with 30ps Resolution,” ISSCC Dig. Tech. Papers, pp. 266-267, Feb. 2002.
[9] R. B. Staszewski, S. Vemulapalli, P. Vallur et al., “Time-to-Digital Converter for RF Frequency Synthesis in 90nm CMOS,” IEEE RFIC Symp., pp. 473-476, 2005.

©2008 IEEE

ISSCC 2008 / February 4, 2008 / 1:30 PM

Figure 2.1.1: Block diagram of the proposed sensor. The sensor consists of a 128x128 pixel array, a bank of 32 TDCs, and a fast parallel readout circuitry. A row decoding logic selects 128 pixels that are activated for detection. The pixels are organized in groups of four that access the same TDC based on a first-in-take-all sharing scheme.

Figure 2.1.2: (a) Pixel schematic. (b) Simplified TDC block-diagram. (c) TDC signal waveform.

Figure 2.1.3: Photomicrograph of the sensor chip with a pixel detail in the inset. The circuit, fabricated in 0.35μm CMOS technology, has a surface of 8x5mm². The pixel pitch is 25μm.

Figure 2.1.4: Measurements of differential non-linearity (DNL) and integral non-linearity (INL) for the worst case TDC at room temperature.

Figure 2.1.5: (a) Time jitter measurement of the SPAD detector and overall circuitry using the integrated TDCs. In the inset, a logarithmic plot is shown. The illumination laser pulse width was 80ps. (b) Dark count rate (DCR) distribution over the array.

Figure 2.1.6: Experimental 3D image with model picture in inset. The 1σ error computed from two subsequent images is 1.4mm.

Figure 2.1.7: Performance summary for the image sensor. Where three values are given, they are min / typ / max.

Pixel
  Photon detection probability @ Ve=3.3V (η): 35 % @ 460nm
  Photon detection probability @ Ve=4.0V (η): 40 % @ 460nm
  Sensitivity spectrum (λ): 350 to 800 nm
  Median DCR: 284 Hz
  Dead time (τDT): 100 ns

TDC
  Tuning of measurement range (10 bits): 71.68 / 100 / 204.8 ns
  Resolution (LSB) (τF): 70 / 97.66 / 200 ps
  DNL: 0.08 LSB
  INL: 1.89 LSB
  Measurement rate: 10 MS/s
  Clock frequency: 19.5 / 40 / 55.8 MHz

System
  Total IO bandwidth: 7.68 Gbps
  JTAG bandwidth: 8 Mbps
  Static power dissipation: 33 mW
  Dynamic power dissipation: 150 mW

3D Image Sample
  Integration time: 1 s
  Illumination average power: 1 mW
  Illumination peak power: 250 mW
  Illumination duty cycle: 0.4 %
  Target area: 1 m²
  Target distance @40MHz: 0.1 / 1.5 / 3.75 m
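The 3.75m target distance quoted for 40MHz operation is consistent with the standard round-trip time-of-flight relation d = c·t/2 applied to one 25ns laser period. The sketch below works through that relation as an assumption about how the figure arises, not as the authors' stated derivation; the names are hypothetical.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum
TAU_F_S = 97.66e-12         # fine resolution (one LSB), from the text
F_REP_HZ = 40e6             # laser repetition rate, from the text


def tof_to_distance(t_seconds: float) -> float:
    """Round-trip time of flight to one-way distance: d = c * t / 2."""
    return 0.5 * C_M_PER_S * t_seconds


# Distance corresponding to a single LSB of the TDC.
lsb_distance_m = tof_to_distance(TAU_F_S)

# Longest distance measurable before the next laser pulse is emitted.
unambiguous_range_m = tof_to_distance(1.0 / F_REP_HZ)

if __name__ == "__main__":
    print(f"one LSB          = {lsb_distance_m * 1e3:.2f} mm")
    print(f"range at 40 MHz  = {unambiguous_range_m:.3f} m")
```

One LSB corresponds to roughly 14.6mm, about ten times the reported 1.4mm error, which suggests that the depth value for each pixel is estimated from many TOA samples collected over the 1s integration time rather than from a single conversion.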
