
SPECTRUM: A Software-defined Predictable Many-core Architecture for LTE/5G Baseband Processing

Published: 26 September 2020

Abstract

Wireless communication standards such as Long-Term Evolution (LTE) are changing rapidly to support the high data rates of wireless devices. Physical-layer baseband processing has strict real-time deadlines, especially in the next-generation applications enabled by the 5G standard. Existing base-station transceivers use customized DSP cores or fixed-function hardware accelerators for physical-layer baseband processing. However, these approaches incur significant non-recurring engineering costs and are inflexible to newer standards or updates. Software-programmable processors offer more adaptability, but it is challenging to sustain guaranteed worst-case latency and throughput at reasonably low power on shared-memory many-core architectures featuring inherently unpredictable design choices, such as caches and a Network-on-Chip (NoC).
We propose SPECTRUM, a predictable, software-defined many-core architecture that exploits the massive parallelism of the LTE/5G baseband processing workload. The focus is on designing scalable, lightweight hardware that can be programmed and defined by sophisticated software mechanisms. SPECTRUM employs hundreds of lightweight in-order cores augmented with custom instructions that provide predictable timing, a purely software-scheduled NoC that orchestrates communication to avoid any contention, and per-core software-controlled scratchpad memory with deterministic access latency. Compared to a many-core architecture such as Skylake-SP (average power 215 W), which drops 14% of packets at high traffic load, a 256-core SPECTRUM by definition has a zero packet drop rate at a significantly lower average power of 24 W. SPECTRUM consumes 2.11× lower power than a C66x DSP cores+accelerator platform in baseband processing. We also enable SPECTRUM to handle dynamic workloads with the multiple service categories present in 5G mobile networks (Enhanced Mobile Broadband (eMBB), Ultra-reliable and Low-latency Communications (URLLC), and Massive Machine Type Communications (mMTC)) using a run-time scheduling and mapping algorithm. Experimental evaluations show that our algorithm performs task/NoC mapping at run-time on fewer cores than a static mapping (which reserves cores exclusively for each service category) while still meeting the differentiated latency and reliability requirements.
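To make the idea of a software-scheduled, contention-free NoC concrete, the following is a minimal illustrative sketch and not SPECTRUM's actual scheduler, which the abstract does not detail. It assumes a 2D mesh, dimension-ordered (X then Y) routing, and one time slot per hop. The names `xy_route` and `find_conflicts` are hypothetical. The check reports every time slot in which two transfers would occupy the same directed link; a schedule with no such conflicts needs no run-time arbitration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[int, int]                 # (x, y) position of a core in the mesh (assumed topology)
Link = Tuple[Coord, Coord]              # directed link between two adjacent routers
Transfer = Tuple[str, Coord, Coord, int]  # (name, source, destination, start slot)


def xy_route(src: Coord, dst: Coord) -> List[Link]:
    """Return the directed links of an X-then-Y route from src to dst."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:                  # travel along X first
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:                  # then travel along Y
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def find_conflicts(transfers: List[Transfer]) -> List[Tuple[int, Link, List[str]]]:
    """List every (slot, link) that more than one transfer would occupy.

    Assumption: hop number h of a transfer uses its link in slot start + h.
    """
    usage: Dict[Tuple[int, Link], List[str]] = defaultdict(list)
    for name, src, dst, start in transfers:
        for hop, link in enumerate(xy_route(src, dst)):
            usage[(start + hop, link)].append(name)
    return [(slot, link, names)
            for (slot, link), names in sorted(usage.items())
            if len(names) > 1]


if __name__ == "__main__":
    # Transfer B is delayed by one slot so it never shares a link with A.
    schedule = [("A", (0, 0), (2, 0), 0), ("B", (1, 0), (2, 1), 2)]
    print(find_conflicts(schedule))
```

An offline mapper could call such a check after each tentative placement and shift the start slot of a transfer until the conflict list is empty. This is one simple way to obtain a contention-free schedule under the stated assumptions.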

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 19, Issue 5
    Special Issue on LCETES, Part 1, Real-Time, Critical Systems, and Approximation
    September 2020
    229 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/3426818
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 September 2020
    Accepted: 01 April 2020
    Revised: 01 April 2020
    Received: 01 November 2019
    Published in TECS Volume 19, Issue 5

    Author Tags

    1. 5G
    2. LTE
    3. Time-predictable architecture
    4. baseband processing
    5. low-power
    6. many-cores

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Industry-IHL Partnership
