research-article

An optically-enabled chip-multiprocessor architecture using a single-level shared optical cache memory

Published: 01 November 2016

Abstract

We present an optical bus-based chip-multiprocessor architecture in which the processing cores share a single-level optical cache implemented on a separate chip next to the Central-Processing-Unit (CPU) die. The interconnection system is realized through Wavelength-Division-Multiplexed optical interfaces that connect the shared cache with the cores and the Main-Memory via spatial-multiplexed waveguides. To evaluate the proposed approach, we run system-level simulations of a wide range of parallel workloads using Gem5. The optical cache architecture is compared against a conventional one that uses dedicated on-chip Level-1 electronic caches and a shared Level-2 cache. Results show a significant Level-1 miss rate reduction of up to 96% in certain cases; on average, a performance speed-up of 19.4% or a reduction in cache capacity requirements of ~63% is attained. Combined with high-bandwidth CPU-to-DRAM (Dynamic Random Access Memory) bus solutions based on optical interconnects, the proposed design is a promising architecture for bridging the gap between high-speed optically connected CPU-DRAM schemes and high-speed optical memory technologies.

  • We present an optical-bus CMP architecture in which an optical shared cache is used.
  • The optical cache resides on a separate chip, and no on-chip cache is required.
  • The CPU-DRAM communication is realized completely in the optical domain.
  • A significant L1 miss rate reduction of up to 96% is attained in certain cases.
  • An average speed-up of 19.4% or a capacity requirements reduction of ~63% is attained.
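The percentages quoted above are relative figures. As a minimal sketch of how such figures are typically derived from simulator output, the Python snippet below computes a relative miss-rate reduction and a speed-up percentage. The function names, the access and miss counts, and the execution times are all hypothetical and chosen only for illustration; the abstract does not state the exact definitions the authors used, so the formulas here are the common textbook ones and should be read as an assumption.

```python
def miss_rate(misses: int, accesses: int) -> float:
    """Fraction of cache accesses that miss."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return misses / accesses


def relative_reduction_percent(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


def speedup_percent(baseline_time: float, proposed_time: float) -> float:
    """Speed-up of the proposed system over the baseline, in percent.

    Assumes speed-up is defined as baseline_time / proposed_time - 1.
    """
    if proposed_time <= 0:
        raise ValueError("proposed_time must be positive")
    return 100.0 * (baseline_time / proposed_time - 1.0)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    l1_baseline = miss_rate(misses=500, accesses=10_000)  # private L1 caches
    l1_shared = miss_rate(misses=20, accesses=10_000)     # shared optical cache
    reduction = relative_reduction_percent(l1_baseline, l1_shared)
    print(f"L1 miss rate reduction: {reduction:.1f}%")

    # Hypothetical execution times in seconds.
    print(f"Speed-up: {speedup_percent(1.194, 1.0):.1f}%")
```

Under these definitions a 96% reduction means the proposed miss rate is 4% of the baseline miss rate; it does not mean that 96% of accesses hit.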

Cited By

  • (2022) A Practical Shared Optical Cache With Hybrid MWSR/R-SWMR NoC for Multicore Processors, ACM Journal on Emerging Technologies in Computing Systems, 18:4 (1-28), DOI: 10.1145/3531012. Online publication date: 20-Apr-2022
  • (2021) Pho$, Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (1-6), DOI: 10.1109/ISLPED52811.2021.9502487. Online publication date: 26-Jul-2021
Published In

Optical Switching and Networking, Volume 22, Issue C
November 2016
127 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Cache sharing in chip multiprocessors
2. Optical bus-based chip multiprocessor
3. Optical cache memories
4. Optically connected shared cache memory
