- research-article, September 2024
CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature Scaling
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 3, Article No.: 51, Pages 1–27
https://doi.org/10.1145/3664925
For datacenter architects, it is the most important goal to minimize the datacenter’s total cost of ownership for the target performance (i.e., TCO/performance). As the major component of a datacenter is a server farm, the most effective way of reducing ...
- research-article, April 2024
A Fault-Tolerant Million Qubit-Scale Distributed Quantum Computer
- Junpyo Kim,
- Dongmoon Min,
- Jungmin Cho,
- Hyeonseong Jeong,
- Ilkwon Byun,
- Junhyuk Choi,
- Juwon Hong,
- Jangwoo Kim
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 1–19
https://doi.org/10.1145/3620665.3640388
A million qubit-scale quantum computer is essential to realize the quantum supremacy. Modern large-scale quantum computers integrate multiple quantum computers located in dilution refrigerators (DR) to overcome each DR's unscaling cooling budget. However,...
- research-article, November 2023
Fast, Light-weight, and Accurate Performance Evaluation using Representative Datacenter Behaviors
Middleware '23: Proceedings of the 24th International Middleware Conference, Pages 220–233
https://doi.org/10.1145/3590140.3629117
Datacenters rapidly evolve by adopting new features such as new hardware deployment and software patches. Adopting a new feature requires an accurate evaluation of its impact to minimize the risk to the multi-million dollar computing infrastructure. ...
- research-article, June 2023
F4T: A Fast and Flexible FPGA-based Full-stack TCP Acceleration Framework
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 55, Pages 1–13
https://doi.org/10.1145/3579371.3589090
As complex workloads that run on many servers are pursuing higher networking throughput, more CPU cycles are consumed to support the TCP stack. To mitigate the high CPU burden from executing the compute-intensive TCP, prior works have proposed to ...
- research-article, June 2023
QIsim: Architecting 10+K Qubit QC Interfaces Toward Quantum Supremacy
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 1, Pages 1–16
https://doi.org/10.1145/3579371.3589036
A 10+K qubit Quantum-Classical Interface (QCI) is essential to realize the quantum supremacy. However, it is extremely challenging to architect scalable QCIs due to the complex scalability trade-offs regarding operating temperatures, device and wire ...
- research-article, April 2023
STfusion: Fast and Flexible Multi-NN Execution Using Spatio-Temporal Block Fusion and Memory Management
IEEE Transactions on Computers (ITCO), Volume 72, Issue 4, Pages 1194–1207
https://doi.org/10.1109/TC.2022.3218428
To maximize the cost-effectiveness of neural network (NN) accelerators, architects are actively developing single-chip accelerators which can execute many NNs simultaneously. However, previous approaches fail to achieve full performance potential by ...
- research-article, February 2023
A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks
ACM Transactions on Architecture and Code Optimization (TACO), Volume 20, Issue 1, Article No.: 11, Pages 1–24
https://doi.org/10.1145/3564606
Deep neural networks (DNNs) have become key solutions in the natural language processing (NLP) domain. However, the existing accelerators customized for their narrow target models cannot support diverse NLP models. Therefore, naively running complex NLP ...
- research-article, December 2023
3D-FPIM: An Extreme Energy-Efficient DNN Acceleration System Using 3D NAND Flash-Based In-Situ PIM Unit
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1359–1376
https://doi.org/10.1109/MICRO56248.2022.00093
The crossbar structure of the nonvolatile memory enables highly parallel and energy-efficient analog matrix-vector-multiply (MVM) operations. To exploit its efficiency, existing works design a mixed-signal deep neural network (DNN) accelerator, which ...
- research-article, June 2022
XQsim: modeling cross-technology control processors for 10+K qubit quantum computers
- Ilkwon Byun,
- Junpyo Kim,
- Dongmoon Min,
- Ikki Nagaoka,
- Kosuke Fukumitsu,
- Iori Ishikawa,
- Teruo Tanimoto,
- Masamitsu Tanaka,
- Koji Inoue,
- Jangwoo Kim
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 366–382
https://doi.org/10.1145/3470496.3527417
10+K qubit quantum computer is essential to achieve a true sense of quantum supremacy. With the recent effort towards the large-scale quantum computer, architects have revealed various scalability issues including the constraints in a quantum control ...
- research-article, April 2022
SmartFVM: A Fast, Flexible, and Scalable Hardware-based Virtualization for Commodity Storage Devices
ACM Transactions on Storage (TOS), Volume 18, Issue 2, Article No.: 12, Pages 1–27
https://doi.org/10.1145/3511213
A computational storage device incorporating a computation unit inside or near its storage unit is a highly promising technology to maximize a storage server’s performance. However, to apply such computational storage devices and take their full potential ...
- research-article, February 2022
CryoWire: wire-driven microarchitecture designs for cryogenic computing
ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 903–917
https://doi.org/10.1145/3503222.3507749
Cryogenic computing, which runs a computer device at an extremely low temperature, is promising thanks to its significant reduction of wire resistance as well as leakage current. Recent studies on cryogenic computing have focused on various architectural ...
- research-article, January 2022
LSim: Fine-Grained Simulation Framework for Large-Scale Performance Evaluation
- Hamin Jang,
- Taehun Kang,
- Joonsung Kim,
- Jaeyong Cho,
- Jae-Eon Jo,
- Seungwook Lee,
- Wooseok Chang,
- Jangwoo Kim,
- Hanhwi Jang
IEEE Computer Architecture Letters (ICAL), Volume 21, Issue 1, Pages 25–28
https://doi.org/10.1109/LCA.2022.3168831
As large-scale workloads with massive parallelism emerge, the demand for large-scale systems such as datacenters and supercomputers is rising sharply. To accurately design a large-scale system, architects heavily rely on performance modeling at design ...
- research-article, October 2021
UC-Check: Characterizing Micro-operation Caches in x86 Processors and Implications in Security and Performance
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 550–564
https://doi.org/10.1145/3466752.3480079
The modern x86 processor (e.g., Intel, AMD) translates CISC-style x86 instructions to RISC-style micro operations (uops) as RISC pipelines are more efficient than CISC pipelines. However, this x86 decoding process requires complex hardware logic (i.e., ...
- research-article, September 2021
An accurate and fair evaluation methodology for SNN-based inferencing with full-stack hardware design space explorations
Neurocomputing (NEUROC), Volume 455, Issue C, Pages 125–138
https://doi.org/10.1016/j.neucom.2021.05.020
Highlights: Existing SNN evaluations are inaccurate due to limited design point considerations.
Artificial Neural Networks (ANNs) achieve high accuracy in various cognitive tasks (i.e., inferences), but often fail to meet power and latency budgets due to intensive computational overheads. To address the challenge, Spiking Neural ...
- research-article, November 2021
CryoGuard: a near refresh-free robust DRAM design for cryogenic computing
ISCA '21: Proceedings of the 48th Annual International Symposium on Computer Architecture, Pages 637–650
https://doi.org/10.1109/ISCA52012.2021.00056
Cryogenic computing, which runs a computer device at an extremely low temperature, is highly promising thanks to the significant reduction of the wire latency and leakage current. A recently proposed cryogenic DRAM design achieved the promising ...
- research-article, June 2021
Performance Modeling and Practical Use Cases for Black-Box SSDs
ACM Transactions on Storage (TOS), Volume 17, Issue 2, Article No.: 14, Pages 1–38
https://doi.org/10.1145/3440022
Modern servers are actively deploying Solid-State Drives (SSDs) thanks to their high throughput and low latency. However, current server architects cannot achieve the full performance potential of commodity SSDs, as SSDs are complex devices designed for ...
- research-article, April 2021
NeuroEngine: a hardware-based event-driven simulation system for advanced brain-inspired computing
ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 975–989
https://doi.org/10.1145/3445814.3446738
Brain-inspired computing aims to understand the cognitive mechanisms of a brain and apply them to advance various areas in computer science. Deep learning is an example to greatly improve the field of pattern recognition and classification by utilizing ...
- research-article, November 2020
FVM: FPGA-assisted virtual device emulation for fast, scalable, and flexible storage virtualization
OSDI'20: Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, Article No.: 54, Pages 955–971
Emerging big-data workloads with massive I/O processing require fast, scalable, and flexible storage virtualization support. Hardware-assisted virtualization can achieve reasonable performance for fast storage devices, but it comes at the expense of ...
- research-article, November 2020
Scalable multi-FPGA acceleration for large RNNs with full parallelism levels
DAC '20: Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference, Article No.: 193, Pages 1–6
The increasing size of recurrent neural networks (RNNs) makes it hard to meet the growing demand for real-time AI services. For low-latency RNN serving, FPGA-based accelerators can leverage specialized architectures with optimized dataflow. However, ...
- research-article, September 2020
A multi-neural network acceleration architecture
ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, Pages 940–953
https://doi.org/10.1109/ISCA45697.2020.00081
A cost-effective multi-tenant neural network execution is becoming one of the most important design goals for modern neural network accelerators. For example, as emerging AI services consist of many heterogeneous neural network executions, a cloud ...