- research-article, September 2020
Xuantie-910: a commercial multi-core 12-stage pipeline out-of-order 64-bit high performance RISC-V processor with vector extension
- Chen Chen,
- Xiaoyan Xiang,
- Chang Liu,
- Yunhai Shang,
- Ren Guo,
- Dongqi Liu,
- Yimin Lu,
- Ziyi Hao,
- Jiahui Luo,
- Zhijian Chen,
- Chunqiang Li,
- Yu Pu,
- Jianyi Meng,
- Xiaolang Yan,
- Yuan Xie,
- Xiaoning Qi
ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, Pages 52–64, https://doi.org/10.1109/ISCA45697.2020.00016
The open source RISC-V ISA has been quickly gaining momentum. This paper presents Xuantie-910, an industry leading 64-bit high performance embedded RISC-V processor from Alibaba T-Head division. It is fully based on the RV64GCV instruction set and it ...
- research-article, March 2020
AsymNVM: An Efficient Framework for Implementing Persistent Data Structures on Asymmetric NVM Architecture
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 757–773, https://doi.org/10.1145/3373376.3378511
The byte-addressable non-volatile memory (NVM) is a promising technology since it simultaneously provides DRAM-like performance, disk-like capacity, and persistency. The current NVM deployment with byte-addressability is symmetric, where NVM devices ...
- research-article, November 2017
A Case for Migrating Execution for Irregular Applications
IA3'17: Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, Article No.: 6, Pages 1–8, https://doi.org/10.1145/3149704.3149770
Modern supercomputers have millions of cores, each capable of executing one or more threads of program execution. In these computers the site of execution for program threads rarely, if ever, changes from the node in which they were born. This paper ...
- research-article, October 2017
A bandwidth accurate, flexible and rapid simulating multi-HMC modeling tool
MEMSYS '17: Proceedings of the International Symposium on Memory Systems, Pages 71–82, https://doi.org/10.1145/3132402.3132403
Derived by the demand for ever increasing computing performance, a steadily widening performance gap between memory and processor architectures has emerged. While attempting to mitigate the effects for processing systems that already face the exascale ...
- research-article, September 2016
Online Scalability Characterization of Data-Parallel Programs on Many Cores
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 191–205, https://doi.org/10.1145/2967938.2967960
We present an accurate online scalability prediction model for data-parallel programs on NUMA many-core systems. Memory contention is considered to be the major limiting factor of program scalability as data parallelism limits the amount of ...
- Article, October 2011
A combined arithmetic logic unit and memory element for the design of a parallel computer
Memory-CPU single communication channel bottleneck of the von Neumann architecture is quickly stalling the growth of computer processors. A probable solution to this problem is to fuse processing and memory elements. A simple low latency single on-chip ...
- article, March 2011
Efficient memory architecture for image processing
International Journal of Circuit Theory and Applications (IJCTA), Volume 39, Issue 3, Pages 351–356, https://doi.org/10.1002/cta.625
This Letter presents a novel purpose-designed architecture to realize efficient dual-port memory structures for image processing applications. The main innovation proposed here is the exploitation of single-port (SP) sub-banks to achieve the same data ...
- research-article, July 2010
Rethinking Flash in the Data Center
Deployment of flash memory depends on making the most of its unique properties instead of treating it as a drop-in replacement for existing technologies.
- Article, December 2003
Efficient scratchpad allocation algorithms for energy constrained embedded systems
PACS'03: Proceedings of the Third international conference on Power - Aware Computer Systems, Pages 41–56, https://doi.org/10.1007/978-3-540-28641-7_4
In the context of portable embedded systems, reducing energy is one of the prime objectives. Memories are responsible for a significant percentage of a system’s aggregate energy consumption. Consequently, novel memories as well as novel memory ...
- Article, January 2000
Two management approaches of the split data cache in multiprocessor systems
EURO-PDP'00: Proceedings of the 8th Euromicro conference on Parallel and distributed processing, Pages 301–308
As processor speed continues, the gap between the processor cycle and the memory subsystem cycle is expected to grow. One solution to this growing problem is to maximize the first level (L1) cache hit ratio, therefore the mean memory access time can ...
- Article, March 1995
Architectural exploration for datapaths with memory hierarchy
In this paper, we present a new design-space exploration algorithm, the architecture explorer (AE), for analyzing performance/cost tradeoffs in memory-intensive applications. AE evaluates FU, bus, and memory cost for a series of performance constraints ...
- research-article, November 1980
Magnetic Bubble Memory Architectures for Supporting Associative Searching of Relational Databases
IEEE Transactions on Computers (ITCO), Volume 29, Issue 11, Pages 957–970, https://doi.org/10.1109/TC.1980.1675490
A memory organized around a major/minor loop magnetic bubble storage unit contains database information in relational form. An external marker memory, consisting of an M-bit shift register or an M X 1 RAM, provides, in conjunction with an assumed ...