DOI: 10.1145/3637494.3637500
research-article

Error Management system with RAS characteristics of the ARM architecture

Published: 05 February 2024

Abstract

Modern data centers need servers that run stably for long periods without compromising data integrity. Data center servers therefore need an effective, systematic set of design solutions that ensure reliability, availability and serviceability, known as the RAS feature. CPUs based on the ARM architecture have only been required to support CPU RAS features since the ARMv8 architecture. We explore an ARM-based RAS framework capable of multi-dimensional error detection and multi-level processing. Compared with the predominant RAS systems for ARM-based servers, which usually handle errors by reading component status registers and simply isolating the component in software, our approach is more robust and more accurate at isolating RAS errors. In a server RAS system, the memory RAS feature is the most important aspect. This paper therefore proposes a memory RAS system architecture design scheme for high-end servers based on ARM processors. The scheme takes software and hardware error events as its core processing objects and repairs and isolates errors step by step through a modular processing strategy: error detection, error reporting, error classification, error characterization and error processing. The memory RAS system architecture provides a complete error management system for abnormal events in the CPU, memory, Linux virtual memory and other components. It monitors server operation at all times and applies targeted handling as soon as an error occurs, greatly reducing the serious consequences of abnormal server operation.
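
The staged strategy named in the abstract (detection, reporting, classification, characterization, processing) could be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the class and field names, the three severity levels, the repeat threshold and the action strings are all assumptions made for the example, and detection is reduced to receiving an already-decoded event.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Hypothetical severity levels; the paper's own taxonomy may differ."""
    CORRECTED = "corrected"      # already fixed by hardware, e.g. by ECC
    RECOVERABLE = "recoverable"  # uncorrected, but contained to one component
    FATAL = "fatal"              # uncorrected and not contained


@dataclass
class ErrorEvent:
    """A detected error, as a detection stage might hand it on (assumed fields)."""
    source: str                    # e.g. "cpu", "memory", "virtual-memory"
    corrected: bool
    contained: bool
    address: Optional[int] = None  # faulting address, if known


class RasPipeline:
    """Runs one event through report -> classify -> characterize -> process."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed repeat limit for corrected errors
        self.log = []               # reporting stage: every event is recorded
        self.counts = Counter()     # characterization stage: per-location history

    def report(self, event: ErrorEvent) -> None:
        self.log.append(event)

    def classify(self, event: ErrorEvent) -> Severity:
        if event.corrected:
            return Severity.CORRECTED
        return Severity.RECOVERABLE if event.contained else Severity.FATAL

    def characterize(self, event: ErrorEvent) -> bool:
        """Return True once the same location has failed `threshold` times."""
        key = (event.source, event.address)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold

    def process(self, severity: Severity, recurring: bool) -> str:
        """Pick a handling action; the action names are placeholders."""
        if severity is Severity.FATAL:
            return "reset"
        if severity is Severity.RECOVERABLE:
            return "isolate_and_recover"
        return "offline_page" if recurring else "log_only"

    def handle(self, event: ErrorEvent) -> str:
        self.report(event)
        severity = self.classify(event)
        recurring = self.characterize(event)
        return self.process(severity, recurring)


if __name__ == "__main__":
    pipeline = RasPipeline(threshold=3)
    ecc_hit = ErrorEvent("memory", corrected=True, contained=True, address=0x1000)
    for _ in range(3):
        print(pipeline.handle(ecc_hit))
```

In this sketch a corrected error is only logged until the same location reaches the assumed threshold, at which point the action escalates to isolating the affected page. That is one simple way to express step-by-step repair and isolation; the thresholds and actions in a real system would depend on the platform firmware and operating system.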


Published In

CECCT '23: Proceedings of the 2023 International Conference on Electronics, Computers and Communication Technology
November 2023
266 pages
ISBN:9798400716300
DOI:10.1145/3637494
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. ARM Architecture
  2. Availability
  3. Reliability
  4. Serviceability
  5. error detection
  6. the RAS feature

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CECCT 2023

