Intel® Xeon® Processor 7500 Series Uncore Programming Guide
CONTENTS

CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
1.2 UNCORE PMU OVERVIEW
1.3 UNCORE PMU SUMMARY TABLES
1.4 REFERENCES

CHAPTER 2
UNCORE PERFORMANCE MONITORING
2.1 GLOBAL PERFORMANCE MONITORING CONTROL
2.1.1 Counter Overflow
2.1.1.1 Freezing on Counter Overflow
2.1.1.2 PMI on Counter Overflow
2.1.2 Setting up a Monitoring Session
2.1.3 Reading the Sample Interval
2.1.4 Enabling a New Sample Interval from Frozen Counters
2.1.5 Global Performance Monitors
2.1.5.1 Global PMON Global Control/Status Registers
2.2 U-BOX PERFORMANCE MONITORING
2.2.1 U-Box PMON Summary
2.2.1.1 U-Box Box Level PMON State
2.2.1.2 U-Box PMON State - Counter/Control Pairs
2.2.2 U-Box Performance Monitoring Events
2.2.3 U-Box Events Ordered By Code
2.2.4 U-Box Performance Monitor Event List
2.3 C-BOX PERFORMANCE MONITORING
2.3.1 Overview of the C-Box
2.3.2 C-Box Performance Monitoring Overview
2.3.2.1 C-Box PMU - Overflow, Freeze and Unfreeze
2.3.3 C-Box Performance Monitors
2.3.3.1 C-Box Box Level PMON State
2.3.3.2 C-Box PMON State - Counter/Control Pairs
2.3.4 C-Box Performance Monitoring Events
2.3.4.1 An Overview
2.3.4.2 Acronyms Frequently Used in C-Box Events
2.3.4.3 The Queues
2.3.4.4 Detecting Performance Problems in the C-Box Pipeline
2.3.5 C-Box Events Ordered By Code
2.3.6 C-Box Performance Monitor Event List
2.4 B-BOX PERFORMANCE MONITORING
2.4.1 Overview of the B-Box
2.4.2 B-Box Performance Monitoring Overview
2.4.2.1 B-Box PMU - On Overflow and the Consequences (PMI/Freeze)
2.4.3 B-Box Performance Monitors
2.4.3.1 B-Box Box Level PMON State
2.4.3.2 B-Box PMON State - Counter/Control Pairs + Filters
2.4.4 B-Box Performance Monitoring Events
2.4.4.1 On the ARBQ
2.4.4.2 On the Major B-Box Structures
2.4.4.3 On InvItoE Transactions
2.4.5 B-Box Events Ordered By Code
2.4.6 B-Box Performance Monitor Event List
2.5 S-BOX PERFORMANCE MONITORING
2.5.1 Overview of the S-Box
2.5.2 S-Box Performance Monitoring Overview
2.5.2.1 S-Box PMU - Overflow, Freeze and Unfreeze
2.5.3 S-Box Performance Monitors
2.5.3.1 S-Box PMON for Global State
2.5.3.2 S-Box Box Level PMON State
2.5.3.3 S-Box PMON State - Counter/Control Pairs + Filters
2.5.3.4 S-Box Registers for Mask/Match Facility
2.5.4 S-Box Performance Monitoring Events
2.5.4.1 An Overview
FIGURES
Figure 1-1. Intel Xeon Processor 7500 Series Block Diagram
Figure 2-1. R-Box Block Diagram
Figure 2-2. Memory Controller Block Diagram
TABLES
Table 1-1. Per-Box Performance Monitoring Capabilities
Table 1-2. Uncore Performance Monitoring MSRs
Table 2-1. Global Performance Monitoring Control MSRs
Table 2-2. U_MSR_PMON_GLOBAL_CTL Register – Field Definitions
Table 2-3. U_MSR_PMON_GLOBAL_STATUS Register – Field Definitions
Table 2-4. U_MSR_PMON_GLOBAL_OVF_CTL Register – Field Definitions
Table 2-5. U-Box Performance Monitoring MSRs
Table 2-6. U_MSR_PMON_EVT_SEL Register – Field Definitions
Table 2-7. U_MSR_PMON_CTR Register – Field Definitions
Table 2-8. Performance Monitor Events for U-Box Events
Table 2-9. C-Box Performance Monitoring MSRs
Table 2-10. C_MSR_PMON_GLOBAL_CTL Register – Field Definitions
Table 2-11. C_MSR_PMON_GLOBAL_STATUS Register – Field Definitions
Table 2-12. C_MSR_PMON_GLOBAL_OVF_CTL Register – Field Definitions
Table 2-13. C_MSR_PMON_EVT_SEL{5-0} Register – Field Definitions
Table 2-14. C_MSR_PMON_CTR{5-0} Register – Field Definitions
Table 2-15. Performance Monitor Events for C-Box Events
Table 2-16. B-Box Performance Monitoring MSRs
Table 2-17. B_MSR_PMON_GLOBAL_CTL Register – Field Definitions
Table 2-18. B_MSR_PMON_GLOBAL_STATUS Register – Field Definitions
Table 2-19. B_MSR_PMON_GLOBAL_OVF_CTL Register – Field Definitions
Table 2-20. B_MSR_PMON_EVT_SEL{3-0} Register – Field Definitions
Table 2-21. B_MSR_PMON_CNT{3-0} Register – Field Definitions
Table 2-22. B_MSR_MATCH_REG Register – Field Definitions
Table 2-23. B_MSR_MASK_REG Register – Field Definitions
Table 2-24. Performance Monitor Events for B-Box Events
Table 2-25. S-Box Performance Monitoring MSRs
Table 2-26. S_MSR_PMON_SUMMARY Register Fields
Table 2-27. S_CSR_PMON_GLOBAL_CTL Register Fields
Table 2-28. S_MSR_PMON_GLOBAL_STATUS Register Fields
Table 2-29. S_MSR_PMON_OVF_CTRL Register Fields
Table 2-30. S_CSR_PMON_CTL{3-0} Register – Field Definitions
Table 2-31. S_CSR_PMON_CTR{3-0} Register – Field Definitions
Table 2-32. S_MSR_MM_CFG Register – Field Definitions
Table 2-33. S_MSR_MATCH Register – Field Definitions
Table 2-34. S_MSR_MATCH.opc - Opcode Match by Message Class
Table 2-35. S_MSR_MASK Register – Field Definitions
Table 2-36. S-Box Data Structure Occupancy Events
Table 2-37. Performance Monitor Events for S-Box Events
Table 2-38. Input Buffering Per Port
Table 2-39. R-Box Performance Monitoring MSRs
Table 2-40. R-Box Port Map
Table 2-41. R_MSR_PMON_GLOBAL_CTL_{15_8, 7_0} Register Fields
Table 2-42. R_MSR_PMON_GLOBAL_STATUS_{15_8, 7_0} Register Fields
Table 2-43. R_MSR_PMON_OVF_CTL_{15_8, 7_0} Register Fields
Table 2-44. R_MSR_PMON_CTL{15-0} Register – Field Definitions
Table 2-45. R_MSR_PMON_CTL{15-8} Event Select
Table 2-46. R_MSR_PMON_CTL{7-0} Event Select
Table 2-47. R_MSR_PMON_CTR{15-0} Register – Field Definitions
Table 2-48. R_MSR_PORT{7-0}_IPERF_CFG{1-0} Registers
Table 2-49. R_MSR_PORT{7-0}_QLX_CFG Register Fields
Table 2-50. R_MSR_PORT{7-0}_XBR_SET{2-1}_MM_CFG Registers
Table 2-51. R_MSR_PORT{7-0}_XBR_SET{2-1}_MATCH Registers
Table 2-52. R_MSR_PORT{7-0}_XBR_SET{2-1}_MASK Registers
Table 2-53. Message Events Derived from the Match/Mask Filters
Table 2-54. Performance Monitor Events for R-Box Events
Table 2-55. Unit Masks for ALLOC_TO_ARB
Table 2-56. Unit Masks for EOT_DEPTH_ACC
Table 2-57. Unit Masks for EOT_ROLL_DEPTH_ACC
Table 2-58. Unit Masks for GLOBAL_ARB_BID_FAIL
Table 2-59. Unit Masks for NEW_PACKETS_RECV
Table 2-60. Unit Masks for QUE_ARB_BID
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Figure 1-1 provides an Intel® Xeon® Processor 7500 Series block diagram.
The general performance monitoring capabilities in each box are outlined in the following table.

Table 1-1. Per-Box Performance Monitoring Capabilities

Box     # Boxes   # Counters/Box   Edge Detect?   Match/Mask Filters?   Counter Width (bits)
S-Box   2         4                Y              Y                     48
B-Box   2         4                N              Y                     48
M-Box   2         6                N              N                     48
U-Box   1         1                Y              N                     48
W-Box   1         4                Y              N                     48
Table 1-2. Uncore Performance Monitoring MSRs

R-Box Counters
  R-Box R    0xE3F-0xE30   Counter/Config Registers (15-8)
C-Box Counters
  C-Box 7    0xDFB-0xDF0   Counter/Config Registers
M-Box Counters
S-Box Counters
  S-Box 1    0xE5A-0xE58   Match/Mask Registers
B-Box Counters
  B-Box 1    0xE4E-0xE4D   Match/Mask Registers
U-Box Counters
  U-Box      0xC11-0xC10   Counter/Config Registers
W-Box Counters
  W-Box      0x395-0x394   Fixed Counter/Config Registers
1.4 REFERENCES
The following sections provide a breakdown of the performance monitoring capabilities of each box.
• Section 2.1, “Global Performance Monitoring Control”
• Section 2.2, “U-Box Performance Monitoring”
• Section 2.3, “C-Box Performance Monitoring”
• Section 2.4, “B-Box Performance Monitoring”
• Section 2.5, “S-Box Performance Monitoring”
• Section 2.6, “R-Box Performance Monitoring”
• Section 2.7, “M-Box Performance Monitoring”
CHAPTER 2
UNCORE PERFORMANCE MONITORING
2.1 GLOBAL PERFORMANCE MONITORING CONTROL

2.1.1 Counter Overflow

The Intel® Xeon® Processor 7500 Series uncore performance monitors may be configured to respond to a counter overflow with two basic actions: freezing the uncore counters (Section 2.1.1.1, "Freezing on Counter Overflow") and/or generating a PMI (Section 2.1.1.2, "PMI on Counter Overflow").

Note: PMI is decoupled from freeze, so if software also wants the counters frozen, it must set U_MSR_PMON_GLOBAL_CTL.frz_all to 1.
2.1.2 Setting up a Monitoring Session

a) Reset counters to ensure no stale values have been acquired from previous sessions:
- set U_MSR_PMON_GLOBAL_CTL.rst_all to 1.

b) Determine what events should be captured and program the control registers to capture them (typically selected by programming the .ev_sel bits, although other bit fields may be involved).
To set up a sample interval, software can pre-program the data register with a value of [2^48 - sample interval length]. Doing so allows software, through use of the PMI mechanism, to be notified when the full sample of events has been captured. Capturing a performance monitoring sample every 'X cycles' (the fixed counter in the W-Box counts uncore clock cycles) is a common use of this mechanism.

For example, to stop counting and receive notification when the 1,000th SNP_MERGE has been detected:
- preload the B-Box data register with (2^48 - 1000),
- set B_MSR_PMON_EVT_SEL.pmi_en to 1,
- set U_MSR_PMON_GLOBAL_CTL.frz_all to 1.
f) Enable counting at the global level by setting the U_MSR_PMON_GLOBAL_CTL.en_all bit to 1. Set the
.rst_all field to 0 with the same write.
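Putting steps a) through f) together, a monitoring session reduces to a handful of MSR writes. Below is a minimal sketch using Linux's /dev/cpu/*/msr interface; the MSR addresses, the .rst_all/.en_all bit positions and the SNP_MERGE event code are illustrative assumptions that must be taken from the register tables (only frz_all at bit 31, pmi_en at bit 20 and en at bit 0 are given in this excerpt):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    #define U_MSR_PMON_GLOBAL_CTL 0xC00   /* assumed address */
    #define B0_MSR_PMON_EVT_SEL_0 0xE45   /* assumed address */
    #define B0_MSR_PMON_CNT_0     0xE49   /* assumed address */
    #define RST_ALL (1ULL << 29)          /* assumed bit position */
    #define EN_ALL  (1ULL << 28)          /* assumed bit position */
    #define FRZ_ALL (1ULL << 31)          /* frz_all is documented at bit 31 */
    #define EV_SNP_MERGE 0x00             /* placeholder: code not in this excerpt */

    static int wrmsr(int fd, uint32_t msr, uint64_t val)
    {
        /* The Linux msr driver maps the MSR number to the file offset. */
        return pwrite(fd, &val, sizeof(val), msr) == sizeof(val) ? 0 : -1;
    }

    int main(void)
    {
        int fd = open("/dev/cpu/0/msr", O_RDWR);
        if (fd < 0) { perror("open msr"); return 1; }

        /* a) Reset all uncore counters to clear stale state. */
        wrmsr(fd, U_MSR_PMON_GLOBAL_CTL, RST_ALL);

        /* b-e) Select the event, enable its PMI on overflow, and preload
         * the counter so it overflows after 1,000 SNP_MERGE events. */
        wrmsr(fd, B0_MSR_PMON_EVT_SEL_0,
              EV_SNP_MERGE | (1ULL << 20) /* pmi_en */ | 1ULL /* en */);
        wrmsr(fd, B0_MSR_PMON_CNT_0, (1ULL << 48) - 1000);

        /* f) Enable counting globally; freeze everything on the PMI. */
        wrmsr(fd, U_MSR_PMON_GLOBAL_CTL, EN_ALL | FRZ_ALL);

        close(fd);
        return 0;
    }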
2.1.3 Reading the Sample Interval

a) Polling - before reading, it is recommended that software freeze and disable the counters (by clearing U_MSR_PMON_GLOBAL_CTL.en_all).
b) Frozen counters - If software set up the counters to freeze on overflow and send notification when it
happens, the next question is: Who caused the freeze?
Overflow bits are stored hierarchically within the Intel Xeon Processor 7500 Series uncore. First, software should read the U_MSR_PMON_GLOBAL_STATUS.ov_* bits to determine whether a U-Box or W-Box counter caused the overflow, or whether it was a counter in a box attached to the S0 or S1 S-Box.
The S-Boxes aggregate overflow bits from the M/B/C/R boxes they are attached to. So the next step is
to read the S{0,1}_MSR_PMON_SUMMARY.ov_* bits. Once the box(es) that contains the overflowing
counter is identified, the last step is to read that box’s *_MSR_PMON_GLOBAL_STATUS.ov field to find
the overflowing counter.
Note: More than one counter may overflow at any given time.
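Expressed as code, the walk is three dependent reads. A minimal sketch over the same /dev/cpu/*/msr interface; the register addresses and ov_* bit assignments below are assumptions standing in for the layouts given in the U-Box, S-Box and box-level status register tables:

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    #define U_MSR_PMON_GLOBAL_STATUS  0xC01   /* assumed address */
    #define S0_MSR_PMON_SUMMARY       0xE40   /* assumed address */
    #define B0_MSR_PMON_GLOBAL_STATUS 0xE41   /* assumed address */
    #define OV_S0 (1ULL << 2)   /* assumed: a box behind S-Box 0 overflowed */
    #define OV_MB (1ULL << 0)   /* assumed: an M/B-Box counter overflowed */

    static uint64_t rdmsr(int fd, uint32_t msr)
    {
        uint64_t val = 0;
        pread(fd, &val, sizeof(val), msr);  /* msr driver: offset = MSR number */
        return val;
    }

    /* Walk the overflow hierarchy: U-Box global status -> S-Box summary ->
     * box-level status. Remember more than one counter may have overflowed. */
    static uint64_t find_overflow(int fd)
    {
        uint64_t gstat = rdmsr(fd, U_MSR_PMON_GLOBAL_STATUS);
        if (gstat & OV_S0) {
            uint64_t summary = rdmsr(fd, S0_MSR_PMON_SUMMARY);
            if (summary & OV_MB)
                return rdmsr(fd, B0_MSR_PMON_GLOBAL_STATUS); /* .ov = counter */
        }
        return 0;
    }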
2.1.4 Enabling a New Sample Interval from Frozen Counters

b) Clear all overflow bits. When an overflow bit is cleared, all bits that summarize that overflow (above it in the hierarchy) will also be cleared. Therefore it is only necessary to clear the overflow bits corresponding to the actual counter.

For example, if counter 3 in B-Box 1 overflowed, to clear the overflow bit software should set B_MSR_PMON_GLOBAL_OVF_CTL.clr_ov[3] to 1 in B-Box 1. This action will also clear S_MSR_PMON_SUMMARY.ov_mb in S-Box 1 and U_MSR_PMON_GLOBAL_STATUS.ov_s1.
c) Create the next sample: Reinitialize the sample by setting the monitoring data register to (2^48 -
sample_interval). Or set up a new sample interval as outlined in Section 2.1.2, “Setting up a Monitoring
Session”.
d) Re-enable counting: Set U_MSR_PMON_GLOBAL_CTL.en_all to 1. Set the .rst_all field back to 0 with
the same write.
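Steps b) through d) likewise reduce to a few writes. A sketch continuing the earlier example (reusing its wrmsr helper and U_MSR_PMON_GLOBAL_CTL constants), with assumed B-Box 1 register addresses:

    #define B1_MSR_PMON_GLOBAL_OVF_CTL 0xE42  /* assumed address */
    #define B1_MSR_PMON_CNT_3          0xE4C  /* assumed address */

    static void rearm_sample(int fd, uint64_t sample_interval)
    {
        /* b) Clear the overflow bit for counter 3; the summary bits above
         *    it in the hierarchy (S-Box, U-Box) clear along with it. */
        wrmsr(fd, B1_MSR_PMON_GLOBAL_OVF_CTL, 1ULL << 3);
        /* c) Preload the counter for the next sample interval. */
        wrmsr(fd, B1_MSR_PMON_CNT_3, (1ULL << 48) - sample_interval);
        /* d) Re-enable counting globally, leaving .rst_all at 0. */
        wrmsr(fd, U_MSR_PMON_GLOBAL_CTL, EN_ALL | FRZ_ALL);
    }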
2.1.5 Global Performance Monitors

2.1.5.1 Global PMON Global Control/Status Registers

U_MSR_PMON_GLOBAL_CTL contains bits that can reset (.rst_all) and freeze/enable (.en_all) all the uncore counters. The .en_all bit must be set to 1 before any uncore counters will collect events.

Note: The register also contains the enable for the U-Box counters.
frz_all   31   0   Disable uncore counting (by clearing .en_all) if a PMI is received from a box with an overflowing counter.

Example: if a counter PMI is sent to the U-Box for a box with an overflowing counter, an 8-bit core-select vector determines which cores receive a PMI:
00000000 - No PMI sent
00000001 - Send PMI to core 0
10000000 - Send PMI to core 7
11000100 - Send PMI to cores 2, 6 and 7
etc.
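A small helper for composing that per-core bit vector (a sketch; the field's position inside U_MSR_PMON_GLOBAL_CTL is not given in this excerpt, so it is passed in as a parameter):

    #include <stdint.h>

    /* Build the 8-bit PMI core-select vector: one bit per core, core 0 = bit 0. */
    static uint64_t pmi_core_select(const int *cores, int n, unsigned field_shift)
    {
        uint64_t vec = 0;
        for (int i = 0; i < n; i++)
            vec |= 1ULL << cores[i];
        return vec << field_shift;   /* field_shift: assumed field offset */
    }

    /* Example: cores 2, 6 and 7 yield the 11000100 pattern shown above:
     *   pmi_core_select((int[]){2, 6, 7}, 3, field_shift);              */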
2.2 U-BOX PERFORMANCE MONITORING

The U-Box contains one counter, which can be configured to capture a small set of events. Its control register provides the following capability:

- .edge_detect - Rather than accumulating the raw count each cycle, the counter can capture transitions from no event to an event arriving.
pmi_en   20   0   When this bit is asserted and the corresponding counter overflows, a PMI exception is sent to the U-Box.

edge_detect   18   0   When asserted, the 0-to-1 transition edge of a 1-bit event input will cause the corresponding counter to increment. When 0, the counter will increment for however long the event is asserted.
The U-Box performance monitor data register is 48 bits wide. A counter overflow occurs when a carry out of bit 47 is detected. Software can force all uncore counting to freeze after N events by preloading a monitor with a count value of 2^48 - N and setting the control register to send a PMI to the U-Box. Upon receipt of the PMI, the U-Box will disable counting (Section 2.1.1.1, "Freezing on Counter Overflow"). During the interval of time between overflow and global disable, the counter value will wrap and continue to collect events.
In this way, software can capture the precise number of events that occurred between the time uncore
counting was enabled and when it was disabled (or ‘frozen’) with minimal skew.
If accessible, software can continuously read the data registers without disabling event collection.
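Because the counter wraps and keeps counting between overflow and the global freeze, a delta between two reads of a 48-bit counter should be taken modulo 2^48; for example:

    #include <stdint.h>

    #define CTR_MASK ((1ULL << 48) - 1)   /* counters are 48 bits wide */

    /* Events observed between two reads of a 48-bit counter; correct even
     * if the counter wrapped once in between. */
    static uint64_t ctr_delta(uint64_t start, uint64_t stop)
    {
        return (stop - start) & CTR_MASK;
    }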
2.2.2 U-Box Performance Monitoring Events

The U-Box events track such things as:

- NcMsgS packets generated by the U-Box, as they arbitrate to be broadcast. They are prioritized as follows: Special Cycle -> StopReq1/StartReq2 -> Lock/Unlock -> Remote Interrupts -> Local Interrupts.
- Errors detected, distinguished between recoverable, corrected, uncorrected and fatal.
etc.
2.2.4 U-Box Performance Monitor Event List

BUF_VALID_LOCAL_INT
• Title: Local IPI Buffer Valid
• Category: U-Box Events
• Event Code: 0x000, Max. Inc/Cyc: 1,
• Definition: Number of cycles the Local Interrupt packet buffer contained a valid entry.
BUF_VALID_LOCK
• Title: Lock Buffer Valid
• Category: U-Box Events
• Event Code: 0x002, Max. Inc/Cyc: 1,
• Definition: Number of cycles the Lock packet buffer contained a valid entry.
BUF_VALID_REMOTE_INT
• Title: Remote IPI Buffer Valid
• Category: U-Box Events
• Event Code: 0x001, Max. Inc/Cyc: 1,
• Definition: Number of cycles the Remote IPI packet buffer contained a valid entry.
BUF_VALID_SPC_CYCLES
• Title: SpcCyc Buffer Valid
• Category: U-Box Events
• Event Code: 0x004, Max. Inc/Cyc: 1,
• Definition: Number of uncore cycles the Special Cycle packet buffer contains a valid entry. ‘Special
Cycles’ are NcMsgS packets generated by the U-Box and broadcast to internal cores to cover such
things as Shutdown, Invd_Ack and WbInvd_Ack conditions.
BUF_VALID_STST
• Title: Start/Stop Req Buffer Valid
• Category: U-Box Events
• Event Code: 0x003, Max. Inc/Cyc: 1,
• Definition: Number of uncore cycles the Start/Stop Request packet buffer contained a valid entry.
CORRECTED_ERR
• Title: Corrected Errors
• Category: U-Box Events
• Event Code: 0x1E4, Max. Inc/Cyc: 1,
• Definition: Number of corrected errors.
FATAL_ERR
• Title: Fatal Errors
• Category: U-Box Events
• Event Code: 0x1E6, Max. Inc/Cyc: 1,
• Definition: Number of fatal errors.
IPIS_SENT
• Title: Number Core IPIs Sent
• Category: U-Box Events
• Event Code: 0x0F9, Max. Inc/Cyc: 1,
• Definition: Number of core IPIs sent.
RECOV
• Title: Recoverable
• Category: U-Box Events
• Event Code: 0x1DF, Max. Inc/Cyc: 1,
• Definition: Number of recoverable errors.
U2R_REQUESTS
• Title: Number U2R Requests
• Category: U-Box Events
• Event Code: 0x050, Max. Inc/Cyc: 1,
• Definition: Number of U-Box to Ring requests.
U2B_REQUEST_CYCLES
• Title: U2B Active Request Cycles
• Category: U-Box Events
• Event Code: 0x051, Max. Inc/Cyc: 1,
• Definition: Number of U-Box to B-Box active request cycles.
UNCORRECTED_ERR
• Title: Uncorrected Error
• Category: U-Box Events
• Event Code: 0x1E5, Max. Inc/Cyc: 1,
• Definition: Number of uncorrected errors.
WOKEN
• Title: Number Cores Woken Up
• Category: U-Box Events
• Event Code: 0x0F8, Max. Inc/Cyc: 1,
• Definition: Number of cores woken up.
2.3 C-BOX PERFORMANCE MONITORING

2.3.1 Overview of the C-Box

The C-Box is the gatekeeper for all Intel® QuickPath Interconnect (Intel® QPI) messages that originate in the core, and it is responsible for ensuring that all Intel QPI messages that pass through the socket's LLC remain coherent.

The Intel Xeon Processor 7500 Series contains eight instances of the C-Box, each assigned to manage a distinct 3MB, 24-way set associative slice of the processor's total LLC capacity. For processors with fewer than eight 3MB LLC slices, the C-Boxes for the missing slices will still be active and track ring traffic caused by their co-located core, even though they have no LLC-related traffic (i.e. hits/misses/snoops) to track.
Every physical memory address in the system is uniquely associated with a single C-Box instance via a
proprietary hashing algorithm that is designed to keep the distribution of traffic across the C-Box
instances relatively uniform for a wide range of possible address patterns. This enables the individual C-
Box instances to operate independently, each managing its slice of the physical address space without
any C-Box in a given socket ever needing to communicate with the other C-Boxes in that same socket.
Each C-Box is uniquely associated with a single S-Box. All messages which a given C-Box sends out to
the system memory or Intel QPI pass through the S-Box that is physically closest to that C-Box.
2.3.2 C-Box Performance Monitoring Overview

For information on how to set up a monitoring session, refer to Section 2.1, "Global Performance Monitoring Control".

2.3.2.1 C-Box PMU - Overflow, Freeze and Unfreeze

HW can also be configured (by setting the corresponding .pmi_en to 1) to send a PMI to the U-Box when an overflow is detected. The U-Box may be configured to freeze all uncore counting and/or send a PMI to selected cores when it receives this signal.
Once a freeze has occurred, in order to see a new freeze, the overflow field responsible for the freeze must be cleared by setting the corresponding bit in C_MSR_PMON_GLOBAL_OVF_CTL.clr_ov. Assuming all the counters have been locally enabled (.en bit in the control registers of the events to be monitored) and the overflow bit(s) has been cleared, the C-Box is prepared for a new sample interval. Once the global controls have been re-enabled (Section 2.1.4, "Enabling a New Sample Interval from Frozen Counters"), counting will resume.
2.3.3 C-Box Performance Monitors

Table 2-9. C-Box Performance Monitoring MSRs

MSR Name                   Access   MSR Address   Size (bits)   Description
CB5_CR_C_MSR_PMON_CTR_5    RW_RW    0xDBB         64            C-Box 5 PMON Counter 5
CB6_CR_C_MSR_PMON_CTR_4    RW_RW    0xD79         64            C-Box 6 PMON Counter 4
CB4_CR_C_MSR_PMON_CTR_3    RW_RW    0xD37         64            C-Box 4 PMON Counter 3
2.3.3.1 C-Box Box Level PMON State

The _GLOBAL_CTL register contains the bits used to enable monitoring. It is necessary to set the .ctr_en bit to 1 before the corresponding data register can collect events.
If an overflow is detected from one of the C-Box PMON registers, the corresponding bit in the
_GLOBAL_STATUS.ov field will be set. To reset the overflow bits set in the _GLOBAL_STATUS.ov field, a
user must set the corresponding bits in the _GLOBAL_OVF_CTL.clr_ov field before beginning a new
sample interval.
2.3.3.2 C-Box PMON State - Counter/Control Pairs

- .threshold - since C-Box counters can increment by a value greater than 1, a threshold can be applied. If the .threshold is set to a non-zero value, that value is compared against the incoming count for that event in each cycle. If the incoming count is >= the threshold value, then the event count captured in the data register will be incremented by 1.
- .edge_detect - Rather than accumulating the raw count each cycle (for events that can increment by
1 per cycle), the register can capture transitions from no event to an event incoming.
invert   23   0   When 0, the comparison that will be done is threshold <= event. When set to 1, the comparison is inverted (threshold > event).
pmi_en 20 0 When this bit is asserted and the corresponding counter overflows, a PMI
exception is sent to the U-Box.
edge_detect 18 0 When asserted, the 0 to 1 transition edge of a 1 bit event input will cause
the corresponding counter to increment. When 0, the counter will
increment for however long the event is asserted.
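As an illustration of how these fields combine, the sketch below composes a C-Box event-select value. The invert (23), pmi_en (20) and edge_detect (18) bit positions and the umask field at [15:8] are taken from the tables above; placing .ev_sel at bits [7:0] and .threshold at bits [31:24] is an assumption for illustration:

    #include <stdint.h>

    static uint64_t cbox_evtsel(uint8_t ev_sel, uint8_t umask,
                                uint8_t threshold, int invert, int edge, int pmi)
    {
        uint64_t v = 0;
        v |= (uint64_t)ev_sel;            /* event code, e.g. 0x18 = OCCUPANCY_IRQ */
        v |= (uint64_t)umask << 8;        /* subevent select, umask[15:8] */
        v |= (uint64_t)threshold << 24;   /* compared against the incoming count */
        if (invert) v |= 1ULL << 23;
        if (pmi)    v |= 1ULL << 20;
        if (edge)   v |= 1ULL << 18;
        return v;
    }

    /* Example: count cycles in which IRQ occupancy is >= 20 entries:
     *   cbox_evtsel(0x18, 0, 20, 0, 0, 0);                             */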
The C-Box performance monitor data registers are 48 bits wide. A counter overflow occurs when a carry out of bit 47 is detected. Software can force all uncore counting to freeze after N events by preloading a monitor with a count value of 2^48 - N and setting the control register to send a PMI to the U-Box. Upon receipt of the PMI, the U-Box will disable counting (Section 2.1.1.1, "Freezing on Counter Overflow"). During the interval of time between overflow and global disable, the counter value will wrap and continue to collect events.
In this way, software can capture the precise number of events that occurred between the time uncore
counting was enabled and when it was disabled (or ‘frozen’) with minimal skew.
If accessible, software can continuously read the data registers without disabling event collection.
2.3.4 C-Box Performance Monitoring Events

2.3.4.1 An Overview

C-Box performance monitoring events can be used to track LLC access rates, LLC hit/miss rates, LLC eviction and fill rates, and to detect evidence of back pressure on the LLC pipelines. In addition, the C-Box has performance monitoring events for tracking MESI state transitions that occur as a result of data sharing across sockets in a multi-socket system. Finally, there are events in the C-Box for tracking ring traffic at the C-Box/Core sink inject points.

Every event in the C-Box (with the exception of the P2C inject and *2P sink counts) is counted from the point of view of the LLC and cannot be associated with any specific core, since all cores in the socket send their LLC transactions to all C-Boxes in the socket. The P2C inject and *2P sink counts are the exception because they track ring activity at the cores' ring inject/sink points.
There are separate sets of counters for each C-Box instance. For any event, to get an aggregate count of that event for the entire LLC, the counts across the C-Box instances must be added together. The counts can be averaged across the C-Box instances to get a view of the typical count of an event from the perspective of the individual C-Boxes. Individual per-C-Box deviations from the average can be used to identify hot-spotting across the C-Boxes or other evidence of non-uniformity in LLC behavior across the C-Boxes. Such hot-spotting should be rare, though repetitive polling on a fixed physical address is one obvious example of a case where an analysis of the deviations across the C-Boxes would indicate hot-spotting.
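The sum/average/deviation bookkeeping described above is plain arithmetic once the eight per-C-Box counts have been read; a sketch:

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_CBOXES 8

    /* Aggregate one event across the C-Box instances and flag instances
     * that deviate strongly from the mean (possible hot-spotting). */
    static void cbox_aggregate(const uint64_t counts[NUM_CBOXES])
    {
        uint64_t total = 0;
        for (int i = 0; i < NUM_CBOXES; i++)
            total += counts[i];
        double mean = (double)total / NUM_CBOXES;
        printf("LLC total: %llu, per-C-Box mean: %.1f\n",
               (unsigned long long)total, mean);
        for (int i = 0; i < NUM_CBOXES; i++) {
            double dev = (double)counts[i] - mean;
            if (dev > 0.5 * mean || dev < -0.5 * mean)  /* arbitrary threshold */
                printf("C-Box %d deviates: %llu\n", i,
                       (unsigned long long)counts[i]);
        }
    }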
2.3.4.2 Acronyms Frequently Used in C-Box Events

AD (Address) Ring - Core Read/Write Requests and Intel QPI Snoops. Carries Intel QPI requests and snoop responses from the C-Box to the S-Box.

AK (Acknowledge) Ring - Acknowledges from the S-Box to the C-Box and from the C-Box to the Core. Carries snoop responses from the Core to the C-Box.
2.3.4.3 The Queues

IRQ - Ingress Request Queue on AD Ring. Associated with requests from the core.
IPQ - Ingress Probe Queue on AD Ring. Associated with snoops from S-Box.
IDQ - Ingress Data Queue on BL Ring. For data from either Core or S-Box.
MAF - Miss Address File. Intel QPI ordering buffer that also tracks local coherence.
Note: IDQ, ICQ, SRQ and IGQ occupancies are not tracked since they are mapped 1:1 to the
MAF and, therefore, can not create back pressure.
It should be noted that while the IRQ, IPQ, VIQ and MAF queues reside within the C-Box, the RWRF, RSPF and IDF queues do not. Instead, they live between the Core and the Ring, buffering messages as those messages transit between the two. This distinction is useful in that the queues located within the C-Box can provide information about what is going on in the LLC with respect to the flow of transactions at the point where they become "observed" by the coherence fabric (i.e., where the MAF is located). Occupancy of these buffers indicates how many transactions the C-Box is tracking, and where the bottlenecks are manifesting when the C-Box starts to get busy and/or congested.
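One way to exploit the in-C-Box occupancy counters: pairing a cumulative OCCUPANCY_* count with its TRANS_* companion (see the event list below) gives average queue depth per cycle and, by Little's law, an approximate average residency per transaction. A sketch of that arithmetic; the residency interpretation is an inference, not something this excerpt states:

    #include <stdint.h>

    /* Derived metrics for one queue from a paired measurement, e.g.
     * OCCUPANCY_IRQ (0x18) with TRANS_IRQ (0x19) over the same interval. */
    struct queue_stats {
        double avg_depth;       /* average entries per uncore cycle */
        double avg_residency;   /* approx. cycles per transaction */
    };

    static struct queue_stats queue_metrics(uint64_t occupancy_sum,
                                            uint64_t transactions,
                                            uint64_t uncore_cycles)
    {
        struct queue_stats s = { 0.0, 0.0 };
        if (uncore_cycles) s.avg_depth = (double)occupancy_sum / uncore_cycles;
        if (transactions)  s.avg_residency = (double)occupancy_sum / transactions;
        return s;
    }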
There is no need to explicitly reset the occupancy counters in the C-Box since they are counting from
reset de-assertion.
2.3.4.4 Detecting Performance Problems in the C-Box Pipeline

One relatively common scenario in which IRQ back pressure will be high is worth mentioning: the IRQ will back up when software is demanding data from memory at a rate that exceeds the available memory bandwidth. The IRQ is designed to be the place where the extra transactions wait for RTIDs to become available when memory becomes saturated. IRQ back pressure becomes interesting in a scenario where memory is not operating at or near peak sustainable bandwidth; there it can be a sign of a performance problem that may be correctable with software tuning.
One final warning on LLC pipeline congestion: care should be taken not to blindly sum events across C-Boxes without also checking the deviation across individual C-Boxes when investigating performance issues that are concentrated in the C-Box pipelines. Performance problems caused by congestion in the C-Box pipelines should be rare, but if they do occur, the event counts may not be homogeneous across the C-Boxes in the socket, and the average count across the C-Boxes may be misleading. If performance issues are found in this area, it will be useful to know whether or not they are localized to specific C-Boxes.
2.3.6 C-Box Performance Monitor Event List
ARB_LOSSES
• Title: Arbiter Losses.
• Category: Ring - Egress
• Event Code: 0x0A, Max. Inc/Cyc: 7,
• Definition: Number of Ring arbitration losses. A loss occurs when a message injection onto the ring fails. This could occur either because there was another message resident on the ring at that ring stop or because the co-located ring agent issued a message onto the ring in the same cycle.
Extension   umask [15:8]   Description
AD_SB       b00000001      AD ring in the direction that points toward the nearest S-Box
AD_NSB      b00000010      AD ring in the direction that points away from the nearest S-Box
AK_SB       b00000100      AK ring in the direction that points toward the nearest S-Box
AK_NSB      b00001000      AK ring in the direction that points away from the nearest S-Box
BL_SB       b00010000      BL ring in the direction that points toward the nearest S-Box
BL_NSB      b00100000      BL ring in the direction that points away from the nearest S-Box
IV          b01000000      IV ring
ARB_WINS
• Title: Arbiter Wins
• Category: Ring - Egress
• Event Code: 0x09, Max. Inc/Cyc: 7,
• Definition: Number of Ring arbitration wins. A win occurs when a message is successfully injected onto the ring.
Extension   umask [15:8]   Description
AD_SB       b00000001      AD ring in the direction that points toward the nearest S-Box
AD_NSB      b00000010      AD ring in the direction that points away from the nearest S-Box
AK_SB       b00000100      AK ring in the direction that points toward the nearest S-Box
AK_NSB      b00001000      AK ring in the direction that points away from the nearest S-Box
BL_SB       b00010000      BL ring in the direction that points toward the nearest S-Box
BL_NSB      b00100000      BL ring in the direction that points away from the nearest S-Box
IV          b01000000      IV ring
BOUNCES_C2P_AK
• Title: C2P AK Bounces
• Category: Ring - WIR
• Event Code: 0x02, Max. Inc/Cyc: 1,
• Definition: Number of LLC Ack responses to the core that bounced on the AK ring.
Extension   umask [15:8]   Description
NSB         b0000001x      Direction that points away from the nearest S-Box
BOUNCES_C2P_BL
• Title: C2P BL Bounces
• Category: Ring - WIR
• Event Code: 0x03, Max. Inc/Cyc: 1,
• Definition: Number of LLC data responses to the core that bounced on the BL ring.
Extension   umask [15:8]   Description
NSB         b0000001x      Direction that points away from the nearest S-Box
BOUNCES_C2P_IV
• Title: C2P IV Bounces
• Category: Ring - WIR
• Event Code: 0x04, Max. Inc/Cyc: 1,
• Definition: Number of C-Box snoops of a processor’s cache that bounced on the IV ring.
BOUNCES_P2C_AD
• Title: P2C AD Bounces
• Category: Ring - WIR
• Event Code: 0x01, Max. Inc/Cyc: ,
• Definition: Core request to LLC bounces on AD ring.
Extension   umask [15:8]   Description
NSB         b0000001x      Direction that points away from the nearest S-Box
EGRESS_BYPASS_WINS
• Title: Egress Bypass Wins
• Category: Local - Egress
• Event Code: 0x0C, Max. Inc/Cyc: 7,
• Definition: Number of times a ring egress bypass was taken when a message was injected onto the
ring. The subevent field allows tracking of each available egress queue bypass path, including both 0
and 1 cycle versions.
INGRESS_BYPASS_WINS_AD
• Title: Ingress S-Box/Non S-Box Bypass Wins
• Category: Local - Egress
• Event Code: 0x0E, Max. Inc/Cyc: 1,
• Definition: Number of times that a message, off the AD ring, sunk by the C-Box took one of the
ingress queue bypasses. The subevent field allows tracking of each available ingress queue bypass
path, including both 0 and 1 cycle versions.
LLC_HITS
• Title: LLC Hits
• Category: Local - LLC
• Event Code: 0x15, Max. Inc/Cyc: 1,
• Definition: Last Level Cache Hits
• NOTE: LRU hints are included in count.
Extension   umask [15:8]   Description
M           b0000xxx1      Modified
E           b0000xx1x      Exclusive
S           b0000x1xx      Shared
LLC_MISSES
• Title: LLC Misses
• Category: Local - LLC
• Event Code: 0x14, Max. Inc/Cyc: 1,
• Definition: Last Level Cache Misses
LLC_S_FILLS
• Title: LLC S-Box Fills
• Category: Local - LLC
• Event Code: 0x16, Max. Inc/Cyc: 1,
• Definition: Last Level Cache lines filled from S-Box
LLC_VICTIMS
• Title: LLC Lines Victimized
• Category: Local - LLC
• Event Code: 0x17, Max. Inc/Cyc: 1,
• Definition: Last Level Cache lines victimized
MAF_ACK
• Title: MAF ACK
• Category: Local - MAF
• Event Code: 0x10, Max. Inc/Cyc: 1,
• Definition: Miss Address File Acknowledgements.
MAF_NACK1
• Title: MAF NACK1
• Category: Local - MAF
• Event Code: 0x11, Max. Inc/Cyc: 1,
• Definition: Rejected (not-acknowledged) LLC pipeline passes (Set 1).
Extension        umask [15:8]   Description
GO_PENDING       bxxxxxxx1      A message associated with a transaction monitored by the MAF was delayed because the transaction had a GO pending in the requesting core.
VIC_PENDING      bxxxxxx1x      An LLC fill was delayed because the victimized data in the LLC was still being processed.
SNP_PENDING      bxxxxx1xx      A message associated with a transaction monitored by the MAF was delayed because the transaction had a snoop pending.
AC_PENDING       bxxxx1xxx      An incoming remote Intel QPI snoop was delayed because it conflicted with an existing MAF transaction that had an Ack Conflict pending.
IDX_BLOCK        bxxx1xxxx      An incoming local core RD that missed the LLC was delayed because a victim way could not be immediately chosen.
PA_BLOCK         bxx1xxxxx      If this count is very high, it likely means that software is frequently issuing requests to the same physical address from disparate threads simultaneously. There will also sometimes be a small number of PA_BLOCK nacks in the background due to cases when a pair of messages associated with the same transaction happen to arrive at the LLC at the same time and one of them gets delayed.
ALL_MAF_NACK2    b1xxxxxxx      A message was rejected when one or more of the sub-events under MAF_NACK2 was true. This is included in MAF_NACK1 so that MAF_NACK1 with sub-event 0xFF will count the total number of nacks.
TOTAL_MAF_NACKS  b11111111      Total number of LLC pipeline passes that were nacked.
MAF_NACK2
• Title: MAF NACK2
• Category: Local - MAF
• Event Code: 0x12, Max. Inc/Cyc: 1,
• Definition: Rejected (not-acknowledged) LLC pipeline passes (Set 2).
Extension          umask [15:8]   Description
MAF_FULL           bxxxxxxx1      An incoming local processor RD/WR or remote Intel QPI snoop request that required a MAF entry was delayed because no MAF entry was available.
EGRESS_FULL        bxxxxxx1x      Some incoming message to the LLC that needed to generate a response message for transmission onto the ring was delayed due to ring back pressure.
VIQ_FULL           bxxxxx1xx      An incoming local processor RD request that missed the LLC was delayed because the LLC victim buffer was full.
NO_S_FIFO_CREDITS  bxxx1xxxx      Some incoming message to the LLC that needed to generate a message to the S-Box was delayed due to lack of available buffering resources in the S-Box.
WB_PENDING         bx1xxxxxx      An incoming remote Intel QPI snoop request to the LLC was delayed because it conflicted with an existing transaction that had a WB to Home pending.
NACK2_ELSE         b1xxxxxxx      Some incoming message to the LLC was delayed for a reason not covered by any of the other MAF_NACK1 or MAF_NACK2 sub-events.
OCCUPANCY_IPQ
• Title: IPQ Occupancy
• Category: Queue Occupancy
• Event Code: 0x1A, Max. Inc/Cyc: 8,
• Definition: Cumulative count of occupancy in the LLC’s Ingress Probe Queue.
OCCUPANCY_IRQ
• Title: IRQ Occupancy
• Category: Queue Occupancy
• Event Code: 0x18, Max. Inc/Cyc: 24,
• Definition: Cumulative count of occupancy in the LLC’s Ingress Response Queue.
OCCUPANCY_MAF
• Title: MAF Occupancy
• Category: Queue Occupancy
• Event Code: 0x1E, Max. Inc/Cyc: 16,
• Definition: Cumulative count of occupancy in the LLC’s Miss Address File.
OCCUPANCY_RSPF
• Title: RSPF Occupancy
• Category: Queue Occupancy
• Event Code: 0x22, Max. Inc/Cyc: 8,
• Definition: Cumulative count of occupancy in the Snoop Response FIFO.
OCCUPANCY_RWRF
• Title: RWRF Occupancy
• Category: Queue Occupancy
• Event Code: 0x20, Max. Inc/Cyc: 12,
• Definition: Cumulative count of the occupancy in the Read/Write Request FIFO.
OCCUPANCY_VIQ
• Title: VIQ Occupancy
• Category: Queue Occupancy
• Event Code: 0x1C, Max. Inc/Cyc: 8,
• Definition: Cumulative count of the occupancy in the Victim Ingress Queue.
SINKS_C2P
• Title: C2P Sinks
• Category: Ring - WIR
• Event Code: 0x06, Max. Inc/Cyc: 3,
• Definition: Number of messages sunk by the processor that were sent by one of the C-Boxes.
• NOTE: Each sink represents the transfer of 32 bytes, or 2 sinks per cache line.
SINKS_P2C
• Title: P2C Sinks
• Category: Ring - WIR
• Event Code: 0x05, Max. Inc/Cyc: 3,
• Definition: Number of messages sunk from the ring at the C-Box that were sent by one of the local
processors.
• NOTE: Each sink represents the transfer of 32 bytes, or 2 sinks per cache line.
Extension   umask [15:8]   Description
BL          b00000100      BL (explicit and implicit WB data from the core to the LLC)
SINKS_S2C
• Title: S2C Sinks
• Category: Ring - WIR
• Event Code: 0x07, Max. Inc/Cyc: 3,
• Definition: Number of messages sunk from the ring at the C-Box that were sent by the S-Box.
• NOTE: Each sink represents the transfer of 32 bytes, or 2 sinks per cache line.
SINKS_S2P_BL
• Title: S2P Sinks
• Category: Ring - WIRa
• Event Code: 0x08, Max. Inc/Cyc: 1,
• Definition: Number of BL ring messages sunk by the processor that were sent from the S-Box. This covers BL only, because that is the only kind of message the S-Box can send to a processor.
• NOTE: Each sink represents the transfer of 32 bytes, or 2 sinks per cache line.
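Since every sink moves 32 bytes, sink counts convert directly to data bandwidth. A sketch of the conversion (the uncore clock frequency is something the user must supply; it is not part of this excerpt):

    #include <stdint.h>

    /* Convert a sink count into bandwidth: each sink is a 32-byte transfer,
     * so a full cache line appears as 2 sinks. */
    static double sink_bandwidth_gbs(uint64_t sinks, uint64_t uncore_cycles,
                                     double uncore_ghz)
    {
        double seconds = uncore_cycles / (uncore_ghz * 1e9);
        return (sinks * 32.0) / seconds / 1e9;   /* GB/s */
    }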
SNP_HITS
• Title: Snoop Hits in LLC
• Category: Local - CC
• Event Code: 0x28, Max. Inc/Cyc: 1,
• Definition: Number of Intel QPI snoops that hit in the LLC, according to the state of the LLC line when the hit occurred. GotoS: LLC Data or Code Read Snoop Hit 'x' state in remote cache. GotoI: LLC Data Read for Ownership Snoop Hit 'x' state in remote cache.
Extension    umask [15:8]   Description
REMOTE_ANY   b11111111      Intel QPI Snoops that hit in LLC (any line state)
SNPS
• Title: Snoops to LLC
• Category: Local - CC
• Event Code: 0x27, Max. Inc/Cyc: 1,
• Definition: Number of Intel QPI snoops seen by the LLC.
• NOTE: Subtract CACHE_CHAR_QUAL.ANY_HIT from this event to determine how many snoops
missed the LLC.
Extension    umask [15:8]   Description
REMOTE_RD    b000000x1      Remote Read - Goto S State. Intel QPI snoops (SnpData or SnpCode) to the LLC that caused a transition to S in the cache. NOTE: All SnpData and SnpCode transactions are counted. If the SnpData HITM policy is M->I, this subevent will capture those snoops.
REMOTE_RFO   b0000001x      Remote RFO - Goto I State. Intel QPI snoops (SnpInvOwn or SnpInvItoE) to the LLC that caused an invalidation of a cache line.
REMOTE_ANY   b00000011      Intel QPI snoops to the LLC that hit in the cache
STARVED_EGRESS
• Title: Egress Queue Starved
• Category: Local - EGR
• Event Code: 0x0B, Max. Inc/Cyc: 8,
• Definition: Number of cycles that an Egress Queue is in starvation
Extension   umask [15:8]   Description
P2C_AD_SB   b00000001      Processor-to-C-Box AD Egress that injects in the direction toward the nearest S-Box
AD_SB       b00000011      Sum of AD Egresses that inject in the direction toward the nearest S-Box
AD_NSB      b00000100      Sum across both AD Egresses that inject in the direction away from the nearest S-Box
AK_SB       b00001000      AK Egress that injects in the direction toward the nearest S-Box
AK_NSB      b00010000      AK Egress that injects in the direction away from the nearest S-Box
BL_SB       b00100000      BL Egress that injects in the direction toward the nearest S-Box
BL_NSB      b01000000      BL Egress that injects in the direction away from the nearest S-Box
IV          b10000000      IV Egress
TRANS_IPQ
• Title: IPQ Transactions
• Category: Queue Occupancy
• Event Code: 0x1B, Max. Inc/Cyc: 1,
• Definition: Number of Intel QPI snoop probes that entered the LLC’s Ingress Probe Queue.
TRANS_IRQ
• Title: IRQ Transactions
• Category: Queue Occupancy
• Event Code: 0x19, Max. Inc/Cyc: 1,
• Definition: Number of processor RD and/or WR requests to the LLC that entered the Ingress
Response Queue.
TRANS_MAF
• Title: MAF Transactions
• Category: Queue Occupancy
• Event Code: 0x1F, Max. Inc/Cyc: 1,
• Definition: Number of transactions to allocate entries in LLC’s Miss Address File.
TRANS_RSPF
• Title: RSPF Transactions
• Category: Queue Occupancy
• Event Code: 0x23, Max. Inc/Cyc: 1,
• Definition: Number of snoop responses from the processor that passed through the Snoop Response
FIFO. The RSPF is a buffer that sits between each processor and the ring that buffers the processor’s
snoop responses in the event that there is back pressure due to ring congestion.
TRANS_RWRF
• Title: RWRF Transactions
• Category: Queue Occupancy
• Event Code: 0x21, Max. Inc/Cyc: 1,
• Definition: Number of requests that passed through the Read/Write Request FIFO. The RWRF is a
buffer that sits between each processor and the ring that buffers the processor’s RD/WR requests in
the event that there is back pressure due to ring congestion.
TRANS_VIQ
• Title: VIQ Transactions
• Category: Queue Occupancy
• Event Code: 0x1D, Max. Inc/Cyc: 1,
• Definition: Number of LLC victims to enter the Victim Ingress Queue. All LLC victims pass through this queue, including those that end up not requiring a WB.
2.4 B-BOX PERFORMANCE MONITORING

2.4.1 Overview of the B-Box

The primary function of the B-Box is to serve as the coherent home agent for the Intel® QuickPath Interconnect cache coherence protocol. The home agent algorithm requires the B-Box to track outstanding requests, log snoop responses and other control messages, and make certain algorithmic decisions about how to respond to requests. The B-Box also has additional requirements on managing interactions with the M-Box, including RAS flows; all requests in flight in the M-Box are tracked in the B-Box.

The B-Box only supports source snoopy Intel QuickPath Interconnect protocol flows.
2.4.2 B-Box Performance Monitoring Overview

For information on how to set up a monitoring session, refer to Section 2.1, "Global Performance Monitoring Control".

2.4.2.1 B-Box PMU - On Overflow and the Consequences (PMI/Freeze)

HW can also be configured (by setting the corresponding .pmi_en to 1) to send a PMI to the U-Box when an overflow is detected. The U-Box may be configured to freeze all uncore counting and/or send a PMI to selected cores when it receives this signal.

Once a freeze has occurred, in order to see a new freeze, the overflow field responsible for the freeze must be cleared by setting the corresponding bit in B_MSR_PMON_GLOBAL_OVF_CTL.clr_ov. Assuming all the counters have been locally enabled (.en bit in the control registers of the events to be monitored) and the overflow bit(s) has been cleared, the B-Box is prepared for a new sample interval. Once the global controls have been re-enabled (Section 2.1.4, "Enabling a New Sample Interval from Frozen Counters"), counting will resume.
2.4.3 B-Box Performance Monitors

2.4.3.1 B-Box Box Level PMON State

The _GLOBAL_CTL register contains the bits used to enable monitoring. It is necessary to set the .ctr_en bit to 1 before the corresponding data register can collect events.
If an overflow is detected from one of the B-Box PMON registers, the corresponding bit in the
_GLOBAL_STATUS.ov field will be set. To reset the overflow bits set in the _GLOBAL_STATUS.ov field, a
user must set the corresponding bits in the _GLOBAL_OVF_CTL.clr_ov field before beginning a new
sample interval.
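As a concrete illustration, the following minimal C sketch shows the enable/clear sequence using the
Linux /dev/cpu/N/msr interface. The register addresses are placeholders to be filled in from the B-Box
MSR tables; the field positions (.ctr_en in bits 3:0 of _GLOBAL_CTL, .clr_ov in bits 3:0 of
_GLOBAL_OVF_CTL) follow the field definitions in this section.

/* Sketch: clear stale overflow bits, then enable all four B-Box
 * counters. Addresses are placeholders; use the values from the
 * B-Box MSR tables. Error handling omitted for brevity. */
#include <stdint.h>
#include <unistd.h>

#define B_MSR_PMON_GLOBAL_OVF_CTL 0x0 /* placeholder address */
#define B_MSR_PMON_GLOBAL_CTL     0x0 /* placeholder address */

static void wrmsr(int msr_fd, uint32_t reg, uint64_t val)
{
    pwrite(msr_fd, &val, sizeof(val), reg); /* /dev/cpu/N/msr */
}

void bbox_begin_interval(int msr_fd)
{
    wrmsr(msr_fd, B_MSR_PMON_GLOBAL_OVF_CTL, (uint64_t)0xF); /* clr_ov[3:0] */
    wrmsr(msr_fd, B_MSR_PMON_GLOBAL_CTL,     (uint64_t)0xF); /* ctr_en[3:0] */
}

Note that the U-Box global enable and each counter’s local .en bit must also be set, as the field table
below points out.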
Field    Bits   Default   Description
ctr_en   3:0    0         Must be set to enable each B-Box counter (bit 0 enables ctr0, etc.).
                          NOTE: The U-Box enable and the per-counter enable must also be set to fully
                          enable the counter.
NOTE: In the B-Box, each control register can only select from a specific set of events (see Table 2-24,
“Performance Monitor Events for B-Box Events” for the mapping).
Field    Bits   Default   Description
pmi_en   20     0         When this bit is asserted and the corresponding counter overflows, a PMI
                          exception is sent to the U-Box.
en       0      0         Enable counter.
The B-Box performance monitor data registers are 48b wide. A counter overflow occurs when a carry
out of bit 47 is detected. Software can force all uncore counting to freeze after N events by preloading
a monitor with a count value of (2^48 - 1) - N and setting the control register to send a PMI to the
U-Box. Upon receipt of the PMI, the U-Box will disable counting (Section 2.1.1.1, “Freezing on Counter
Overflow”). During the interval of time between overflow and global disable, the counter value will wrap
and continue to collect events.
In this way, software can capture the precise number of events that occurred between the time uncore
counting was enabled and when it was disabled (or ‘frozen’) with minimal skew.
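For example, the preload arithmetic reduces to the following C helper; this is a sketch of the
computation only, with the actual MSR write left to the surrounding tool.

/* Preload value so a 48-bit counter overflows (carry out of bit 47)
 * after exactly n more events: (2^48 - 1) - n. */
#include <stdint.h>

static uint64_t pmon_preload_48b(uint64_t n)
{
    const uint64_t max48 = (1ULL << 48) - 1;
    return max48 - n;
}

Writing, say, pmon_preload_48b(1000000) to a data register whose control register has .pmi_en set
freezes the uncore roughly one million events later.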
If accessible, software can continuously read the data registers without disabling event collection.
In addition to generic event counting, each B-Box provides a MATCH/MASK register pair that allows a
user to filter packet traffic (incoming and outgoing) according to the packet Opcode, Message Class and
Physical Address. Various events can be programmed to enable a B-Box performance counter (e.g.
OPCODE_ADDR_IN_MATCH for counter 0) to capture the filter match as an event. The fields are laid out
as follows:
Note: Refer to Table 2-103, “Intel® QuickPath Interconnect Packet Message Classes” and
Table 2-104, “Opcode Match by Message Class” to determine the encodings of the B-Box
Match Register fields.
Field                    Bits   Default   Description
addr (B_MSR_MATCH_REG)   43:0   0         Match on this system address - cache-aligned address bits 49:6.
addr (B_MSR_MASK_REG)    43:0   0         Mask for this system address - cache-aligned address bits 49:6.
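A hedged sketch of the address-filter setup follows; the register addresses are placeholders, and the
helper name is illustrative only. The encodings for the opcode and message class fields come from the
tables referenced in the note above.

/* Sketch: build the addr field of the B-Box MATCH register from a
 * physical address. Bits 43:0 of the register hold cache-line
 * address bits 49:6. Register addresses are placeholders. */
#include <stdint.h>

#define B_MSR_MATCH_REG 0x0 /* placeholder address */
#define B_MSR_MASK_REG  0x0 /* placeholder address */

static uint64_t match_addr_field(uint64_t phys_addr)
{
    return (phys_addr >> 6) & ((1ULL << 44) - 1); /* addr[43:0] */
}

With the pair programmed, selecting ADDR_IN_MATCH on counter 2 (per its event description below)
counts packets whose address survives the mask and matches the programmed value.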
• SNPOQ (SNP Output Queue) 256-entry - request is pushed when a snoop message has to be sent
but is blocked due to lack of credits or the output port is busy.
• DRSOQ (DRS Output Queue) 32-entry - request is pushed when a DRS message has to be sent but
is blocked due to lack of credits or the output port is busy.
• MAQ (M-Box Arbitration Queue) 32-entry - Request is pushed when it asks for M-Box access and
the M-Box backpressures the B-Box (e.g. a link error flow in M-Box).
• MDRSOQ (Mirror DRS Output Queue) 32-entry - Request is pushed onto Mirror DRS output queue
when a NonSnpWrData(Ptl) needs to be sent to the mirror slave and VN1 DRS channel or Intel QPI
output resources are unavailable.
• MHOMOQ (Mirror HOM Output Queue) 256-entry - Request is pushed onto Mirror Home output
queue when a Home message needs to be sent out from mirror master to mirror slave (NonSnpWr/
RdCode/RdInvOwn) but is blocked due to a lack of credits (VN1 HOM) or the output port is busy.
The 256-entry TF (Tracker File) holds all transactions that arrive in the B-Box from the time they arrive
until they are completed and leave the B-Box. Transactions can stay in this structure much longer than
they are needed. The IMT is the critical resource each transaction needs before being sent to the
M-Box (memory controller).
NOTE: The latency is only valid under ‘normal’ circumstances in which a request generates a memory
prefetch. It will not hold true if the IMT is full and a prefetch isn’t generated.
- InvItoE - returns ownership of the line to the requestor (in E-state) without returning data. The home
agent sees it as an RFO and the memory controller sees it as a memory read. The B-Box has to do more
with it than it does for regular reads: it must make sure not only that the data gets forwarded to the
requestor, but also that ownership is granted to the requestor. Depending on where the InvItoE request
originates, the HA takes different actions - if it is triggered by a directory agent, the HA/B-Box has to
generate snoop invalidations to other caches/directories.
The B-Box events are grouped by the counter on which they can be programmed (Counter 0 through
Counter 3); the PERF_CTL field in each event description below identifies that counter.
ACK_BEFORE_LAST_SNP
• Title: Ack Before Last Snoop
• Category: Snoops
• Event Code: 0x19, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Number of times the M-Box acknowledge arrives before the last snoop response, for
transactions issued to the memory controller (M-Box) as prefetches.
ADDR_IN_MATCH
• Title: Address In Match
• Category: Mask/Match
• Event Code: 0x04, Max. Inc/Cyc: 1, PERF_CTL: 2,
• Definition: Address Match at B-Box Input. Use B_MSR_MATCH/MASK_REG
CONFLICTS
• Title: Conflicts
• Category: Miscellaneous
• Event Code: 0x17, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Number of conflicts.
COHQ_BYPASS
• Title: COHQ Bypass
• Category: ARB Queues
• Event Code: 0x0E, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Coherence Queue Bypasses.
COHQ_IMT_ALLOC_WAIT
• Title: COHQ IMT Allocation Wait
• Category: ARB Queues
• Event Code: 0x13, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Cycles Coherence Queue Waiting on IMT Allocation.
DIRQ_INSERTS
• Title: DIRQ Inserts
• Category: ARB Queues
• Event Code: 0x17, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Directory Queue Inserts. Queue Depth is 256.
DIRQ_OCCUPANCY
• Title: DIRQ Occupancy
• Category: ARB Queues
• Event Code: 0x17, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Directory Queue Occupancy. Queue Depth is 256.
DEMAND_FETCH
• Title: Demand Fetches
• Category: Miscellaneous
• Event Code: 0x0F, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Counts number of times a memory access was issued after CohQ pop (i.e. IMT prefetch
was not used).
DRSQ_INSERTS
• Title: DRSQ Inserts
• Category: ARB Queues
• Event Code: 0x09, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: DRSQ Inserts.
DRSQ_OCCUPANCY
• Title: DRSQ Occupancy
• Category: ARB Queues
• Event Code: 0x09, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: DRSQ Occupancy. Queue Depth is 4.
EARLY_ACK
• Title: Early ACK
• Category: Miscellaneous
• Event Code: 0x02, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: M-Box Early Acknowledgements.
IMPLICIT_WBS
• Title: Implicit WBs
• Category: Miscellaneous
• Event Code: 0x12, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Number of Implicit Writebacks.
IMT_FULL
• Title: IMT Full
• Category: In-Flight Memory Table
• Event Code: 0x16, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Number of times In-Flight Memory Table was full when entry was needed by incoming
transaction.
IMT_INSERTS_ALL
• Title: IMT All Inserts
• Category: In-Flight Memory Table
• Event Code: 0x07, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Inserts (all requests) to In-Flight Memory Table (e.g. all memory transactions targeting
this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
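As a worked example of this guard and of the latency derivation it protects, assuming counts for
CONFLICTS, IMT_INSERTS_ALL and IMT_VALID_OCCUPANCY collected over the same interval (with the
occupancy count already scaled as described under IMT_VALID_OCCUPANCY below):

/* Check the conflict-rate guard, then derive average IMT residency
 * in uncore cycles as occupancy / inserts (Little's Law). */
#include <stdint.h>
#include <stdio.h>

void imt_avg_latency(uint64_t conflicts, uint64_t inserts_all,
                     uint64_t valid_occupancy)
{
    double conflict_rate = 100.0 * (double)conflicts / (double)inserts_all;
    if (conflict_rate > 5.0) {
        printf("conflict rate %.1f%% > ~5%%: latency unreliable\n",
               conflict_rate);
        return;
    }
    printf("average IMT latency: %.1f cycles\n",
           (double)valid_occupancy / (double)inserts_all);
}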
IMT_INSERTS_INVITOE
• Title: IMT InvItoE Inserts
• Category: In-Flight Memory Table
• Event Code: 0x0F, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table inserts of InvItoE requests (e.g. all InvItoE memory transactions
targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_IOH
• Title: IMT IOH Inserts
• Category: In-Flight Memory Table
• Event Code: 0x0A, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table inserts of IOH requests (e.g. all IOH triggered memory
transactions targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_IOH_INVITOE
• Title: IMT IOH InvItoE Inserts
• Category: In-Flight Memory Table
• Event Code: 0x10, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table inserts of IOH InvItoE requests (e.g. all IOH triggered InvItoE
memory transactions targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_IOH_WR
• Title: IMT IOH Write Inserts
• Category: In-Flight Memory Table
• Event Code: 0x0D, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table Write IOH Request Inserts (e.g. all IOH triggered memory write
transactions targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_NON_IOH
• Title: IMT Non-IOH Inserts
• Category: In-Flight Memory Table
• Event Code: 0x0B, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table inserts of Non-IOH requests (e.g. all non IOH triggered memory
transactions targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_NON_IOH_INVITOE
• Title: IMT Non-IOH InvItoE Inserts
• Category: In-Flight Memory Table
• Event Code: 0x1C, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table inserts of Non-IOH InvItoE requests (e.g. all non IOH triggered
InvItoE memory transactions targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_NON_IOH_RD
• Title: IMT Non-IOH Read Inserts
• Category: In-Flight Memory Table
• Event Code: 0x1F, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table inserts of Non-IOH read requests (e.g. all non IOH triggered
memory read transactions targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_NON_IOH_WR
• Title: IMT Non-IOH Write Inserts
• Category: In-Flight Memory Table
• Event Code: 0x0E, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table Write Non-IOH Request Inserts (e.g. all non IOH triggered
memory write transactions targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_RD
• Title: IMT Read Inserts
• Category: In-Flight Memory Table
• Event Code: 0x1D, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table inserts of read requests (e.g. all memory read transactions
targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_INSERTS_WR
• Title: IMT Write Inserts
• Category: In-Flight Memory Table
• Event Code: 0x0C, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: In-Flight Memory Table Write Request Inserts (e.g. all memory write transactions
targeting this B-Box as their home node and processed by this B-Box)
• NOTE: Conflicts and AckConflicts are considered IMT insert events. If the conflict rate (CONFLICTS/
IMT_INSERTS_ALL * 100) is > ~5%, it is not recommended that this event (along with
IMT_VALID_OCCUPANCY) be used to derive average IMT latency or latency for specific flavors of
inserts.
IMT_NE_CYCLES
• Title: IMT Non-Empty Cycles
• Category: In-Flight Memory Table
• Event Code: 0x07, Max. Inc/Cyc: 1, PERF_CTL: 2,
• Definition: In-Flight Memory Table Non-Empty Cycles.
IMT_PREALLOC
• Title: IMT Prealloc
• Category: In-Flight Memory Table
• Event Code: 0x06, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: In-Flight Memory Table in 1 DRS preallocation mode.
IMT_VALID_OCCUPANCY
• Title: IMT Valid Occupancy
• Category: In-Flight Memory Table
• Event Code: 0x07, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: In-Flight Memory Table (tracks memory transactions that have already been sent to the
memory controller connected to this B-Box) valid occupancy.
• NOTE: A count of valid entries is accumulated every clock cycle in a subcounter. Since the IMT queue
depth is 32, multiply this event by 32 to get a full count of valid IMT entries (see the sketch following
this entry).
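The subcounter scaling above, and the analogous factor of 256 used by the TF_* occupancy events
later in this section, amount to a one-line correction:

/* Scale raw event counts by the documented subcounter factor:
 * 32 for IMT_VALID_OCCUPANCY, 256 for the TF_* occupancy events. */
#include <stdint.h>

static uint64_t imt_occupancy(uint64_t raw) { return raw * 32;  }
static uint64_t tf_occupancy(uint64_t raw)  { return raw * 256; }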
MSG_ADDR_IN_MATCH
• Title: Message + Address In Match
• Category: Mask/Match
• Event Code: 0x01, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Message Class and Address Match at B-Box Input. Use B_MSR_MATCH/MASK_REG
MSGS_B_TO_S
• Title: SB Link (B to S) Messages
• Category: S-Box Interface
• Event Code: 0x03, Max. Inc/Cyc: 1, PERF_CTL: 2,
• Definition: Number of SB Link (B to S) Messages (multiply by 9 to get flit count).
MSG_IN_MATCH
• Title: Message In Match
• Category: Mask/Match
• Event Code: 0x01, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Message Class Match at B-Box Input. Use B_MSR_MATCH/MASK_REG
MSGS_IN_NON_SNP
• Title: Incoming Non-Snoop Messages
• Category: Snoops
• Event Code: 0x01, Max. Inc/Cyc: 1, PERF_CTL: 2,
• Definition: Incoming Non-Snoop Messages.
MSG_OPCODE_ADDR_IN_MATCH
• Title: Message + Opcode + Address In Match
• Category: Mask/Match
• Event Code: 0x03, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Message Class, Opcode and Address Match at B-Box Input. Use B_MSR_MATCH/
MASK_REG
MSG_OPCODE_IN_MATCH
• Title: Message + Opcode In Match
• Category: Mask/Match
• Event Code: 0x05, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Message Class and Opcode Match at B-Box Input. Use B_MSR_MATCH/MASK_REG
MSG_OPCODE_OUT_MATCH
• Title: Message + Opcode Out Match
• Category: Mask/Match
• Event Code: 0x06, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Message Class and Opcode Match at B-Box Output. Use B_MSR_MATCH/MASK_REG
MSG_OUT_MATCH
• Title: Message Out Match
• Category: Mask/Match
• Event Code: 0x02, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Message Class Match at B-Box Output. Use B_MSR_MATCH/MASK_REG
MSGS_S_TO_B
• Title: SB Link (S to B) Messages
• Category: S-Box Interface
• Event Code: 0x02, Max. Inc/Cyc: 1, PERF_CTL: 2,
• Definition: Number of SB Link (S to B) Messages.
OPCODE_ADDR_IN_MATCH
• Title: Opcode + Address In Match
• Category: Mask/Match
• Event Code: 0x02, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Opcode and Address Match at B-Box Input. Use B_MSR_MATCH/MASK_REG
OPCODE_IN_MATCH
• Title: Opcode In Match
• Category: Mask/Match
• Event Code: 0x03, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Opcode Match at B-Box Input. Use B_MSR_MATCH/MASK_REG
OPCODE_OUT_MATCH
• Title: Opcode Out Match
• Category: Mask/Match
• Event Code: 0x04, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: Opcode Match at B-Box Output. Use B_MSR_MATCH/MASK_REG
RBOX_VNA_UNAVAIL
• Title: R-Box VNA Unavailable
• Category: R-Box Interface
• Event Code: 0x15, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Number of times R-Box VNA credit was not available when needed.
SBOX_VN0_UNAVAIL
• Title: S-Box VN0 Unavailable
• Category: S-Box Interface
• Event Code: 0x14, Max. Inc/Cyc: 1, PERF_CTL: 3,
• Definition: Number of times S-Box VN0 credit was not available when needed.
SNPOQ_INSERTS
• Title: SNPOQ Inserts
• Category: ARB Queues
• Event Code: 0x12, Max. Inc/Cyc: 1, PERF_CTL: 1,
• Definition: SNP Output Queue Inserts. Queue Depth is 256.
SNPOQ_OCCUPANCY
• Title: SNPOQ Occupancy
• Category: ARB Queues
• Event Code: 0x12, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: SNP Output Queue Occupancy. Queue Depth is 256.
TF_ALL
• Title: TF Occupancy - All
• Category: Tracker File
• Event Code: 0x04, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Tracker File occupancy for all requests. Accumulates lifetimes of all memory transactions
that have arrived in this B-Box (TF starts tracking transactions before they are sent to the M-Box).
• NOTE: This event captures overflows from a subcounter tracking all requests. Multiply by 256 to
determine the correct count.
TF_INVITOE
• Title: TF Occupancy - InvItoEs
• Category: Tracker File
• Event Code: 0x06, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Tracker File occupancy for InvItoE requests. Accumulates lifetimes of InvItoE memory
transactions that have arrived in this B-Box (TF starts tracking transactions before they are sent to the
M-Box).
• NOTE: This event captures overflows from a subcounter tracking all requests. Multiply by 256 to
determine the correct count.
TF_IOH
• Title: TF Occupancy - All IOH
• Category: Tracker File
• Event Code: 0x0B, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Tracker File occupancy for IOH requests. Accumulates lifetimes of IOH triggered memory
transactions that have arrived in this B-Box (TF starts tracking transactions before they are sent to the
M-Box).
• NOTE: This event captures overflows from a subcounter tracking all requests. Multiply by 256 to
determine the correct count.
TF_IOH_INVITOE
• Title: TF Occupancy - IOH InvItoEs
• Category: Tracker File
• Event Code: 0x0F, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Tracker File occupancy for IOH InvItoE requests. Accumulates lifetimes of IOH triggered
InvItoE memory transactions that have arrived in this B-Box (TF starts tracking transactions before
they are sent to the M-Box).
• NOTE: This event captures overflows from a subcounter tracking all requests. Multiply by 256 to
determine the correct count.
TF_IOH_NON_INVITOE_RD
• Title: TF Occupancy - IOH Non-InvItoE Reads
• Category: Tracker File
• Event Code: 0x1C, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Tracker File occupancy for IOH Non-InvItoE read requests. Accumulates lifetimes of IOH
triggered non-InvItoE memory transactions that have arrived in this B-Box (TF starts tracking
transactions before they are sent to the M-Box).
• NOTE: This event captures overflows from a subcounter tracking all requests. Multiply by 256 to
determine the correct count.
TF_IOH_WR
• Title: TF Occupancy - IOH Writes
• Category: Tracker File
• Event Code: 0x0D, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Tracker File occupancy for IOH write requests. Accumulates lifetimes of IOH triggered
write transactions that have arrived in this B-Box (TF starts tracking transactions before they are sent
to the M-Box).
• NOTE: This event captures overflows from a subcounter tracking all requests. Multiply by 256 to
determine the correct count.
TF_WR
• Title: TF Occupancy - Writes
• Category: Tracker File
• Event Code: 0x05, Max. Inc/Cyc: 1, PERF_CTL: 0,
• Definition: Tracker File occupancy for write requests. Accumulates lifetimes of write memory
transactions that have arrived in this B-Box (TF starts tracking transactions before they are sent to the
M-Box).
• NOTE: This event captures overflows from a subcounter tracking all requests. Multiply by 256 to
determine the correct count.
The S-Box shares responsibility with the C-Box(es) as the Intel QPI caching agent(s). It is responsible
for converting C-Box requests to Intel QPI messages (i.e. snoop generation and data response
messages from the snoop response) as well as converting/forwarding ring messages to Intel QPI
packets and vice versa.
The S-Box also includes a mask/match register that allows a user to match packets leaving the S-Box
according to various standard packet fields such as message class, opcode, etc. (NOTE: this facility
specifically goes with event 0 and does not affect other events.)
For information on how to set up a monitoring session, refer to Section 2.1, “Global Performance
Monitoring Control”.
The hardware can also be configured (by setting the corresponding .pmi_en to 1) to send a PMI to the
U-Box when an overflow is detected. The U-Box may be configured to freeze all uncore counting and/or
send a PMI to selected cores when it receives this signal.
Once a freeze has occurred, in order to see a new freeze, the overflow field responsible for the freeze
must be cleared by setting the corresponding bit in S_MSR_PMON_GLOBAL_OVF_CTL.clr_ov. Assuming
all the counters have been locally enabled (the .en bit in the control registers of the counters meant to
monitor events) and the overflow bit(s) cleared, the S-Box is prepared for a new sample interval. Once
the global controls have been re-enabled (Section 2.1.4, “Enabling a New Sample Interval from Frozen
Counters”), counting will resume.
Note: Due to the nature of the subcounters used in the S-Box, if a queue occupancy count
event is set up to be captured, SW should set .reset_occ_cnt in the same write that the
corresponding control register is enabled.
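A minimal sketch of that single-write ordering follows; the control register address and the
.reset_occ_cnt bit position are placeholders to be confirmed against the S-Box control register table.

/* Sketch: program an S-Box occupancy event so that .reset_occ_cnt
 * and .en are set in the same MSR write, per the note above.
 * Address and reset_occ_cnt position are placeholders. */
#include <stdint.h>
#include <unistd.h>

#define S_MSR_PMON_CTL0   0x0          /* placeholder address */
#define CTL_EN            (1ULL << 0)  /* .en */
#define CTL_RESET_OCC_CNT (1ULL << 19) /* placeholder bit position */

static void wrmsr(int msr_fd, uint32_t reg, uint64_t val)
{
    pwrite(msr_fd, &val, sizeof(val), reg); /* /dev/cpu/N/msr */
}

void sbox_program_occupancy(int msr_fd, uint64_t event_select)
{
    wrmsr(msr_fd, S_MSR_PMON_CTL0,
          event_select | CTL_EN | CTL_RESET_OCC_CNT); /* one write */
}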
MSR Name            Access   MSR Address   Size (bits)   Description
SS1_CR_S_MSR_MASK   RW_RO    0x0E5A        64            S-Box 1 Enable Mask Register
The _GLOBAL_CTL register contains the bits used to enable monitoring. It is necessary to set the
.ctr_en bit to 1 before the corresponding data register can collect events.
If an overflow is detected from one of the S-Box PMON registers, the corresponding bit in the
_GLOBAL_STATUS.ov field will be set. To reset the overflow bits set in the _GLOBAL_STATUS.ov field, a
user must set the corresponding bits in the _GLOBAL_OVF_CTL.clr_ov field before beginning a new
sample interval.
Field    Bits   Default   Description
ctr_en   3:0    0         Must be set to enable each S-Box counter (bit 0 enables ctr0, etc.).
                          NOTE: The U-Box enable and the per-counter enable must also be set to fully
                          enable the counter.
Field    Bits   Default   Description
clr_ov   3:0    0         Writing ‘1’ to a bit in this field clears the corresponding bit in the ‘Overflow
                          PerfMon Counter’ field of the S_CSR_PMON_GLOBAL_STATUS register to 0.
- .threshold - If .threshold is set to a non-zero value, that value is compared against the incoming
count for the selected event in each cycle. If the incoming count is >= the threshold value, the event
count captured in the data register is incremented by 1.
- .edge_detect - Rather than accumulating the raw count each cycle (for events that can increment by
1 per cycle), the register can capture transitions from no event to an event incoming.
Field         Bits   Default   Description
invert        23     0         Invert threshold comparison. When ‘0’, the comparison is threshold >=
                               event; when ‘1’, the comparison is threshold < event.
pmi_en        20     0         PMI Enable. If this bit is set, a PMI exception is sent to the U-Box when the
                               corresponding counter overflows.
edge_detect   18     0         Edge Detect. When this bit is set, a 0-to-1 transition of a one-bit event
                               input causes the counter to increment. When the bit is 0, the counter
                               increments for as long as the event is asserted.
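For instance, pairing an occupancy event with the threshold yields a ‘cycles at or above depth N’ count,
per the .threshold description above. The sketch below assumes the threshold field sits in bits 31:24 of
the control register; confirm that position against the full S-Box control register definition.

/* Sketch: count cycles in which a queue occupancy event reports a
 * value >= 8. Setting CTL_INVERT instead flips the comparison.
 * The threshold field position is an assumption. */
#include <stdint.h>

#define CTL_THRESHOLD(t) ((uint64_t)(t) << 24) /* assumed bits 31:24 */
#define CTL_INVERT       (1ULL << 23)
#define CTL_EDGE_DETECT  (1ULL << 18)

static uint64_t ctl_cycles_occ_ge(uint64_t occ_event_select, unsigned t)
{
    return occ_event_select | CTL_THRESHOLD(t); /* compare each cycle */
}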
The S-Box performance monitor data registers are 48b wide. A counter overflow occurs when a carry
out of bit 47 is detected. Software can force all uncore counting to freeze after N events by preloading
a monitor with a count value of (2^48 - 1) - N and setting the control register to send a PMI to the
U-Box. Upon receipt of the PMI, the U-Box will disable counting (Section 2.1.1.1, “Freezing on Counter
Overflow”). During the interval of time between overflow and global disable, the counter value will wrap
and continue to collect events.
In this way, software can capture the precise number of events that occurred between the time uncore
counting was enabled and when it was disabled (or ‘frozen’) with minimal skew.
If accessible, software can continuously read the data registers without disabling event collection.
a) Set the MM_CFG register (see Table 2-32, “S_MSR_MM_CFG Register – Field Definitions”) bit 63 to 0.
b) Program the match/mask registers (see Table 2-33, “S_MSR_MATCH Register – Field Definitions”).
(If MM_CFG[63] == 1, a write to match/mask will produce a GP fault.)
NOTE: The address and the Home Node ID have a mask component in the MASK register. To mask off
other fields (e.g. opcode or message class), set the field to all 0s.
c) Set the counter’s control register event select to 0x0 (TO_R_PROG_EV) to capture the mask/match
as a performance event.
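Put together, the three steps reduce to the sequence sketched below; the MSR addresses are
placeholders to be taken from the S-Box register tables.

/* Sketch: steps a) through c) for the S-Box mask/match facility.
 * Addresses are placeholders; error handling omitted. */
#include <stdint.h>
#include <unistd.h>

#define S_MSR_MM_CFG    0x0 /* placeholder address */
#define S_MSR_MATCH     0x0 /* placeholder address */
#define S_MSR_MASK      0x0 /* placeholder address */
#define S_MSR_PMON_CTL0 0x0 /* placeholder address */

static uint64_t rdmsr(int fd, uint32_t reg)
{
    uint64_t v = 0;
    pread(fd, &v, sizeof(v), reg);
    return v;
}

static void wrmsr(int fd, uint32_t reg, uint64_t val)
{
    pwrite(fd, &val, sizeof(val), reg);
}

void sbox_setup_match(int fd, uint64_t match, uint64_t mask)
{
    /* a) clear MM_CFG[63] so match/mask writes do not GP fault */
    wrmsr(fd, S_MSR_MM_CFG, rdmsr(fd, S_MSR_MM_CFG) & ~(1ULL << 63));
    /* b) program the match and mask values */
    wrmsr(fd, S_MSR_MATCH, match);
    wrmsr(fd, S_MSR_MASK, mask);
    /* c) select TO_R_PROG_EV (0x00) and locally enable the counter */
    wrmsr(fd, S_MSR_PMON_CTL0, 0x00 | (1ULL << 0) /* .en */);
}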
Field   Bits    Default   Description
opc     58:48   0         Match on Opcode (see Table 2-34, “S_MSR_MATCH.opc - Opcode Match by
                          Message Class”)
                          b1xxxx - NCB
                          bx1xxx - NCS
                          bxx1xx - NDR
                          bxxx1x - HOM1
                          bxxxx1 - HOM0
Refer to Table 2-105, “Opcodes (Alphabetical Listing)” for definitions of the opcodes found in the
following table.
Associated with each of the four general purpose counters is a 7b queue occupancy counter which
supports the various queue occupancy events found in Section 2.5.5, “S-Box Events Ordered By Code”.
Each System Bound and Ring Bound data storage structure within the S-Box (queue/FIFO/buffer) has
an associated tally counter which can be used to provide input into one of the S-Box performance
counters. The data structure the tally counter is ‘attached’ to sends increment/decrement signals as it
receives/removes entries, and the tally counter sends its contents to the performance counter each
cycle. Note that:
a) none of the physical queues receive more than one entry per cycle;
b) the entire 7b value from the ‘selected’ (by the event select) queue occupancy subcounter is sent to
the generic counter each cycle, meaning that the maximum increment of a generic counter is 64 (for
the system bound HOM buffer).
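Since the tally counter’s value is added to the generic counter every cycle, dividing the accumulated
count by the elapsed uncore cycles gives the average depth of the monitored structure:

/* Average depth of a monitored S-Box queue over a sample interval:
 * an occupancy event accumulates the queue's tally counter value
 * once per uncore cycle. */
#include <stdint.h>

static double avg_queue_depth(uint64_t occupancy_count,
                              uint64_t uncore_cycles)
{
    return (double)occupancy_count / (double)uncore_cycles;
}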
The following table summarizes the queues (and their associated events) responsible for buffering
traffic into and out of the S-Box.
Ring Bound Message Queue (TO_RING_MSGQ_OCCUPANCY) - packets from the system; SNP, NCS and
NCB share this queue (NDR is separate). SNP: 31 entries; NCS: 4 entries.

Message Class    Flits per Msg   Comment
HOM              1
SNP              1
NDR              1
Ring Bound DRS   9               R2S and B2S DRS messages are always full cacheline messages, which
                                 are 9 flits. NOTE: flits are variable in the Sys Bound direction.
Ring Bound NCS   3               The only ring bound NCS message type is NcMsgS. There are always 3
                                 flits. NOTE: flits are variable in the Sys Bound direction.
Ring Bound NCB   11              The only ring bound NCB message types are NcMsgB, IntLogical and
                                 IntPhysical. These are all 11 flit messages. NOTE: flits are variable in the
                                 Sys Bound direction.
The number of flits sent or received can be divided by the total number of uncore cycles (see Section
2.8.2, “W-Box Performance Monitoring Overview”) to calculate the link utilization for each message
class. The combined number of flits across message classes can be used to calculate the total link
utilization.
Note that for S2R and R2S links, there is no single event which counts the total number of message and
credit carrying idle flits sent on the link. The total link utilization can be approximated by adding
together the number of flits of the message classes that are expected to be most frequent.
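A sketch of that calculation, treating each counted flit as one cycle of link occupancy (the links carry
one flit per cycle; cf. the full flit 80b links described in the R-Box section below):

/* Approximate S-Box link utilization for one direction: total flits
 * transferred divided by elapsed uncore cycles. Idle/credit flits
 * without a dedicated event are not included, so this is a floor. */
#include <stdint.h>

static double link_utilization(uint64_t hom_flits, uint64_t snp_flits,
                               uint64_t ndr_flits, uint64_t drs_flits,
                               uint64_t ncs_flits, uint64_t ncb_flits,
                               uint64_t uncore_cycles)
{
    uint64_t total = hom_flits + snp_flits + ndr_flits +
                     drs_flits + ncs_flits + ncb_flits;
    return (double)total / (double)uncore_cycles;
}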
Symbol                       Event Code   Max Inc/Cyc   Description
TO_RING_SNP_MSGQ_CYCLES_NE   0x23         1             Cycles Ring Bound SNP Message Queue Not Empty
TO_RING_NCB_MSGQ_CYCLES_NE   0x24         1             Cycles Ring Bound NCB Message Queue Not Empty
TO_RING_NCS_MSGQ_CYCLES_NE   0x25         1             Cycles Ring Bound NCS Message Queue Not Empty
TO_RING_NDR_MSGQ_CYCLES_NE   0x28         1             Cycles Ring Bound NDR Message Queue Not Empty
TO_RING_R2S_MSGQ_CYCLES_NE   0x2C         1             Cycles Ring Bound R2S Message Queue Not Empty
TO_RING_B2S_MSGQ_CYCLES_NE   0x2D         1             Cycles Ring Bound B2S Message Queue Not Empty
B2S_DRS_BYPASS
• Title: B-Box to S-Box DRS Bypass
• Category: Ring Bound Enhancement
• Event Code: 0x53, Max. Inc/Cyc: 1,
• Definition: Number of cycles the B-Box to S-Box DRS channel bypass optimization was utilized.
Includes cycles used to transmit message flits and credit-carrying idle flits.
BBOX_CREDITS
• Title: B-Box Credit Carrying Flits
• Category: Ring Bound Transmission
• Event Code: 0x77, Max. Inc/Cyc: 1,
• Definition: Number of credit-carrying idle flits received from the B-Box.
BBOX_CREDIT_RETURNS
• Title: B-Box Credit Returns
• Category: System Bound Transmission
• Event Code: 0x6B, Max. Inc/Cyc: 1,
• Definition: Number of credit-return idle flits sent to the B-Box.
BBOX_HOM_BYPASS
• Title: B-Box HOM Bypass
• Category: System Bound Enhancement
• Event Code: 0x54, Max. Inc/Cyc: 1,
• Definition: B-Box HOM Bypass optimization utilized.
EGRESS_ARB_LOSSES
• Title: Egress ARB Losses
• Category: Ring Bound Credits
• Event Code: 0x42, Max. Inc/Cyc: 1,
• Definition: Egress Arbitration Losses.
• NOTE: Enabling multiple subevents in this category will result in the counter being increased by the
number of selected subevents that occur in a given cycle. Because only one of the even/odd FIFOs
can arbitrate to send onto the ring in each cycle, the event for the even/odd FIFOs in each direction
are exclusive. The bypass event for each direction is the sum of the bypass events of the even/odd
FIFOs.
Extension   umask [15:8]   Description
AD          b000011        AD
AK          b001100        AK
BL          b110000        BL
EGRESS_ARB_WINS
• Title: Egress ARB Wins
• Category: Ring Bound Transmission
• Event Code: 0x41, Max. Inc/Cyc: 1,
• Definition: Egress Arbitration Wins.
• NOTE: Enabling multiple subevents in this category will result in the counter being increased by the
number of selected subevents that occur in a given cycle. Because only one of the even/odd FIFOs
can arbitrate to send onto the ring in each cycle, the event for the even/odd FIFOs in each direction
are exclusive. The bypass event for each direction is the sum of the bypass events of the even/odd
FIFOs.
Extension   umask [15:8]   Description
AD          b000011        AD
AK          b001100        AK
BL          b110000        BL
EGRESS_BYPASS
• Title: Egress Bypass
• Category: Ring Bound Enhancement
• Event Code: 0x40, Max. Inc/Cyc: 1,
• Definition: Egress Bypass optimization utilized.
• NOTE: Enabling multiple subevents in this category will result in the counter being increased by the
number of selected subevents that occur in a given cycle. Because only one of the even/odd FIFOs
can arbitrate to send onto the ring in each cycle, the event for the even/odd FIFOs in each direction
are exclusive. The bypass event for each direction is the sum of the bypass events of the even/odd
FIFOs.
Extension   umask [15:8]   Description
AD          b000011        AD
AK          b001100        AK
BL          b110000        BL
EGRESS_STARVED
• Title: Egress Cycles in Starvation
• Category: Ring Bound Credits
• Event Code: 0x43, Max. Inc/Cyc: 1,
• Definition: Number of cycles the S-Box egress FIFOs are in starvation.
• NOTE: Enabling multiple subevents in this category will result in the counter being increased by the
number of selected subevents that occur in a given cycle. Because only one of the even/odd FIFOs
can arbitrate to send onto the ring in each cycle, the event for the even/odd FIFOs in each direction
are exclusive. The bypass event for each direction is the sum of the bypass events of the even/odd
FIFOs.
Extension   umask [15:8]   Description
AD          b000011        AD
AK          b001100        AK
BL          b110000        BL
FLITS_SENT_DRS
• Title: DRS Flits Sent to System
• Category: System Bound Transmission
• Event Code: 0x65, Max. Inc/Cyc: 1,
• Definition: Number of data response flits the S-Box has transmitted to the system.
FLITS_SENT_NCB
• Title: NCB Flits Sent to System
• Category: System Bound Transmission
• Event Code: 0x69, Max. Inc/Cyc: 11,
• Definition: Number of non-coherent bypass flits the S-Box has transmitted to the system.
FLITS_SENT_NCS
• Title: NCS Flits Sent to System
• Category: System Bound Transmission
• Event Code: 0x67, Max. Inc/Cyc: 1,
• Definition: Number of non-coherent standard flits the S-Box has transmitted to the system.
HALFLINE_BYPASS
• Title: Half Cacheline Bypass
• Category: Ring Bound Enhancement
• Event Code: 0x30, Max. Inc/Cyc: 1,
• Definition: Half Cacheline Bypass optimization (where the line is sent early) was utilized.
NO_CREDIT_AD
• Title: AD Ring Credit Unavailable
• Category: Ring Bound Credits
• Event Code: 0x87, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending SNP, NCS or NCB message to send and there is
no credit for the target egress FIFO.
NO_CREDIT_AK
• Title: AK Ring Credit Unavailable
• Category: Ring Bound Credits
• Event Code: 0x88, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending NDR or S2C credit return message to send but
there is no credit for the target egress FIFO.
NO_CREDIT_BL
• Title: BL Ring Credit Unavailable
• Category: Ring Bound Credits
• Event Code: 0x89, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending DRS or debug message to send and there is no
credit for the target egress FIFO.
NO_CREDIT_DRS
• Title: DRS Credit Unavailable
• Category: System Bound Credits
• Event Code: 0x82, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending data response message to send and there is no
DRS or VNA credit available.
NO_CREDIT_HOM
• Title: HOM Credit Unavailable
• Category: System Bound Credits
• Event Code: 0x80, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending home message to send and there is no HOM or
VNA credit available.
NO_CREDIT_IPQ
• Title: IPQ Credit Unavailable
• Category: Ring Bound Credits
• Event Code: 0x8A, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has an incoming SNP to send but there is no IPQ credit
available for the target C-Box.
NO_CREDIT_NCB
• Title: NCB Credit Unavailable
• Category: System Bound Credits
• Event Code: 0x84, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending non-coherent bypass message to send and
there is no NCB or VNA credit available.
NO_CREDIT_NCS
• Title: NCS Credit Unavailable
• Category: System Bound Credits
• Event Code: 0x83, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending non-coherent standard message to send and
there is no NCS or VNA credit available.
NO_CREDIT_NDR
• Title: NDR Credit Unavailable
• Category: System Bound Credits
• Event Code: 0x85, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending non-data response message to send and there
is no NDR or VNA credit available.
NO_CREDIT_SNP
• Title: SNP Credit Unavailable
• Category: System Bound Credits
• Event Code: 0x81, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has a pending snoop message to send and there is no SNP or
VNA credit available.
NO_CREDIT_VNA
• Title: VNA Credit Unavailable
• Category: System Bound Transmission
• Event Code: 0x86, Max. Inc/Cyc: 1,
• Definition: Number of times the S-Box has exhausted its VNA credit pool. When more than one
subevent is selected, the credit counter will be incremented by the number of selected subevents that
occur in each cycle.
PKTS_RCVD_DRS_FROM_B
• Title: DRS Packets Received from B-Box
• Category: Ring Bound Transmission
• Event Code: 0x73, Max. Inc/Cyc: 1,
• Definition: Number of data response packets the S-Box has received from the B-Box.
• NOTE: DRS messages are always full cacheline messages which are 9 flits. Multiply this event by 9 to
derive flit traffic from the B-Box due to DRS messages.
PKTS_RCVD_DRS_FROM_R
• Title: DRS Packets Received from R-Box
• Category: Ring Bound Transmission
• Event Code: 0x72, Max. Inc/Cyc: 9,
• Definition: Number of data response packets the S-Box has received from the R-Box.
• NOTE: DRS messages are always full cacheline messages which are 9 flits. Multiply this event by 9 to
derive flit traffic from the R-Box due to DRS messages.
PKTS_RCVD_NCB
• Title: NCB Packets Received from System
• Category: Ring Bound Transmission
• Event Code: 0x75, Max. Inc/Cyc: 1,
• Definition: Number of non-coherent bypass packets the S-Box has received from the system.
• NOTE: The only ring bound NCB message types are: NcMsgB (StartReq2, VLW), IntLogical,
IntPhysical. These are all 11 flit messages. Multiply this event by 11 to derive flit traffic from the system
due to NCB messages.
PKTS_RCVD_NCS
• Title: NCS Packets Received from System
• Category: Ring Bound Transmission
• Event Code: 0x74, Max. Inc/Cyc: 1,
• Definition: Number of non-coherent standard packets the S-Box has received from the system.
• NOTE: The only ring bound NCS message type is NcMsgS (StopReq1). There are always 3 flits.
Multiply this event by 3 to derive flit traffic from the system due to NCS messages.
PKTS_RCVD_NDR
• Title: NDR Packets Received from System
• Category: Ring Bound Transmission
• Event Code: 0x70, Max. Inc/Cyc: 1,
• Definition: Number of non-data response packets the S-Box has received from the system.
PKTS_RCVD_SNP
• Title: SNP Packets Received from System
• Category: Ring Bound Transmission
• Event Code: 0x71, Max. Inc/Cyc: 1,
• Definition: Number of snoop packets the S-Box has received from the system.
PKTS_SENT_DRS
• Title: DRS Packets Sent to System
• Category: System Bound Transmission
• Event Code: 0x64, Max. Inc/Cyc: 1,
• Definition: Number of DRS packets the S-Box has transmitted to the system.
• NOTE: If multiple C-Boxes are selected, this event counts the total data response packets sent by all
the selected C-Boxes. In the cases where one DRS message spawns two messages, one to the
requester and one to the home, this event only counts the first DRS message. DRS messages are
always full cacheline messages which are 9 flits.
PKTS_SENT_HOM
• Title: HOM Packets Sent to System
• Category: System Bound Transmission
• Event Code: 0x60, Max. Inc/Cyc: 1,
• Definition: Number of home packets the S-Box has transmitted to the R-Box or B-Box. If both R-Box
and B-Box are selected, counts the total number of home packets sent to both boxes.
PKTS_SENT_NCB
• Title: NCB Packets Sent to System
• Category: System Bound Transmission
• Event Code: 0x68, Max. Inc/Cyc: 11,
• Definition: Number of NCB packets the S-Box has transmitted to the system.
• NOTE: If multiple C-Boxes are selected, this event counts the total non-coherent bypass packets sent
by all the selected C-Boxes. The only ring bound NCB message types are: NcMsgB (StartReq2, VLW),
IntLogical, IntPhysical. These are all 11 flit messages.
PKTS_SENT_NCS
• Title: NCS Packets Sent to System
• Category: System Bound Transmission
• Event Code: 0x66, Max. Inc/Cyc: 3,
• Definition: Number of NCS packets the S-Box has transmitted to the system.
• NOTE: If multiple C-Boxes are selected, this event counts the total non-coherent standard packets
sent by all the selected C-Boxes. The only ring bound NCS message type is NcMsgS (StopReq1).
There are always 3 flits.
PKTS_SENT_NDR
• Title: NDR Packets Sent to System
• Category: System Bound Transmission
• Event Code: 0x63, Max. Inc/Cyc: 1,
• Definition: Number of non-data response packets the S-Box has transmitted to the system.
PKTS_SENT_SNP
• Title: SNP Packets Sent to System
• Category: System Bound Transmission
• Event Code: 0x62, Max. Inc/Cyc: 1,
• Definition: Number of SNP packets the S-Box has transmitted to the system. This event only counts
the first snoop that is spawned from a home request. When S-Box broadcast is enabled, this event does
not count the additional snoop packets that are spawned.
RBOX_CREDIT_CARRIERS
• Title: R-Box Credit Carrying Flits
• Category: Ring Bound Transmission
• Event Code: 0x76, Max. Inc/Cyc: 1,
• Definition: Number of credit-carrying idle flits received from the R-Box.
RBOX_CREDIT_RETURNS
• Title: R-Box Credit Returns
• Category: System Bound Transmission
• Event Code: 0x6A, Max. Inc/Cyc: 1,
• Definition: Number of credit-return idle flits sent to the R-Box.
RBOX_HOM_BYPASS
• Title: R-Box HOM Bypass
• Category: System Bound Enhancement
• Event Code: 0x50, Max. Inc/Cyc: 1,
• Definition: R-Box HOM Bypass optimization was utilized.
RBOX_SNP_BYPASS
• Title: R-Box SNP Bypass
• Category: System Bound Enhancement
• Event Code: 0x51, Max. Inc/Cyc: 1,
• Definition: R-Box SNP bypass optimization utilized. When both snoop and big snoop bypass are
selected, the performance counter will increment on both subevents.
REQ_TBL_OCCUPANCY
• Title: Request Table Occupancy
• Category: Ring Bound Queue
• Event Code: 0x31, Max. Inc/Cyc: 48,
• Definition: Number of request table entries occupied by socket requests. Local means the request is
targeted at the B-Boxes in the same socket; requests to a B-Box in another socket are considered
remote.
• NOTE: Occupancy is tracked from allocation to deallocation of each entry in the queue.
S2B_HOM_BYPASS
• Title: S-Box to B-Box HOM Bypass
• Category: System Bound Enhancement
• Event Code: 0x52, Max. Inc/Cyc: 1,
• Definition: Number of cycles the S-Box to B-Box HOM channel bypass optimization was utilized.
Includes cycles used to transmit message flits and credit-carrying idle flits.
TO_RING_B2S_MSGQ_CYCLES_FULL
• Title: Cycles Ring Bound B2S Message Queue Full
• Category: Ring Bound Queue
• Event Code: 0x2B, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing B to S-Box messages on their
way to the Ring, is full.
TO_RING_B2S_MSGQ_CYCLES_NE
• Title: Cycles Ring Bound B2S Message Queue Not Empty
• Category: Ring Bound Queue
• Event Code: 0x2D, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing B to S-Box messages on their
way to the Ring, has one or more entries allocated.
TO_RING_B2S_MSGQ_OCCUPANCY
• Title: Ring Bound B2S Message Queue Occupancy
• Category: Ring Bound Queue
• Event Code: 0x2F, Max. Inc/Cyc: 8,
• Definition: Number of entries in header buffer containing B to S-Box messages on their way to the
Ring.
TO_RING_MSGQ_OCCUPANCY
• Title: Ring Bound Message Queue Occupancy
• Category: Ring Bound Queue
• Event Code: 0x26, Max. Inc/Cyc: 1,
• Definition: Number of entries in header buffer containing SNP, NCS or NCB messages headed for the
Ring. Each subevent represents usage of the buffer by a particular message class. When more than one
message class is selected, the queue occupancy counter counts the total number of buffer entries
occupied by messages in the selected message classes.
• NOTE: Total of the buffer entries occupied by all message classes in umask will never exceed 36.
TO_RING_NCB_MSGQ_CYCLES_FULL
• Title: Cycles Ring Bound NCB Message Queue Full
• Category: Ring Bound Queue
• Event Code: 0x21, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing NCB messages on their way to
the Ring, is full.
TO_RING_NCB_MSGQ_CYCLES_NE
• Title: Cycles Ring Bound NCB Message Queue Not Empty
• Category: Ring Bound Queue
• Event Code: 0x24, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing NCB messages on their way to
the Ring, has one or more entries allocated.
TO_RING_NCS_MSGQ_CYCLES_FULL
• Title: Cycles Ring Bound NCS Message Queue Full
• Category: Ring Bound Queue
• Event Code: 0x22, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing NCS messages on their way to
the Ring, is full.
TO_RING_NDR_MSGQ_CYCLES_FULL
• Title: Cycles Ring Bound NDR Message Queue Full
• Category: Ring Bound Queue
• Event Code: 0x27, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing NDR messages on their way to
the Ring, is full.
TO_RING_NDR_MSGQ_CYCLES_NE
• Title: Cycles Ring Bound NDR Message Queue Not Empty
• Category: Ring Bound Queue
• Event Code: 0x28, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing NDR messages on their way to
the Ring, has one or more entries allocated.
TO_RING_NDR_MSGQ_OCCUPANCY
• Title: Ring Bound NDR Message Queue Occupancy
• Category: Ring Bound Queue
• Event Code: 0x29, Max. Inc/Cyc: 32,
• Definition: Number of entries in header buffer containing NDR messages on their way to the Ring.
TO_RING_R2S_MSGQ_CYCLES_FULL
• Title: Cycles Ring Bound R2S Message Queue Full
• Category: Ring Bound Queue
• Event Code: 0x2A, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing R to S-Box messages on their
way to the Ring, is full.
TO_RING_R2S_MSGQ_CYCLES_NE
• Title: Cycles Ring Bound R2S Message Queue Not Empty
• Category: Ring Bound Queue
• Event Code: 0x2C, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing R to S-Box messages on their
way to the Ring, has one or more entries allocated.
TO_RING_R2S_MSGQ_OCCUPANCY
• Title: Ring Bound R2S Message Queue Occupancy
• Category: Ring Bound Queue
• Event Code: 0x2E, Max. Inc/Cyc: 8,
• Definition: Number of entries in header buffer containing R to S messages on their way to the Ring.
TO_RING_SNP_MSGQ_CYCLES_FULL
• Title: Cycles Ring Bound SNP Message Queue Full
• Category: Ring Bound Queue
• Event Code: 0x20, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing SNP messages on their way to the
Ring, is full.
TO_RING_SNP_MSGQ_CYCLES_NE
• Title: Cycles Ring Bound SNP Message Queue Not Empty
• Category: Ring Bound Queue
• Event Code: 0x23, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing SNP messages on their way to the
Ring, has one or more entries allocated.
TO_R_DRS_MSGQ_CYCLES_FULL
• Title: Cycles System Bound DRS Message Queue Full
• Category: System Bound Queue
• Event Code: 0x0E, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer for the selected C-Box, containing DRS
messages heading to a System Agent (through the R-Box), is full. Only one C-Box’s DRS header buffer
should be selected for the buffer full checking to be correct, else the result is undefined.
TO_R_DRS_MSGQ_CYCLES_NE
• Title: Cycles System Bound DRS Message Queue Not Empty
• Category: System Bound Queue
• Event Code: 0x0F, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer for the selected C-Box, containing DRS
messages heading to a System Agent (through the R-Box), has one or more entries allocated. When
more than one C-Box is selected, the event is asserted when any of the selected C-Box DRS header
buffers are not empty.
TO_R_DRS_MSGQ_OCCUPANCY
• Title: System Bound DRS Message Queue Occupancy
• Category: System Bound Queue
• Event Code: 0x10, Max. Inc/Cyc: 16,
• Definition: Number of entries in the header buffer for the selected C-Box, containing DRS messages
heading to a System Agent (through the R-Box). When more than one C-Box is selected, the queue
occupancy counter counts the total number of occupied entries in all selected C-Box DRS header
buffers.
• NOTE: 1 buffer per C-Box, 4 entries each.
TO_R_B_HOM_MSGQ_CYCLES_FULL
• Title: Cycles System Bound HOM Message Queue Full
• Category: System Bound Queue
• Event Code: 0x03, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing HOM messages heading to a
System Agent (through the B or R-Box), is full. If both R-Box and B-Box subevents are selected, this
event is asserted when the total number of entries in the R-Box and B-Box Home header buffers is
equal to 64.
TO_R_B_HOM_MSGQ_CYCLES_NE
• Title: Cycles System Bound HOM Header Not Empty
• Category: System Bound Queue
• Event Code: 0x06, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing HOM messages heading to a
System Agent (through the B or R-Box), has one or more entries allocated. If both R-Box and B-Box
subevents are selected, this event is asserted when the total number of entries in the R-Box and B-Box
Home header buffers is greater than 0.
TO_R_B_HOM_MSGQ_OCCUPANCY
• Title: System Bound HOM Message Queue Occupancy
• Category: System Bound Queue
• Event Code: 0x07, Max. Inc/Cyc: 64,
• Definition: Number of entries in the header buffer containing HOM messages heading to a System
Agent (through the B or R-Box).
• NOTE: 1 buffer for the R-Box and 1 for the B-Box, 64 entries each. The sum of the occupied entries
in the 2 header buffers will never exceed 64.
TO_R_NCB_MSGQ_CYCLES_FULL
• Title: Cycles System Bound NCB Message Queue Full
• Category: System Bound Queue
• Event Code: 0x11, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer for the selected C-Box, containing NCB
messages heading to a System Agent (through the R-Box), is full. Only one C-Box’s NCB header buffer
should be selected for the buffer full checking to be correct, else the result is undefined.
TO_R_NCB_MSGQ_CYCLES_NE
• Title: Cycles System Bound NCB Message Queue Not Empty
• Category: System Bound Queue
• Event Code: 0x12, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer for the selected C-Box, containing NCB
messages heading to a System Agent (through the R-Box), has one or more entries allocated. When
more than one C-Box is selected, the event is asserted when any of the selected C-Box NCB header
buffers are not empty.
TO_R_NCB_MSGQ_OCCUPANCY
• Title: System Bound NCB Message Queue Occupancy
• Category: System Bound Queue
• Event Code: 0x13, Max. Inc/Cyc: 8,
• Definition: Number of entries in the header buffer for the selected C-Box, containing NCB messages
heading to a System Agent (through the R-Box). When more than one C-Box is selected, the queue
occupancy counter counts the total number of occupied entries in all selected C-Box NCB header
buffers.
• NOTE: 1 buffer per C-Box, 2 entries each.
TO_R_NCS_MSGQ_CYCLES_FULL
• Title: Cycles System Bound NCS Message Queue Full
• Category: System Bound Queue
• Event Code: 0x14, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer for the selected C-Box, containing NCS
messages heading to a System Agent (through the R-Box), is full. Only one C-Box’s NCS header buffer
should be selected for the buffer full checking to be correct, else the result is undefined.
TO_R_NCS_MSGQ_CYCLES_NE
• Title: Cycles System Bound NCS Message Queue Not Empty
• Category: System Bound Queue
• Event Code: 0x15, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer for the selected C-Box, containing NCS
messages heading to a System Agent (through the R-Box), has one or more entries allocated. When
more than one C-Box is selected, the event is asserted when any of the selected C-Box NCS header
buffers are not empty.
TO_R_NCS_MSGQ_OCCUPANCY
• Title: System Bound NCS Message Queue Occupancy
• Category: System Bound Queue
• Event Code: 0x16, Max. Inc/Cyc: 2,
• Definition: Number of entries in the header buffer for the selected C-Box, containing NCS messages
heading to a System Agent (through the R-Box). When more than one C-Box is selected, the queue
occupancy counter counts the total number of occupied entries in all selected C-Box NCS header
buffers.
• NOTE: 1 buffer per C-Box, 2 entries each.
TO_R_NDR_MSGQ_CYCLES_FULL
• Title: Cycles System Bound NDR Message Queue Full
• Category: System Bound Queue
• Event Code: 0x0B, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing NDR messages heading to a
System Agent (through the R-Box), is full.
TO_R_NDR_MSGQ_CYCLES_NE
• Title: Cycles System Bound NDR Message Queue Not Empty
• Category: System Bound Queue
• Event Code: 0x0C, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing NDR messages heading to a
System Agent (through the R-Box), has one or more entries allocated.
TO_R_NDR_MSGQ_OCCUPANCY
• Title: System Bound NDR Message Queue Occupancy
• Category: System Bound Queue
• Event Code: 0x0D, Max. Inc/Cyc: 16,
• Definition: Number of entries in the header buffer, containing NDR messages heading to a System
Agent (through the R-Box).
TO_R_PROG_EV
• Title: System Bound Programmable Event
• Category: System Bound Queue
• Event Code: 0x00, Max. Inc/Cyc: 1,
• Definition: Programmable Event heading to a System Agent (through the R-Box). Match/Mask on
criteria set in S_MSR_MATCH/MASK registers (Refer to Section 2.5.3.4, “S-Box Registers for Mask/
Match Facility”).
TO_R_B_REQUESTS
• Title: System Bound Requests
• Category: System Bound Transmission
• Event Code: 0x6C, Max. Inc/Cyc: 1,
• Definition: Socket requests (both B-Boxes). Local means the request is targeted at the B-Boxes in the same socket. Requests to the U-Box in the same socket are considered remote.
TO_R_SNP_MSGQ_CYCLES_FULL
• Title: Cycles System Bound SNP Message Queue Full
• Category: System Bound Queue
• Event Code: 0x08, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing SNP messages heading to a Sys-
tem Agent (through the R-Box), is full.
TO_R_SNP_MSGQ_CYCLES_NE
• Title: Cycles System Bound SNP Message Queue Not Empty
• Category: System Bound Queue
• Event Code: 0x09, Max. Inc/Cyc: 1,
• Definition: Number of cycles in which the header buffer, containing SNP messages heading to a Sys-
tem Agent (through the R-Box), has one or more entries allocated.
TO_R_SNP_MSGQ_OCCUPANCY
• Title: System Bound SNP Message Queue Occupancy
• Category: System Bound Queue
• Event Code: 0x0A, Max. Inc/Cyc: 32,
• Definition: Number of entries in the header buffer, containing SNP messages heading to a System
Agent (through the R-Box).
The on-die agents include two B-Boxes (ports 3/7), two S-Boxes (ports 2/6) and the U-Box (which shares a connection with B-Box1 on port 7). The R-Box connects to these through full flit 80b links. Ports 0, 1, 4 and 5 are connected to external Intel QPI agents (through the P-Boxes, also known as the physical layers), also through full flit 80b links.
The R-Box consists of 8 identical ports and a wire crossbar that connects the ports together. Each port
contains three main sections as shown in the following figure: the input port, the output port, and the
arbitration control.
R-Box input ports have two structures important to performance monitoring: the Entry Overflow Table (EOT) and the Entry Table (ET). The R-Box PMU supports performance monitoring in these two structures.
R-Box arbitration does not have any storage structures. This part of the logic determines to which port the packet should be routed and then arbitrates to secure a route to that port through the crossbar. The arbitration is done at 3 levels: queue, port and global arbitration. R-Box PMUs support performance monitoring at the arbitration control.
For information on how to set up a monitoring session, refer to Section 2.1.2, “Setting up a Monitoring
Session”.
The counters, along with the control register paired with each one, are split. Half of the counters (0-7)
can monitor events occurring on the ‘left’ side of the R-Box (ports 0-3) and the other half (8-15)
monitor ports 4-7 on the ‘right’ side.
Since the R-Box consists of 8 almost identical ports, R-Box perfmon events consist of an identical set
of events for each port. The R-Box perfmon usage model allows monitoring of multiple ports at the
same time. R-Box PMUs do not provide any global performance monitoring events.
However, unlike many other uncore boxes, event programming in the R-Box is hierarchical. It is
necessary to program multiple MSRs to select the event to be monitored. In order to program an event,
each of the control registers for its accompanying counter must be redirected to a subcontrol register
attached to a specific port. Each control register can be redirected to one of 2 IPERF control registers
(for RIX events), one of 2 fields in a QLX control register or one of 2 mask/match registers. Therefore,
it is possible to monitor up to two of any event per port.
The R-Box also includes a pair of mask/match registers on each port that allow a user to match packets
serviced (packet is transferred from input to output port) by the R-Box according to various standard
packet fields such as message class, opcode, etc.
3) Pick a generic counter (control+data) that can monitor an event on that port (e.g. R_MSR_PMON_CTL/CTR3).
4) Pick one of the two sub counters that allows a user to monitor the event (R_MSR_PORT1_IPERF1), program it to monitor the chosen event (R_MSR_PORT1_IPERF1[31] = 0x1) and set the generic control to point to it (R_MSR_PMON_CTL3.ev_sel = 0x7).
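The sequence above amounts to two MSR writes. The following is a minimal C sketch, assuming a hypothetical wrmsr() helper (for example, one backed by an OS MSR driver); the MSR addresses and the bit position of the .ev_sel field are illustrative assumptions and must be taken from the R-Box register tables:

#include <stdint.h>

/* Hypothetical helper that writes a 64-bit value to an MSR. */
extern void wrmsr(uint32_t msr, uint64_t val);

#define R_MSR_PORT1_IPERF1  0x0E25u  /* assumed address - see the MSR tables */
#define R_MSR_PMON_CTL3     0x0E16u  /* assumed address - see the MSR tables */

void rbox_program_flits_sent(void)
{
    /* Program the chosen event in the port 1 IPERF1 subcontrol
       register (bit 31 = FLITS_SENT, as in the example above). */
    wrmsr(R_MSR_PORT1_IPERF1, 1ull << 31);

    /* Point generic control 3 at that subcontrol (.ev_sel = 0x7; the
       field position is assumed here) and locally enable the counter
       (.en, bit 0). */
    wrmsr(R_MSR_PMON_CTL3, (0x7ull << 1) | 0x1ull);
}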
HW can also be configured (by setting the corresponding .pmi_en to 1) to send a PMI to the U-Box when an overflow is detected. The U-Box may be configured to freeze all uncore counting and/or send a PMI to selected cores when it receives this signal.
Once a freeze has occurred, in order to see a new freeze, the overflow field responsible for the freeze must be cleared by setting the corresponding bit in R_MSR_PMON_GLOBAL_OVF_CTL.clr_ov. Assuming all the counters have been locally enabled (.en bit in the control registers meant to monitor events) and the overflow bit(s) have been cleared, the R-Box is prepared for a new sample interval. Once the global controls have been re-enabled (Section 2.1.4, “Enabling a New Sample Interval from Frozen Counters”), counting will resume.
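Concretely, clearing the overflow bit is a single write-1-to-clear MSR access. A minimal sketch, reusing the hypothetical wrmsr() helper; the address below is the R_MSR_PMON_OVF_CTL_7_0 register listed in the MSR tables that follow:

#include <stdint.h>

extern void wrmsr(uint32_t msr, uint64_t val);

#define R_MSR_PMON_OVF_CTL_7_0 0x0E02u  /* RW1C - see the MSR tables below */

/* Clear the overflow bit for counter ctr (0-7) so that a new freeze can
   be observed; re-enabling the global controls (Section 2.1.4) then
   resumes counting. */
void rbox_clear_overflow(unsigned ctr)
{
    wrmsr(R_MSR_PMON_OVF_CTL_7_0, 1ull << ctr);  /* write 1 to clear */
}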
MSR Name                        Access   MSR Address   Size (bits)   Description
R_MSR_PORT7_XBR_SET2_MASK       RW_NA    0x0E9E        64            R-Box Port 7 Mask 2
MSR Name                        Access   MSR Address   Size (bits)   Description
R_MSR_PORT5_XBR_SET1_MASK       RW_NA    0x0E86        64            R-Box Port 5 Mask 1
MSR Name                         Access    MSR Address   Size (bits)   Description
R_MSR_PMON_CTR13                 RW_RW     0x0E3B        64            R-Box PMON Counter 13
R_MSR_PMON_OVF_CTL_15_8          RW1C_WO   0x0E22        32            R-Box PMON Overflow Ctrl for ctrs 15:8
R_MSR_PMON_GLOBAL_STATUS_15_8    RO_WO     0x0E21        32            R-Box PMON Global Status for ctrs 15:8
MSR Name                        Access    MSR Address   Size (bits)   Description
R_MSR_PMON_CTR1                 RW_RW     0x0E13        64            R-Box PMON Counter 1
R_MSR_PMON_OVF_CTL_7_0          RW1C_WO   0x0E02        32            R-Box PMON Overflow Ctrl for ctrs 7:0
R_MSR_PMON_GLOBAL_STATUS_7_0    RO_WO     0x0E01        32            R-Box PMON Global Status for ctrs 7:0
The _GLOBAL_CTL register contains the bits used to enable monitoring. It is necessary to set the
.ctr_en bit to 1 before the corresponding data register can collect events.
If an overflow is detected from one of the R-Box PMON registers, the corresponding bit in the
_GLOBAL_STATUS.ov field will be set. To reset the overflow bits set in the _GLOBAL_STATUS.ov field, a
user must set the corresponding bits in the _GLOBAL_OVF_CTL.clr_ov field before beginning a new
sample interval.
ctr_en   7:0   0   Must be set to enable each R-Box counter (bit 0 to enable ctr0, etc).
NOTE: U-Box enable and per counter enable must also be set to fully enable the counter.
clr_ov   7:0   0   Writing ‘1’ to a bit in this field will clear the corresponding overflow bit in R_MSR_PMON_GLOBAL_STATUS_{15_8,7_0} to 0.
pmi_en   6   0   When this bit is asserted and the corresponding counter overflows, a PMI exception is sent to the U-Box.
For the R-Box this means choosing which sub register contains the actual event select. Each control register can redirect the event select to one of 3 sets of registers: QLX, RIX or Mask/Match registers. It can further select from one of two subselect fields (either in the same or different registers). And finally, each control can ‘listen’ to events occurring on one of 4 ports. The first 8 control registers can refer to the first 4 ports (0-3) and the last 8 control registers to the last 4 ports (4-7).
en   0   0   Enable counter
The R-Box performance monitor data registers are 48b wide. A counter overflow occurs when a carry
out bit from bit 47 is detected. Software can force all uncore counting to freeze after N events by
preloading a monitor with a count value of 2^48 - N and setting the control register to send a PMI to the
U-Box. Upon receipt of the PMI, the U-Box will disable counting ( Section 2.1.1.1, “Freezing on Counter
Overflow”). During the interval of time between overflow and global disable, the counter value will wrap
and continue to collect events.
In this way, software can capture the precise number of events that occurred between the time uncore
counting was enabled and when it was disabled (or ‘frozen’) with minimal skew.
If accessible, software can continuously read the data registers without disabling event collection.
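For example, to freeze after N events, the preload value can be computed as follows (a sketch; R_MSR_PMON_CTR1 is taken from the MSR table above, and wrmsr() is the same hypothetical helper):

#include <stdint.h>

extern void wrmsr(uint32_t msr, uint64_t val);

#define R_MSR_PMON_CTR1 0x0E13u  /* R-Box PMON Counter 1, from the MSR table */

/* Preload the 48-bit counter so the carry out of bit 47 (overflow)
   occurs after exactly n more events. */
void rbox_preload(uint64_t n)
{
    uint64_t preload = ((1ull << 48) - n) & ((1ull << 48) - 1);
    wrmsr(R_MSR_PMON_CTR1, preload);
}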
MC_ROLL_ALLOC   21   0x0   Used with MC field. If set, every individual allocation of the selected MC into the EOT is reported. If 0, a ‘rolling’ count is reported (a count is signaled each time the 7b counter overflows, i.e. reaches a value of 128) for the selected MC’s allocations into the EOT.
IQA_READ_OK 8 0x0 Bid wins arbitration. Read flit from IQA and drains to XBAR.
NEW_PVN 7:6 0x0 New Packet VN Select: Anded with result of New Packet Class Bit
Mask.
b1XXXXX: Snoop
bX1XXXX: Home
bXX1XXX: Non-Data Response
bXXX1XX: Data Response
bXXXX1X: Non-Coherent Standard
bXXXXX1: Non-Coherent Bypass
0: VN0
1: VN1
ev1_cls 14:12 0x0 Performance Event 1 Class Select:
000: HOM
001: SNP
010: NDR
011: NCS
100: DRS
101: NCB
110: VNA - Small
111: VNA - Large
0: VN0
1: VN1
ev0_cls 6:4 0x0 Performance Event 0 Class Select:
000: HOM
001: SNP
010: NDR
011: NCS
100: DRS
101: NCB
110: VNA - Small
111: VNA - Large
ev0_type 3:0 0x0 Performance Event 0 Type Select:
NOTE: In order to monitor packet traffic, instead of the flit traffic associated with each packet, set
.match_flt_cnt to 0x1.
c) Set the counter’s control register event select to the appropriate IPERF subcontrol register and set
the IPERF register’s event select to 0x31 (TO_R_PROG_EV) to capture the mask/match as a
performance event.
The following table contains the packet traffic that can be monitored if one of the mask/match registers
was chosen to select the event.
RDS 51:48 0x0 Response Data State (valid when MC == DRS and Opcode == 0x0-
2). Bit settings are mutually exclusive.
b1000 - Modified
b0100 - Exclusive
b0010 - Shared
b0001 - Forwarding
b0000 - Invalid (Non-Coherent)
DRS,NCB:
[8] Packet Size, 0 == 9 flits, 1 == 11 flits
NCS:
[8] Packet Size, 0 == 1 or 2 flits, 1 == 3 flits
b00 - VN0
b01 - VN1
b1x - VNA
RDS 51:48 0x0 Response Data State (for certain DRS messages)
Following is a selection of common events that may be derived by using the R-Box packet matching
facility.
Event                Match [15:0]                Mask [15:0]                Description
DRS.AnyDataC         0x1C00                      0x1F80                     Any Data Response message containing a cache line in response to a core request. The AnyDataC messages are only sent to an S-Box. The metric DRS.AnyResp - DRS.AnyDataC will compute the number of DRS writeback and non-snoop write messages.
DRS.DataC_M          0x1C00 && Match[51:48]=0x8  0x1FE0 && Mask[51:48]=0xF  Data Response message of a cache line in M state that is a response to a core request. The DRS.DataC_M messages are only sent to S-Boxes.
DRS.WblData          0x1C80                      0x1FE0                     Data Response message for Write Back data where the cache line is set to the I state.
DRS.WbSData          0x1CA0                      0x1FE0                     Data Response message for Write Back data where the cache line is set to the S state.
DRS.WbEData          0x1CC0                      0x1FE0                     Data Response message for Write Back data where the cache line is set to the E state.
DRS.AnyResp          0x1C00                      0x1E00                     Any Data Response message. A DRS message can be either 9 flits for a full cache line or 11 flits for partial data.
DRS.AnyResp9flits    0x1C00                      0x1F00                     Any Data Response message that is 9 flits in length. A 9 flit DRS message contains a full 64 byte cache line.
DRS.AnyResp11flits   0x1D00                      0x1F00                     Any Data Response message that is 11 flits in length. An 11 flit DRS message contains partial data; each 8 byte chunk contains an enable field that specifies if the data is valid.
NCB.AnyMsg9flits     0x1800                      0x1F00                     Any Non-Coherent Bypass message that is 9 flits in length. A 9 flit NCB message contains a full 64 byte cache line.
NCB.AnyMsg11flits    0x1900                      0x1F00                     Any Non-Coherent Bypass message that is 11 flits in length. An 11 flit NCB message contains either partial data or an interrupt. For NCB 11 flit data messages, each 8 byte chunk contains an enable field that specifies if the data is valid.
NCB.AnyInt           0x1900                      0x1F80                     Any Non-Coherent Bypass interrupt message. NCB interrupt messages are 11 flits in length.
NOTE: Bits 71:16 of the match/mask must be 0 in order to derive these events (except where noted -
see DRS.DataC_M). Also the match/mask configuration register should be set to 0x00210000 (bits 21
and 16 set).
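As a worked example, the DRS.AnyDataC row could be captured roughly as follows. This is a sketch under stated assumptions: the register names follow the R_MSR_PORT{n}_XBR_SET{m}_* convention from the MSR tables, but the addresses and the MM_CFG name are placeholders, and wrmsr() is the same hypothetical helper:

#include <stdint.h>

extern void wrmsr(uint32_t msr, uint64_t val);

/* Placeholder addresses - look up the real port 0 SET1 registers. */
#define R_MSR_PORT0_XBR_SET1_MM_CFG 0x0E40u  /* assumed */
#define R_MSR_PORT0_XBR_SET1_MATCH  0x0E41u  /* assumed */
#define R_MSR_PORT0_XBR_SET1_MASK   0x0E42u  /* assumed */

void rbox_match_anydatac(void)
{
    /* Match 0x1C00 / mask 0x1F80 in bits [15:0]; bits 71:16 left 0. */
    wrmsr(R_MSR_PORT0_XBR_SET1_MATCH, 0x1C00ull);
    wrmsr(R_MSR_PORT0_XBR_SET1_MASK,  0x1F80ull);

    /* Configuration register value 0x00210000 (bits 21 and 16 set),
       per the NOTE above. */
    wrmsr(R_MSR_PORT0_XBR_SET1_MM_CFG, 0x00210000ull);

    /* Finally, point a counter's control at an IPERF subcontrol whose
       event select is 0x31 (TO_R_PROG_EV), as described in step c). */
}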
In addition, the R-Box provides the ability to match/mask against ALL flit traffic that leaves the R-Box.
This is particularly useful for calculating link utilization, throughput and packet traffic broken down by
opcode and message class.
ALLOC_TO_ARB
• Title: Transactions allocated to ARB
• Category: RIX
• [Bit(s)] Value: See Note, Max. Inc/Cyc: 1,
• Definition: Transactions entered into the Entry Table (counts incoming messages); this also means they are now available for arbitration.
• NOTE: Any combination of Message Class [15:9] may be monitored.
EOT_INSERTS
• Title: Number of Inserts into EOT
• Category: RIX
• [Bit(s)] Value: [21]0x1, Max. Inc/Cyc: 1,
• Definition: Used with MC field. Accumulated depth of packets captured in the Entry Overflow Table
for specified message types (i.e. EOT count by MC)
• NOTE: This basically counts VNA.
EOT_NE_CYCLES
• Title: Cycles EOT Not Empty
• Category: RIX
• [Bit(s)] Value: [16]0x1, Max. Inc/Cyc: 1,
• Definition: Number of cycles the Entry Overflow Table buffer is not empty (MC counts are not all
zero).
• NOTE: The ‘empty’ signal asserts only if the EOTs for ALL message classes are empty.
EOT_OCCUPANCY
• Title: EOT Occupancy
• Category: RIX
• [Bit(s)] Value: [21]0x0, Max. Inc/Cyc: 1,
• Definition: Used with MC field. Report a rolling count whenever a 7b counter (count == 128) over-
flows for the selected MC’s allocation into EOT.
• NOTE: This basically counts VNA.
FLITS_RECV_ERR
• Title: Error Flits Received
• Category: RIX
• [Bit(s)] Value: [24]0x1, Max. Inc/Cyc: 1,
• Definition: Counts all flits received which caused CRC Error.
FLITS_RECV_SPEC
• Title: Special Flits Received
• Category: RIX
• [Bit(s)] Value: [25]0x1, Max. Inc/Cyc: 1,
• Definition: Counts all special flits received.
FLITS_SENT
• Title: Flits Sent
• Category: RIX
• [Bit(s)] Value: [31]0x1, Max. Inc/Cyc: 1,
• Definition: Counts all flits. Output queue receives active beat.
GLOBAL_ARB_BID
• Title: Global ARB Bids
• Category: QLX
• [Bit(s)] Value: [3:0]0x2, Max. Inc/Cyc: 1,
• Definition: Count global arbitration bids from the port.
GLOBAL_ARB_BID_FAIL
• Title: Failed Global ARB Bids
• Category: QLX
• [Bit(s)] Value: [3:0]0x5, Max. Inc/Cyc: 1,
• Definition: Number of bids for output port that were rejected at the global ARB.
INQUE_READ_WIN
• Title: Input Queue Read Win.
• Category: RIX
• [Bit(s)] Value: [8]0x1, Max. Inc/Cyc: 1,
• Definition: Bid wins arbitration. Counts number of IQA reads and drains to XBAR.
LOCAL_ARB_BID
• Title: Local ARB Bids
• Category: QLX
• [Bit(s)] Value: [3:0]0x1, Max. Inc/Cyc: 1,
• Definition: Number of clocks of non-zero Local ARB Bids.
LOCAL_ARB_BID_FAIL
• Title: Failed Local ARB Bids
• Category: QLX
• [Bit(s)] Value: [3:0]0x4, Max. Inc/Cyc: 1,
• Definition: Number of clocks of non-zero Local ARB Bids that were rejected.
NEW_PACKETS_RECV
• Title: New Packets Received by Port
• Category: RIX
• [Bit(s)] Value: see table, Max. Inc/Cyc: 1,
• Definition: Counts new packets received according to the Virtual Network and Message Class speci-
fied.
• NOTE: Any combination of Message Class [5:0] may be monitored.
NULL_IDLE
• Title: Null Idle Flits
• Category: RIX
• [Bit(s)] Value: [30]0x1, Max. Inc/Cyc: 1,
• Definition: Counts all null idle flits sent.
OUTPUTQ_NE
• Title: Output Queue Not Empty
• Category: RIX
• [Bit(s)] Value: [26]0x1, Max. Inc/Cyc: 1,
• Definition: Output Queue Not Empty in this Output Port.
OUTPUTQ_OVFL
• Title: Output Queue Overflowed
• Category: RIX
• [Bit(s)] Value: [27]0x1, Max. Inc/Cyc: 1,
• Definition: Output Queue Overflowed in this Output Port.
QUE_ARB_BID
• Title: Queue ARB Bids
• Category: QLX
• [Bit(s)] Value: [3:0]0x0, Max. Inc/Cyc: 1,
• Definition: Number of Queue ARB Bids for specified messages (Number of clocks with non-zero bids
from that queue).
QUE_ARB_BID_FAIL
• Title: Failed Queue ARB Bids
• Category: QLX
• [Bit(s)] Value: [3:0]0x3, Max. Inc/Cyc: 1,
• Definition: Number of Queue ARB bids for input port that were rejected.
RETRYQ_NE
• Title: Retry Queue Not Empty
• Category: RIX
• [Bit(s)] Value: [28]0x1, Max. Inc/Cyc: 1,
• Definition: Retry Queue Not Empty in this Output Port.
RETRYQ_OV
• Title: Retry Queue Overflowed
• Category: RIX
• [Bit(s)] Value: [29]0x1, Max. Inc/Cyc: 1,
• Definition: Retry Queue Overflowed in this Output Port.
STARVING
• Title: Starvation Detected
• Category: QLX
• [Bit(s)] Value: [3:0]0xA, Max. Inc/Cyc: 1,
• Definition: Starvation detected
TARGET_AVAILABLE
• Title: Target Available
• Category: QLX
• [Bit(s)] Value: [3:0]0x9, Max. Inc/Cyc: 1,
• Definition: Number of times target was available at output port.
[Figure: M-Box block diagram - scheduler, B-Box interface (command/ack/data path), page mapper, DRAM command table, command dispatch, issue queue, retry queue and queue timing, payload queue, and control logic.]
The memory controller interfaces to the router through the B-Box (home node coherence controller)
and to the P-Box pads.
The Intel 7500 Scalable Memory Buffer provides an interface to DDR3 DIMMs and supports the
following DDR3 functionality:
• DDR3 protocol and signaling, with support for the following:
— Up to two RDIMMs per DDR3 bus
— Up to eight physical ranks per DDR3 bus (sixteen per Intel 7500 Scalable Memory Buffer)
— 800 MT/s or 1066 MT/s (both DDR3 buses must operate at the same frequency)
— Single Rank x4, Dual Rank x4, Single Rank x8, Dual Rank x8, Quad Rank x4, Quad Rank x8
— 1 GB, 2 GB, 4 GB, 8 GB, 16 GB DIMM
— DRAM device sizes: 1 Gb, 2 Gb
— Mixed DIMM types (DIMMs are not required to be the same type, except that all DIMMs attached to an Intel 7500 Scalable Memory Buffer must run with a common frequency and core timings; host lockstep requirements may impose additional requirements on DIMMs on separate Intel SMI channels).
— Each DDR bus may contain a different number of DIMMs, zero through two (host lockstep requirements may impose additional requirements on DIMMs on separate Intel SMI channels).
— Cmd/Addr parity generation and error logging.
• No support for non-ECC DIMMs
• No support for DDR2 protocol and signaling
• Support for integrating RDIMM thermal sensor information into Intel SMI Status Frame.
For information on how to set up a monitoring session, refer to Section 2.1, “Global Performance
Monitoring Control”.
For instance, to count (in counter 0) the number of RAS DRAM commands
(PLD_DRAM_EV.DRAM_CMD.RAS) that have been issued, set up is as follows:
M_MSR_PMU_CNT_CTL_0.en [0] = 1
M_MSR_PMU_CNT_CTL_0.flag_mode [7] = 0
To count (in counter 2) the number of read commands from the B-Box in the scheduler queue (BCMD_SCHEDQ_OCCUPANCY.READS), set up is as follows:
M_MSR_PMU_CNT_CTL_2.en [0] = 1
M_MSR_PMU_CNT_CTL_2.flag_mode [7] = 1
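The same two set-ups can be expressed as read-modify-write operations on the counter control registers. A minimal sketch, assuming hypothetical rdmsr()/wrmsr() helpers; the control-register addresses are placeholders (only MB0_CR_M_MSR_PMU_CNT_4 at 0x0CB9 is quoted in the tables below), and the bit positions are the ones given above:

#include <stdint.h>

extern uint64_t rdmsr(uint32_t msr);
extern void     wrmsr(uint32_t msr, uint64_t val);

#define M_MSR_PMU_CNT_CTL_0 0x0CB0u  /* assumed address */
#define M_MSR_PMU_CNT_CTL_2 0x0CB2u  /* assumed address */

#define EN_BIT        (1ull << 0)  /* .en, bit [0] */
#define FLAG_MODE_BIT (1ull << 7)  /* .flag_mode, bit [7] */

void mbox_setup_counters(void)
{
    /* Counter 0: count the primary (increment) event; flag mode off. */
    uint64_t ctl0 = (rdmsr(M_MSR_PMU_CNT_CTL_0) | EN_BIT) & ~FLAG_MODE_BIT;
    wrmsr(M_MSR_PMU_CNT_CTL_0, ctl0);

    /* Counter 2: flag mode on, so the counter follows the 'set'
       condition selected by .set_flag_sel. */
    uint64_t ctl2 = rdmsr(M_MSR_PMU_CNT_CTL_2) | EN_BIT | FLAG_MODE_BIT;
    wrmsr(M_MSR_PMU_CNT_CTL_2, ctl2);
}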
HW can also be configured (by setting the corresponding .pmi_en to 1) to send a PMI to the U-Box when an overflow is detected. The U-Box may be configured to freeze all uncore counting and/or send a PMI to selected cores when it receives this signal.
Once a freeze has occurred, in order to see a new freeze, the overflow field responsible for the freeze must be cleared by setting the corresponding bit in M_MSR_PMON_GLOBAL_OVF_CTL.clr_ov. Assuming all the counters have been locally enabled (.en bit in the control registers meant to monitor events) and the overflow bit(s) have been cleared, the M-Box is prepared for a new sample interval. Once the global controls have been re-enabled (Section 2.1.4, “Enabling a New Sample Interval from Frozen Counters”), counting will resume.
MSR Name                 Access   MSR Address   Size (bits)   Description
MB0_CR_M_MSR_PMU_CNT_4   RW_RW    0x0CB9        64            M-Box 0 PMON Counter 4
The _GLOBAL_CTL register contains the bits used to enable monitoring. It is necessary to set the
.ctr_en bit to 1 before the corresponding data register can collect events.
If an overflow is detected from one of the M-Box PMON registers, the corresponding bit in the
_GLOBAL_STATUS.ov field will be set. To reset the overflow bits set in the _GLOBAL_STATUS.ov field, a
user must set the corresponding bits in the _GLOBAL_OVF_CTL.clr_ov field before beginning a new
sample interval.
ctr_en   5:0   0   Must be set to enable each M-Box 0 counter (bit 0 to enable ctr0, etc).
NOTE: U-Box enable and per counter enable must also be set to fully enable the counter.

Field    Bits   HW Reset Val   Description
clr_ov   5:0    0              Writing ‘1’ to a bit in this field causes the corresponding bit in the ‘Overflow PerfMon Counter’ field in the MB0_CR_M_MSR_PERF_GLOBAL_STATUS register to be cleared to 0.
set_flag_sel   21:19   0   Selects the ‘set’ condition for the enable flag. Secondary event select. See Table 2-84, “Performance Monitor Events for M-Box Events” for events selected by this field.
NOTE: Bit 7 (flag_mode) must be set to 1 to enable this field.
inc_sel   13:9   0   Selects the increment signal for this counter. Primary event select. See Table 2-84, “Performance Monitor Events for M-Box Events” for events selected by this field.
wrap_mode 6 0 Counter wrap mode. If set to 0, this counter will stop counting on
detection of over/underflow. If set to 1, this counter will wrap and
continue counting on detection of over/underflow.
storage_mode 5:4 0 Storage mode. If set to 0, no count enable flag is required. If set to 1,
count enable flag must have a value of 1 for counting to occur.
en 0 0 Enable counting
The M-Box performance monitor data registers are 48b wide. A counter overflow occurs when a carry
out bit from bit 47 is detected. Software can force all uncore counting to freeze after N events by
preloading a monitor with a count value of (2^48 - 1) - N and setting the control register to send a PMI to
the U-Box. Upon receipt of the PMI, the U-Box will disable counting ( Section 2.1.1.1, “Freezing on
Counter Overflow”). During the interval of time between overflow and global disable, the counter value
will wrap and continue to collect events.
In this way, software can capture the precise number of events that occurred between the time uncore
counting was enabled and when it was disabled (or ‘frozen’) with minimal skew.
If accessible, software can continuously read the data registers without disabling event collection.
The M-Box also includes a 16b timestamp unit that is incremented each M-Box clock tick. It is a free-
running counter unattached to the rest of the M-Box PMU, meaning it does not generate an event fed to
the other counters.
address   27:0   0   Address bits to mask (‘don’t care’ bits) during match - cache-aligned address [33:6].
MAP - Memory Mapper - receives read and write commands/addresses from the B-Box and translates the received (physical) addresses into DRAM addresses (rank, bank, row and column). The commands and translated addresses are sent to the PLD. In parallel, the decomposed DRAM addresses are also sent to the PGT.
PLD - Payload Queue - Receives command and translated addresses from the MAP while the PGT
translates MAP commands into DRAM command combinations.
The original B-Box transaction’s FVID is sent from the DSP during subcommand execution, where the appropriate subcommand information is accessed to compose the Intel® SMI command frame.
PGT - Page Table - Keeps track of open pages. Translates the read/write commands into DRAM
command combinations (i.e. PRE, RAS, CASrd, CASwr). The generated command combination (e.g.
PRE_RAS_CASrd) is then sent to the Dispatch Queue.
If a command cannot be issued (for example, it targets a bank with a conflicting open page), then the PGT will detect the conflict and place the command in the retryQ for later execution.
DSP - Dispatch Queue - receives DRAM command from PGT and stores request in a read or write
subqueue. In the dispatch queue, the command combinations are broken up into subcommand kinds
that are sequenced in the necessary order required to complete the read/write transaction (i.e. PRE,
RAS, CAS, CASpre). All “ready to execute” subcommands stored within the various DSP queues are
presented simultaneously to the issue logic.
Once the ISS returns the subcommand choice, the oldest DSP entry containing that subcommand kind
(for a particular DIMM) is allowed to execute. During subcommand execution, the DSP sends the
original (B-Box) transaction’s FVID (that was stored in the DSP entry) to the PLD. After subcommand
execution, the DSP’s queue entry state is updated to the next required subcommand kind (based on the
original command combination) to be executed (new state).
ISS - Issue - receives “ready to execute” subcommands from the dispatch queue(s) as a bit vector that
is organized with each bit representing a subcommand kind per DIMM (i.e. RAS for DIMM0, CAS for
DIMM3). Having an overview of all these subcommand kinds enables the ISS to flexibly schedule/
combine subcommands out-of-order. Once a subcommand kind for a particular DIMM is selected from
the issue vector by the ISS, that subcommand choice is driven back to the DSP.
FVC - Fill and Victim Control - drives all the control signals required by the fill datapath and victim
datapath. Additionally, it handles issuing and control of the buffer maintenance commands (i.e. MRG,
F2V, V2V, V2F and F2B). It also contains the logic to respond to the B-Box when commands in the M-
Box have completed.
The DSP subcontrol register contains bits to specify subevents of the DSP_FILL event, breaking it into
write queue/read queue occupancy as well as DSP latency.
lat_cnt_en 6 0 Latency count mode. If 1, the latency for this FVID is counted.
fvid 5:0 0 FVID (Fill Victim Index) of transaction for which scheduler latency is to
be counted. Only fully completed transactions are counted.
The ISS subcontrol register contains bits to specify subevents for the ISS_EV (by Intel SMI frame),
CYCLES_SCHED_MODE (cycles spent per ISS mode) and PLD_DRAM_EV (DRAM commands broken
down by scheduling mode in the ISS) events.
Field   Bits   Access   HW Reset Val   Description
sched_mode_pld_trig   9:7   RW   0   Selects the scheduling mode for which the number of DRAM commands is counted in MA_PLD. Here for implementation reasons. Uses the same encodings as M_MSR_PMU_ISS.sched_mode:
000: trade-off
001: rd priority
010: wr priority
011: adaptive
The MAP subcontrol register contains bits to specify subevents for BCMD_SCHEDQ_OCCUPANCY (by B-
Box command type).
Field   Bits   HW Reset Val   Description
set_patrol_req 11:10 0 Select various patrol requests for MAP PMU Event 2.
anycmd 8 0 Count all B-Box commands to M-Box. Event is counted by PGT Event0.
opn2cls_cnt 4:0 0 Selects FVID (Fill Victim Index) for which overall or scheduler latency is
to be counted.
The THR subcontrol register contains bits to specify subevents for the THR_TT_TRP_UP/DN_EV events
allowing a user to choose select DIMMs and whether the temperature is rising or falling.
Field   Bits   Access   HW Reset Val   Description
trp_pt_dn_cnd 10:9 RW 0 Selects the condition to count for "downwards" trip point
crossings. See Table 2-77, “TRP_PT_{DN,UP}_CND
Encodings” for encodings.
trp_pt_up_cnd 8:7 RW 0 Selects the condition to count for "upwards" trip point
crossings. See Table 2-77, “TRP_PT_{DN,UP}_CND
Encodings” for encodings.
dimm_trp_pt 6:4 RW 0 Selects the DIMM for which to count the trip point crossings.
Unused when all_dimms_trp_pt field is set.
all_dimms_trp_pt 3 RW 0 Select all DIMMs to provide trip point crossings events instead
of a single particular DIMM.
ABOVE_TEMPLO   0b01   Above the low temperature trip point, but below the mid temperature trip point.
The PGT subcontrol register contains bits to specify subevents for CYCLES_PGT_STATE (time spent in
open or closed state) and PGT_PAGE_EV (op2cls or cls2opn transitions) as well as provide bits to
further breakdown throttling events into ranks.
Field   Bits   HW Reset Val   Description
opncls_time 6 0 Selects time counting between open and closed page mode.
0 - CLS - Counts time spent in closed page mode.
1 - OPN - Counts time spent in open page mode.
tt_rnk_cnd 5:2 0 Selects which rank is observed for thermal throttling events.
rnk_cnd 1 0 Selects how thermal throttling events are counted relative to rank.
0 - ALL - Counts thermal throttling events for all ranks.
1 - SGL - Counts thermal throttling events for the single rank selected by
tt_rnk_cnd.
The PLD subcontrol register contains bits to specify subevents for PLD_DRAM_EV (by DRAM CMD type),
PLD_RETRY_EV (to specify FVID).
Field   Bits   HW Reset Val   Description
pld_trig_sel 15:14 0 When 0, corresponding PMU event records number of ZAD parity errors.
When 1 or 2, respective trigger match event is selected.
11110 - ZQCAL_SCMD
11101 - RCR_SCMD
11100 - WCR_SCMD
11000 - NOWPE_SCMD
10111 - SFT_RST_SCMD
10110 - IBD_SCMD
10101 - CKEL_SCMD
10100 - CKEH_SCMD
10011 - POLL_SCMD
10010 - SYNC_SCMD
10001 - PRE_SCMD
10000 - TRKL_SCMD
01111 - GENDRM_SCMD
01110 - EMRS3_SCMD
01101 - EMRS2_SCMD
01100 - NOP_SCMD
01011 - EXSR_SCMD
01010 - ENSR_SCMD
01001 - RFR_SCMD
01000 - EMRS_SCMD
00111 - MRS_SCMD
00110 - CASPRE_WR_SCMD
00101 - CASPRE_RD_SCMD
00100 - CAS_WR_SCMD
00011 - (* undefined count *)
00010 - RAS_SCMD
00001 - PRECALL_SCMD
00000 - ILLEGAL_SCMD
rtry_sngl_fvid 7 0 Controls FVID (Fill Victim Index) selection for which the number of
retries is to be counted.
0 - ALL - All retries are counted, regardless of FVID
1 - FVID - Counts only the retries whose FVIDs match this CSR’s fvid
field.
fvid 6:1 0 The FVID for which the number of retries is to be counted.
The FVC subcontrol register contains bits to break the FVC_EV into events observed by the Fill and
Victim Control logic (i.e. B-Box commands, B-Box responses, various error conditions, etc). The FVC
register can be set up to monitor four independent FVC-subevents simultaneously. However, many of
the FVC-subevents depend on additional FVC fields which detail B-Box response and commands.
Therefore, only one B-Box response or command may be monitored at any one time.
Extension   Encoding   Description
smi_nb_trig 0b111 Select Intel SMI Northbound debug event bits from the Intel SMI
status frames as returned from the Intel 7500 Scalable Memory
Buffers OR PBOX init error (see pbox_init_err field). These bits
are denoted NBDE in the Intel SMI spec status frame description.
An OR of all the bits over all the Intel 7500 Scalable Memory
Buffers is selected here as an event.
mem_ecc_err 0b001 Memory ECC error detected (that is not a link-level CRC error).
Reserved 0b110
corr_resp 0b010 Corrected (after, for example, error trials or just by a retry).
retry_resp 0b001 Retry response. Possibly a correctable error. Retries are generated
until it is decided that error was either correctable or
uncorrectable.
- B-Box commands - reads, writes, fill2victim, merge, etc. Can be conditioned on fvid which allows
determining average latency of M-Box and memory.
- B-Box responses. Incrementing on read command and decrementing on read response allows one
to determine the number of simultaneous reads in the M-Box. A max detector can log the max number
of reads the M-Box received.
- Translated commands: ras_caspre, ras_cas, cas, ras_cas_pre, pre, etc (can be filtered on r/w)
- Open-page to closed-page policy transitions. As well as length of time spent in each policy.
- Thermal throttling
BBOX_CMDS_ALL
• Title: All B-Box Commands
• Category: M-Box Commands Received
• Event Code: 0x1a, Max. Inc/Cyc: 1,
• Definition: Number of new commands sent from the B-Box and received by the M-Box.
CYCLES
• Title: M-Box Cycles
• Category: Cycle Events
• Event Code: 0x1b, Max. Inc/Cyc: 1,
• Definition: Counts M-Box cycles.
CYCLES_DSP_FILL
• Title: Time in DSP_FILL State
• Category: Cycle Events
• Event Code: [21:19]0x00 && [7]0x1, Max. Inc/Cyc: 1,
• Definition: Counts cycles spent in state specified in M_CSR_PMU_DSP register.
CYCLES_MFULL
• Title: M-Box Full Cycles
• Category: Cycle Events
• Event Code: 0x01, Max. Inc/Cyc: 1,
• Definition: Number of cycles spent in the "mfull" state. Also known as the “badly starved” state.
CYCLES_PGT_STATE
• Title: Time in Page Table State
• Category: Cycle Events
• Event Code: [21:19]0x05 && [7]0x1, Max. Inc/Cyc: 1,
• Definition: Counts cycles the page table stays in the state specified in PMU_PGT.opncls_time.
CYCLES_RETRYQ_STARVED
• Title: Time RetryQ Starved
• Category: Cycle Events
• Event Code: [21:19]0x4 && [7]0x1, Max. Inc/Cyc: 1,
• Definition: Counts cycles RetryQ spends in the “starved” state.
CYCLES_RETRYQ_MFULL
• Title: Time RetryQ MFull
• Category: Cycle Events
• Event Code: [21:19]0x3 && [7]0x1, Max. Inc/Cyc: 1,
• Definition: Counts cycles RetryQ spends in the “mfull” state. Also known as the “badly starved” state.
CYCLES_SCHED_MODE
• Title: Time in SCHED_MODE State
• Category: Cycle Events
• Event Code: [21:19]0x01 && [7]0x1, Max. Inc/Cyc: 1,
• Definition: Counts cycles spent in scheduling mode specified in M_CSR_PMU_ISS.sched_mode regis-
ter.
DRAM_CMD
• Title: DRAM Commands
• Category: DRAM Commands
• Event Code: 0x0a, Max. Inc/Cyc: 1,
• Definition: Count PLD Related DRAM Events
• NOTE: In order to measure a non-filtered version of the subevents, it is necessary to make sure the
PLD Dep bits [13,7,0] are also set to 0.
Extension             PLD Bits              ISS [9:7]   Description
PREALL.TRDOFF         [12:8]0x1 && [0]0x1   0x0         Count Preall (no auto-precharge, open page mode) DRAM commands during ‘static trade off’ scheduling mode.
PREALL.RDPRIO         [12:8]0x1 && [0]0x1   0x1         Count Preall (no auto-precharge, open page mode) DRAM commands during ‘static read priority’ scheduling mode.
PREALL.WRPRIO         [12:8]0x1 && [0]0x1   0x2         Count Preall (no auto-precharge, open page mode) DRAM commands during ‘static write priority’ scheduling mode.
PREALL.ADAPTIVE       [12:8]0x1 && [0]0x1   0x3         Count Preall (no auto-precharge, open page mode) DRAM commands during ‘adaptive’ scheduling mode.
RAS.TRDOFF            [12:8]0x2 && [0]0x1   0x0         Count RAS (no auto-precharge, open page mode) DRAM commands during ‘static trade off’ scheduling mode.
RAS.RDPRIO            [12:8]0x2 && [0]0x1   0x1         Count RAS (no auto-precharge, open page mode) DRAM commands during ‘static read priority’ scheduling mode.
RAS.WRPRIO            [12:8]0x2 && [0]0x1   0x2         Count RAS (no auto-precharge, open page mode) DRAM commands during ‘static write priority’ scheduling mode.
RAS.ADAPTIVE          [12:8]0x2 && [0]0x1   0x3         Count RAS (no auto-precharge, open page mode) DRAM commands during ‘adaptive’ scheduling mode.
CAS_RD_OPN            [12:8]0x3                         Count CAS Read (no auto-precharge, open page mode) DRAM commands.
CAS_WR_OPN.TRDOFF     [12:8]0x4 && [0]0x1   0x0         Count CAS Write (no auto-precharge, open page mode) DRAM commands during ‘static trade off’ scheduling mode.
CAS_WR_OPN.RDPRIO     [12:8]0x4 && [0]0x1   0x1         Count CAS Write (no auto-precharge, open page mode) DRAM commands during ‘static read priority’ scheduling mode.
CAS_WR_OPN.WRPRIO     [12:8]0x4 && [0]0x1   0x2         Count CAS Write (no auto-precharge, open page mode) DRAM commands during ‘static write priority’ scheduling mode.
CAS_WR_OPN.ADAPTIVE   [12:8]0x4 && [0]0x1   0x3         Count CAS Write (no auto-precharge, open page mode) DRAM commands during ‘adaptive’ scheduling mode.
CAS_RD_CLS            [12:8]0x5                         Count CAS Read (precharge, closed page mode) DRAM commands.
CAS_RD_CLS.TRDOFF     [12:8]0x5 && [0]0x1   0x0         Count CAS Read (precharge, closed page mode) DRAM commands during ‘static trade off’ scheduling mode.
CAS_RD_CLS.RDPRIO     [12:8]0x5 && [0]0x1   0x1         Count CAS Read (precharge, closed page mode) DRAM commands during ‘static read priority’ scheduling mode.
CAS_RD_CLS.WRPRIO     [12:8]0x5 && [0]0x1   0x2         Count CAS Read (precharge, closed page mode) DRAM commands during ‘static write priority’ scheduling mode.
CAS_RD_CLS.ADAPTIVE   [12:8]0x5 && [0]0x1   0x3         Count CAS Read (precharge, closed page mode) DRAM commands during ‘adaptive’ scheduling mode.
CAS_WR_CLS            [12:8]0x6                         Count CAS Write (precharge, closed page mode) DRAM commands.
CAS_WR_CLS.TRDOFF     [12:8]0x6 && [0]0x1   0x0         Count CAS Write (precharge, closed page mode) DRAM commands during ‘static trade off’ scheduling mode.
CAS_WR_CLS.RDPRIO     [12:8]0x6 && [0]0x1   0x1         Count CAS Write (precharge, closed page mode) DRAM commands during ‘static read priority’ scheduling mode.
CAS_WR_CLS.WRPRIO     [12:8]0x6 && [0]0x1   0x2         Count CAS Write (precharge, closed page mode) DRAM commands during ‘static write priority’ scheduling mode.
CAS_WR_CLS.ADAPTIVE   [12:8]0x6 && [0]0x1   0x3         Count CAS Write (precharge, closed page mode) DRAM commands during ‘adaptive’ scheduling mode.
ALL.TRDOFF            [0]0x0                0x0         Count all DRAM commands during ‘static trade off’ scheduling mode.
ALL.RDPRIO            [0]0x0                0x1         Count all DRAM commands during ‘static read priority’ scheduling mode.
ALL.WRPRIO            [0]0x0                0x2         Count all DRAM commands during ‘static write priority’ scheduling mode.
DSP_FILL
• Title: Dispatch Queue Events
• Category: DSP Events
• Event Code: 0x00, Max. Inc/Cyc: 1,
• Definition: Measure a dispatch queue event.
FVC_EV0
• Title: FVC Event 0
• Category: FVC Events
• Event Code: 0x0d, Max. Inc/Cyc: 1,
• Definition: Measure an FVC related event.
• NOTE: It is possible to program the FVC register such that up to 4 events from the FVC can be inde-
pendently monitored. However, the bcmd_match and resp_match subevents depend on the setting of
additional bits in the FVC register (11:9 and 8:5 respectively). Therefore, only ONE
FVC_EVx.bcmd_match event may be monitored at any given time. The same holds true for
FVC_EVx.resp_match
BBOX_CMDS.READS    0x5   0x0   Read commands to the M-Box from the B-Box (e.g. reads from memory)
BBOX_CMDS.WRITES   0x5   0x1   Write commands from the B-Box to the M-Box (e.g. writes to memory)
BBOX_RSP.COR       0x6   0x2   Counts corrected responses (for example, corrected after error trials or just by a retry)
SMI_NB_TRIG        0x7         Select Intel SMI Northbound debug event bits from Intel SMI status frames as returned from the Intel 7500 Scalable Memory Buffers. Used for debug purposes.
FVC_EV1
• Title: FVC Event 1
• Category: FVC Events
• Event Code: 0x0e, Max. Inc/Cyc: 1,
• Definition: Measure an FVC related event.
• NOTE: It is possible to program the FVC register such that up to 4 events from the FVC can be inde-
pendently monitored. However, the bcmd_match and resp_match subevents depend on the setting of
additional bits in the FVC register (11:9 and 8:5 respectively). Therefore, only ONE
FVC_EVx.bcmd_match event may be monitored at any given time. The same holds true for
FVC_EVx.resp_match
BBOX_CMDS.READS    0x5   0x0   Read commands to the M-Box from the B-Box (e.g. reads from memory)
BBOX_CMDS.WRITES   0x5   0x1   Write commands from the B-Box to the M-Box (e.g. writes to memory)
BBOX_RSP.COR       0x6   0x2   Counts corrected responses (for example, corrected after error trials or just by a retry)
SMI_NB_TRIG        0x7         Select Intel SMI Northbound debug event bits from Intel SMI status frames as returned from the Intel 7500 Scalable Memory Buffers. Used for debug purposes.
FVC_EV2
• Title: FVC Event 2
• Category: FVC Events
• Event Code: 0x0f, Max. Inc/Cyc: 1,
• Definition: Measure an FVC related event.
• NOTE: It is possible to program the FVC register such that up to 4 events from the FVC can be inde-
pendently monitored. However, the bcmd_match and resp_match subevents depend on the setting of
additional bits in the FVC register (11:9 and 8:5 respectively). Therefore, only ONE
FVC_EVx.bcmd_match event may be monitored at any given time. The same holds true for
FVC_EVx.resp_match
BBOX_CMDS.READS    0x5   0x0   Read commands to the M-Box from the B-Box (e.g. reads from memory)
BBOX_CMDS.WRITES   0x5   0x1   Write commands from the B-Box to the M-Box (e.g. writes to memory)
BBOX_RSP.COR       0x6   0x2   Counts corrected responses (for example, corrected after error trials or just by a retry)
SMI_NB_TRIG        0x7         Select Intel SMI Northbound debug event bits from Intel SMI status frames as returned from the Intel 7500 Scalable Memory Buffers. Used for debug purposes.
FVC_EV3
• Title: FVC Event 3
• Category: FVC Events
• Event Code: 0x10, Max. Inc/Cyc: 1,
• Definition: Measure an FVC related event.
• NOTE: It is possible to program the FVC register such that up to 4 events from the FVC can be inde-
pendently monitored. However, the bcmd_match and resp_match subevents depend on the setting of
additional bits in the FVC register (11:9 and 8:5 respectively). Therefore, only ONE
FVC_EVx.bcmd_match event may be monitored at any given time. The same holds true for
FVC_EVx.resp_match
BBOX_CMDS.READS    0x5   0x0   Read commands to the M-Box from the B-Box (e.g. reads from memory)
BBOX_CMDS.WRITES   0x5   0x1   Write commands from the B-Box to the M-Box (e.g. writes to memory)
BBOX_RSP.COR       0x6   0x2   Counts corrected responses (for example, corrected after error trials or just by a retry)
SMI_NB_TRIG        0x7         Select Intel SMI Northbound debug event bits from Intel SMI status frames as returned from the Intel 7500 Scalable Memory Buffers. Used for debug purposes.
FVID_RACE
• Title: FVID Race Detected
• Category: Misc
• Event Code: 0x18, Max. Inc/Cyc: 1,
• Definition: Number of FVID (Fill Victim Index) races detected.
• NOTE: This is a race condition where an IMT entry is recycled prematurely. It should not be observ-
able in hardware.
INFLIGHT_CMDS
• Title: In-flight Commands
• Category: M-Box Commands Received
• Event Code: 0x1d, Max. Inc/Cyc: 1,
• Definition: Number of new memory controller (read and write) commands accepted.
FRM_TYPE
• Title: Frame (Intel SMI) Types
• Category: DRAM Commands
• Event Code: 0x09, Max. Inc/Cyc: 1,
• Definition: Count ISS Related Intel SMI Frame Type Events
BCMD_SCHEDQ_OCCUPANCY
• Title: B-Box Command Scheduler Queue Occupancy
• Category: Cycle Events
• Event Code: [21:19]0x06 && [7]0x1, Max. Inc/Cyc: 1,
• Definition: Counts the queue occupancy of the B-Box Command scheduler per FVID. The FVID (Fill
Victim Index) for the command to be monitored must be programmed in MSR_PMU_MAP.fvid.
V2F 0x3 Victim buffer to Fill buffer transfer (V2F) command from the B-Box to
the M-Box
V2V 0x4 Victim buffer to Victim buffer transfer (V2V) command from the B-Box
to the M-Box
F2V 0x5 Fill buffer to Victim buffer transfer (F2V) command from the B-Box to
the M-Box
SPRWR 0x7 Spare write commands from the B-Box to the M-Box
MA_PAR_ERR
• Title: MA Parity Error
• Category: Misc
• Event Code: 0x0c, Max. Inc/Cyc: 1,
• Definition: Number of MA even parity errors detected.
MULTICAS
• Title: Multi-CAS
• Category: Misc
• Event Code: 0x17, Max. Inc/Cyc: 1,
• Definition: Number of Multi-CAS patrol transactions detected. This is an indication of multiple CAS
commands to the same open page, and should correlate to page hit rate.
PAGE_EMPTY
• Title: Page Table Empty
• Category: Page Table Related
• Event Code: 0x15, Max. Inc/Cyc: 1,
• Definition: Number of transactions accessing a bank with no open page. The page was previously
closed either because it has never been opened, was closed via a CASpre, closed explicitly by the idle
page closing mechanism, or closed by a PREALL in order to do a refresh. This is a command that requires a RAS-CAS to complete.
PAGE_HIT
• Title: Page Table Hit
• Category: Page Table Related
• Event Code: 0x14, Max. Inc/Cyc: 1,
• Definition: Number of page hits detected. This is a command that requires only a CAS to complete.
PAGE_MISS
• Title: Page Table Misses
• Category: Page Table Related
• Event Code: 0x13, Max. Inc/Cyc: 1,
• Definition: Number of page misses detected. This is a command that requires a PRE-RAS-CAS to
complete.
PATROL_TXNS
• Title: Patrol Scrub Transaction
• Category: Misc
• Event Code: 0x11, Max. Inc/Cyc: 1,
• Definition: Number of patrol scrub transactions detected.
PGT_PAGE_EV
• Title: PGT Related Page Table Events
• Category: Page Table Related
• Event Code: 0x16, Max. Inc/Cyc: 1,
• Definition: Counts PGT Related Page Table Events.
RETRIES
• Title: Retry Events
• Category: Retry Events
• Event Code: 0x0b, Max. Inc/Cyc: 1,
• Definition: Count PLD Related Retry Events
REFRESH
• Title: Refresh Commands
• Category: DRAM Commands
• Event Code: 0x06, Max. Inc/Cyc: 1,
• Definition: Advance counter when a refresh command is detected.
REFRESH_CONFLICT
• Title: Refresh Conflict
• Category: DRAM Commands
• Event Code: 0x07, Max. Inc/Cyc: 1,
• Definition: Number of refresh conflicts detected. A refresh conflict is a conflict between a read/write
transaction and a refresh command to the same rank.
RETRY_MFULL
• Title: Retry MFull
• Category: Retry Events
• Event Code: 0x02, Max. Inc/Cyc: 1,
• Definition: Number of retries detected while in the "mfull" state. Also known as the “badly starved”
state.
RETRY_STARVE
• Title: Retry Starve
• Category: Retry Events
• Event Code: 0x03, Max. Inc/Cyc: 1,
• Definition: Number of retries detected while in the "starved" state.
SCHED_INFLIGHT_CMDS
• Title: Scheduler In-flight Commands
• Category: M-Box Commands Received
• Event Code: 0x1c, Max. Inc/Cyc: 1,
• Definition: Number of new scheduler commands that were accepted.
SCHED_MODE_CHANGES
• Title: Scheduling Mode Changes
• Category: DRAM Commands
• Event Code: 0x08, Max. Inc/Cyc: 1,
• Definition: Number of ISS scheduling mode transitions detected.
THERM_TRP_DN
• Title: DIMM ‘Dn’ Thermal Trip Points Crossed
• Category: Thermal Throttle
• Event Code: 0x05, Max. Inc/Cyc: 1,
• Definition: Counts when a specified thermal trip point is crossed in the “down” direction.
Extension   THR Bits [10:9],[3]   Description
ALL.GT_MID_RISE 0x3,0x1 Advance the counter when the above mid temp thermal
trip point (rising) is crossed in the "down" direction for any
DIMM
ALL.GT_MID_FALL 0x2,0x1 Advance the counter when the above mid temp thermal
trip point (falling) is crossed in the "down" direction for any
DIMM
ALL.GT_LO 0x1,0x1 Advance the counter when the above low temp, but below
mid temp thermal trip point is crossed in the "down"
direction for any DIMM.
ALL.LT_LO 0x0,0x1 Advance the counter when the below low temp thermal trip
point is crossed in the "down" direction for any DIMM
DIMM{n}.GT_MID_RISE   0x3,0x0   Advance the counter when the above mid temp thermal trip point (rising) is crossed in the "down" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
DIMM{n}.GT_MID_FALL   0x2,0x0   Advance the counter when the above mid temp thermal trip point (falling) is crossed in the "down" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
DIMM{n}.GT_LO   0x1,0x0   Advance the counter when the above low temp, but below mid temp thermal trip point is crossed in the "down" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
DIMM{n}.LT_LO   0x0,0x0   Advance the counter when the below low temp thermal trip point is crossed in the "down" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
THERM_TRP_UP
• Title: DIMM ‘Up’ Thermal Trip Points Crossed
• Category: Thermal Throttle
• Event Code: 0x04, Max. Inc/Cyc: 1,
• Definition: Counts when a specified thermal trip point is crossed in the “up” direction.
Extension   THR Bits [8:7],[3]   Description
ALL.GT_MID_RISE 0x3,0x1 Advance the counter when the above mid temp thermal
trip point (rising) is crossed in the "up" direction for any
DIMM
ALL.GT_MID_FALL 0x2,0x1 Advance the counter when the above mid temp thermal
trip point (falling) is crossed in the "up" direction for any
DIMM
ALL.GT_LO 0x1,0x1 Advance the counter when the above low temp, but below
mid temp thermal trip point is crossed in the "up" direction
for any DIMM.
ALL.LT_LO 0x0,0x1 Advance the counter when the below low temp thermal trip
point is crossed in the "up" direction for any DIMM
DIMM{n}.GT_MID_RISE   0x3,0x0   Advance the counter when the above mid temp thermal trip point (rising) is crossed in the "up" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
DIMM{n}.GT_MID_FALL   0x2,0x0   Advance the counter when the above mid temp thermal trip point (falling) is crossed in the "up" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
DIMM{n}.GT_LO   0x1,0x0   Advance the counter when the above low temp, but below mid temp thermal trip point is crossed in the "up" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
DIMM{n}.LT_LO   0x0,0x0   Advance the counter when the below low temp thermal trip point is crossed in the "up" direction for DIMM #n.
NOTE: THR Bits [6:4] must be programmed with the DIMM #.
TRANS_CMDS
• Title: Translated Commands
• Category: Dispatch Queue
• Event Code: 0x12, Max. Inc/Cyc: 1,
• Definition: Counts read/write commands entered into the Dispatch Queue.
TT_CMD_CONFLICT
• Title: Thermal Throttling Command Conflicts
• Category: Thermal Throttle
• Event Code: 0x19, Max. Inc/Cyc: 1,
• Definition: Count command conflicts due to thermal throttling.
The W-Box also provides a 48-bit wide fixed counter that increments at the uncore clock frequency.
For information on how to set up a monitoring session, refer to Section 2.1, “Global Performance Monitoring Control”.
- The umask enables scoping events to individual cores (one-hot encoding: bit 0 in the umask for Core 0, bit 1 for Core 1, etc).
HW can also be configured (by setting the corresponding .pmi_en to 1) to send a PMI to the U-Box when an overflow is detected. The U-Box may be configured to freeze all uncore counting and/or send a PMI to selected cores when it receives this signal.
Once a freeze has occurred, in order to see a new freeze, the overflow field responsible for the freeze must be cleared by setting the corresponding bit in W_MSR_PMON_GLOBAL_OVF_CTL.clr_ov/clr_ov_fixed. Assuming all the counters have been locally enabled (.en bit in the control registers meant to monitor events) and the overflow bit(s) have been cleared, the W-Box is prepared for a new sample interval. Once the global controls have been re-enabled (Section 2.1.4, “Enabling a New Sample Interval from Frozen Counters”), counting will resume.
The _GLOBAL_CTL register contains the bits used to enable monitoring. It is necessary to set the
.ctr_en/fixed_en bit to 1 before the corresponding data register can collect events.
If an overflow is detected from one of the W-Box PMON registers, the corresponding bit in the
_GLOBAL_STATUS.ov/ov_fixed field will be set. To reset the overflow bits set in the
_GLOBAL_STATUS.ov/ov_fixed field, a user must set the corresponding bits in the
_GLOBAL_OVF_CTL.clr_ov/clr_ov_fixed field before beginning a new sample interval.
ctr_en 3:0 0 Must be set to enable each WBOX counter (bit 0 to enable ctr0, etc)
NOTE: U-Box enable and per counter enable must also be set to fully
enable the counter.
ov_fixed 31 0 If an overflow is detected from the WBOX PMON fixed counter, this bit
will be set.
Field          Bits   HW Reset Val   Description
clr_ov_fixed   31     0              Writing ‘1’ to a bit in this field causes the corresponding bit in the ‘Overflow PerfMon Counter’ field in the W_MSR_PMON_GLOBAL_STATUS register to be cleared to 0.
clr_ov         5:0    0              Writing ‘1’ to a bit in this field causes the corresponding bit in the ‘Overflow PerfMon Counter’ field in the W_MSR_PMON_GLOBAL_STATUS register to be cleared to 0.
- .threshold - since W-Box counters can increment by a value greater than 1 per cycle, a threshold can be applied. If the .threshold is set to a non-zero value, that value is compared against the incoming count for that event in each cycle. If the incoming count is >= the threshold value, then the event count captured in the data register will be incremented by 1. For example, with a threshold of 4, an event that can increment by more than 1 per cycle counts the number of cycles in which its increment was at least 4. (Not present in fixed counter)
- .invert - Changes the .threshold test condition to ‘<’. (Not present in fixed counter)
- .edge_detect - Rather than accumulating the raw count each cycle (for events that can increment by 1 per cycle), the register can capture transitions from no event to an event incoming. (Not present in fixed counter)
Field         Bits  HW Reset Val  Description
invert        23    0             Invert threshold comparison. When '0', the comparison
                                  is threshold >= event; when '1', the comparison is
                                  threshold < event.
en            22    0             Counter enable.
pmi_en        20    0             PMI Enable. If this bit is set, a PMI exception is
                                  sent to the U-Box when the corresponding counter
                                  overflows.
edge_detect   18    0             Edge Detect. When this bit is set, a 0-to-1
                                  transition of a one-bit event input causes the
                                  counter to increment. When the bit is 0, the counter
                                  increments for as long as the event is asserted.
umask         15:8  0             In the W-Box, this field is used to enable core-scope
                                  events per core. Bit 0 masks Core 0, bit 1 masks
                                  Core 1, etc.

The fixed counter's control register carries only the enable fields:

Field         Bits  HW Reset Val  Description
pmi_en        1     0             PMI Enable. If this bit is set, a PMI exception is
                                  sent to the U-Box when the fixed counter overflows.
en            0     0             Counter enable.
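Putting these fields together, the following sketch composes a counter-control value in software.
Bit positions are taken from the table above; the event-select placement in bits 7:0 and the
threshold placement in bits 31:24 are assumptions carried over from the conventions of the other
uncore boxes in this guide:

    #include <stdint.h>

    /* Compose a W-Box PMON counter-control value from the fields in the
       table above. The ev_sel (7:0) and threshold (31:24) positions are
       assumed, not confirmed by this table fragment. */
    static uint64_t wbox_pmon_ctl(uint8_t ev_sel, uint8_t umask,
                                  uint8_t threshold, int edge, int pmi, int inv)
    {
        uint64_t v = 0;
        v |= (uint64_t)ev_sel;                /* event code, bits 7:0 (assumed)  */
        v |= (uint64_t)umask      << 8;       /* per-core one-hot mask, 15:8     */
        v |= (uint64_t)(edge & 1) << 18;      /* edge_detect                     */
        v |= (uint64_t)(pmi  & 1) << 20;      /* pmi_en                          */
        v |= 1ull                 << 22;      /* en: counter enable              */
        v |= (uint64_t)(inv  & 1) << 23;      /* invert threshold comparison     */
        v |= (uint64_t)threshold  << 24;      /* threshold, bits 31:24 (assumed) */
        return v;
    }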
The W-Box performance monitor data registers are 48 bits wide. A counter overflow occurs when a carry
out of bit 47 is detected. Software can force all uncore counting to freeze after N events by
preloading a monitor with a count value of (2^48 - 1) - N and setting the control register to send a
PMI to the U-Box. Upon receipt of the PMI, the U-Box will disable counting (Section 2.1.1.1,
“Freezing on Counter Overflow”). During the interval of time between overflow and global disable, the
counter value will wrap and continue to collect events.
In this way, software can capture the precise number of events that occurred between the time uncore
counting was enabled and when it was disabled (or ‘frozen’) with minimal skew.
If accessible, software can continuously read the data registers without disabling event collection.
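For instance, to freeze after N events the preload value can be computed as below (a sketch; the
erratum note that follows further restricts which preload values are safe):

    #include <stdint.h>

    /* Preload value that makes a 48-bit PMON counter overflow -- and, with
       pmi_en set, freeze uncore counting -- after n more events:
       (2^48 - 1) - n. */
    static uint64_t wbox_preload(uint64_t n)
    {
        const uint64_t MAX48 = (1ull << 48) - 1;
        return MAX48 - n;
    }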
Note: Due to an erratum in the Intel Xeon Processor 7500 Series, SW must consider two special
cases:
• If SW reads a counter whose value ends in 0x000000 or 0x000001, SW should subtract 0x1000000
to get the correct value.
• SW should not set up a sample interval whose value ends in either 0xfffffe or 0xffffff.
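Both special cases lend themselves to simple software workarounds. The sketch below encodes them
directly from the note above; the function names are illustrative only:

    #include <stdint.h>

    /* Workaround 1: correct a raw counter read whose low 24 bits hit the
       erratum pattern (value ends in 0x000000 or 0x000001). */
    static uint64_t wbox_fixup_read(uint64_t raw)
    {
        uint64_t low = raw & 0xFFFFFF;
        if (low == 0x000000 || low == 0x000001)
            return raw - 0x1000000;
        return raw;
    }

    /* Workaround 2: reject sample-interval values ending in 0xfffffe or
       0xffffff before programming them. */
    static int wbox_interval_ok(uint64_t preload)
    {
        uint64_t low = preload & 0xFFFFFF;
        return low != 0xFFFFFE && low != 0xFFFFFF;
    }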
Beyond that, the W-Box provides a set of events that indicate when, and under what circumstances,
the W-Box throttled the chip due to power constraints.
C_CYCLES_TURBO
• Title: Core in C0 at Turbo
• Category: W-Box Events
• Event Code: 0x04, Max. Inc/Cyc: 1,
• Definition: Selected core is in C0 and operating at a ‘Turbo’ operating point.
C_C0_THROTTLE_DIE
• Title: Core Throttled in C0
• Category: W-Box Events
• Event Code: 0x01, Max. Inc/Cyc: 1,
• Definition: Selected core is in C0 and is throttling.
C_C0_THROTTLE_PROCHOT
• Title: Core Throttled in C0 due to FORCEPR
• Category: W-Box Events
• Event Code: 0x03, Max. Inc/Cyc: 1,
• Definition: Selected core is in C0 and is throttling due to FORCEPR assertion.
C_THROTTLE_TMP
• Title: Core Throttled due to Temp
• Category: W-Box Events
• Event Code: 0x00, Max. Inc/Cyc: 1,
• Definition: Temperature of the selected core is at or above the throttle temperature.
PROCHOT
• Title: Prochot
• Category: W-Box Events
• Event Code: 0x02, Max. Inc/Cyc: 1,
• Definition: Package is asserting the PROCHOT output.
RATIO_CHANGE_ABORT
• Title: Ratio Change Abort
• Category: W-Box Events
• Event Code: 0x08, Max. Inc/Cyc: 1,
• Definition: Selected core aborted a ratio change request.
TM1_ON
• Title: TM1 Throttling On
• Category: W-Box Events
• Event Code: 0x07, Max. Inc/Cyc: 1,
• Definition: Selected core is in C0 and TM1 throttling is occurring.
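Since PROCHOT is a level event (it increments every cycle the package asserts the PROCHOT output),
setting .edge_detect turns the count into a count of distinct assertions. A sketch of the
corresponding control value, reusing the field layout from the control-register table earlier in
this section (the ev_sel placement in bits 7:0 is an assumption):

    #include <stdint.h>

    /* Count distinct PROCHOT assertions rather than asserted cycles. */
    static uint64_t prochot_assertions_ctl(void)
    {
        uint64_t v = 0;
        v |= 0x02;         /* ev_sel = PROCHOT (bits 7:0, assumed)  */
        v |= 1ull << 18;   /* edge_detect: count 0->1 transitions   */
        v |= 1ull << 22;   /* en: enable the counter                */
        return v;          /* package-scope event, so no per-core umask */
    }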
Intel QuickPath Interconnect opcodes, by message class:

Name  Opc  MC  Gen By?  Desc
AckCnfltWbI 1001 HOM0 In addition to signaling AckCnflt, the caching agent has
also written the dirty cache line data plus any partial
write data back to memory in a WBiData[Ptl] message
and transitioned the cache line state to I.
Cmp_FwdCode 1010 NDR Complete request, forward the line in F (or S) state to the
requestor specified, invalidate local copy or leave it in S
state.
Cmp_FwdInvOwn 1011 NDR Complete request, forward the line in E or M state to the
requestor specified, invalidate local copy
EvctCln 0110 HOM0 Clean cache line eviction notification to home agent.
FERR 0011 NCS Legacy floating point error indication from CPU to legacy
bridge
Gnt_Cmp 0000 NDR Signal completion and Grant E state ownership without
data to an InvItoE or ‘null data’ to an InvXtoI
Gnt_FrcAckCnflt 0001 NDR Signal FrcAckCnflt and Grant E state ownership without
data to an InvItoE or ‘null data’ to an InvXtoI
IntPrioUpd 1011 NCB Ui, Uo Interrupt priority update message to source interrupt
agents.
InvXtoI 0101 HOM0 Flush a cache line from all caches (that is, downgrade all
clean copies to I and cause any dirty copy to be written
back to memory).
NcRd 0000 NCS Ui Read from non-coherent memory mapped I/O space
NcRdPtl 0100 NCS Ui Partial read from non-coherent memory mapped I/O
space
NcWrPtl 1100 NCB Ui Partial write to non-coherent memory mapped I/O space
NonSnpWrDataPtl 1011 DRS Partial (byte-masked) non cache coherent write data
RdCode 0001 HOM0 Read cache line in F (or S, if the F state is not supported)
RdCur 0000 HOM0 Request a cache line in I. Typically issued by I/O proxy
entities, RdCur is used to obtain a coherent snapshot of
an uncached cache line.
RdData 0010 HOM0 Read cache line in either E or F (or S, if the F state is
not supported). The choice between F (or S) and E is
determined by whether or not a peer caching agent has the
cache line in S state.
RspCnflt 0100 HOM1 Peer is left with line in I or S state, and the peer has a
conflicting outstanding request.
RspCnfltOwn 0110 HOM1 Peer has a buried M copy for this line with an outstanding
conflicting request.
RspFwd 1000 HOM1 Peer has sent data to requestor with no change in cache
state
RspFwdI 1001 HOM1 Peer has sent data to requestor and is left with line in I
state
RspFwdIWb 1011 HOM1 Peer has sent data to requestor and a WbIData to the
home, and is left with line in I state
RspFwdS 1010 HOM1 Peer has sent data to requestor and is left with line in S
state
RspFwdSWb 1100 HOM1 Peer has sent data to requestor and a WbSData to the
home, and is left with line in S state
RspIWb 1101 HOM1 Peer has evicted the data with an in-flight WbIData[Ptl]
message to the home and has not sent any message to
the requestor.
RspSWb 1110 HOM1 Peer has sent a WbSData message to the home, has not
sent any message to the requestor and is left with line in
S-state
WbMtoI 1100 HOM0 Write a cache line in M state back to memory and
transition its state to I.
WbMtoE 1101 HOM0 Write a cache line in M state back to memory and
transition its state to E.
WbMtoS 1110 HOM0 Write a cache line in M state back to memory and
transition its state to S.