First, we select seven types of contextual information of different granularity and implement a delta prefetcher for each. These delta prefetchers are as follows: global, based on the access history of the entire application; offset, based on the access histories of pages with the same first-accessed page offset; PC, based on the access history associated with each load instruction; page, based on the access history of each individual memory page; PC+offset, based on the access histories of the same PC with the same first-accessed page offset; page+offset, based on the access histories of the same page with the same first-accessed page offset; and PC+page, based on the access histories of the same PC accessing the same memory page. The History Table and Delta Table of each prefetcher are indexed by a hash of its program context. For example, the History Table and Delta Table of PC+offset are both indexed by a hash of the PC and the first-accessed page offset.
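To make the indexing concrete, the following Python sketch parameterizes one toy delta prefetcher by the seven context keys listed above. Only the idea that both tables are indexed by the program context comes from the text; everything else is an assumption for illustration: 4 KiB pages and 64 B lines (deltas counted in lines), unbounded Python dicts whose built-in key hashing stands in for the hardware index hash (so table capacity and aliasing are not modeled), a per-delta occurrence count as confidence, and prediction of the single most frequent delta. The first-accessed offset of a page is recorded once and never reset. The names DeltaPrefetcher, CONTEXTS, PAGE_SIZE, and LINE_SIZE are hypothetical.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 4096                        # assumption: 4 KiB pages
LINE_SIZE = 64                          # assumption: 64 B lines; deltas in lines
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE

# Hypothetical context keys for the seven variants. `first` is the
# first-accessed line offset of the page being accessed.
CONTEXTS = {
    "global":      lambda pc, page, first: (),
    "offset":      lambda pc, page, first: (first,),
    "PC":          lambda pc, page, first: (pc,),
    "page":        lambda pc, page, first: (page,),
    "PC+offset":   lambda pc, page, first: (pc, first),
    "page+offset": lambda pc, page, first: (page, first),
    "PC+page":     lambda pc, page, first: (pc, page),
}


class DeltaPrefetcher:
    """Toy delta prefetcher whose tables are both indexed by one context key."""

    def __init__(self, context):
        self.key_fn = CONTEXTS[context]
        self.first_offset = {}               # page -> first-accessed line offset
        self.history = {}                    # History Table: key -> last line
        self.deltas = defaultdict(Counter)   # Delta Table: key -> delta counts

    def access(self, pc, addr):
        """Train on one demand access; return a prefetch address or None."""
        line, page = addr // LINE_SIZE, addr // PAGE_SIZE
        first = self.first_offset.setdefault(page, line % LINES_PER_PAGE)
        key = self.key_fn(pc, page, first)   # dict hashing stands in for the index hash
        last = self.history.get(key)
        if last is not None and line != last:  # ignore repeats of the same line
            self.deltas[key][line - last] += 1
        self.history[key] = line
        if not self.deltas[key]:
            return None                      # no delta learned for this context yet
        best, _ = self.deltas[key].most_common(1)[0]
        return (line + best) * LINE_SIZE     # predicted prefetch address
```

For example, DeltaPrefetcher("PC") fed addresses 0, 64, and 128 from one PC returns None, 128, and 192. After that training, a PC-keyed instance predicts immediately on a fresh page touched by the same PC, whereas a page-keyed instance has no history for that page yet.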
Figure 2 illustrates the L1D coverage and L1D accuracy of the delta prefetchers based on each type of contextual information. We compare and analyze them in order of their contextual information, from coarse-grained to fine-grained. (1) Among the delta prefetchers based on coarse-grained information, global exhibits higher L1D coverage and accuracy than offset. This suggests that offset may not distinguish different patterns as accurately as global.
(2) Comparing delta prefetchers based on coarse-grained information with those based on fine-grained information, the L1D accuracy of global is approximately 17% to 18% lower than that of PC and page. This indicates that delta prefetchers based on fine-grained contextual information generally recognize useful patterns more accurately when different patterns are interleaved. Additionally, PC provides 10% higher L1D coverage than global, because it effectively distinguishes between different patterns, leading to more accurate predictions. Comparing delta prefetchers based on different fine-grained contextual information, PC provides 14% higher L1D coverage and slightly lower L1D accuracy than page. This is primarily due to PC’s ability to apply the deltas learned on one memory page to new memory pages accessed by the same PC. Furthermore, PC can learn access patterns involving cross-page deltas.
(3) The delta prefetchers PC+offset, page+offset, and PC+page, which are based on even finer-grained contextual information, do not exhibit higher L1D coverage than PC and page, which rely on fine-grained contextual information. Although PC+offset and PC+page achieve up to 6% higher L1D accuracy than PC, their L1D coverage is 12% to 23% lower. Similarly, compared with page, page+offset provides similar L1D accuracy but lower L1D coverage. This discrepancy arises because the first-accessed offset of a page may differ across visits to that page, while the access patterns within the page may be either similar or dissimilar. If they are similar, page+offset disperses the confidence of the same deltas across several entries, which can cause it to miss opportunities to issue effective prefetch requests. These findings suggest that PC and page effectively identify regular patterns, whereas PC+offset, page+offset, and PC+page, relying on finer-grained contextual information, may overlook some regular access patterns.
Among delta prefetchers that use a single type of contextual information, PC achieves the highest L1D coverage, at 58%, together with a relatively high L1D accuracy of 84%. However, delta prefetchers based on a single type of contextual information often struggle to adapt to the diverse access patterns of different benchmark programs. For instance, PC significantly outperforms page on benchmark programs with fewer load instructions whose PCs exhibit regular patterns, such as 605.mcf_s and 649.fotonik3d_s. Conversely, page delivers a notably larger performance improvement than PC on benchmark programs with a large number of interleaved load instructions, such as 607.cactuBSSN_s. Therefore, we conduct experiments for prefetchers based on two types of contextual information.
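The interleaving effect (global versus PC) and the page-boundary effect (PC versus page) discussed above can be reproduced qualitatively on a toy trace. This sketch is not the simulator behind Figure 2, and its definitions are simplified assumptions: the History Table keeps the last line per context, the Delta Table counts occurrences of each delta, the prefetcher predicts the most frequent delta, accuracy is useful prefetches over issued prefetches, and coverage is useful prefetches over all demand accesses (every access is treated as a would-be miss; no cache is modeled). The function simulate, the trace, and the 64-line page size are hypothetical.

```python
from collections import Counter, defaultdict

LINES_PER_PAGE = 64  # assumption: 4 KiB pages of 64 B lines


def simulate(key_fn, trace):
    """Replay (pc, line) demand accesses through a toy delta prefetcher.

    key_fn maps (pc, line) to the context that indexes both tables.
    Returns (accuracy, coverage) under the simplified definitions above.
    """
    last = {}                        # History Table: context -> last line
    deltas = defaultdict(Counter)    # Delta Table: context -> delta counts
    pending = set()                  # prefetched lines not yet demanded
    issued = useful = 0
    for pc, line in trace:
        if line in pending:          # demand access covered by a prefetch
            pending.discard(line)
            useful += 1
        ctx = key_fn(pc, line)
        if ctx in last:
            deltas[ctx][line - last[ctx]] += 1
        last[ctx] = line
        if deltas[ctx]:
            best = deltas[ctx].most_common(1)[0][0]
            target = line + best
            if target not in pending:
                pending.add(target)
                issued += 1
    accuracy = useful / issued if issued else 0.0
    return accuracy, useful / len(trace)


# Two interleaved loads: PC 0xA streams with delta +1, PC 0xB with delta +3.
trace = []
for i in range(16):
    trace.append((0xA, i))
    trace.append((0xB, 1000 + 3 * i))

contexts = {
    "global": lambda pc, line: None,
    "PC": lambda pc, line: pc,
    "page": lambda pc, line: line // LINES_PER_PAGE,
}
for name, key_fn in contexts.items():
    acc, cov = simulate(key_fn, trace)
    print(f"{name:6s} accuracy={acc:.3f} coverage={cov:.3f}")
# global accuracy=0.000 coverage=0.000
# PC     accuracy=0.933 coverage=0.875
# page   accuracy=0.931 coverage=0.844
```

In this trace, global never sees a repeating delta because the two streams interleave, while PC and page separate them. The page-keyed variant additionally loses one prefetch where the 0xB stream crosses into a new page, which the PC-keyed variant avoids. The toy only mirrors the direction of these trends; it says nothing about the magnitudes reported in Figure 2.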