Memory Unit
Speed, Size, and Cost
• An ideal memory would be fast, large, and inexpensive.
• A very fast memory can be implemented using SRAM chips, but these are expensive.
• The alternative is to use dynamic RAM chips, which are much less expensive, but such memories are significantly slower.
• Secondary storage, mainly magnetic disks, is used to implement large memory spaces.
• Very large disks are available at a reasonable price, and they are used extensively in computer systems, even though they are much slower than semiconductor memory units.
• A huge amount of cost-effective storage can be provided by magnetic disks, and a large, yet affordable, main memory can be built with dynamic RAM technology.
• All of these different types of memory units are employed effectively in a computer, so the entire computer memory can be viewed as a hierarchy.
• The fastest access is to data held in processor registers, so the registers sit at the top of the memory hierarchy in terms of speed of access.
• At the next level of the hierarchy is a relatively small amount of memory that can be implemented directly on the processor chip.
• This memory, called a processor cache, holds copies of instructions and data stored in a much larger memory that is provided externally.
• There are often two levels of caches.
• A primary cache is always located on the processor chip. This cache is small and is referred to as the level 1 (L1) cache.
• A larger, secondary cache is placed between the primary cache and the rest of the memory. It is referred to as the level 2 (L2) cache and is usually implemented using SRAM chips.
• Including a primary cache on the processor chip and using a larger, off-chip secondary cache is currently the most common way of designing computers, although it is possible to have no cache on the processor chip at all, or to have both the L1 and L2 caches on the processor chip.
• The next level in the hierarchy is the main memory. This large memory is implemented using dynamic memory components, typically in the form of SIMMs, DIMMs, or RIMMs.
• The main memory is much larger but significantly slower than the cache memory. In a typical computer, the access time for the main memory is about ten times longer than the access time for the L1 cache.
• Disk devices provide a huge amount of inexpensive storage, but they are very slow compared to the semiconductor devices used to implement the main memory. The short calculation below shows why this layered arrangement still yields fast average access.
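To make the payoff of the hierarchy concrete, the following C sketch computes the standard average memory access time for a two-level cache. The hit ratios and the L2 access time are illustrative assumptions, not figures from the notes; the only number taken from the text is that main memory is roughly ten times slower than the L1 cache.

#include <stdio.h>

/* Average memory access time for a two-level cache hierarchy:
 *   AMAT = tL1 + missL1 * (tL2 + missL2 * tMem)
 * The hit ratios and the L2 latency below are assumed values; only
 * the 10x L1-to-main-memory ratio comes from the text. */
int main(void) {
    double t_l1  = 1.0;   /* L1 access time, in cycles (assumed)      */
    double t_l2  = 4.0;   /* L2 access time, in cycles (assumed)      */
    double t_mem = 10.0;  /* main memory, about ten times the L1 time */
    double h_l1  = 0.95;  /* assumed L1 hit ratio                     */
    double h_l2  = 0.90;  /* assumed L2 hit ratio                     */

    double amat = t_l1 + (1.0 - h_l1) * (t_l2 + (1.0 - h_l2) * t_mem);
    printf("average access time = %.2f cycles\n", amat);  /* 1.25 */
    return 0;
}

Even with main memory ten times slower than L1, the average access under these assumed hit ratios stays close to the L1 time, which is the point of the hierarchy.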
Cache Memories

• The speed of the main memory is very low in comparison with the speed of modern processors.
• For good performance, the processor cannot spend much of its time waiting to access instructions and data in the main memory.
• Hence, it is important to devise a scheme that reduces the time needed to access the necessary information.
• Since the speed of the main memory unit is limited by electronic and packaging constraints, the solution must be sought in a different architectural arrangement.
• An efficient solution is to use a fast cache memory, which essentially makes the main memory appear to the processor to be faster than it really is.
• The effectiveness of the cache mechanism is based on a property of computer programs called locality of reference.
• Analysis of programs shows that most of their execution time is spent in routines in which many instructions are executed repeatedly.
• These instructions may constitute a simple loop, nested loops, or a few procedures that call each other repeatedly during some time period, while the remainder of the program is accessed relatively infrequently.
• This behavior is referred to as locality of reference, and it manifests itself in two ways: temporal and spatial.
• The temporal aspect means that a recently executed instruction is likely to be executed again very soon.
• The spatial aspect means that instructions in close proximity to a recently executed instruction are also likely to be executed soon.
• If the active segments of a program can be placed in a fast cache memory, then the total execution time can be reduced significantly. The code sketch below shows both aspects in an ordinary loop.
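Both forms of locality are visible in everyday code. The following C sketch is illustrative (it is not from the notes): both functions compute the same sum, but the first walks the array in the order it is laid out in memory and so benefits from spatial locality, while the second strides across rows and wastes most of each fetched cache block. In both cases the handful of loop instructions executes over and over, which is temporal locality.

#include <stdio.h>

#define N 1024

static double a[N][N];  /* C stores this array row by row */

/* Spatial locality: consecutive iterations touch adjacent addresses,
 * so every word of each cache block fetched from main memory is used. */
double sum_row_major(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Poor spatial locality: consecutive iterations jump N elements apart,
 * so each fetched block contributes only one useful word before it is
 * likely evicted. The loop code itself still enjoys temporal locality:
 * the same few instructions execute N*N times. */
double sum_column_major(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    printf("%f %f\n", sum_row_major(), sum_column_major());
    return 0;
}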
• The memory control circuitry is designed to take advantage of the property of locality of reference.
• The temporal aspect of locality suggests that whenever an information item is first needed, it should be brought into the cache, where it will hopefully remain until it is needed again.
• The spatial aspect suggests that instead of fetching just one item from the main memory into the cache, it is useful to fetch several items that reside at adjacent addresses as well.
• The term block is used to refer to a set of contiguous address locations of some size. Another term that is often used to refer to a cache block is cache line.
• Consider the simple arrangement in Fig. 5.1.4. When a Read request is received from the processor, the contents of a block of memory words containing the location specified are transferred into the cache one word at a time.
• Subsequently, when the program references any of the locations in this block, the desired contents are read directly from the cache.
• Usually, the cache memory can store a reasonable number of blocks at any given time, but this number is small compared to the total number of blocks in the main memory.
• The correspondence between the main memory blocks and those in the cache is specified by a mapping function.
• When the cache is full and a memory word that is not in the cache is referenced, the cache control hardware must decide which block should be removed to create space for the new block that contains the referenced word. The collection of rules for making this decision constitutes the replacement algorithm.
• The processor does not need to know explicitly about the existence of the cache. It simply issues Read and Write requests using addresses that refer to locations in the memory.
• The cache control circuitry determines whether the requested word currently exists in the cache. If it does, the Read or Write operation is performed on the appropriate cache location, and a read or write hit is said to have occurred.
• In a Read operation, the main memory is not involved. For a Write operation, the system can proceed in two ways.
• In the first technique, called the write-through protocol, the cache location and the main memory location are updated simultaneously.
• The second technique is to update only the cache location and to mark it as updated with an associated flag bit, often called the dirty or modified bit.
• The main memory location of the word is updated later, when the block containing this marked word is removed from the cache to make room for a new block. This technique is known as the write-back, or copy-back, protocol.
• When the addressed word in a Read operation is not in the cache, a read miss occurs.
• The block of words that contains the requested word is copied from the main memory into the cache. After the entire block is loaded into the cache, the particular word requested is forwarded to the processor.
• Alternatively, this word may be sent to the processor as soon as it is read from the main memory. The latter approach, which is called load-through, or early restart, reduces the processor's waiting period somewhat, but at the expense of more complex circuitry.
• During a Write operation, if the addressed word is not in the cache, a write miss occurs.
• If the write-through protocol is used, the information is written directly into the main memory.
• In the case of the write-back protocol, the block containing the addressed word is first brought into the cache, and then the desired word in the cache is overwritten with the new information. The sketch below ties these mechanisms together in a small software model.
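The mapping function, hit and miss detection, the dirty bit, and the write-back handling of misses can be combined in a short software model. The following C sketch is a minimal illustration, not a real controller design: it assumes a direct-mapped cache (one common choice of mapping function, in which each memory block can live in exactly one cache line, making the replacement decision trivial) with 16 lines of 4 words each, and an access pattern invented for the example.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Minimal direct-mapped cache model. The sizes are illustrative
 * assumptions, not values from the notes. */
#define NUM_LINES       16
#define WORDS_PER_BLOCK 4

typedef struct {
    bool     valid;   /* line currently holds a block           */
    bool     dirty;   /* modified bit for the write-back policy */
    uint32_t tag;     /* identifies which memory block is held  */
} Line;

static Line cache[NUM_LINES];
static unsigned hits, misses, writebacks;

/* One access; is_write selects Read or Write. Write-back protocol:
 * a write updates only the cache and sets the dirty bit; main memory
 * is updated only when a dirty block is evicted. On any miss the
 * whole block is brought in first, then a write overwrites the word. */
void cache_access(uint32_t addr, bool is_write) {
    uint32_t block = addr / WORDS_PER_BLOCK;
    uint32_t index = block % NUM_LINES;   /* the mapping function */
    uint32_t tag   = block / NUM_LINES;
    Line *l = &cache[index];

    if (l->valid && l->tag == tag) {      /* read or write hit */
        hits++;
    } else {                              /* miss: fetch the block */
        misses++;
        if (l->valid && l->dirty)
            writebacks++;                 /* copy evicted block back */
        l->valid = true;
        l->dirty = false;
        l->tag   = tag;
    }
    if (is_write)
        l->dirty = true;                  /* mark block as modified */
}

int main(void) {
    /* A tight loop over 8 words: the first pass misses twice, later
     * passes hit (temporal locality), and each miss brings in 4 words
     * at once (spatial locality). */
    for (int pass = 0; pass < 4; pass++)
        for (uint32_t w = 0; w < 8; w++)
            cache_access(w, pass == 3);   /* last pass writes */
    printf("hits=%u misses=%u writebacks=%u\n", hits, misses, writebacks);
    return 0;
}

Under the write-through protocol the dirty bit would disappear: every write hit would also update main memory immediately, and on a write miss the word could be written directly to main memory without first fetching the block.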