An Introduction To Computer Architecture: © 2019 Arm Limited
Computer Architecture
Module 1
1. By Zeptobars, CC BY 3.0
2. By Connie Zhou, CC BY-NC 4.0
Introduction
• The modern computer is less than 100 years old.
• The first electromechanical and valve-based machines
were produced in the 1930s and 1940s.
• Today’s machines are many orders of magnitude faster,
lower power, more reliable, and cheaper.
[Figure: EDSAC replica (2018)¹]
[Figure: computer architecture shown in relation to technology, application characteristics, markets, and new applications]
Source: “Early 21st Century Processors,” S. Vajapeyam and M. Valero, IEEE Computer, April 2004
Design Goals I
• Functional – hardware is hard to correct (unlike software). Verification is perhaps the highest
single cost in the design process. Chips must also be tested once they have been
manufactured; this can be a costly process and requires careful thought at the
design stage.
• Performance – what does this mean? No single best answer, e.g., sports car vs. off-road
4x4 vehicle – performance will always depend on the “workload”
• Power – a first-order design constraint for most designs today. Power limits the
performance of most systems.
[Figure: processor clock frequency by year]
A. Danowitz, K. Kelley, J. Mao, J. P. Stevenson, and M. Horowitz.
Clock Frequency, Stanford CPU DB. Accessed on Nov. 5, 2019.
[Online]. Available:
http://cpudb.stanford.edu/visualize/clock_frequency
Historical Performance Gains
• From 1985 to 2002, performance improved by ~800 times.
• Over time, technology scaling provided much greater numbers of faster and lower
power transistors.
• The “iron law” of processor performance:
Time/Program = (Instructions/Program) × (Cycles/Instruction) × (Time/Cycle)
... improvements in IPC.
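As a rough illustration, the iron law can be evaluated as the product of instruction count, cycles per instruction (CPI, the reciprocal of IPC), and clock period. The sketch below is not from the slides: the function name and all workload numbers are hypothetical, chosen only to show that doubling IPC and doubling clock frequency each halve execution time.

```python
def execution_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Iron law: seconds = instructions * cycles-per-instruction / clock frequency."""
    return instructions * cpi / clock_hz


# Hypothetical workload of one billion instructions (illustrative values only).
baseline = execution_time(1e9, cpi=2.0, clock_hz=1e9)
better_ipc = execution_time(1e9, cpi=1.0, clock_hz=1e9)    # IPC doubled (CPI halved)
faster_clock = execution_time(1e9, cpi=2.0, clock_hz=2e9)  # clock frequency doubled

print(f"baseline:     {baseline:.2f} s")
print(f"better IPC:   {better_ipc:.2f} s")
print(f"faster clock: {faster_clock:.2f} s")
```

Either improvement gives the same speedup here, which is why gains in clock frequency and gains in IPC can be accounted for as separate factors.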
A. Danowitz, K. Kelley, J. Mao, J. P. Stevenson, and M.
Horowitz. Stanford CPU DB. Accessed on Nov. 5, 2019.
[Online]. Available: http://cpudb.stanford.edu
Moore’s Law
• Moore’s Law predicts that the number of
transistors we can integrate onto a chip, for
the same cost, doubles every 2 years.
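A minimal sketch of the doubling rule stated above, assuming an idealized, perfectly regular doubling every 2 years. The function name and the starting count of one million transistors are hypothetical and used only for illustration.

```python
def transistor_count(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Projected transistor count after `years`, doubling every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


# Hypothetical starting point: a chip with one million transistors.
start = 1_000_000
for years in (2, 10, 20):
    print(f"after {years:2d} years: {transistor_count(start, years):,.0f} transistors")
```

Under this idealization, ten years corresponds to five doublings (a 32-fold increase) and twenty years to ten doublings (a 1024-fold increase).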
As a result, performance gains slowed from 52% to 21% per year for the
highest-performance processors.
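To put the two annual rates in perspective, the sketch below converts each into the time needed for performance to double, assuming the rate compounds steadily year over year. The function name is hypothetical; only the 52% and 21% figures come from the text.

```python
import math


def doubling_time_years(annual_rate: float) -> float:
    """Years for performance to double at a steady compound annual growth rate."""
    return math.log(2) / math.log(1.0 + annual_rate)


for rate in (0.52, 0.21):
    print(f"{rate:.0%} per year -> doubles roughly every {doubling_time_years(rate):.1f} years")
```

At 52% per year performance doubles in well under two years, while at 21% per year it takes more than three and a half.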
[Figure: comparison of processor cores; figure labels: 130X, 13X, 520X]
• High-performance 32-bit core (e.g., Arm Cortex-M7). Used in automotive,
sensor hub, and other embedded applications.
• High-performance processor (e.g., Arm Cortex-A73). For mobile and consumer
devices.
Technology Scaling: Faster Transistors
• From 1985 to 2002, we saw ~7 new process
generations.
• Scaling provides smaller and faster
transistors. Performance improves ~1.4x
A. Danowitz, K. Kelley, J. Mao, J. P. Stevenson, and M. Horowitz.
Stanford CPU DB. Accessed on Nov. 5, 2019. [Online]. Available:
http://cpudb.stanford.edu
Figure source: Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K.
Olukotun, L. Hammond, and C. Batten. Dotted-line extrapolations by C. Moore: Chuck Moore,
2011, “Data processing in exascale-class computer systems,” The Salishan Conference on High
Speed Computing, April 27, 2011.
Limits to Single Core Performance
• On-chip wiring
• Wire delays scale relatively poorly compared to logic delays.
• This limits the amount of state reachable in one clock cycle, which in turn
limits the performance of large, complex processors.