This document provides information for the course ECE 365 Introduction to the Design of Digital Computers. It outlines details about the course including the professor, John Lee, meeting times and location, textbooks, objectives, outcomes, grading structure, and policies. The course will cover topics such as hardware organization, instruction set architecture, arithmetic and logic unit design, memory organization, and pipelining.
ECE 365
Introduction to the Design of Digital Computers
Prof. John Lee
Office: SL 160C
Tel: 278-2267
Email: johnlee@iupui.edu
Dept. of Electrical and Computer Engineering, IUPUI

Let's Break the Ice
- Born in South Korea
- Worked for the Agency for Defense Development for 10 years
- Received Ph.D. at Georgia Tech

Course Information
- Welcome to a boring class: computer architecture
- Web page: Oncourse CL. It will be updated constantly, so check it regularly
- Prerequisites: ECE 270 and ECE 362
- Meeting time: MW 3:00-4:15 pm, SL 109
- Office hours: MTuWTh 1:30-2:30 pm

Textbooks
- Main: C. Hamacher et al., Computer Organization, Fifth Edition, McGraw-Hill, 2002. ISBN: 0-07-232086-9
- Auxiliary: N. Carter, Computer Architecture, Schaum's Outlines, McGraw-Hill, 2002. ISBN: 0-07-136207-X
- Reference: J. Levine, Linkers & Loaders, Morgan Kaufmann Pub., ISBN: 1-55860-496-0
[Figure: covers of the Hamacher and Carter books]

Objectives
To learn:
- Hardware organization of computer systems
- Instruction Set Architecture (ISA)
  - Instruction set and design considerations
  - Addressing modes, stacks, subroutines
- Arithmetic and Logic Unit design
- Performance considerations
- Memory organization
- I/O interface design
- Direct Memory Access (DMA)
- Pipelining
- Shared and distributed memory processors
- Computer simulation of digital systems
I will try to cover as much as possible.

Expected Outcomes
- Identify the components needed in digital computers
- Use different addressing modes and instructions to develop assembly code programs
- Design a control strategy for a digital computer using a hardwired or microprogrammed approach
- Analyze the I/O organization of a digital computer
- Describe the function and operation of a memory system
- Design an Arithmetic Logic Unit
- Analyze the execution of instructions in a pipelined processor
- Describe the function and analyze the performance of the various components of a computer system

Grading
- Attendance and participation: 5%
- One project consisting of two parts: 15%
  - Groups of two
  - Simulation on a Unix machine
  - Practice part: VHDL and experience with its tools
  - Actual part: exploring and evaluating computer systems, mainly in terms of cache performance
  - A little knowledge of Unix commands is necessary
  - Simulation tools require high-performance workstations
  - Special computer-architecture-related projects are also welcome
  - Due date to be posted
- Exams: 50%, best three of four tests
  - In class, 30-60 min, at the beginning of the class
  - Tentative dates: 1/28, 2/18, 3/17, 4/7
- Final: 30%, 5/2 (Wed), 3:30-5:30 pm
- The final grade combines absolute performance and standing relative to your peers in class. Nevertheless, aim for an A+ or first place in the class, and you will get an A or A+.

Warning
- You may need to memorize a lot about how architectural components operate
- The course will probably be boring: there is neither much math nor visible phenomena. It is mainly about understanding how something works
- Lectures rely completely on PowerPoint slides: explanation, explanation, and more explanation
- It might not be necessary to attend every class. If you miss a class, you will probably be able to catch up by perusing the textbook, but do not miss class intentionally: there is a roll call at each class
- Oncourse will be used heavily

Reasons to Study ECE 365
- Know more about computers
- Get a better grade
- Proceed to the next levels and go to graduate school
- Design and construct your own or a better computer
- Be good at fixing computers
- Use them effectively
- Enjoy them more
- Help others
- Spend money wisely (e.g., buy better ones)
- Get a better job
- Earn money
- Make a dream come true
- More

Miscellaneous Policies
- Homework: assigned, but not submitted
  - Self-study. The homework is already uploaded on Oncourse
  - Each assignment is considered assigned upon completion of the corresponding chapter
  - Due one week after that. The solution will be uploaded on the due date
- Tests
  - No crib sheets allowed
  - A calculator is allowed. It will be checked to ensure its memory has been erased
- Cheating
  - The corresponding test receives a zero, with a possible F for the course
  - The incident is reported to the school for further action
  - During exams, touching any wireless device is considered cheating
- Cell phones
  - Set them to vibrate upon entering the class
  - Turn them off during exams
- Borrowing another student's calculator during an exam is not allowed
- Exams will not be returned, but you will have a chance to check the grading and scores
- Please give me feedback immediately on any issue

Other Expectations
- Arrive on time
- Stay for the entire class
- Violating either of the above disturbs the rest of the class. If you cannot organize your time, you should not be in the class
- Going out and coming back during the class is prohibited except in emergencies
- Reading newspapers in class is prohibited
- Using a laptop for any purpose other than the class is prohibited
- Do not be discourteous, abrasive, aggressive, or hard to get along with
- Do not monopolize the class discussion
- Always be honest, appropriate, and professional
- Official email must be used. You are expected to read it within 24 hours
- Solutions to homework or tests must not be released to anyone outside this class
- For unstated minor matters, please visit http://rights.iupui.edu

Any Questions?
More materials for college life can be found in Oncourse:
- How to succeed in college classes
- Keys to college success
Chapter 1. Introduction
- Rapidly changing field: vacuum tube -> transistor -> IC -> VLSI
- Doubling every 1.5 years: memory capacity and processor speed (due to advances in both technology and organization)
- Things you'll be learning:
  - How computers work at the instruction level
  - How to program a computer system
  - How to interface among components
  - Issues affecting modern processors (caches, pipelines)
- Why learn this stuff?
  - You are majoring in EE or ECE
  - You want to answer computer-related questions from your family, friends, and relatives
  - You want to play or work with computers, or build nice software that people enjoy using (which needs performance)
  - You need to make a purchasing decision or offer expert advice

Rapid Growth of Size & Complexity (Serge Leef)
Year   Transistors   Processor
1979   29,000        8088
1982   134,000       286
1985   275,000       386
1989   1,290,000     486
1993   3.1M+         Pentium
1995   5.5M+         Pentium Pro
1997   7.5M+         Pentium II
1999   9.5M+         Pentium III
2000   42M           Pentium 4
2004   592M          Itanium 2 (9 MB cache)
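The "doubling every 1.5 years" rule can be checked against the transistor table with a little arithmetic. The Python sketch below is illustrative only, and the function names are hypothetical, not from the course material. It projects a count forward under an assumed doubling period. Conversely, it computes the doubling period implied by two entries of the table.

```python
import math


def projected_count(initial: float, years: float, doubling_period: float = 1.5) -> float:
    """Transistor count after `years`, assuming it doubles every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


def implied_doubling_period(count_start: float, count_end: float, years: float) -> float:
    """Doubling period (in years) implied by growth from count_start to count_end."""
    return years / math.log2(count_end / count_start)


if __name__ == "__main__":
    # Start from the 8088 entry (1979, 29,000 transistors) and assume 18-month doubling.
    for year in (1982, 1989, 2000):
        print(year, f"{projected_count(29_000, year - 1979):,.0f}")
    # Doubling period implied by the 8088 (1979) and Pentium 4 (2000, 42M) table entries.
    print(f"{implied_doubling_period(29_000, 42_000_000, 2000 - 1979):.2f} years")
```

Under the 18-month assumption, the 1979 count of 29,000 projects to roughly 475 million by 2000, while the table lists 42M for the Pentium 4. Those two endpoints imply a doubling period of about 2 years, so the 18-month figure is best read as a rough rule of thumb.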
Moore's Law
- Transistor count will double every 18 months
(Gordon Moore, Intel co-founder)

Integrated Circuits Capacity
[Figure: integrated circuit capacity over time]

Feature Size
- We are currently at 65 nm and moving towards 45 nm
[Figure: Intel's roadmap]
[Figure: average transistor cost per year]

What is a computer?
- Components:
  - Processor(s)
  - Co-processors (graphics, security)
  - Memory (SRAM, DRAM, disk drives, CD/DVD)
  - Input (mouse, keyboard, mic)
  - Output (display, printer)
  - Network
[Figure: block diagram of a computer: processor with cache, memory-I/O bus, main memory, and I/O controllers for disks, graphics output, and network; interrupts to the processor]

Interfacing Processors and Peripherals
- I/O design is affected by many factors (expandability, resilience)
- Performance:
  - access latency
  - throughput
  - connection between devices and the system
  - the memory hierarchy
  - the operating system
- A variety of different users (e.g., banks, supercomputers, engineers)

P4: Prescott w/ 2 MB L2 (90 nm)
- Prescott runs very fast (3.4+ GHz)
- 2 MB unified L2 cache
- 12K-µop trace cache (think I$)
- 16 KB data cache
- Where is the cache? Why the visual differences? Why is it square? What's with the colors?
[Die photo labels: TC, L1D, L2]

Intel Dual-core D-Series

AMD Dual-core

Intel Core 2 Duo
- Advantages of the shared L2 cache architecture

DCA vs. FSB Approach
- DCA: Direct Connect Architecture through HyperTransport
- NUMA (non-uniform memory access)

AMD Quad-core
- Discrete L2, shared L3 cache architecture

Tilera's Tile64 Processor
Know more about your computer?
- HWMonitor
- CPU-Z
- PC.Wizard

I/O Devices
- Very diverse devices:
  - behavior (i.e., input vs. output)
  - partner (who is at the other end?)
  - data rate

Device                  Behavior         Partner  Data rate (KB/sec)
Keyboard (100 cwpm)     input            human    0.01
Mouse                   input            human    0.02
Voice input (2 kHz)     input            human    4.00
Scanner (USB 2.0)       input            human    900.00
Audio output (CD)       output           human    88.00
Line printer (940 lpm)  output           human    5.00
Laser printer (17 ppm)  output           human    1,000.00
Graphics display        output           human    60,000.00
Modem                   input or output  machine  2.00-8.00
Network/LAN             input or output  machine  500-1,000,000.00
Floppy disk             storage          machine  100.00
Optical disk            storage          machine  1,000.00
Magnetic tape           storage          machine  2,000.00
Magnetic disk           storage          machine  2,000.00-10,000.00

Instruction Set Architecture
- Abstraction: delving into the depths reveals more information
- An abstraction omits unneeded detail to help us cope with complexity
- What are some of the details that appear in these familiar abstractions?
[Figure: abstraction layers]
  High-level language:
    main() { int i, b, c, a[10]; for (i = 0; i < 10; i++) a[i] = b + c*i; }
  Compiler
  Assembly language:
    lw  r2, mem[r7]
    add r3, r4, r2
    st  r3, mem[r8]
  Assembler
  ISA

Instruction Set Architecture (ISA)
- A very important abstraction:
  - interface between hardware and low-level software
  - standardizes instructions, machine language bit patterns, etc.
  - advantage: different implementations of the same architecture (e.g., Intel and AMD)
  - disadvantage: sometimes prevents using new innovations
- Modern instruction set architectures: x86 (IA-32), PowerPC (e.g., G4, G5), XScale, ARM, MIPS, Intel/HP EPIC (IA-64), AMD64, Intel's EM64T, SPARC, HP PA-RISC, DEC/Compaq/HP Alpha

Review: Basic Structure of Computers
- A contemporary computer is a fast electronic calculating machine that accepts digitized input information, processes it according to a list of internally stored instructions, and produces the resulting output information.
Definitions
- Computer architecture: the functional operation of the individual hardware units in a computer system, and the flow of information among and the control of those units.
- Types of computers:
  - Personal computer: schools, business offices; desktop
  - Portable laptop: used mainly for word processing; desktop
  - High-performance workstations: graphics and I/O capability, higher computational power; desktop
  - Mainframes: business data processing in medium to large corporations
  - Supercomputers: large-scale numerical calculations

Basic Components of a Computer
[Figure: basic functional units of a computer]

Component Definitions
- CPU (central processing unit): arithmetic and logic circuits in conjunction with the main control circuits (or simply, the processor)
- Memory: internal storage (e.g., SRAM caches and DRAM main memory)
- I/O: input and output equipment. Some standard equipment provides both input and output functions. Examples of I/O equipment: keyboard, screen

Information
- Information fed to the computer is in the form of either data or instructions.
- Instructions are explicit commands that:
  - govern the transfer of information within a computer as well as between the computer and its I/O devices
  - specify the arithmetic and logic operations to be performed
- Program: a set of instructions that performs a task
  - The program is usually stored in the memory.
  - The processor fetches the instructions one at a time and performs the desired operations.
- Data are numbers and encoded characters that are used as operands by the instructions.
- Information handled by a computer must be encoded in a suitable format. Each number, character, or instruction is encoded as a string of binary digits called bits, each having the value 0 or 1.

Operation of Computer Units
- INPUT UNIT: computers accept coded information through input units (they read data).
  - Example: the keyboard is wired so that whenever a key is pressed, the corresponding letter or digit is automatically translated to its code and sent to the memory or the processor.
  - Other examples: joysticks, trackballs, and mice.
- OUTPUT UNIT: computers send processed results to the outside world.
  - Examples: monitors, printers
- CONTROL UNIT: coordinates the operation of the memory, ALU, and I/O units.

Clock and Timing
- Clock: a circuit that generates a signal at a regular interval
- Timing signals: signals that determine when a given action is to take place
- Data transfers between the processor and the memory are also controlled by the control unit through timing signals.
- The control circuitry is usually distributed throughout the machine.
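Every number, character, or instruction is encoded as a string of bits. Here is a minimal Python sketch of that idea. It assumes unsigned binary for numbers and ASCII codes for characters. The slides do not fix any particular encoding, and `to_bits` is a hypothetical helper.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Return the unsigned binary encoding of `value` as a string of `width` bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


if __name__ == "__main__":
    print(to_bits(5))          # the number 5
    print(to_bits(ord("A")))   # the character 'A' via its ASCII code (65)
    print(to_bits(1000, 16))   # a larger number needs a wider bit string
```

A fixed width mirrors the way memory groups bits into fixed-size words. A value that does not fit is rejected rather than silently truncated.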
Memory Unit
- The function of the memory unit is to store programs and data.
- Primary memory
  - Primary storage, or main memory, is a fast memory that contains a large number of storage cells, each capable of storing one bit of information.
  - Cells are processed in groups of fixed size called words.
  - For easy access, a distinct address is associated with each word location.
  - Main memory is organized so that the contents of one word can be stored or retrieved in one basic operation.
  - A given word is accessed by specifying its address and issuing a control command that starts the store or retrieval process.
  - Memories in which any location can be reached in a short and fixed amount of time after specifying its address are called random-access memories (RAM).
- Secondary memory
  - Examples of secondary storage: disks, USB storage devices
- Primary storage is fast but expensive, while secondary storage is large and cheap but slow.

ALU: Arithmetic and Logic Unit
- Most computer operations are executed in the ALU.
- Example: addition of two numbers located in the main memory:
  - The numbers are brought into the arithmetic unit.
  - The actual addition is carried out in the ALU.
  - The sum is then stored in the memory or retained in the processor for immediate use.
- Similarly, other arithmetic or logic operations are initiated by bringing the required operands into the ALU.
- Not all operands are located in main memory. Some are kept in temporary storage (registers) for frequent access.
- Register access is typically 10 or more times faster than memory access.

Basic Operation Concepts
- Example: add the content of memory location LOCA to the content of register R0 and place the sum in R0.
- Assembly instruction: Add LOCA,R0
- The instruction Add LOCA,R0 combines a memory access operation with an ALU operation:
  - The instruction is first fetched from main memory.
  - The operand at LOCA is then fetched and added to the contents of R0.
  - The resulting sum is stored in R0.

An Alternative Way to Implement the Instruction
- In most computers, the above task is performed using two instructions:
    Load LOCA,R1
    Add R1,R0
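The single-instruction and two-instruction forms of "add LOCA to R0" differ in a visible way. This Python sketch shows the difference with a hypothetical model, not a real ISA. Main memory is a list of words and the registers are a dictionary. `LOCA` is assumed to be word address 100, holding the value 7, and R0 initially holds 5.

```python
from typing import Dict, List, Tuple

LOCA = 100  # hypothetical word address standing in for the label LOCA


def add_mem_reg(memory: List[int], regs: Dict[str, int], addr: int, dst: str) -> None:
    """Add LOCA,R0: one instruction that both reads memory and uses the ALU."""
    regs[dst] = regs[dst] + memory[addr]


def load(memory: List[int], regs: Dict[str, int], addr: int, dst: str) -> None:
    """Load LOCA,R1: a pure memory access that copies a word into a register."""
    regs[dst] = memory[addr]


def add_reg_reg(regs: Dict[str, int], src: str, dst: str) -> None:
    """Add R1,R0: a pure ALU operation on two registers."""
    regs[dst] = regs[dst] + regs[src]


def demo() -> Tuple[Dict[str, int], Dict[str, int]]:
    memory = [0] * 256            # word-addressable main memory (size is arbitrary)
    memory[LOCA] = 7

    one_instr = {"R0": 5, "R1": 0}
    add_mem_reg(memory, one_instr, LOCA, "R0")   # Add LOCA,R0

    two_instr = {"R0": 5, "R1": 0}
    load(memory, two_instr, LOCA, "R1")          # Load LOCA,R1
    add_reg_reg(two_instr, "R1", "R0")           # Add R1,R0
    return one_instr, two_instr


if __name__ == "__main__":
    print(demo())
```

Both versions leave 12 in R0. The Load/Add version also leaves a copy of the operand in R1. That is the cost of splitting the memory access and the ALU operation into separate, simpler instructions.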
- Advantage: the job is divided into smaller units so that multiple hardware units can work in parallel.

Processor Block Diagram
[Figure: processor block diagram]

The processor contains:
- ALU
- Control circuitry
- IR (instruction register): holds the instruction currently being executed. The IR contents are available to the control circuits, which generate the timing signals.
- PC (program counter): keeps track of the execution of a program. It contains the memory address of the next instruction to be executed.
- R0 through Rn-1: n general-purpose registers.
- MAR (memory address register): holds the address of the location to or from which data are to be transferred.
- MDR (memory data register): contains the data to be written into or read out of the addressed location.

Instruction Execution Steps
- The PC is set to point to the first instruction in the program.
- The contents of the PC are transferred to the MAR.
- A read control signal is sent to memory.
- The addressed word (i.e., the instruction) is read from memory and loaded into the MDR.
- The contents of the MDR are transferred to the IR.
- If the instruction involves an ALU operation, the required operands are obtained.
  - If an operand resides in memory, it is fetched by sending its address to the MAR and initiating a read cycle.
  - When the operand has been read into the MDR, it is transferred to the ALU.

Instruction Execution Steps (cont.)
- When all the operands are available, the ALU performs the operation.
- If the result is to be stored in memory, it is sent to the MDR, its address is placed in the MAR, and a write cycle is initiated.
- While an instruction is being executed, the contents of the PC are incremented so that the PC points to the next instruction.
- Normal execution may be preempted by interrupts.
  - An interrupt is a request from an I/O device for service by the processor.
  - The processor responds by executing an interrupt service routine.
  - The state of the processor is saved before the interrupt service routine runs and restored after it returns.
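The instruction execution steps can be traced with a toy processor. The Python sketch below rests on several assumptions and is not the processor from the textbook. It invents a four-opcode instruction set (LOAD, ADD, STORE, HALT) and assumes four general-purpose registers. Instructions are stored as tuples in the same list as the data. An interrupt is modeled as a call that saves and restores the PC and registers around a service routine.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class CPU:
    """Toy processor with the registers named on the slides: PC, IR, MAR, MDR, R0..Rn-1."""
    memory: List[Any]                 # word-addressed memory holding instructions and data
    pc: int = 0                       # address of the next instruction
    ir: Any = None                    # instruction currently being executed
    mar: int = 0                      # memory address register
    mdr: Any = None                   # memory data register
    regs: List[int] = field(default_factory=lambda: [0] * 4)  # R0..R3 (n = 4 assumed)

    def read(self, addr: int) -> Any:
        self.mar = addr                   # address goes to the MAR
        self.mdr = self.memory[self.mar]  # read cycle: addressed word is loaded into the MDR
        return self.mdr

    def write(self, addr: int, value: Any) -> None:
        self.mar = addr                   # address goes to the MAR
        self.mdr = value                  # data to be written goes to the MDR
        self.memory[self.mar] = self.mdr  # write cycle

    def step(self) -> bool:
        """Fetch and execute one instruction; return False when HALT is reached."""
        self.ir = self.read(self.pc)      # fetch: PC -> MAR, read, MDR -> IR
        self.pc += 1                      # PC now points to the next instruction
        op, *args = self.ir
        if op == "LOAD":                  # LOAD addr, Rd  : Rd <- M[addr]
            addr, rd = args
            self.regs[rd] = self.read(addr)
        elif op == "ADD":                 # ADD Rs, Rd     : Rd <- Rd + Rs (ALU)
            rs, rd = args
            self.regs[rd] += self.regs[rs]
        elif op == "STORE":               # STORE Rs, addr : M[addr] <- Rs
            rs, addr = args
            self.write(addr, self.regs[rs])
        elif op == "HALT":
            return False
        else:
            raise ValueError(f"unknown opcode {op!r}")
        return True

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            if not self.step():
                return
        raise RuntimeError("step limit reached")

    def interrupt(self, service_routine: Callable[["CPU"], None]) -> None:
        """Save the processor state, run the service routine, then restore the state."""
        saved_pc, saved_regs = self.pc, list(self.regs)
        service_routine(self)
        self.pc, self.regs = saved_pc, saved_regs


# Hypothetical program: M[12] <- M[10] + M[11]
PROGRAM = [
    ("LOAD", 10, 0),   # 0: R0 <- M[10]
    ("LOAD", 11, 1),   # 1: R1 <- M[11]
    ("ADD", 1, 0),     # 2: R0 <- R0 + R1
    ("STORE", 0, 12),  # 3: M[12] <- R0
    ("HALT",),         # 4
    0, 0, 0, 0, 0,     # 5-9: unused
    20, 22, 0,         # 10-12: data
]

if __name__ == "__main__":
    cpu = CPU(memory=list(PROGRAM))
    cpu.run()
    print(cpu.memory[12])
```

In this model the PC is incremented right after the fetch, so it already points to the next instruction while the current one executes. Because the service routine runs between two `step` calls, it can use any register without disturbing the interrupted program.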
Bus Structures
- For performance, a computer is organized so that all of its units can handle one full word of data at a given time.
- All bits in a word are transferred in parallel over a group of wires.
- Bus: a group of wires that connects several devices. Buses usually carry data, address, and control signals.
- Single-bus configuration: all units are connected to this bus.
  - Only one unit can control the bus at any given instant.
  - Low cost; new devices can easily be added to the bus.
- Systems that contain multiple buses achieve more parallelism (better performance at increased cost).
- A hierarchical structure is possible.

Software
- For a user to enter and run an application, the computer must already have some system software: a collection of programs used to perform the following functions:
  - Receive and interpret user commands
  - Enter and edit application programs
  - Store files in secondary storage
  - Manage the storage and retrieval of files in secondary storage
  - Run standard applications such as spreadsheets
  - Control I/O units to receive input information and produce results
  - Translate programs from the source form prepared by the user into object form (machine instructions)
- This system software is called the BIOS and/or the OS.

Software Development Environment
- Application programs are usually written in a high-level language (Basic, C, C++, C#, etc.) that is independent of any particular computer.
- The text entry and editing system software allows users to write a source program and store it in a file.
- The compiler, another piece of system software, translates the high-level language program into machine language for a specific computer.
- User-written applications are then linked with existing standard libraries and run.
Operating Systems
- The operating system (another piece of system software) is a collection of routines used to control the sharing of, and interaction among, the different computer units.
- The OS assigns resources such as main memory and disk space to individual tasks, moves data between memory and disk, and handles I/O operations.
- Example: part of a program's task involves
  - reading a data file from the disk into the main memory,
  - performing some computation on the data, and
  - printing the result.
- When execution of the program reaches the point where the data file is needed, the program requests the OS to transfer the data file from disk to memory.
- When the computation is completed, the application transfers control to the OS, and an OS routine prints the results.

Computer Time Usage
[Figure: computer time usage]

Comments on Computer Time Usage
- The disk and the processor are idle during most of the time period t4 to t5.
- The operating system can load the next program to be executed into memory while the printer is operating.
- The operating system is responsible for using the resources as efficiently as possible when several application programs are to be executed.
Performance
- The total time needed to execute an application program is the most important measure of performance.
- Performance is affected by:
  - the choice of machine language instructions,
  - the design of the hardware that constitutes the computer, and
  - the way in which the compiler translates programs into machine language.
- At the start of execution, all instructions and data are stored in the main memory.
- The processor clock cycle is an important parameter.

Performance Analysis
- Clock rate: cycles per second, measured in hertz (Hz). Example: 200 million cycles per second = 200 megahertz (MHz), i.e., a clock period of 5 ns.
- Other things being equal, a computer with a higher clock rate executes programs faster.
Execution Time Analysis
- Let T be the time required to execute a program written in a given high-level language.
- Complete execution of the program requires the execution of N machine language instructions, where N is the total number of instructions executed, including repeated instructions (e.g., loop iterations).
- Let S be the average number of basic steps per machine instruction.

Performance Analysis (cont.)
- Let R be the clock rate in cycles per second. Assuming each basic step completes in one clock cycle,
  $$T = \frac{N \times S}{R}$$
- For an application program, the performance parameter T is more important than the value of R alone.
- Rule: minimize N and S, and maximize R. How?
  - A substantial improvement in performance can be achieved by allowing the execution of instructions to overlap => pipelining.

Cache Memory
[Figure: cache memory]
- The internal speed of instruction execution is very high, considerably faster than the speed at which instructions and data are fetched from the main memory.
- Performance can therefore be improved by minimizing the movement of instructions and data between the main memory and the processor. How?
  - Instructions and data are brought from the main memory into the cache when they are first needed.
  - Subsequent requests for the same instructions and data (e.g., in a loop) are satisfied from the cache.
  - Access time to the cache is much faster than to the main memory.
- Due to the limited storage capacity of the cache, information in the cache must be replaced when new data and instructions are needed.
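The cache behavior described above can be simulated in a few lines: data are brought in on first use, repeated requests are served from the cache, and entries are replaced when space runs out. The slides do not specify a cache organization, so this Python sketch assumes a direct-mapped cache, in which each memory block can live in exactly one cache line. For simplicity, it stores the whole block number as the tag. The class and its parameters are hypothetical.

```python
from typing import List, Optional


class DirectMappedCache:
    """Toy direct-mapped cache that tracks which memory block each line holds."""

    def __init__(self, num_lines: int, block_size: int) -> None:
        self.num_lines = num_lines
        self.block_size = block_size                  # words per block
        self.tags: List[Optional[int]] = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Access one word address; return True on a hit, False on a miss."""
        block = address // self.block_size            # memory block containing the word
        line = block % self.num_lines                 # the only line this block may occupy
        if self.tags[line] == block:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[line] = block                       # bring the block in, replacing old contents
        return False


if __name__ == "__main__":
    cache = DirectMappedCache(num_lines=4, block_size=4)
    for _ in range(10):                               # a loop whose body touches words 0..15
        for addr in range(16):
            cache.access(addr)
    print(cache.hits, cache.misses)
```

A loop that repeatedly touches 16 words fits in a 4-line cache with 4 words per block. After the 4 initial misses, every access hits: 156 hits and 4 misses over 10 passes. Shrinking the cache to 2 lines forces blocks to replace each other, so every pass misses 4 times again.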
Other Performance Issues
- More time is needed to transfer program and data files from secondary storage disks into the main memory. It is possible to perform transfers to and from secondary storage in parallel with other operations.
- The use of pipelining and parallelism leads to increased program execution rates.
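The execution-time relation T = (N x S)/R, and the benefit of overlapping instructions, can be explored numerically. In this Python sketch, N and S are illustrative values; only the 200 MHz clock comes from the slides. The pipelined estimate assumes an ideal pipeline with one stage per basic step, no stalls, and no hazards. Both function names are hypothetical.

```python
def execution_time(n: float, s: float, r: float) -> float:
    """T = (N x S) / R for N instructions, S basic steps each, and clock rate R (Hz).

    Assumes each basic step takes exactly one clock cycle.
    """
    return n * s / r


def ideal_pipelined_time(n: float, stages: int, r: float) -> float:
    """Execution time on an ideal pipeline with `stages` stages, one basic step per stage.

    The first instruction needs `stages` cycles; after that, one instruction completes
    every cycle, so the total is (stages + n - 1) cycles. Stalls are ignored.
    """
    return (stages + n - 1) / r


if __name__ == "__main__":
    N, S, R = 100_000_000, 4, 200e6     # illustrative N and S; 200 MHz as on the slide
    t = execution_time(N, S, R)
    t_pipe = ideal_pipelined_time(N, S, R)
    print(f"non-pipelined: {t:.3f} s, pipelined: {t_pipe:.3f} s, speedup: {t / t_pipe:.2f}x")
```

With S = 4, the ideal pipeline approaches a fourfold speedup. Once the pipeline is full, one instruction completes per cycle, so the effective S drops toward 1 while N and R stay the same.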
Parallel and Distributed Computing
- Computer systems have evolved from machines based on a single processing unit into configurations that contain a number of processors.
- Computer systems with multiple processors are useful because large computations can often be partitioned into a set of tasks, some of which can be executed in parallel (concurrently).
- A cluster of computing machines cooperates to solve a larger problem by dividing it into small pieces (divide-and-conquer).
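The divide-and-conquer idea can be sketched on a single machine with Python's standard `concurrent.futures` module. The data are split into chunks, each chunk is summed as an independent task, and the partial results are combined. Threads stand in for processors here. In CPython, threads do not run pure-Python computation truly in parallel, so a `ProcessPoolExecutor` or a real cluster would be needed for actual speedup. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Divide the data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more item
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data: Sequence[int], workers: int = 4) -> int:
    """Sum each chunk as an independent task, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, split(data, workers)))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 1001))))
```

The same split, solve, and combine structure applies whether the tasks run on threads, processes, or separate machines in a cluster. Only the cost of moving data to the workers changes.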