Comp 255 - Computer Organization - Processors
(12/12/2003)
"A computer is an electronic (?) device operating under control of instructions
stored in its own memory unit that
- accepts data (input)
- processes data arithmetically and logically
- displays information (output) from the processing and/or
- stores the results for future use"
Processors or Central Processing Unit
The Central Processing Unit is the brain of the computer.
It fetches and executes instructions stored in Main Memory. It is made up
of a number of sub-components.
- Control Unit - fetches and executes instructions from memory
- Arithmetic Logic Unit - performs arithmetic and logical operations
- Registers - local high-speed storage used for temporary storage
- general purpose registers : temporary storage for intermediate calculations
- special purpose registers
- program counter (PC) : holds address of next instruction. Also
called instruction pointer (IP).
- instruction register : holds current instruction to be executed
Instruction Execution
Fetch-Decode-Execute Cycle
- Fetch next instruction from memory to instruction register
- Increment program counter
- Decode instruction
- Fetch operand(s) (optional)
- Execute Instruction
- Write Back Results (optional)
and repeat
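The cycle above can be sketched as a short loop. This is only an illustration on a hypothetical accumulator machine: the opcodes (LOAD, ADD, STORE, HALT), the (opcode, address) instruction format and the function name run are assumptions made for the example, not part of any real instruction set.

```python
def run(program, data):
    """Toy fetch-decode-execute loop for a hypothetical accumulator machine.

    program is a list of (opcode, address) pairs; data is a list used as
    data memory. Returns the data memory after HALT is reached.
    """
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the only general purpose register here
    while True:
        ir = program[pc]              # fetch next instruction into instruction register
        pc += 1                       # increment program counter
        opcode, address = ir          # decode instruction
        if opcode == "HALT":
            return data
        operand = None
        if opcode in ("LOAD", "ADD"):
            operand = data[address]   # fetch operand (optional step)
        if opcode == "LOAD":          # execute instruction
            acc = operand
        elif opcode == "ADD":
            acc = acc + operand
        elif opcode == "STORE":
            data[address] = acc       # write back result (optional step)
        else:
            raise ValueError("unknown opcode: " + str(opcode))


# Add the values at addresses 0 and 1 and store the sum at address 2.
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
memory = run(program, [2, 3, 0])
```

After the run, memory[2] holds the sum of the first two cells.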
Von Neumann Architectures
G. Blaauw and F. Brooks in their book Computer Architecture list
seven criteria for von Neumann architectures (p. 590).
- Single stream instructions sequenced by an instruction counter
- Instructions stored in addressable memory
- Instructions encoded as numbers - modifiable by arithmetic operations
- Radix 2
- Word length long enough for scientific calculation
- Single address - single operation instructions
- Single accumulator with Multiplier-Quotient register
CPU Organization - The Data Path
registers parallel
bus +-------+ buses
+-> | | -> | -> | Data flows clockwise
| +-------+ | |
| | | -> | -> |
| +-------+ | | Data is gated from
| | | -> | -> | the registers thru
| +-------+ | | the parallel buses
| . . . | | to the ALU. The
| | | result is stored back
| +-------+ | | to a register
|-> | M A R | | |
| +-------+ | | MAR and MBR registers
|-> | M B R | -> | -> | provide access to
| +-------+ | | Main Memory
| +-+ +-+
| | '---' | Instructions control
| \ A L U / flow of data thru
| +-----+ data path
| |
+<------------------+
- Data Path : Registers plus ALU connected via buses. Data can be gated
from two registers through the ALU, where it is operated on, and then stored
back in a third register. Two special registers, the Memory Address Register
(MAR) and the Memory Buffer Register (MBR), provide access to main memory. At
the lowest level, computer instructions control the "flow" of data through
this Data Path.
- ALU : arithmetic and logical operations
- Registers : store data
- Memory Address Register : memory address placed here
- Memory Buffer Register : data stored to memory or data read from
memory is here.
- Types of Instructions :
- register to register
- register to memory (or memory to register)
- memory to memory
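The data path described above can be modelled in a few lines. This is a minimal sketch under stated assumptions: the class name DataPath, its method names, the register count and the memory size are all hypothetical and chosen only for illustration.

```python
import operator

# Operations the toy ALU supports (an assumed, illustrative set).
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


class DataPath:
    """Registers plus an ALU, with MAR and MBR giving access to main memory."""

    def __init__(self, num_registers=4, memory_size=16):
        self.registers = [0] * num_registers
        self.memory = [0] * memory_size
        self.mar = 0   # Memory Address Register
        self.mbr = 0   # Memory Buffer Register

    def alu(self, op, src_a, src_b, dest):
        """Register to register: gate two registers through the ALU into a third."""
        a = self.registers[src_a]
        b = self.registers[src_b]
        self.registers[dest] = ALU_OPS[op](a, b)

    def load(self, dest, address):
        """Memory to register: address goes in MAR, data arrives in MBR."""
        self.mar = address
        self.mbr = self.memory[self.mar]
        self.registers[dest] = self.mbr

    def store(self, src, address):
        """Register to memory: address goes in MAR, data leaves through MBR."""
        self.mar = address
        self.mbr = self.registers[src]
        self.memory[self.mar] = self.mbr


dp = DataPath()
dp.memory[0] = 6
dp.memory[1] = 7
dp.load(0, 0)            # R0 <- memory[0]
dp.load(1, 1)            # R1 <- memory[1]
dp.alu("add", 0, 1, 2)   # R2 <- R0 + R1
dp.store(2, 2)           # memory[2] <- R2
```

In this model a memory to memory instruction would be a load followed by a store, both passing through MAR and MBR.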
RISC versus CISC
The "semantic gap" is the gap between machine
code (i.e., what the Instruction Set of a computer could do) and high-level
languages (what the programmer wanted). During the 1960s and 1970s the approach
to closing this gap was to add more and more complex instructions, using
micro-code to implement them. Micro-coding kept the cost down (the hardware
could be simpler) and provided flexibility (bugs were easy to correct and new
instructions could be added). The cost of slower individual instruction
execution (instructions were interpreted by the micro-code) was offset by
faster execution of programs.
However, in the 1980s people experimented with another approach: close
the gap by designing very fast, very simple machines which would promote
the writing of very efficient compilers. In other words, instead of raising
the hardware, lower the software.
The Instruction Set (of a level) is the set of all instructions available
to the programmer at that level. Instruction Set Architectures (level 2)
fall into one of two categories. CISC, or Complex Instruction Set Computers,
typically have many instructions, which are complex. They are usually implemented
in micro-code. RISC, or Reduced Instruction Set Computers (sometimes called
Reduced Instruction Set Complexity), tend to have fewer instructions or
instructions which are less complex. They are implemented directly in hardware.
RISC Design Principles (or Design Principles for Modern Computers)
- Instructions are simple operations that can be executed in one clock
cycle
- Maximize Rate at which Instructions are Issued (contrast with
instruction execution)
- All Instructions Directly Executed in Hardware (instead of interpreted
via micro-code)
- Instructions should be easy to decode:
- regular fixed length instructions
- few addressing modes
- Only Load and Store instructions should reference memory (since memory
reference is slow)
- Plenty of registers (use faster on-chip storage)
Instruction-Level Parallelism
Pipelining
- instructions are pre-fetched from memory and stored in pre-fetch buffer
- execution of an instruction broken down into separate stages (see
diagram below)
- trade-off between latency (time to execute one instruction)
and processor bandwidth (how many instructions are executed in a given
time period).
Diagram - 5 stage pipeline
   s1          s2          s3          s4          s5
+-------+   +-------+   +-------+   +-------+   +-------+
| instr |   | instr |   |operand|   | instr |   | write |
| fetch |-->|decode |-->| fetch |-->|  exe  |-->| back  |
| unit  |   | unit  |   | unit  |   | unit  |   | unit  |
+-------+   +-------+   +-------+   +-------+   +-------+
A five stage pipeline showing how instructions [1] - [5] progress through
the pipeline
s1:   |[1]|[2]|[3]|[4]|[5]|[6]|[7]|[8]|[9]|
s2:   |   |[1]|[2]|[3]|[4]|[5]|[6]|[7]|[8]|
s3:   |   |   |[1]|[2]|[3]|[4]|[5]|[6]|[7]|
s4:   |   |   |   |[1]|[2]|[3]|[4]|[5]|[6]|
s5:   |   |   |   |   |[1]|[2]|[3]|[4]|[5]|
time:   1   2   3   4   5   6   7   8   9
Note the five-fold increase in instruction throughput once the pipeline is
full: one instruction completes every time step instead of one every five.
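The timing in the diagram can be checked with a little arithmetic. The sketch below assumes every stage takes exactly one time step and that the pipeline never stalls; the function names are made up for this example.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction needs n_stages steps to fill the pipeline;
    every later instruction completes one step after its predecessor."""
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined execution time."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )
```

For the five instructions that finish in the diagram, pipelined_cycles(5, 5) is 9, which matches instruction [5] leaving s5 at time 9, against 25 steps without pipelining. As the number of instructions grows, speedup approaches the number of stages. The latency of a single instruction is still five steps; only the bandwidth improves.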
Superscalar Architecture : If one
pipeline is good, two are better, and four are even better! But having four
is too expensive, so the idea is to have one pipeline with multiple functional
units in stage S4 of the above diagram. This makes sense because the
Instruction Execution Units are usually the slowest stage.
                                       s4
                                    +-------+
                                    |  ALU  |
                                    +-------+
                                    +-------+
                                    |  ALU  |
                                    +-------+
   s1          s2          s3       +-------+      s5
+-------+   +-------+   +-------+   | LOAD  |   +-------+
| instr |   | instr |   |operand|   +-------+   | write |
| fetch |-->|decode |-->| fetch |-->+-------+-->| back  |
| unit  |   | unit  |   | unit  |   | STORE |   | unit  |
+-------+   +-------+   +-------+   +-------+   +-------+
                                    +-------+
                                    |  FPU  |
                                    +-------+
Processor Level Parallelism
- array processors : multiple processors controlled
by single control unit (e.g. ILLIAC IV)
- vector processors : vector registers loaded
via single instruction from memory plus "vector" instructions which operate
on vectors using pipeline; easy to add to existing architectures
- multiprocessors : multiple processors sharing
common memory
- multicomputers : multiple computers networked
together
Flynn's Categories (1972)
- SISD (Single Instruction stream, Single Data stream)
- pipeline machines
- superscalar architectures
- SIMD (Single Instruction, Multiple Data stream)
- vector machines
- array processors
- MIMD (Multiple Instruction, Multiple Data)
- multi-processor - many processors with shared memory (w/ or w/o local
memory)
- multi-computers - a network of processors each with its own memory
(no shared memory)
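The difference between the SISD and SIMD categories can be illustrated by counting how many instructions must be issued to add two vectors. This is a plain-Python simulation, not real vector hardware: the function names and the lanes parameter (how many data items one instruction handles) are assumptions made for the example.

```python
def sisd_add(a, b):
    """One instruction per data item: a single instruction stream
    operating on a single data stream."""
    result = []
    issued = 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1              # one add instruction per element
    return result, issued


def simd_add(a, b, lanes=4):
    """One instruction operates on up to `lanes` data items at once."""
    result = []
    issued = 0
    for i in range(0, len(a), lanes):
        chunk_a = a[i:i + lanes]
        chunk_b = b[i:i + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issued += 1              # one vector add covers the whole chunk
    return result, issued


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 10, 10, 10, 10, 10, 10, 10]
scalar_result, scalar_issued = sisd_add(a, b)
vector_result, vector_issued = simd_add(a, b, lanes=4)
```

Both versions compute the same sums, but the simulated SIMD version issues one instruction per group of four elements instead of one per element.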