An Overview of the Intel 80x86 Architecture
A Brief History of the Intel 80x86 Family
- Intel 4004 (1971) - Four bit CPU with 2300 transistors
- Intel 8008 (1972) and Intel 8080 (1974) - 8-bit microprocessors.
The latter has 16-bit addressing and 6000 transistors, and was used in the MITS
Altair 8800 microcomputer kit (1975).
- Intel 8086/88 (1978/79) - 16-bit microprocessors. Both
have 29,000 transistors. The 8088 differs by having an 8-bit data
bus instead of a 16-bit data bus; the 8088 was used in the IBM PC (IBM 5150), introduced
in 1981. The 8086 introduced a family of upward-compatible processors.
- Intel 80386DX (1985) - a 32-bit processor - 275,000 transistors
- Intel 80486 (1989) - had a built-in Floating Point Unit (FPU) and
8 KB cache - 1.2 million transistors
- Intel Pentium (R) (1993) - 32 bit registers, 64-bit data
bus - 3.1 million transistors
- Intel Pentium-4 (R) (2000) - 42 million transistors
Bits, Bytes and their Numbering
- byte addressable architecture
- bits within bytes numbered right (lsb) to left (msb) 0 to 7
+-+-+-+-+-+-+-+-+
 7 6 5 4 3 2 1 0   <- bit number
- bytes within word numbered right (LSB) to left (MSB) 0 to 1 - "little endian"
+--------+--------+
    1        0      <- byte number
- Assembler uses LSB/MSB display convention, so the word 1234h displays as the bytes 34 12
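The numbering conventions above can be checked with a short Python sketch. The helper names word_to_memory_bytes and bit are illustrative only and are not part of any assembler or library.

```python
def word_to_memory_bytes(word):
    """Return the two bytes of a 16-bit word in memory order (low byte first)."""
    low = word & 0xFF           # byte number 0 (LSB), stored at the lower address
    high = (word >> 8) & 0xFF   # byte number 1 (MSB), stored at the next address
    return [low, high]


def bit(value, n):
    """Return bit n of value, where bit 0 is the least significant bit."""
    return (value >> n) & 1


# The word 1234h is laid out in memory as 34h followed by 12h.
print(" ".join(f"{b:02X}" for b in word_to_memory_bytes(0x1234)))

# Bit 7 is the leftmost (msb) bit of a byte, bit 0 the rightmost (lsb).
print(bit(0x80, 7), bit(0x80, 0))
```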
The Physical Structure of Memory
- 20 bit absolute addressing = 1 megabyte memory address space
- paragraphs : 64K non-overlapping 16-byte paragraphs; 0x00000 - 0x0000F
is paragraph 0, 0x00010 - 0x0001F is paragraph 1, etc.
- physical segments : 64K overlapping 64K-byte blocks beginning on
paragraph boundaries; 0x00000 - 0x0FFFF is physical segment 0,
0x00010 - 0x1000F is physical segment 1, etc. (Do not confuse with
non-overlapping segments.)
- physical segments "wrap-around"
+----------------+
|                | 0x00000 - 0x0000F - paragraph 0 - segment 0 begins here
+----------------+
|                | 0x00010 - 0x0001F - paragraph 1 - segment 1 begins here
+----------------+
|                | 0x00020 - 0x0002F - paragraph 2 - segment 2 begins here
+----------------+
       ...
+----------------+
|                | 0x0FFF0 - 0x0FFFF - paragraph 0x0FFF - segment 0 ends here
+----------------+
|                | 0x10000 - 0x1000F - paragraph 0x1000 - segment 1 ends here
+----------------+
|                |
+----------------+
       ...
+----------------+
|                | 0xFFFF0 - 0xFFFFF - paragraph 0xFFFF - segment 0xF000 ends here
+----------------+
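The paragraph and physical segment boundaries in the diagram can be reproduced with a little arithmetic. The following Python sketch is only an illustration; the names paragraph_of and segment_range are made up for this example, and the wrap-around is modeled as arithmetic modulo 1 megabyte.

```python
PARAGRAPH_SIZE = 16         # bytes per paragraph
SEGMENT_SIZE = 0x10000      # 64K bytes per physical segment
ADDRESS_SPACE = 0x100000    # 1 megabyte (20-bit absolute addresses)


def paragraph_of(address):
    """Return the number of the paragraph that contains an absolute address."""
    return address // PARAGRAPH_SIZE


def segment_range(segment):
    """Return (first, last) absolute address of a physical segment.

    The last address is reduced modulo 1 megabyte to model wrap-around.
    """
    first = segment * PARAGRAPH_SIZE
    last = (first + SEGMENT_SIZE - 1) % ADDRESS_SPACE
    return first, last


for seg in (0x0000, 0x0001, 0xF000, 0xFFFF):
    first, last = segment_range(seg)
    print(f"segment {seg:04X}: {first:05X} - {last:05X}")
```

Segment 0xFFFF begins at 0xFFFF0, only 16 bytes below the top of memory, so in this model the rest of it wraps around to the bottom of the address space.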
The Logical Structure of Memory
- segment:offset addressing converts two 16-bit values into 20-bit absolute
address
- absolute address := segment x 10h + offset; equivalently, shift the 16-bit
segment value left 1 hexadecimal digit (4 bits) and add the 16-bit offset
- special segment registers (CS, DS, ES, and SS registers) hold
segment values
- program modules usually have 3 segments: Data, Code, Stack
+-------+
| PSP | program segment prefix
+-------+ <- DS
| Data |
+-------+ <- CS
| Code |
+-------+ <- SS
| Stack |
+-------+ <- SS:SP (initial top of stack)
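The segment:offset conversion described above can be sketched in a few lines of Python. The function name absolute_address is hypothetical, and masking the result to 20 bits is an assumption that models the wrap-around of an 8086 with 20 address lines.

```python
def absolute_address(segment, offset):
    """Convert a 16-bit segment and 16-bit offset to a 20-bit absolute address."""
    # Shifting left 4 bits is the same as multiplying by 10h (one hex digit).
    return ((segment << 4) + offset) & 0xFFFFF


# Different segment:offset pairs can name the same absolute address.
print(f"{absolute_address(0x1234, 0x0005):05X}")
print(f"{absolute_address(0x1000, 0x2345):05X}")

# An address past the top of the 1 megabyte space wraps around to the bottom.
print(f"{absolute_address(0xFFFF, 0x0010):05X}")
```

Because segments overlap every 16 bytes, a given absolute address usually has many segment:offset names; the first two calls both produce 12345h.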
The Structure of the CPU
- 8 (or 12) General Purpose 16 bit Registers
- Four 16-bit registers can be divided into two 8-bit registers
- AX = AH/AL - Accumulator (Accumulator High/Accumulator Low)
- BX = BH/BL - Base (High/Low) - can be used as "pointer"
- CX = CH/CL - Count (High/Low) - used for counting loops
- DX = DH/DL - Data (High/Low) - paired with AX for "combined" 32-bit
register. DX is high word, AX is low word.
- Four 16-bit registers used for Indexing and Stack
- SI - Source Index - used for indirect addressing
- DI - Destination Index - used for indirect addressing
- SP - Stack Pointer - accesses Stack Segment
- BP - Base Pointer (stack frame pointer)
- 6 Special Purpose Registers
- CS (Code); DS (Data); ES (Extra); SS (Stack) segment registers
- IP - Instruction Pointer holds address of next instruction (i.e.
a program counter)
- Flag Register - OF|DF|IF|TF|SF|ZF|AF|PF|CF "flag" bits
- Overflow Flag set if last operation caused signed overflow
- Sign Flag set if result of last operation was negative
- Zero Flag set if last operation resulted in 0
- Carry Flag set if last operation had a carry out (or borrow)
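To make the four arithmetic flags concrete, here is a minimal Python sketch that models a 16-bit addition and derives OF, SF, ZF, and CF from it. The function add16 is an illustrative model written for these notes, not an emulator, and it ignores the other flag bits.

```python
def add16(a, b):
    """Add two 16-bit values; return (16-bit result, dict of flags)."""
    total = a + b
    result = total & 0xFFFF
    flags = {
        # Carry out of bit 15: the unsigned sum did not fit in 16 bits.
        "CF": total > 0xFFFF,
        # Result is zero.
        "ZF": result == 0,
        # Bit 15 of the result is the sign bit.
        "SF": bool(result & 0x8000),
        # Signed overflow: operands have the same sign, result has the other.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x8000),
    }
    return result, flags


print(add16(0x7FFF, 0x0001))   # largest positive word plus 1
print(add16(0xFFFF, 0x0001))   # -1 plus 1 (signed view)
```

The first call overflows as a signed addition but produces no carry; the second produces a carry and a zero result but no signed overflow, which is why signed and unsigned comparisons test different flags.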
Memory Address Spaces
- CS:IP pair points to code address space
- DS:offset points to data address space
- SS:SP pair points to stack address space
Intel 80x86 Instruction Format 1 - 5 bytes
The Intel 80x86 uses variable-length, one-address instructions (in general, at
most one operand of an instruction is a memory address).
+--------+--------+---------+--------+--------+
|[prefix]| op-code|[Mod r/m]| [1-2 byte disp] |
+--------+--------+---------+--------+--------+
The Mod r/m byte determines the addressing mode, whether the instruction
is memory to register, register to register, or register to memory, and which
registers are used. The 80386 and later processors (including the Pentium) add
an optional SIB (Scale, Index, Base) byte after the Mod R/M (Mode, Register,
Register/Memory) byte and allow 4-byte displacements.
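As a rough illustration of how the Mod R/M byte is laid out, the Python sketch below splits one into its three fields: Mode in bits 7-6, Register in bits 5-3, and Register/Memory in bits 2-0. The function decode_modrm and the table REG16 are names invented for this example; the table lists the 16-bit register encodings.

```python
# 16-bit register names indexed by their 3-bit encoding.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_modrm(byte):
    """Split a Mod R/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # addressing mode; 11b means register to register
    reg = (byte >> 3) & 0b111   # register operand
    rm = byte & 0b111           # register or memory operand
    return mod, reg, rm


# The two-byte instruction 89h D8h is "mov ax, bx":
# op-code 89h moves the register in the reg field to the r/m operand.
mod, reg, rm = decode_modrm(0xD8)
print(mod, REG16[reg], REG16[rm])
```

When mod is not 11b, the r/m field selects a memory addressing mode instead of a register, and the mod value determines how many displacement bytes follow.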
Native data types are 8-bit byte and 16-bit word integers, signed
and unsigned (8086). Supports BCD arithmetic. No native floating point types
or operations (optional FPU until 80486DX). ISA includes string manipulation
instructions.
I/O on the Intel 80x86
- Programmed I/O using Ports (64K Port Address Space) - data is transferred
through the AL/AX register
- Memory-mapped I/O (e.g. video memory)
- Interrupts and BIOS ROM Service Routines
Intel 80x86 and MS-DOS
- Conventional Memory : 00000h - 9FFFFh (640K)
- 0x00000 - 0x003FF - Interrupt Vector Table
- 0x00400 - 0x005FF - BIOS and MS-DOS Data Area
- 0x00600 - ...... - MS-DOS, Device Drivers, COMMAND.COM (Resident
MS-DOS)
- User Programs
- ...... - 0x9FFFF - temp COMMAND.COM
- The 640K Barrier
- Upper Memory : 0xA0000 - 0xFFFFF (384 K)
- 0xA0000 - 0xAFFFF - EGA/VGA Video Memory
- 0xB0000 - 0xB7FFF - Monochrome Text Memory
- 0xB8000 - 0xBFFFF - CGA Memory
- 0xC0000 - 0xDFFFF - Installable ROM
- 0xE0000 - 0xFDFFF - Fixed ROM
- 0xFE000 - 0xFFFFF - BIOS ROM
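The memory map above can be expressed as a small lookup table, which also confirms the 640K and 384K sizes. This Python sketch is illustrative only: the names MEMORY_MAP and region_of are made up, and everything from 0x00600 to 0x9FFFF is lumped into a single entry because the notes leave the internal boundaries of that area unspecified.

```python
# (first address, last address, description), taken from the list above.
MEMORY_MAP = [
    (0x00000, 0x003FF, "Interrupt Vector Table"),
    (0x00400, 0x005FF, "BIOS and MS-DOS Data Area"),
    (0x00600, 0x9FFFF, "MS-DOS, Device Drivers, COMMAND.COM, User Programs"),
    (0xA0000, 0xAFFFF, "EGA/VGA Video Memory"),
    (0xB0000, 0xB7FFF, "Monochrome Text Memory"),
    (0xB8000, 0xBFFFF, "CGA Memory"),
    (0xC0000, 0xDFFFF, "Installable ROM"),
    (0xE0000, 0xFDFFF, "Fixed ROM"),
    (0xFE000, 0xFFFFF, "BIOS ROM"),
]


def region_of(address):
    """Return the description of the region containing an absolute address."""
    for first, last, name in MEMORY_MAP:
        if first <= address <= last:
            return name
    raise ValueError("address is outside the 1 megabyte address space")


print(region_of(0xB8000))
print((0x9FFFF + 1) // 1024, "K conventional memory")
print((0xFFFFF - 0xA0000 + 1) // 1024, "K upper memory")
```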
Intel 80x86 Assembler Programming
The following 80x86 assembler program is a simple Hello World program.
The source code begins with a Comment Header Block followed by a Data Segment,
a Stack Segment, and a Code Segment.
Segments are defined by the paired directives
name segment
name ends
where name is chosen by the user (although we'll use data, mystack,
and code to identify the data, stack and code segments)
;-------------------------------------------------------
;
; File: Hello World
; Name:
; Date:
;
; Desc : Displays "Hello World!" on screen
;
;-------------------------------------------------------
DATA    SEGMENT
Message db      'Hello World!',0dh,0ah,'$'
DATA    ENDS
;-------------------------------------------------------
MYSTACK SEGMENT STACK 'STACK'
        db      32 DUP ('STACK   ')
MYSTACK ENDS
;-------------------------------------------------------
CODE    SEGMENT
        ASSUME  CS:CODE, DS:DATA, SS:MYSTACK
MAIN    PROC    FAR
        mov     ax, data            ; Load DS with data segment address
        mov     ds, ax              ; (via AX, as DS cannot take an immediate)
        mov     dx, offset Message  ; Load DX with pointer to message
        mov     ah, 09h             ; Select MS-DOS function 09h (display string)
        int     21h                 ; Invoke MS-DOS to display message
        mov     ax, 4c00h           ; Select function 4Ch with return code 00h
        int     21h                 ; Return to MS-DOS
MAIN    ENDP
CODE    ENDS
        END     MAIN
The DATA Segment : Variables and constants
are declared here. The db directive (define byte) declares a variable
called Message containing the string "Hello World!" followed by a carriage-return
(ASCII 0dh) line-feed (ASCII 0ah) combination. Strings displayed by MS-DOS
function 09h are terminated by '$' characters (similar to C strings, which are
terminated by null characters).
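The '$' terminator convention can be modeled in a few lines of Python. This is only a sketch of the idea: dos_string is a hypothetical helper that scans a block of bytes from a given offset up to, but not including, the first '$', roughly as MS-DOS function 09h does when it displays a string.

```python
def dos_string(memory, offset):
    """Return the bytes from offset up to (not including) the '$' terminator."""
    end = memory.index(b"$", offset)   # raises ValueError if no '$' is found
    return memory[offset:end]


# The contents of the DATA segment: the text, 0dh, 0ah, then '$'.
data_segment = b"Hello World!\x0d\x0a$"

text = dos_string(data_segment, 0)
print(text)
print(len(text), "bytes would be displayed")
```

One consequence of this convention is that a '$' character itself cannot appear inside a string displayed this way.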
The STACK Segment : In the stack segment
we simply allocate a block of memory for the stack. The db 32 DUP('STACK   ')
is a programming trick to allocate 32 x 8 bytes = 256 bytes of memory for a
stack (the string 'STACK   ', padded with three spaces, is eight
bytes long). Placing the ASCII characters 'STACK   ' in the
memory allocated to the stack has no effect on the stack but lets us see the
stack under the debugger.
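The size of the stack can be checked quickly in Python. The sketch assumes the filler string is 'STACK' padded with three spaces to eight bytes, as the text describes; the variable names are illustrative.

```python
FILLER = b"STACK   "          # 'STACK' plus three spaces = 8 bytes
stack_memory = FILLER * 32    # the effect of db 32 DUP ('STACK   ')

print(len(FILLER), "bytes per copy")
print(len(stack_memory), "bytes in the stack segment")

# The first 16 bytes as they would appear in a debugger's memory dump.
print(stack_memory[:16].decode("ascii"))
```

As the program runs, pushes overwrite the filler starting from the high end of the block, so the untouched 'STACK   ' copies show how much of the stack has never been used.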
The Code Segment : The code goes here. Code
modules are called procedures and are marked by the directives
name proc FAR
name endp
within the code segment. Name is chosen by the user (by convention
we'll name our main procedure main).
The FAR directive refers to the fact that the program code is in a different
code segment from the operating system or whatever procedure calls the
program.
The main procedure must begin with the statements
mov ax, data
mov ds, ax
which initialize the DS segment register to hold the segment
address of the DATA segment.
The main procedure must end with the statements
mov ax, 4c00h
int 21h
which return control to the MS-DOS operating system.
The Assembly and Linking Process
After creating a text file containing Intel 80x86 assembler code (a .asm
file) you must assemble the source code into object code (.obj file) then
link the object code to obtain an executable file (.exe). We will use Borland's
Turbo Assembler (tasm) and Turbo Linker (tlink).
+---------+
| pgm.asm |
+---------+
     |
     | <- tasm (Turbo Assembler)
     |
+---------+
| pgm.obj |
+---------+
     |
     | <- tlink (Turbo Linker)
     |
+---------+
| pgm.exe |
+---------+
     |
     | <- to run, invoke file name pgm.exe
     |
The command to assemble a source code file is
C:\> tasm [options] source [,object [,listing [,xref]]]
where anything in brackets is optional. The only option we need is
/zi, which is required for the Turbo Debugger. File extensions for source,
object, listing, and xref are .asm (assembler), .obj (object), .lst (list),
and .xrf (cross-reference). Tasm assumes these file extensions if no file
extension is given. We will not use cross-reference files, so unless you
want a .lst file, use the command
C:\> tasm /zi pgm, pgm
using no file extensions.
The command to link an object code file is
C:\> tlink [options] objectfile(s) [,exefile [,mapfile [,libfiles]]]
The only options we will use are /v which is required for the Turbo
Debugger and /x to prevent the generation of a .map file. Since we
will make no use of .map file or (initially) library files, use the command
C:\> tlink /v/x pgm, pgm
Again note that we don't use file extensions.