Assembly Programming Tutorial
Audience
This tutorial is designed for those who want to learn the basics of assembly
programming from scratch. It gives you enough understanding of assembly
programming to take yourself to higher levels of expertise.
Prerequisites
Before proceeding with this tutorial, you should have a basic understanding of computer
programming terminology. A basic understanding of any programming language
will help you grasp the assembly programming concepts and move quickly along the
learning track.
All the content and graphics published in this e-book are the property of Tutorials Point (I)
Pvt. Ltd. The user of this e-book is prohibited from reusing, retaining, copying, distributing, or
republishing any contents or a part of the contents of this e-book in any manner without the
written consent of the publisher.
We strive to update the contents of our website and tutorials as promptly and as precisely as
possible; however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt.
Ltd. provides no guarantee regarding the accuracy, timeliness, or completeness of our
website or its contents, including this tutorial. If you discover any errors on our website or
in this tutorial, please notify us at contact@tutorialspoint.com
Table of Contents
About the Tutorial
Audience
Prerequisites
Comments
Compiling and Linking an Assembly Program in NASM
8. ASSEMBLY ─ VARIABLES
9. ASSEMBLY ─ CONSTANTS
15. ASSEMBLY ─ STRINGS
1. ASSEMBLY ─ INTRODUCTION
Each family of processors has its own set of instructions for handling various operations such
as getting input from the keyboard, displaying information on the screen, and performing various
other jobs. This set of instructions is called 'machine language instructions'.
A processor understands only machine language instructions, which are strings of 1's and 0's.
However, machine language is too obscure and complex to use in software development.
So, a low-level assembly language is designed for a specific family of processors; it
represents the various instructions in symbolic code and a more understandable form.
Assembly language is most suitable for writing interrupt service routines and other
memory-resident programs.
The fundamental unit of computer storage is a bit; it can be ON (1) or OFF (0). A group of
eight related bits makes a byte. Memory hardware that uses parity checking stores a ninth bit
alongside the eight data bits of each byte. Under the odd-parity rule described here, the number
of bits that are ON (1) among those nine bits should always be odd.
So, the parity bit is set or cleared to make the number of ON bits odd. If the count is even, the
system assumes that there has been a parity error (though rare), which might have been
caused by a hardware fault or an electrical disturbance.
The following table shows the positional values for an 8-bit binary number, where all bits are
set ON.
Bit value         1    1    1    1    1    1    1    1
Position value    128  64   32   16   8    4    2    1
Bit number        7    6    5    4    3    2    1    0
The value of a binary number is based on the presence of 1 bits and their positional value.
So, the value of the given binary number is:
1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 = 255
which is the same as 2^8 - 1.
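The positional sum above can be checked with a short program. This is a minimal sketch, assuming a 32-bit Linux system and the int 0x80 system call convention; the label next_bit is a hypothetical name chosen for illustration. The program adds the positional value of each of the eight bits and returns the total as its exit status.

```nasm
section .text
    global _start           ;must be declared for linker (ld)
_start:
    xor ebx,ebx             ;EBX = running total, starts at 0
    mov eax,1               ;EAX = positional value of bit 0
    mov ecx,8               ;ECX = number of bits in a byte
next_bit:
    add ebx,eax             ;add the positional value of this bit
    shl eax,1               ;the next position is worth twice as much
    loop next_bit           ;decrement ECX and repeat while it is not 0
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = EBX = 255
```

After running the program, typing echo $? in the shell should print 255.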
Hexadecimal Number System
The hexadecimal number system uses base 16. The digits in this system range from 0 to 15. By
convention, the letters A through F are used to represent the hexadecimal digits corresponding
to decimal values 10 through 15.
Decimal number    Binary representation    Hexadecimal representation
0                 0                        0
1                 1                        1
2                 10                       2
3                 11                       3
4                 100                      4
5                 101                      5
6                 110                      6
7                 111                      7
8                 1000                     8
9                 1001                     9
10                1010                     A
11                1011                     B
12                1100                     C
13                1101                     D
14                1110                     E
15                1111                     F
Example: Binary number 1000 1100 1101 0001 is equivalent to hexadecimal - 8CD1
To convert a hexadecimal number to binary, write each hexadecimal digit as its 4-digit
binary equivalent.
Example: Hexadecimal number FAD8 is equivalent to binary - 1111 1010 1101 1000
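The 4-bit grouping behind these conversions can be sketched in code. The following is a minimal illustration, assuming a 32-bit Linux system and the int 0x80 system call convention; the names hexdigits, outbuf, and next_group are hypothetical. It takes the binary number 1000 1100 1101 0001, splits it into four groups of 4 bits starting from the left, and prints the hexadecimal digit for each group.

```nasm
section .data
hexdigits db '0123456789ABCDEF'     ;lookup table: value 0-15 to hex digit

section .bss
outbuf resb 5                       ;four hex digits plus a newline

section .text
    global _start                   ;must be declared for linker (ld)
_start:
    mov eax,1000_1100_1101_0001b    ;binary number to convert
    mov edi,outbuf                  ;EDI = where the next digit is stored
    mov ecx,4                       ;16 bits = four groups of 4 bits
next_group:
    rol ax,4                        ;rotate the leftmost group into bits 0-3
    mov ebx,eax
    and ebx,0x0F                    ;keep only that 4-bit group
    mov bl,[hexdigits + ebx]        ;look up its hexadecimal digit
    mov [edi],bl                    ;store the digit
    inc edi
    loop next_group                 ;repeat for all four groups
    mov byte [edi],0xa              ;end the line

    mov edx,5                       ;message length
    mov ecx,outbuf                  ;message to write
    mov ebx,1                       ;file descriptor (stdout)
    mov eax,4                       ;system call number (sys_write)
    int 0x80                        ;call kernel

    mov ebx,0                       ;exit status
    mov eax,1                       ;system call number (sys_exit)
    int 0x80                        ;call kernel
```

When assembled and run, this should print 8CD1.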
Binary Arithmetic
The following table illustrates four simple rules for binary addition:
(i)    (ii)    (iii)    (iv)
                          1
  0      1       1        1
 +0     +0      +1       +1
 =0     =1     =10      =11
Rules (iii) and (iv) show a carry of a 1-bit into the next left position.
Example
Decimal    Binary
   60      00111100
  +42      00101010
  102      01100110
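The same addition can be tried on the processor. This is a minimal sketch, assuming a 32-bit Linux system and the int 0x80 system call convention; it adds the two 8-bit values from the example and returns the sum as the exit status of the program.

```nasm
section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov al,00111100b        ;AL = 60
    add al,00101010b        ;AL = 60 + 42 = 01100110b
    movzx ebx,al            ;copy the 8-bit sum into EBX
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 102
```

After running the program, echo $? should print 102.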
A negative binary value is expressed in two's complement notation. According to this rule,
to convert a binary number to its negative value, reverse its bit values and add 1.
Example
Number 53             00110101
Reverse the bits      11001010
Add 1                 00000001
Number -53            11001011
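The two steps of the rule map directly onto two instructions. The following is a minimal sketch, assuming a 32-bit Linux system and the int 0x80 system call convention. It reverses the bits of 53 with NOT, adds 1, and returns the resulting bit pattern as an unsigned exit status.

```nasm
section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov al,00110101b        ;AL = 53
    not al                  ;reverse the bits: 11001010b
    add al,1                ;add 1: 11001011b, the pattern for -53
    movzx ebx,al            ;read the same bits as an unsigned value
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 203
```

The exit status is 203 because 11001011 read as an unsigned byte is 256 - 53. The single instruction neg al produces the same bit pattern.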
To subtract one value from another, convert the number being subtracted to two's
complement format and add the numbers.
Example
Subtract 42 from 53.
Number 53                    00110101
Number 42                    00101010
Reverse the bits of 42       11010101
Add 1                        00000001
Number -42                   11010110
53 - 42 = 11                 00001011
The carry out of the leftmost bit is discarded.
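Subtraction by adding the two's complement can be sketched as follows, assuming a 32-bit Linux system and the int 0x80 system call convention. The program converts 42 to its negative form and adds 53 to it; the carry out of the 8-bit register is simply lost.

```nasm
section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov al,00101010b        ;AL = 42
    not al                  ;reverse the bits of 42: 11010101b
    add al,1                ;add 1: 11010110b, the pattern for -42
    add al,00110101b        ;add 53, the carry out of bit 7 is discarded
    movzx ebx,al            ;AL = 00001011b
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 11
```

After running the program, echo $? should print 11, the same result the sub instruction would give.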
The processor may access one or more bytes of memory at a time. Let us consider a
hexadecimal number 0725H. This number will require two bytes of memory. The high-order
byte or most significant byte is 07 and the low-order byte is 25.
The processor stores data in reverse-byte sequence, i.e., a low-order byte is stored in a low
memory address and a high-order byte in high memory address. So, if the processor brings
the value 0725H from register to memory, it will transfer 25 first to the lower memory address
and 07 to the next memory address.
Here, x is the memory address: 25 is stored at address x and 07 at address x + 1.
When the processor gets the numeric data from memory to register, it again reverses the
bytes.
There are two kinds of memory addresses:
Segment address (or offset) – the starting address of a memory segment with the offset
value.
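The reverse-byte storage order described earlier can be observed directly. This is a minimal sketch, assuming a 32-bit Linux system and the int 0x80 system call convention; the name value is hypothetical. The program stores 0725H from a register into memory and then reads back only the byte at the lower address.

```nasm
section .bss
value resw 1                ;two bytes of memory

section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov ax,0725h            ;high-order byte 07, low-order byte 25
    mov [value],ax          ;store both bytes from register to memory
    movzx ebx,byte [value]  ;read the byte at the lower address
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 25h = 37
```

The exit status should be 37 (hexadecimal 25), which shows that the low-order byte was stored first. Reading byte [value + 1] instead should give 7.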
2. ASSEMBLY ─ ENVIRONMENT SETUP
Try the following example using our online compiler option available at
http://www.compileonline.com/
section .text
    global _start           ;must be declared for linker (ld)
_start:                     ;tells linker entry point
    mov edx,len             ;message length
    mov ecx,msg             ;message to write
    mov ebx,1               ;file descriptor (stdout)
    mov eax,4               ;system call number (sys_write)
    int 0x80                ;call kernel
    mov ebx,0               ;exit status
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel
section .data
msg db 'Hello, world!', 0xa ;our dear string
len equ $ - msg             ;length of our dear string
For most of the examples given in this tutorial, you will find a Try it option at the top right
corner of the code sections on our website; it takes you to the online compiler. So just make
use of it and enjoy your learning.
A copy of Linux operating system
Installing NASM
If you select "Development Tools" while installing Linux, you may get NASM installed along
with the Linux operating system, and you do not need to download and install it separately.
To check whether you already have NASM installed, take the following steps:
1. Open a Linux terminal.
2. Type whereis nasm and press ENTER.
3. If it is already installed, a line like nasm: /usr/bin/nasm appears. Otherwise, you
will see just nasm:, and you need to install NASM.
1. Check the Netwide Assembler (NASM) website for the latest version.
2. Download the Linux source archive nasm-X.XX.tar.gz, where X.XX is the NASM version
number in the archive.
3. Unpack the archive into a directory, which creates a subdirectory nasm-X.XX.
4. cd to nasm-X.XX and type ./configure. This shell script will find the best C compiler
to use and set up Makefiles accordingly.
5. Type make to build the nasm and ndisasm binaries.
6. Type make install to install nasm and ndisasm in /usr/local/bin and to install the man
pages.
This should install NASM on your system. Alternatively, you can use an RPM distribution for
Fedora Linux. This version is simpler to install: just double-click the RPM file.
3. ASSEMBLY ─ BASIC SYNTAX
section .data
section .bss
section .text
global _start
_start:
Comments
An assembly language comment begins with a semicolon (;). It may contain any printable
character, including blanks. It can appear on a line by itself, like:
; This program displays a message on screen
Macros.
The executable instructions or simply instructions tell the processor what to do. Each
instruction consists of an operation code (opcode). Each executable instruction generates
one machine language instruction.
The assembler directives or pseudo-ops tell the assembler about the various aspects of
the assembly process. These are non-executable and do not generate machine language
instructions.
Macros are basically a text substitution mechanism.
The fields in the square brackets are optional. A basic instruction has two parts: the first
is the name of the instruction (or the mnemonic) to be executed, and the second
is the operands or parameters of the instruction.
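The parts of a statement can be seen in a short program. This is a minimal sketch, assuming a 32-bit Linux system and the int 0x80 system call convention; the names total and done are hypothetical. Each executable line has a mnemonic, most have operands, and the label and the comment are optional.

```nasm
section .data
total db 0                  ;a one-byte variable

section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov byte [total],48     ;mnemonic MOV, two operands
    inc byte [total]        ;mnemonic INC, one operand
    add byte [total],2      ;mnemonic ADD, two operands
done:                       ;a label on a line by itself
    movzx ebx,byte [total]  ;EBX = 48 + 1 + 2 = 51
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 51
```

After running the program, echo $? should print 51.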
section .text
    global _start           ;must be declared for linker (ld)
_start:                     ;tells linker entry point
    mov edx,len             ;message length
    mov ecx,msg             ;message to write
    mov ebx,1               ;file descriptor (stdout)
    mov eax,4               ;system call number (sys_write)
    int 0x80                ;call kernel
    mov ebx,0               ;exit status
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel
section .data
msg db 'Hello, world!', 0xa ;our dear string
len equ $ - msg             ;length of our dear string
When the above code is compiled and executed, it produces the following result:
Hello, world!
1. Type the above code using a text editor and save it as hello.asm.
2. Make sure that you are in the same directory as where you saved hello.asm.
3. To assemble the program, type nasm -f elf hello.asm
4. If there is any error, you will be prompted about it at this stage. Otherwise, an
object file of your program named hello.o will be created.
5. To link the object file and create an executable file named hello, type ld -m elf_i386
-s -o hello hello.o
6. Execute the program by typing ./hello
If you have done everything correctly, it will display ‘Hello, world!’ on the screen.
4. ASSEMBLY ─ MEMORY SEGMENTS
We have already discussed the three sections of an assembly program. These sections
represent various memory segments as well.
Interestingly, if you replace the section keyword with segment, you will get the same result.
Try the following code:
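A minimal sketch of that substitution, assuming a 32-bit Linux system and the int 0x80 system call convention, is the Hello, world! program written with segment in place of section. The exit call at the end is included so that the program terminates cleanly.

```nasm
segment .text               ;code segment
    global _start           ;must be declared for linker (ld)
_start:                     ;tells linker entry point
    mov edx,len             ;message length
    mov ecx,msg             ;message to write
    mov ebx,1               ;file descriptor (stdout)
    mov eax,4               ;system call number (sys_write)
    int 0x80                ;call kernel
    mov ebx,0               ;exit status
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel
segment .data               ;data segment
msg db 'Hello, world!', 0xa ;our dear string
len equ $ - msg             ;length of our dear string
```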
When the above code is compiled and executed, it produces the following result:
Hello, world!
Memory Segments
A segmented memory model divides the system memory into groups of independent
segments referenced by pointers located in the segment registers. Each segment is used to
contain a specific type of data. One segment is used to contain instruction codes, another
segment stores the data elements, and a third segment keeps the program stack.
In the light of the above discussion, we can specify various memory segments as:
Data segment - It is represented by the .data section and the .bss section. The .data section is
used to declare the memory region where data elements are stored for the program.
This section cannot be expanded after the data elements are declared, and it remains
static throughout the program.
The .bss section is also a static memory section that contains buffers for data to be
declared later in the program. This buffer memory is zero-filled.
Stack - This segment contains data values passed to functions and procedures within
the program.
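The following is a minimal sketch that touches each of these memory areas, assuming a 32-bit Linux system and the int 0x80 system call convention; the names msg, len, and buffer are hypothetical. It prints a string kept in the .data section, passes a value through the stack, and reads a byte from the zero-filled .bss section.

```nasm
section .data
msg db 'stored in .data', 0xa   ;initialized data
len equ $ - msg

section .bss
buffer resb 16                  ;reserved buffer, zero-filled at start

section .text
    global _start               ;must be declared for linker (ld)
_start:
    mov edx,len                 ;message length
    mov ecx,msg                 ;message to write
    mov ebx,1                   ;file descriptor (stdout)
    mov eax,4                   ;system call number (sys_write)
    int 0x80                    ;call kernel

    push dword 7                ;place a value on the stack
    pop ebx                     ;take it back: EBX = 7
    add bl,[buffer]             ;first .bss byte is 0, so EBX stays 7
    mov eax,1                   ;system call number (sys_exit)
    int 0x80                    ;call kernel, exit status = 7
```

The program should print the message and exit with status 7, which indicates that the untouched .bss buffer contained zero.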
5. ASSEMBLY ─ REGISTERS
Processor operations mostly involve processing data. This data can be stored in memory and
accessed from there. However, reading data from and storing data into memory slows down
the processor, as it involves the complicated process of sending the data request across the
control bus and into the memory storage unit and getting the data through the same channel.
To speed up the processor operations, the processor includes some internal memory storage
locations, called registers.
The registers store data elements for processing without having to access the memory. A
limited number of registers are built into the processor chip.
Processor Registers
There are ten 32-bit and six 16-bit processor registers in IA-32 architecture. The registers
are grouped into three categories:
General registers,
Control registers, and
Segment registers.
The general registers are further divided into the following groups:
Data registers,
Pointer registers, and
Index registers.
Data Registers
Four 32-bit data registers are used for arithmetic, logical, and other operations. These 32-bit
registers can be used in three ways:
As complete 32-bit data registers: EAX, EBX, ECX, and EDX.
Lower halves of the 32-bit registers can be used as four 16-bit data registers: AX, BX,
CX, and DX.
Lower and higher halves of the above-mentioned four 16-bit registers can be used as
eight 8-bit data registers: AH, AL, BH, BL, CH, CL, DH, and DL.
Some of these data registers have specific uses in arithmetical operations.
CX is known as the count register, as the ECX and CX registers store the loop count in iterative
operations.
DX is known as the data register. It is used in input/output operations. It is also used
together with the AX register for multiply and divide operations involving large values.
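The relationship between a 32-bit data register and its 16-bit and 8-bit parts can be sketched as follows, assuming a 32-bit Linux system and the int 0x80 system call convention. The program loads one 32-bit value into EAX and then reads only the AH part of it.

```nasm
section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov eax,0x12345678      ;EAX = 12345678h
                            ;AX  = 5678h (lower half of EAX)
                            ;AH  = 56h   (higher half of AX)
                            ;AL  = 78h   (lower half of AX)
    movzx ebx,ah            ;EBX = 56h
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 56h = 86
```

After running the program, echo $? should print 86. Writing to AL or AH changes only that byte and leaves the rest of EAX as it was.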
Pointer Registers
The pointer registers are 32-bit EIP, ESP, and EBP registers and corresponding 16-bit right
portions IP, SP, and BP. There are three categories of pointer registers:
Instruction Pointer (IP) - The 16-bit IP register stores the offset address of the next
instruction to be executed. IP in association with the CS register (as CS:IP) gives the
complete address of the current instruction in the code segment.
Stack Pointer (SP) - The 16-bit SP register provides the offset value within the
program stack. SP in association with the SS register (SS:SP) refers to the current
position of data or address within the program stack.
Base Pointer (BP) - The 16-bit BP register mainly helps in referencing the parameter
variables passed to a subroutine. The address in SS register is combined with the
offset in BP to get the location of the parameter. BP can also be combined with DI and
SI as base register for special addressing.
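A minimal sketch of the stack pointer and base pointer working together, assuming a 32-bit Linux system (so the 32-bit forms ESP and EBP are used) and the int 0x80 system call convention. The program remembers the top of the stack in EBP, pushes a value, and then reads that value back through an offset from EBP.

```nasm
section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov ebp,esp             ;EBP = current top of the stack
    push dword 42           ;ESP moves down by 4 bytes, 42 is stored there
    mov ebx,[ebp - 4]       ;read the pushed value relative to EBP
    add esp,4               ;release the stack space again
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 42
```

After running the program, echo $? should print 42.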
Index Registers
The 32-bit index registers, ESI and EDI, and their 16-bit rightmost portions, SI and DI, are
used for indexed addressing and sometimes used in addition and subtraction. There are two
index registers:
Source Index (SI) - It is used as the source index for string operations.
Destination Index (DI) - It is used as the destination index for string operations.
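A string copy shows the index registers in their usual roles. This is a minimal sketch, assuming a 32-bit Linux system and the int 0x80 system call convention; the names src, len, and dst are hypothetical. ESI points to the source string, EDI to the destination buffer, and ECX holds the number of bytes to copy.

```nasm
section .data
src db 'copied', 0xa        ;source string
len equ $ - src

section .bss
dst resb 16                 ;destination buffer, large enough for src

section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov esi,src             ;source index
    mov edi,dst             ;destination index
    mov ecx,len             ;number of bytes to copy
    cld                     ;clear DF: copy left to right
    rep movsb               ;copy ECX bytes from [ESI] to [EDI]

    mov edx,len             ;message length
    mov ecx,dst             ;message to write (the copy)
    mov ebx,1               ;file descriptor (stdout)
    mov eax,4               ;system call number (sys_write)
    int 0x80                ;call kernel

    mov ebx,0               ;exit status
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel
```

When assembled and run, this should print copied, read from the destination buffer.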
Control Registers
The 32-bit instruction pointer register and the 32-bit flags register combined are considered
as the control registers.
Many instructions involve comparisons and mathematical calculations and change the status
of the flags, and some other conditional instructions test the value of these status flags to take
the control flow to another location.
Overflow Flag (OF): It indicates the overflow of a high-order bit (leftmost bit) of
data after a signed arithmetic operation.
Direction Flag (DF): It determines left or right direction for moving or comparing
string data. When the DF value is 0, the string operation takes left-to-right direction
and when the value is set to 1, the string operation takes right-to-left direction.
Interrupt Flag (IF): It determines whether the external interrupts like keyboard
entry, etc., are to be ignored or processed. It disables the external interrupt when the
value is 0 and enables interrupts when set to 1.
Trap Flag (TF): It allows setting the operation of the processor in single-step mode.
The DEBUG program we used sets the trap flag, so we could step through the execution
one instruction at a time.
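Sign Flag (SF): It shows the sign of the result of an arithmetic operation. This flag is
set according to the sign of a data item following the arithmetic operation. The sign is
indicated by the high-order (leftmost) bit. A positive result clears the value of SF to 0
and a negative result sets it to 1.
Zero Flag (ZF): It indicates the result of an arithmetic or comparison operation. A
nonzero result clears the zero flag to 0, and a zero result sets it to 1.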
Auxiliary Carry Flag (AF): It contains the carry from bit 3 to bit 4 following an
arithmetic operation; used for specialized arithmetic. The AF is set when a 1-byte
arithmetic operation causes a carry from bit 3 into bit 4.
Parity Flag (PF): It indicates whether the number of 1-bits in the low-order byte of the result
of an arithmetic operation is even or odd. An even number of 1-bits sets the parity flag to 1
and an odd number of 1-bits clears the parity flag to 0.
Carry Flag (CF): It contains the carry of 0 or 1 from a high-order bit (leftmost) after
an arithmetic operation. It also stores the contents of last bit of a shift or rotate
operation.
The following table indicates the position of flag bits in the 16-bit Flags register:
Flag:                        O    D    I    T    S    Z         A         P         C
Bit no:  15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0
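The effect of one arithmetic instruction on several flags can be observed with the SETcc instructions, which copy a flag into a byte register. This is a minimal sketch, assuming a 32-bit Linux system and the int 0x80 system call convention. Adding 1 to FFH in an 8-bit register gives 00H with a carry, so CF, ZF, and PF should all be set; the program packs the three flags into its exit status.

```nasm
section .text
    global _start           ;must be declared for linker (ld)
_start:
    mov al,0xFF
    add al,1                ;AL = 00h with a carry out of bit 7
    setc bl                 ;BL = CF (1: carry occurred)
    setz cl                 ;CL = ZF (1: result is zero)
    setp dl                 ;DL = PF (1: even number of 1-bits)
    movzx ebx,bl            ;widen the three flag values
    movzx ecx,cl
    movzx edx,dl
    shl ecx,1               ;ZF goes to bit 1 of the status
    shl edx,2               ;PF goes to bit 2 of the status
    or ebx,ecx
    or ebx,edx              ;EBX = CF + 2*ZF + 4*PF
    mov eax,1               ;system call number (sys_exit)
    int 0x80                ;call kernel, exit status = 7
```

After running the program, echo $? should print 7. The flags are captured with SETcc before any shift or OR, because those instructions change the flags themselves.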
Segment Registers
Segments are specific areas defined in a program for containing data, code and stack. There
are three main segments:
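Code Segment: It contains all the instructions to be executed. A 16-bit Code Segment
register or CS register stores the starting address of the code segment.
Data Segment: It contains data, constants and work areas. A 16-bit Data Segment
register or DS register stores the starting address of the data segment.
Stack Segment: It contains data and return addresses of procedures or subroutines. A 16-bit
Stack Segment register or SS register stores the starting address of the stack.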
Apart from the DS, CS and SS registers, there are other extra segment registers - ES (extra
segment), FS and GS, which provide additional segments for storing data.
In assembly programming, a program needs to access the memory locations. All memory
locations within a segment are relative to the starting address of the segment. A segment
begins in an address evenly divisible by 16 or hexadecimal 10. So, the rightmost hex digit in
all such memory addresses is 0, which is not generally stored in the segment registers.
The segment registers store the starting addresses of segments. To get the exact location
of data or an instruction within a segment, an offset value (or displacement) is required. To
reference any memory location in a segment, the processor combines the segment address
in the segment register with the offset value of the location.
Example:
Look at the following simple program to understand the use of registers in assembly
programming. This program displays 9 stars on the screen along with a simple message:
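A minimal sketch of such a program, assuming a 32-bit Linux system and the int 0x80 system call convention. The message text and the names msg, len, and stars are assumptions made for illustration. The registers EDX, ECX, EBX, and EAX carry the length, the address, the file descriptor, and the system call number for each call.

```nasm
section .text
    global _start               ;must be declared for linker (ld)
_start:                         ;tells linker entry point
    mov edx,len                 ;message length
    mov ecx,msg                 ;message to write
    mov ebx,1                   ;file descriptor (stdout)
    mov eax,4                   ;system call number (sys_write)
    int 0x80                    ;call kernel

    mov edx,9                   ;number of stars to write
    mov ecx,stars               ;stars to write
    mov ebx,1                   ;file descriptor (stdout)
    mov eax,4                   ;system call number (sys_write)
    int 0x80                    ;call kernel

    mov ebx,0                   ;exit status
    mov eax,1                   ;system call number (sys_exit)
    int 0x80                    ;call kernel

section .data
msg db 'Displaying 9 stars', 0xa    ;assumed message text
len equ $ - msg                     ;length of the message
stars times 9 db '*'                ;nine '*' characters
```

When assembled and run, this should print the message on one line followed by nine stars.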