Ultimate Assembly Guide
──────────────────────────────────────────────────────────────────────────────────
Introduction
A few years back I did a deep dive into reverse engineering malware, and I wanted to share what I learned and refresh my own knowledge of Assembly language.
For those of you who don't know, Assembly language is a low-level programming language that provides a direct interface with the computer’s hardware. It is the human-readable form of the machine code that compilers produce from higher-level languages such as C or C++. Of course, at the end of the day computers speak in 0s and 1s (binary!), and ons and offs (electric current!), but when you reverse an executable or binary, you are most likely going to be seeing Assembly.
There are many other reasons why knowing some Assembly is useful as a computer nerd, such as debugging and optimizing code, and even programming embedded systems directly in Assembly.
So, let's get right to it!
──────────────────────────────────────────────────────────────────────────────────
Data Types (in C)
High-level programming languages use data types to describe how information is stored in memory, and these data types affect how Assembly processes that information. Knowing each type's size and instruction suffix can be pretty handy later on.
Bytes | Bits | Data Types | Instruction Suffix | Suffix Name | Register Naming Scheme |
---|---|---|---|---|---|
1 | 8 | char, byte | -b | byte | -l or -h |
2 | 16 | short | -w | word | -x or nothing |
4 | 32 | int | -l | long | e- |
8 | 64 | long, double, pointer (address) | -q | quad word | r- |
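To tie the table back to C, here is a minimal sketch that maps an operand size to its AT&T suffix. The function name `suffix_for_size` is made up for this example, and the sizes it is fed are an assumption that holds on x86-64 Linux (where pointers are 8 bytes), not something the C standard guarantees.

```c
#include <stddef.h>

/* Hypothetical helper: map an operand size in bytes to the AT&T
   instruction suffix from the table above.
   Returns '?' for sizes that have no integer suffix. */
char suffix_for_size(size_t bytes) {
    switch (bytes) {
    case 1:  return 'b';  /* byte      */
    case 2:  return 'w';  /* word      */
    case 4:  return 'l';  /* long      */
    case 8:  return 'q';  /* quad word */
    default: return '?';
    }
}
```

Calling `suffix_for_size(sizeof(int))` gives `'l'` on a typical x86-64 system, which is why an `int` is moved with `movl`.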
──────────────────────────────────────────────────────────────────────────────────
Operand Types
Type | Meaning | Examples | Notes |
---|---|---|---|
Immediate | Constant integer data | $0x400, $-533 | Encoded with 1, 2, or 4 bytes |
Register | One of 16 integer registers | %rax, %r13 | %rsp is reserved for special use |
Memory | 8 consecutive bytes of memory (for a quad word access) at the address given by a register | (%rax) | |
──────────────────────────────────────────────────────────────────────────────────
Address Modes
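* Translation Formula:
D(Rb, Ri, S) -> address = Rb + Ri * S + D
EX - leaq D(Rb, Ri, S), %rax -> %rax = Rb + Ri * S + D (other instructions read or write the memory at that address)
D: Constant “displacement” - 1, 2, or 4 byte value
Rb: Base register
Ri: Index register {{any register except %rsp}}
S: Scale - 1, 2, 4, or 8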
Possible Formats
Type | Example syntax | Value used |
---|---|---|
Register | %rbp | Contents of %rbp |
Immediate | $0x4 | 0x4 |
Memory | 0x4 | Value stored at address 0x4 |
 | symbol_name | Value stored in global symbol_name |
 | symbol_name(%rip) | %rip-relative addressing for global (see below) |
 | (%rax) | Value stored at address in %rax |
 | 0x4(%rax) | Value stored at address %rax + 4 |
 | (%rax,%rbx) | Value stored at address %rax + %rbx |
 | (%rax,%rbx,4) | Value stored at address %rax + %rbx*4 |
 | 0x18(%rax,%rbx,4) | Value stored at address %rax + 0x18 + %rbx*4 |
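The scaled-index form is what array indexing usually turns into. As a rough sketch (the function names are invented for illustration), this C code computes the address of `a[i]` by hand the same way `(%rax,%rbx,4)` does, with the base pointer playing the role of %rax and the index playing the role of %rbx:

```c
#include <stddef.h>
#include <stdint.h>

/* Address of base[index], computed like the operand (%rax,%rbx,4):
   base + index * scale, where scale = sizeof(int32_t) = 4. */
uintptr_t scaled_index_address(const int32_t *base, size_t index) {
    return (uintptr_t)base + index * sizeof(int32_t);
}

/* Load through the computed address, roughly what
   movl (%rax,%rbx,4), %ecx does in one instruction. */
int32_t load_scaled(const int32_t *base, size_t index) {
    return *(const int32_t *)scaled_index_address(base, index);
}
```

The scale values 1, 2, 4, and 8 line up with the sizes of the basic data types, which is why this one addressing mode covers most array accesses.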
*Jump and function call instructions use the following formats:
Type | Example syntax | Value used |
---|---|---|
Register | *%rax | Contents of %rax |
Immediate | .L3 | Address of .L3 (compiler-generated assembly) |
 | 400410 or 0x400410 | Given address |
Memory | *0x200b96(%rip) | Value stored at address %rip + 0x200b96 |
 | *(%r12,%rbp,8) | Other address modes accepted |
──────────────────────────────────────────────────────────────────────────────────
Registers 101
A register is a small storage space built into the CPU. Registers are addressed by name rather than by memory address and are much faster to access than main memory. They hold the values the CPU is currently working with, and they can be divided into the following classes.
General Purpose Registers:
EAX) Extended Accumulator Register | ESI) Extended Source Index |
EBX) Extended Base Register | EDI) Extended Destination Index |
ECX) Extended Counter Register | EBP) Extended Base Pointer |
EDX) Extended Data Register | ESP) Extended Stack Pointer |
Segment Registers:
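Segment registers (CS, DS, SS, ES, FS, GS) tell the CPU which segment of memory an access refers to, which is how code is told apart from data. For example, the byte 0x90 is the nop instruction when it is fetched through the code segment (CS), but just a data value when it is read through the data segment (DS).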
Status Flag Registers
Each flag is a single bit, set to 1 or 0. They exist because the same bits can be read as signed or unsigned: as a signed byte, 0xFF = -1 (rather than 255).
z - zero flag, set when the result of the last operation is zero.
s - sign flag {for signed}, set when the most significant bit of the result is 1, i.e. the result is negative when interpreted as signed.
o - overflow flag {for signed}, set when the result of the last operation does not fit as a signed value, e.g. adding two positive numbers gives a negative result.
c - carry flag {for unsigned}, set when the last operation carries out of (or borrows into) the most significant bit, i.e. the result does not fit as an unsigned value.
EX - addq src,dest <-> t=a+b
CF set if carry out from most significant bit (unsigned overflow)
ZF set if t==0
SF set if t < 0 (as signed)
OF set if two’s complement (signed) overflow (a>0 && b>0 && t < 0) || (a<0 && b<0 && t>=0)
**NOT SET BY leaq
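The addq rules above can be modeled in a few lines of C. This is only a sketch of the flag logic, not how the CPU implements it; the struct `add_flags` and the function `flags_after_add` are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition codes. */
typedef struct {
    bool cf, zf, sf, of;
} add_flags;

/* Models the flags after addq src, dest where t = a + b (64-bit). */
add_flags flags_after_add(uint64_t a, uint64_t b) {
    uint64_t t = a + b;        /* wraps modulo 2^64, like the hardware */
    add_flags f;
    f.cf = t < a;              /* carry out of bit 63: unsigned overflow */
    f.zf = (t == 0);           /* result is zero */
    f.sf = (t >> 63) != 0;     /* top bit set: negative as signed */
    /* Signed overflow: a and b share a sign and t has the other sign,
       so the top bit of t differs from the top bit of both inputs. */
    f.of = (((a ^ t) & (b ^ t)) >> 63) != 0;
    return f;
}
```

For example, adding 1 to the largest unsigned value sets CF and ZF but not OF, while adding 1 to the largest signed value sets OF and SF but not CF. The same bits, two different stories.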
EX - cmpq src2,src1 <-> src1-src2 (the result is discarded, only the flags are kept)
CF set if borrow into most significant bit (unsigned comparison: src1 < src2)
ZF set if (src1-src2) == 0
SF set if (src1-src2) < 0 (as signed)
OF set if two’s complement (signed) overflow (s1>0 && s2<0 && (s1-s2) < 0) || (s1<0 && s2>0 && (s1-s2) > 0)
Comparing the result against 0 is the same as comparing the operands: (src1-src2) # 0 == (src1 # src2) -- where # is <, >, ==, !=, >=, or <=
EIP - Extended Instruction Pointer (%rip in 64-bit code)
It points to the next instruction to be executed.
──────────────────────────────────────────────────────────────────────────────────
Registers Overview
Full register name (64-bit) | 32-bit | 16-bit | 8-bit-low | 8-bit high | use in calling convention | callee-saved? |
---|---|---|---|---|---|---|
General Purpose Registers: | ||||||
%rax | %eax | %ax | %al | %ah | Return value (accumulator) | No |
%rbx | %ebx | %bx | %bl | %bh | – | Yes |
%rcx | %ecx | %cx | %cl | %ch | 4th function argument | No |
%rdx | %edx | %dx | %dl | %dh | 3rd function argument | No |
%rsi | %esi | %si | %sil | – | 2nd function argument | No |
%rdi | %edi | %di | %dil | – | 1st function argument | No |
%r8 | %r8d | %r8w | %r8b | – | 5th function argument | No |
%r9 | %r9d | %r9w | %r9b | – | 6th function argument | No |
%r10 | %r10d | %r10w | %r10b | – | – | No |
%r11 | %r11d | %r11w | %r11b | – | – | No |
%r12 | %r12d | %r12w | %r12b | – | – | Yes |
%r13 | %r13d | %r13w | %r13b | – | – | Yes |
%r14 | %r14d | %r14w | %r14b | – | – | Yes |
%r15 | %r15d | %r15w | %r15b | – | – | Yes |
Special-purpose registers: | ||||||
%rsp | %esp | %sp | %spl | – | Stack pointer | Yes |
%rbp | %ebp | %bp | %bpl | – | Base pointer | Yes |
%rip | %eip | %ip | – | – | Instruction pointer | * |
%rflags | %eflags | %flags | – | – | Flags and condition codes | No |
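The 32-bit, 16-bit, and 8-bit names in the table are views onto the same 64-bit register, not separate storage. Here is a small C sketch of that aliasing. The function names are invented for illustration, and the code just masks a 64-bit integer to mimic what each register name would show.

```c
#include <stdint.h>

/* Views of a 64-bit value, named after the %rax family. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }        /* low 32 bits */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }        /* low 16 bits */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 0-7    */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 8-15   */

/* Writing the 8-bit register leaves the other 56 bits untouched. */
uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~(uint64_t)0xFF) | value;
}

/* Writing the 32-bit register zeroes the upper 32 bits on x86-64,
   so the old contents of rax do not matter at all. */
uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax;  /* old value is discarded entirely */
    return (uint64_t)value;
}
```

The zeroing behavior of 32-bit writes is the reason `movzbl %al, %eax` is enough to clear the whole of %rax.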
──────────────────────────────────────────────────────────────────────────────────
Instructions!
Instructions are the basic commands that a CPU executes. They are the building blocks of a program. There are three main types of instructions:
- Computation: instructions operate on values (usually stored in registers). Each instruction names a source and a destination operand, with the flow being source -> destination (left -> right).
- Data movement: instructions move data between registers and memory. The source operand is listed first.
- Control flow: the CPU executes instructions in sequence. After each instruction the instruction pointer moves on to the next one, unless a jump, call, or ret sets it to a new address (ret pops that address off the stack).
*AT&T syntax is used here {{Intel syntax puts the destination before the source, like an assignment in most languages}}
Instructions
Instruction | Example syntax | C Equivalent |
---|---|---|
mov( src, dest); | movl %eax, %ebx | dest = src; |
lea( src, dest); | leaq (%rax,%rbx,2), %rcx | dest = address of src; here rcx = rax + rbx*2 |
add( src, dest); | addq %rax, %rbx | dest += src; |
sub( src, dest); | | dest -= src; |
imul( src, dest); | | dest *= src; |
inc( dest); | | dest++; |
dec( dest); | | dest--; |
sal( src, dest); | sal $2, %rax | dest <<= src; |
sar( src, dest); | | dest >>= src; (arithmetic, the sign bit is copied in) |
shr( src, dest); | | dest >>= src; (logical, zeros are shifted in) |
neg( dest); | | dest = -dest; |
not( dest); | | dest = ~dest; |
xor( src, dest); | | dest ^= src; |
and( src, dest); | | dest &= src; |
or( src, dest); | | dest |= src; |
cmp( src2, src1); | cmpq %rsi, %rcx | computes rcx - rsi and only sets the flags; a following jle jumps if rcx <= rsi |
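Two rows in this table are easy to mix up: sar and shr both look like `>>` in C. The sketch below spells out the difference, along with the leaq example from the table. The function names are invented for illustration. The asserts are there because C leaves shifts by 64 or more undefined, while the hardware would simply mask the count.

```c
#include <assert.h>
#include <stdint.h>

/* leaq (%rax,%rbx,2), %rcx : pure address arithmetic, no memory access. */
uint64_t lea_example(uint64_t rax, uint64_t rbx) {
    return rax + rbx * 2;
}

/* shr: logical right shift, vacated high bits are filled with 0. */
uint64_t shr(uint64_t x, unsigned n) {
    assert(n < 64);
    return x >> n;
}

/* sar: arithmetic right shift, vacated high bits copy the sign bit.
   Written with unsigned shifts because >> on a negative signed value
   is implementation-defined in C. */
int64_t sar(int64_t x, unsigned n) {
    assert(n < 64);
    if (x >= 0) {
        return (int64_t)((uint64_t)x >> n);
    }
    /* Invert, shift zeros in, invert back: the zeros become ones. */
    return (int64_t)~(~(uint64_t)x >> n);
}
```

With these definitions `sar(-8, 1)` is -4, while `shr` on the same bit pattern gives a huge positive number. That is the whole reason there are two right-shift instructions.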
Instruction Types
Suffix | Name | Size |
---|---|---|
b | byte | 1 byte == 8 bits |
w | word | 2 bytes == 16 bits |
l | long (double word) | 4 bytes == 32 bits |
q | quad word | 8 bytes == 64 bits |
Combinations Types {{ex. Move}}
Instruction | Source | Destination | Types |
---|---|---|---|
movl | $0x4050 | %eax | Immediate - Register |
movw | %bp | %sp | Register - Register |
movb | (%rdi, %rcx) | %al | Memory - Register |
movb | $-17 | (%rsp) | Immediate - Memory |
movq | %rax | -12(%rbp) | Register - Memory |
*A memory-to-memory transfer cannot be done in a single instruction; it takes one move into a register and a second move back out to memory.
Address Computation
%rdx = 0xf000; %rcx = 0x100
Expression | Address Computation | Address |
---|---|---|
0x8(%rdx) | 0xf000 + 0x8 | 0xf008 |
(%rdx,%rcx) | 0xf000 + 0x100 | 0xf100 |
(%rdx,%rcx,4) | 0xf000 + 0x100*4 | 0xf400 |
0x80(,%rdx,2) | 2*0xf000+0x80 | 0x1e080 |
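The rows above can be checked mechanically. Here is a minimal sketch of the D(Rb, Ri, S) computation in C. `effective_address` is a made-up name, and a missing base or index register is represented by passing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Computes Rb + Ri * S + D, the address named by the operand D(Rb, Ri, S).
   Pass 0 for a base or index register that is left out of the operand. */
uint64_t effective_address(uint64_t disp, uint64_t base,
                           uint64_t index, unsigned scale) {
    /* The hardware only encodes these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}
```

For instance, `effective_address(0x80, 0, 0xf000, 2)` reproduces the last row of the table, 0x80(,%rdx,2), and evaluates to 0x1e080.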
Reading Condition Codes
SetX Instructions:
Set low order byte of destination to 0 or 1 based on combinations of condition codes.
Does not alter remaining 7 bytes
SetX | Condition | Description |
---|---|---|
sete | Equal | Set if ZF=1 |
setne | Not equal | Set if ZF=0 |
setg | Greater | Set if ZF=0 and SF=OF |
setge | Greater or equal | Set if SF=OF |
setl | Less | Set if SF!=OF |
setle | Less or equal | Set if ZF=1 or SF!=OF |
seta | Above | Set if CF=0 and ZF=0 |
setae | Above or equal | Set if CF=0 |
setb | Below | Set if CF=1 |
setbe | Below or equal | Set if CF=1 or ZF=1 |
sets | Sign | Set if SF=1 |
setns | Not sign | Set if SF=0 |
EX - return x > y;
cmpq %rsi, %rdi # Compare x:y
setg %al # Set when >
movzbl %al, %eax # Zero rest of %rax
ret
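The signed and unsigned rows of the SetX table matter as soon as the operands can have their top bit set. Here is a small C sketch of the two flavors of `x > y`. The function names are invented, and the instruction named in each comment is what a compiler would typically pick, not a guarantee.

```c
/* Signed comparison: typically compiled to cmpq + setg. */
long greater_signed(long x, long y) {
    return x > y;
}

/* Unsigned comparison: typically compiled to cmpq + seta. */
long greater_unsigned(unsigned long x, unsigned long y) {
    return x > y;
}
```

Feeding both functions the same bit pattern shows the split: -1 is not greater than 1 as a signed value, but the same bits read as unsigned are the largest possible value, so the unsigned version returns 1.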
Jumping
Instruction (synonym) | Meaning | C example | Condition |
---|---|---|---|
jmp | Jump | break; | (Unconditional) |
je (jz) | Jump if equal (zero) | if (x == y) | ZF |
jne (jnz) | Jump if not equal (nonzero) | if (x != y) | !ZF |
jg (jnle) | Jump if greater | if (x > y),signed | !ZF && !(SF ^ OF) |
jge (jnl) | Jump if greater or equal | if (x >= y),signed | !(SF ^ OF) |
jl (jnge) | Jump if less | if (x < y),signed | SF ^ OF |
jle (jng) | Jump if less or equal | if (x <= y),signed | (SF ^ OF) || ZF |
ja (jnbe) | Jump if above | if (x > y),unsigned | !CF && !ZF |
jae (jnb) | Jump if above or equal | if (x >= y),unsigned | !CF |
jb (jnae) | Jump if below | if (x < y),unsigned | CF |
jbe (jna) | Jump if below or equal | if (x <= y),unsigned | CF || ZF |
js | Jump if sign bit | if (x < 0),signed | SF |
jns | Jump if not sign bit | if (x >= 0),signed | !SF |
jc | Jump if carry bit | N/A | CF |
jnc | Jump if not carry bit | N/A | !CF |
jo | Jump if overflow bit | N/A | OF |
jno | Jump if not overflow bit | N/A | !OF |
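To see why these flag combinations line up with the C comparisons, the sketch below models a cmpq and then evaluates a few of the jump conditions from the table. Everything here (`cmp_flags`, `compare`, the `takes_*` helpers) is a hypothetical name for illustration; the flag formulas are the ones for a 64-bit subtraction a - b.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the flags left behind by cmpq b, a. */
typedef struct {
    bool cf, zf, sf, of;
} cmp_flags;

/* Models cmpq b, a: computes a - b and keeps only the flags. */
cmp_flags compare(uint64_t a, uint64_t b) {
    uint64_t t = a - b;       /* wraps modulo 2^64 */
    cmp_flags f;
    f.cf = a < b;             /* borrow: a is below b as unsigned */
    f.zf = (t == 0);
    f.sf = (t >> 63) != 0;
    /* Signed overflow on subtraction: a and b have different signs
       and the result's sign differs from a's. */
    f.of = (((a ^ b) & (a ^ t)) >> 63) != 0;
    return f;
}

/* Jump conditions, copied from the table above. */
bool takes_jl(cmp_flags f) { return f.sf != f.of; }            /* SF ^ OF */
bool takes_jg(cmp_flags f) { return !f.zf && f.sf == f.of; }
bool takes_jb(cmp_flags f) { return f.cf; }
bool takes_ja(cmp_flags f) { return !f.cf && !f.zf; }
```

Comparing the bit pattern of -1 against 1 makes both `takes_jl` and `takes_ja` true: less as signed, above as unsigned. Picking jl versus jb is how the compiler records which interpretation the C code asked for.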
──────────────────────────────────────────────────────────────────────────────────
Conditional Statements
Conditional Branch
EX -
if (x > y) { return x-y; } else { return y-x; }
cmpq %rsi, %rdi # compare x:y {x in %rdi, y in %rsi}
jle .L4 # jump if x <= y
movq %rdi, %rax
subq %rsi, %rax # %rax = %rdi - %rsi {x-y}
ret
.L4: # x <= y
movq %rsi, %rax
subq %rdi, %rax # %rax = %rsi - %rdi {y-x}
ret
Conditional Expressions
val = Test ? Then : Else; EX - “val = x>y ? x-y : y-x;”
Goto
EX -
if (x <= y)
goto Else;
return x-y;
Else:
return y-x;
EX - the same expression compiled with a conditional move instead of a jump
movq %rdi, %rax # x
subq %rsi, %rax # result = x-y
movq %rsi, %rdx
subq %rdi, %rdx # eval = y-x
cmpq %rsi, %rdi # compare x:y
cmovle %rdx, %rax # if x <= y, result = eval
ret
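Written back out in C, the two compilation strategies look like this. It is only a sketch to show the shape of each version; the function names are invented, and which form a real compiler emits depends on its optimization settings.

```c
/* Branching version: only one of the two subtractions runs. */
long absdiff_branch(long x, long y) {
    if (x > y) {
        return x - y;
    }
    return y - x;
}

/* Conditional-move version: both results are computed up front,
   then one is selected, mirroring the cmovle listing above. */
long absdiff_select(long x, long y) {
    long result = x - y;
    long eval = y - x;
    if (x <= y) {
        result = eval;
    }
    return result;
}
```

Both return the same value for inputs whose difference fits in a long. The second form avoids a jump, but it only makes sense when both sides are cheap and safe to evaluate, since both always run.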
──────────────────────────────────────────────────────────────────────────────────
Conclusion
So, that's it for the basics of Assembly. I hope you found this helpful. Happy coding!