Ultimate Assembly Guide

──────────────────────────────────────────────────────────────────────────────────


Introduction

A few years back I did a deep dive into reverse engineering malware, and I wanted to share what I learned and refresh my knowledge of the Assembly language.

For those of you that don't know, Assembly language is a low-level programming language that provides a direct interface with the computer’s hardware: each Assembly instruction corresponds more or less one-to-one to a machine instruction that the CPU executes. Compilers translate high-level languages such as C down to this level. Of course, at the end of the day computers speak in 0s and 1s (binary!), and ons and offs (electric current!), but when you reverse engineer an executable or binary, you are most likely going to be looking at Assembly.

There are many other reasons why knowing some Assembly is useful for a computer nerd, such as debugging and optimizing code, or even programming embedded systems directly in Assembly.

So, let's get right to it!

──────────────────────────────────────────────────────────────────────────────────

Data Types (in C)

High-level programming languages use data types to describe values stored in memory, and those types affect the Assembly the compiler produces: the size of a type decides which instruction suffix and which register name are used. Knowing each type's size and suffix can be pretty handy later on.

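Bytes  Bits  C Data Types             Instruction Suffix  Suffix Name  Register Naming Scheme
1      8     char                     -b                  byte         ends in -l or -h (%al, %ah)
2      16    short                    -w                  word         ends in -x, or no prefix (%ax, %si)
4      32    int                      -l                  long         e- prefix (%eax)
8      64    long, pointer (address)  -q                  quad word    r- prefix (%rax)

*%r8-%r15 use suffixes instead: %r8b, %r8w, %r8d, %r8.
*float (4 bytes) and double (8 bytes) live in the separate SSE registers (%xmm0-%xmm15) and use their own instructions.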

──────────────────────────────────────────────────────────────────────────────────

Operand Types

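Immediate: constant integer data, e.g. $0x400 or $-533, encoded with 1, 2, or 4 bytes {{movabsq is the exception and takes a full 8-byte immediate}}
Register: one of 16 integer registers, e.g. %rax or %r13 {{%rsp is reserved for special use}}
Memory: consecutive bytes of memory (8 of them for a q instruction) at the address given by a register, e.g. (%rax)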

──────────────────────────────────────────────────────────────────────────────────

Address Modes

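* Translation Formula:

instr D(Rb,Ri,S), %rax   ->   the operand D(Rb,Ri,S) refers to Mem[Rb + Ri*S + D]
leaq D(Rb,Ri,S), %rax    ->   %rax = Rb + Ri*S + D   {{only leaq keeps the address itself}}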

D: Constant “displacement”- 1,2, or 4 byte value
Rb: Base Register
Ri: Index register {{except %rsp}}
S: Scale - 1, 2, 4, or 8

Possible Formats

Type       Example syntax      Value used
Register   %rbp                Contents of %rbp
Immediate  $0x4                0x4
Memory     0x4                 Value stored at address 0x4
           symbol_name         Value stored in global symbol_name
           symbol_name(%rip)   Value stored in global symbol_name, addressed relative to %rip
           (%rax)              Value stored at address %rax
           0x4(%rax)           Value stored at address %rax + 4
           (%rax,%rbx)         Value stored at address %rax + %rbx
           (%rax,%rbx,4)       Value stored at address %rax + %rbx*4
           0x18(%rax,%rbx,4)   Value stored at address %rax + 0x18 + %rbx*4

*Jump and function call instructions use the following formats:

Type       Example syntax       Target
Register   *%rax                Contents of %rax
Immediate  .L3                  Address of label .L3 (compiler-generated assembly)
           400410 or 0x400410   The given address
Memory     *0x200b96(%rip)      Value stored at address %rip + 0x200b96
           *(%r12,%rbp,8)       Value stored at address %r12 + %rbp*8 (other address modes work too)

──────────────────────────────────────────────────────────────────────────────────

Registers 101

A register is a small, very fast storage location inside the CPU. Registers are referred to by name rather than by a memory address, and they are much faster to access than main memory. The CPU uses them to hold the values it is currently working on, and they can be divided into the following classes.


General Purpose Registers:

EAX) Extended Accumulator Register
EBX) Extended Base Register
ECX) Extended Counter Register
EDX) Extended Data Register
ESI) Extended Source Index
EDI) Extended Destination Index
EBP) Extended Base Pointer
ESP) Extended Stack Pointer

Segment Registers:

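Segment registers (CS, DS, SS, ES, FS, GS) tell the CPU which segment of memory an access belongs to: code (CS), data (DS, ES), or stack (SS). The hexadecimal value 0x90 can represent either an instruction (nop) or a data value; the CPU treats it as an instruction when it is fetched through CS and the instruction pointer, and as data when it is read through a data segment. In 64-bit mode these segments are flat (base 0), and FS and GS are mostly used for thread-local storage.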

Status Flag Registers

Each flag is a single bit, set to 1 or 0. Whether a value counts as signed depends on the instruction that reads the flags: as a signed byte, FF = -1 (rather than 255).

z - zero flag, set when the result of the last operation is zero.
s - sign flag {for signed}, a copy of the most significant bit of the result, i.e. set when the result is negative as a signed number.
o - overflow flag {for signed}, set when the signed result does not fit, so its sign bit comes out wrong (e.g. adding two positive numbers gives a negative result).
c - carry flag {for unsigned}, set when the last operation carries out of (or, for subtraction, borrows into) the most significant bit, i.e. unsigned overflow.

EX - addq src,dest <-> t=a+b {a = dest, b = src; t is written back to dest}
CF set if carry out from most significant bit (unsigned overflow)
ZF set if t==0
SF set if t < 0 (as signed)
OF set if two’s complement (signed) overflow (a>0 && b>0 && t < 0) || (a<0 && b<0 && t>=0)

Note: leaq does not set any of these flags.

EX - cmpq src2,src1 <-> src1-src2 {the result is discarded; only the flags are set}
CF set if carry (borrow) out of the most significant bit (unsigned comparisons)
ZF set if (src1-src2) == 0
SF set if (src1-src2) < 0 (as signed)
OF set if two’s complement (signed) overflow (s1>=0 && s2<0 && (s1-s2) < 0) || (s1<0 && s2>=0 && (s1-s2) >= 0)
    With these flags, (src1-src2) # 0 gives the same answer as (src1 # src2), where # is <, >, ==, !=, >=, or <=, as long as the matching signed or unsigned flag test is used.

EIP - Extended Instruction Pointer (RIP in 64-bit code)
It points to the next instruction to be executed.


──────────────────────────────────────────────────────────────────────────────────

Registers Overview

(System V AMD64 calling convention, as used on Linux and macOS)

64-bit   32-bit   16-bit   8-bit low   8-bit high   Use in calling convention    Callee-saved?
General-purpose registers:
%rax     %eax     %ax      %al         %ah          Return value (accumulator)   No
%rbx     %ebx     %bx      %bl         %bh                                       Yes
%rcx     %ecx     %cx      %cl         %ch          4th function argument        No
%rdx     %edx     %dx      %dl         %dh          3rd function argument        No
%rsi     %esi     %si      %sil                     2nd function argument        No
%rdi     %edi     %di      %dil                     1st function argument        No
%r8      %r8d     %r8w     %r8b                     5th function argument        No
%r9      %r9d     %r9w     %r9b                     6th function argument        No
%r10     %r10d    %r10w    %r10b                                                 No
%r11     %r11d    %r11w    %r11b                                                 No
%r12     %r12d    %r12w    %r12b                                                 Yes
%r13     %r13d    %r13w    %r13b                                                 Yes
%r14     %r14d    %r14w    %r14b                                                 Yes
%r15     %r15d    %r15w    %r15b                                                 Yes
Special-purpose registers:
%rsp     %esp     %sp      %spl                     Stack pointer                Yes
%rbp     %ebp     %bp      %bpl                     Base pointer                 Yes
%rip     %eip     %ip                               Instruction pointer          n/a
%rflags  %eflags  %flags                            Flags and condition codes    No

──────────────────────────────────────────────────────────────────────────────────

Instructions!

Instructions are the basic commands that a CPU executes. They are the building blocks of a program. There are three main types of instructions:

  1. Computation: instructions operate on values (usually held in registers). Each instruction names a source and a destination operand, and data flows from source to destination (left -> right).
  2. Data movement: instructions move data between registers and memory. The source operand is listed first.
  3. Control flow: the CPU executes instructions in sequence, and after each instruction the instruction pointer advances to the next one. Jumps, calls, and returns set it to a new value instead (call pushes the return address onto the stack, and ret pops it back off).

*AT&T syntax is used throughout {{Intel syntax puts the destination before the source, e.g. mov rbx, rax instead of movq %rax, %rbx}}

Instructions

Instruction       Example syntax              C equivalent
mov src, dest     movq %rax, %rbx             dest = src;
lea src, dest     leaq (%rax,%rbx,2), %rcx    dest = x + y*k;  (address arithmetic only, no memory read)
add src, dest     addq %rax, %rbx             dest += src;
sub src, dest     subq %rax, %rbx             dest -= src;
imul src, dest    imulq %rax, %rbx            dest *= src;
inc dest          incq %rax                   dest++;
dec dest          decq %rax                   dest--;
sal src, dest     salq $2, %rax               dest <<= src;
sar src, dest     sarq $2, %rax               dest >>= src;  (arithmetic shift: copies the sign bit)
shr src, dest     shrq $2, %rax               dest >>= src;  (logical shift: fills with zeros)
neg dest          negq %rax                   dest = -dest;
not dest          notq %rax                   dest = ~dest;
xor src, dest     xorq %rax, %rbx             dest ^= src;
and src, dest     andq %rax, %rbx             dest &= src;
or src, dest      orq %rax, %rbx              dest |= src;
cmp src2, src1    cmpq %rsi, %rcx             compute %rcx - %rsi and set flags only;
                                              a following jle jumps if %rcx <= %rsi

Instruction Size Suffixes

Suffix  Name         Size
b       byte         1 byte == 8 bits
w       word         2 bytes == 16 bits
l       double word  4 bytes == 32 bits {{AT&T calls this "long"}}
q       quad word    8 bytes == 64 bits

Operand Combinations {{ex. mov}}

Instruction  Source       Destination  Types
movl         $0x4050      %eax         Immediate - Register
movw         %bp          %sp          Register - Register
movb         (%rdi,%rcx)  %al          Memory - Register
movb         $-17         (%rsp)       Immediate - Memory
movq         %rax         -12(%rbp)    Register - Memory

*You cannot do a memory-to-memory transfer in a single instruction; load the value into a register first, then store it.

Address Computation

%rdx = 0xf000, %rcx = 0x100

Expression      Address Computation   Address
0x8(%rdx)       0xf000 + 0x8          0xf008
(%rdx,%rcx)     0xf000 + 0x100        0xf100
(%rdx,%rcx,4)   0xf000 + 0x100*4      0xf400
0x80(,%rdx,2)   2*0xf000 + 0x80       0x1e080

Reading Condition Codes

SetX Instructions:

Set the low-order byte of the destination to 0 or 1 based on combinations of condition codes.
The remaining 7 bytes are not altered.

SetX    Condition                   Description
sete    Equal / zero                Set if ZF=1
setne   Not equal / not zero        Set if ZF=0
setg    Greater (signed)            Set if ZF=0 and SF=OF
setge   Greater or equal (signed)   Set if SF=OF
setl    Less (signed)               Set if SF!=OF
setle   Less or equal (signed)      Set if ZF=1 or SF!=OF
seta    Above (unsigned)            Set if CF=0 and ZF=0
setae   Above or equal (unsigned)   Set if CF=0
setb    Below (unsigned)            Set if CF=1
setbe   Below or equal (unsigned)   Set if CF=1 or ZF=1
sets    Sign (negative)             Set if SF=1
setns   Not sign (nonnegative)      Set if SF=0

EX - return x > y;   {x in %rdi, y in %rsi}

cmpq   %rsi, %rdi   # Compare x:y
setg   %al          # Set when >
movzbl %al, %eax    # Zero rest of %rax
ret

Jumping

(For the comparisons, assume the flags were set by cmpq y, x.)

Instruction   Mnemonic                       C example                 Flags
jmp           Jump (unconditional)           goto label; / break;      (none)
je (jz)       Jump if equal (zero)           if (x == y)               ZF
jne (jnz)     Jump if not equal (nonzero)    if (x != y)               !ZF
jg (jnle)     Jump if greater                if (x > y), signed        !ZF && !(SF ^ OF)
jge (jnl)     Jump if greater or equal       if (x >= y), signed       !(SF ^ OF)
jl (jnge)     Jump if less                   if (x < y), signed        SF ^ OF
jle (jng)     Jump if less or equal          if (x <= y), signed       (SF ^ OF) || ZF
ja (jnbe)     Jump if above                  if (x > y), unsigned      !CF && !ZF
jae (jnb)     Jump if above or equal         if (x >= y), unsigned     !CF
jb (jnae)     Jump if below                  if (x < y), unsigned      CF
jbe (jna)     Jump if below or equal         if (x <= y), unsigned     CF || ZF
js            Jump if sign bit               if (x < 0), signed        SF
jns           Jump if not sign bit           if (x >= 0), signed       !SF
jc            Jump if carry bit              N/A                       CF
jnc           Jump if not carry bit          N/A                       !CF
jo            Jump if overflow bit           N/A                       OF
jno           Jump if not overflow bit       N/A                       !OF

──────────────────────────────────────────────────────────────────────────────────

Conditional Statements

Conditional Branch

EX -

if (x > y) { return x-y; } else { return y-x; }      {x in %rdi, y in %rsi}
        cmpq %rsi, %rdi   # compare x:y
        jle .L4           # if x <= y, jump to .L4
        movq %rdi, %rax
        subq %rsi, %rax   # %rax = %rdi - %rsi    {x-y}
        ret
    .L4:                  # x <= y
        movq %rsi, %rax
        subq %rdi, %rax   # %rax = %rsi - %rdi    {y-x}
        ret

Conditional Expressions

val = Test ? Then_Expr : Else_Expr;        e.g. “val = x>y ? x-y : y-x;”

Goto

EX -

    if (x <= y)
        goto Else;
    return x-y;
Else:
    return y-x;

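EX - the same function compiled with a conditional move (cmovle) instead of a jump:

movq   %rdi, %rax    # result = x
subq   %rsi, %rax    # result = x-y
movq   %rsi, %rdx
subq   %rdi, %rdx    # eval = y-x
cmpq   %rsi, %rdi    # compare x:y
cmovle %rdx, %rax    # if x <= y, result = eval
ret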

──────────────────────────────────────────────────────────────────────────────────

Conclusion

So, that's it for the basics of Assembly. I hope you found this helpful. Happy coding!