Beginner’s guide to x86-64 Assembly with NASM. Learn registers, stack, syscalls, and why Assembly is essential for cybersecurity, reverse engineering, and exploit development

Most people have conflicting opinions about learning Assembly. They often ask: “Why learn Assembly when modern compilers can generate much better optimized code?”
It’s a fair question. Compilers today are incredibly smart. So why bother?
The simple answer is this: Assembly helps you see what the computer is actually doing. Compilers write the code, but Assembly reveals the truth behind it.
It makes debugging easier because you can understand the real machine instructions when optimized code looks confusing.
In cybersecurity, it's even more valuable. Reverse engineering malware, analyzing binaries, and building exploits all depend on reading and writing Assembly.
Even if you rarely write it, learning Assembly removes the mystery of how programs really work.
The most important part? You mainly need to learn how to read Assembly, not how to write it perfectly. This skill alone makes debugging easier and unlocks reverse engineering.
NASM, or the Netwide Assembler, is an 80x86 and x86-64 assembler designed for portability. It works by translating assembly language source files into machine-readable object files, which are then linked into executable programs.
NASM is a common choice for learning because it supports a wide range of output formats, so the same core source can be assembled for Windows, Linux, or macOS. The instructions stay the same, although OS-specific parts such as system calls still need to be adapted.
Before we begin, we must learn the basics of the CPU. We will cover essential concepts such as: What are registers? What is the stack? What are flags? and much more.
A register is a small, high-speed storage location inside the CPU. Unlike your computer's RAM, which is external to the CPU, registers are built into the hardware of the CPU itself. In NASM we refer to these registers by name, such as eax, ebx, or rcx, and nearly every operation in a program uses them.
This is best understood through an example:
    mov eax, 10
    mov ebx, 20
    add eax, ebx
On the first line, we place the number 10 into the EAX register. On the second line, we place the number 20 into the EBX register. Finally, the third line adds the value in EBX to the value already in EAX. Because 10 plus 20 is 30, the EAX register now holds the final result of 30.
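To try the three instructions on a real machine, they can be wrapped in a minimal Linux program. This is only a sketch: the file name add_demo.asm and the idea of reporting the result through the process exit status are assumptions made for illustration, and sections, _start, and syscall are explained later in this guide.

```nasm
; add_demo.asm (hypothetical file name): the three-line example as a runnable program
section .text
global _start

_start:
    mov eax, 10         ; EAX = 10
    mov ebx, 20         ; EBX = 20
    add eax, ebx        ; EAX = EAX + EBX = 30

    mov edi, eax        ; use the result as the exit status (30)
    mov eax, 60         ; syscall number 60 = exit on Linux x86-64
    syscall
```

After assembling and linking it (the steps are covered near the end of this guide), running the program and then `echo $?` in the shell should show 30.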
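When we talk about the size of a register, we mean how much data it can hold at one time.
There have been a few generations of sizes:
8-bit: holds values up to 255 (0xFF).
16-bit: holds values up to 65,535 (0xFFFF).
32-bit: holds values up to 4,294,967,295 (0xFFFFFFFF).
64-bit: holds values up to 18,446,744,073,709,551,615 (0xFFFFFFFFFFFFFFFF).
The "Backward Compatibility" Trick: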
Modern 64-bit CPUs are backward compatible. This means a 64-bit register (like RAX) can be partially used as a 32-bit (EAX), 16-bit (AX), or 8-bit (AL / AH) register. This is crucial for running older software.
For example, as the image above shows, the RAX register is 64 bits (8 bytes) wide. But here's the trick: it's divided into smaller pieces, all sitting at the low-order (right-hand) end. EAX is the low 32 bits, AX is the low 16 bits, AL is the lowest 8 bits, and AH is the 8 bits just above AL.
It is important to note: if you modify EAX, the upper 32 bits of RAX get zeroed out. But if you modify AX, AL, or AH, only those bits change and the rest of RAX stays untouched! So think of each name as a differently sized view of the same variable.
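The difference between writing AL and writing EAX can be checked with a short program. This is a sketch for Linux x86-64; the constant loaded into RAX and the use of the exit status to report the result are arbitrary choices made for illustration.

```nasm
section .text
global _start

_start:
    mov rax, 0x1122334455667788
    mov al, 0xFF        ; only the low byte changes: RAX = 0x11223344556677FF
    mov rbx, rax
    shr rbx, 56         ; RBX = 0x11, so the top byte of RAX survived

    mov eax, 1          ; writing EAX zeroes the upper 32 bits: RAX = 1
    shr rax, 32         ; RAX = 0, because the upper half is now empty

    add rbx, rax        ; RBX = 0x11 + 0 = 17
    mov edi, ebx        ; exit status = 17
    mov eax, 60         ; exit
    syscall
```

The exit status is 17 (0x11). If writing EAX had left the upper half of RAX alone, the final add would have produced a different value.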
You can use most of these registers for almost anything, but historically each one has a "favorite" job.
Here is the master list:
RIP (or EIP in 32-bit) holds the memory address of the next instruction to be executed. The CPU automatically updates it after each instruction. You never write to it directly. Instead, you change it with jump, call, and ret instructions (which we'll cover later).
RSP (Stack Pointer) points to the top of the stack. When you push data, RSP goes down; when you pop, it goes up. It automatically adjusts with push and pop instructions. We'll discuss the stack in detail later.
RBP (Base Pointer) acts as a fixed reference for the current function's stack frame. It helps you access local variables and parameters at consistent offsets. You'll often see it used with instructions like push rbp and mov rbp, rsp at the start of a function. We will dive deeper into the stack too.
The flags register isn't like the other registers: it does not hold a single number. Instead, it's a collection of boolean values called flags. These flags are automatically set or cleared by arithmetic and comparison operations. Let's look at the most important ones you'll encounter.
CF = 1 if an unsigned addition exceeds the maximum value that can fit in the register.
For example, AL (8 bits) can hold a maximum unsigned value of 255:
    mov al, 100
    add al, 200
Here we place 100 in AL. On the second line, we add 200 to AL, and the true sum is 300. Since AL cannot hold 300, it wraps around to 44 (300 - 256) and the CF flag is set to 1.
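One way to observe CF is to copy it into a register with setc right after the addition. The sketch below assumes Linux x86-64 and reports the flag through the exit status, which is just a convenient way to see a small number without printing code.

```nasm
section .text
global _start

_start:
    mov al, 100
    add al, 200         ; 300 does not fit in 8 bits: AL = 44, CF = 1
    setc dil            ; DIL = 1 if CF is set, otherwise 0
    movzx edi, dil      ; zero-extend so RDI holds just that 0 or 1

    mov eax, 60         ; exit(CF)
    syscall
```

The exit status is 1. Changing the second operand to 100 gives a sum of 200, which fits in AL, so the status becomes 0.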
The Parity Flag is simple: the CPU counts the number of 1 bits in the lowest 8 bits of the result. PF = 1 if that count is even, and PF = 0 if it is odd.
Important: PF only looks at the lowest 8 bits. The rest of the register is ignored!
    mov al, 5           ; AL = 00000101 (mov itself does not change any flags)
    add al, 0           ; AL stays 5, but add updates the flags from the result
                        ; Count the 1s: bits 0 and 2 are 1 → 2 ones
                        ; 2 is even → PF = 1
First we set AL to 5, which in binary is 00000101. The mov instruction does not touch the flags, so we follow it with add al, 0, which leaves the value unchanged but makes the CPU update the flags. The result has 2 bits set to 1, which is an even count, so the PF flag is set to 1.
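The "lowest 8 bits only" rule is easy to miss, so here is a small sketch that tests it. It assumes Linux x86-64 and uses the exit status as output; the value 0x0103 is chosen so that the whole number has an odd count of 1 bits while its low byte has an even count.

```nasm
section .text
global _start

_start:
    mov eax, 0x0103     ; binary 1_0000_0011: three 1 bits in total
    add eax, 0          ; an arithmetic instruction, so the flags are updated
                        ; low byte is 0x03 = 00000011 → two 1 bits → PF = 1
    setp dil            ; DIL = PF
    movzx edi, dil

    mov eax, 60         ; exit(PF)
    syscall
```

The exit status is 1, showing that the extra 1 bit in bit 8 was ignored.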
The Overflow Flag is used for signed arithmetic. It's set to 1 when a signed operation produces a result too big or too small to fit in the register.
Think of it like this: signed numbers have a limited range. For 8 bits, that's -128 to +127. If you go beyond that, OF tells you something went wrong.
Example: Overflow!
    mov al, 127
    add al, 1
Here we place 127 in AL, the maximum value for 8-bit signed integers. Adding 1 should give 128, but that's outside the signed range (-128 to +127). So AL wraps to -128, and the CPU sets OF = 1 to flag the signed overflow. CF = 0 since 128 fits in the unsigned range, SF = 1 because the result is negative, and ZF = 0 since it's not zero.
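All four flags from this example can be captured with the setcc family and packed into one number. This is a sketch for Linux x86-64; the bit layout (OF=8, SF=4, ZF=2, CF=1) is invented here purely so the flags fit into an exit status.

```nasm
section .text
global _start

_start:
    mov al, 127
    add al, 1               ; AL = 0x80: -128 signed, 128 unsigned

    seto bl                 ; BL  = OF (expected 1)
    sets cl                 ; CL  = SF (expected 1)
    setz dl                 ; DL  = ZF (expected 0)
    setc sil                ; SIL = CF (expected 0)

    ; pack as OF*8 + SF*4 + ZF*2 + CF
    movzx edi, bl
    shl edi, 3
    movzx ecx, cl
    lea edi, [rdi + rcx*4]
    movzx edx, dl
    lea edi, [rdi + rdx*2]
    movzx esi, sil
    add edi, esi

    mov eax, 60             ; exit(packed flags)
    syscall
```

The exit status is 12 (8 + 4), matching OF = 1, SF = 1, ZF = 0, CF = 0. The setcc instructions do not modify the flags themselves, which is why all four can be read back to back.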
ZF = 1 if the result of an operation is zero. ZF = 0 if the result is anything else.
    mov al, 5
    sub al, 5
Here we place 5 in AL. On the next line, we subtract 5 from AL, which gives us 0. Since the result is zero, the CPU sets ZF = 1.
SF = 1 if the result is negative (its highest bit is 1). SF = 0 if the result is zero or positive.
    mov al, 10
    sub al, 20
Here we place 10 in AL. On the next line, we subtract 20 from AL, which gives us -10. Since the result is negative, the CPU sets SF = 1.
The stack is a region of memory that works in LIFO (Last In, First Out) order. It is used for storing the return address when a function is called, passing arguments to functions, and storing local variables.
Here's the tricky part: the stack grows downward in memory. This means when you add something to the stack, the stack pointer (RSP) moves to a lower memory address.
RSP points to the top of the stack. So when we push data, RSP decreases (moves down). When we pop data, RSP increases (moves up).
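Example:
Let's say RSP starts at 0x1000.
    push rax
    push rbx

    pop rcx
    pop rdx
push rax. RSP starts at 0x1000, so it updates to 0x1000 - 8 = 0x0FF8. The value of RAX is stored at address 0x0FF8.
push rbx. RSP is at 0x0FF8, so it updates to 0x0FF8 - 8 = 0x0FF0. The value of RBX is stored at address 0x0FF0.
pop rcx. RSP is at 0x0FF0, so it loads the value from 0x0FF0 into RCX, then RSP updates to 0x0FF0 + 8 = 0x0FF8. RCX now holds the value that was pushed from RBX.
pop rdx. RSP is at 0x0FF8, so it loads the value from 0x0FF8 into RDX, then RSP updates to 0x0FF8 + 8 = 0x1000 (back to where we started). RDX now holds the value that was pushed from RAX.
This is where the magic happens:
When the CPU executes a call instruction, it does two things:
It pushes the return address (the address of the instruction right after the call) onto the stack.
It sets RIP to the address of the function being called.
When the CPU executes a ret instruction, it does two things:
It pops the return address off the stack into RIP.
Execution then continues from that address, right after the original call.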
When the program first starts, we are at address 0x1000. The CPU executes whatever is there and moves on to the next instruction, whose address is held in RIP.
Now we are at 0x2000. The CPU executes the instruction at this address and again moves on based on RIP. Notice how RIP is automatically updated to point at the next instruction.
Now we are at 0x3000, where we hit a `CALL` instruction, which does two things: it pushes the address of the instruction after the `CALL` (the return address) onto the stack, and it sets RIP to the address of my_function.
Now we are inside my_function at 0xA000. Notice that RIP was changed by the CPU as well. We execute whatever is at this address and go to the next instruction.
We reach 0xB000, execute whatever is there, and go to the next instruction.
Now we are at `0xC000`, which holds a `RET` instruction. It pops the return address from the stack into RIP.
After the return address has been popped from the stack, execution continues right after the `CALL` and the flow returns to normal.
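The walkthrough can be verified from inside the called function: right after a call, the top of the stack should hold the address of the instruction that follows the call, and RSP should be 8 bytes lower than before. The sketch below assumes Linux x86-64; the label after_call, the use of RBX to remember the old RSP, and the exit-status encoding are illustrative choices.

```nasm
section .text
global _start

_start:
    mov rbx, rsp                ; remember RSP as it was before the call
    call my_function            ; pushes the address of after_call, jumps to my_function
after_call:
    mov edi, eax                ; exit status = checks passed (bit 0 and bit 1)
    mov eax, 60
    syscall

my_function:
    xor eax, eax                ; EAX = 0, no checks passed yet
    lea rcx, [rel after_call]   ; RCX = address of the instruction after the call
    cmp [rsp], rcx              ; is the return address on top of the stack?
    jne .done
    add eax, 1                  ; check 1 passed
    lea rdx, [rsp + 8]
    cmp rdx, rbx                ; is RSP exactly 8 bytes below its old value?
    jne .done
    add eax, 2                  ; check 2 passed
.done:
    ret                         ; pops the return address into RIP
```

The exit status is 3, meaning both checks passed: call pushed exactly one 8-byte return address, and ret used it to get back to after_call.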
When a function is called, it needs to set up its own stack frame. This is done through the prologue (setup) and epilogue (cleanup).
At the start of a function, we save the current base pointer (RBP) and set it to the current stack pointer (RSP). This creates a fixed reference point for accessing local variables and parameters.
    my_function:
        push rbp
        mov rbp, rsp
        sub rsp, 32
        ; ... function body ...
What's happening:
push rbp – Save the caller's RBP on the stack (so we can restore it later)
mov rbp, rsp – Set RBP to point to the current stack top (our new frame base)
sub rsp, 32 – Reserve space for local variables (grow stack down)
Before returning, we restore the stack and base pointer to their original state.
    my_function:
        push rbp
        mov rbp, rsp
        sub rsp, 32
        ; ... function body ...
        mov rsp, rbp    ; Restore RSP to where RBP is (deallocate locals)
        pop rbp         ; Restore the caller's RBP
        ret             ; Return to caller
What's happening:
mov rsp, rbp – Set RSP back to where RBP is (effectively deallocating local variables)
pop rbp – Restore the caller's RBP from the stack
ret – Pop return address and jump back
RSP moves constantly as we push and pop data, making it hard to access local variables. RBP acts as a fixed reference point that stays constant during the function, so we can always access variables at predictable offsets like [rbp-8]. We also save the caller's RBP at the start and restore it at the end so we don't corrupt the caller's stack frame. This creates a chain that allows nested functions to return properly and helps debuggers trace the call stack.
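To show the prologue and epilogue doing real work, here is a sketch of a function that keeps its two arguments in local variables at [rbp-8] and [rbp-16]. The function name add_with_locals and the values 20 and 22 are made up for this example, and the program assumes Linux x86-64.

```nasm
section .text
global _start

_start:
    mov edi, 20                 ; first argument
    mov esi, 22                 ; second argument
    call add_with_locals
    mov edi, eax                ; exit status = returned sum
    mov eax, 60
    syscall

; hypothetical helper: returns its first argument plus its second
add_with_locals:
    push rbp                    ; prologue: save the caller's RBP
    mov rbp, rsp                ; RBP = base of our frame
    sub rsp, 16                 ; room for two 8-byte locals

    mov [rbp - 8], rdi          ; local 1 = first argument
    mov [rbp - 16], rsi         ; local 2 = second argument
    mov rax, [rbp - 8]
    add rax, [rbp - 16]         ; RAX = local 1 + local 2

    mov rsp, rbp                ; epilogue: drop the locals
    pop rbp                     ; restore the caller's RBP
    ret
```

The exit status is 42. The offsets from RBP stay valid for the whole function body, no matter how RSP moves in between.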
There are tons of instructions you might encounter, but I'll explain the most common ones you'll see when disassembling programs or writing simple assembly code. For a full list, check out this.
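mov is the basic data-movement instruction. It copies data from a source to a destination. It does not do math, and it does not change the source.
mov destination, source
Rules:
Both operands cannot be memory: mov [rbx], [rcx] is invalid.
Both operands must be the same size: mov eax, bl is invalid (32-bit ← 8-bit).
Examples:
    mov eax, 10         ; EAX = 10
    mov rax, rbx        ; RAX = RBX (both operands are 64-bit)
    mov rax, [0x1000]   ; RAX = the 8 bytes stored at address 0x1000
The last line takes the value stored at memory address 0x1000 and copies it into RAX, like dereferencing a pointer in C or C++.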
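Because mov cannot copy memory to memory directly, moving a value between two variables takes a hop through a register. The following sketch shows that pattern on Linux x86-64; the variable names value and copy are hypothetical, and default rel is used so the labels are addressed relative to RIP.

```nasm
default rel                 ; address labels RIP-relative

section .data
    value dq 42             ; source variable
    copy  dq 0              ; destination variable

section .text
global _start

_start:
    mov rax, [value]        ; load: memory → register
    mov [copy], rax         ; store: register → memory
    mov rdi, [copy]         ; read the copy back to prove the store happened

    mov eax, 60             ; exit(copy)
    syscall
```

The exit status is 42, the value that travelled from value to copy by way of RAX.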
ADD and SUB are the basic arithmetic instructions in x86.
add destination, source
This means destination = destination + source.
destination can be a register or memory.
source can be a register, memory, or immediate (constant).
As with mov, the two operands cannot both be memory.
The same rules apply to sub, which computes destination = destination - source.
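The operand rules above allow add and sub to work directly on memory. Here is a brief sketch for Linux x86-64; the variable name counter is an assumption, and the qword keyword tells NASM the operand size when no register is there to imply it.

```nasm
default rel

section .data
    counter dq 40

section .text
global _start

_start:
    add qword [counter], 5  ; memory destination, immediate source: 45
    mov rax, 3
    sub [counter], rax      ; memory destination, register source: 42
    mov rdi, [counter]

    mov eax, 60             ; exit(counter)
    syscall
```

The exit status is 42. Leaving out qword in the first instruction makes NASM reject the line, because it cannot tell how many bytes of memory to update.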
LEA (Load Effective Address) calculates an address and stores it in a register. It does NOT access memory; it just does the math.
Syntax
lea destination, source
This means destination = address calculated from source.
Examples:
    lea rax, [rbx+8]        ; RAX = RBX + 8 (just math, no memory access)
    lea rax, [rbx+rcx*4]    ; RAX = RBX + RCX * 4 (array indexing)
    lea rax, [rip+10]       ; RAX = RIP + 10 (position-independent addressing)
mov copies the actual value from memory or a register to the destination. lea just calculates the address; it doesn't access memory at all.
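The contrast between lea and mov is clearest when both use the same address expression. In this sketch (Linux x86-64; the array contents and the index 2 are arbitrary), lea computes where array[2] lives and mov then reads it.

```nasm
default rel

section .data
    array dd 10, 20, 30, 40     ; four 32-bit integers

section .text
global _start

_start:
    lea rbx, [array]            ; RBX = address of array (nothing is read)
    mov ecx, 2                  ; index
    lea rsi, [rbx + rcx*4]      ; RSI = address of array[2] (still nothing read)
    mov edi, [rsi]              ; this one reads memory: EDI = 30

    mov eax, 60                 ; exit(array[2])
    syscall
```

The exit status is 30. Replacing the final mov with lea edi, [rsi] would put the low 32 bits of the address in EDI instead of the array element.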
XOR performs a bitwise exclusive OR operation. It's very important in shellcode because it can zero registers without creating null bytes.
xor destination, source
This means destination = destination ^ source (bitwise XOR).
Flags affected: ZF, SF, and PF are set from the result; CF and OF are always cleared.
    xor rax, rax    ; RAX = 0 (clears register, no null bytes!)
    xor rbx, 0xFF   ; Flip the lowest 8 bits of RBX
    xor rax, rcx    ; RAX = RAX ^ RCX
TEST performs a bitwise AND between two operands but does NOT store the result. It only sets flags.
test operand1, operand2
This means operand1 & operand2 (bitwise AND) but the result is discarded.
Flags affected: ZF, SF, and PF are set from the result; CF and OF are always cleared.
    test rax, rax   ; Check if RAX is zero
    jz zero_label   ; Jump if RAX == 0

    test rax, 0x01  ; Check if bit 0 is set
    jnz bit_set     ; Jump if bit 0 = 1

    test rax, rbx   ; Check if RAX & RBX is zero
    jz zero_result  ; Jump if no common bits
CMP compares two values by subtracting them behind the scenes. It sets flags but doesn't store the result.
cmp operand1, operand2
This means operand1 - operand2 (subtraction) but the result is discarded. Operands are unchanged.
Flags affected: ZF, SF, CF, OF, and PF, exactly as sub would set them.
Examples:
    cmp rax, 10         ; Compare RAX with 10
    cmp rax, rbx        ; Compare RAX with RBX
    cmp qword [rsp], 5  ; Compare the 64-bit value at RSP with 5 (a size keyword is required here)
JMP is the unconditional jump instruction. It transfers control to another location in the program.
Syntax
jmp destination
This means RIP = destination (jump to that address).
Flags affected: None.
    jmp loop_start  ; Jump to loop_start label
    jmp 0x00401000  ; Jump to specific memory address
    jmp [rsp]       ; Jump to address stored at RSP
Conditional jumps check flags (set by CMP or TEST) and jump if the condition is true.
Common Jcc instructions:
    je destination      ; Jump if equal
    jne destination     ; Jump if not equal
    jg destination      ; Jump if greater (signed)
    jl destination      ; Jump if less (signed)
    jge destination     ; Jump if greater or equal (signed)
    jle destination     ; Jump if less or equal (signed)
    ja destination      ; Jump if above (unsigned)
    jb destination      ; Jump if below (unsigned)
    jae destination     ; Jump if above or equal (unsigned)
    jbe destination     ; Jump if below or equal (unsigned)
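To see a few of these jumps in action, here is a minimal sketch that compares the same two values as signed and as unsigned numbers, and then uses xor and test for a zero check. It assumes Linux x86-64; the label names and the bit values collected in EDI are invented for the demonstration.

```nasm
section .text
global _start

_start:
    xor edi, edi            ; EDI = 0, collects the results
    mov rax, -1             ; bit pattern 0xFFFFFFFFFFFFFFFF
    mov rbx, 1

    cmp rax, rbx
    jge .not_less           ; signed view: -1 >= 1 is false, so fall through
    or edi, 1               ; bit 0: RAX is less than RBX when read as signed
.not_less:
    cmp rax, rbx            ; compare again, since "or" changed the flags
    jbe .not_above          ; unsigned view: 0xFFFF... <= 1 is false, fall through
    or edi, 2               ; bit 1: RAX is above RBX when read as unsigned
.not_above:
    xor ecx, ecx            ; RCX = 0
    test rcx, rcx           ; sets ZF because RCX & RCX is zero
    jnz .done
    or edi, 4               ; bit 2: the zero check succeeded
.done:
    mov eax, 60             ; exit(EDI)
    syscall
```

The exit status is 7. The same bit pattern is "less" for the signed jumps and "above" for the unsigned ones, which is why picking the right family of Jcc matters.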
When writing functions in x64 assembly, you need to follow specific rules so your code works correctly with other functions (and with C/C++ code). This is called the calling convention.
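A calling convention is a set of rules that defines:
How arguments are passed to a function (which registers, and when the stack is used).
Where the return value is placed.
Which registers a function must preserve and which it may overwrite.
How the stack must be aligned when a function is called.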
When you call a function in Linux x86-64, arguments are passed using specific registers. The first argument goes into RDI, the second into RSI, the third into RDX, the fourth into RCX, the fifth into R8, and the sixth into R9. Any additional arguments beyond the sixth are passed on the stack in right-to-left order. This calling convention is designed to minimize memory access and keep most function parameters in fast CPU registers.
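The Linux register order can be exercised with a function that takes seven arguments, so the last one has to travel on the stack. This is a sketch under stated assumptions: the function name sum7 and the values 1 to 7 are made up, and the program is a standalone Linux x86-64 executable that reports the sum through its exit status.

```nasm
section .text
global _start

_start:
    mov edi, 1              ; argument 1 → RDI
    mov esi, 2              ; argument 2 → RSI
    mov edx, 3              ; argument 3 → RDX
    mov ecx, 4              ; argument 4 → RCX
    mov r8d, 5              ; argument 5 → R8
    mov r9d, 6              ; argument 6 → R9
    sub rsp, 8              ; padding so RSP is 16-byte aligned at the call
    push 7                  ; argument 7 → stack
    call sum7
    add rsp, 16             ; remove the stack argument and the padding

    mov edi, eax            ; exit status = sum
    mov eax, 60
    syscall

; hypothetical function: returns the sum of its seven arguments
sum7:
    lea rax, [rdi + rsi]
    add rax, rdx
    add rax, rcx
    add rax, r8
    add rax, r9
    add rax, [rsp + 8]      ; [rsp] is the return address, [rsp+8] is argument 7
    ret
```

The exit status is 28 (1 + 2 + ... + 7). Inside sum7 the stack argument sits just above the return address that call pushed.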
On Windows x86-64, the calling convention is different. The first argument is placed in RCX, the second in RDX, the third in R8, and the fourth in R9. If there are more arguments, they are passed on the stack in right-to-left order. The caller must always reserve 32 bytes of shadow space on the stack before making a function call, even if fewer arguments are used. This space is used by the callee for register spilling and maintains a consistent calling environment.
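Example:
    sub rsp, 40             ; 32 bytes of shadow space (REQUIRED!) + 8 bytes for arg 5
    mov rcx, 5              ; arg 1
    mov rdx, 10             ; arg 2
    mov r8, 15              ; arg 3
    mov r9, 20              ; arg 4
    mov qword [rsp + 32], 25 ; arg 5 (stack, right after the shadow space)
    call my_function
    add rsp, 40             ; clean up shadow space and arg 5
The fifth argument must live in space the caller has reserved, so the example reserves 40 bytes rather than just the 32 bytes of shadow space. Inside a function that has not pushed anything else, 40 also leaves RSP 16-byte aligned at the call.
Before calling a function, you must put the arguments in the correct registers. This applies to both Windows and Linux.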
The function returns its value in RAX. For example:
    my_function:
        mov rax, 42
        ret
When the caller executes call my_function, the function runs and places the value 42 in RAX. After ret returns control to the caller, the caller can access the result in RAX.
Note: for system calls, we put the syscall number in the RAX register. We will see that later.
This is the most important part! Some registers must be preserved, others can be freely modified.
Preserved Registers (Callee Saved): RBX, RBP, RSP, and R12-R15. If you modify these registers you MUST save them at the start and restore them before returning from the function.
Scratch Registers (Caller Saved): RAX, RCX, RDX, RSI, RDI, and R8-R11 on Linux. You can modify these as you want without saving them. The caller knows they may be changed. On Windows, RSI and RDI belong to the preserved group instead.
The stack must be 16-byte aligned at a call instruction. This means RSP must be a multiple of 16 at that point. Why? Some SSE/AVX instructions require 16-byte alignment, and the function you call is allowed to assume it. When a function is entered, the return address pushed by call leaves RSP 8 bytes off a 16-byte boundary, so before calling another function you need to move RSP by an odd multiple of 8 in total, for example a single push rbp, or push rbp followed by sub rsp, 16. At _start nothing has been pushed yet, so there the total adjustment must be a multiple of 16.
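The effect of call on alignment can be measured directly. The sketch below relies on the assumption that Linux starts a process with RSP 16-byte aligned at _start, as the x86-64 ABI specifies; the function name probe is hypothetical.

```nasm
section .text
global _start

_start:
    call probe              ; call pushes 8 bytes (the return address)
    mov edi, eax            ; exit status = RSP mod 16 as seen inside probe
    mov eax, 60
    syscall

; hypothetical helper: returns RSP modulo 16 at function entry
probe:
    mov rax, rsp
    and eax, 15             ; keep the low 4 bits: RSP mod 16
    ret
```

Under that assumption the exit status is 8. This is the offset a function has to correct, typically with push rbp, before it makes calls of its own.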
Syscalls are how your program communicates with the operating system kernel. When you need to read a file, write to the console, or exit your program, you use a syscall.
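Before a syscall executes, place:
The syscall number in RAX.
The arguments, in order, in RDI, RSI, RDX, R10, R8, and R9.
Then execute the syscall instruction. The kernel runs the operation and returns the result in RAX. The syscall instruction itself overwrites RCX and R11, so don't keep anything important in them.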
Let's see an example:
    section .data
        msg: db "Hello, World!", 10     ; 10 = newline character
        len: equ $ - msg                ; Calculate length (14 bytes, including the newline)

    section .text
        global _start

    _start:
        ; Write "Hello, World!" to console
        mov rax, 1          ; write syscall
        mov rdi, 1          ; stdout (console)
        mov rsi, msg        ; address of the message
        mov rdx, len        ; length of the message
        syscall             ; kernel writes to console

        ; Exit with code 0
        mov rax, 60         ; exit syscall
        mov rdi, 0          ; success (0)
        syscall             ; kernel exits program
First, the program places 1 in RAX (write syscall), 1 in RDI (stdout), the address of msg in RSI, and the length in RDX. When syscall executes, the kernel writes "Hello, World!" to the console. Then the program places 60 in RAX (exit syscall) and 0 in RDI (success code). The second syscall terminates the program.
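The return value of a syscall is easy to overlook in a hello-world program, so this variation passes it on as the exit status. It is a sketch for Linux x86-64 that reuses the message from the example above; on success, write returns the number of bytes written in RAX.

```nasm
section .data
    msg db "Hello, World!", 10      ; 13 characters plus a newline
    len equ $ - msg                 ; 14

section .text
global _start

_start:
    mov eax, 1              ; syscall 1 = write
    mov edi, 1              ; fd 1 = stdout
    lea rsi, [rel msg]      ; buffer address
    mov edx, len            ; byte count
    syscall                 ; result comes back in RAX; RCX and R11 are overwritten

    mov edi, eax            ; exit status = what write returned
    mov eax, 60             ; syscall 60 = exit
    syscall
```

When stdout is writable, the program prints the message and exits with status 14. If the write fails, RAX holds a negative error code instead and the exit status will differ.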
In NASM assembly, the program is divided into sections. Each section tells the assembler and linker where to place different parts of your code and data in the final executable.
The .data section is used for initialized data. It has read/write permissions.
    section .data
        msg db "Hello, World!", 10  ; String with newline
        len equ $ - msg             ; Calculate length (constant)
        num dq 42                   ; 64-bit number
        array dd 1, 2, 3, 4, 5      ; Array of 32-bit integers
        flag db 1                   ; Single byte (true/false)
        pi dq 3.14159               ; Floating point number
Directives in .data:
db – Define Byte (1 byte)
dw – Define Word (2 bytes)
dd – Define Double Word (4 bytes)
dq – Define Quad Word (8 bytes)
dt – Define Ten bytes (10 bytes)
equ – Define a constant (no memory allocated)
.bss Section – Uninitialized Data
This section reserves space for variables that will be set at runtime. It takes up no room in the executable file and is filled with zeros when the program is loaded. It has read/write permissions.
Usage: Buffers, input storage, temporary data.
    section .bss
        buffer resb 64          ; Reserve 64 bytes
        input resq 10           ; Reserve 10 quad words (80 bytes)
        temp resd 1             ; Reserve 1 double word (4 bytes)
        big_buffer resb 4096    ; Reserve 4KB buffer
resb – Reserve Byte (1 byte)
resw – Reserve Word (2 bytes)
resd – Reserve Double Word (4 bytes)
resq – Reserve Quad Word (8 bytes)
rest – Reserve Ten bytes (10 bytes)
.text Section – Executable Code
This section contains the actual machine code instructions. It's marked as read-only and executable.
Usage: Your program code.
    section .text
        global _start       ; Make _start visible to linker

    _start:
        ; Your code here
        mov rax, 60
        mov rdi, 0
        syscall
The .rodata section stores read-only data: constants that your program can read but never modify.
Usage: Strings, constant values, lookup tables, format strings.
    section .rodata
        greeting: db "Hello, World!", 0     ; String with null terminator
        pi: dq 3.14159                      ; Constant floating point
        hex_values: db "0123456789ABCDEF"   ; Lookup table
        format: db "Value: %d", 10, 0       ; Format string for printf
To actually run the assembly code examples, you need to assemble and link them. Here's how to do it on Linux.
First, make sure you have NASM (the assembler) and ld (the linker) installed. On Debian or Ubuntu:
sudo apt install nasm binutils
This installs both tools you need (ld is part of the binutils package).
The assembler converts your assembly code into machine code (an object file):
nasm -f elf64 add.asm -o add.o
Here -f elf64 selects the output format for 64-bit Linux, add.asm is the source file, and -o add.o names the output object file.
If you have a standalone program like Example 5 (syscall), link it:
ld add.o -o add
This creates an executable file called add, which you can run with ./add.
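The commands above expect a file called add.asm with a _start entry point. The guide does not show that file, so the following is an assumed minimal version: it wraps an add_numbers function in a standalone Linux x86-64 program and returns the sum as the exit status.

```nasm
; add.asm (assumed contents, for trying the commands above)
; Build and run:
;   nasm -f elf64 add.asm -o add.o
;   ld add.o -o add
;   ./add ; echo $?
section .text
global _start

_start:
    mov edi, 2              ; a = 2
    mov esi, 3              ; b = 3
    call add_numbers
    mov edi, eax            ; exit status = a + b
    mov eax, 60             ; exit
    syscall

add_numbers:
    mov rax, rdi            ; rax = a
    add rax, rsi            ; rax = a + b
    ret
```

Running it and then `echo $?` should print 5. A file that defines only a function such as add_numbers, with no _start, is not a complete program on its own. Such a file is meant to be linked with other code that calls it.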
Example 1: add two numbers
    section .text
        global add_numbers

    ; long add_numbers(long a, long b)
    ; Parameters: rdi = a, rsi = b
    ; Return: rax
    add_numbers:
        mov rax, rdi        ; rax = a
        add rax, rsi        ; rax = rax + b
        ret
Example 2: return the larger of two numbers
    section .text
        global get_max

    ; long get_max(long a, long b)
    ; Parameters: rdi = a, rsi = b
    ; Return: rax (the larger value)
    get_max:
        mov rax, rdi        ; rax = a
        cmp rax, rsi        ; compare a with b
        jge skip            ; if a >= b, skip the next line
        mov rax, rsi        ; rax = b (if b is larger)
    skip:
        ret
Example 3: a counting loop
    section .text
        global count_to_n

    ; void count_to_n(long n)
    ; Parameter: rdi = n (how high to count)
    ; This function counts from 1 to n using a loop (the printing step is left out)
    count_to_n:
        push rbp
        mov rbp, rsp
        sub rsp, 16         ; reserve 16 bytes, which keeps RSP 16-byte aligned

        mov rax, 1          ; rax = counter (starts at 1)

    loop_start:
        cmp rax, rdi        ; compare counter with n
        jg loop_end         ; if counter > n, exit loop

        ; Print rax (the current number)
        mov rsi, rax        ; convert rax to string... (simplified)

        add rax, 1          ; increment counter
        jmp loop_start      ; jump back to loop_start

    loop_end:
        add rsp, 16
        pop rbp
        ret
Example 4: saving and restoring preserved registers
    section .text
        global use_rbx_and_rbp

    ; long use_rbx_and_rbp(long a)
    ; Parameter: rdi = a
    ; Return: rax = a * 10
    ; This function modifies RBX and RBP (preserved registers)
    use_rbx_and_rbp:
        push rbp            ; SAVE RBP (preserved, must save!)
        mov rbp, rsp
        push rbx            ; SAVE RBX (preserved, must save!)

        ; Now we can use RBP and RBX freely
        mov rax, rdi        ; rax = parameter
        mov rbx, 10         ; rbx = 10 (we modified rbx, so we saved it)
        imul rax, rbx       ; rax = rax * rbx

        ; Clean up before returning
        pop rbx             ; RESTORE RBX
        pop rbp             ; RESTORE RBP
        ret
Example 5: syscall (standalone program)
    section .data
        msg db "Hello, Assembly!", 10   ; message followed by a newline
        msg_len equ $ - msg

    section .text
        global _start

    _start:
        ; write(1, msg, msg_len)
        mov rax, 1          ; syscall 1 = write
        mov rdi, 1          ; fd = 1 (stdout)
        mov rsi, msg        ; buffer = msg address
        mov rdx, msg_len    ; count = msg_len
        syscall             ; execute syscall

        ; exit(0)
        mov rax, 60         ; syscall 60 = exit
        mov rdi, 0          ; exit code = 0
        syscall             ; execute syscall
This was a brief explanation of how assembly works and the basics you need to know. If you're interested in learning more and diving deeper into assembly programming, there are some great resources. The best one is the Art of Assembly book, and there is also Felix Cloutier's x86 Reference. Good luck!
Published 1 day ago