Understanding Threads and System Calls in Windows

An overview of threads, their properties, and states. This guide details user and kernel stacks, stack growth with guard pages, and how user-mode applications interact with the kernel through system calls and the general Windows architecture.

Windows
Kernel
Operating System

Quick note: This blog leans heavily on the "Windows Kernel Programming" book. Think of it as our foundation, and I'll be adding my own insights on top.

Threads

A thread is the actual executing entity in a process. It uses the process’s resources (its virtual memory, handles, etc.) to do work.

Things a thread owns:

  • Access mode: whether it is running in user mode or kernel mode. We will see later why this matters.
  • Execution context: the current values of the CPU registers, instruction pointer, flags, etc.
  • Stacks: it normally has a user-mode stack (used while in user mode) and a kernel stack (used once it transitions to kernel mode). We will look at these in more detail below.
  • Thread Local Storage (TLS): memory that is private to that thread, accessible via consistent APIs.
  • Base and current priority: a thread has a base priority, which is its starting, official rank. But the system can adjust this on the fly, giving it a current priority. For example, when you click a menu in an application, the OS gives that thread a temporary "boost" so the menu opens instantly. This lets the system handle something urgent, and then "throttle" the priority back down so other, less critical threads can run.
  • Processor affinity: which CPU cores the thread is allowed to run on.
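To make the base/current priority distinction concrete, here is a minimal toy model in Python. It is only a sketch, not how the Windows scheduler is implemented: the class name `ToyThread`, the method names, and the boost amount are all hypothetical. It assumes the commonly documented behavior that boosts apply only in the dynamic priority range (up to 15) and decay by one level per quantum until the thread is back at its base priority.

```python
DYNAMIC_MAX = 15  # assumed top of the dynamic (non-real-time) priority range


class ToyThread:
    """Hypothetical model of a thread's base and current priority."""

    def __init__(self, base_priority):
        self.base_priority = base_priority
        self.current_priority = base_priority

    def boost(self, amount):
        """Temporarily raise the current priority; the base never changes."""
        if self.base_priority > DYNAMIC_MAX:
            return  # assumption: threads above the dynamic range are not boosted
        boosted = min(DYNAMIC_MAX, self.base_priority + amount)
        self.current_priority = max(self.current_priority, boosted)

    def quantum_end(self):
        """Decay the boost by one level, never dropping below the base."""
        if self.current_priority > self.base_priority:
            self.current_priority -= 1


if __name__ == "__main__":
    t = ToyThread(base_priority=8)
    t.boost(2)  # e.g. the thread just received user input
    print("after boost:", t.current_priority)
    t.quantum_end()
    t.quantum_end()
    print("after two quantums:", t.current_priority)
```

The point of the model is that the base priority is fixed bookkeeping, while the current priority is what the scheduler would actually compare when picking the next thread.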

Thread states

  • Running: it is currently executing on a CPU.
  • Ready: it is ready to run but waiting for a CPU to become free.
  • Waiting (or blocked): the thread is waiting for some event (e.g. I/O completion, a synchronization object) before it can proceed.
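The three states above can be pictured as a small state machine. The sketch below is a deliberate simplification that uses only the states listed here; the names `ThreadState`, `ALLOWED`, and `transition` are hypothetical, and the real scheduler has more states and more transitions than this.

```python
from enum import Enum


class ThreadState(Enum):
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"


# Simplified transition table: (from, to) -> why it happens.
ALLOWED = {
    (ThreadState.READY, ThreadState.RUNNING): "scheduler picks the thread",
    (ThreadState.RUNNING, ThreadState.READY): "preempted or quantum expired",
    (ThreadState.RUNNING, ThreadState.WAITING): "thread waits on an event or I/O",
    (ThreadState.WAITING, ThreadState.READY): "the wait is satisfied",
}


def transition(current, target):
    """Return the new state, or raise if the toy model forbids the move."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"{current.value} -> {target.value} not allowed in this model")
    return target


if __name__ == "__main__":
    state = ThreadState.READY
    for nxt in (ThreadState.RUNNING, ThreadState.WAITING, ThreadState.READY):
        print(state.value, "->", nxt.value, ":", ALLOWED[(state, nxt)])
        state = transition(state, nxt)
```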

Thread Stacks

Every thread has at least one stack in kernel (system) space. This kernel stack is used whenever the thread executes in kernel mode: during system calls, exceptions, or any other transition from user mode into kernel mode.

A user-mode thread also has a user stack in its process's user address space. The user stack is where regular function calls, local variables, and return addresses are stored.

The kernel stack is small by default (for example, ~12 KB on 32-bit Windows), and it is always resident in RAM while the thread is in the Running or Ready state. This is necessary because kernel code cannot always afford to incur page faults while executing.

But why can't the kernel stack be paged out? Why must it stay in RAM?

  • If part of the kernel stack were not in RAM and a function tried to use it, a page fault would be triggered to load it from the disk. But the kernel can't always handle that. It might be running at a high IRQL (Interrupt Request Level), which is essentially a "do not disturb" mode for handling critical hardware or system events. When at a high IRQL or holding important locks, the system has strict rules forbidding operations like disk access (paging), so it cannot safely stop and wait for I/O to resolve a page fault. Trying to do so would lead to a system crash.
  • The user-mode stack, by contrast, resides in user address space and can be paged out like any other user memory, because user code executes under fewer constraints.

Most importantly: kernel stacks are small and have a fixed size. This means that allocating large variables on the stack or creating very deep chains of function calls is extremely risky in kernel mode. Code such as device drivers must be written carefully to avoid this, because overflowing the kernel stack will crash the entire system. If you need more memory, you must allocate it from the kernel's heap (the paged or non-paged pool) instead; we will see how later on.
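A bit of back-of-the-envelope arithmetic shows why this matters. The helper below is a hypothetical sketch: it takes the ~12 KB figure mentioned above and assumes every call uses the same frame size, ignoring whatever part of the stack is already in use by callers, so real limits are lower.

```python
KERNEL_STACK_BYTES = 12 * 1024  # the ~12 KB example figure from the text


def max_call_depth(stack_bytes, frame_bytes):
    """Upper bound on nested calls if every frame uses `frame_bytes`."""
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    return stack_bytes // frame_bytes


if __name__ == "__main__":
    # Compare a small frame with a frame that holds a 1 KB local buffer.
    for frame in (64, 1024):
        depth = max_call_depth(KERNEL_STACK_BYTES, frame)
        print(f"{frame}-byte frames: at most {depth} nested calls")
```

A single large local buffer per call shrinks the usable call depth dramatically, which is why such buffers belong in pool memory rather than on the kernel stack.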

If a thread never enters user mode (for example, a pure kernel worker thread), it may only use its kernel stack and not have a user-mode stack.

User-mode stack growth (guard pages)

When a thread is created, the OS reserves a large contiguous address range for the stack in the process's virtual address space, but does not commit all of it right away. Instead:

  1. A small initial portion (a few pages) is committed at the top of the range (the highest addresses, where the stack begins).
  2. Immediately below the committed region is a guard page (with the PAGE_GUARD protection). The guard page is committed, but marked so that the first access to it triggers an exception. For more details, look here.
  3. The remainder of the reserved stack region, below the guard page, is uncommitted, so it does not consume physical memory.

This "reserved" region is private to the process and can't be used by others. "Uncommitted" simply means it's not backed by physical memory yet, so accessing it will cause a fault.

Now, how does the thread's stack grow once it reaches the guard page?

  1. As the stack grows downward, the thread eventually touches the guard page.
  2. This instantly triggers a special, controlled alarm in the OS called a STATUS_GUARD_PAGE_VIOLATION exception. This isn't a crash; it's a signal to the kernel that something needs attention.
  3. The OS's default exception handler (like a security guard responding to the alarm) immediately checks what happened. It looks at the memory address that caused the fault and asks, "Was this address inside the current thread's designated guard page?"
  4. If the answer is "yes," the OS knows it's not a random error but a legitimate request for more stack space. It then performs a clever, three-part trick:
    • It disarms the old guard page, turning it into a normal, usable piece of the stack.
    • It commits a new page of memory right below the old guard page.
    • It marks that newly committed page as the new guard page, so the tripwire moves one page down.
(Stack grows downwards, toward lower addresses)
+-------------------------------------------+  High memory addresses
|            (stack base / top)             |
+-------------------------------------------+
|      Committed initial stack region       |
+-------------------------------------------+
|      Committed stack pages (in use)       |
+-------------------------------------------+
|          Guard Page (Tripwire)            |  <--- The current boundary
+-------------------------------------------+
|          Reserved (uncommitted)           |
|            (Not usable yet)               |
+-------------------------------------------+  Low memory addresses

“Uncommitted” (or “reserved but not committed”) means that the address range is taken (no other allocation can use it in that process), but no physical memory or backing store is assigned yet.

  • It is not “free” — you can’t access it or use it until it’s committed.
  • It doesn’t consume RAM or page file space until you actually commit and use it.
  • When you touch a page in that uncommitted area (e.g. via a read or write), the OS will detect that it’s in the reserved region (not committed) and fail or raise an exception unless a commit is done first.

This allows the stack to grow incrementally, one page at a time, rather than committing the entire maximum stack size in advance. Technically, Windows uses 3 guard pages rather than one in most cases.
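The growth mechanism described in this section can be modeled with a small toy simulation. This is a sketch under simplifying assumptions, not the memory manager's actual logic: it uses a single guard page, tracks pages rather than addresses, and all names (`ToyStack`, `PageState`, `AccessViolation`, `StackOverflow`) are hypothetical. Pages are indexed from the stack base, so index 0 is the highest address and larger indexes are lower addresses.

```python
from enum import Enum


class PageState(Enum):
    RESERVED = "reserved"    # address range claimed, no backing memory
    COMMITTED = "committed"  # usable stack memory
    GUARD = "guard"          # committed, but the first touch raises an exception


class AccessViolation(Exception):
    """Touched a page that is reserved but not committed."""


class StackOverflow(Exception):
    """No reserved page is left to become the next guard page."""


class ToyStack:
    def __init__(self, total_pages, initial_committed):
        if not 0 < initial_committed < total_pages:
            raise ValueError("need at least one committed page and room for a guard page")
        self.pages = [PageState.RESERVED] * total_pages
        for i in range(initial_committed):
            self.pages[i] = PageState.COMMITTED
        self.pages[initial_committed] = PageState.GUARD

    def touch(self, index):
        """Simulate a memory access to the page at `index`."""
        state = self.pages[index]
        if state is PageState.COMMITTED:
            return  # ordinary access, nothing special happens
        if state is PageState.RESERVED:
            raise AccessViolation(f"page {index} is not committed")
        # Guard page hit: disarm it, then move the guard one page down.
        self.pages[index] = PageState.COMMITTED
        if index + 1 >= len(self.pages):
            raise StackOverflow("reserved stack region exhausted")
        self.pages[index + 1] = PageState.GUARD


if __name__ == "__main__":
    stack = ToyStack(total_pages=4, initial_committed=1)
    stack.touch(1)  # hits the guard page, so the stack grows by one page
    print([p.value for p in stack.pages])
```

Touching pages in order walks the guard page down through the reserved region, while skipping straight past the guard page into reserved memory fails instead of growing the stack.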


System Services (a.k.a. System Calls)

Applications need to perform various operations that are not purely computational, such as allocating memory, opening files, creating threads, etc. These operations can only be ultimately performed by code running in kernel mode. So how would user-mode code be able to perform such operations?

For example, when a user in Notepad chooses File → Open, the Notepad process must open a file, but because it runs in user mode, it cannot access hardware or the filesystem directly. The request passes through several layers before entering kernel mode:

  1. Notepad (user mode) calls the CreateFile API — this function lives in kernel32.dll, which is part of the Windows subsystem.
    • Even though it looks like it opens the file, it actually just prepares parameters and performs some checks.
  2. kernel32.dll then calls NtCreateFile, which is implemented in ntdll.dll.
    • ntdll.dll contains the Native API, the lowest user-mode layer that interfaces directly with the Windows kernel.
    • This function still runs in user mode.
  3. Before crossing into kernel mode, ntdll.dll prepares the system service number (SSN) for the requested function (for example, NtCreateFile) and places it into a CPU register — on x64, this is typically RAX (EAX on x86).
  4. Then it executes a special CPU instruction:
    • syscall on x64
    • sysenter on x86
    These instructions trigger a privilege level switch from user mode (ring 3) to kernel mode (ring 0).
  5. The CPU jumps to the System Service Dispatcher inside the Windows kernel.
    • The dispatcher reads the system service number from RAX/EAX.
    • It uses that number as an index into the System Service Dispatch Table (SSDT), which maps service numbers to kernel function addresses.
  6. The dispatcher locates and calls the corresponding kernel function: in this case, the kernel implementation of NtCreateFile inside the I/O Manager of the Windows kernel.
  7. The kernel executes the real system call: actually opening the file, handling permissions, device drivers, etc.
  8. Once done, it returns control to user mode, resuming execution at the instruction following the syscall in ntdll.dll.

The next post in this series will cover handles and objects: what they are, how they work, and more.

Published Oct 28, 2025
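As a recap of the system call path described in this post, the layering can be modeled with a toy sketch in Python. Everything here is illustrative: the service numbers are made up (real numbers differ between Windows builds), a dictionary stands in for the CPU registers, a plain function call stands in for the `syscall` instruction, and the function names are hypothetical stand-ins for the real kernel32.dll, ntdll.dll, and kernel routines.

```python
# Hypothetical service numbers, chosen only for this model.
SSN_NT_CREATE_FILE = 0
SSN_NT_CLOSE = 1


def kernel_nt_create_file(path):
    """Stand-in for the kernel implementation that really opens the file."""
    return f"handle:{path}"


def kernel_nt_close(handle):
    """Stand-in for the kernel implementation that closes a handle."""
    return 0


# Toy dispatch table: the index is the system service number.
SERVICE_TABLE = [kernel_nt_create_file, kernel_nt_close]


def system_service_dispatcher(registers, *args):
    """Read the service number from 'rax' and call the matching kernel function."""
    ssn = registers["rax"]
    if not 0 <= ssn < len(SERVICE_TABLE):
        raise ValueError(f"invalid system service number {ssn}")
    return SERVICE_TABLE[ssn](*args)


def ntdll_nt_create_file(path):
    """Native API stub: load the service number, then 'enter the kernel'."""
    registers = {"rax": SSN_NT_CREATE_FILE}
    return system_service_dispatcher(registers, path)  # stands in for `syscall`


def kernel32_create_file(path):
    """Subsystem API: check parameters, then call the native API."""
    if not path:
        raise ValueError("path must not be empty")
    return ntdll_nt_create_file(path)


if __name__ == "__main__":
    print(kernel32_create_file("notes.txt"))
```

The design point the model tries to capture is that user-mode code never calls the kernel function directly; it only supplies an index, and the dispatcher decides which kernel routine that index maps to.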