Kernel Programming Basics

A deep dive into Windows kernel programming guidelines, contrasting it with user-mode development. Explore critical topics like unhandled exceptions (BSOD), resource management, IRQL, C++ limitations, and the Kernel API. Essential reading for driver developers.

Windows

Kernel

Programming

In our last blog, we learned how to set up a driver development environment and create a simple tracing driver. Now, we will take a deep dive into the fundamentals of kernel programming.

Kernel Programming Guidelines

Programming kernel drivers requires the Windows Driver Kit (WDK), which provides the headers and libraries for the kernel APIs, C functions, and more. Kernel development resembles user-mode development in many ways, but there are some important differences:

In user mode, unhandled exceptions crash only the process, but in kernel mode they crash the entire system.
Resource cleanup: in user mode, memory and resources are freed automatically when the process ends, but kernel-mode drivers must free their resources manually, or the leaks persist until the next boot.
Return values: user-mode code may get away with ignoring errors, whereas kernel-mode errors should always be handled.
Interrupt Request Level (IRQL): user-mode code always runs at PASSIVE_LEVEL (0), while kernel-mode code can run at DISPATCH_LEVEL (2) or higher, which restricts what it is allowed to do. We will look at this later.
Bad coding: errors in user mode are typically confined to the process, but in kernel mode they can impact the whole system.
Testing and debugging: user-mode development allows local testing, while kernel-mode debugging typically requires a separate machine (physical or virtual).
Libraries: user mode supports most C/C++ libraries (e.g., STL, Boost), but kernel mode restricts usage to a limited set.
Exception handling: user mode supports C++ exceptions and Structured Exception Handling (SEH), while kernel mode only supports SEH.
C++ usage: user mode has full C++ runtime support, but kernel mode lacks it, limiting C++ features.

Function Return Value

In user-mode programming, developers sometimes ignore function return values, operating on the assumption that the call will succeed. If an API call fails, the consequence is typically limited: the single process will crash, but the operating system remains stable.
In kernel-mode programming, however, this practice is dangerous. When a kernel API fails and the failure goes unnoticed, the resulting error is not contained within a single process; it can lead to an unhandled exception that crashes the entire operating system (a "Blue Screen of Death", also known as a bug check). The golden rule for kernel development is therefore: always check the status returned by every kernel API call.

Interrupt Request Level (IRQL)

The Interrupt Request Level (IRQL) is a fundamental concept in the Windows kernel, acting as the processor's internal priority system. It dictates which code can execute at any given moment, ensuring that time-critical operations like handling hardware interrupts are not delayed by less urgent tasks.

Think of IRQL as a set of priority lanes on a highway: the higher the IRQL, the more critical the task and the fewer interruptions it can tolerate.
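To make the priority idea concrete, here is a toy user-mode model, not kernel code. The two level values are the ones quoted earlier in this post; the device level of 5 and the helper name CanPreempt are made up for illustration. The only rule it encodes is that an interrupt is serviced right away only when its level is strictly higher than the level the processor is currently running at.

```cpp
#include <cstdint>

// Toy model only: real IRQLs are managed by the kernel and the hardware.
constexpr std::uint8_t kPassiveLevel  = 0;  // PASSIVE_LEVEL, as quoted in the text
constexpr std::uint8_t kDispatchLevel = 2;  // DISPATCH_LEVEL, as quoted in the text
constexpr std::uint8_t kToyDeviceLevel = 5; // hypothetical device interrupt level

// Hypothetical helper: an interrupt preempts the running code only if its
// level is strictly higher than the current level. Otherwise it stays
// pending until the processor drops to a lower level.
constexpr bool CanPreempt(std::uint8_t currentIrql, std::uint8_t interruptIrql) {
    return interruptIrql > currentIrql;
}
```

In this model, code running at kDispatchLevel can still be interrupted by the toy device interrupt, but not by other work that is also at kDispatchLevel.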

C++ usage

When writing C++ code in kernel mode (such as Windows drivers), many high-level C++ features are restricted or not available at all, mainly because the C++ runtime is missing.

new and delete: not available out of the box, because the default implementations rely on user-mode heaps. Instead, we use the kernel-specific allocation functions, optionally wrapped in overloaded new and delete operators.
Global variables with constructors: the constructors are not called, because there is no runtime to call them. Workarounds include using an explicit Init function or dynamically allocating instances with overloaded new and delete. We will see an example later.
C++ exception handling: try, catch, and throw don’t compile. Use Structured Exception Handling (SEH) instead, which we will cover later.
Standard libraries: unavailable due to user-mode dependencies, but C++ templates work and can be used to build kernel-friendly equivalents of user-mode types such as std::vector or std::wstring.
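The two workarounds from the list above can be sketched in user mode. This is only a sketch: HypotheticalPoolAlloc and HypotheticalPoolFree are stand-ins backed by malloc and free so it builds anywhere, and Counter, g_Counter, and Demo are made-up names. In a real driver, the stand-ins would wrap the pool allocation routines described later in this post.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-ins for the kernel allocator (assumption: malloc/free replace the pool).
inline void* HypotheticalPoolAlloc(std::size_t size) { return std::malloc(size); }
inline void  HypotheticalPoolFree(void* p) { std::free(p); }

struct Counter {
    int value;  // no constructor, so a global instance needs no runtime support

    // Explicit Init function used instead of a constructor.
    void Init(int start) { value = start; }

    // Class-specific new/delete route allocations to the stand-in pool.
    static void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
        return HypotheticalPoolAlloc(size);
    }
    static void operator delete(void* p) noexcept { HypotheticalPoolFree(p); }
    static void operator delete(void* p, const std::nothrow_t&) noexcept {
        HypotheticalPoolFree(p);
    }
};

Counter g_Counter;  // zero-initialized global, set up explicitly through Init

inline int Demo() {
    g_Counter.Init(10);
    Counter* c = new (std::nothrow) Counter;
    if (c == nullptr) {
        return -1;  // allocation can fail and there are no exceptions to rely on
    }
    c->Init(32);
    int sum = g_Counter.value + c->value;
    delete c;
    return sum;
}
```

The nothrow form is used on purpose: with no exception support, a failed allocation has to surface as a null pointer that the caller checks.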

The Kernel API

Kernel drivers use functions exported by kernel components. We will refer to these functions as the Kernel API. Most of them are implemented in the kernel module itself (NtOskrnl.exe), but some are implemented by other kernel modules, such as the HAL (hal.dll).

Kernel API with prefixes

In Windows kernel programming, the Zw* functions are the kernel-mode counterparts of the Nt* functions that user mode calls. User-mode processes call functions such as NtCreateFile and NtOpenProcess, which reside in NtDll.dll. These functions perform a system call to enter kernel mode and execute the real implementation inside the Windows Executive (the core of the kernel). In kernel mode, you can invoke the same system services directly through the Zw* functions (e.g., ZwCreateFile, ZwOpenProcess).

When a system service runs, the kernel needs to know who called it: user mode or kernel mode. Windows stores this in a field called PreviousMode in each thread's kernel structure (KTHREAD). This affects how the kernel behaves when executing system calls, particularly security checks and pointer validation.

Example: NtCreateFile vs ZwCreateFile

Both functions end up in the same kernel implementation internally.

If user mode calls NtCreateFile, the kernel knows from PreviousMode that the request came from user space. It validates all input pointers and performs access checks to avoid trusting unverified user memory.
If kernel mode calls ZwCreateFile, the kernel sees PreviousMode = KernelMode. It skips certain user-mode safety checks, assuming kernel code knows what it’s doing. This makes it faster but also dangerous if misused (passing an unvalidated user pointer could crash the system).
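The difference can be pictured with a small user-mode model. Everything here is hypothetical (ToyService, ToyNtCall, ToyZwCall, and the address limit are not real kernel names or values). It only shows the idea that one shared implementation checks the caller's buffer address when the previous mode is user and trusts it when the previous mode is kernel.

```cpp
#include <cstdint>

enum class Mode { Kernel, User };            // conceptually KernelMode / UserMode
enum class Result { Ok, AccessViolation };

// Hypothetical boundary: in this toy model, addresses at or above it are kernel space.
constexpr std::uintptr_t kToyUserLimit = 0x80000000u;

// Hypothetical service body shared by both entry paths.
inline Result ToyService(Mode previousMode, std::uintptr_t bufferAddress) {
    if (previousMode == Mode::User && bufferAddress >= kToyUserLimit) {
        return Result::AccessViolation;      // user callers may not pass kernel addresses
    }
    return Result::Ok;                       // kernel callers are trusted as-is
}

// The Nt path from user mode arrives with the previous mode set to user.
inline Result ToyNtCall(std::uintptr_t address) { return ToyService(Mode::User, address); }

// The Zw path from a driver arrives with the previous mode set to kernel.
inline Result ToyZwCall(std::uintptr_t address) { return ToyService(Mode::Kernel, address); }
```

The same out-of-range address is rejected on the toy Nt path and accepted on the toy Zw path, which is exactly why a driver must never forward an unchecked user pointer through a Zw* call.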

Error Codes

Most kernel API functions return a status indicating the success or failure of an operation. The status is typed as NTSTATUS, a signed 32-bit integer. The value STATUS_SUCCESS (0) indicates success, and a negative value indicates a failure of some kind. You can find all the defined NTSTATUS values in the file <ntstatus.h>.

The macro NT_SUCCESS(status) checks whether the returned status indicates success (including informational codes) rather than an error. NT_SUCCESS is defined roughly as:

ntdef.h

#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

Usage:

sample.cpp

NTSTATUS status = CallSomeKernelFunction();
if (!NT_SUCCESS(status)) {
    // KdPrint expects a narrow (char) format string, not a wide one
    KdPrint(("Error occurred: 0x%08X\n", status));
    return status;
}
// continue …
return STATUS_SUCCESS;

NT_SUCCESS(Status) checks whether the NTSTATUS value indicates success (non-negative).
When results cross from kernel mode to user mode (e.g., via IOCTL), the NTSTATUS must often be mapped to a Win32 error code (used by GetLastError).
This mapping is not exact: multiple NTSTATUS codes can map to one Win32 error, and some have no direct equivalent.
Inside the kernel, you simply propagate NTSTATUS values; no mapping is needed.
Real output data is usually passed through output parameters, while the function’s return value is reserved for the NTSTATUS status code.
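To see why NT_SUCCESS accepts informational codes as well, it helps to look at how the value is laid out: the top two bits of an NTSTATUS hold its severity (0 success, 1 informational, 2 warning, 3 error), so both warnings and errors are negative when read as a signed integer. The sketch below is a user-mode stand-in. NtSuccess and Severity are illustrative names, and the constants carry the values of STATUS_SUCCESS, STATUS_OBJECT_NAME_EXISTS, STATUS_BUFFER_OVERFLOW, and STATUS_UNSUCCESSFUL.

```cpp
#include <cstdint>

using NTSTATUS = std::int32_t;  // user-mode stand-in for the WDK typedef

// Same test as the NT_SUCCESS macro: non-negative means success or informational.
constexpr bool NtSuccess(NTSTATUS status) { return status >= 0; }

// Severity lives in the top two bits: 0 success, 1 informational, 2 warning, 3 error.
constexpr unsigned Severity(NTSTATUS status) {
    return static_cast<std::uint32_t>(status) >> 30;
}

constexpr NTSTATUS kStatusSuccess          = 0x00000000;
constexpr NTSTATUS kStatusObjectNameExists = 0x40000000;                         // informational
constexpr NTSTATUS kStatusBufferOverflow   = static_cast<NTSTATUS>(0x80000005u); // warning
constexpr NTSTATUS kStatusUnsuccessful     = static_cast<NTSTATUS>(0xC0000001u); // error
```

With these definitions, NtSuccess is true for the first two constants and false for the last two.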

String

In kernel mode, there are several ways to represent strings. A string might be a simple Unicode pointer such as wchar_t* or one of its typedefs like WCHAR*. These are standard null-terminated Unicode strings similar to those used in user mode. However, most kernel functions that deal with strings expect them to be represented by the UNICODE_STRING structure, which stores a wide-character (UTF-16) string in which each character takes 2 bytes.

ntdef.h

typedef struct _UNICODE_STRING {
	USHORT Length;
	USHORT MaximumLength;
	PWCH Buffer;
} UNICODE_STRING;
typedef UNICODE_STRING *PUNICODE_STRING;
typedef const UNICODE_STRING *PCUNICODE_STRING;

The Length member is in bytes (not characters) and does not include a Unicode NULL terminator, if one exists (a NULL terminator is not mandatory). The MaximumLength member is the number of bytes the string can grow to without requiring a memory reallocation.

UNICODE_STRING structures are typically manipulated with a set of Rtl functions dedicated to strings, such as RtlInitUnicodeString.
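Because Length and MaximumLength count bytes rather than characters, it is easy to get them wrong by a factor of two. The user-mode sketch below mirrors the structure with stand-in types (ToyUnicodeString and ToyInit are hypothetical names, and char16_t is used so that each character is 2 bytes on every platform). ToyInit follows the spirit of RtlInitUnicodeString: it points the structure at an existing null-terminated string without copying it.

```cpp
#include <cstddef>
#include <cstdint>

// User-mode stand-in for UNICODE_STRING.
struct ToyUnicodeString {
    std::uint16_t Length;         // bytes in use, terminator not included
    std::uint16_t MaximumLength;  // bytes available in Buffer
    const char16_t* Buffer;
};

// Hypothetical helper: wraps an existing null-terminated string.
inline ToyUnicodeString ToyInit(const char16_t* source) {
    std::size_t chars = 0;
    while (source[chars] != u'\0') {
        ++chars;
    }
    ToyUnicodeString s;
    s.Length = static_cast<std::uint16_t>(chars * sizeof(char16_t));
    // The terminator is part of the buffer, so it counts towards the capacity.
    s.MaximumLength = static_cast<std::uint16_t>(s.Length + sizeof(char16_t));
    s.Buffer = source;
    return s;
}
```

For a three-character string such as u"abc", Length is 6 and MaximumLength is 8.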

Dynamic Memory Allocation

As we discussed in the first blog, the kernel thread stack is small, so any large chunk of memory should be allocated dynamically. The kernel provides three general memory pools for drivers to use.

Paged pool: a memory pool that can be paged out if required. Accessing it may cause a page fault: if the data is on disk, the system must pause the running code, retrieve the data from disk, load it back into RAM, and only then let the code continue.
Non-paged pool: a memory pool that is never paged out and is guaranteed to remain in RAM. Accessing it never causes a page fault.
NonPagedPoolNx: the same as the non-paged pool, but with a hardware protection mechanism also known as Data Execution Prevention (DEP). When a memory page (4 KB) is marked as "Nx", the CPU treats the region as data only and refuses to execute code from it.

Page faults cannot be handled at a high IRQL, as we discussed. Therefore, using non-paged memory ensures that a driver can safely access its data at any time. Not only is this necessary to avoid page-fault-related crashes, but it is also faster since the data is always in RAM.

Non-paged memory is a scarcer resource, though, so the rule is: only allocate non-paged memory when it is required (e.g., for code or data accessed at high IRQL).

Pool types

When a driver needs to allocate memory dynamically, it calls a function like ExAllocatePoolWithTag, and one of the most important parameters it provides is a value from the POOL_TYPE enumeration.

This POOL_TYPE enum tells the memory manager exactly what kind of memory to allocate.

wdm.h

typedef enum _POOL_TYPE {
    NonPagedPool,
    NonPagedPoolExecute = NonPagedPool,
    PagedPool,
    NonPagedPoolMustSucceed = NonPagedPool + 2,
    DontUseThisType,
    NonPagedPoolCacheAligned = NonPagedPool + 4,
    PagedPoolCacheAligned,
    NonPagedPoolCacheAlignedMustS = NonPagedPool + 6,
    MaxPoolType,
    NonPagedPoolBase = 0,
    NonPagedPoolBaseMustSucceed = NonPagedPoolBase + 2,
    NonPagedPoolBaseCacheAligned = NonPagedPoolBase + 4,
    NonPagedPoolBaseCacheAlignedMustS = NonPagedPoolBase + 6,
    NonPagedPoolSession = 32,
    PagedPoolSession = NonPagedPoolSession + 1,
    NonPagedPoolMustSucceedSession = PagedPoolSession + 1,
    DontUseThisTypeSession = NonPagedPoolMustSucceedSession + 1,
    NonPagedPoolCacheAlignedSession = DontUseThisTypeSession + 1,
    PagedPoolCacheAlignedSession = NonPagedPoolCacheAlignedSession + 1,
    NonPagedPoolCacheAlignedMustSSession = PagedPoolCacheAlignedSession + 1,
    NonPagedPoolNx = 512,
    NonPagedPoolNxCacheAligned = NonPagedPoolNx + 4,
    NonPagedPoolSessionNx = NonPagedPoolNx + 32,

} POOL_TYPE;

As we can see, there are many pool types, but drivers should use only these three:

PagedPool
NonPagedPool
NonPagedPoolNx

Memory allocation functions

ExAllocatePool: allocates memory from a pool (obsolete).
ExAllocatePoolWithTag: allocates memory from a pool and associates a tag (a 4-character ID) with it for tracking. For a long time this was the recommended method.
ExAllocatePoolZero: same as ExAllocatePoolWithTag, but zeroes the memory before returning.
ExAllocatePoolWithQuotaTag: allocates memory and charges the quota of the current process (used when the driver acts on behalf of a process).
ExFreePool / ExFreePoolWithTag: frees a previously allocated memory block.

When allocating memory in kernel mode with a function like ExAllocatePoolWithTag (replaced by ExAllocatePool2 in recent Windows versions), you provide a 4-byte tag. The tag identifies the purpose or origin of the allocation.

wdm.h

DECLSPEC_RESTRICT PVOID ExAllocatePool2(
  POOL_FLAGS Flags,
  SIZE_T     NumberOfBytes,
  ULONG      Tag
);

The tag helps with debugging and with tracking memory leaks. For example, if your driver unloads but some allocations are not freed, tools like PoolMon can show that. Here is an example:

sample.cpp

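PVOID buffer = ExAllocatePoolWithTag(NonPagedPoolNx, 1024, 'bufT');
if (buffer == NULL) {
    // allocation can fail; never use the pointer without checking it
    return STATUS_INSUFFICIENT_RESOURCES;
}
// 'bufT' is the tag: it helps identify this allocation later

ExFreePoolWithTag(buffer, 'bufT');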
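One detail worth knowing when you look for your tag in a tool: the four characters show up in memory order, which on little-endian machines is the reverse of how they were typed in the literal. That is why tags are often written backwards in source code. The user-mode sketch below illustrates this with two helpers that are assumptions of this example, not WDK functions: MakeTag packs the characters the way the Microsoft compiler evaluates a literal such as 'bufT', and TagAsDisplayed reads the bytes back from the least significant end.

```cpp
#include <cstdint>
#include <string>

// Packs four characters with the first one in the most significant byte,
// matching how the Microsoft compiler evaluates a literal like 'bufT'.
constexpr std::uint32_t MakeTag(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Returns the characters in little-endian memory order (least significant
// byte first), which is the order in which pool tools show the tag.
inline std::string TagAsDisplayed(std::uint32_t tag) {
    std::string text;
    for (int shift = 0; shift < 32; shift += 8) {
        text.push_back(static_cast<char>((tag >> shift) & 0xFF));
    }
    return text;
}
```

Under these assumptions, the tag from the sample above is displayed as "Tfub", so writing it as 'Tfub' in the source would make it read "bufT" in the tool.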

We will cover more memory management functions in future blogs.

Linked Lists

The kernel uses circular doubly linked lists in many of its internal data structures. For example, all processes on the system are managed by EPROCESS structures connected in a circular doubly linked list, whose head is stored in the kernel variable PsActiveProcessHead.

Each list entry holds two pointers: Flink stores the address of the next entry, and Blink stores the address of the previous entry. This allows traversal in both directions and looping back to the head.

ntdef.h

typedef struct _LIST_ENTRY {
	struct _LIST_ENTRY *Flink;
	struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY;

When you traverse a kernel linked list, you are moving between LIST_ENTRY members, not the larger structures that contain them. This leaves you with a pointer to a field inside an object, not a pointer to the object itself. For example:

sample.cpp

// This is our user-defined structure that will be part of the list.
typedef struct _DRIVER_DATA {
    int DriverId;
    const char* DriverName;
    // The LIST_ENTRY field MUST be part of the structure.
    // It's the "link" in the chain.
    LIST_ENTRY Link;
} DRIVER_DATA, *PDRIVER_DATA;

In this case, we have a linked list of DRIVER_DATA objects. Each object is connected to the next via its Link field, which is a LIST_ENTRY structure. The entire list is managed by a separate "head" global variable (for example, one named g_DriverListHead), not by a field within the structure itself. When we traverse the list, we only get pointers to the LIST_ENTRY fields, not to the full DRIVER_DATA objects. Therefore, we must use CONTAINING_RECORD to get back to the full structure.

CONTAINING_RECORD subtracts the offset of the field within the structure from the field's address to compute the address of the containing object, then casts the result to the correct type.

ntdef.h

#define CONTAINING_RECORD(address, type, field) \
    ((type *)((char*)(address) - (size_t)(&((type *)0)->field)))

To manage these lists, the kernel provides helper routines that work in constant time. InitializeListHead sets up an empty list whose head points to itself. InsertHeadList and InsertTailList add an element at the start or end of the list, RemoveHeadList, RemoveTailList, and RemoveEntryList take elements out, and IsListEmpty reports whether the list is empty.
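Putting the pieces together, here is a user-mode sketch of the whole pattern: an empty circular list, insertion at the tail, traversal through the links, and recovery of the containing structure. ListEntry, InitHead, InsertTail, FromLink, and SumIds are stand-ins written for this example. They follow the behavior described above but are not the WDK routines themselves.

```cpp
#include <cstddef>

struct ListEntry {          // user-mode copy of LIST_ENTRY
    ListEntry* Flink;
    ListEntry* Blink;
};

struct DriverData {         // same shape as DRIVER_DATA in the sample above
    int DriverId;
    const char* DriverName;
    ListEntry Link;
};

// Empty list: the head points to itself in both directions.
inline void InitHead(ListEntry* head) { head->Flink = head->Blink = head; }

// Constant-time insertion just before the head, i.e. at the tail.
inline void InsertTail(ListEntry* head, ListEntry* entry) {
    ListEntry* last = head->Blink;
    entry->Flink = head;
    entry->Blink = last;
    last->Flink = entry;
    head->Blink = entry;
}

// Same arithmetic as CONTAINING_RECORD, specialised for DriverData::Link.
inline DriverData* FromLink(ListEntry* link) {
    return reinterpret_cast<DriverData*>(
        reinterpret_cast<char*>(link) - offsetof(DriverData, Link));
}

// Walks the list until the traversal loops back to the head.
inline int SumIds(ListEntry* head) {
    int sum = 0;
    for (ListEntry* e = head->Flink; e != head; e = e->Flink) {
        sum += FromLink(e)->DriverId;
    }
    return sum;
}
```

Note that the loop stops when it reaches the head again. The head itself is a bare list entry with no DriverData around it, so it must never be passed to FromLink.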

Driver Object

As we recall, one of the parameters to the DriverEntry function is a pointer to a DRIVER_OBJECT. This object is important, as the driver uses it to configure its properties and to inform the I/O Manager about which operations it supports by populating its function pointers.

For example, we previously used this object to set the DriverUnload routine. This registers a cleanup function that the I/O Manager will execute just before the driver is unloaded, as we have discussed.

By default, the kernel fills a driver's I/O dispatch table (MajorFunction array) with a function that automatically fails any request. This is a security measure, ensuring a driver only responds to operations it explicitly supports.
In DriverEntry, the developer's job is to replace the default failure function with their own routines for the I/O requests they want to handle. All other entries can be left alone.
At a bare minimum, any functional driver must handle IRP_MJ_CREATE and IRP_MJ_CLOSE. Without these, no application could even open or close a handle to the driver's device, making communication impossible.
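The dispatch table mechanics can be modeled in user mode. This is a toy: ToyDriverObject and the Toy routines are hypothetical, and real dispatch routines take a device object and an IRP rather than no arguments. The constants carry the values of IRP_MJ_CREATE, IRP_MJ_CLOSE, IRP_MJ_READ, IRP_MJ_MAXIMUM_FUNCTION, and STATUS_INVALID_DEVICE_REQUEST.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using NTSTATUS = std::int32_t;  // user-mode stand-in
constexpr NTSTATUS kStatusSuccess = 0;
constexpr NTSTATUS kStatusInvalidDeviceRequest = static_cast<NTSTATUS>(0xC0000010u);

constexpr std::size_t kIrpMjCreate = 0x00;
constexpr std::size_t kIrpMjClose = 0x02;
constexpr std::size_t kIrpMjRead = 0x03;
constexpr std::size_t kIrpMjMaximumFunction = 0x1b;

using DispatchRoutine = NTSTATUS (*)();  // simplified signature for the toy

struct ToyDriverObject {
    std::array<DispatchRoutine, kIrpMjMaximumFunction + 1> MajorFunction;
};

inline NTSTATUS ToyInvalidRequest() { return kStatusInvalidDeviceRequest; }
inline NTSTATUS ToyCreateClose() { return kStatusSuccess; }

// What the kernel does before DriverEntry runs: every slot fails by default.
inline void ToyPrepare(ToyDriverObject& driver) {
    driver.MajorFunction.fill(&ToyInvalidRequest);
}

// What the driver does in DriverEntry: replace only the slots it supports.
inline void ToyDriverEntry(ToyDriverObject& driver) {
    driver.MajorFunction[kIrpMjCreate] = &ToyCreateClose;
    driver.MajorFunction[kIrpMjClose] = &ToyCreateClose;
}
```

After both steps, a create or close request succeeds, while a read request still lands on the default routine and fails.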

Object Attributes

In the Windows kernel, you don't just "open a file" or "create a key." Instead, you interact with generic objects managed by the Object Manager. To create or open any of these objects (files, registry keys, events, threads, etc.), you must first describe what you want and how you want to open it.
The OBJECT_ATTRIBUTES structure is that description. It's a standardized parameter used by many kernel functions (ZwCreateFile, ZwOpenKey, etc.) to specify the properties of the object you are about to create or open. The structure is defined as:

_OBJECT_ATTRIBUTES

typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES;
typedef OBJECT_ATTRIBUTES *POBJECT_ATTRIBUTES;
typedef CONST OBJECT_ATTRIBUTES *PCOBJECT_ATTRIBUTES;

Length: Size of the structure, automatically set by initialization macros.
RootDirectory: Optional handle to a directory in the object manager namespace for relative object names. Set to NULL for fully qualified names.
ObjectName: A pointer to a UNICODE_STRING specifying the object’s name. Can be NULL for objects without names (e.g., processes identified by PID).
Attributes: Flags controlling the operation, such as OBJ_CASE_INSENSITIVE or OBJ_KERNEL_HANDLE.
SecurityDescriptor: Optional SECURITY_DESCRIPTOR for the new object. NULL applies a default descriptor based on the caller’s token.
SecurityQualityOfService: Optional impersonation and context tracking settings, rarely used for most objects. Refer to WDK documentation for specifics.
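In practice the structure is filled in with the InitializeObjectAttributes macro rather than field by field. The user-mode sketch below mirrors what that macro does, using stand-in types (ToyUnicodeString, ToyObjectAttributes, and ToyInitializeObjectAttributes are hypothetical names). The two flag constants carry the values of OBJ_CASE_INSENSITIVE and OBJ_KERNEL_HANDLE.

```cpp
#include <cstdint>

struct ToyUnicodeString {           // stand-in for UNICODE_STRING
    std::uint16_t Length;
    std::uint16_t MaximumLength;
    const char16_t* Buffer;
};

struct ToyObjectAttributes {        // stand-in for OBJECT_ATTRIBUTES
    std::uint32_t Length;
    void* RootDirectory;
    ToyUnicodeString* ObjectName;
    std::uint32_t Attributes;
    void* SecurityDescriptor;
    void* SecurityQualityOfService;
};

constexpr std::uint32_t kObjCaseInsensitive = 0x00000040;  // OBJ_CASE_INSENSITIVE
constexpr std::uint32_t kObjKernelHandle    = 0x00000200;  // OBJ_KERNEL_HANDLE

// Mirrors the macro: the caller supplies name, flags, root, and security
// descriptor; Length and SecurityQualityOfService are filled in for it.
inline void ToyInitializeObjectAttributes(ToyObjectAttributes* attributes,
                                          ToyUnicodeString* name,
                                          std::uint32_t flags,
                                          void* rootDirectory,
                                          void* securityDescriptor) {
    attributes->Length = sizeof(ToyObjectAttributes);
    attributes->RootDirectory = rootDirectory;
    attributes->Attributes = flags;
    attributes->ObjectName = name;
    attributes->SecurityDescriptor = securityDescriptor;
    attributes->SecurityQualityOfService = nullptr;
}
```

A typical call from a driver passes a fully qualified name, the two flags combined with a bitwise OR, and null for both the root directory and the security descriptor.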

Device Objects

A driver's code is loaded into the kernel, but by itself, it's just a block of code with no way for the outside world to talk to it. A Device Object is the official "front door" or "service window" that a driver creates so that applications can find it and send it requests.

Analogy:

DRIVER_OBJECT: This is the company's registration and business license. It proves the company (the driver) is legitimate and lists its capabilities, but customers don't interact with the license itself.
DEVICE_OBJECT: This is the company's public-facing storefront. It's the physical place customers go to do business. A company can have multiple storefronts (a driver can have multiple device objects).

Without a Device Object, a driver is like a business with no storefront: it exists, but no one can use its services.

How Do Applications Find and Use This "Storefront"?

An application can't just walk into the kernel space. There's a security barrier. The connection is made through a clever, three-step process:

The Driver Creates an Internal Name: The driver creates its Device Object with a private, kernel-only name, like \Device\MyDriver.
The Driver Creates a Public "Shortcut" (Symbolic Link): The driver then creates a Symbolic Link in a special, public directory (\??). This link might be named MyDriverLink, and it simply points to the internal \Device\MyDriver name. A common example you see every day is C:, which is a symbolic link to a hard disk device object.
The Application Uses CreateFile to Connect: A user-mode application uses the standard CreateFile function to open a connection. This function isn't just for files on disk; it's the universal Windows API for connecting to named kernel objects.

When an application calls CreateFile(L"\\\\.\\MyDriverLink", ...):

The system sees the public name MyDriverLink.
It follows the shortcut to the internal \Device\MyDriver object.
It sends an IRP_MJ_CREATE request to the driver.
If the driver approves, CreateFile returns a HANDLE. This handle is the application's ticket for all future communication.
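The name lookup in the first two steps can be mimicked with a small user-mode model. It is only an illustration: ToyNamespace, ToyWin32ToNtPath, and ToyResolve are made-up names, and the real Object Manager does far more than a map lookup. The model rewrites the \\.\ prefix used by CreateFile into the \??\ directory and then follows the symbolic link to the device name.

```cpp
#include <map>
#include <string>

// Toy object namespace: symbolic link name -> target device name.
using ToyNamespace = std::map<std::string, std::string>;

// Converts a Win32 device path such as \\.\MyDriverLink into its \??\ form.
// Returns an empty string if the path does not start with \\.\ .
inline std::string ToyWin32ToNtPath(const std::string& win32Path) {
    const std::string prefix = "\\\\.\\";
    if (win32Path.compare(0, prefix.size(), prefix) != 0) {
        return std::string();
    }
    return "\\??\\" + win32Path.substr(prefix.size());
}

// Follows the symbolic link; returns an empty string when no link exists.
inline std::string ToyResolve(const ToyNamespace& ns, const std::string& win32Path) {
    auto it = ns.find(ToyWin32ToNtPath(win32Path));
    return it == ns.end() ? std::string() : it->second;
}
```

With a namespace that contains only the MyDriverLink entry, resolving \\.\MyDriverLink yields \Device\MyDriver, while any other name resolves to nothing, which corresponds to CreateFile failing to find the device.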

What if a driver creates a Device Object but doesn't create a public symbolic link? This is rare, but some system devices, like \Device\Beep, do this.
In this case, a standard user-mode application using CreateFile can't find it. To connect, you must use a lower-level, "native" API called NtOpenFile. This function allows you to open a device using its direct, internal kernel name.

In the next blog, we'll use many of the concepts we learned in the previous blogs to build a simple driver.

Published 27 days ago