h_i = ((h_{i-1} >> 0x0D) | (h_{i-1} << (32 - 0x0D))) + c_i.
We then ask the SMT solver to enumerate all solutions in which hn = 0. We created a Python implementation of this using Microsoft’s Z3 solver. It is capable of producing thousands of zero-hash strings within seconds. Here are ten of them.
LNZLTXWQYV TPLPPTVXWX
TPTPPTVTWX TPNPNTVWWY
TPNPLTVWWZ TPNPPTVWWX
TPNPZTVWWS TPVPZTVSWS
TPVPXTVSWT TPVPVTVSWU
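These strings can be checked without the solver. The following pure-Python sketch of the rotate-and-add hash (mirroring the constraint above) confirms that such a prefix hashes to zero, and, since the running hash returns to zero after the prefix, that prepending it to any string leaves that string's hash unchanged:

```python
def ror13_hash(name):
    """Metasploit-style hash: rotate the running value right by 13 bits,
    then add each character, all modulo 2**32."""
    h = 0
    for c in name.encode("ascii"):
        h = ((h >> 0x0D) | (h << (32 - 0x0D))) & 0xFFFFFFFF  # ROR 13
        h = (h + c) & 0xFFFFFFFF
    return h

# A zero-hash prefix from the solver:
print(hex(ror13_hash("LNZLTXWQYV")))  # prints 0x0

# Because the state is back to zero, the prefix is invisible to the hash:
print(ror13_hash("LNZLTXWQYVLoadLibraryA") == ror13_hash("LoadLibraryA"))
```

This is exactly why the honeypot-export trick works: the hash of the decorated export name equals the hash of the real one.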
So, for example, if we were to create a DLL with an exported function named “LNZLTXWQYVLoadLibraryA” that precedes the real LoadLibraryA, a Metasploit payload would mistakenly call our honeypot function.
SpyEye’s Hash
Finally, let’s take a look at an example from the wild: the hash used by the SpyEye malware, presented in Algorithm 2. “LoadLibraryA” hashes to 0xC8AC8026.
Algorithm 2 The find-API-by-hashing method used by SpyEye.
1: procedure HASH(name)
2: j ← 0
3: for i ← 0 to LEN(name) − 1 do
4: left ← (j << 0x07) & 0xFFFFFFFF
5: right ← (j >> 0x19)
6: j ← left | right
7: j ← j ^ name[i]
8: end for
9: return j
10: end procedure
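Algorithm 2 translates almost line-for-line into Python; this sketch reproduces the hash value quoted above:

```python
def spyeye_hash(name):
    """SpyEye-style hash: rotate the running value left by 7 bits,
    then XOR in each character, all modulo 2**32."""
    j = 0
    for c in name.encode("ascii"):
        j = ((j << 0x07) | (j >> 0x19)) & 0xFFFFFFFF  # ROL 7
        j ^= c
    return j

print(hex(spyeye_hash("LoadLibraryA")))  # prints 0xc8ac8026
```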
As you can see, this is very similar to Metasploit’s method, in that it rotates the hash by seven bits for every character. However, unlike Metasploit’s additive method, SpyEye XORs the value of each character. That makes things a bit more complex, and it means that our trick of finding a string prefix that hashes to zero will no longer work. Nonetheless, this hash is not cryptographically secure, and is vulnerable to collision.
Once again, let’s encode it as an SMT problem with character variables c_1, ..., c_n and hash variables h_0, ..., h_n. The hash constraint this time is:
h_i = ((h_{i-1} << 0x07) | (h_{i-1} >> 0x19)) ^ c_i,
and we ask the SMT solver to enumerate solutions in which h_n equals the hash of the string we want to collide with.
Once again, Microsoft’s Z3 solver makes short work of finding collisions. A Python implementation of this collision is attached to pocorgtfo12.pdf. Here is a sample of ten strings that all collide with “LoadLibraryA.”
RHDBJMZHQOIP ILPSKUXYYKKK
YMACZUQPXKKK KMACZUQPXBKK
KMICZUQPXBKO KMICZURPXBKW
KMICZUBPXBJW KMICZVBPXBRW
KMYCZVCPXBRW KMYCZVAPXBRG
Acknowledgments
This work was partially funded by the Halting Attacks Via Obstructing Configurations (HAVOC) project under Mudge’s DARPA Cyber Fast Track program, Digital Operatives IR&D, and our famous Single Malt Waterfall. With that said, the opinions and suspect Soviet cinematic similitudes expressed in this article are the authors’ own and do not necessarily reflect the views of DARPA or the United States government.
12:8 UMPOwn
by Alex Ionescu
With the introduction of new mitigation technologies such as DeviceGuard, Windows 10 makes it increasingly harder for attackers to enter the kernel through Ring 0 drivers, which are now subject to even stricter code integrity / signing verification, or through exploits, as increased mitigations and PatchGuard validations are used to detect these. However, even the best-written operating system with the best-intentioned team of developers will encounter vulnerabilities that mitigations may be unable to stop.
Therefore, the last key element needed in defending the security boundaries of the operating system is a sane response to quickly patch such vulnerabilities—without one, the entire defensive strategy falls apart. Incorrectly dismissing vulnerabilities as “too hard to exploit” or misunderstanding the security boundaries of the operating system can lead to unfixed vulnerabilities, which can then be used to work around the large amount of resources that were developed in creating new security defences.
In this article, we’ll take a look at an extremely challenging exploit—given a kernel function to signal an event (KeSetEvent), can reliable code execution from user-mode be achieved, if all that the attacker controls is the pointer to the event, which can be set to any arbitrary value? We’ll need to take a deep look at the Windows scheduler, understand the semantics and code flows of event signaling, and ultimately reveal a low-level scheduler attack that can result in arbitrary ROP-based exploitation of the kernel.
ACT I. Controlling RIP and RSP
Wait Object Signaling
To understand event signaling in the NT kernel, one must first understand the two types of events and their corresponding wake logic mechanisms:
Synchronization Events, which have a wake one semantic
Notification Events, which have a wake any / wake all semantic
The difference between these two types of events is encoded in the Type field of the DISPATCHER_HEADER of the event’s KEVENT data structure, which is how the kernel internally represents these objects. As such, when an event is signaled, either KiSignalNotificationObject or KiSignalSynchronizationObject is used, which will wake up all waiting threads, or just a single waiting thread, respectively.
How does the kernel associate waiting threads with their underlying synchronization objects? The answer lies in the KWAIT_BLOCK data structure, within which we find the type of wait that the thread is performing and a pointer to the waiting thread itself, known as a KTHREAD structure. The two types of wait that a thread can make are known as wait any and wait all, and they determine if a single signaled object is sufficient to wake up a thread (OR), or if all of the objects that the thread is waiting on must be signaled (AND). In Windows 8 and later, a thread can also asynchronously wait on an object—and associate an I/O Completion Port, or a KQUEUE as it’s known in the kernel, with a wait block. For this scenario, a new wait type was implemented: wait notify.
Therefore, simply put, a notification event will cause the iteration of all wait blocks—and the waking of each thread, or I/O completion port, based on the wait type—whereas a synchronization event will do the same, but only for a single thread. How are these wait blocks linked you ask? On Windows 8 and later they are guaranteed to all be allocated in a single, flat array, with a field in the KTHREAD, called WaitBlockCount, storing the number of elements. In Windows 7 and earlier, each wait block has a pointer to the next (NextWaitBlock), and the final wait block points back to the first, creating a circular singly-linked list. Finally, the KTHREAD structure also has a WaitBlockList pointer, which serves as the head of the list or array.
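As a rough illustration, the two signaling policies over a flat, Windows 8-style wait-block array can be modeled in a few lines. This is a toy model only; the names mirror the kernel structures, but the fields and layout are simplified stand-ins, not the real definitions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class KThread:
    name: str
    state: str = "Waiting"

@dataclass
class KWaitBlock:
    thread: KThread
    wait_type: str = "WaitAny"          # WaitAny / WaitAll / WaitNotify
    state: str = "WaitBlockActive"

@dataclass
class KEvent:
    event_type: str                     # "Notification" or "Synchronization"
    # Windows 8+ keeps a thread's wait blocks in a single flat array; here
    # we simply gather the blocks registered against this one event.
    wait_blocks: List[KWaitBlock] = field(default_factory=list)

def signal(event: KEvent) -> List[KThread]:
    """Wake-all for notification events, wake-one for synchronization events."""
    woken = []
    for wb in event.wait_blocks:
        if wb.state != "WaitBlockActive":
            continue                    # not a valid wait anymore; skip it
        wb.state = "WaitBlockInactive"  # mark the wait as satisfied
        wb.thread.state = "DeferredReady"
        woken.append(wb.thread)
        if event.event_type == "Synchronization":
            break                       # wake-one semantics: stop at first waiter
    return woken
```

With three blocked threads, signaling a Notification event leaves all three DeferredReady, while a Synchronization event wakes only the first active waiter.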
Internals Intermezzo
Let’s step back for a moment. We, from user mode, control the pointer to an arbitrary KEVENT, which we can construct in any way we want, and our goal is to obtain code execution in kernel mode. Based on the description we’ve seen so far, what are some ideas that come to mind? Certainly, we could probably cause some memory corruption or denial of service activity, by creating incorrect wait blocks or an infinite list. We could cause out-of-bounds memory access and maybe even flip certain bits in kernel-mode memory. But if the ultimate possibility (given the right set of constraints and linked data structures) is that a call to KeSetEvent will cause a thread to be woken, are we able to control this thread, and more importantly, can we get it to execute arbitrary code, in kernel mode? Let’s keep digging into the internals to find out more.
Thread Waking
Suppose there exists a synchronization event with a single waiter (thus, a single wait block). This waiter is currently blocked in a wait any fashion on the event and has no other objects that it is waiting on. The call to KeSetEvent will follow this pattern: KeSetEvent → KiSignalSynchronizationObject → KiTryUnwaitThread → KiSignalThread
At the end of this chain, the thread’s state will have changed, going from what should be its current Waiting state to its new DeferredReady state, indicating that it is, in a way, ready to be prepped for execution. For it to be found in this state, it will be added to the queue of DeferredReady threads for the current processor, which lives in the KPRCB’s DeferredReadyListHead lock-free stack list. Meanwhile, the wait block’s state, which should have been set to WaitBlockActive, will now migrate to WaitBlockInactive, indicating that this is no longer a valid wait—the thread is ready to be awakened.
One of the most unique things about the NT scheduler is that it does not rely on a scheduler tick or other external event in order to kick off scheduling operations and pre-emption. In fact, any time a function has the possibility to change the state of a thread, it must immediately react to possible system-wide scheduler changes that this state transition has caused. Such functions implement this logic by calling the KiExitDispatcher function, with some hints as to what operation just occurred. In the case of KeSetEvent, the AdjustUnwait hint is used to indicate that one or more threads have potentially been woken.
One Does Not Simply Exit the Dispatcher . . .
Once inside KiExitDispatcher, the scheduler first checks if DeferredReady threads already exist in the KPRCB’s queue. In our scenario, we know this will be the case, so let’s see what happens next. A call to KiProcessThreadWaitList is made, which iterates over each thread in the DeferredReadyListHead, and for each one, a subsequent call to KiUnlinkWaitBlock occurs, which unlinks all wait blocks associated with this thread that are in WaitBlockActive state. Then, the AdjustReason field in the KTHREAD structure is set to the hint value we referenced earlier (AdjustUnwait here), and a potential priority boost, or increment, is added in the AdjustIncrement field of the KTHREAD. For events, this will be equal to EVENT_INCREMENT, or 1.
Standby! Get Ready for My Thread
As each thread is processed in this way, a call to KiReadyThread is finally performed. This routine’s job is to check whether or not the thread’s kernel stack is currently resident, as the NT kernel has an optimization that automatically causes the eviction (and even potential paging out) of the kernel stack of any user-mode waiting thread after a certain period of time (typically 4-6 seconds). This is exposed through the KernelStackResident field in the KTHREAD. In Windows 10, a process’ set of kernel stacks can also be evicted when a process is frozen as part of new behaviour for Modern (Metro) applications, so another flag, ProcessStackCountDecremented, is also checked. For our purposes, let’s assume the thread has a fully-resident kernel stack. In this case, we move on to KiDeferredReadyThread, which will handle the DeferredReady → Ready (or Standby) transition.
Unlike a DeferredReady thread, which can be ready on an arbitrary processor queue, a Ready thread must be on the proper processor queue (and/or shared queue, in Windows 8 and later). Explaining the selection algorithms is beyond the scope of this article, but suffice it to say that the kernel will attempt to find the best possible processor among: idle cores, parked cores, heterogeneous vs. homogeneous cores, and busy cores, and balance that with the hard affinity, soft affinity/ideal processor, and group scheduling ranks and weights. Once a processor is chosen, the NextProcessor field in KTHREAD is set to its index. Ultimately, the following possibilities exist:
1. An idle processor was chosen. The KiUpdateThreadState routine executes and sets the thread’s state to Standby and sets the NextThread field in the KPRCB to the selected KTHREAD. The thread will start executing imminently.
2. An idle processor was chosen, which already had a thread selected as its NextThread. The same operations as above happen, but the existing KTHREAD is now pre-empted and must be dealt with. The thread will start executing imminently.
3. A busy processor was chosen, and this thread is more important. The same operations as in case #2 happen, with pre-emption again. The thread will start executing imminently.
4. A busy processor was chosen, but this thread is not more important. KiAddThreadToReadyQueue is used instead, and the state will be set to Ready. The thread will execute at a later time.
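These four outcomes can be condensed into a toy decision function. This is purely illustrative; the real KiDeferredReadyThread selection logic is far more involved than this sketch:

```python
def ready_outcome(idle: bool, has_next_thread: bool,
                  new_priority: int, current_priority: int):
    """Toy reduction of the four outcomes above.
    Returns (new thread state, whether another thread gets pre-empted)."""
    if idle:
        # Cases 1 and 2: Standby on an idle processor; if that processor had
        # already selected a NextThread, that thread is pre-empted.
        return ("Standby", has_next_thread)
    if new_priority > current_priority:
        # Case 3: the busy processor's current thread is pre-empted.
        return ("Standby", True)
    # Case 4: queued via KiAddThreadToReadyQueue, runs at a later time.
    return ("Ready", False)
```

Note the shape of case #3: if only one busy processor is eligible and our thread’s priority beats the current thread’s, we get Standby with pre-emption, which is the situation the next section sets up deliberately.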
Internals Secondo Intermezzo
It should now become apparent that, given a custom KTHREAD structure, we can fool the scheduler into entering a scenario where that thread is selected for immediate execution. To make things even simpler, if we can force this thread to execute on the current processor, we can pre-empt ourselves and force an immediate switch to the new thread, without disturbing other processors and worrying about pre-empting other threads.
In order to go down this path, the KTHREAD we create must have a single, fixed, hard affinity, which will be set to our currently executing processor. We can do this by manipulating the Affinity field of the KTHREAD. This will ensure that the scheduler does not look at any idle processors. It must also have the current processor as its soft affinity, or ideal processor, so that the scheduler does not look at any other busy processors. By restricting all idle processors from selection and ignoring all other busy processors, the scheduler will have no choice but to pick the current processor.
Yet we still have to choose between paths #3 and #4, to get this new thread to appear “more important.” This is easily done by ensuring that our new thread’s priority (in the KTHREAD’s Priority field) will be higher than the current thread’s.
Completing the Exit
Once KiDeferredReadyThread is done with its business and returns to KiReadyThread, which returns to KiProcessThreadWaitList, which returns to KiExitDispatcher, it’s time to act. The routine will now verify if it’s possible to do so based on the IRQL at the time the event was signalled—a level of DISPATCH_LEVEL or above will indicate that nothing can be done yet, so an interrupt will be queued, which should fire as soon as the IRQL drops. Otherwise, it will check if the NextThread field in the KPRCB is populated, implying that a new thread was chosen on the current processor.
At this point, NextThread will be set to NULL (after capturing its value), and KiUpdateThreadState will be called again, this time with the new state set to Running, causing the KPRCB’s CurrentThread field to now point to this thread instead. The old thread, meanwhile, will be pre-empted and added to the Ready list with KiQueueReadyThread.
Once that’s done, it’s time to call KiSwapContext. Once control returns from this function, the new thread will actually be running (i.e., it will basically be returning from whatever had pre-empted it to begin with), and KiDeliverApc will be called as needed in order to deliver any Asynchronous Procedure Calls (APCs) that were pending to this new thread.
KiExitDispatcher is done, and it returns back to its caller—not KeSetEvent! As we are now on a new thread, with a new stack, this will actually probably return to a completely different API, such as KeWaitForSingleObject.
Make It So—the Context Switch
To understand how KiSwapContext is able to change to a totally different thread’s execution context, let’s go inside the belly of the beast. The first operation that we see is the construction of the exception frame, which is done with the GENERATE_EXCEPTION_FRAME assembly macro, which is public in kxamd64.inc. This essentially constructs a KEXCEPTION_FRAME on the stack, storing all the non-volatile register contents. Then, the SwapContext function is called.
Inside of SwapContext, a second structure is built on the stack, known as the KSWITCH_FRAME structure. It is documented in the ntosp.h header file, but not in the public symbols. Inside of the routine, the following key actions are taken on an x64 processor. (Similar, but uniquely different actions are taken on other CPU architectures.)
The Running field is set to 1 inside of the new KTHREAD.
Runtime CPU Cycles begin to accumulate based on the KPRCB’s StartCycles and CycleTime fields.
The count of context switches is incremented in KPRCB’s ContextSwitches field.
The NpxState field is checked to see if FPU/XSAVE state must be captured for the old thread.
The current value of the stack pointer RSP, is stored in the old thread’s KernelStack KTHREAD field.
RSP is updated based on the new thread’s KernelStack value.
A new LDT is loaded if the process owning the new thread is different than the old thread (i.e., a process switch has occurred).
In a similar vein to the above, the process affinity is updated if needed, and a new CR3 value is loaded, again in the case of a process switch.
The RSP0 is updated in the current Task State Segment (TSS), which is indicated by the TssBase field of the KPCR. The value is set to the InitialStack field of the new KTHREAD.
The RspBase in the KPRCB is updated as per the above as well.
The Running field is set to 0 in the old KTHREAD.
The NpxState field is checked to see if FPU/XSAVE state must be restored for the new thread.
The Compatibility Mode TEB Segment in the GDT (stored in the GdtBase field of the KPCR) is updated to point to the new thread’s TEB, stored in the Teb field of the KTHREAD.
The DS, ES, FS segments are loaded with their canonical values if they were modified.
The GS value is updated in both MSRs by using the swapgs instruction and reloading the GS segment in between.
The KPCR’s NtTib field is updated to point to the new thread’s TEB, and WRMSR is used to set MSR_GS_SWAP.
The count of context switches is incremented in KTHREAD’s ContextSwitches field.
The switch frame is popped off the stack, and control returns to the caller’s RIP address on the stack.
PoC or GTFO, Volume 2 Page 30