PoC or GTFO, Volume 2


by Manul Laphroaig

R0 to R12: General Use

Larger ARM chips, such as those in an early smartphone, support two instruction sets. If the least significant bit of the program counter is clear (0), then the 32-bit instruction set is used, whereas if that bit is set (1), the chip will use a 16-bit instruction set called Thumb. Registers are still 32 bits wide, but the instructions themselves are only a half-word. They must be half-word aligned.

Because Thumb instructions have fewer bits to spare, code in larger ARM machines will switch between ARM and Thumb as it is convenient. You can see this in the least significant bit of a function pointer, where an ARM function’s address will be even, while a Thumb function’s address will be odd.

The Cortex M3 devices speak a slimmer dialect than the big-iron ARM chips. This dialect drops the 32-bit wide instruction set entirely, supporting only Thumb and Thumb2 instructions. Because of this, all functions and all interrupt handlers are referred to by odd addresses, which are actually the address of the byte after the real starting address! If you see a call to 0x08005615, that is really a call to the Thumb code at 0x08005614.
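As a small sketch of that arithmetic (in Python, with hypothetical helper names that are not part of any toolchain), the low bit can be tested and cleared like so:

```python
def is_thumb_pointer(addr: int) -> bool:
    """True when the least significant bit marks the target as Thumb code."""
    return (addr & 1) == 1


def code_address(addr: int) -> int:
    """Clear the Thumb bit to get the address where the code really starts."""
    return addr & ~1
```

Feeding it the call target from the paragraph above, code_address(0x08005615) gives back 0x08005614.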

Registers and Calling Convention

Arguments are passed to the child function from R0 to R3. R4 to R11 hold local variables, and the child function must restore them before returning to the parent function. Values are returned in R0 to R3, and these registers are not preserved by the child.

Much like in PowerPC and very unlike x86, the Link Register (R14, a.k.a. LR) holds the return address. A leaf function, having no children, might never write its return pointer to the stack. The BL instruction automatically moves the old Program Counter into the Link Register when calling a child, so parent functions must manually save R14 before calling children. The return instruction, BX LR, functions by moving R14 (LR) into R15 (PC).

Memory Map

Figure 11.3 shows the memory layout of the STM32F405, a Cortex M4 device. Study this map for a moment, before we go on to how to use it in your adventure!

Because Cortex M devices have four gigabytes of address space but hardly a megabyte of Flash, they keep functionally different parts of memory at very different addresses.

Figure 11.3: STM32F40xxx Memory Map

Code memory is officially the range from 0x00000000 to 0x1FFFFFFF, but in many cases, you’ll find that Flash is also mapped at a second address, such as 0x08000000. When reverse engineering an application, you’ll find that it’s either written here or a few dozen kilobytes later, to leave room for a bootloader.

SRAM is usually mapped to begin at 0x20000000, so it’s safe to assume that any read or write to an absolute address in this region is a global variable, and also that the stack and heap fit somewhere in this range. Unlike a desktop application, which loads its initial globals directly into a .data segment, an embedded application must manually initialize its data variables, possibly by copying a large chunk from Flash into SRAM.

Peripheral memory begins at 0x40000000. Both because peripherals are most often referred to by an explicit address, and because Flash comes with no linking systems or system calls, reads and writes to this region are a gold mine for a reverse engineer!

System control registers are at 0xE0000000. These are used to do things like moving the interrupt table or reading the chip’s model number.
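When annotating a disassembly, it can help to sort every absolute address into one of these regions. Here is a minimal sketch in Python. The start addresses are the ones given above, while the end addresses assume the usual 512 MB windows of the default Cortex M map and should be checked against your chip's datasheet. The names REGIONS and classify are made up for illustration.

```python
# (start, end inclusive, name). Start addresses come from the text above; the
# end addresses are an assumption based on the default Cortex M layout.
REGIONS = [
    (0x00000000, 0x1FFFFFFF, "code"),
    (0x20000000, 0x3FFFFFFF, "sram"),
    (0x40000000, 0x5FFFFFFF, "peripheral"),
    (0xE0000000, 0xFFFFFFFF, "system"),
]


def classify(addr: int) -> str:
    """Name the region an absolute address falls into, or 'other'."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return "other"
```

A load from an address that classifies as "peripheral" is usually worth a closer look before one that classifies as "sram".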

Making Sense of Pointers

Let us teach you some nifty tricks about pointers in Thumb machines.

Back when ARM was first designed, 32-bit fixed-width instructions with 32-bit alignment were all the rage, and all the cool kids (POWER, SPARC, Alpha) used them. Later on, when the Thumb instruction set was being designed, its designers chose 16-bit instructions that could be mapped back to the same 32-bit core. The CPU would fetch a 32-bit ARM instruction if the least-significant bit of the program counter were clear (an even address), and a 16-bit Thumb instruction if that bit were set (an odd address).

But these Cortex chips generally ship just Thumb and Thumb2, without backward compatibility to 32-bit ARM instructions. So the trick, which you can try in the next section, is that data pointers are always even and instruction (function) pointers are always odd.

Making Sense of the Interrupt Table

Let’s take a look at the interrupt table from the beginning of a Cortex M firmware image. These are 32-bit little endian addresses, which are to be read backward, of course.

Note that the first word, 0x20001430, is in the SRAM region; this is because the first word of a Cortex M interrupt table is the initialization value for the Stack Pointer (R13). The second word, 0x08004121, is the initialization value for the Program Counter (R15), so we know the entry point of the application is Thumb2 code starting at 0x08004120.
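The same reading can be done mechanically. The sketch below, in Python, unpacks the first few little endian words of an image and splits them into the initial stack pointer, the entry point, and the remaining handlers. The function name and the count parameter are our own inventions, and the image is assumed to hold at least count words.

```python
import struct


def parse_vector_table(image: bytes, count: int = 16):
    """Return (initial_sp, entry_point, handlers) from the start of an image."""
    words = struct.unpack_from("<%dI" % count, image, 0)
    initial_sp = words[0]            # first word: initial Stack Pointer
    entry_point = words[1] & ~1      # second word: reset handler, Thumb bit cleared
    # Remaining words are handlers; zeroed words are reserved, so skip them.
    handlers = [w & ~1 for w in words[2:] if w != 0]
    return initial_sp, entry_point, handlers
```

For an image that begins with the bytes 30 14 00 20 21 41 00 08, this yields a stack pointer of 0x20001430 and an entry point of 0x08004120, matching the reading above.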

Except for some reserved (zeroed) words, the handler addresses are all in Flash memory and represent the interrupt handler functions. We can look up the meaning of each handler in the specific chip’s programming guide, then chase the ones that are most relevant. For example, if we are reverse engineering a USB device, powered by an STM32F3xx, the STM32F37xx reference manual tells us that the interrupts at offsets 0x000000D8 and 0x0000001C handle USB events. These might be good handlers to reverse early in the process.

Loading into IDA Pro or Radare2

To load the application into IDA Pro or Radare2, you generally need to know the loading point and the locations of some other memories.

The loading point will be at or near the beginning of Flash, depending upon whether a bootloader comes before your image. If you are working from a JTAG dump, just use the address the image came from. If you are working from a .dfu (Device Firmware Update) file, it will contain a loading address in its header metadata.

When given a raw dump without a starting address, disassemble the instructions and try to find a loading address at which the interrupt handlers line up. (The interrupt vector table is usually at 0x00000000 or 0x08000000 at boot, but it can be moved to a new address by software.)
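One crude way to test whether the handlers line up is to score each candidate base by how many vector table entries would land inside the image if it were loaded there. The Python sketch below does only that. It assumes the vector table sits at the very start of the dump, the candidate list is yours to supply, and score_base and guess_base are hypothetical names.

```python
import struct


def score_base(image: bytes, base: int, count: int = 16) -> int:
    """Count vector entries that are odd (Thumb) and point inside the image."""
    count = min(count, len(image) // 4)
    words = struct.unpack_from("<%dI" % count, image, 0)
    hits = 0
    for word in words[1:]:           # words[0] is the stack pointer, not code
        target = word & ~1
        if word & 1 and base <= target < base + len(image):
            hits += 1
    return hits


def guess_base(image: bytes, candidates, count: int = 16) -> int:
    """Return the first candidate base with the highest score."""
    return max(candidates, key=lambda base: score_base(image, base, count))
```

Several nearby bases can tie under this score, so treat the result as a shortlist. Confirming the winner still means disassembling at the handler addresses and checking that they look like function entries.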

Making Sense of the Peripherals

The Cortex M3 contains two peripheral regions. At 0x40000000, you will find the most useful ones for reverse engineering applications, such as UART and USB controllers, General Purpose IO (GPIO), and other devices. Unfortunately, these peripherals are not generic to the Cortex M3 as an architecture; rather, they are specific to each individual chip.

Supposing you are reverse engineering an application for the STM32F3xx series, you would download the Peripheral Support Library for that chip from its manufacturer and eventually find yourself reading stm32f30x.h. For other chips, there are similar headers, each of which is written around C structs for register groups and preprocessor definitions for peripheral base addresses and offsets.

Suppose we know from reverse engineering a circuit board that USART2 is used by our target application to send packets to a radio chip, and we would like to search for all functions that use this peripheral. Working backwards, we find the following relevant lines in stm32f30x.h.

This means that USART2’s data structure is located at 0x40004400. From the USART_TypeDef structure, we know that data is received from USART2 by reading 0x40004424 and written to USART2 by writing to 0x40004428! Searching for these addresses ought to easily find us the read and write functions for that port.
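A quick way to run that search on a raw image is to look for the addresses as little endian words, since Thumb code typically loads absolute addresses from nearby literal pools. This Python sketch assumes the image is loaded at a word-aligned address. The offsets 0x24 and 0x28 are simply the differences between the addresses above and the base, and the constant and function names are ours.

```python
import struct

USART2_BASE = 0x40004400
RX_DATA_OFFSET = 0x24   # 0x40004424 - 0x40004400, read to receive
TX_DATA_OFFSET = 0x28   # 0x40004428 - 0x40004400, write to transmit


def find_word(image: bytes, value: int):
    """Return every 4-byte-aligned offset holding value as a little endian word."""
    needle = struct.pack("<I", value)
    hits = []
    pos = image.find(needle)
    while pos != -1:
        if pos % 4 == 0:             # literal pool entries are word aligned
            hits.append(pos)
        pos = image.find(needle, pos + 1)
    return hits
```

If neither register address turns up, try the base address alone. A compiler may well load 0x40004400 into a register once and reach the data registers by offset.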

Other Oddities

Please note that this guide has omitted many chip-specific features, and that each chip has its own little quirks. You’ll find different memory maps on each implementation, and anything that looks confusing is likely worth spending more time to understand.

For example, some ARM devices offer Core-Coupled Memory (CCM), which is SRAM that’s wired directly to the CPU’s internal data bus rather than to the main memory bus of the chip. This makes data fetches lightning fast, but has the complications that the memory is unusable for DMA or code fetches. Care for a non-executable stack, anyone?

Another quirk is that many devices map the same physical memory to multiple virtual locations. In some high-performance code, the use of both cached and uncached memory can allow for more efficient operation.

Additionally, address zero often contains a duplicate of the boot memory, which is usually Flash but might be executable SRAM. Presumably this was done to allow for code that has compatible immediate addresses when booting from either memory, but PoC‖GTFO 10:8 describes a nifty little jailbreak that relies on dumping the 48K recovery bootloader of an STM32F405 chip out of Flash through a null-pointer read.

We hope that you’ve enjoyed this friendly little guide to the Cortex M3, and that you’ll keep it handy when reverse engineering firmware from that platform.

11:7 A Ghetto Implementation of CFI on x86

by Jeffrey Crowell

In 2005, M. Abadi and his gang presented a nifty trick to prevent control flow hijacking, called Control Flow Integrity. CFI is, essentially, a security policy that forces the software to follow a predetermined control flow graph (CFG), drastically restricting the available gadgets for return-oriented programming and other nifty exploit tricks.

Unfortunately, the current implementations in both Microsoft’s Visual C++ and LLVM’s clang compilers require source to be compiled with special flags to add CFG checking. This is sufficient when new software is created with the option of added security flags, but we do not always have such luxury. When dealing with third party binaries, or legacy applications that do not compile with modern compilers, it is not possible to insert these compile-time protections.

Luckily, we can combine static analysis with binary patching to add an equivalent level of protection to our binaries. In this article, I explain the theory of CFI, with specific examples for patching 32-bit x86 ELF binaries—without the source code.

CFI is a way of enforcing that the intended control flow graph is not broken, that code always takes intended paths. In its simplest applications, we check that functions are always called by their intended parents. It sounds simple in theory, but in application it can get gnarly. For example, consider these three functions.

For them, our pseudo-CFI might look like the following, where called_by_x checks the return address.

Of course, this sounds quite easy, so let’s dig in a bit further. Here is a very simple example program to illustrate ROP, which we will be able to effectively kill with our ghetto trick.

In x86, the stack has a layout like this:

Local Variables
Saved ebp
Return Pointer
Parameters
. . .

By providing enough characters to smashme, we can overwrite the return pointer. Assume for now that we know where we are allowed to return to. We can then provide a whitelist and know where it is safe to return to while keeping the control flow graph of the program valid.
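To make the whitelist idea concrete, here is a toy model in Python. It is not the x86 patch itself, only the logic of it: pop the return pointer, and refuse to return unless the address is on the list. The address 0x08048456 is the one this article gives for smashme's legitimate return site. The names ALLOWED_RETURNS, ControlFlowViolation, and checked_ret are invented for the sketch.

```python
# Whitelist of legitimate return sites per function.
ALLOWED_RETURNS = {
    "smashme": {0x08048456},
}


class ControlFlowViolation(Exception):
    """Raised when a function tries to return somewhere it should not."""


def checked_ret(function: str, stack: list) -> int:
    """Pop the return pointer and hand it back only if it is whitelisted."""
    return_address = stack.pop()
    if return_address not in ALLOWED_RETURNS.get(function, set()):
        raise ControlFlowViolation(
            "%s tried to return to 0x%08x" % (function, return_address))
    return return_address
```

A smashed stack holding 0x41414141 in place of the return pointer raises ControlFlowViolation instead of handing control to the attacker.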

Figure 11.4 shows the disassembly of smashme() and main(), having been compiled by GCC.

Great. Using our whitelist, we know that smashme should only return to 0x08048456, because it is the next instruction after the call to smashme in main. In x86, ret is equivalent to something like the following. (This is not safe for multi-threaded operations, but we can ignore that for now.)

Cool. We can just add a check here. Perhaps something like this?

Now just replace our ret instruction with the check. ret in x86 is simply this:

where our code is this:

Sadly, this will not work for several reasons. The most glaring problem is that ret is only one byte, whereas our fancy checker is fifteen bytes. For more complicated programs, our checker could be even larger! Thus, we cannot simply replace the ret with our code, as it will overwrite some code after it; in fact, it would overwrite main. We’ll need to do some digging and replace our lengthy code with some relocated parasite, symbiont, code cave, hook, or detour, or whatever you like to call it!

Figure 11.4: Disassembly of main() and smashme().

Nowadays there aren’t many places to put our code. Before x86 got its no-execute (NX) MMU bit, it’d be easy to just write our code into a section like .data, but marking this as +x is now a huge security hole, as it will then be rwx, giving attackers a great place for putting shellcode. The .text section, where the main code usually goes, is marked r-x, but there’s rarely slack space enough in this section for our code.

Luckily, it’s possible to add or resize ELF sections, and there’re various tools to do it, such as Elfsh and ERESI. The challenge is rewriting the appropriate pointers to other sections; a dedicated tool for this will be released soon. Now we can add a new section that is marked as r-x, replace our ret with a jump to our new section—and we’re ready to take off!

Well, wheels aren’t up yet. As mentioned before, ret is just c3, but absolute jumps are five bytes.

So what is left to do? Well, we can simply rewind to the first complete opcode five bytes before the ret, and add a jump, then relocate the remaining opcodes. We could do something like this.

Here, parasite is mapped someplace else in memory, such as our new section.

With this technique, we’ll still have to pass on protecting a few kinds of function epilogues, such as where a target of a jump is within the last five bytes. Nevertheless, we’ve covered quite a lot of the intended CFG.
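The bookkeeping for that rewind is simple enough to sketch in Python. The code below assumes the five-byte jump is the near jmp rel32 form (opcode E9 followed by a 32-bit displacement measured from the end of the jump), and that a disassembler has already given us the instruction lengths of the epilogue. Both function names are hypothetical, and this is not the released patching tool.

```python
import struct

JMP_REL32_SIZE = 5  # opcode E9 plus a 32-bit little endian displacement


def instructions_to_relocate(lengths):
    """lengths: instruction sizes in order, ending with the 1-byte ret.
    Return (how many trailing instructions move, how many bytes that frees)."""
    freed = 0
    for count, length in enumerate(reversed(lengths), start=1):
        freed += length
        if freed >= JMP_REL32_SIZE:
            return count, freed
    raise ValueError("function too short to hold the jump")


def encode_jmp(src: int, dst: int) -> bytes:
    """Encode a jmp rel32 placed at src that lands on dst."""
    displacement = (dst - (src + JMP_REL32_SIZE)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", displacement)
```

If the freed span is longer than five bytes, the leftover bytes after the jump can be padded with nop (0x90). The relocated instructions then go at the head of the parasite, ahead of the return check.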

This approach works great on platforms like ARM and MIPS, where all instructions are constant-length. If we’re willing to install a signal handler, we can do better on x86 and amd64, but we’re approaching a dangerous situation dealing with signals in a generic patching method, so I’ll leave you here for now. The code for applying the explained patches is all open source and will soon be extended to use emulation to compute relative calls.

Thanks for reading!

—Jeff

11:8 A Tourist’s Phrasebook for Reversing MSP430

by Ryan Speers and Travis Goodspeed

Howdy, y’all!

Welcome to another installment of our series of quick-start guides for reverse engineering embedded systems. Our goal here is to get you situated with the MSP430 architecture as quickly as possible, with a minimum of fuss and formality.

Those of you who have already used an MSP430 might find this to be a useful reference, while those of you new to the architecture will find that it isn’t really all that strange. If you’ve already reverse engineered binaries for any platform, even x86, we hope that you’ll soon feel right at home.

Memory Map

Unlike other embedded platforms, which like to put the interrupt vector table (IVT) at the beginning of memory, the MSP430 places it at the very end of the 16-bit address space, in Flash. (On smaller chips, this is the very end of Flash.)

Early on, Low RAM at 0x0200 would be the only RAM location, but as that region proved too small, a High RAM area was created at 0x1100. For firmware compatibility reasons, the Low RAM area is mapped on top of the High RAM area.

Note that Flash grows down from the top of memory, while the RAM grows up. On MSP430X chips with a 20-bit address space, an Extended Flash region sometimes grows upward from 0x10000.

Architecture: Von Neumann, 16-bit words

Registers:
R0: Program Counter
R1: Stack Pointer
R2: Status Register
R3: Constant Generator
R4-R15: General Use

Address Space: 16-bit (MSP430), 20-bit (MSP430X, X2)

Additionally, there is an Info Flash area at 0x1000. While there is nothing to stop an engineer from using this for code, the region is generally used for configuration settings. In many devices, chips arrive with this region pre-programmed to contain calibration settings for the internal clock.

In most devices, the BSL ROM at 0x0C00 contains a serial bootloader that allows the chip to be reprogrammed even after the JTAG fuse has been blown, and if you know the contents of the last 32 bytes of Flash—the Interrupt Vector Table—you can also read out the contents of memory.
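Since those last 32 bytes matter so much, here is a minimal Python sketch for pulling them out of a memory image. It assumes you hold a full 64 KB image indexed by address, that vectors are 16-bit little endian words, and that the final word at 0xFFFE is the reset vector. Those details come from the MSP430 family documentation rather than from the text above, and the function names are ours.

```python
import struct

IVT_START = 0xFFE0   # last 32 bytes of the 16-bit address space


def parse_ivt(memory: bytes):
    """Return the 16 interrupt vectors stored at 0xFFE0 through 0xFFFF."""
    if len(memory) < 0x10000:
        raise ValueError("need a full 64 KB image indexed by address")
    return list(struct.unpack_from("<16H", memory, IVT_START))


def reset_handler(memory: bytes) -> int:
    """The last vector, at 0xFFFE, is where execution begins after reset."""
    return parse_ivt(memory)[-1]
```

Unlike the Thumb pointers of the Cortex M, these vectors need no low bit cleared, so each one can be disassembled as is.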

Loading into a Disassembler

Back in the old days, reverse engineering MSP430 code meant using GNU objdump and annotating on pen and paper. Some folks would wrap these tools in Perl, or fill paper notebooks with cross-referencing, but thankfully that’s no longer necessary.

Start     End       Size    Use
0x0000    0x000F    16      Interrupt Control Registers
0x0010    0x00FF    240     8-bit Peripherals
0x0100    0x01FF    256     16-bit Peripherals
0x0200    0x09FF            Low RAM (Mirrored at 0x1100)
0x0C00    0x0FFF    1024    BootStrap Loader (BSL ROM)
0x1000    0x10FF    256     Info Flash
0x1100                      High RAM
          0xFFFF            Flash
0x10000                     Extended Flash

Table 11.1: MSP430 and MSP430X Address Space

Nowadays, IDA Pro has excellent support for the platform. If you have a legit license, just open the Intel Hex image of your target and specify MSP430 as the architecture. Memory locations can be had from the appropriate datasheets.
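If you have no such license, or just want to peek at the bytes first, an Intel Hex image is easy to unpack by hand. The Python sketch below handles data records, end-of-file records, and the two extended address record types, and it checks each record's checksum. It is a bare-bones reader with a made-up name, parse_ihex, not a replacement for a real loader.

```python
def parse_ihex(text: str) -> dict:
    """Return {address: byte} for the data records of an Intel Hex file."""
    memory = {}
    base = 0  # set by extended address records, added to each record's address
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError("not an Intel Hex record: %r" % line)
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF != 0:     # all bytes, checksum included, sum to zero
            raise ValueError("bad checksum: %r" % line)
        length, address, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + length]
        if rtype == 0x00:            # data record
            for i, byte in enumerate(data):
                memory[base + address + i] = byte
        elif rtype == 0x01:          # end of file
            break
        elif rtype == 0x02:          # extended segment address
            base = ((data[0] << 8) | data[1]) << 4
        elif rtype == 0x04:          # extended linear address
            base = ((data[0] << 8) | data[1]) << 16
        # start address records (types 03 and 05) are ignored in this sketch
    return memory
```

With the bytes in a dictionary, the two bytes at 0xFFFE and 0xFFFF give the reset vector, which is a fine place to begin disassembling.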
