Radare2’s MSP430 support is a bit less mature, and you should make sure to sanity check the disassembly wherever it looks suspect. Luckily, the Radare2 developers are frighteningly quick about fixing bugs, so both bugs that bothered us in the writing of this article will already have been patched by the time you read this. For best results, always run Radare2 built from the latest Git repository, and rebuild it often.
There are no known decompilers for the MSP430, but with small code sizes and rather legible assembly we don’t expect one to be necessary.
Basics of the Instruction Set
The language is relatively simple, but there are a few dialects that the locals speak. There are 27 native instructions, and then some additional emulated instructions which are assembled to one of the 27. Most of these 27 instructions have two forms—.B when they are working on an 8-bit byte, or .W if they want to tackle a 16-bit word. If someone tells you something and doesn’t specify it, you can assume it’s a word. If you’re doing a byte operation in a register, be warned that the most-significant byte is cleared.
The three main types of core words are single-operand arithmetic, two-operand arithmetic, and jumps.
Our simple single-operands are RRC (1-bit rotate right through carry), SWPB (swap the bytes of the word), RRA (1-bit arithmetic rotate right), SXT (sign-extend a byte into a word), PUSH (onto the stack), CALL (a subroutine, by pushing PC and then moving the new address to PC), and RETI (return from interrupt, restoring the Status Register SR and PC from stack).
Although these are all simple folk, they can, of course, be addressed in many different ways. If our register is n, then we see a few major types of addressing, all based off of the ‘As’ (for source) and ‘Ad’ (limited options for destination) fields:
Rn Operate on the contents of register n.
@Rn Operate on what is in memory at the address held in Rn.
@Rn+ Same as above, then increment the register by 1 (byte operation) or 2 (word operation).
x(Rn) Operate on what is in memory at the address Rn + x.
Wait, we just told you about an ‘x’. Where did that come from?! In this case, it’s an extension word, where the next 16-bit word after the instruction defines x. In other words, it’s an index off the base address held in Rn.
If the register is r0 (PC, the program counter), r2 (SR, the status register), or r3 (the constant generator), special cases apply. A common special case is to give you a constant, either -1, 0, 1, 2, 4, or 8.
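The addressing modes and special cases above can be sketched as a small lookup. This is a minimal model, not a full decoder: the name describe_source is hypothetical, and it only describes source operands selected by the two-bit As field.

```python
# Hypothetical helper: describe an MSP430 source operand from its register
# number and its two-bit As field. Not a full decoder.

# Constant generator cases for r2 (SR) and r3.
CONSTANTS = {
    (2, 0b10): 4,
    (2, 0b11): 8,
    (3, 0b00): 0,
    (3, 0b01): 1,
    (3, 0b10): 2,
    (3, 0b11): -1,
}

def describe_source(reg, as_bits):
    """Return a readable description of a source operand."""
    if (reg, as_bits) in CONSTANTS:
        return "#%d (constant generator)" % CONSTANTS[(reg, as_bits)]
    if reg == 2 and as_bits == 0b01:
        return "&x (absolute, x is the next word)"
    if reg == 0 and as_bits == 0b11:
        return "#x (immediate, x is the next word)"
    if reg == 0 and as_bits == 0b01:
        return "x (symbolic, PC-relative, x is the next word)"
    if as_bits == 0b00:
        return "r%d" % reg
    if as_bits == 0b01:
        return "x(r%d)" % reg
    if as_bits == 0b10:
        return "@r%d" % reg
    return "@r%d+" % reg

print(describe_source(3, 0b11))
print(describe_source(0, 0b11))
print(describe_source(12, 0b10))
```

Note how the immediate mode falls out of @Rn+ applied to r0: the operand is simply fetched through the program counter.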
Now we tackle two-operand arithmetic operations, most of which you should recognize from any other instruction set. The mov, add, addc (add with carry), sub, and subc instructions are all as you’d expect. cmp pretends to subtract the source from the destination to set status flags. dadd does a decimal addition with carry. xor and and are bitwise operations as usual. We have three that are a little unique:
bis (bit set, dest = dest OR src),
bic (bit clear, dest = dest AND NOT src),
and bit (test bits of src AND dest).
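For the three bit-twiddling instructions, a tiny model on 16-bit words may help. The Python function names below merely mirror the mnemonics; status flags are not modelled, and bit only returns the value that the CPU would use to set them.

```python
MASK = 0xFFFF  # word-sized operations

def bis(dst, src):
    """Bit set: every bit set in src becomes set in dst."""
    return (dst | src) & MASK

def bic(dst, src):
    """Bit clear: every bit set in src becomes clear in dst."""
    return dst & ~src & MASK

def bit(dst, src):
    """Bit test: the AND result only drives the flags; dst is unchanged."""
    return dst & src & MASK

print(hex(bis(0x00F0, 0x000F)))
print(hex(bic(0x00FF, 0x000F)))
print(hex(bit(0x00F0, 0x000F)))
```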
Even with these instructions, though, we’re still missing many favorite mnemonics that you’ll see in disassembly. These are emulated instructions, actually implemented using other instruction(s).
For example, br dst (branch) is an emulated instruction. There is no branch opcode, but instead the br instructions are assembled as mov dst, pc. Similarly, pop dst is really mov @SP+, dst, and ret is really mov @sp+, pc. If these mappings make sense, you’re all set to continue your travels!
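To check these mappings, here is a sketch that packs the fields of a two-operand instruction into its 16-bit word. The helper name encode_format1 is our own invention; the field layout (opcode, source register, Ad, B/W, As, destination register) is the standard two-operand format.

```python
def encode_format1(opcode, src, as_bits, dst, ad_bit=0, byte=False):
    """Pack a two-operand (format I) instruction into one 16-bit word."""
    return ((opcode << 12) | (src << 8) | (ad_bit << 7)
            | (int(byte) << 6) | (as_bits << 4) | dst)

MOV = 0x4          # opcode of mov
PC, SP = 0, 1      # r0 and r1
AUTOINC = 0b11     # As bits for @Rn+

ret = encode_format1(MOV, SP, AUTOINC, PC)      # mov @sp+, pc
pop_r15 = encode_format1(MOV, SP, AUTOINC, 15)  # mov @sp+, r15
br_imm = encode_format1(MOV, PC, AUTOINC, PC)   # mov #addr, pc (addr follows)

for name, word in (("ret", ret), ("pop r15", pop_r15), ("br #addr", br_imm)):
    print("%-8s %04x" % (name, word))
```

The same helper reproduces the first word of the reset handler shown later in this article: mov #12544, r1 is mov @pc+, r1, which packs to 0x4031 and appears in memory as the bytes 31 40.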
Thus, when we need to get around this land of MSP430, we look not to the many jump types of x86, but instead to simpler patterns, where the only kind of jump operands are relative, and that’s that.
So jmp, the instruction says, but where to? The first three bits (001) mean jump, the next three specify the conditional, and the remaining ten are a signed offset. To get there, the ten bits are multiplied by two (left shifted) and then are added to the program counter, r0. Why multiply by two? Well, we have 16-bit word alignment, in the MSP430 land, unlike with those pesky x86 instructions you might be thinking of. Ordnung muß sein!
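The arithmetic is easier to trust once it is written down. The sketch below (decode_jump is a hypothetical name) splits a jump word into its fields and computes the target. One detail is worth spelling out: by the time the offset is added, the program counter already points at the word after the jump, hence the extra 2.

```python
# Condition field (bits 12..10) to mnemonic.
CONDITIONS = ["jne", "jeq", "jnc", "jc", "jn", "jge", "jl", "jmp"]

def decode_jump(word, address):
    """Decode a jump word located at `address`.

    Returns (mnemonic, target) or None if the word is not a jump.
    """
    if (word >> 13) != 0b001:
        return None
    cond = (word >> 10) & 0b111
    offset = word & 0x3FF
    if offset & 0x200:          # sign-extend the ten-bit offset
        offset -= 0x400
    target = (address + 2 + 2 * offset) & 0xFFFF
    return CONDITIONS[cond], target

mnemonic, target = decode_jump(0x3226, 0xFFC0)
print(mnemonic, hex(target))
```

The word 0x3226 at 0xffc0 is the one objdump prints as jn with an absolute target of 0xfc0e in the interrupt table listing later in this article, which makes for a handy cross-check.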
You might have noticed in your disassembly that even though we told you this was a fixed-width instruction set, some instructions are longer than one 16-bit word! One way this can happen is when using immediate values, which—much like those of the glorious PDP-11 of old—are implemented by dereferencing and incrementing the program counter. This way, the CPU will skip over the immediate value in its code fetch path just as it’s fetching that same value as data.
And, finally, there are prefix instructions that have been added in MSP430X, the 20-bit extension of the MSP430. These prefix instructions go before the normal instruction, and you’ll most commonly see them setting the upper four bits of the pointer in a 20-bit function call.
What’s a Function, Anyways?
In x86 assembly, we’re used to looking for function preambles to pick out the functions, but what do we look for in MSP430 code? We’ve already discussed finding the entry point of the program and those of other ISRs by looking at the vectors in the IVT. What about other functions?
In MSP430, all functions that are not ISRs will end with a RET instruction which, as you recall, is actually a MOV @SP+, PC.
Compilers vary greatly in the calling conventions, as there is actually no fixed ABI. Usually, arguments get passed in r12, r13, r14, and r15. This, however, is by no means a requirement. MSP430 GCC uses r15 for the first parameter and for most return value types, and r14, r13, and r12 for the other parameters. Texas Instruments’ Code Composer and the IAR compiler (after EW430 4.10A release) use the opposite order: r12, r13, r14, and r15 and return in r12. Remember this when using assembly examples of one calling convention in the other, as you’ll need to move the registers around a bit.
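When porting an assembly snippet between the two conventions described above, a small table keeps the shuffling honest. The labels gcc and ti_iar and the function translate_register are hypothetical names for this sketch, which only covers the four argument registers named in the text.

```python
# Argument and return registers as described in the text.
CONVENTIONS = {
    "gcc":    {"args": [15, 14, 13, 12], "ret": 15},
    "ti_iar": {"args": [12, 13, 14, 15], "ret": 12},
}

def translate_register(reg, src, dst):
    """Map an argument register under convention src to its dst counterpart.

    Registers that carry no argument are returned unchanged.
    """
    args = CONVENTIONS[src]["args"]
    if reg not in args:
        return reg
    return CONVENTIONS[dst]["args"][args.index(reg)]

# The first argument lives in r15 for MSP430 GCC and in r12 for TI and IAR.
print("r%d" % translate_register(15, "gcc", "ti_iar"))
```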
We recommend using an additional heuristic instead of looking for a function preamble format. In this heuristic, we assume that indirect calls are rare, and look for br #addr and call #addr instructions. Both of these consist of two 16-bit words, and whatever the #addr we extract from that second word, there’s a good chance that it’s the start of a function.
Using this logic, you should be able to find functions even in stripped images disassembled with objdump. A short script, or a good disassembler, should help automate the marking of these functions.
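Such a short script might look like the sketch below. It assumes a raw little-endian image and walks it one word at a time, so it will happily misread data or extension words as instructions; treat the results as candidates, not gospel. The name find_function_starts is hypothetical, and the two constants are the encodings of call #addr and br #addr (the latter being mov @pc+, pc).

```python
import struct

CALL_IMM = 0x12B0  # call #addr
BR_IMM = 0x4030    # br #addr, i.e. mov @pc+, pc

def find_function_starts(image):
    """Return sorted candidate function addresses found in a raw image."""
    targets = set()
    for off in range(0, len(image) - 3, 2):
        word, operand = struct.unpack_from("<HH", image, off)
        # Code is word aligned, so odd targets are discarded.
        if word in (CALL_IMM, BR_IMM) and operand % 2 == 0:
            targets.add(operand)
    return sorted(targets)

# A call (bytes b0 12 98 46), a made-up br #0x4410, then a nop.
sample = bytes.fromhex("b0129846" "30401044" "0343")
print([hex(t) for t in find_function_starts(sample)])
```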
Making Sense of Interrupts
As with your (other) favorite microcontroller, our exploration of the code can be preempted by an interrupt.
If you don’t like these getting in the way of your travels, they can be globally or individually disabled. Well, except for the non-maskable interrupts (NMI).
The MSP430 handles any interrupts set in priority order, and goes through the interrupt vector table to find the right interrupt service routine’s (ISR) starting address. It hides away the current PC and SR on the stack, and runs the ISR. The ISR then returns, and normal execution continues.
If one thing is for certain, it’s that 0xFFFE holds the address of the system’s reset ISR (used on power-up, external reset, etc.), and that it has the highest priority.
If you have an ELF formatted dump, use msp430-objdump dump.msp430 -DS to get disassembly. Then locate the interrupt table at the end of memory.
0000ffc0 <.sec2>:
ffc0 : 26 32 jn $-946 ;abs 0xfc0e
...
fffc : 26 32 jn $-946 ;abs 0xfc4a
fffe : 00 31 jn $+514 ;abs 0x200
We look at 0xFFFE for the reset interrupt address, which is 0x3100 in this image. (objdump mistakes it for a conditional relative jump, so ignore the disassembly and read only the bytes.) That’s our entry point into the program, and you can see how it nicely lines up in the disassembly.
00003100 <.sec1>:
3100: 31 40 00 31 mov #12544, r1
3104: 15 42 20 01 mov &0x0120,r5
3108: 75 f3 and.b #-1, r5
Maybe we want to look at some specific functionality that is triggered by an interrupt, for example incoming serial data. Looking in the MSP430F1611 data sheet, we find that USART1 receive is a maskable interrupt at 0xFFE6. If we look at the annotated IVT in an example program (e.g., TinyOS’s Printf program compiled for TelosB), we see addresses in little endian.
0000ffe0 <__ivtbl_16>:
ffe0: 52 44 dac/dma
ffe2: 52 44 i/o p2
ffe4: 56 56 usart 1 tx
ffe6: d0 55 usart 1 rx
ffe8: 52 44 i/o p1
ffea: 94 4f timer a3
ffec: 76 4f timer a3
ffee: 52 44 adc12
fff0: 52 44 usart0 tx
fff2: 52 44 usart0 rx
fff4: 52 44 watchdog timer
fff6: 52 44 comparator a
fff8: d8 4f timer b7
fffa: ba 4f timer b7
fffc: 52 44 nmi/etc
fffe: 00 40 reset
We note that 0x4452 is used often. A quick look at this address shows that it is an empty ISR marking unused interrupts. Since we’re interested in the USART1 receive path, we follow 0x55d0 and see a large function that in turn calls another function, both nicely annotated, as we were working from an image with debug symbols:
000055d0
...
563a: b0 12 98 46 call #0x4698
...
00004698
...
This technique of looking up your IVT entries and then working backward to reverse engineer any handlers that correspond to the functionality you are interested in can help you avoid getting lost in reversing unimportant pieces of the code.
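Reading the table by hand gets old quickly, so here is a sketch that does the little-endian swap for us and picks out the most common vector, which is a fair guess for the handler of unused interrupts. The bytes are copied from the annotated table above; read_vectors is a hypothetical name.

```python
import struct
from collections import Counter

# Vector bytes for 0xffe0 through 0xffff, copied from the annotated table.
IVT_BYTES = bytes.fromhex(
    "5244" "5244" "5656" "d055" "5244" "944f" "764f" "5244"
    "5244" "5244" "5244" "5244" "d84f" "ba4f" "5244" "0040"
)
IVT_BASE = 0xFFE0

def read_vectors(raw, base):
    """Map each vector address to the little-endian handler address it holds."""
    words = struct.unpack("<%dH" % (len(raw) // 2), raw)
    return {base + 2 * i: word for i, word in enumerate(words)}

vectors = read_vectors(IVT_BYTES, IVT_BASE)
default_handler, uses = Counter(vectors.values()).most_common(1)[0]

print("usart 1 rx handler: 0x%04x" % vectors[0xFFE6])
print("reset handler:      0x%04x" % vectors[0xFFFE])
print("likely unused ISR:  0x%04x (%d vectors)" % (default_handler, uses))
```

Any vector that does not point at the common handler is a good place to start reading.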
Sorting out Peripherals
Reversing an image, we usually have some peripheral of interest, such as the SPI bus that attaches a radio.
Some peripherals are dealt with by interrupts, as we just saw, but some are also either partially or totally handled by touching memory defined by the peripheral file map.
In particular, as an alternative to using interrupts, a program could simply poll for incoming data or a change in a pin’s state. Likewise, setting up configurations for items such as the USART discussed above is done in the peripheral file map.
Let us take the same file we used above, and look in the MSP430F1611 guide for the USART1 in the peripheral file map.
Here we see the registers in the range from 0x0078 to 0x007F. Let us search for a few of these in the image.
First, we look for 0x0078 (USART control), 0x0079 (transmit control), and 0x007A (receive control). We find them all together in a function that is responsible for configuring the USART resource. A reader referencing the documentation will see the other control registers similarly updated.
4e8e
...
4eb4: c2 4e 78 00 mov.b r14, &0x0078
4eb8: d2 42 04 11 mov.b &0x1104,&0x0079
4ebc: 79 00
4ebe: d2 42 05 11 mov.b &0x1105,&0x007a
4ec2: 7a 00
4ec4: 1e 42 00 11 mov &0x1100,r14
4ec8: c2 4e 7c 00 mov.b r14, &0x007c
4ecc: 8e 10 swpb r14
4ece: 4e 4e mov.b r14, r14
4ed0: c2 4e 7d 00 mov.b r14, &0x007d
4ed4: d2 42 02 11 mov.b &0x1102,&0x007b
...
Whereas this approach can help you understand the settings to better sniff the serial bus physically, you’d often rather understand the actual data being written out. For this, we look for the peripheral register holding the transmit buffer, in our case at 0x007F according to the chip documentation.
Searching for this address in the disassembly leads us to a few interesting functions. Firstly, there’s one that disables the UART, which fills this address with null bytes. That helps us confirm we’re looking at the right address. We also see this address written to in the interrupt handler that we located in the previous section—and in a large function that ends up being a form of printf for writing out to this serial line.
As you can see, working backward from the addresses found in the peripheral file map can help you quickly find functions of interest.
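If your disassembler lacks a search function, the hunt for a peripheral address can be approximated on the raw bytes. Absolute operands such as &0x0079 are stored as little-endian extension words, so the sketch below simply looks for aligned words equal to the address. It cannot tell an operand from coincidental data, and find_absolute_refs is a hypothetical name.

```python
import struct

def find_absolute_refs(image, base, peripheral):
    """Return addresses of aligned words in `image` equal to `peripheral`.

    `base` is the address at which the first byte of `image` is loaded.
    """
    hits = []
    for off in range(0, len(image) - 1, 2):
        (word,) = struct.unpack_from("<H", image, off)
        if word == peripheral:
            hits.append(base + off)
    return hits

# Bytes 0x4eb4 through 0x4ec3 of the USART configuration function above.
snippet = bytes.fromhex("c24e7800" "d2420411" "7900" "d2420511" "7a00")
print([hex(a) for a in find_absolute_refs(snippet, 0x4EB4, 0x0079)])
```

Each hit sits one or two words after the start of the instruction that uses it, so back up a little in the disassembly to find the access itself.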
This guide is neither complete nor perfectly accurate. We told a few lies-to-children as all teachers do, and we omitted a dozen nifty examples that would’ve fit. Still, we hope that this will whet your appetite for working with the MSP430 architecture, and that, when you begin to work on the ’430s, you can get your bearings quickly, jumping into the fun part of the journey with less hassle.
For more MSP430 tricks, check out PoC‖GTFO 2:5!
11:9 This HTML page is also a PDF which is also a ZIP which is also a Ruby script which is an HTTP quine; or, The Treachery of Files
by Evan Sultanik from a concept independently conceived by Ange Albertini and with great technical assistance from Philippe Teuwen
Please rise and open your hymnal for the recitation of the Book of PoC‖GTFO, Chapter 7, Verse 6.
“A file has no intrinsic meaning. The meaning of a file—its type, its validity, its contents—can be different for each parser or interpreter.”
You may be seated.
In the spirit of самиздат and the license of this publication, we thought it might be nifty to aid its promulgation by enabling the PDF to mirror itself. That’s right, this PDF is an HTTP quine: it is a web server that serves copies of itself.
$ ruby pocorgtfo11.pdf &
Listening for connections on port 8080.
To listen on a different port,
re-run with the desired port as a command-line argument.
$ curl -s http://localhost:8080/pocorgtfo11.pdf |
diff -s - pocorgtfo11.pdf
A neighbor at 127.0.0.1 is requesting /pocorgtfo11.pdf
Files - and pocorgtfo11.pdf are identical
Uses of the cane. 1. Butterfly-net cane. 2. Cane for measuring horses. 3. Umbrella cane. 4. Musical cane. 5. Ceci n’est pas une pipe (this is not a pipe).
This polyglot once again exploits the fact that PDF readers ignore everything before the first instance of “%PDF”. Coupled with Ruby’s __END__ token—which effectively halts interpretation—and its __FILE__ token—which resolves to the path of the file being interpreted—it’s actually quite easy to make an HTTP quine by prepending the PDF with the following:
But why stop there? Ruby makes all of the bytes in the script that occur after the __END__ token available in the special “DATA” object. Therefore, we can add additional content between __END__ and %PDF that the script can serve.
Any HTTP request with a URL that ends with .pdf will result in a copy of the PDF; anything else will result in the HTML index parsed from DATA.
Since the data between __END__ and %PDF... is pure HTML already, it would be a shame not to make this file a pure HTML polyglot, as we did with pocorgtfo07.pdf. Doing so is relatively simple by wrapping the PDF in HTML comments.
This is valid Ruby, since Ruby does not interpret anything after the __END__. The PDF does not affect the validity of the HTML since it is commented. There will be trouble if the byte sequence “-->” (2D 2D 3E) occurs anywhere within the PDF, but this is very unlikely and has proven not to be a problem.
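Whether a given PDF is safe to wrap can be checked mechanically before building the polyglot. The following is a minimal sketch of such a check, not part of the published webserver; the function name is hypothetical.

```python
def find_comment_terminator(pdf_bytes):
    """Return the offset of the first b'-->' in pdf_bytes, or -1 if absent."""
    return pdf_bytes.find(b"-->")

# A result of -1 means the PDF can sit inside an HTML comment unharmed.
print(find_comment_terminator(b"%PDF-1.5 ... %%EOF"))
print(find_comment_terminator(b"ab-->cd"))
```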
Wrapping the Ruby webserver code in an HTML comment would have been ideal, and does in fact work for most PDF viewers. However, the presence of an HTML opening comment before the %PDF causes Adobe’s parser to classify the file as HTML and therefore refuse to open it.
Unfortunately, some web browsers interpret the Ruby code as having implied opening HTML tags preceding it, adding all of that text to the DOM. This is remedied with JavaScript in the HTML that sanitizes the DOM if necessary.
As has become the norm, this PDF is also a valid ZIP. This feat does not affect the Ruby/HTML portion since the ZIP is embedded later in the file as an object within the PDF, as in PoC‖GTFO 1:5. This presents an additional opportunity for the webserver: if the script can unzip itself, then it can also serve all of the contents of the ZIP. Unfortunately, Ruby does not have a ZIP decompression facility in its standard library. Therefore, the webserver calls the unzip utility with the “-l” option, parsing the output to determine the names and sizes of the constituent files. Then, a call to unzip with “-p” writes raw decompressed contents to stdout, which the web server splits apart and stores in memory. Any HTTP request with a URL that matches a file path within the ZIP is served that decompressed file. This allows us to have images like a favicon in the HTML. In the event that the PDF is interpreted as raw HTML (i.e., it was not served from the Ruby script), a JavaScript function conveniently hides all of the ZIP access portions.
With all of this feature bloat, the Ruby/HTML code that is prepended before the PDF started getting quite large. Unfortunately, some PDF readers like PDFium (the default PDF viewer shipped with Chrom(e|ium)) fail unless they find “%PDF” within the first 1024 characters. Therefore, the final trick in this polyglot is to exploit Ruby’s multiline comment syntax (which, like the __END__ token, owes itself to Ruby’s Perl heritage). This allows us to start the PDF header early, within a comment that will not be interpreted. Within that PDF header we open a dummy object stream that will contain the remainder of the Ruby script and the following HTML code before the start of the “real” PDF.
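The header constraint is easy to test while the prepended code grows. The sketch below assumes that the whole four-byte magic must lie inside the window, which is our conservative reading of the limit and not a statement about how any particular reader is implemented. Both function names are hypothetical.

```python
def pdf_header_offset(data):
    """Return the offset of the first b'%PDF' in data, or -1 if absent."""
    return data.find(b"%PDF")

def header_within_limit(data, limit=1024):
    """True if the whole '%PDF' magic lies within the first `limit` bytes.

    Assumption: the magic must fit entirely inside the window.
    """
    offset = pdf_header_offset(data)
    return offset >= 0 and offset + 4 <= limit

# A Ruby multiline comment opener followed by an early PDF header.
early = b"=begin\n%PDF-1.5\n" + b"x" * 4000
late = b"x" * 2000 + b"%PDF-1.5\n"
print(pdf_header_offset(early), header_within_limit(early))
print(pdf_header_offset(late), header_within_limit(late))
```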
Figure 11.5: Anatomy of the Ruby/HTML/PDF/ZIP polyglot. portions contain the main content of their respective filetypes. portions are for context and to illustrate modifications necessary to make the polyglot work. Gray portions are not interpreted by their respective filetypes.
11:10 In Memoriam: Ben “bushing” Byer
by fail0verflow
We are deeply saddened by the news that our member, colleague, and friend Ben “bushing” Byer passed away of natural causes on Monday, February 8th.
Many of you knew him as one of the public faces of our group, fail0verflow, and before that, Team Twiizers and the iPhone Dev Team.
Outspoken but never confrontational, he was proof that even in the competitive and often aggressive hacking scene, there is a place for both a sharp mind and a kind heart.
PoC or GTFO, Volume 2 Page 23