PoC or GTFO, Volume 2

by Manul Laphroaig

At this journal, we generally frown upon defense, not because it is easy, but because it is so damned hard to describe properly. On page 396, Jeffrey Crowell presents a poor man’s method of patching 32-bit x86 binaries to enforce the control flow graph. With examples in Radare2 and legible C, you’ll be itching to write your own generic patchers for large binaries this weekend.

Page 415 describes how Evan Sultanik made this PDF (the one that you’re reading) into a polyglot webserver quine in Ruby with its own самиздат, PoC‖GTFO mirror.

It is with great sadness that we dedicate this release to the memory of our neighbor Ben Byer, the “hypothetical defendant by the name of ‘Bushing’” who inspired many of us to put pwnage before politics, to keep on hacking. We’re gonna miss him.

11:2 In Praise of Junk Hacking

by Pastor Manul Laphroaig in polite dissent to Daily Dave.

Gather round y’all, young and old, and listen to a story that I have to tell.

Back in 2014, when we were all eagerly waiting for to debut on the TV network formerly known as the Columbia Broadcasting System, a minor ruckus was raised over Junk Hacking. The moral fiber of the youth, it was said, was being corrupted by a dozen cheap Black Hat talks on popping embedded systems with old bugs from the nineties. Who among us high-brow neighbors would sully the good name of our profession by hacking an ATM that runs Windows XP, when breaking into XP is old hat?

Let’s think for just a minute and consider the best examples of neighborly junk hacking. Perhaps we’ll find that rather than being mere publicity stunts, junk hacking is a way to step back from the daily grind of confidential consulting work, to share nifty tricks and techniques that are often more interesting than the bug itself.

Our first example today is from everyone’s favorite doctor in a track suit, Charlie Miller. If you have the misfortune of reading about his work in the lay press, you might have heard that he could blow up laptop batteries by software,1 or that he was recklessly irresponsible by disabling the power train of a car with a reporter inside.2 That is to say, from the lay press articles, you wouldn’t know a damned thing about what mechanism he experimented with.

So please, read the fucking paper, the battery hacking paper,3 and ignore what CNN has to say on the subject. Read about how the Smart Battery Charger (SBC) is responsible for charging the battery even when the host is unresponsive, and consider how much more stable this would be than giving the host responsibility for managing the state. Read about how a complete development kit is available for the platform, about how the firmware update is flashed out of order to prevent bricking the battery.

Read about how the Texas Instruments BQ20Z80 chip is a CoolRISC 816 microcontroller, which was identified by Dion Blazakis through googling opcodes when the instruction set was not documented by the manufacturer. See that its mask ROM functions are well documented in sluu225.pdf.4 Read about how code memory erases not to all ones, as most architectures would, but to ff ff 3f because that’s a NOP instruction.

Read about how this architecture wasn’t supported by IDA Pro, but that a plugin disassembler wasn’t much trouble to write.5 Read about how instructions on the CoolRISC platform are 22 bits wide and 24-bit aligned, so code might begin at any 3-byte boundary. See how Charlie bypasses the firmware checksums in order to inject his own code.
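To make that layout concrete, here is a minimal Python sketch that cuts a raw firmware dump into 3-byte slots and masks each one down to 22 bits. The little-endian byte order is an assumption, picked because it makes the erased pattern ff ff 3f decode to 22 set bits; the function name is made up for this example and is not taken from Charlie's tooling.

```python
def split_words(dump):
    """Split a raw dump into 22-bit instruction words, one per 3-byte slot."""
    if len(dump) % 3:
        raise ValueError("dump length must be a multiple of 3 bytes")
    words = []
    for offset in range(0, len(dump), 3):
        # Assumption: the three bytes of each slot are stored little-endian.
        raw = int.from_bytes(dump[offset:offset + 3], "little")
        words.append(raw & 0x3FFFFF)  # keep only the low 22 bits
    return words


# Two erased slots, as they would appear in freshly wiped code memory.
erased = bytes.fromhex("ffff3f") * 2
print([hex(word) for word in split_words(erased)])
```

A disassembler plugin would start from the same slicing step before decoding each word.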

Can you really read all thirty-eight pages without learning one new trick, without learning anything nifty? Without anything more to say than your disappointment that batteries shipped with the default password? He who has eyes to read, let him read!

Loyal readers of this journal will remember PoC‖GTFO 2:4, in which Natalie Silvanovich gets remote code execution in a Tamagotchi’s 6502 microcontroller through a plug-in memory chip. “Big whoop,” some jerk might say, “local control of memory is getting root when you already have root!”

Re-read her article; it packs a hell of a lot into just a few pages. The memory that she controls is just data memory, containing some fixed-size sprites and a single byte describing the game that the cartridge should load. The game itself, like all other code, is already in the CPU’s unwritable Mask ROM.

So given just one byte of maneuverability, Natalie tried each value, discovering that a switch() statement had no default case, so values above 0x20 would cause a reboot, while really high values, above 0xD8, would sometimes jump the game to a valid screen.

At this point she had a good idea that she was running off the end of a jump table, but as is common in the best junk hacking, she had no copy of the code and needed an exploit to extract the code. She did, however, know from die photographs and datasheets that the chip was a GeneralPlus GPLB52X with a 6502 instruction set. So she came up with the clever trick of making a background picture that, when loaded into LCD RAM, would form a NOP sled into shellcode that dumped memory out of an I/O port.
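Running off the end of a jump table can be modeled in a few lines. The Python sketch below is only an illustration: the table size, the handler addresses, and the bytes that follow the table are invented, and the only detail borrowed from the 6502 is that addresses are 16 bits wide and stored little-endian.

```python
import struct


def dispatch_target(memory, table_base, index):
    """Return the 16-bit address that an unchecked table lookup would fetch."""
    # No bounds check here, mirroring a switch() with no default case.
    (target,) = struct.unpack_from("<H", memory, table_base + 2 * index)
    return target


# Hypothetical layout: three valid handlers, then unrelated data bytes.
table = struct.pack("<3H", 0x8000, 0x8010, 0x8020)
memory = table + bytes.fromhex("adde efbe")

print(hex(dispatch_target(memory, 0, 1)))  # a real handler
print(hex(dispatch_target(memory, 0, 3)))  # whatever follows the table
```

An index past the last entry simply reinterprets neighboring bytes as a code address, which is why some large values crash and others happen to land somewhere valid.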

By reverse engineering that memory dump, she was able to replace her Hail Mary of a NOP sled with perfectly placed, efficient shellcode containing any number of fancy new features. You can even send your Tamagotchi to 30C3, if you like.

The point of her paper is no more about securing the Tamagotchi than Charlie’s is about securing a battery. The point of the paper is to teach the reader the mechanism by which she dumped the firmware, and if you can read those two pages without learning something new about exploiting a target for which you have no machine code to disassemble, you aren’t really trying. He who has eyes to read, let him read!

And this is the crux of the matter, dear neighbors. We become jaded by so much garbage on TV, so much crap in the news, and so many attempts to straitjacket the narrative of security research by the mistaken belief that it must involve security. But the very best security research doesn’t involve security! The very best research has no CVE, demands no patch, and has no direct relation to anything from your grandmother’s credit card number to your server’s shadow file.

The very best research is that which teaches you something new about the mechanism by which a machine functions. It teaches you how to build something, how to break something, or how to take something apart, but most of all it teaches you how the hell that thing really works.

So to hell with the target and to hell with the reporters. Teach me how a thing works, and teach me the techniques that you needed to do something clever with it. But if you casually dismiss the clever tricks learned from hacking an Apple ][, a battery, or a Tamagotchi, I’m afraid that I’ll have to ask you politely, but firmly, to get the fuck out.6

11:3 Star Wars on a Vector Display

by Trammell Hudson

Star Wars was one of Atari’s best vector games—possibly, the pinnacle of the golden age of arcade games. It featured 3D color vector graphics in an era when most games were low-resolution bitmaps. It also had digitized voice samples from the movie, while its contemporary games were still using 8-bit beeps.

The Star Wars ROMs, along with almost all of Atari’s vector games, can be emulated with MAME and the vectors extracted for display on actual vector hardware. Even though modern screens have exceeded the 10-bit resolution used by the game, the unique quality of a vector monitor is hard to convey. When compared to the low-resolution bitmap on a television monitor, the sharp lines and high resolution of the vectors are really stunning.

The graphics were 3D wireframe renderings that included features like the Tie fighters breaking up when they were hit by the player’s lasers. There was no hidden wireframe removal; at this time it was not computationally feasible to do so.

Digital to Analog Converters

There were two common ways to generate the analog voltages to steer the electron beam in the vector monitor. Most early Atari games used the “Digital Voltage Generator,” which used dual 10-bit DACs that directly output -2.5 to +2.5 volt signals. Star Wars, however, used the “Analog Voltage Generator,” in which the DACs generated the slope of the line, and opamps integrated the values to produce the output voltage. This is significantly more complex to emulate, and modern DACs and microcontrollers make it fairly easy to generate the analog voltages to drive the displays with resolution exceeding the precision of the old opamps.

The open source hardware V.st quad-DAC boards output 1.2 million samples per second, which is enough to steer the beam using Bresenham’s line algorithm at a resolution of about 12 bits. While this is generating discrete points, the analog nature of the CRT means that smooth lines will be traced in the phosphor. The ARM’s DMA engine clocks out the X and Y coordinates as well as the intensity, allowing the CPU to process incoming data from the USB serial connection without disrupting the output.
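For readers who have not met Bresenham's line algorithm, here is a minimal Python sketch of the integer-only variant that handles all octants. It is not the V.st firmware, which is not reproduced here; it only shows how a single vector from MAME could be expanded into the discrete X/Y points that a DAC would then clock out.

```python
def bresenham(x0, y0, x1, y1):
    """Yield every integer grid point on the line from (x0, y0) to (x1, y1)."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error term, integers only
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            return
        e2 = 2 * err
        if e2 >= dy:  # step along X
            err += dy
            x0 += sx
        if e2 <= dx:  # step along Y
            err += dx
            y0 += sy


print(list(bresenham(0, 0, 3, 1)))
```

The loop uses only additions and comparisons, which is part of why the algorithm suits a small microcontroller feeding a DMA buffer.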

Source code for the V.st is available online or as an attachment to this PDF.7 A schematic diagram can be found on page 351.

Displays

Two inexpensive vector displays are the Tektronix 1720 vectorscope, a piece of analog NTSC video test equipment from a television studio, and the Vectrex, one of the only home vector console systems. The Tek uses an electrostatic deflection CRT, which gives it very high bandwidth and almost instant transits between points, but at the cost of a very small deflection angle that results in a tiny screen and a very deep tube. The Vectrex has a magnetic deflection CRT, which allows it to be much shallower and significantly larger, but it requires many microseconds for the beam to stabilize in a new position. As a result, the DAC needs to take into account the “inertia” of the beam and wait for it to catch up.

Gameplay

Figure 11.1 compares the Tek 1720 on the left to the Vectrex on the right, which isn’t very impressive on paper but will animate as a short video if you open pocorgtfo11.pdf in Adobe Reader. A longer video showing some of the different scenes is available. As the number of line segments increases, the slower display starts to flicker.

The game was played with a yoke, so the Y-axis mapping might seem backwards for a normal joystick. You can invert it in MAME by pressing Tab to bring up the config menu, selecting “Analog Controls” and “AD Stick Y Reverse.”

While playing it on a small Vectrex or even smaller vectorscope doesn’t quite capture the thrill of the arcade, it is quite fun to relive the vector art æsthetic at home and hear the digitized voice of Obi-Wan telling you that “the Force will be with you, always.”

Figure 11.1: Tek 1720 vs Vectrex

11:4 Master Boot Record Nibbles; or, One Boot Sector PoC Deserves Another

by Eric Davisson

I was inspired by the boot sector Tetris game by Juhani Haverinen, Owen Shepherd, and Shikhin Sethi published as PoC‖GTFO 3:8. I feel more creative when dealing with extreme limitations, and half a kilobyte of real-mode assembly sounded like a great way to learn BIOS API stuff. I mostly learned some int 0x10 and 0x16 from this exercise, with a bit of int 0x19 from a pull request.

The game looks a lot more like Snake or Nibbles, except that the tail never follows the head, so the game piece acts less like a snake and more like a streak left in Tron. I called it Tron Solitaire because there is only one player. This game has an advanced/dynamic scoring system with bonus and trap items, and progressively increasing game speed. This game can also be won.

I’ve done plenty of protected mode assembly and machine code hacking, but for some reason have never jumped down to real mode. Tetranglix gave me a hefty head start by showing me how to do things like quickly setting up a stack and some video memory. I would have possibly struggled a little with int 0x16 keyboard handling without this code as a reference. I also re-used its elegant random value implementation. Finally, the PIT (Programmable Interval Timer) delay loop used in Tetranglix gave me a good start on my own dynamically timed delay.

I also learned how incredibly easy it was to get started with 16-bit real mode programming. I owe a lot of this to the immediate gratification from utilities like qemu. Looking at OS guides like the osdev.org wiki was a bit intimidating, because writing an OS is not at all trivial, but I wanted to start with much less than that. Just because I want to write real mode boot sector code doesn’t mean I’m trying to actually boot something. So a lot of the instructions and guides I found had a lot of information that wasn’t applicable to my unusual needs and desires.

I found that there were only two small things I needed to do in order to write this code: make sure the boot image file is exactly 512 bytes and make sure the last two bytes are 0x55AA. That’s it! All the rest of the code is all yours. You could literally start a file with 0xEBFE (two-byte unconditional infinite “jump to self” loop), have 508 bytes of nulls (or ANYTHING else), and end with 0x55AA, and you’ll have a valid boot image that doesn’t error or crash. So I started with that simple PoC and built my way up to a game.
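That recipe is short enough to spell out byte by byte. The Python sketch below builds the minimal image in memory; the function name is invented for this example, and nothing here comes from the Tron Solitaire source.

```python
def minimal_boot_image():
    """Build a 512-byte boot sector that just spins in place."""
    code = bytes([0xEB, 0xFE])       # jmp short to itself
    padding = bytes(508)             # nulls; any filler would do
    signature = bytes([0x55, 0xAA])  # boot signature at offsets 510 and 511
    image = code + padding + signature
    if len(image) != 512:
        raise ValueError("boot image must be exactly 512 bytes")
    return image


image = minimal_boot_image()
print(len(image), image[:2].hex(), image[-2:].hex())
```

Writing those bytes to a file and booting it with something like `qemu-system-i386 -drive format=raw,file=boot.img` (the filename is arbitrary) should leave the emulator sitting quietly in the loop.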

The most dramatic space savers were also the least interesting. Instead of cool low level hacks, it usually comes down to replacing a bad algorithm. One example is that the game screen has a nice blue border. Initially, I drew the top and bottom lines, and then the right and left lines. I even thought I was clever by drawing the right and left lines together, two pixels at a time—because drawing a right pixel and incrementing brings me to the left and one row down. I used this side-effect to save code, rewriting a single routine to be both right and left.

All of this was still too much code, so I tried something simpler: first splashing the whole screen with blue, then filling in a black box to only leave the blue border. The black box code wasn’t trivial, but it was smaller than the previous method. This saved me sixteen precious bytes!
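The fill-then-overwrite idea is easy to model away from the hardware. This Python sketch assumes an 80x25 text screen and the usual color codes of 1 for blue and 0 for black; the real game writes to video memory in assembly, so this shows only the algorithm and says nothing about its byte count.

```python
WIDTH, HEIGHT = 80, 25  # assumption: 80x25 text mode
BLUE, BLACK = 1, 0      # standard 4-bit color codes


def draw_border(width=WIDTH, height=HEIGHT):
    """Fill everything blue, then black out the interior to leave a border."""
    screen = [[BLUE] * width for _ in range(height)]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            screen[row][col] = BLACK
    return screen


screen = draw_border()
print(sum(cell == BLUE for line in screen for cell in line))
```

Two dumb fills replace four separate line-drawing passes, which is the kind of trade that pays off when every byte of code counts.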

Less than a week after I put this on Github, my friend Darkvoxels made a pull request to change the game-over screen. Instead of splashing the screen red and idling, he just restarts the game. I liked this idea and merged. As his game-over is just a simple int 0x19, he saved ten bytes.

Although I may not have tons of reusable subroutines, I still avoided inlining as much as possible. In my experience, inlining is great for runtime performance because it cuts out the overhead of jumping around the code space and stack overhead. However, this tends to create more code as the tradeoff. With 510 effective bytes to work with, I would gladly trade speed for space. If I see a few consecutive instructions that repeat, I try to make a routine of it.

I also took a few opportunities to use self-modifying code to save on space. No longer do I have to manually hex hack the w bit in the rwx attribute in the .text section of an ELF header; real mode trusts me to do all of the “bad” things that dev hipsters rage at me about.

Two self-modifying code hacks in this code are similar in concept. There are a couple of places where I needed something similar to a global variable. I could push and pop it to and from the stack when needed, but that requires more bytes of code overhead than I had to spare. I could also use a dedicated register, but there are too few of those. On the other hand, assuming I’m actually using this dynamic data, it’s going to end up being part of an operand in the machine code, which is what I would consider its persisted location. (Not a register, not the stack, but inside the actual code.)

As the pixel streak moves around on the game-board, the player gets one point per character movement. When the player collects a bonus item of any value, this one-point-per gets three added to it, becoming a four-points-per. If an additional bonus item were collected, it would be up to seven points. The code to add one point is selfmodify: add ax, 1. When a bonus item is collected, the routine for doing bonus points also has the line add byte [selfmodify + 2], 3. The +2 offset to our add ax, 1 instruction is the byte where the 1 operand was located, allowing us to directly modify it.

This adds to the strategy of the game. It discourages just filling the screen up with the streak while avoiding items (so as to not create a mess) and waiting out the clock. In fact, it is nearly impossible to win this way. To win, it is a better strategy to get as many bonuses as early as possible to take advantage of this progressive scoring system.

Another self-modifying code trick is used on the win screen. The background to the “YOU WIN!” screen does some color and character cycling, which is really just an increment. It is initialized with winbg: mov ax, 0, and we can later increment through it with inc word [winbg + 0x01]. What I also find interesting about this is that we can’t do a space saving hack like just changing mov ax, 0 to xor ax, ax. Yes, the result is the same; ax will equal 0x0000 and the xor takes less code space. However, the machine code for xor ax, ax is 0x31c0, where 0x31 is the xor and 0xc0 represents “ax with ax.” The increment instruction would be incrementing the 0xc0 byte, and the first byte of the next instruction since the word modifier was used, which is even worse. This would not increment an immediate value, instead it would do another xor of different registers each time.
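Here is a small Python sketch of why the xor shortcut breaks the trick. It emulates inc word on raw instruction bytes; the trailing 0x90 (nop) is just a stand-in for whatever instruction really follows in the game.

```python
def inc_word(code, offset):
    """Emulate `inc word [offset]` on a little-endian 16-bit value."""
    value = int.from_bytes(code[offset:offset + 2], "little")
    code[offset:offset + 2] = ((value + 1) & 0xFFFF).to_bytes(2, "little")


# winbg: mov ax, 0  ->  B8 00 00, followed by a placeholder nop
good = bytearray([0xB8, 0x00, 0x00, 0x90])
inc_word(good, 1)  # the immediate becomes 1: mov ax, 1

# xor ax, ax  ->  31 C0, followed by the same placeholder nop
bad = bytearray([0x31, 0xC0, 0x90])
inc_word(bad, 1)   # the ModRM byte becomes C1: xor cx, ax

print(good.hex(), bad.hex())
```

In the xor case the 16-bit word spans the ModRM byte and the next opcode, so the neighboring instruction is also rewritten whenever the low byte carries.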

Instead of using an elaborate string print function, I have a loop to print a character at a pointer where my “YOU WIN!” string is stored (winloop: mov al, [winmessage]), and then use self-modifying code to increment the pointer on each round (inc byte [winloop + 0x01]).
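The pointer walk can be sketched the same way. In this Python model the addresses are invented, the loop is assumed to run for a fixed character count (the text does not say how the real loop ends), and mov al, [winmessage] is assumed to be the three-byte form A0 followed by a 16-bit address.

```python
WINLOOP = 0x7C80     # hypothetical address of the mov instruction
WINMESSAGE = 0x7D00  # hypothetical address of the string

memory = bytearray(0x10000)
message = b"YOU WIN!"
memory[WINMESSAGE:WINMESSAGE + len(message)] = message
# winloop: mov al, [winmessage]  ->  A0 lo hi
memory[WINLOOP:WINLOOP + 3] = bytes([0xA0]) + WINMESSAGE.to_bytes(2, "little")


def print_message(memory, count):
    """Fetch `count` characters, bumping the address operand each round."""
    out = bytearray()
    for _ in range(count):
        addr = int.from_bytes(memory[WINLOOP + 1:WINLOOP + 3], "little")
        out.append(memory[addr])  # mov al, [addr]
        # inc byte [winloop + 0x01]: only the low address byte changes
        memory[WINLOOP + 1] = (memory[WINLOOP + 1] + 1) & 0xFF
    return bytes(out)


print(print_message(memory, len(message)))
```

Because only the low byte of the address is incremented, this works as long as the string does not cross a 256-byte boundary.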

The most interesting self-modifying code in this game changes the opcode, rather than an operand. Though the code for the trap items and the bonus items have a lot of differences, there is a significant number of consecutive instructions that are exactly the same, with the exception of the addition (bonus) or the subtraction (trap) of the score. This is because the score actually persists in video memory, and there is some code overhead to extract it and push it back before and after modifying it.

So I made all of this a subroutine. In my assembly source you will see it as an addition (math: add ax, cx), even though the instruction initialized there could be arbitrary. Fortunately for me, the machine code formats for these addition and subtraction instructions are the same. This means we can dynamically drop in whichever opcode we want to use for our current need on the fly. Specifically, the add I use is ADD r/m16, r16 (0x01/r) and the sub I use is SUB r/m16, r16 (0x29/r). So if it’s a bonus item, we’ll self modify the routine to add (mov byte [math], 0x01) and call it, then do other bonus related instructions after the return. If it’s a trap item, we’ll self modify the routine to subtract (mov byte [math], 0x29) and call it, then do trap/penalty instructions after the return. This whole hack isn’t without some overhead; the most exciting thing is that this hack saved me one byte, but even a single byte is a lot when making a program this small!
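A toy model makes the opcode swap easier to see. This Python sketch interprets just the two-byte instruction at math; the ModRM byte 0xC8 is the usual encoding of the ax, cx operand pair, while the function names and the tiny interpreter are illustrative and are not the game's actual routine.

```python
ADD_OPCODE = 0x01  # ADD r/m16, r16
SUB_OPCODE = 0x29  # SUB r/m16, r16
MODRM_AX_CX = 0xC8  # mod=11, reg=cx, r/m=ax

# math: add ax, cx
routine = bytearray([ADD_OPCODE, MODRM_AX_CX])


def run_math(routine, ax, cx):
    """Interpret the patched instruction; only the ax, cx form is modeled."""
    opcode, modrm = routine
    if modrm != MODRM_AX_CX:
        raise ValueError("only the ax, cx operand form is modeled")
    if opcode == ADD_OPCODE:
        return (ax + cx) & 0xFFFF
    if opcode == SUB_OPCODE:
        return (ax - cx) & 0xFFFF
    raise ValueError("unexpected opcode")


def apply_item(routine, score, value, is_bonus):
    """Patch the opcode (mov byte [math], 0x01 or 0x29), then call the routine."""
    routine[0] = ADD_OPCODE if is_bonus else SUB_OPCODE
    return run_math(routine, score, value)


print(apply_item(routine, 100, 25, True), apply_item(routine, 100, 25, False))
```

Since both instructions share the same operand byte, a one-byte store is enough to turn the shared routine from a reward into a penalty.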
