Book Read Free

PoC or GTFO, Volume 2

Page 8

by Manul Laphroaig


  It is high time. We have been doing our dissections alone—with none or few to watch and learn—long enough. Let other neighbors watch your disassembly, show them your technique, and let them get a good view and share the fun.

  As the good old U. of Padua preserved its theater, so shall we! And then perhaps the Specter of the Future will smile upon us.

  10:3 Pokémon Plays Twitch

  by Allan Cecil (DwangoAC), Ilari Liusvaara (Ilari) and Jordan Potter (P4Plus2)

  For the Awesome Games Done Quick (AGDQ) 2015 charity marathon we exploited a chain of unmodified Nintendo game console components consisting of a Pokémon Red Game Boy cartridge in a Super Game Boy running in a Super Nintendo. We plugged the latter into custom hardware posing as a normal controller. In this seven-stage exploit, we corrupted a save file to give ourselves 255 Pokémon, swapped Pokémon, and tossed items to plant shellcode. We committed a series of atrocities using documented command packets and ultimately broke into the Super Nintendo’s working RAM, where we wrote our own chat interface to display live contents of the Twitch chat and even a representation of a defaced website.

  TAS’ing a Game to execute Arbitrary Code

  TASVideos2 hosts Tool-Assisted Speedruns of games that are created using an emulator with speed controls such as slow motion and frame advance, along with the ability to save and restore the state of the game (or, rather, of the entire console) at any time. TAS movie files consist of a list all of the button presses sent to the console every frame from the time it is powered on until the game is beaten. It aids our poor human reflexes, but it can do a lot more—like arbitrary code execution!

  The first run on the site to use this ability to execute arbitrary code to jump to the credits of a game was Masterjun’s Super Mario World run. Later, Bortreb used arbitrary code execution in a run of Pokémon Yellow, marking the first time external content was added to an existing game.

  In late 2013, DwangoAC worked with Ilari and Masterjun to present a run at AGDQ 2014 that programmed the games Snake and Pong into Super Mario World on a real console using a replay device (also known as a “bot”) from True. This was a huge success and was covered by Ars Technica, but we knew that we could do even more, which ultimately led us to the project described in this article.3

  The Game Choice

  We started with an end-goal of executing arbitrary code on a Super Nintendo (SNES) using a Super Game Boy (SGB) cartridge as the entry point. We initially planned to use Pokémon Yellow based on Bortreb’s exploit in that game, but quickly discovered that the SGB detection routine used by Pokémon Yellow is extremely broken and only worked on a real SGB by pure chance.4 After looking at other options, we chose to use the Pokémon Red version, which uses a more reliable SGB detection routine that gets us access to the full SGB palette, a custom border, and consistent timing benefits we’ll discuss later.5 Using Pokémon Red also has another added benefit in that the entire game has been skillfully disassembled.6

  The Emulator

  When we started this project in August 2014, the only emulator capable of emulating an SGB inside of an SNES at a low enough level for our needs was the BSNES emulator. Unfortunately, although BSNES is very accurate at emulating an SNES, it doesn’t do a very good job of emulating an SGB. The Gambatte Dot-Matrix Game Boy (DMG) emulator is likewise very accurate, but is unable to emulate an SGB on its own. Ilari was able to create a hybrid emulation core using BSNES to emulate the SNES↔DMG interface chip while using Gambatte for DMG emulation. This was a considerable undertaking, but in the end the emulator was very usable, albeit somewhat slow, as properly emulating the synchronization between the SNES CPU and the DMG CPU is a challenge. Ilari continued to provide emulator development and scripting support throughout the project.

  The Hardware

  We didn’t just want to exploit a game in the sandbox of a console emulator and call it a Proof of Concept. We wanted to do the job properly and create an actual exploit that would work on real hardware. Only one member of our team (DwangoAC) had all of the required hardware in one place, namely an SNES console, an SGB cartridge, a copy of Pokémon Red, and the replay device posing as a controller, also known as a “bot.”7 Because we wanted to stream data from an attached computer, we opted to use an older, serial-over-USB connected device, namely True’s NES/SNES Replay Device. This choice of hardware had a few limitations but worked out well for the project in the end.

  The Plan

  We were unsure what kind of payload to create once we gained the ability to execute arbitrary code on the SNES. Initially we investigated methods of showing crude video, but abandoned it after spending far too much time failing to increase the datarate and running into limits with the processing speed of the SNES’s 65C816 CPU. An IRC discussion about Twitch Plays Pokémon8 led DwangoAC and P4Plus2 to brainstorm what it would take to incorporate similar elements into our payload, and the concept of Pokémon Plays Twitch was hatched—where a Pokémon character enters a Twitch chat room and “plays” Twitch. In the end, we took it to the next level by giving Red a voice in a chat interface on the SNES and giving TASBot, the robot holding the replay board, the ability to speak through espeak and argue with Red. There’s much more to say about that, but let’s first get to the point where we can execute arbitrary code!

  Figure 10.1: The Legendary TASBot

  Figure 10.2: A Strange Rival

  Stage 0: Corrupting a save game.

  Three to seven bytes per minute.

  We start the game by creating a save file, giving ourselves the default name of Red and naming our rival RxRxpk as shown in Figure 10.2. We then save the game as in Figure 10.3, but reset the console directly after it starts writing to the cartridge’s SRAM. There is checksumming on most of the values in the save file but at least one value has no checksum at all, namely the byte at the start of the “party data” that stores the number of Pokémon that have been caught. By some chance, this value in SRAM (at 0xAF2C, or 0x2F2C when paged) is initially set to FF, so we wait long enough for valid name data and a save file header to be written before resetting. It is possible to do this with human reflexes as the window is approximately 20 ms but we opted to run a wire from our replay device to pin 19 on the expansion port on the underside of the SNES. This allowed ns to trigger a reset by shorting the pin to ground, as shown in Figure 10.3.9

  Figure 10.3: Corrupting a save game by pressing A to save, then resetting 24 frames later.

  Stage 1: Writing Z80 assembly by swapping Pokémon and tossing items.

  Thirty bytes per second

  After loading the game but before changing anything, the initial state of the GBBUS memory map is held in memory at 0xD163.10

  We want to adjust some of these values to create a payload described in the next section, and the game conveniently provides three ways to arrange the state of memory.

  Swapping Pokémon: The game implements moving Pokémon around the list by swapping the raw contents of entries in the ID, Data, Original trainer, and nickname tables, which happens to offset data by an odd amount. Since we have 255 Pokémon instead of the 141 the game was originally limited to we can swap around the contents of anything in WRAM above 0xD164.11

  Tossing items: Throwing away unwanted items decrements the second byte in an item’s two-byte ID / Quantity pair. Unfortunately, there are some items that can’t be tossed, either because the game prevents tossing them or because doing so softlocks or crashes the game.

  Swapping items: Items can be swapped around in the list of items, which normally just swaps the item data. If you swap two of the same item, the game tries to merge them by adding their counts and then shifting the item list. The shift adjusts the item count and writes a new sentinel item ID. (It doesn’t touch either the item count in that slot or the old sentinel.)

  Since we don’t have any items, let’s get some! Swapping the first Pokémon with the tenth causes the FF’s located at 0xD16B through 0xD196 to be written to 0xD2F7 through 0xD322. Per the memory map, the nu
mber of items is located at 0xD31D and this is changed along with lots of other nearby addresses from 00 to FF, which causes the game to think we have 255 items. We eventually enter the item menu and proceed to toss a number of safe items, but—because we can only ever decrement the quantity byte in each item’s ID/Quantity two-byte pair—we have to go back and swap Pokémon to make what was once an ID into an item count and vice versa.

  In order to avoid softlocking the game, we have to walk through the sequence in a very particular order. There are several items that the game refuses to toss, some that crash the game if you try to toss them, some that can only be thrown once—after which all items afflicted with this condition can no longer be tossed. Some will crash the game simply by being in the menu even if you never even select them.

  Figure 10.4: Pokémon and items are re-arranged in memory to create shellcode.

  Figure 10.5: Early Shellcode from Swaps

  To work around these pitfalls, we prepare memory by doing several Pokémon and item swaps followed by an initial round of tossing, we go back to swap Pokémon in a way that realigns memory so we can now toss what used to be item IDs. We swap several Pokémon to relocate the Stage 1 code and create a swath of 0’s in front of it, and at the very end we swap two identical items to shift memory two spaces back. That’s a lot to take in in one sentence, so page 155 diagrams the complete list of changes we make showing the value changes as anchored initially from GBBUS 0xD349.

  The primary purpose of all this swapping and tossing is to create a better way to craft our own code—as it would be quite tedious to use this method to do anything longer.12 Figure 10.5 shows a disassembly of what we’ve been able to write so far, starting from 0xD361.

  Everything up to this point has been the process of writing Stage 1, but now it’s time to walk through executing it, although some of the shortcuts we took require a bit of explanation.

  First, the reason 0xD361 contains 30 is because the beginning of the Stage 1 data is mostly copied from the field that holds the rival name—which happens to be directly preceded by the player’s money. Of this quantity we see the last two out of three bytes represented here in BCD format; the full value 00 30 00 starts at 0xD360. It would take extra effort to eliminate the 30 00 portion, but because that sequence is effectively a NOP, we leave it be.

  To reduce the number of bytes that needed to be modified, we used several clever tricks. The code that jumps to this point sets HL to the jump target address, and HL is a canonical pointer register that can be written to. We reused 0xD36E, the map level script pointer, as the loop jump address; upon exiting the menu, the map level script pointer is loaded and called, so it loads the value in 0xD36E into HL and jumps to it.

  Stage 1’s purpose is to read the buttons being held down on the controller and write them somewhere, eventually executing what we’ve written using this slightly more efficient method than twiddling with Pokémon and items. At a high level, this code will read a byte from the controller on one frame, read another byte from the controller on the next frame, subtract the two, store the result at a given memory offset and repeat, successively storing values one byte at a time in order in memory, and ultimately executing said bytes.

  The game’s NMI (Non-Maskable Interrupt) routine writes a bitmap of the current buttons being held down during each frame (mapped as the buttons ABsSRLUD from lowest to highest bit) to 0xFFF8, and HALT is used to wait for the next frame. Unfortunately, the SGB BIOS cancels out LEFT+RIGHT or UP+DOWN if they are pressed simultaneously and instead converts those bits to 0’s. To work around it, our short routine reads two frames and combines the values in a way that can yield arbitrary bytes. Because of restrictions on which bytes we can create, we use LD C,A to store the first value and then SUB C to combine them.13

  Using this method, we write the following data to 0xD338, which is executed every frame; that is to say, it is repeatedly executed even before it is fully written!

  Figure 10.6: Item IDs can double as Z80 opcodes.

  The memory range from 0xD338 to D360 contains only 00’s and forms a cascade of harmless NOP instructions. This is critical, because this entire section is executed every time a byte is written; this also means we have to be extremely careful with what we write, to avoid executing an incomplete Stage 2 that causes a crash. The solution is to write a jump instruction of 18 27 into the first two bytes—which will skip execution of Stage 2 while it is being constructed. The sequence 18 27 can’t be entered in one frame, but fortunately the incomplete form, 18 00, is effectively a NOP instruction. This gives us the full range from 0xD33A to 0xD360 where we can write whatever we want with impunity, and HL points to 0xD33A.

  We write 0x1a8 (JR x) into current write position and advance write position:

  Figure 10.7: Sending payload (combos injected by first controller)

  We write 0x27 into current write position, turning the first instruction into a nontrivial jump.

  We write the Second Stage to D33A-D360 which is jumped over and not executed. This takes 39 iterations through the loop.

  After this, we somehow need to jump to the newly completed Stage 2. The HL now points to 0xD360 and the next byte we poke is 18, which turns the first instruction in the Stage 1 code into JR 0, which is still effectively a NOP.

  We write 18 (JR x) to current position, turning the 30 into 18, acting as a JR 0 instruction.

  We write D7 into 0xD362, which modifies the instruction to be JR -41, which jumps to 0xD33A, the start of the second payload. After one more call into 0xD338 and the subsequent jump to 0xD360, the execution jumps to the Second Stage.

  We write D7 (-41) to current position, turning the jump into a jump to execute the Stage 2:

  One last note before moving on to what Stage 2 will do for us: as with most things in this exploit, entering the Stage 2 payload isn’t as straightforward as it should be, and this time it’s because the SNES and the DMG run at different clock speeds and framerates. It takes 351,120 cycles for the DMG to run one frame, and 357,366 for the SNES to run one frame. Each side polls the inputs once per their frame, and the SNES side updates the inputs that the DMG side reads once per frame. Since each SNES frame takes slightly longer, the SNES regularly skips updating inputs for one full DMG frame, causing the input to be duplicated.14

  This clock skew slip happens about every 56 DMG frames. (Sometimes it’s 57 frames between slips due to slipping.) It takes a full 86 frames to input the Stage 2 sequence because there are 39 bytes of payload plus four bytes total for prologue and epilogue jump instructions, and each byte takes two frames to enter as a result of working around L+R and U+D combinations being nulled out. This means we have to cope with at least one clock skew slip, and because 86 isn’t that much bigger than 2 × 56, the slip position must be relatively near the middle to avoid having to deal with two slips.15

  Figure 10.8: Z80 opcodes that can be sent through SGB input filtering.

  Stage 2: Sending packets to escape SGB from very little space.

  We have just 39 bytes to work with in the Stage 2 payload we just wrote and we need to make the most out of every last byte. Fortunately, Pokémon Red already contains a routine that sends a command packet into the SNES. The catch is the code to send that packet is in another ROM bank (0x1C) that we need to switch to. While the ROM bank can be switched by a single write, the game NMI routine (which runs every frame) does not save the bank; rather, it switches to one stored in another memory address instead. Two writes are needed to reliably change the bank which would take too much space; however, the common part of ROM (mapped regardless of the bank) has a function that does something, then switches banks and returns. That function makes for a very useful gadget! The entry address for this function is 0x00AF, with register A holding the bank number.

  We need to send two separate command packets, described below.16 The packets aren’t a full sixteen bytes in length like they appear to be, but eleven and seven bytes; the tails of the packets are ignored, so we let the packet pa
yloads overrun into whatever happens to be next. After sending the packets, we have no use for the DMG anymore, so we hang the Z80 by entering a tight loop.

  The following Stage 2 assembly code is loaded into memory from 0xD33A to D360.

  Originally, the LD L, 0x58; NOP sequence was LD HL, 0xD358 but we discovered that the transfer routine leaves the upper eight bits of the address in the H register at the end of the transfer. The transfer end of the packet at 0xD34D will be 0xD35D, so the H register will be D3, which is exactly the value we want for the next packet, so we can save one byte by just loading the L register. The saved byte can taken to be NOP (00).

  The repeated input can land on two inputs of the same byte, or the last input of one byte and first input of next. The latter is much better, since for any byte pair, it is possible to construct three valid inputs. However, the first is much worse: The byte will be forced to 00, and even more unfortunately, the frame rules always cause the duplication to occur in a bad way. The 00 freed from only loading L is close enough to the middle that this byte can be targeted for duplication. It turned out that the emulator doesn’t emulate the input slipping quite accurately and we had to do a lot of tedious trial and error testing to time the input correctly.17 The offset between emulator and real hardware turned out to be eight frames, which we adjusted by adding eight frames of no input into the file sent to the bot prior to exiting the menu.

 

‹ Prev