PoC or GTFO, Volume 2
Page 16
The encoding is not 6-and-2, either; it is 6-and-0 encoding. This requires 344 bytes per sector, instead of the regular 342 bytes. The decoder overwrites the addresses $xxAA and $xxAB twice in order to compensate for the additional bytes, as the program supports only page-aligned reads. The decoding is interleaved, so there is no denibbilisation pass.
The 6-and-0 encoding works by using the six-bit nibble as an alternating index into one of the arrays of six-bit or two-bit values. The code is both much faster (no fetching of the two-bit array) and much smaller (two-thirds of the size) than the one described in Race Conditions,(§10:7.2) but the decoding tables occupy 1.5kb of memory. The memory layout might have been chosen to avoid a timing penalty due to page-crossing accesses. However, the penalty has no effect on the performance of the routine because the code must still spend time waiting for the bytes to arrive from disk. Therefore, the tables could have been combined into a 512-byte region instead, which is a closer match to the memory usage of the routine described in Race Conditions.
A SpiraDisc-protected disk uses four sectors per track, but since the track stepping is quartered, the data density is equivalent to a single 16-sector track. Each sector has a unique prologue value to identify itself. When a read is requested, if a sector cannot be found on the current track, then the program advances the drive head by one quarter-track, and then attempts the read again. If the read fails again, then the program retreats the drive head by one quarter-track, and then attempts the read again. If the read still fails, then the program retreats the drive head by another quarter-track, and then attempts the read again. If the read fails at this point, then the disk is considered to be corrupted.
Given the behaviour of the read request, the data might not be stored on consecutive quarter-tracks. Instead, they might zigzag across a span of up to three quarter-tracks. This is another deviation from the idea of spiral tracking. By coincidence, the movement is very similar to the one in the program Captain Goodnight and other Brøderbund titles.
Copying a SpiraDisc-protected disk is difficult because of the potential for cross-talk which would corrupt the sectors when they are read back. However, images produced by an E.D.D. card will work in emulators, if the copy parameters are set correctly.
When run, the program decodes selected pages of itself, based on an array of flags, and also re-encodes those pages after use, to prevent dumping from memory. The decoding is simply an exclusive-OR of each byte with the value #$AC, exclusive-ORed with the index within the page.
At start-up, the program profiles the system: scanning the slot device space, and records the location of devices for which the first 17 bytes are constant (that is, they return the same value when read more than once), and which do not have eight bytes that match the first one within those 17 bytes. For example, Mockingboard has memory-mapped I/O space in that region, which are mostly zeroes. The program calculates and stores a checksum for slot devices which pass this check. The store was supposed to happen only if the checksum did not match certain values, but the comparison is made against a copyright string instead of an array of checksums. The first time around, all values are accepted. During subsequent profiling, the value must match exactly.
The program checks if bank one is writable, after attempting to write-enable it, and sets a flag based on the result. The program checksums the F8 and F0 ROM BIOS codes, watches for particular checksums, and sets flags based on the result. The original version of the program (as seen in 1981, used on the program Jawbreaker) actually required that the ROM BIOS code match particular checksums—either the original Apple ][ or the Apple ][+—otherwise the program simply wiped memory and rebooted. (This prevented protected programs from running on the Apple ][e or the Apple ][c.) The no-doubt numerous compatibility problems that resulted from this decision led to the final check being discarded (as seen in 1983, used on the program Maze Craze Construction Set, but quite possibly even earlier), though the rest of the profiling remains. However, having even one popular title that didn’t work on more modern machines was probably sufficient to turn publishers entirely off the use of the program.
The program probes all of memory by writing a zero to every second byte. However, it skips pages #0, #2, #4-7, and #$A8-C0, meaning that it writes data to all slot devices, with unpredictable results. The program also re-profiles the system upon receiving each request to read tracks. This re-profiling is intended to defeat memory dumps that are produced by NMI cards, and which are then transferred to another machine, as the second machine might have different hardware options.
The program also checksums the boot PROM prior to disk reads, and requires that it matches one particular checksum—that of the Disk ][ system—otherwise the program wipes memory and reboots. (This prevents protected programs from running on the Apple ][GS.)
Interestingly, despite all of the checks of the environment, the program does not protect itself against tampering, other than using encoded pages. The memory layout is data on pages #$A8-B1, and code on pages #$B2-BF. The data pages are very sparse, leaving plenty of room for a boot tracer to intercept execution and disable protections.
The program uses a quarter-track stepping algorithm that activates two phases, and then deactivates only one of them. According to Roland Gustafsson, this stepping technique allows for more precise positioning of the drive head, but it does not work on Rana drives. It was for this reason that he used the reduced-delay technique instead. The reduced-delay technique is apparently the only one which works on an Apple ][c, as well. SpiraDisc predated the Apple ][c by about two years, so it was just bad luck that an incompatible technique was chosen.
10:7.4 Illegal opcodes
The 6502 CPU has 151 documented instructions. There are quite a few additional instruction encodings for which the results could be considered useful, if the side-effects (e.g. memory and/or register corruption, or long execution time) were also acceptable. In some cases, the instructions were used to obfuscate the meaning of the code, since they would not be disassembled correctly. Some of these instructions were replaced in the 65C02 CPU with new instructions with different behaviors, and without the unfortunate side-effects. In some cases, the code that used the new instructions was not affected because the results of the old instructions were discarded, and the documented replacement did not introduce especially unwanted behavior. Note that the instructions that were not replaced will cause the 65C02 CPU to hang.
The Datasoft version of the program Dig Dug uses this technique. It begins with an instruction which used to behave as a two-byte NOP, but which is now a zero-page STZ instruction. Since the program does not make use of the zero-page at that time, the store has no side-effects. It looks like this in 6502 mode:
In 65C02 mode, the same machine code interpreted differently.
Beer Run uses this technique, but was unfortunate enough to choose an instruction which was not defined on the 65C02 CPU, so the program does not work on a modern machine. The code is run with the carry set much earlier in the flow, as a side-effect of executing a routine in the ROM BIOS. It is possible that the authors were not even aware of the fact.
which, when executed, does this:
X is zero, so $00 is first incremented to #$01, and then subtracted from A. A is zero before the subtraction, so it becomes #$FF. The resulting #$FF is used as a key to decipher some values later.
10:7.5 CPU bugs!
The original 6502 CPU had a bug where an indirect JMP (xxFF) could be directed to an unexpected location because the MSB will be fetched from address xx00 instead of page xx+1. Randamn relies on this behavior to perform a misdirection, by placing a dummy value at offset zero in page xx+1, and the real value at address xx00.
While not a bug, but perhaps an undocumented feature—the breakpoint bit is always set in the status register image that is placed on the stack by the PHP instruction. Lady Tut relies on this behavior to derive a decryption key.
There is also a class of alternative behaviours between the 650
2 and the 65C02 CPUs, particularly regarding the Decimal flag. For example, the following sequence will yield different values between the two CPUs: $1B on a 6502, and $0B on a 65C02. These days, it would be used as an emulator detection method. Try it in your favorite emulator to see what happens.
10:7.6 Magic stack values
One way to obfuscate the code flow is through the use of indirect transfers of control. Rescue At Rigel fills the stack entirely with the sequence #$12 #$11 #$10, and then performs an RTI without setting the stack pointer to a constant value. Of course, it works reliably.
Since there are only three values in the sequence, there should be only three cases to consider. If the stack pointer were #$F6 at the time of executing the RTI instruction, then this causes the value #$12 and $1011 to be fetched from $1F7. If the stack pointer were #$F7 at the time of executing the RTI instruction, then this causes the value #$11 and $1210 to be fetched from $1F8. If the stack pointer were #$F8 at the time of executing the RTI instruction, then this causes the value #$10 and $1112 to be fetched from $1F9. The program has an RTS instruction at the first and last of those locations. That yields two more cases to consider. The RTS at $1011 transfers control to $1112+1. The RTS at $1112 transfers control to $1210+1. That leaves one more case to consider. The program has an RTS instruction at $1113. The RTS at $1113 transfers control to $1211. So, both $1210 and $1211 are reachable this way. Both addresses contain a NOP instruction, to allow the code to fall through to the real entrypoint.
Note the phase “there should be.” There is one special case. The remainder of 256 divided by three is one. What is in that one byte? It’s the value #$10. So the first and last byte of the stack page is #$10, introducing an additional case. If the stack pointer were #$FD at the time of executing the RTI instruction, then this causes the value #$11 and $1010 to be fetched from $1FE. The program has an RTS instruction at $1010. The RTS at $1010 transfers control to $1112+1. The RTS at $1113 transfers control to $1211.
That’s not all! We can construct an even longer chain. If the stack pointer were #$F9 at the time of executing the RTI instruction, then this causes the value #$12 and $1011 to be fetched from $1FA. The RTS at $1011 transfers control to $1112+1, but the RTS at $1113 causes the stack pointer to wrap around. The CPU fetches both #$10 values, so the RTS at $1113 transfers control to $1010+1. The RTS at $1011 transfers control again to $1112+1. The RTS at $1113 finally transfers control to $1211.
Championship Lode Runner has a smaller chain. It uses only two values on the stack: $3FF and $400. An RTS transfers control to $3FF+1. The program has an RTS at $400. The RTS at $400 transfers control to $400+1, the real entrypoint.
10:7.7 Obfuscation
Anti-disassembly
This technique is intended to prevent casual reading of the code—that is, static analysis, and specifically targeting linear-sweep disassemblers—by inserting dummy opcodes into the stream, and using branch instructions to pass over them. At the time, recursive-descent disassembly was not common, so the technique was extremely effective.
Wings of Fury uses this technique, even for its system detection. The initial disassembly follows, with undocumented instructions such as RLA.
9600 ORA (0,X)
9602 LDY #$10
9604 BPL $9616
9606 RLA ($10,X)
9608 NOP
960A BEQ $95AC
960C NOP
960E STY $84
9610 STY $18
9612 CLC
9613 CLC
9614 BNE $961C
9616 CLC
9617 CLC
9618 BNE $960B
961A SRE ($51),Y
961C STY $C009
961F STX $20,Y
9621 ORA ($10),Y
9623 CPX $84
9625 STA $C008
9628 BEQ $9672
962A LDA $C088,X
962D ORA ($18),Y
962F ORA ($10),Y
9631 ASL
9632 LDX #$27
9634 ASL
9635 ASL
9636 LDY #$10
9638 BPL $9630
963A BRK
963B JMP $93BD
963E TYA
963F STA $400,X
9642 BNE $964C
9644 BRK
Upon closer examination, we see the branch instruction at $9604 is unconditional, because the value in the Y register is positive. That leads to the branch at $9618. This branch is also unconditional, because the value in the Y register is not zero. That takes us into the middle of an instruction at $960B, and requires a second round disassembly:
;store #$64 at $84
960B LDY #$64
960D STY $84
;four dummy instructions
960F STY $84
9611 CLC
9612 CLC
9613 CLC
;unconditional branch
;because Y is not zero
9614 BNE $961C
...
;switch to auxiliary memory
;bank, if available
961C STY $C009
;store alternative value
;at $84 ($20+#$64=$84)
961F STX $20,Y
;dummy instruction
9621 ORA ($10),Y
;compare the two values
;(differ in 64kb environment)
9623 CPX $84
;switch to main memory bank
9625 STA $C008
;branch if 128kb memory exists
9628 BEQ $9672
;turn off the drive
962A LDA $C088,X
;dummy instruction
962D ORA ($18),Y
;dummy ins masks real ins
962F ORA ($10),Y
;dummy ins in first pass
;opcode param in second pass
9631 ASL
;length of error message
9632 LDX #$27
;two dummy instructions
9634 ASL
9635 ASL
9636 LDY #$10
;unconditional branch
;because Y is positive
9638 BPL $9630
963A BRK
963B JMP $93BD
963E TYA
963F STA $400,X
9642 BNE $964C
9644 BRK
A third round disassembly:
;unconditional branch
;because Y is positive
9630 BPL $963C
...
;message text
963C LDA $9893,X
;write to the screen
963F STA $400,X
;unconditional branch
;because A is not zero
9642 BNE $964C
The obfuscated code only gets worse from there, but the intention is clear already.
Self-modifying code
As the name implies, this technique relies on the ability of code to modify itself at runtime, and to have the modified version executed. A common use of the technique is to improve performance by updating an address with a loop during a memory copy, for example. However, from the point of view of copy-protection, the most common use is to change the code flow, or to act as a light encoding layer. Self-modifying code can be used to interfere with debuggers, because a breakpoint that is placed on the modified instruction might be overwritten directly, thus removing it, and resulting in uncontrolled execution; or turned into an entirely unrelated (and possibly meaningless or even harmful) instruction, with unpredictable results.
Aquatron hides its protection check this way. The initial disassembly looks like this, complete with undocumented instructions such as ISB:
9600 DEC $9603
9603 ISB $9603
9606 LDA $9628
9609 E0R #$C9
960B BNE $960E
960D JSR $288D
9610 STX $18,Y
9612 BNE $9615
9614 JMP $29A0
9617 TYA
9618 BCC $961B
961A JSR $59
961D STX $99,Y
961F BRK
9620 STX $C8,Y
9622 BNE $9617
9624 TYA
9625 BPL $9628
9627 JMP $2960
Upon closer examination, we see references to instructions at “hidden” offsets, and of course, the direct modification of the instruction at $9603.
Second round disassembly:
9600 DEC $9603
;-> INC $9603
;undo self-modification
9603 ISB $9603
9606 LDA $9628
9609 E0R #$C9
;unconditional branch
;because A is not zero
960B BNE $960E
960D .BYTE $20
;replace instruction below
960E STA $9628
9611 CLC
;unconditional branch
;because A is not zero
9612 BNE $9615
9614 .BYTE $4C
9615 LDY #$29
9617 TYA
9618 BCC $961B
961A .BYTE $20
;decode and store
961B E0R $9600,Y
961E STA $9600,Y
9621 INY
9622 BNE $9617
9624 TYA
;unconditional branch
;because Y is positive
9625 BPL $9628
9627 .BYTE $4C
;self-modified by $960E to
;$A9 on first pass, restored
;to $60 on second pass
9628 RTS
;decoded by $961B-9620 on
;first pass, re-encoded on
;second pass
9629 .BYTE $29
Now we can see the decryption routine. It decodes the bytes at $9629-96FF, which contained a check for a sector with special format. If the checked passes, then the routine at $9600 is run again, which reverses the changes that had been made — the bytes at $9629-96FF are encoded again, and the routine exits via the RTS instruction at $9628.
Self-overwriting code
When self-modification is taken to the extreme, the result is self-overwriting code. There, the RWTS routine reads sector data over itself, in order to change the execution behavior, and potentially remove user-defined modifications such as breakpoints or detours. LifeSaver uses this technique. The loader enters a loop which has no apparent exit condition. Instead, the last sector to be read from disk contains an identical copy of the loader code, except for the last instruction which branches to a new location upon completion. When combined with a critically timing-dependent technique, such as reading a sector while the head is moving, it becomes extremely difficult to defeat.