If the fetch is slow enough... well, that is an interesting case. Bit-copiers try to read the data as quickly as it comes in. They do this not by polling the QA switch of the Data Register, but by checking whether the top bit is already set, in an unrolled loop.
;2 cycle delay so
;shift might finish
TDL1 NOP
;try to detect timing bit
LDA $C0EC, X
BMI TDS2
TDL2 LDA $C0EC, X
BMI TDS2
;timing bit probably present
LDA $C0EC, X
BMI TDS3
LDA $C0EC, X
BMI TDS3
LDA $C0EC, X
BMI TDS3
LDA $C0EC, X
BMI TDS3
;3-cycle penalty if taken!
BPL TDL2
TDS2 STA ($0), Y
...
RTS
;store value with timing bit
;loses one bit as a result
TDS3 AND #$7F
STA ($0), Y
...
RTS
This code is a disassembly from Essential Data Duplicator (E.D.D.). Apart from the BPL instruction, Copy ][+ uses the same code. (Someone copied!) Normally, a nibble is shifted in before TDL2 completes, so TDS2 is reached and the nibble is stored intact. However, because the code uses only six fetches, it is vulnerable to a well-placed timing bit that causes the BPL to be reached just before the last bit of the nibble is shifted in. The taken branch costs three cycles. Together with the two-cycle instruction before it, that is just enough time for the shift to complete and for the four CPU cycles to elapse before the next read occurs. As a result, the nibble is missed, and the next few nibbles that arrive reach TDS3 instead, each losing one bit. When the bit-copier writes those data to disk, the values will be entirely wrong.
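A small cycle-accounting sketch in Python makes the timing argument concrete. It is an illustration, not code from E.D.D. It assumes standard NMOS 6502 costs:

- 4 cycles for LDA $C0EC,X, with the latch sampled on the last cycle. X is assumed to be 0, since the slot 6 address is hard-coded.
- 2 cycles for a branch that falls through.
- 3 cycles for a branch taken within the same page.

It also assumes that no nibble ever arrives, so every BMI falls through. The function name `edd_read_times` is hypothetical.

```python
NOP_CYCLES = 2
LDA_ABS_X = 4         # LDA $C0EC,X with no page crossing; the latch is read on the last cycle
BRANCH_NOT_TAKEN = 2  # BMI falls through while bit 7 is still clear
BRANCH_TAKEN = 3      # BPL back to TDL2, same page


def edd_read_times(passes: int = 2) -> list[int]:
    """Cycle at which each latch read completes, assuming no nibble is ever seen."""
    t = NOP_CYCLES                 # TDL1: NOP
    times = []
    t += LDA_ABS_X                 # the single fetch before TDL2
    times.append(t)
    t += BRANCH_NOT_TAKEN
    for _ in range(passes):
        for _ in range(5):         # TDL2 plus the four fetches after it
            t += LDA_ABS_X
            times.append(t)
            t += BRANCH_NOT_TAKEN
        t += BRANCH_TAKEN          # BPL TDL2
    return times


times = edd_read_times()
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
print(gaps)  # [6, 6, 6, 6, 6, 9, 6, 6, 6, 6]
```

Inside the unrolled body, each read follows the previous one by 6 cycles. The read after the BPL comes 9 cycles later: 2 for the last BMI, 3 for the taken BPL, and 4 for the LDA. That longer gap is the opening a well-placed timing bit exploits.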
Create With Garfield: Deluxe Edition uses this technique. (The original Create With Garfield uses an entirely different protection.) It has one track that is full of repeated sequences, each with a five-byte prologue. Every second prologue has a timing bit after each of its five bytes. In the middle of the track is a collection of bytes that do not match the sequence. This essentially splits the track into two groups of these repeated sequences, and the two groups are the same size. When the bit-copier attempts to read the data, the timing bits cause about half of the sequences to be lost. Far fewer sequences remain than exist on the original disk, but still enough that the bit-copier mistakenly believes it has copied the track successfully. A program can detect a copy by the small count of these sequences. This technique was probably created to defeat E.D.D. specifically, but Copy ][+ is also affected. However, the protection can be reproduced in two ways. One is a peripheral that connects to the drive controller, and thus sees the zero-bits for exactly what they are. The other is an additional fetch in the software.
Bit-flip, or defeat bit-copiers with this one weird trick
Deeply technical content follows. Prepare yourself!
Let’s take this simple sentence (sorry, but it’s the best example that I could create at the time):
ITHASGOTTOBETHISLANDAHEAD
And split it according to some potential word boundaries:
IT HAS GOT TO BE THIS LAND AHEAD
Now we skip a bit:
OTTO BETH ISLAND AHEAD
A bit more:
TO BETH ISLAND AHEAD
A bit more still:
BET HIS L AND A HEAD
Okay, that last one doesn’t make much sense, but I wanted a sentence that could be read differently depending on where you started reading, rather than a series of arbitrary overlapping words. In any case, it’s clear that where you start reading can give vastly different results. Something similar is possible while reading the bitstream from the disk. The QA switch of the Data Register is set to zero once three things have happened: a nibble has been shifted in (indicated by the top bit being set), the four CPU cycles have elapsed, and a one-bit has been seen. The absence of a counter allows the hardware to be fooled about how many bits have been read. Specifically, the controller can be convinced to discard some of the bits that it has read from the disk while forming a nibble, and the starting position within the stream then shifts accordingly. This is possible with a single instruction, combined with an appropriate delay.
After an access to Q6H ($C08D + (slot x 16)), the QA switch of the Data Register receives a copy of the status bits, which remain accessible for four CPU cycles. After those four CPU cycles, the QA switch of the Data Register is zeroed. Meanwhile, assuming that the disk is spinning, the Logic State Sequencer (LSS) continues to shift in new bits. When the QA switch of the Data Register is zeroed, the bits that were already shifted in are discarded, and the hardware shifts in bits as though nothing had been read previously. Let’s see that in action.
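The framing behaviour just described can be modelled in a few lines of Python. This is a simplified sketch, not a model of the real LSS:

- It treats the stream as a string of bits and assumes exactly one bit arrives every four CPU cycles.
- It ignores how long a completed nibble stays visible.
- It models the desync by discarding delay // 4 bits before framing resumes.

The helper names `frame_nibbles` and `bits_discarded` are hypothetical.

```python
def frame_nibbles(bits: str, start: int = 0) -> list[int]:
    """Simplified framing: zero-bits are ignored while the register is empty,
    and the first one-bit begins an 8-bit nibble."""
    nibbles = []
    i = start
    while i < len(bits):
        if bits[i] == "0":
            i += 1                      # an empty register stays empty on a zero-bit
        elif i + 8 <= len(bits):
            nibbles.append(int(bits[i:i + 8], 2))
            i += 8
        else:
            break                       # incomplete nibble at the end of the sample
    return nibbles


def bits_discarded(delay_cycles: int, cycles_per_bit: int = 4) -> int:
    """Bits shifted in, and then thrown away, during the delay after the Q6H access."""
    return delay_cycles // cycles_per_bit


# E7 0 E7 0 E7: three E7 nibbles, the first two each followed by one zero-bit
stream = "11100111" + "0" + "11100111" + "0" + "11100111"
print([f"{n:02X}" for n in frame_nibbles(stream)])                      # ['E7', 'E7', 'E7']
print([f"{n:02X}" for n in frame_nibbles(stream, bits_discarded(15))])  # ['EE', 'EE']
```

Starting three bits late turns each single zero-bit into part of a new nibble. That is how the same run of E7s can be read back as EE.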
Tinka’s Mazes does it this way, beginning with some preamble code which is common to many programs that used this technique.
BB6A LDY #0
;wait for nibble to arrive
BB6C LDA $C08C,X
BB6F BPL $BB6C
BB71 DEY
;retry up to 256 times
BB72 BEQ $BBBB
;watch for #$D5
BB74 CMP #$D5
BB76 BNE $BB6C
BB78 LDY #0
;wait for nibble to arrive
BB7A LDA $C08C,X
BB7D BPL $BB7A
BB7F DEY
;retry up to 256 times
BB80 BEQ $BBBB
;watch for #$E7
BB82 CMP #$E7
BB84 BNE $BB7A
;wait for nibble to arrive
BB86 LDA $C08C,X
BB89 BPL $BB86
;watch for #$E7
BB8B CMP #$E7
BB8D BNE $BBBB
;wait for nibble to arrive
BB8F LDA $C08C,X
BB92 BPL $BB8F
;watch for #$E7
BB94 CMP #$E7
BB96 BNE $BBBB
Here is the switch:
;trigger desync
BB98 LDA $C08D,X
BB9B LDY #$10
;delay to ensure > 4 cycles
;before the next read occurs
BB9D BIT $6
;wait for nibble to arrive
BB9F LDA $C08C,X
BBA2 BPL $BB9F
BBA4 DEY
;retry up to 16 times
BBA5 BEQ $BBBB
;watch for #$EE
BBA7 CMP #$EE
BBA9 BNE $BB9F
BBAB LDY #7
;wait for nibble to arrive
BBAD LDA $C08C,X
BBB0 BPL $BBAD
;compare backwards against the
;list at $BBC1
;E7 FC EE E7 FC EE EE FC
BBB2 CMP ($48),Y
BBB4 BNE $BBBB
BBB6 DEY
BBB7 BPL $BBAD
;pass
BBB9 CLC
BBBA RTS
BBBB DEC $50
;retry if count remains
BBBD BNE $BB57
;fail
BBBF SEC
BBC0 RTS
BBC1 .BYTE $FC, $EE, $EE, $FC, $E7, $EE, $FC, $E7
But wait, there’s more! On disk, the bitstream looks like D5 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7, with some harmless zero-bits in between. So where do the other values come from? Since the magic is in the timing of the reads, we must count cycles:
Time passes...
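The 15-cycle figure can be reproduced by adding up standard 6502 cycle costs. The tally covers the instructions that run after the read at $BB8F returns the last E7, up to the start of the LDA at $BB9F. It is an illustrative tally that makes two assumptions. First, both branches fall through. Second, LDA $C08D,X does not cross a page, which holds for any slot because $C08D + $70 is still on page $C0.

```python
# (address, instruction, cycles) executed after the read at $BB8F returns the last E7
TINKA_DELAY = [
    (0xBB92, "BPL $BB8F", 2),    # not taken: bit 7 is set
    (0xBB94, "CMP #$E7", 2),
    (0xBB96, "BNE $BBBB", 2),    # not taken: the value matched
    (0xBB98, "LDA $C08D,X", 4),  # Q6H access that triggers the desync
    (0xBB9B, "LDY #$10", 2),
    (0xBB9D, "BIT $6", 3),       # zero-page BIT, used purely as a delay
]

CYCLES_PER_BIT = 4

total = sum(cycles for _, _, cycles in TINKA_DELAY)
for address, instruction, cycles in TINKA_DELAY:
    print(f"{address:04X} {instruction:<12} {cycles}")
print("total:", total, "cycles ->", total // CYCLES_PER_BIT, "bits discarded")
# total: 15 cycles -> 3 bits discarded
```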
One bit is shifted in every four CPU cycles, so a delay of 15 CPU cycles is enough for three bits to be shifted in. Those bits are discarded. Back to our stream. In binary, it looks like the following, with the seemingly redundant zero-bits set apart:
11100111 0 11100111 00 11100111 11100111 0 11100111 00 11100111 11100111 0 11100111 0 11100111 11100111
However, by skipping the first three bits, the stream looks like this:
00 11101110 0 11100111 00 11111100 11101110 0 11100111 00 11111100 11101110 0 11101110 0 11111100 111...
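The zero-bits that now stand apart are newly exposed ones from inside the original E7 nibbles. The old zero-bits have become part of the new nibbles; that is, the old zero-bits form part of the new stream. This decodes to EE, which satisfies the CMP #$EE at $BBA7, followed by E7 FC EE E7 FC EE EE FC, and we have our magic values.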
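The decode can be checked mechanically. The Python sketch below works in three steps:

1. It rebuilds the binary stream shown above from its E7 nibbles and zero-bit runs.
2. It frames the stream with the same simplified rule: skip zero-bits, then take eight bits starting at a one-bit.
3. It models the comparison loop: find $EE, then match the next eight nibbles against the $BBC1 table, with Y counting down from 7.

The helper names are hypothetical. Apart from the three discarded bits, the model ignores timing.

```python
def frame_nibbles(bits: str, start: int = 0) -> list[int]:
    """Simplified framing: skip zero-bits, then take 8 bits starting at a one-bit."""
    nibbles, i = [], start
    while i < len(bits):
        if bits[i] == "0":
            i += 1
        elif i + 8 <= len(bits):
            nibbles.append(int(bits[i:i + 8], 2))
            i += 8
        else:
            break                      # incomplete nibble at the end of the sample
    return nibbles


# Zero-bits following each E7 in the binary stream shown above
ZERO_RUNS = ["0", "00", "", "0", "00", "", "0", "0", "", ""]
STREAM = "".join("11100111" + run for run in ZERO_RUNS)

# The list at $BBC1, in memory order
TABLE_BBC1 = [0xFC, 0xEE, 0xEE, 0xFC, 0xE7, 0xEE, 0xFC, 0xE7]


def tinka_check(nibbles: list[int]) -> bool:
    """Model of $BB9F-$BBB7: find $EE, then compare the next eight nibbles
    against TABLE_BBC1 with Y running from 7 down to 0."""
    candidates = nibbles[:15]          # Y = $10 is decremented before each compare
    if 0xEE not in candidates:
        return False
    rest = nibbles[candidates.index(0xEE) + 1:]
    expected = [TABLE_BBC1[y] for y in range(7, -1, -1)]
    return rest[:8] == expected


shifted = frame_nibbles(STREAM, 3)     # 15 cycles // 4 = 3 bits discarded
print(" ".join(f"{n:02X}" for n in shifted))                     # EE E7 FC EE E7 FC EE EE FC
print(tinka_check(shifted), tinka_check(frame_nibbles(STREAM)))  # True False
```

Because the table is indexed backwards, it is stored in reverse order of arrival.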
Programs from Epyx that use this protection do not compare the values in the pattern. Instead, the values are used as a key to decode the rest of the data that are loaded. This hides the expected values, and causes the program to crash if they are altered.
The Thunder Mountain version of Dig Dug uses a slight variation on the technique, including a different preamble and switch. The company seems to have kept the variation to themselves. (Bop’N Wrestle from 1986 uses the same altered version, and comes from Mindscape, but Mindscape owned the Thunder Mountain label, so the connection is clear.)48 That version looks like this:
0224 LDY #$00
;wait for nibble to arrive
0226 LDA $C08C,X
0229 BPL $2226
022B DEY
;retry up to 256 times
022C BEQ $2275
022E CMP #$AD
0230 BNE $2226
A different prologue value is checked, allowing the bitstream to begin like a regular sector: D5 AA AD...
Here is the switch:
;trigger desync
0252 LDA $C08D,X
0255 LDY #$10
;no delay instruction in this version
;wait for nibble to arrive
0257 LDA $C08C,X
025A BPL $2257
025C DEY
;retry up to 16 times
025D BEQ $2275
;watch for #$E7 instead, but it’s not a “true” E7
025F CMP #$E7
0261 BNE $2257
;and double the size of the pattern to match
0263 LDY #$0F
The bitstream on disk looks like D5 AA AD [many 96s] E7 E7 E7 E7 E7 E7 E7 E7 E7 E7 E7, with some harmless zero-bits in between. The desync timing is only 12 cycles, but the required pattern is not found right away, so the delay is not as interesting. In binary, the stream looks like the following, with the seemingly redundant zero-bits set apart:
11100111 11100111 11100111 00 11100111 0 11100111 0 11100111 0 11100111 00 11100111 00 11100111 0 11100111 00 11100111 0 11100111 0 11100111 0 11100111 00 11100111 0 11100111 00 11100111 0 11100111 0 11100111
However, by skipping the first three bits, the stream looks like this:
00 11111100 11111100 11100111 (← E7, but not aligned) 00 11101110 0 11101110 0 11101110 0 11100111 00 11100111 00 11101110 0 11100111 00 11101110 0 11101110 0 11101110 0 11100111 00 11101110 0 11100111 00 11101110 0 11101110 0 111...
The zero-bits that now stand apart are newly exposed ones from inside the original E7 nibbles. The old zero-bits have become part of the new nibbles, so they form part of the new stream. This decodes to FC (ignored) FC (ignored) E7 EE EE EE E7 E7 EE E7 EE EE EE E7 EE E7 EE EE, a very smooth sequence indeed. Put simply, each single original zero-bit results in EE being seen, and every double zero-bit results in E7 being seen. This allows easy control over exactly how smooth the sequence is.
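The same simplified framing rule reproduces this decode as well. The sketch below does three things:

1. It rebuilds the 19 E7 nibbles and their zero-bit runs from the binary stream above.
2. It discards three bits (12 cycles // 4).
3. It checks the rule just stated: a single original zero-bit yields EE, and a double one yields E7.

It models the bitstream only, and the names are hypothetical.

```python
def frame_nibbles(bits: str, start: int = 0) -> list[int]:
    """Simplified framing: skip zero-bits, then take 8 bits starting at a one-bit."""
    nibbles, i = [], start
    while i < len(bits):
        if bits[i] == "0":
            i += 1
        elif i + 8 <= len(bits):
            nibbles.append(int(bits[i:i + 8], 2))
            i += 8
        else:
            break                      # incomplete nibble at the end of the sample
    return nibbles


# Zero-bits following each of the 19 E7 nibbles in the binary stream above
ZERO_RUNS = ["", "", "00", "0", "0", "0", "00", "00", "0", "00",
             "0", "0", "0", "00", "0", "00", "0", "0", ""]
STREAM = "".join("11100111" + run for run in ZERO_RUNS)

shifted = frame_nibbles(STREAM, 12 // 4)   # 12 cycles -> 3 bits discarded
print(" ".join(f"{n:02X}" for n in shifted))
# FC FC E7 EE EE EE E7 E7 EE E7 EE EE EE E7 EE E7 EE EE

# From the third value on, each new nibble straddles two original E7s,
# so it is decided by the zero run between them: "0" -> EE, "00" -> E7.
RULE = {"0": 0xEE, "00": 0xE7}
predicted = [RULE[run] for run in ZERO_RUNS[2:18]]
print(shifted[2:] == predicted)  # True
```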
1-2-3 Sequence Me uses the same technique but with different values:
;wait for nibble to arrive
BA5B LDA $C08C,X
BA5E BPL $BA5B
;watch for #$AA
BA60 CMP #$AA
BA62 BEQ $BA7A
...
BA7A LDY #$02
;trigger desync
BA7C LDA $C08D,X
;delay while status is loaded
BA7F PHA
;balance stack
BA80 PLA
;wait for nibble to arrive
BA81 LDA $C08C,X
BA84 BPL $BA81
;watch for #$BB
BA86 CMP #$BB
BA88 BEQ $BA8F
BA8A DEY
;retry if count remains
BA8B BPL $BA81
;fail
BA8D BMI $BA77
;wait for nibble to arrive
BA8F LDA $C08C,X
BA92 BPL $BA8F
;watch for #$F9
BA94 CMP #$F9
BA96 BNE $BA77
That stream looks like AA EB 97 DF FF with some harmless zero-bits in between. Now let’s count the cycles:
BA5B LDA $C08C,X
BA5E BPL $BA5B ;2 cycles
BA60 CMP #$AA ;2 cycles
BA62 BEQ $BA7A ;3 cycles
...
BA7A LDY #$02 ;2 cycles
BA7C LDA $C08D,X ;4 cycles
BA7F PHA ;3 cycles
;total: 16 cycles
One bit is shifted in every four CPU cycles, so a delay of 16 CPU cycles is enough for four bits to be shifted in. Those bits are discarded. Back to our stream. In binary, it would look like this, with the seemingly redundant zero-bits set apart:
11101011 0 10010111 0 11011111 00 11111111
However, by skipping the first four bits, the stream looks a bit different:
10110100 10111011 0 11111001 111111...
The only zero-bit that now stands apart is a newly exposed one from inside the original DF. The old zero-bits have become part of the new nibbles, so they form part of the new stream. This decodes to B4 (ignored) BB F9 Fx, and we have our magic values.
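As a cross-check, the Python sketch below does three things:

1. It sums the cycle tally above.
2. It discards 16 // 4 = 4 bits from the EB 97 DF FF stream.
3. It models the check at $BA81-$BA96: up to three tries to see $BB, followed by $F9.

Framing uses the same simplified rule as before, and the helper names are hypothetical.

```python
def frame_nibbles(bits: str, start: int = 0) -> list[int]:
    """Simplified framing: skip zero-bits, then take 8 bits starting at a one-bit."""
    nibbles, i = [], start
    while i < len(bits):
        if bits[i] == "0":
            i += 1
        elif i + 8 <= len(bits):
            nibbles.append(int(bits[i:i + 8], 2))
            i += 8
        else:
            break                      # incomplete nibble at the end of the sample
    return nibbles


# Cycle tally from the listing: BPL, CMP, BEQ (taken), LDY, LDA $C08D,X, PHA
DELAY = sum([2, 2, 3, 2, 4, 3])

# EB 0 97 0 DF 00 FF, as shown in binary above
STREAM = "11101011" + "0" + "10010111" + "0" + "11011111" + "00" + "11111111"


def sequence_me_check(nibbles: list[int]) -> bool:
    """Model of $BA81-$BA96: $BB must appear within three reads, then $F9 must follow."""
    for k, nibble in enumerate(nibbles[:3]):   # LDY #$02 / DEY / BPL allows three tries
        if nibble == 0xBB:
            return k + 1 < len(nibbles) and nibbles[k + 1] == 0xF9
    return False


shifted = frame_nibbles(STREAM, DELAY // 4)
print(DELAY, " ".join(f"{n:02X}" for n in shifted))                          # 16 B4 BB F9
print(sequence_me_check(shifted), sequence_me_check(frame_nibbles(STREAM)))  # True False
```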
The 4th R: Reasoning uses another variation of this technique. Instead of matching the values explicitly, it watches for the data field on a particular sector, waits for three nibbles and three bits to pass, and then reads and stores the next 16 nibbles in an array. Then it calculates a checksum of those 16 nibbles, and uses the checksum as an index into the table of those 16 nibbles, to fetch two 8-bit keys in a row. The table is treated as a circular list, so if the index were 15, then the two keys would be formed by fetching the last entry in the array and the first entry in the array. The keys are used to decipher the other nibbles that are read from all of the other sectors on the disk. It looks like this:
;wait for nibble to arrive
BB63 LDA $C08C,X
BB66 BPL $BB63
;wait for nibble to leave
;if zero-bit is present,
;then read value lasts longer
BB68 LDA $C08C,X
BB6B BMI $BB68
;wait for nibble to arrive
BB6D LDA $C08C,X
BB70 BPL $BB6D
;trigger desync
BB72 STA $C08D,X
;delay to reduce times
;that branch will be taken
BB75 NOP
;wait for status value to
;leave if zero-bit is present
;then read value lasts longer
BB76 LDA $C08C,X
BB79 BMI $BB76
;wait for next nibble
BB7B LDA $C08C,X
BB7E BPL $BB7B
That stream looks like CF CF 9E FD ED BB E6 B6 ED FB FC EB DF DE D3 D9 FF D9 DD D7 with some harmless zero-bits in between. Now let’s count those cycles.
BB63 LDA $C08C,X
BB66 BPL $BB63
BB68 LDA $C08C,X
BB6B BMI $BB68
BB6D LDA $C08C,X
BB70 BPL $BB6D ;2 cycles
BB72 STA $C08D,X ;5 cycles
BB75 NOP ;2 cycles
BB76 LDA $C08C,X ;4 cycles
;but +4 cycles for each time
;reached because of zero-bit
BB79 BMI $BB76 ;2 cycles
;but +3 for each time BMI is
;taken because of zero-bit.
;total 15, 22 or 29 cycles
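The three totals in the listing follow from its own per-instruction comments. The straight path costs 15 cycles. Each zero-bit that keeps the value in the latch longer adds 4 cycles (another LDA) and 3 cycles (a taken BMI). The Python sketch below reproduces those totals, along with the bit counts that a plain delay // 4 would give. It does not model the zero-bit adjustment discussed next. The function name is hypothetical.

```python
# Per-instruction costs from the listing: BPL, STA $C08D,X, NOP, LDA $C08C,X, BMI (not taken)
BASE_CYCLES = [2, 5, 2, 4, 2]
EXTRA_PER_ZERO_BIT = 4 + 3   # one more LDA $C08C,X plus a taken BMI


def fourth_r_delay(zero_bits: int) -> int:
    """Delay in cycles when `zero_bits` zero-bits keep the status value visible longer."""
    return sum(BASE_CYCLES) + EXTRA_PER_ZERO_BIT * zero_bits


for zero_bits in range(3):
    delay = fourth_r_delay(zero_bits)
    print(zero_bits, delay, delay // 4)
# 0 15 3
# 1 22 5
# 2 29 7
```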
One bit is shifted in every four CPU cycles, so a delay of 15 CPU cycles is enough for three bits to be shifted in. A delay of 22 CPU cycles would normally be enough for five bits to be shifted in. However, if the delay is caused by the presence of a zero-bit, then it behaves as though the delay were only 18 CPU cycles, which is enough for four bits to be shifted in. A delay of 29 CPU cycles is enough for seven bits to be shifted in. However, if the delay is caused by the presence of a second zero-bit, then it behaves as though the delay were only 21 CPU cycles, which is enough for five bits to be shifted in. In any case, the routine is written to discard a fixed number of regular bits, along with any zero-bits that are also present. Back to our stream: in binary, it would look like this, with the seemingly redundant zero-bits set apart:
11001111 11001111 0 10011110 11111101 0 11101101 10111011 11100110 10110110 11101101 11111011 0 11111100 11101011 11011111 11011110 11010011 11011001 11111111 11011001 11011101 0 11010111
However, by skipping the first three bits, the stream looks a bit different.
0 11110100 11110111 11101011 10110110 11101111 10011010 11011011 10110111 11101101 11111001 11010111 10111111 10111101 10100111 10110011 11111111 10110011 10111010 11010111
The only zero-bit that now stands apart is a newly exposed one from inside the second CF. The old zero-bits have become part of the new nibbles, so they form part of the new stream. This decodes to F4 F7 (both ignored) EB B6 EF 9A DB B7 ED F9 D7 BF BD A7 B3 FF B3 BA. These 16 values are stored backwards, and their checksum is #$67. The low four bits of the checksum (7) are the index into the table, and the values at offsets 7 and 8 are #$D7 and #$F9.
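The whole key derivation can be sketched in Python. The stream is rebuilt from the 20 nibbles and the four single zero-bits shown in the binary listing. Framing restarts three bits into the second CF, which is where the shifted listing begins.

Two points in this sketch are assumptions rather than facts from the disassembly:

- The article does not say which checksum is used. An XOR of the 16 nibbles produces #$67, so the sketch uses XOR.
- The key lookup is written as table[index] and table[(index + 1) % 16] to match the circular-list description.

The helper names are hypothetical.

```python
def frame_nibbles(bits: str, start: int = 0) -> list[int]:
    """Simplified framing: skip zero-bits, then take 8 bits starting at a one-bit."""
    nibbles, i = [], start
    while i < len(bits):
        if bits[i] == "0":
            i += 1
        elif i + 8 <= len(bits):
            nibbles.append(int(bits[i:i + 8], 2))
            i += 8
        else:
            break                      # incomplete nibble at the end of the sample
    return nibbles


NIBBLES = "CF CF 9E FD ED BB E6 B6 ED FB FC EB DF DE D3 D9 FF D9 DD D7".split()
ZERO_AFTER = {1, 3, 9, 18}   # positions followed by one zero-bit in the binary listing
STREAM = "".join(f"{int(h, 16):08b}" + ("0" if i in ZERO_AFTER else "")
                 for i, h in enumerate(NIBBLES))

decoded = frame_nibbles(STREAM, 8 + 3)   # start three bits into the second CF
values = decoded[2:18]                   # drop F4 and F7, keep the next 16 nibbles
table = values[::-1]                     # stored backwards

checksum = 0
for value in values:
    checksum ^= value                    # assumption: XOR, which reproduces #$67

index = checksum & 0x0F
key1 = table[index]
key2 = table[(index + 1) % len(table)]   # circular list: index 15 wraps to 0
print(f"checksum={checksum:02X} index={index} keys={key1:02X} {key2:02X}")
# checksum=67 index=7 keys=D7 F9
```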
A bit-copier that misses any of these zero-bits will write a track whose length and contents do not match the original.
Race conditions
Page 4 of the Software Control of the Disk ][ or IWM Controller document states that “The Disk ][ controller hardware will keep the ENABLE/signal to its active low state for approximately one second after the execution of the motor off instruction, therefore read/write can be performed reliably within this period.” So, a program can issue the motor off instruction, and then read sector data successfully for up to one second afterwards.
This behavior functions as a very nice anti-debugging mechanism, since single-stepping through the disk access code, after the motor-off instruction has been issued, will cause the time period to be exceeded. Thus, the disk won’t be readable at that time. Sherwood Forest uses this technique.
Page 4 of the Software Control of the Disk ][ or IWM Controller document also states that “... the program should verify that the motor is spinning by monitoring the change in data pattern read from the drive.” That is to say, while the drive is spinning, the value will change. Once the drive stops spinning, the value will not change anymore.
Lady Tut uses this technique. It issues the motor-off instruction, and then reads continually from the drive until it sees two consecutive bytes of the same value. The program assumes at that point that the drive is no longer spinning. Periodically thereafter, the program reads from the QA switch of the Data Register, and compares the newly read value with the initially read value. If a different value is seen, then the program triggers a reboot.
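The Lady Tut logic can be sketched as two small Python functions. This is an illustration under stated assumptions. `read_latch` is a hypothetical stand-in for a read of the Data Register (LDA $C08C,X). The reboot is represented only by a Boolean result, and the sample values are made up.

```python
from collections.abc import Callable
from functools import partial


def wait_for_spin_down(read_latch: Callable[[], int]) -> int:
    """After motor-off: read until two consecutive values match, then return that value."""
    previous = read_latch()
    while True:
        current = read_latch()
        if current == previous:
            return current
        previous = current


def latch_changed(read_latch: Callable[[], int], settled: int) -> bool:
    """Periodic check: True (the program would reboot) if the value has changed."""
    return read_latch() != settled


# Hypothetical sequence of values returned by successive reads
read = partial(next, iter([0xD5, 0xAA, 0x96, 0xFF, 0xFF, 0xFF, 0xEB]))
settled = wait_for_spin_down(read)
print(hex(settled), latch_changed(read, settled), latch_changed(read, settled))
# 0xff False True
```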
PoC or GTFO, Volume 2 Page 14