by Neil Clarke
Note that Brady hasn’t used the arrow yet. It’s actually the tribute he will give to the third oracle. Regardless of where it points, Brady always races the address from the base back to the start of the ring.
Gshare is more accurate than NAT. When their predictions disagree, Brady tells Ian the Gshare prediction. Of course, Gshare takes a cycle longer to consult than NAT. By the time he knows the Gshare prediction, he has already told Ian the NAT prediction. To correct the fetch path, the worm has to back up and the instructions Ian had forced down its mouth are flushed out.
Like NAT, Gshare doesn’t know anything about the instructions Ian fetches. It divines the future using a squirming snake and a bunch of arrows. Given a fetch address, Ian will ship down the pipeline a bundle of instructions starting at that address. Gshare may predict that the last instruction in that bundle will be a branch that will be taken. That bundle, however, may not even have a branch in it18. Despite PHR, fetch paths may have collided in the BHT and stolen each other’s arrows.
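Since Gshare never looks at the instructions, it is worth seeing how little it has to go on. Below is a minimal sketch of a textbook gshare predictor, not Sentry’s actual logic. It assumes the BHT arrows are 2-bit saturating counters and that the index is the low bits of the fetch address XORed with the PHR snake. The table size and the names GsharePredictor, predict, update, and shift_history are hypothetical. The usage at the bottom shows two different fetch paths hashing to the same entry, which is how paths steal each other’s arrows.

```python
class GsharePredictor:
    """Textbook gshare: a BHT of 2-bit counters indexed by (address XOR history).

    Sizes are hypothetical; the story does not give Sentry's parameters.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        # Each "arrow" is a 2-bit saturating counter: 0-1 means not taken, 2-3 taken.
        self.bht = [1] * (1 << index_bits)  # every arrow starts weakly not taken
        self.phr = 0  # pattern history register: one bit per recent branch

    def index(self, fetch_addr: int, phr: int) -> int:
        # XOR the fetch address with the history so that the same address can
        # consult different arrows depending on the path that led to it.
        return (fetch_addr ^ phr) & self.mask

    def predict(self, fetch_addr: int) -> bool:
        return self.bht[self.index(fetch_addr, self.phr)] >= 2

    def update(self, fetch_addr: int, phr: int, taken: bool) -> None:
        # Nudge the arrow that was consulted toward the actual outcome.
        i = self.index(fetch_addr, phr)
        self.bht[i] = min(self.bht[i] + 1, 3) if taken else max(self.bht[i] - 1, 0)

    def shift_history(self, taken: bool) -> None:
        # Shift the newest outcome into the PHR, dropping the oldest bit.
        self.phr = ((self.phr << 1) | int(taken)) & self.mask


g = GsharePredictor(index_bits=4)
# Path 1 reaches address 0b0101 with history 0b0011; path 2 reaches 0b0110
# with history 0b0000. Both land on BHT entry 6.
assert g.index(0b0101, 0b0011) == g.index(0b0110, 0b0000) == 6
# Training path 1 as taken also swings the arrow that path 2 will consult.
g.update(0b0101, 0b0011, taken=True)
g.phr = 0b0000
print(g.predict(0b0110))  # True, though path 2 never saw a taken branch
```

Real designs pick history length and table size to trade aliasing against capacity; the 4-bit table here is only small enough to make the collision easy to see.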
In a way, this is fetching with one hand tied behind the back. At the very least, Brady should predict only the branches that exist and not the ones that don’t. Ian handles every instruction that he ships down the pipeline. Once he finds them, he knows which ones are branches. In the time that shards of Brady take to consult NAT and Gshare, other shards of Brady are embarking on the long journey around the third and largest of the rings. By the time Brady reaches the third oracle, Ian will have found some of the answers Brady wants.
Branch Target Address Calculator (BTAC)
As the name suggests, the third oracle Brady consults doesn’t actually predict. It calculates. By the time Brady reaches BTAC, Ian has seen the instructions that live at the address A that Brady told him. From that, finding the address to go after A is just an addition, assuming the BHT arrow is pointing the correct way.
BTAC actually lives in Ian’s portions of the worm, not Brady’s19. The ring that Brady walks leads out of his own domain, into Ian’s, then back. Just around the bend, gears gnash against each other, conveyor belts whir, and hammers pound. Instructions are conveyed through a tunnel on their way down the pipeline. Inside, they are cracked and counted.
Brady can only want one of two addresses: the sequential address or the target address. BTAC works out both of them as Brady approaches the oracle.
The sequential address is the address that makes the worm go straight ahead. It’s the right address to use when the bundle of instructions Ian sends has no branches or the bundle ends with a branch that the BHT arrow will insist is not taken. It’s the address A plus the number of instructions in the bundle. That’s what the counting is for.
The target address is where the worm should go next if it’s not straight ahead. It’s the right address when a branch ends the bundle and the BHT arrow says that branch is taken. That’s what the cracking is for. Inside the branch instruction is a token that tells Ian how far away the next instruction the worm should consume is. Add the token to address A and Ian has a target address.
Ian deals with full addresses. Brady deals with an ITLB index and an offset. After Ian calculates his addresses, he needs to find the ITLB entry that holds the 8K page matching each one. If that page isn’t in the ITLB, then Ian goes on his own epic journey. But that’s another story…
Brady approaches BTAC as a supplicant with the BHT arrow. Ian is there, waiting for him. Brady gives him the arrow. If it points to not taken, Ian gives Brady the sequential address. If it points to taken, Ian gives Brady the target address. Actually, if there are no branches in the instruction bundle, Ian just scoffs and tosses the arrow away20. Then he gives Brady the sequential address.
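Ian’s choice at BTAC fits in a few lines. This is a minimal sketch that follows the story’s simplifications rather than Sentry’s real datapath: addresses count instructions rather than bytes, any branch is assumed to end the bundle, and the target is address A plus the branch’s token, as described above. The names Bundle and btac_next are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bundle:
    start: int                   # fetch address A, in instruction units (assumed)
    count: int                   # instructions in the bundle (the "counting")
    branch_token: Optional[int]  # token of a branch ending the bundle, else None (the "cracking")


def btac_next(bundle: Bundle, arrow_taken: bool) -> int:
    """Return the next fetch address Ian hands back to Brady (hypothetical helper)."""
    sequential = bundle.start + bundle.count
    if bundle.branch_token is None:
        # No branch in the bundle: Ian tosses the arrow away unread.
        return sequential
    target = bundle.start + bundle.branch_token
    return target if arrow_taken else sequential


print(btac_next(Bundle(start=100, count=4, branch_token=None), arrow_taken=True))  # 104
print(btac_next(Bundle(start=100, count=4, branch_token=40), arrow_taken=False))   # 104
print(btac_next(Bundle(start=100, count=4, branch_token=40), arrow_taken=True))    # 140
```

Both candidate addresses are computed before the arrow arrives, so the arrow only has to pick one of them, which matches Ian working out both as Brady approaches.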
Brady sprints back to the start. BTAC is the most accurate of the oracles. It also takes Brady four cycles to run this ring. By the time Brady can use this address, he has already told Ian three more fetch addresses.
This is the price, because there’s always a price. To use the BTAC prediction, the worm has to first flush out the three fetch addresses Ian has been told in the meantime, wasting whatever effort Sentry has already put into them. So much for keeping the pipeline full. Also, it may turn out that Brady never uses the BTAC prediction: the fetch address it’s based on may itself have been flushed out. This is why we want to get to the correct address as quickly as possible. The more we flush, the emptier the pipeline and the worse Sentry performs.
As things turned out, instructions were tougher to crack than we expected. Ian can’t get it done before Brady shows up. Instead, Ian partially pre-cracks them when he first caches them away. For every instruction, he figures out whether it’s a branch and how far it is to the next. This way, computing the sequential address is mostly a matter of fishing out this extra information and doing an addition21.
Furthermore, branches come in sixteen flavors. Each one has a different condition that has to be met in order for that branch to be taken. Ian can’t distinguish between flavors. He doesn’t have the time. On one hand, this is OK because Ian doesn’t have the space to cache that away with each instruction anyway22. On the other hand, the Branch Always instruction is always taken and the Branch Never instruction is never taken. For those two, the BHT arrow may still point in the wrong direction, and Brady will get the wrong address even though the information that tells them which address to use is sitting inside the partially cracked branch instruction23.
After BTAC, there’s nothing left for Brady to do for this fetch address except wait for the truth to come and confirm or deny his prediction. For this journey to be worth taking, Brady has to find the truth way more often than the truth finds him. Otherwise, he might as well have just waited.
The magic of branch prediction, of course, is that NAT, Gshare, and BTAC together are surprisingly accurate once they are properly trained. “Properly” is the key word.
Branch Status Table
Historiomancy insists that what has happened will happen. Inside NAT, BHT, and BTB, Brady remembers what has happened. He consults them to divine what will happen. To remember, though, he inevitably makes a mistake first. A NAT entry or BTB entry has to be wrong before it can be repaired with a corrected address. The truth sways the BHT arrow that foretold (or didn’t foretell) the branch direction. As Brady sees the same branches again and again, correct predictions are swept inward to faster predictors. BTAC calculations find themselves inside BTB and BTB predictions find themselves inside NAT. The truth, of course, repairs all predictions.
The Branch Status Table (BST)24 tracks every branch from when Ian is told its address to when an execution unit retires it. When Brady sends his prediction to Ian, he stores it in a BST entry along with the fetch address he predicted from and the PHR used in the prediction (if any).
When a branch resolves, the truth comes to Brady with an index into BST. Actually, Ian sees the truth first because—and this seems to be a familiar refrain—the truth is an address. Ian needs to convert the upper bits into an ITLB index. Then Brady sees the truth. If we call the cycle where Brady tells Ian an address F1, then the truth arrives two cycles before that at FN1, because zero is a number25.
Brady compares his prediction with the truth. If he predicted wrong, not only does he need to re-steer the worm down the correct path, but he has to fix NAT, BHT, and BTB so that the next time the worm slides down this way, those oracles predict the path it is sliding down now. From what he’s stored in the BST for that branch, Brady can figure out which entries of NAT, BHT, and BTB to repair and with what. In addition, the PHR stored in the BST replaces the current PHR, then the actual direction of the branch is shifted into it. This makes the pattern of taken and not-taken branches in PHR what it would have been if we had predicted correctly.
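Here is a minimal sketch of the repair for one resolved branch. It assumes the BHT is indexed gshare-style from the fetch address and PHR stored in the BST entry, that the arrows are 2-bit counters, and that the PHR is an 8-bit shift register. NAT and BTB repairs are left out. BSTEntry, resolve, and PHR_BITS are hypothetical names, and the widths are illustrative only.

```python
from dataclasses import dataclass

PHR_BITS = 8  # hypothetical PHR width
PHR_MASK = (1 << PHR_BITS) - 1


@dataclass
class BSTEntry:
    fetch_addr: int        # fetch address Brady predicted from
    phr: int               # PHR used for that prediction
    predicted_taken: bool  # direction Brady told Ian


def resolve(entry: BSTEntry, actual_taken: bool, bht: list, current_phr: int) -> tuple:
    """Apply the truth for one branch; return (new_phr, mispredicted)."""
    # Use the stored address and PHR to find the arrow that made this prediction,
    # then nudge that 2-bit counter toward the actual outcome.
    i = (entry.fetch_addr ^ entry.phr) % len(bht)
    bht[i] = min(bht[i] + 1, 3) if actual_taken else max(bht[i] - 1, 0)

    if entry.predicted_taken == actual_taken:
        return current_phr, False
    # Wrong guess: throw away the history built on the wrong path. Restore the
    # stored PHR and shift in the real direction, as if we had guessed right.
    new_phr = ((entry.phr << 1) | int(actual_taken)) & PHR_MASK
    return new_phr, True


bht = [1] * 256
entry = BSTEntry(fetch_addr=0x40, phr=0b1011_0010, predicted_taken=False)
# Meanwhile, predictions down the wrong path have changed the live PHR.
phr, wrong = resolve(entry, actual_taken=True, bht=bht, current_phr=0b0101_0101)
print(wrong, bin(phr))  # True 0b1100101
```

The live PHR passed in is simply discarded on a misprediction, because everything shifted into it since this branch was fetched came from the wrong path.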
Two observations:
One, Brady only uses BST to deal with the truth. He has no idea when the truth will arrive. It just shows up out of the blue. When that happens, he has to remind himself what must be repaired. When he corrects himself, though, his repairs are based on which predictor is making the correction. For example, a Gshare prediction always corrects the NAT prediction Brady told Ian the previous cycle26. In these cases, Brady remembers the past by passing along the fetch address and PHR from stage to stage.
A maze of tracks and maws surrounds the PHR. It pushes copies of itself onto the tracks, cycle after cycle. Like misshapen cars on intertwined roller coasters, they rush down the tracks, looping back to the start. At any cycle, Brady can revert to the PHR of two cycles ago or four cycles ago, then go on as though he’d predicted correctly.
This all sounds really simple, except there’s a new prediction every cycle. That prediction may be for a new branch or it may correct an old branch. Meanwhile, predictions may be flushed out for so many possible reasons. Lots of PHRs and fetch addresses are careening through the tracks but not all of them are valid. The logic to throw the correct snake into PHR turned out to be surprisingly hard to get right, especially when the result has to make it into the trap before it shuts or it is never remembered27.
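The tracks that let Brady fall back to an older PHR behave like a short queue of per-cycle snapshots. A minimal sketch, assuming one snapshot per cycle, a maximum age of four cycles, and an 8-bit PHR; PHRTracks, tick, and revert are hypothetical names. It deliberately leaves out the valid-bit bookkeeping for flushed predictions, which, as described above, was the hard part.

```python
from collections import deque

PHR_BITS = 8  # hypothetical PHR width


class PHRTracks:
    """Recent PHR snapshots, one per cycle, so Brady can roll back a few cycles."""

    def __init__(self, max_age: int = 4) -> None:
        # Index 0 holds this cycle's PHR; index max_age holds the oldest kept.
        self.snapshots = deque(maxlen=max_age + 1)

    def tick(self, phr: int) -> None:
        # Each cycle a copy of the PHR rides onto the tracks; the oldest falls off.
        self.snapshots.appendleft(phr)

    def revert(self, age: int, actual_taken: bool) -> int:
        """PHR from `age` cycles ago with the real outcome shifted in."""
        old = self.snapshots[age]
        return ((old << 1) | int(actual_taken)) & ((1 << PHR_BITS) - 1)


tracks = PHRTracks()
for phr in [0b0001, 0b0011, 0b0110, 0b1101, 0b1010]:  # oldest to newest
    tracks.tick(phr)
print(bin(tracks.revert(2, actual_taken=False)))  # 0b1100
print(bin(tracks.revert(4, actual_taken=True)))   # 0b11
```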
Two, repairs are never undone. By the time the truth arrives, Brady may have repaired himself who knows how many times for other branches fetched in the interim. The latest truth may flush out those other branches, but the repairs they caused stay in Brady. The Brady repaired by the truth still remembers walking down those false paths that now never happened.
If Brady is accurate enough, this may not happen often enough to matter. The paths that are false now may be paths Sentry will walk later. Maybe Brady has just trained them ahead of time. These may be good reasons to leave those repairs in. These may just be the rationalizations we tell ourselves because there’s nothing else we can do. We don’t have the room to remember enough so that we can recreate NAT, BHT, and BTB exactly as they were after any given fetch. As a result, we can never forget our mistakes. If we’re lucky, we overwrite them.
The Things We Don’t Talk About
This is the branch predictor that never was. There’s a lot I’ve avoided talking about, entire classes of control-transfer instructions, for example. Brady didn’t just predict branches, but also jumps, calls, and returns. Hell, the architecture Sentry implemented specified one delay slot immediately following each branch28. That delay slot can be in the following line, requiring its own fetch. The instruction in the delay slot may itself be a branch (whose delay slot, as it turns out, is located at the target of the original branch)29.
Truth turned out to be much more ephemeral in real life. Multiple branches were being digested in the worm at any given time. The execution units generally resolved them out of order relative to how they arrived. Sometimes, they resolved the same branch multiple times with a different result each time. Like the IFU, then, the execution units made their own gambles that sometimes did not pay off. That whiplashed the worm, jerking it from one path to another then back again. The truth was not the final word. It was just something that was right for now, but might be wrong later, or not. The result from the final time a branch resolved is always correct, but how do you know, at the time, that it’s the final time? You only realize too late when the branch retires. That is, when the worm has excreted the result.
Sentry, of course, had no real resolution. At least not a satisfying one. We never got to see whether it would have worked for real. Part of me still thinks that the execution units that calculate the world are speculating, that Sentry can still resolve another way. The world will flush out the past decade or so of our lives, start down another path and, this time, we’ll ship the damn thing. Or maybe we’ve already tried that and we’ve whiplashed back. There’s no way to know until we retire30.
Footnotes:
1 - Sentry doesn’t exist. The processor was canceled just as we were ready to build it for real. I’ve gone on to design processors that do exist. I’ve gone on to other canceled projects. I’ve gone on to state-of-the-art work. Sentry was just my first processor, but also the one where I wonder about the other paths we might have taken.
2 - Yes, I’m being pedantic. Architects, like Ajay and me, tell the builders and verifiers fairy tales about how the processor works. Designers, like Marie, take the fairy tales and make them real. They spin nano filaments and hammer them into place. They make sure the nanodots actually make it from one trap to another before the traps shut. Verifiers, like Hongwen, make sure the fairy tales are believable.
3 - I never found a diagram of the entire Sentry pipeline, just portions for a few of its units. The ones drawn by my partner-in-crime, Ajay, were, of course, exquisite.
4 - And the better Sentry performs, the more likely you can keep doing the job you love with the people you love.
5 - Yes, we really called it that. (Say it one letter after the other, like we did. I-F-U.) Gallows humor. Sentry was always in danger of being canceled.
6 - By the time the project was canceled, I’d invested so much of myself, I might as well have been Brady.
7 - In my mind’s eye, I always imagine Ian looks like Ajay. Tall, broad, and, frankly, somewhat daunting. Ian is, by all rights, though, Ajay’s story to tell, not mine.
8 - Or Ajay, for that matter.
9 - Sometimes, knowing what to say to Ajay was as hard as predicting the outcome of branches.
10 - Hmm. It’s going to be hard to talk about Brady without talking about Ian. Figures. It’s not like Ajay had nothing to do with Brady or I had nothing to do with Ian.
11 - Yes, the IFUCR. It’s pronounced exactly the way you think it’s pronounced.
Ajay joined the project about a year in. He had this tendency to loom. I knew I’d like him when, about two weeks after he started, he walked into my cube and told me, deadpan, “I need a bit in the eye fucker.” Then, he smiled.
As it turned out, though, he really did need a bit in the IFUCR.
12 - Ajay was so the right person to design Ian. I’ve never seen anyone so handy with a spreadsheet or so disappointed when things didn’t go exactly according to his Gantt chart.
He’d go lift whenever he got frustrated. Sometimes, I’d run into him in the locker room. He was much more approachable tired.
It probably says something that Sentry changed what flavor of imposing he presented over time. He started off very “Come home with me to my private island in the South Pacific.” After a couple years on the project, he was much more “Fight by my side in our quest for truth and justice.” Given that I was the one born outside the US—on an island in the South Pacific, no less—I found the latter far more appealing.
13 - Did you know that around 2AM, campus security would walk the parking lots and take down license plate numbers? 2AM was also about when Ajay would show up at my cube and gently suggest that it was time to go home.
I’m no longer dedicated enough to pull hours like that. Maybe if I were, there would be another Ajay at my current job for me to lean against as we walk out of the office and into the parking lot.
Trade-offs.
14 - Marie basically parked herself in my cube until I came up with something that fit within one cycle. Hongwen liked NAT because it was simple. We could build it and we could verify it. Now if only we could have also made it predict accurately.
15 - If Ajay was the one who was precise and regular, then I was the one who ran down all the blind alleys then backtracked. Maybe, in a way, that made me the right person to architect Brady.
16 - History has not been kind to the delay slot. I tried to explain to Ajay once what happens if a delay slot itself is a branch. He fell asleep in my lap on my couch. In his defense, we were both hammered.
17 - Ian worked so closely with Brady that Ajay had to understand Brady, too. I’d try to explain historiomancy to Ajay over pizza and all I’d get back from him is this penetrating stare, as though I was daring to lie to the embodiment of all that’s true and just. He simply couldn’t see how a snake and a pile of arrows could guess right so often.
Honestly, I don’t think I could have lied to him even if I wanted to. No matter how skeptical his gaze, his body language was always gentle. His large, thick-fingered hands would rest open on his thighs, not balled into fists across his chest. (My hands and forearms, of course, were scarred. So much for that concert piano career.)
He listened, even when it was obvious that I didn’t understand how a snake and a pile of arrows were right so often either. Just that they were.
18 - I still remember Ajay’s expression when I started to point out the stupid ways Brady can go wrong. Who knew it was possible for eyes to roll that high?
19 - This part of the story doesn’t make sense without Ajay’s. Ian does the calculation, then Brady figures out what to do with it. By this point in the design, Ajay and I were one mind in two bodies.
20 - Ian brooks about as little nonsense as Ajay. Shocking, I know.
21 - This scheme seems obvious in retrospect, but we didn’t come up with it until after five workouts, three dinners, and Mission Impossible (the movie). That said, I don’t know that we were trying to minimize the amount of time we spent together outside the office.
22 - Ajay and I tried, but one did not mess with Marie. Her first processor was in the ball bearing days. If she said there wasn’t room, there wasn’t.
23 - When I realized this, I told Ajay that this was my baptism into the compromises of real-world computer architecture. He just smiled and handed me my cell phone. I’d left it at his apartment the previous night.
24 - Yes, a table of BS. Keeping morale in Sentry’s final days wasn’t easy.
25 - You’d be surprised who didn’t think F0 was a thing. For an embarrassingly long time, Ajay and I were a cycle off until we realized what was happening. It turns out when we both think the other one is wrong, anything could set one or the other of us off. (No, of course it wasn’t really about whether zero was a number. That was just the handy linchpin.)