On Feb 8, 2022, at 3:48 AM, Johnny Billquist
<bqt(a)softjar.se> wrote:
On 2022-02-08 09:34, dave.g4ugm(a)gmail.com wrote:
...
I am told by a guy who used to lecture at Manchester University that when they were
developing the MU5 experimental computer, they found that in general about one in six
instructions was a jump/branch.
This meant that an instruction pipeline longer than 6 provided little performance
improvements.
... so they developed a "heuristic pipeline" with content addressable memory.
Whenever a jump/branch instruction is executed, an entry is made in content addressable
memory showing the location of the jump and the location of the next instruction executed.
(this may of course simply be the next instruction). When the pipe-line building logic
fetches a "jump" instruction it checks the content addressable memory to see if
we have executed this jump recently.
If so we then fill the pipeline from where the code branched to last time. We may not be
right, in which case we are no worse off, but if we do get it right, and as a lot of code
consists of loops, then we have the code we need in the pipeline and performance is
improved.
Sadly all the papers I can find on this are behind expensive paywalls. E.g.
https://dl.acm.org/doi/10.1145/359327.359333
The technology is fairly common today. Speculative execution after branches, branch
predictions based on history or statistics. All rather interesting stuff.
I seem to remember even seeing machines that speculatively executed down both paths, and
just discarded the wrong one once it knew.
Yes, that goes back decades. A very simple approach is to "predict" based on
static considerations. For example, some computers (VAX? Alpha?) predict backward
branches to be taken, and forward ones to be not taken. A very strange case I remember
well is the Motorola 68040, which predicts all branches to be taken. When doing assembly
language programming this makes you write strange looking code, because the intuitive
thing to do is "branch on unlikely condition around the next bit of code" then
"the code for the likely case". But on the 68040 you have to put the unlikely
case after the branch, and the likely case is the target of the branch. This causes the
spaghetti to get worse than usual.
Not quite a CAM type branch predictor, but the CDC 6600 (1964) had a small buffer
("instruction stack") which would hold 6-7 words of preceding code so small
loops could run without memory references. An even earlier scheme vaguely like that is
the Dutch ARMAC computer (1956?) which has drum main memory but a single track core memory
buffer, so there was a lot of effort to group loop code and/or data together on a single
track to avoid the rotational latency.
This is also a reason for the kind of conditional
execution of instructions Paul was mentioning (with regards to ARM and others). Having
that kind of capability in the CPU reduces the amount and need of branches. (Don't
help with loops though, those still need to use branches.)
Right. It may also help to have loops that use explicit loop instructions, which are
known to be "branch likely". Electrologica did that; so does the PDP-11 and VAX
(SOB instruction).
paul