Branchless Programming Guide

5

bl4ckthund3r43 2020-09-15 03:31 (Edited)

A quick example of branchless programming. Useful for getting rid of if-else monsters, and speeding up the program.

==why?==

A CPU will load instructions BEFORE using them to be faster. This works until it hits a ‘branch’, like an if statement. In this case, it loads what *might* happen (by guessing), and if it’s wrong, will unload those instructions to load the correct ones.
This, unsurprisingly, takes time. So to combat this, use as few branches as possible.

==how?==

The way an if statement works is by evaluating a conditional statement, and running if it’s true or 1. We can use this to our advantage to remove the if part. This works really well with math. Say you want the smaller of 2 numbers, A and B. You could write something like:

If A<B print A else if B<=A print B

And that would be fine, but we can do better. Using the branchless method, just multiply each by the conditionals and add them together, like so:

Print -(((A<B)*A)+((B<=A)*B))

(Of course, the way this language works, true is -1, and as such the whole expression must be negated.)

This is a bit faster (because it’s pure math), and has no branches. This works with a lot of other things, too. Try it, it can only help.

==edit==
In response to was8bit’s comment asking how to implement more commands from 1 comparison, there are 2 ways to do it.

1) boil down all commands to one mathematical statement.
This is a cleaner, though harder, way to do it. Because computers ONLY manipulate numbers at the end of the day, this is almost always possible.

2) use ‘jump (statement)’ as substitution for sub programs.
This is almost always messier, but will be far easier to implement. It will take longer than the purely math driven approach, but won’t be as slow as if-else if-else blocks. PLEASE NOTE that you may need to do bit manipulations for the jump, as you can’t jump straight to a line number. (While generally being cleaner, it messes with the branchless method).

I hope either solution was beneficial in your programming endeavors.


was8bit 2020-09-15 04:30

I did a comparison speed test, and there is a signicant 1:100 faster...

BUT how to apply that to branching to different sets of commands depending on a value, such as C=1 THEN (commands) ELSE IF C=2 THEN (commands) .. etc...


Timo 2020-09-15 06:05

It’s true that real CPUs do this, but I think it doesn’t affect interpreted languages very much. (LowRes NX doesn’t have a JIT)

Here it’s more important to use less CPU cycles (see manual in advanced topics). But maybe the math tricks can help with cycles, too.


rilden 2020-09-15 17:47 (Edited)

Just tested this program with my cycle counter program (https://lowresnx.inutilis.com/topic.php?id=1374). In this case using an if else statement is actually faster.

result=-((F=1)*(N*N))+((F=2)*((N*N)/2))+((F=3)*(PI*N*N)) 

consumes 30 cycles.

if f=1 then
  result=n*n
else if f=2 then
  result=n*n/2
else if f=3 then
  result=pi*n*n
else
  result=0
end if

consumes between 12 and 20 cycles depending on f

In your formula you can bring n*n outside the parenthesis:

result=-N*N*((F=1)+(F=2)/2+(F=3)*PI)

This consumes 22 cycles.

In some cases you can speed up calculations by using math. In my starfield example i replaced x<0 or x>=160 by int(x/160) to improve the speed of the program.


bl4ckthund3r43 2020-09-16 02:07 (Edited)

Interesting find. Although I do admit my program wasn’t optimized as best it could (to keep it relatively readable), I wasn’t expecting it to take longer by *that* much. I use assembly regularly, so interpreted languages are a different beast to me (the only others I’ve used were commodore BASIC and python 3). It might not be a technique worth implementing (in some cases), but it definitely DOES speed up compiled languages (as long as you don’t outsmart the compiler ;) ).

*side note, I tried to use ? as a print statement (typed it in w/o thinking) unsurprisingly, it didn’t work.


Log in to reply.