Branchless code

SP4CEBAR 2022-06-21 19:02 (Edited)

Since I watched this video years ago, I've been writing branchless code when possible.

Branchless code means writing

X=A*-(X<0)-B*(X>=0)
(TRUE has the value -1, that's why everything is negative)

Instead of

IF X<0 THEN X=A ELSE X=B END IF
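A minimal sketch for checking that the two forms agree, assuming (as noted above) that a true comparison evaluates to -1 and a false one to 0. The values of A and B and the result variables P and Q are made up for illustration; the results are stored in P and Q rather than X so the loop variable is not overwritten.

```
REM ARBITRARY EXAMPLE VALUES
A=10
B=20
FOR X=-2 TO 2
  REM BRANCHLESS FORM: (X<0) IS -1 OR 0
  P=A*-(X<0)-B*(X>=0)
  REM BRANCHING FORM
  IF X<0 THEN Q=A ELSE Q=B
  REM P AND Q SHOULD MATCH ON EVERY LINE
  PRINT X,P,Q
NEXT X
```

For negative X the first term contributes A and the second contributes 0; otherwise the first term is 0 and the second is -B*-1, which is B.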

However, branchless code isn't always faster. It all depends on what the compiler does with your code, so the only ways to find out whether it is faster are to look at the generated assembly or to measure the CPU cycles.

Which compiler(s) does LowRes NX use? Also, are there other people here who write branchless code?


McPepic 2022-06-21 19:29

I switch back and forth. I think it runs faster if you write out the condition. It’s also easier to read. Sometimes I want to write it in as few lines as possible, in which case I use branchless. I think it’s up to preference, though. If you really want to check, there is a program on LowRes that counts clock cycles, so you could use that. By the way, if your condition runs all on one line, you don’t need to write “end if”.


SP4CEBAR 2022-06-21 20:56 (Edited)

I didn't realize it was one line. It should have been five lines, but I forgot to add a double space after each line, so there were no markdown line breaks and it appears as one line.


nathanielbabiak 2022-06-22 00:42 (Edited)

The user rilden has uploaded Cycle counter. Also check out Timo's comments here and here.

From the manual...

LowRes NX has a simplified simulation of CPU cycles. There is a fixed limit of cycles per frame. This assures the same program execution speed on all devices, so if you optimize your program on your device to run smoothly, it will run the same on all other devices.

Each execution of a command, function or operator, as well as access to a variable or a constant count 1 cycle. Some operations have additional costs:

Total cycles per frame: 17556

Cycles per VBL interrupt: 1140

Cycles per raster interrupt: 51

The main program may spend any number of cycles, but when the limit is reached before a WAIT VBL or WAIT command, the execution continues in the next frame. If interrupts exceed their limit, you will see black scanlines on the screen.
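Because every frame has the same cycle budget, one rough way to compare the two forms from the first post is to run each many times and count how many frames pass. This is only a sketch: it assumes TIMER returns the number of frames shown since start, and the iteration count N and the variables S, M, E, R are made up for illustration. The resolution is a whole frame, so small differences only show up with a large N.

```
REM HYPOTHETICAL ITERATION COUNT AND TEST VALUES
N=20000
A=10
B=20
REM S = FRAME COUNT BEFORE THE BRANCHLESS LOOP
S=TIMER
FOR I=1 TO N
  X=I-N/2
  R=A*-(X<0)-B*(X>=0)
NEXT I
REM M = FRAME COUNT BETWEEN THE TWO LOOPS
M=TIMER
FOR I=1 TO N
  X=I-N/2
  IF X<0 THEN R=A ELSE R=B
NEXT I
REM E = FRAME COUNT AFTER THE IF/ELSE LOOP
E=TIMER
PRINT "BRANCHLESS FRAMES:",M-S
PRINT "IF/ELSE FRAMES:",E-M
```

Both loops share the same overhead (the FOR loop and the X assignment), so any difference in frame count comes from the one line that differs. A dedicated cycle counter gives finer results than this frame-level estimate.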


nathanielbabiak 2022-06-22 00:53 (Edited)

Branchless code tends to be really hard to read, debug, modify, etc.

There aren't too many uploads on this site currently that require speed. Regardless, the best way to get faster code is to change your algorithm "big picture" rather than syntactically (saving only a few clock cycles).

A good exercise when looking at an algorithm "big picture" is:

You'll likely find you can trade execution time for memory usage. Here's some cool examples:

That said, I've explored the console's clock cycles for syntactic gains. My results are published here, with recommendations published here.


SP4CEBAR 2022-06-22 15:20 (Edited)

At some point I'll need to optimize my game engine, which always has an NX CPU usage of 100% (it also contains a lot of branchless code).

Oh wait, you won't receive a notification for this, so you probably won't see it.

