Fast Line drawing algorithm (LowRes NX)

DDA.nx | Open in app

2023-03-03 17:24

DDA.nx | Open in app

2023-03-03 17:04

DDA.nx | Open in app

2023-03-03 05:36

DDA.nx | Open in app

2023-03-03 05:35

DDA.nx | Open in app

2023-03-03 05:34

Pixel.nx | Open in app

2023-03-03 01:50

GAMELEGEND 2023-03-03 05:35

woops wrong screenshot XD

McPepic 2023-03-03 16:31

Sped up from ~570 to ~730 lines per second! (Check debug) :)

DDA.nx | Open in app

2023-03-03 16:31

GAMELEGEND 2023-03-03 16:39

I see that you changed FOR I = 0 TO S-1 to FOR I = 0 to S

I was going to do that to make it faster but I forgot
because all that will do is just draw one extra pixel sometimes.

McPepic 2023-03-03 16:43

@GAMELEGEND
It’s actually looping from 1 to S. The actual value of I doesn’t matter, just the number of iterations. This just makes it so that it doesn’t have to recalculate S-1 every time the program wants to find out if the loop is completed.

GAMELEGEND 2023-03-03 17:05

I wonder if it can go even faster!

McPepic 2023-03-03 17:07

I just realized you can do:
S = MAX(ABS(DX),ABS(DY))
IF S = 0 THEN EXIT SUB

GAMELEGEND 2023-03-03 17:09

Hey I was just going to go try that
great minds think alike.

GAMELEGEND 2023-03-03 17:10

At the very beginning the latest version reached 1080 lines per second.

McPepic 2023-03-03 17:16

I just wrote my own that uses the 2 bit color scheme in character memory. This relies on pre calculated bit masks and runs at ~400 lines/second. I figure I can pre calculate more values to make it even faster!

GAMELEGEND 2023-03-03 17:18 (Edited)

I tested it at found out that

S = MAX(ABS(DX),ABS(DY))
IF S = 0 THEN EXIT SUB

saves 7 cycles

GAMELEGEND 2023-03-03 17:22

Yeah now it draws lines around 800 times a second for a tiny bit longer.

G-9 2023-03-03 20:59

gotta go fast!

SP4CEBAR 2023-03-03 23:17 (Edited)

I spent way too much time on this, but I managed to make it even faster
From 730 to 750, but it did cost some integrity: float imperfections cause the bit mask x value to get more and more out of sync while long (and quite vertical) lines are drawn
It's made up of a ton of preparation variables and then a nested FOR loop, with a few ADD commands that replaced the address array

The active subprogram is called "DDA3"

Also, your background registers' data is very efficient, I didn't know a continuous y coordinate was possible (that doesn't have to be broken up into y mod 8 and y\8)

DDA Even Faster.nx | Open in app

2023-03-03 23:17

GAMELEGEND 2023-03-04 02:37

I wonder if DDA is the algorithm that we should keep using to make it as fast as we can. I'm guessing it is since it's so simple.

GAMELEGEND 2023-03-04 02:58 (Edited)

The gears in my head just started turning and I started to wonder why we where creating the variables X0, Y0, X1, and Y1 in the do loop and making them equal a random number when we could just do
CALL DDA3(RND(159), RND(127), RND(159), RND(127))

@SP4CEBAR your DDA3 sub program dropped down into the 750s but with the new line of code I was getting at least 762 lines per second.

DDA Even Faster.nx | Open in app

2023-03-04 03:03

SP4CEBAR 2023-03-04 08:02 (Edited)

@gamelegend, great, it can be optimized further, since it doesn't have the aforementioned optimization yet, where:

IF ABS(DX) > ABS(DY) THEN S = ABS(DX) ELSE S = ABS(DY)
IF S THEN
...
END IF

Can be replaced with:

S = MAX(ABS(DX),ABS(DY))  
IF S = 0 THEN EXIT SUB

Also it loses a lot of speed from the mandatory restrictions: X0=MIN(150,MAX(10,X0)) X1=MIN(150,MAX(10,X1)) This gives it a buffer strip, which is needed because it overshoots vertically a lot due to the inner for loop not checking if the outer for loop's condition is met, I can make a buffer strip by sizing up the bit mask array

SP4CEBAR 2023-03-04 09:05 (Edited)

These changes I mentioned in the previous post made it slightly faster

DDA Even Faster.nx | Open in app

2023-03-04 09:05

SP4CEBAR 2023-03-04 09:21 (Edited)

the bit mask array can be replaced with ADD X0,X0
And by changing the starting value of X0, the course of the whole exponential function can be changed (like Y=A^X where A behaves the same as the starting value of X0)

SP4CEBAR 2023-03-04 09:32

This buggy version draws something really cool looking, it looks a bit like the imperfections on old films, and it looks 3D-ish

DDA Even Faster.nx | Open in app

2023-03-04 09:32

McPepic 2023-03-04 16:06

So, I was messing around with my line drawing program and I got this.
It gives up a little accuracy for the sake of speed, but it’s really fast!

Line Test.nx | Open in app

2023-03-04 16:06

GAMELEGEND 2023-03-04 16:22

So do you think that we can make it more accurate without losing a lot of speed?

SP4CEBAR 2023-03-04 19:03 (Edited)

But how is
POKE POS2ADRL(X1,Y1), GETVALL(PEEK(POS2ADRL(X1,Y1)), X1 MOD 8, C)
Faster than
POKE ADDR(X0,Y0),PEEK(ADDR(X0,Y0)) OR BITMASK(X0)

The first snippet has more array dimensions and a MOD operation, are those faster than the single AND operation of the last code snippet?

McPepic 2023-03-04 20:00

@SP4CEBAR
It's a lookup table. I just calculated every possible combination of byte given the original value, the index of the bit to draw, and the color that's being used.

Everything is precalculated using: (a and not m) or (b and m)
where a is the original value of the byte that we're modifying, b is a new value where every bit is equal to the bit (low or high) that we're setting the current pixel to, and m is the mask that switches between the original value of the bit (a) or the value from the new byte based on the color (b).

SP4CEBAR 2023-03-05 13:25 (Edited)

It looks like
GETVALL( ... , X1 MOD 8, C)
Has more operations than
... OR BITMASK(X0)

nathanielbabiak 2023-03-06 22:28 (Edited)

Yeah that pre-computed bitmask lookup won't save anything, it's the same number of cc or one slower, depending on if you use it as-is or optimize it.

If you take what I had claimed is the fastest published code from the other thread, and give up a little accuracy for the sake of speed in the exact same way as this thread is doing (and benchmark lines/sec the same way too), it's at least 100 lines per second faster. It's still the fastest! :-P

nathanielbabiak 2023-03-06 22:51

I hate not having internet at home right now, I'll upload an example benchmarking all of these later tonight

SP4CEBAR 2023-03-07 08:57

@nathanielbabiak If your outage is going to last a really long time then you could buy a starlink dish

nathanielbabiak 2023-03-08 17:28 (Edited)

My internet is back!

A few of the things I had mentioned in this thread were overlooked:

the efficient character layout (what SP4CEBAR called "background register data") is by Rilden (not McPepic, that was just a copy-paste of Rilden's prior work)
the S=0 vs. S=1 issue is a consequence of ignoring that the center of the pixel is offset (0.5)
DDA doesn't need to be 2-dimensional

(When I say 'dimensional' I mean the number of dimensions that are DDA-updated within the inner loop, a FOR integer increment isn't DDA.)

Anyway, I attached the benchmarks so everything's comparable, and made some minor tweaks to the 2D version:

Arguments to these subprograms can now be passed by reference without the subprograms modifying them (as expected), slower by 4 cc/line
The S=0 vs. S=1 issue is replaced by the 0.5 offset (now, lines will start/end on the coordinates, as expected), slower by 6 cc/line
The zero-length line condition now plots a point (as expected), faster by 2 cc/line
The benchmark loop, itself, is more precise, faster by 1 cc/line (for both benchmarks by McPepic and SP4CEBAR)
The benchmark updates every second (not every line), faster by 18+ cc/line

Note that the 1D version is faster by 3 cc/pixel (note the units are cc/pixel not cc/line). This is the cause of the speed increase.

Also, I really like the DDA3 idea where SP4CEBAR's working with addresses not (x,y) lookups. If that algorithm can get the point where it plots within 1 pixel of the Bresenham algorithm, I'll take a look to see if I can find any improvements to make it faster or more accurate.

lines.nx | Open in app

2023-03-08 17:29

SP4CEBAR 2023-03-08 19:06

Yay,
Also, the benchmark is McPepic's, my version branched from his version

McPepic 2023-03-09 00:18

When are the bounds of a for loop calculated? Do they get recalculated every loop or does the program calculate them before the loop is started?

nathanielbabiak 2023-03-09 23:47 (Edited)

The initializing bound of a for loop is calculated once, at entry. The check bound is calculated every iteration, and the step is also calculated at every iteration. It's covered here.

McPepic 2023-03-10 01:06

@nathanielbabiak
Alright, that's what I thought. That makes sense.