GAMELEGEND 2023-03-03 01:50 (Edited)
Time to make it even faster hopefully.
Thanks to McPepic it draws lines even faster!
Most recent version: The lines of code to show the number of lines per second are still there but have been turned in to comments to use less clock cycles.
GAMELEGEND 2023-03-03 05:35
woops wrong screenshot XD
McPepic 2023-03-03 16:31
Sped up from ~570 to ~730 lines per second! (Check debug) :)
GAMELEGEND 2023-03-03 16:39
I see that you changed FOR I = 0 TO S-1 to FOR I = 0 to S
I was going to do that to make it faster but I forgot
because all that will do is just draw one extra pixel sometimes.
McPepic 2023-03-03 16:43
@GAMELEGEND
It’s actually looping from 1 to S. The actual value of I doesn’t matter, just the number of iterations. This just makes it so that it doesn’t have to recalculate S-1 every time the program wants to find out if the loop is completed.
GAMELEGEND 2023-03-03 17:05
I wonder if it can go even faster!
McPepic 2023-03-03 17:07
I just realized you can do:
S = MAX(ABS(DX),ABS(DY))
IF S = 0 THEN EXIT SUB
GAMELEGEND 2023-03-03 17:09
Hey I was just going to go try that
great minds think alike.
GAMELEGEND 2023-03-03 17:10
At the very beginning the latest version reached 1080 lines per second.
McPepic 2023-03-03 17:16
I just wrote my own that uses the 2 bit color scheme in character memory. This relies on pre calculated bit masks and runs at ~400 lines/second. I figure I can pre calculate more values to make it even faster!
GAMELEGEND 2023-03-03 17:18 (Edited)
I tested it at found out that
S = MAX(ABS(DX),ABS(DY))
IF S = 0 THEN EXIT SUB
saves 7 cycles
GAMELEGEND 2023-03-03 17:22
Yeah now it draws lines around 800 times a second for a tiny bit longer.
G-9 2023-03-03 20:59
gotta go fast!
SP4CEBAR 2023-03-03 23:17 (Edited)
I spent way too much time on this, but I managed to make it even faster
From 730 to 750, but it did cost some integrity: float imperfections cause the bit mask x value to get more and more out of sync while long (and quite vertical) lines are drawn
It's made up of a ton of preparation variables and then a nested FOR loop, with a few ADD commands that replaced the address array
The active subprogram is called "DDA3"
Also, your background registers' data is very efficient, I didn't know a continuous y coordinate was possible (that doesn't have to be broken up into y mod 8 and y\8)
GAMELEGEND 2023-03-04 02:37
I wonder if DDA is the algorithm that we should keep using to make it as fast as we can. I'm guessing it is since it's so simple.
GAMELEGEND 2023-03-04 02:58 (Edited)
The gears in my head just started turning and I started to wonder why we where creating the variables X0, Y0, X1, and Y1 in the do loop and making them equal a random number when we could just do
CALL DDA3(RND(159), RND(127), RND(159), RND(127))
@SP4CEBAR your DDA3 sub program dropped down into the 750s but with the new line of code I was getting at least 762 lines per second.
SP4CEBAR 2023-03-04 08:02 (Edited)
@gamelegend, great, it can be optimized further, since it doesn't have the aforementioned optimization yet, where:
IF ABS(DX) > ABS(DY) THEN S = ABS(DX) ELSE S = ABS(DY)
IF S THEN
...
END IF
Can be replaced with:
S = MAX(ABS(DX),ABS(DY))
IF S = 0 THEN EXIT SUB
Also it loses a lot of speed from the mandatory restrictions: X0=MIN(150,MAX(10,X0)) X1=MIN(150,MAX(10,X1)) This gives it a buffer strip, which is needed because it overshoots vertically a lot due to the inner for loop not checking if the outer for loop's condition is met, I can make a buffer strip by sizing up the bit mask array
SP4CEBAR 2023-03-04 09:05 (Edited)
These changes I mentioned in the previous post made it slightly faster
SP4CEBAR 2023-03-04 09:21 (Edited)
the bit mask array can be replaced with ADD X0,X0
And by changing the starting value of X0, the course of the whole exponential function can be changed (like Y=A^X where A behaves the same as the starting value of X0)
SP4CEBAR 2023-03-04 09:32
This buggy version draws something really cool looking, it looks a bit like the imperfections on old films, and it looks 3D-ish
McPepic 2023-03-04 16:06
So, I was messing around with my line drawing program and I got this.
It gives up a little accuracy for the sake of speed, but it’s really fast!
GAMELEGEND 2023-03-04 16:22
So do you think that we can make it more accurate without losing a lot of speed?
SP4CEBAR 2023-03-04 19:03 (Edited)
But how is
POKE POS2ADRL(X1,Y1), GETVALL(PEEK(POS2ADRL(X1,Y1)), X1 MOD 8, C)
Faster than
POKE ADDR(X0,Y0),PEEK(ADDR(X0,Y0)) OR BITMASK(X0)
The first snippet has more array dimensions and a MOD operation, are those faster than the single AND operation of the last code snippet?
McPepic 2023-03-04 20:00
@SP4CEBAR
It's a lookup table. I just calculated every possible combination of byte given the original value, the index of the bit to draw, and the color that's being used.
Everything is precalculated using: (a and not m) or (b and m)
where a is the original value of the byte that we're modifying, b is a new value where every bit is equal to the bit (low or high) that we're setting the current pixel to, and m is the mask that switches between the original value of the bit (a) or the value from the new byte based on the color (b).
SP4CEBAR 2023-03-05 13:25 (Edited)
It looks like
GETVALL( ... , X1 MOD 8, C)
Has more operations than
... OR BITMASK(X0)
nathanielbabiak 2023-03-06 22:28 (Edited)
Yeah that pre-computed bitmask lookup won't save anything, it's the same number of cc or one slower, depending on if you use it as-is or optimize it.
If you take what I had claimed is the fastest published code from the other thread, and give up a little accuracy for the sake of speed in the exact same way as this thread is doing (and benchmark lines/sec the same way too), it's at least 100 lines per second faster. It's still the fastest! :-P
nathanielbabiak 2023-03-06 22:51
I hate not having internet at home right now, I'll upload an example benchmarking all of these later tonight
SP4CEBAR 2023-03-07 08:57
@nathanielbabiak If your outage is going to last a really long time then you could buy a starlink dish
nathanielbabiak 2023-03-08 17:28 (Edited)
My internet is back!
A few of the things I had mentioned in this thread were overlooked:
(When I say 'dimensional' I mean the number of dimensions that are DDA-updated within the inner loop, a FOR
integer increment isn't DDA.)
Anyway, I attached the benchmarks so everything's comparable, and made some minor tweaks to the 2D version:
Note that the 1D version is faster by 3 cc/pixel (note the units are cc/pixel not cc/line). This is the cause of the speed increase.
Also, I really like the DDA3 idea where SP4CEBAR's working with addresses not (x,y) lookups. If that algorithm can get the point where it plots within 1 pixel of the Bresenham algorithm, I'll take a look to see if I can find any improvements to make it faster or more accurate.
SP4CEBAR 2023-03-08 19:06
Yay,
Also, the benchmark is McPepic's, my version branched from his version
McPepic 2023-03-09 00:18
When are the bounds of a for loop calculated? Do they get recalculated every loop or does the program calculate them before the loop is started?
nathanielbabiak 2023-03-09 23:47 (Edited)
The initializing bound of a for loop is calculated once, at entry. The check bound is calculated every iteration, and the step is also calculated at every iteration. It's covered here.
McPepic 2023-03-10 01:06
@nathanielbabiak
Alright, that's what I thought. That makes sense.