Attempts to use raster fx for pixel-based graphics in NX (LowRes NX)

SP4CEBAR 2021-09-28 15:17

Second attempt

2021-09-28 15:17

SP4CEBAR 2021-09-28 15:23

This second attempt uses all the characters as a library for binary lines
An array with pixel data is then encoded to a bg file containing all the strips of characters (this still takes longer than a frame)
It then writes the correct line characters from memory to each raster
And it's almost possible: without attributes I only need to copy 20 bytes, but because there are attribute bytes in between the character bytes in the bg data it has to transfer 40 bytes, which is too much

SP4CEBAR 2021-09-28 15:34

Both attempts failed to fill the whole screen
This is pretty hard to debug

nathanielbabiak 2021-09-28 22:16

Coincidentally, I took the same approach a few weeks ago. What you need to ask yourself is, what use is the raster BG update? (Regardless of whether or not it's fast enough to be possible, all you're saving is a little memory usage.)

nathanielbabiak 2021-09-29 05:17 (Edited)

It seems you're avoiding bitmask/modifications to character RAM at address $8000. That's the only way to get pixel-based graphics in LowRes NX.

It seems you're attempting to create every reasonable combination of characters in ROM #2. 2x2-pixel "block"-based graphics are achievable, but the combinations you've got in ROM #2 and ROM #4 won't work.

Try this...

Edit ROM #2 characters to use only 4x4 pixels (use only the upper left corner). In characters 0 to 3, the upper-left block will be colors 0 to 3 (respectively). In character 4 to 7, the upper-left block will again be colors 0 to 3 (repeat the pattern), but the upper-right block will be color 1. Then for characters 8 to 11 the upper-right block will be color 2, and for characters 12 to 15 the upper-right block will be color 3. So it looks like this:

__0_ __1_ __2_ __3_ __4_ __5_ __6_ __7_ __8_ __9_ _10_ _11_ _12_ _13_ _14_ _15_

00-- 10-- 20-- 30-- 01-- 11-- 21-- 31-- 02-- 12-- 22-- 32-- 03-- 13-- 23-- 33--

00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00-- 00--

---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----

Where underscores indicate the character number, a number represents a 2x2-pixel "block" of solid-color, a dash (-) represents a 2x2 pixel block of solid-color 0/transparency/background (just for readability). To complete the pattern, you'll just update the "00". Repeat the 16-character pattern for a total of 64 characters to add one more 2x2-pixel block (64=16x4), and repeat the 64-character pattern for a total of 256 characters to add the final 2x2-pixel block (256=64x4).
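A minimal sketch of the formula this table implies, written in Python rather than NX BASIC so it can be checked off-device. The function names are hypothetical, and the order in which the two lower blocks are added (lower-left as the 16s digit, lower-right as the 64s digit) is an assumption; the description above only fixes the upper-left and upper-right blocks.

```python
# Hypothetical helpers: each 2x2-pixel block holds a color 0..3, so four
# blocks behave like four base-4 digits of the character number.
def char_index(ul, ur, ll=0, lr=0):
    """Character number for the four blocks (lower-block order is assumed)."""
    for color in (ul, ur, ll, lr):
        if not 0 <= color <= 3:
            raise ValueError("block colors must be 0..3")
    return ul + 4 * ur + 16 * ll + 64 * lr

def block_colors(char):
    """Inverse of char_index: recover (ul, ur, ll, lr) from a character number."""
    return (char % 4, (char // 4) % 4, (char // 16) % 4, (char // 64) % 4)

# The first 16 characters: upper-left cycles fastest, upper-right steps every 4.
first_sixteen = [char_index(ul, ur) for ur in range(4) for ul in range(4)]
```

With this layout, character 5 has upper-left color 1 and upper-right color 1, and character 255 has all four blocks set to color 3.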

So these 4x4-pixel blocks won't cover the whole screen, right? But they will if you use BG 0 (to cover the left-side of each 8x8-pixel area) and SCROLL BG 1 (to cover the right-side of each 8x8-pixel area). So now you can cover a whole row, 160 pixels wide by 4 pixels tall...

And this is where the raster comes in. SCROLL both BG 0 and BG 1 in the raster to eliminate the 4-pixel high gap that would otherwise occur every 160x8 pixels.
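As a rough sketch of the vertical part of that idea (Python, not NX BASIC): if only the top 4 pixel rows of each 8-pixel cell carry data, each scanline has to be pointed at a BG line further down than itself. The names below are hypothetical, and the sign convention of the offset and the 32-cell-tall BG are assumptions to check against the manual.

```python
SCREEN_HEIGHT = 128  # screen height in pixels
CELL_HEIGHT = 8      # a BG cell is 8 pixels tall
USED_ROWS = 4        # only the top 4 pixel rows of each cell hold blocks

def bg_line_for_scanline(y):
    """BG pixel row that should appear on screen row y (no gaps shown)."""
    cell_row = y // USED_ROWS
    return cell_row * CELL_HEIGHT + y % USED_ROWS

def scroll_y_for_scanline(y):
    """Vertical offset to apply in the raster for screen row y (assumed sign)."""
    return bg_line_for_scanline(y) - y

# One offset per scanline; it only changes every 4 lines.
offsets = [scroll_y_for_scanline(y) for y in range(SCREEN_HEIGHT)]
```

The offset grows by 4 every 4 scanlines, and the last scanline reads BG line 251, which still fits in a BG that is 256 pixels tall.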

nathanielbabiak 2021-09-29 05:26 (Edited)

FYI Timo kinda hinted at your approach on this upload, although "change the cells in each line" has the problem you've encountered - the raster interrupt just isn't fast enough!

I'm using an approach pretty close to what I described above to re-work my 3D raycasting demo - it won't use the pixel library and it'll be at least twice as fast, enough that wall textures and objects are possible, possibly floor textures too.

SP4CEBAR 2021-09-29 14:33 (Edited)

That's a pretty clever solution!

Unfortunately it's only a quarter of the max resolution

SP4CEBAR 2021-09-29 14:35

How much time does it take to encode something like an array of pixels to get the right characters and scroll offsets on each part of the BG?

SP4CEBAR 2021-09-29 14:41 (Edited)

Is it possible to make a system that can control each NX pixel individually real-time?

nathanielbabiak 2021-09-29 14:48 (Edited)

It is possible to make a system that can control each NX pixel individually real-time. As I wrote above "It seems you're avoiding bitmask/modifications to character RAM at address $8000. That's the only way to get pixel-based graphics in LowRes NX." (It's also the subject of the pxl library I wrote a while back.)

If you expand upon your idea of "an array of pixels to get the right characters and scroll offsets on each part of the BG" I can help further. (I'm not sure what you meant because "pixels" in that quote is the goal, not a component of the system's display that you directly control. Did you mean an array of BG cell values, with the 2-byte cells changing frequently? Or did you mean an array of character data, with the 16-byte values changing frequently? There's 2 bytes in a BG cell, 16 in a character... which is it? Sorry to be verbose, just trying to be clear.)

SP4CEBAR 2021-09-29 16:51 (Edited)

When I wrote "pixels" I didn't quite yet understand what you were explaining, I meant it for 4x4 cells in the system you explained

SP4CEBAR 2021-09-29 16:56 (Edited)

Also, [Quote:] "bitmask/modifications to character RAM at address $8000" [End Quote] what are bitmasks and what kind of modifications are you referring to?
Is it like the paint-like program I made a while ago, which uses all characters but one to cover a 16x16 cell area of the screen?

nathanielbabiak 2021-09-30 01:34 (Edited)

Thanks for clarifying the "pixels" quote thing... not sure why I couldn't figure that out myself! It'll be clearer in a few days - I'll upload something that uses a similar approach to 2x2-pixel blocks... bloxels. Regarding how long it takes: the idea I described took maybe an hour to develop (Aug 31), an hour to code (Sept 2), and execution costs around 11% CPU to hold the display static. The trick is that the tabular pattern of ROM #2 data is so well-organized that a formula (not a look-up or even IF-THEN statements) can be used to access each individual block, even though there's four blocks in a single cell. And to increase speed during the main portion of a game loop, it uses array lookups (rather than evaluating the formula for a particular BX,BY coordinate).

As far as the alternative subject of pixel-based graphics... Your paint-like upload is exactly what I mean when I say bitmasks (and modifications to character data at the RAM address $8000). As you've no-doubt noticed, 256 characters isn't enough to cover the whole screen. The next paragraph explains how to use the raster interrupt to do that. It starts with a LowRes NX system configuration that "just works" and then will examine "why" it works. (Starting from "how" the LowRes NX system works, and developing a specific display configuration of "what" based on that is just too much to type!) I won't cite example uploads though, since most pixel-based graphic uploads on this site use this exact method... if they're full-screen at least.

Suppose BG 0 contains characters 1 through 20 on the top row (at CY=0 from CX=0 to CX=19), and that the second row contains characters 21 through 40 (at CY=1 from CX=0 to CX=19). Suppose these same characters (1 through 40) are repeated in a pattern below (starting at CY=2 and CY=3), and continuing all the way to the bottom of the screen (at CY=14 and CY=15).

Surely in this configuration you could modify the 16 bytes of data beginning at memory address $8010 and ending at $801F associated with character=1, right? And the same for $8020 through $802F, $8030, $8040, etc? (Side note: the characters are 16 bytes each, and, in hexadecimal, shifting the digit one position to the left is the same as multiplying by 16, so the "1" in $8010 always represents the first character!) So you could modify the 40 characters of data from $8010 through $828F.
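The address arithmetic in that paragraph, as a small Python sketch for checking the numbers (the helper name is hypothetical):

```python
CHAR_RAM = 0x8000  # start of character RAM
CHAR_SIZE = 16     # bytes per character

def char_range(char):
    """First and last byte address of one character's data."""
    start = CHAR_RAM + CHAR_SIZE * char
    return start, start + CHAR_SIZE - 1

first = char_range(1)   # character 1
last = char_range(40)   # character 40
total_bytes = last[1] - first[0] + 1
```

Characters 1 through 40 span $8010 to $828F, which is 640 ($280) bytes.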

And in doing so, the whole screen would update in a repeating pattern, where CY=0 and CY=1 would show the pixel-based graphics, but CY=2 and CY=3 (and higher values of CY also) would repeat them. To prevent the pattern from repeating, we just add two additional suppositions to the system configuration...

Supposition 1: Suppose that (rather than modifying the memory address $8010), you modify working RAM at $A000 instead. And, instead of limiting yourself to $280 bytes (since you're no longer in character data RAM, why limit yourself to only 40 characters, right?), you use $1400 bytes instead (this is exactly the 320 characters you wish you had to fill the whole screen).
Supposition 2: (Background and terminology: the raster interrupt runs immediately before each scanline is drawn by the system.) Just before the scanline at PY=8 is drawn (the first pixel row of CY=1), you'll need to have already copied 320 bytes (into character=21 through character=40 associated with CY=1). This data can't be modified while the scanline being drawn is PY=8 to PY=15, so the copying has to be finished by the raster interrupt for PY=8 (up to and including it). You'll have eight raster interrupts in which to copy the 320 bytes, so copy 40 bytes during each one, ensure the source address of the COPY instruction is the correct location in working RAM, and ensure the destination address is the correct location in character=21 through character=40.

There's a few details to understand about supposition 2:

The 16 byte size of a character doesn't divide evenly into the 40 bytes of the copy command, but does divide evenly into the size of 320 bytes or 20 characters.
When copying the first 16 bytes (of the first 40-byte COPY instruction), you'll be preparing the entire first cell (at CX=0), and you'll be preparing eight pixel rows (PY=8 to PY=15), not just one. The next 16 bytes (at CX=1) prepare eight pixel rows to the right, and so on, until all 20 characters have been prepared.
Once the eight raster interrupts have completed, the very next scanline shows the first pixel row of the eight-row group that was just prepared.
Those eight pixel rows of character data can't be modified because it would become visible as the scanline displays the data. Thus, to begin working on the next eight-row group, you'll adjust the destination formulas to use character=1 to character=20.

And that's it! It basically becomes an exercise in determining the formulas for the source address and destination address. There's also a complication when displaying CY=0 (you may have noticed, in supposition 2, the example began at CY=1), since the RASTER values loop around and would be 121 to 127 and 0 where you'll need to do those COPY instructions.
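Here is one way the source and destination formulas could look, sketched in Python so the numbers can be checked. It is only my reading of the description above, not the code of any particular upload: the names are hypothetical, and it assumes the 8 copies for cell row CY happen in the 8 raster interrupts ending at that row's first scanline, with even rows using characters 1 to 20 and odd rows using characters 21 to 40.

```python
CHAR_RAM = 0x8000      # character data RAM
WORK_RAM = 0xA000      # full-screen pixel buffer in working RAM
BYTES_PER_COPY = 40    # bytes moved per raster interrupt
BYTES_PER_ROW = 320    # 20 characters x 16 bytes
LINES = 128            # raster lines per frame

def copy_for_raster(line):
    """Source and destination address of the 40-byte copy for one raster line."""
    cy = ((line + 7) // 8) % 16       # cell row being prepared (wraps to 0)
    chunk = (line - 1) % 8            # which of the 8 copies for that row
    first_char = 21 if cy % 2 else 1  # odd rows use characters 21..40
    src = WORK_RAM + cy * BYTES_PER_ROW + chunk * BYTES_PER_COPY
    dst = CHAR_RAM + 16 * first_char + chunk * BYTES_PER_COPY
    return src, dst

# Built once, outside the interrupt, so the interrupt only has to index it.
schedule = [copy_for_raster(line) for line in range(LINES)]
```

Raster lines 121 to 127 and 0 all map to CY=0 here, which is the wrap-around case, and the highest source byte touched is the last byte of the $1400-byte buffer.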

It also becomes an exercise in code optimization, and speed varies. The earliest uploads on this site addressed the RASTER loop complication using a separate VBL interrupt, at a cost of 38% CPU. The Pxl Library does it within the raster interrupt, at a cost of 35% CPU.

P.S. My job/career is to learn and present super-complicated topics to other people at work, and communicate it clearly. Sorry if I sound like a textbook! Just think of me as a personal tutor with the qualifications of a textbook editor.

nathanielbabiak 2021-09-30 06:19

I uploaded the display system example for you. The raster interrupt section begins with "I_" and the display section begins with "D_". I'd have preferred to wait a bit since it's (yet again) an incomplete upload, but I figured since we're talking about it I might as well.

SP4CEBAR 2021-09-30 08:15

Thanks for your time and effort!

SP4CEBAR 2021-09-30 08:25 (Edited)

So the system you described works similar to my first attempt where a full screen worth of characters is stored in working RAM and copied to character data via raster interrupts?

nathanielbabiak 2021-09-30 12:35 (Edited)

Yep - I hadn't looked at the code in your initial upload, only your second. Looking at the code of your first upload now, it's on the right track, just make a few changes to match what I described in my second long post and you'll have it figured out.

SP4CEBAR 2021-09-30 15:28

That's great, thanks again for your time and effort explaining this

nathanielbabiak 2021-09-30 19:36 (Edited)

One last thing - your raster code is too slow because you're performing the addressing arithmetic each scanline. You need to pre-calculate the addressing arithmetic, and store it in a "lookup" array. Check out the raster code in the Pxl Library or uploads by others to get an example for how to do it.

SP4CEBAR 2021-09-30 20:22

Alright, that makes a lot of sense

SP4CEBAR 2021-09-30 20:25 (Edited)

It's only by a tiny bit though
According to rilden's cycle counter, copying 40 bytes takes 48 cycles while calculating the address takes only 7 cycles

nathanielbabiak 2021-09-30 22:31 (Edited)

A few things: ~~rilden's cycle counter isn't always accurate~~, the calculations I'm referring to are also in your original upload within the IF evaluation and END IF instruction, which take 12 or 13 cycles (depending on whether it evaluates true or false). This arithmetic needs to happen in addition to the 7-cycle math you've referenced.

There's only 51 cycles allowed in a raster (END SUB counts as 1, so really only 50 usable). You need to copy 40 bytes in 51 cycles. ~~Even supposing rilden's numbers were accurate at 48 and 7 (which they may be in this instance) (and omitting the 12-13),~~ the total of 48+7=55 cycles is too many.
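The budget, spelled out with the figures quoted in this thread (Python used only as a calculator; the variable names are mine):

```python
CYCLES_PER_RASTER = 51  # cycles allowed in one raster interrupt
END_SUB = 1             # END SUB uses one of them
COPY_40_BYTES = 48      # measured cost of the 40-byte copy
ADDRESS_MATH = 7        # measured cost of the address calculation
IF_BLOCK = (12, 13)     # IF ... END IF, depending on the branch taken

usable = CYCLES_PER_RASTER - END_SUB
cost = COPY_40_BYTES + ADDRESS_MATH
over_budget = cost - usable
cost_with_if = [cost + c for c in IF_BLOCK]
spare_after_copy = usable - COPY_40_BYTES
```

So the copy plus the 7-cycle math is 5 cycles over, the IF block pushes it to 67 or 68, and the copy alone leaves only 2 spare cycles for everything else.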

I can't reproduce the discrepancy I got last year, so my comments about rilden's cycle counter are incorrect, as far as I can tell. Disregard ~~strikethroughs~~ above.

SP4CEBAR 2021-10-01 08:15

Thanks for mentioning the max cycles per raster interrupt

I'm trying to make pixel-based graphics