Creating a 4-Color Full-Screen Pixel Buffer

How To

nathanielbabiak 2023-03-03 03:54 (Edited)

Foreword

My earliest work on this console was the Pxl Library. I've explained how it works a number of times, in bits and pieces, and I wanted to get everything in one place for posterity. This... "thing"... is commonly called a "pixel buffer" on this forum.

This console isn't the easiest or simplest of the fantasy consoles that are currently available, but I do find it the most fun. However, because of that, it's probably best to begin learning about pixel buffers only after developing a robust understanding of the graphics system this console uses.

I'd start by getting a solid understanding of the graphics system terms. Then read through the manual's graphics section one more time. Then read through the inner-workings of the RAM.

The Pxl Library is just one example of the "pixel buffers" that are often mentioned on this site. Check out this upload; it's got everything presented very concisely, without the distractions you'll find in the Pxl Library.

Introduction

This forum has many paint-like uploads that exemplify "bitmasks" (and modifications to character data at the RAM address $8000). As you've no-doubt noticed, 256 characters isn't enough to cover the whole screen.

The next section explains how to use the raster interrupt to do that (i.e., to cover the whole screen). It starts with a LowRes NX system configuration that "just works" and then will examine "why" it works.

(If we were to do the opposite, i.e., start from "how" the LowRes NX system works, and develop a specific display configuration of "what" based on only the "how" information... it would be too hard to explain!)

I won't be citing any example uploads, because many full-screen pixel-based graphic uploads on this site use this exact method. Following along with various author's implementations (and creating your own!) will be "left as an exercise for the reader".

The Configuration

Suppose BG 0 contains characters 1 through 20 on the top row (at CY=0 from CX=0 to CX=19), and that the second row contains characters 21 through 40 (at CY=1 from CX=0 to CX=19). Suppose these same characters (1 through 40) are repeated in a pattern below (starting at CY=2 and CY=3), and continuing all the way to the bottom of the screen (at CY=14 and CY=15).
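
The layout above boils down to one small formula. Here's a rough sketch in Python (it only models the arithmetic, it isn't LowRes NX BASIC, and the name `cell_character` is made up for illustration):

```python
COLUMNS = 20  # cells per row (CX=0 to CX=19)
ROWS = 16     # cell rows (CY=0 to CY=15)

def cell_character(cx, cy):
    """Character number shown at cell (cx, cy) in the repeating layout."""
    # Even rows use characters 1..20, odd rows use characters 21..40.
    return 1 + cx + COLUMNS * (cy % 2)

# Print the whole BG 0 layout, one cell row per line.
for cy in range(ROWS):
    print(cy, [cell_character(cx, cy) for cx in range(COLUMNS)])
```

On the console itself you'd fill BG 0 with the same numbers, one cell at a time.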

Surely, in this configuration, you could modify the 16 bytes of data beginning at memory address $8010 and ending at $801F associated with character=1, right? And the same for $8020 through $802F, $8030, $8040, etc? (Side note: the characters are 16 bytes each, and, in hexadecimal, shifting the digit one position to the left is the same as multiplying by 16, so the "1" in $8010 always represents the first character!) So you could modify the 40 characters of data from $8010 through $828F.
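
If you'd like to double-check the address range, here's the arithmetic as a small Python sketch (the helper name `char_address` is hypothetical; the $8000 base and the 16-byte character size are from the text above):

```python
CHAR_BASE = 0x8000  # start of character data
CHAR_SIZE = 16      # bytes per character

def char_address(n):
    """Address of the first byte of character n."""
    return CHAR_BASE + CHAR_SIZE * n

first = char_address(1)                   # first byte of character 1
last = char_address(40) + CHAR_SIZE - 1   # last byte of character 40
print(f"${first:04X} .. ${last:04X}")     # prints "$8010 .. $828F"
print(f"${last - first + 1:X} bytes")     # prints "$280 bytes"
```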

And, in doing so, the whole screen would update in a repeating pattern, where CY=0 and CY=1 would show the pixel-based graphics, but CY=2 and CY=3 (and higher values of CY also) would repeat them. To prevent the pattern from repeating, we just add two additional suppositions to the system configuration...

First, suppose that (rather than modifying the memory address $8010), you modify working RAM at $A000 instead. And, instead of limiting yourself to $280 bytes (since you're no longer in character data RAM, why limit yourself to only 40 characters, right?), you use $1400 bytes instead (this will ultimately be exactly the number of characters you wish you had to fill the whole screen: 320 of them).
Second, suppose you copy that data into character RAM a little at a time, from the raster interrupt. (Background and terminology: the raster interrupt runs immediately before each scanline is drawn by the system.) Just before the scanline at PY=8 is drawn (the first pixel row of CY=1), you'll need to have already copied 320 bytes (into character=21 through character=40 associated with CY=1). This data can't be modified while the scanline being drawn is PY=8 to PY=15. So the copying needs to be finished up to and including the raster interrupt at PY=8. You'll have eight raster interrupts in which to copy the 320 bytes, so just copy 40 bytes during each one: ensure the source address of the COPY instruction is the correct location in working RAM, and ensure the destination address is the correct location in character=21 through character=40.
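
To make the working-RAM buffer a bit more concrete, here's a Python sketch of where one pixel would live in it. Treat it as a model under assumptions, not as how the Pxl Library does it: I'm assuming the buffer holds all 320 characters back to back in normal character format, one cell row after another, that each character stores 8 bytes of low bits followed by 8 bytes of high bits, and that bit 7 is the leftmost pixel (check the manual's character data section before relying on that). The name `pixel_location` is hypothetical.

```python
WORK_RAM = 0xA000        # start of the pixel buffer, per the text
COLUMNS, ROWS = 20, 16   # cells across and down
CHAR_SIZE = 16           # bytes per character
BUFFER_SIZE = COLUMNS * ROWS * CHAR_SIZE  # 5120 bytes = $1400

def pixel_location(px, py):
    """Return (low_plane_address, high_plane_address, bit_mask) for a pixel.

    ASSUMED layout: characters stored row by row, low bit plane in
    bytes 0..7 and high bit plane in bytes 8..15, bit 7 = leftmost pixel.
    """
    cx, cy = px // 8, py // 8
    char_start = WORK_RAM + (cy * COLUMNS + cx) * CHAR_SIZE
    low = char_start + (py % 8)
    high = low + 8
    mask = 0x80 >> (px % 8)
    return low, high, mask

print(f"buffer size ${BUFFER_SIZE:04X}")
print([hex(v) for v in pixel_location(159, 127)])
```

Setting one of the four colors then means setting or clearing that mask bit in both bytes.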

There are a few details to understand about supposition 2:

The 16-byte size of a character doesn't divide evenly into the 40 bytes of the copy command, but it does divide evenly into the full 320 bytes (20 characters).
When copying the first 16 bytes (of the first 40-byte COPY instruction), you'll be preparing the entire first cell (at CX=0), and you'll be preparing eight pixel rows (PY=8 to PY=15), not just one. The next 16 bytes (at CX=1) prepare eight pixel rows to the right, and so on, until all 20 characters have been prepared.
Immediately after the eight raster interrupts complete, the next scanline shows the first pixel row of the eight-row group that was just prepared.
Those eight pixel rows of character data can't be modified while they're on screen, because the changes would become visible as the scanlines display the data. Thus, to begin working on the next eight-row group, you'll adjust the destination formulas to use character=1 to character=20.
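
Here's one way the source and destination formulas could look, as a Python sketch of the arithmetic. It assumes the working-RAM buffer stores the cell rows back to back (320 bytes each) starting at $A000; that layout and the name `copy_addresses` are my assumptions for illustration, and real implementations may arrange things differently.

```python
CHAR_RAM = 0x8000  # character data
WORK_RAM = 0xA000  # pixel buffer in working RAM
ROW_BYTES = 320    # 20 characters * 16 bytes
CHUNK = 40         # bytes copied per raster interrupt

def copy_addresses(cy, k):
    """Source and destination for chunk k (0..7) of cell row cy (0..15)."""
    # Even cell rows land in characters 1..20, odd rows in 21..40.
    first_char = 1 if cy % 2 == 0 else 21
    src = WORK_RAM + cy * ROW_BYTES + k * CHUNK
    dst = CHAR_RAM + first_char * 16 + k * CHUNK
    return src, dst

for k in range(8):
    src, dst = copy_addresses(1, k)
    print(f"chunk {k}: COPY ${src:04X},40 TO ${dst:04X}")
```

The eighth chunk for CY=1 ends at $828F, the last byte of character 40.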

Conclusion

And that's it! It basically becomes an exercise in determining precise formulas for the source address and destination address. There's also a complication when displaying CY=0 (you may have noticed, in supposition 2, the example began at CY=1 rather than CY=0), since the RASTER values wrap around: you'll need to do those COPY instructions at RASTER values 121 to 127 and 0.
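
The wrap-around is easier to see with the mapping written out. This Python sketch follows the convention used in this post (the interrupts at RASTER 1 to 8 prepare CY=1); the function name `chunk_for_raster` is hypothetical.

```python
LINES = 128     # scanlines per frame
CELL_ROWS = 16  # cell rows per screen

def chunk_for_raster(raster):
    """Return (cy, k): the cell row being prepared and the chunk index 0..7."""
    step = raster + 7              # shift so RASTER 1 starts a new group
    cy = (step // 8) % CELL_ROWS   # wraps back to CY=0 near the bottom
    k = step % 8
    return cy, k

# Which RASTER values prepare the top cell row?
print([r for r in range(LINES) if chunk_for_raster(r)[0] == 0])
# prints [0, 121, 122, 123, 124, 125, 126, 127]
```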

It also becomes an exercise in code optimization, and speed varies. The earliest uploads on this site addressed the RASTER loop complication using a separate VBL interrupt, at an execution cost of 38% CPU. The Pxl Library does it within the raster interrupt, at a cost of 35% CPU.

One last thing... If you develop the raster code "from scratch" it might be too slow. There are only 51 cycles allowed in a raster interrupt (END SUB counts as 1, so really only 50 are usable), and you need to copy 40 bytes within them. So you need to pre-calculate the addressing arithmetic and store it in a "lookup" array. Check out the raster code in the Pxl Library or uploads by others for an example of how to do it.
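
As a rough idea of what such a lookup array could contain, here's a Python sketch that builds one source and one destination address per RASTER value. It assumes the same things as the rest of this post's arithmetic (buffer at $A000 with cell rows stored back to back, RASTER 1 to 8 preparing CY=1); the names `SRC_TABLE` and `DST_TABLE` are hypothetical, and this is not the Pxl Library's actual code.

```python
CHAR_RAM, WORK_RAM = 0x8000, 0xA000

SRC_TABLE = []  # source address in working RAM, indexed by RASTER
DST_TABLE = []  # destination address in character RAM, indexed by RASTER

for raster in range(128):
    step = raster + 7
    cy = (step // 8) % 16   # cell row being prepared
    k = step % 8            # which 40-byte chunk of that row
    first_char = 1 if cy % 2 == 0 else 21
    SRC_TABLE.append(WORK_RAM + cy * 320 + k * 40)
    DST_TABLE.append(CHAR_RAM + first_char * 16 + k * 40)

print(f"RASTER 1: ${SRC_TABLE[1]:04X} -> ${DST_TABLE[1]:04X}")
```

With the tables built once at startup, the interrupt itself would only need to index them by the current RASTER value and do a single 40-byte COPY.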

Happy coding!

:-)

GAMELEGEND 2023-03-03 05:59

Do your raycasting demos use the Pxl Library?

SP4CEBAR 2023-03-03 16:53

I usually find the 35% CPU cut too much, so for intensive graphics applications, I prefer bit-masks.

McPepic 2023-03-03 19:16 (Edited)

@nathanielbabiak
I was working out the math and realized that if each pixel on the screen was represented by half a byte, it would take 10 kilobytes to store a simple depth buffer for the screen. Combined with this method, it could be stored alongside the pixel buffer. Would this be practical, though, or would it be optimal to just sort all the triangles for drawing in 3D?

Realized that an array could be used instead of WRAM. I just wonder if it’d be worth it to use a depth buffer instead of sorting all the triangles. (This would also decrease overdraw)
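
The 10-kilobyte figure checks out. A quick Python sketch of the arithmetic (variable names are just for illustration):

```python
WIDTH, HEIGHT = 160, 128   # screen size in pixels
COLOR_BITS = 2             # 4 colors per pixel
DEPTH_BITS = 4             # half a byte of depth per pixel

pixel_bytes = WIDTH * HEIGHT * COLOR_BITS // 8
depth_bytes = WIDTH * HEIGHT * DEPTH_BITS // 8

print(pixel_bytes)         # prints 5120 (the $1400-byte pixel buffer)
print(depth_bytes)         # prints 10240 (10 KB)
print(2 ** DEPTH_BITS)     # prints 16 distinct depth levels
```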

nathanielbabiak 2023-03-06 20:47 (Edited)

Some of the raycasting demos do, others don't. At this link, 1 and 1-updated both do, while 2 and 3 don't.

https://lowresnx.inutilis.com/topic.php?id=2678

And this one developed afterwards... kinda... does, if you use the 2022-11-29 or earlier versions.

https://lowresnx.inutilis.com/topic.php?id=2702

This post is from my phone, not formatted. I still don't have internet at home after Friday's storms. (Kentucky's governor declared a state of emergency Friday.)

As far as the 35% goes...

You know you can reduce that down a bunch, right? You can use the scheme described above, but instead of making the pixel buffer 20 characters wide, you could make it 5 wide (you'd use 10 total characters in a repeating pattern similar to the one described above), and then you'd fill the remaining 15-wide by 16-high cells with unique characters. This scheme would use 10 (raster RAM buffered) characters (5x2) and also 240 ("direct" RAM buffered) characters (15x16). It just doesn't leave much for sprites.
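
Here's the character budget for that kind of split as a small Python sketch (the function name `budget` is hypothetical; it only counts characters and bytes, it says nothing about the resulting CPU percentage):

```python
def budget(raster_width, total_width=20, rows=16):
    """Return (rastered_chars, unique_chars, total_chars, bytes_per_raster)."""
    rastered = raster_width * 2                     # two alternating rows
    unique = (total_width - raster_width) * rows    # one character per cell
    bytes_per_raster = raster_width * 16 // 8       # row bytes over 8 interrupts
    return rastered, unique, rastered + unique, bytes_per_raster

print(budget(20))  # prints (40, 0, 40, 40)    full-width scheme from the post
print(budget(5))   # prints (10, 240, 250, 10) the 5+15 split
```

So the 5+15 split uses 250 of the 256 characters, and each raster interrupt copies 10 bytes instead of 40.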

The Pxl Library BG tool includes this functionality already, actually. You input the overall pixel window size with the WINDOW prompt, and input the rasterized portion with the BUF.WIDTH prompt. If the portion is less than the overall, the remainder is assumed to be unique characters.

(Where I wrote "kinda" above, that one uses a 15+5 setup rather than a 5+15 setup.)

nathanielbabiak 2023-03-06 21:03 (Edited)

McPepic, I can't think of any restriction on this console that would prevent use of a pixel depth-buffer. (It'd be faster to use an array though, rather than RAM access.)

I do kinda wonder why you'd want to use it though, since this console is so limited that you're unlikely to have a complex scene of triangles that need to be rendered with one of the pixel-by-pixel rasterization algorithms. (Assuming you're OK limiting yourself to convex polyhedra...) Painter's algorithm is probably all this console can achieve for real-time rendering. (And that'd be less than 10 fps.) It could work really well for static rendering though!

In addition to lacking CPU horsepower, this console also lacks fine color palettes. It'd be tough to show the z-buffered depth on screen with the colors we've got available. I'm not an artist, there might be a way to get the visual impression you're after. (Like your checkerboard triangle renderer maybe?)

McPepic 2023-03-06 22:10

@nathanielbabiak
I'm currently trying to write a triangle rasterizer. What I'm thinking of is getting a bounding box in tiles for the triangle and looping through to classify each one. This method would separate tiles into three categories: out, in, and on the edge of the triangle. If it's out of the triangle, I can skip it. Tiles entirely inside can just get POKEd and tiles on the edge can have a mask applied to only write to the parts that are inside the triangle.
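
One possible way to do that classification, sketched in Python. This is my own guess at an approach (edge functions tested at each tile's four corner pixels), not code from any upload. It assumes integer vertex coordinates and a triangle with non-zero area, and the names `edge` and `classify_tiles` are hypothetical. The "out" test is conservative: a tile is only rejected when all four corners are outside the same edge, so a few tiles labeled "edge" may turn out to contain no covered pixels.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test of point P against the line A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def classify_tiles(triangle, tile=8):
    """Map (tx, ty) -> "in", "out" or "edge" for tiles in the bounding box."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    # Fix the winding so that "inside" always means edge value >= 0.
    if edge(x0, y0, x1, y1, x2, y2) < 0:
        x1, y1, x2, y2 = x2, y2, x1, y1
    edges = [(x0, y0, x1, y1), (x1, y1, x2, y2), (x2, y2, x0, y0)]
    xs, ys = (x0, x1, x2), (y0, y1, y2)
    result = {}
    for ty in range(min(ys) // tile, max(ys) // tile + 1):
        for tx in range(min(xs) // tile, max(xs) // tile + 1):
            left, top = tx * tile, ty * tile
            right, bottom = left + tile - 1, top + tile - 1
            corners = [(left, top), (right, top), (left, bottom), (right, bottom)]
            values = [[edge(*e, px, py) for px, py in corners] for e in edges]
            if any(all(v < 0 for v in row) for row in values):
                result[(tx, ty)] = "out"    # fully outside one edge: skip
            elif all(v >= 0 for row in values for v in row):
                result[(tx, ty)] = "in"     # all corners inside: fill whole tile
            else:
                result[(tx, ty)] = "edge"   # needs a per-pixel mask
    return result

tiles = classify_tiles(((0, 0), (63, 0), (0, 63)))
print(tiles[(0, 0)], tiles[(7, 0)], tiles[(7, 7)])  # prints "in edge out"
```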