Micro Text, Text Strings, Rom Data, and back again...

was8bit 2020-12-10 05:19

I have the idea of making a LowRes NX book maker that can build the database for a book player ... as a model i am using a used book i bought, (c)2006 "The Lost Jewels of Nabooti" from the CYOA books... i add their link out of respect for them, as i will be using their book(s) that i bought used...

https://www.cyoa.com

So, I have decided to use the MICRO TEXT format, as a 40x15 page is sufficient for displaying one complete small page of text... but as these books have over 100 pages per book, i cannot allow lots of blank unused space, as i calculate this could exceed LowRes memory restrictions..

So, i have the idea of letting someone create the page (Magic Typewriter style) and, using reserved characters as commands, condense the finished page into one condensed string, with all the strings condensed into a ROM file..

The reverse will be needed too, to convert ROM data to a string array, and each string back into an editable microtext screen...

Then i will need to make a player... so TONS of work required...

But i thought i could share my progress here so as to
1) not glut up the game pages with a non-game, and
2) invite discussions, and maybe inspire others with ideas :)

So, my following posts will post my current work on this project :)




was8bit 2020-12-10 05:22

Here is what i have so far... i will elaborate in the next post...


was8bit 2020-12-10 05:34

So, this represents a test page entry... this one screen holds all the data for one page in the book... (test page shown should have the last * as a \)

There are characters reserved for special purposes only...
# indicates that this is the start of a page data
\ indicates the end of the page data
/ commands to ignore following uninterrupted blank spaces
* indicates that the text for the 3 choices follow

...one more character is used NOT in the text, but tapping this character...
@
Tells the program you are done editing this page, and it will condense the page into a text string..

If you run this program, tap the @character... it will process the screen... then you will see...
1) a number representing the length of the text string, and...
2) the actual text string...
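A rough sketch of the page-to-string step described above, written in Python rather than LowRes NX BASIC. The function name `condense` and the sample screen are made up for illustration; it assumes the screen is read row by row as one 40x15 stream, that `/` drops the run of blanks that follows it, and that `\` ends the page. It is not the actual program's code.

```python
COLS, ROWS = 40, 15  # MICRO TEXT screen size mentioned in the first post


def condense(rows):
    """Flatten the screen rows into one string, dropping blanks after '/'."""
    # Pad every row to full width so the screen is read as one long stream.
    stream = "".join(row.ljust(COLS)[:COLS] for row in rows)
    start = stream.find("#")  # '#' marks the start of the page data
    if start < 0:
        return ""
    out = []
    skipping = False  # True while inside a run of blanks that follows '/'
    for ch in stream[start:]:
        if skipping and ch == " ":
            continue
        skipping = False
        out.append(ch)
        if ch == "/":
            skipping = True
        elif ch == "\\":
            break  # end-of-page marker: ignore the rest of the screen
    return "".join(out)


# Hypothetical test screen (not the one from the posted program).
screen = [
    "#001,002,003,/",
    "YOU WAKE UP./",
    "/",
    "*GO LEFT, GO RIGHT\\",
]
packed = condense(screen)
print(len(packed), packed)
```

Rows that do not end in `/` keep their trailing blanks in this sketch, so it relies on the editor adding `/` wherever a line is cut short.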

the 4 numbers at the top of the edit page are:
Current page you are editing
The 3 pages that the 3 choices will go to..


What i need to do next is
1) create a ROM reader that converts data to proper string array
2) create a way to select a page entry to edit
3) create string to ROM converter/writer


was8bit 2020-12-10 05:41

My general idea is that the strings are arranged by their page number in the string array... then when writing these into ROM, simply dump all strings in the array, in order, into a ROM file as a single stream of data... giving maximum saving of space

Then to unpack the ROM, use # as the trigger that a new page starts, and \ as the sign that this page is done and can be written to the string array..

Then once all the string array is loaded, one is free to edit any one string :)
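The dump-and-unpack idea can be sketched like this, again in Python as a stand-in for BASIC. `pack_pages` and `unpack_pages` are hypothetical names, and the sketch assumes `#` and `\` never appear inside the page text itself.

```python
def pack_pages(pages):
    """Concatenate the page strings, in array order, into one byte stream."""
    return "".join(pages).encode("ascii")


def unpack_pages(rom):
    """Split the stream back into page strings using '#' and '\\' as markers."""
    pages = []
    current = None  # None means we are between pages
    for ch in rom.decode("ascii"):
        if ch == "#":
            current = ["#"]  # a new page starts here
        elif current is not None:
            current.append(ch)
            if ch == "\\":
                pages.append("".join(current))  # page is complete
                current = None
    return pages


book = ["#001,002,003,/HELLO\\", "#002,004,005,/WORLD\\"]
rom = pack_pages(book)
print(len(rom), unpack_pages(rom) == book)
```

If some slots of the array are empty, their position is lost in the stream; the page number stored at the top of each page could be used to put every string back in its slot.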


nathanielbabiak 2020-12-10 15:47

I'd add a new-line encoding (`CHR$(10)`) or automatic word wrap so there isn't a bunch of empty spaces that need to be encoded. It would also allow the ROM data to contain a full page without worrying about how to delineate end-of-string vs. end-of-line. (Is this similar to what you meant with "/ commands to ignore following uninterrupted blank spaces"?). I have a few other thoughts also...

Could you just use the source code to do the data entry from the book's text? Like... type the lines of the book into DATA commands, with one line of source code for one line of text, and one PAGE000 label at the top of every page. Then write the encoding subprogram and use RESTORE PAGE000 followed by the encoding, and SAVE. That way the Adventure Book Maker can focus on editing the interactive "jump to page" stuff, rather than the data entry itself.

Also, if you want to allow ASCII art in the books, you might want to move away from using printing characters in the encoding. You could use non-printing characters instead (from `CHR$(1)` to `CHR$(31)` for control characters). The Pxl Library includes `ESCSEQ`, a generic escape sequence encoder, and an explanation in the text demo. You could add hex encoding `\00` through `\0FF` to encode any character.

At 40x25, the data is likely almost a kilobyte per page, even with automatic word wrap. Since it's mostly text data (with a few control characters thrown in), you could use LZ77 compression or BWT compression to get a nice decrease in size-per-page.


was8bit 2020-12-10 16:41 (Edited)

Thanks so much for joining in and helping :)


Did you try running the program, and hit the @ key to see what i have so far? :)

My page editor already handles the DELETE key, as well as the ENTER key, and the ENTER key also automatically adds the "/" character which tells my "page to string" converter to ignore all following blank spaces until it hits a non-blank space...

Currently, in the example page it reduced a 600 sized page to a 112 length string... :)


was8bit 2020-12-10 16:48

There are NO other fancy abilities that normally come with text editors.. you cannot bump or shift text around... as i have it set up, the layout you see when you edit is what you will see in the player... NOPE to any form of graphics, text art or otherwise...

The whole page must include everything, including current page and the goto pages... this is what is on the page in the real books... but in the PLAYER, it will hide all page numbers, all you will see is the text of the story, and the text for your choices...


was8bit 2020-12-10 16:52

The reasoning behind keeping all page numbers with the text is: that's how the books do it, and also keeping them separate would be a nightmare when trying to resolve a data entry error by just looking at a bunch of numbers.. especially if you were creating your own story adventure from scratch ... when testing your book, if you can see all the numbers on the same page as the text, finding a data entry error will be much easier i think...


was8bit 2020-12-10 17:06 (Edited)

Data compression sounds interesting, my data is numbers 0-63 exclusively... with no long blocks of identical numbers... i know that if i could restrict the number of characters down to 40 i could convert 3 text characters into 2 bytes of ROM data (40x40x40 = 64,000 combinations, which fits in 2 bytes), a 33% reduction... but that would compromise the available text characters... so not sure how to compress.... my current scheme simply removes all long blocks of empty space, and uses one byte to store each text character...
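For what it's worth, the 3-characters-into-2-bytes idea could look something like the sketch below. The 40-symbol `ALPHABET` is purely an assumption for illustration (the thread does not say which 40 characters would survive), and `pack3`/`unpack3` are hypothetical names.

```python
# Hypothetical 40-symbol character set: 26 letters, 10 digits, 4 extras.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?"


def pack3(text):
    """Pack every 3 characters into one 16-bit value (2 bytes)."""
    text += " " * (-len(text) % 3)  # pad with spaces to a multiple of 3
    out = bytearray()
    for i in range(0, len(text), 3):
        a, b, c = (ALPHABET.index(ch) for ch in text[i:i + 3])
        value = (a * 40 + b) * 40 + c  # at most 63,999, so it fits in 16 bits
        out += value.to_bytes(2, "big")
    return bytes(out)


def unpack3(data):
    """Reverse pack3: every 2 bytes become 3 characters again."""
    chars = []
    for i in range(0, len(data), 2):
        value = int.from_bytes(data[i:i + 2], "big")
        value, c = divmod(value, 40)
        a, b = divmod(value, 40)
        chars += [ALPHABET[a], ALPHABET[b], ALPHABET[c]]
    return "".join(chars)


packed = pack3("YOU WAKE UP.")
print(len("YOU WAKE UP."), "->", len(packed))
```

Any character outside the 40-symbol set raises an error here, which is exactly the trade-off mentioned above: the saving only works if the text can live with the smaller character set.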

My sample page would currently take up 112...


was8bit 2020-12-10 17:26 (Edited)

I am thinking any additional compression scheme could easily be added between the string array and saving to the ROM file...

My current plan is to use a "condensed" version of the existing text, making the text as sparse as possible without killing the overall storyline... i can eliminate a lot of "filler" and "fluff" and keep the text to the "bare bones"...


was8bit 2020-12-10 17:28

I have done some research on both compression schemes you mentioned, and all examples use C language, which i really can't convert easily into BASIC commands...


was8bit 2020-12-10 17:32 (Edited)

I have already tested NX and can pack 15,000 text characters (direct 1 char/byte) into one huge ROM file, and NX can handle 2 such files, so that gives me 30,000 text characters... at 100 char/page would yield 300 pages per book... My books are under 150 pages, so that gives me some wiggle room for a few pages going over the 100char/page limit...


nathanielbabiak 2020-12-10 21:30

Yeah that earlier comment was on my lunch break today from work. Just ran the program now though - pretty cool!

I'm just not sure where you're getting 100 char/page though. I typed your original post into the program (changing the http:// for http:-- but not changing anything else) and got all the way to the closing parenthesis of "(Magic Typewriter style)", then I hit @, and the value of IT printed on line 156 was 560... Do the CYOA books really only use around 100 characters per page?

Also, there's no need to consider compression until you hit the storage limit of LowRes. For a fast form of compression, maybe just encode 3 characters into 3 bytes (using each of the byte's six low bits, basically what's being done now), and then encode one character into those same 3 bytes (using each of the byte's two high bits). That would compress every set of four characters down to three... a 25% reduction!
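A small sketch of that 4-into-3 packing, in Python for readability. It assumes the characters are already numbers 0-63, as stated earlier in the thread; `pack4` and `unpack4` are made-up names.

```python
def pack4(codes):
    """Pack 6-bit codes (0-63) so that every 4 codes occupy 3 bytes."""
    codes = list(codes)
    codes += [0] * (-len(codes) % 4)  # pad with code 0 to a multiple of 4
    out = bytearray()
    for i in range(0, len(codes), 4):
        a, b, c, d = codes[i:i + 4]
        # The low six bits of each byte hold a, b and c;
        # the two high bits of each byte hold one third of d.
        out.append(a | (((d >> 4) & 3) << 6))
        out.append(b | (((d >> 2) & 3) << 6))
        out.append(c | ((d & 3) << 6))
    return bytes(out)


def unpack4(data):
    """Reverse pack4: every 3 bytes become 4 codes again."""
    codes = []
    for i in range(0, len(data), 3):
        x, y, z = data[i:i + 3]
        d = ((x >> 6) << 4) | ((y >> 6) << 2) | (z >> 6)
        codes += [x & 63, y & 63, z & 63, d]
    return codes


sample = list(range(64))
packed = pack4(sample)
print(len(sample), "->", len(packed))
```

Only shifts, ANDs and ORs are needed, so the same steps should carry over to BASIC without much trouble.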


was8bit 2020-12-10 22:59 (Edited)

Thanks :)

I double checked on a nearly full page, and it is an estimated 40x30 or 1,200 characters on just one page... first, i am too lazy to type that much, secondly i would "bare bones" it down to this, as an example...

#049,074,076,
When you give the ivory to the beggar, 2 robed men appear and you follow them thru the dark to an old man. "We know who are, we want the jewels!" he says./
/
You know you don't have them but he says "look in your coat pockets!"/
*obey the man, run away\

So that is under 300, much reduced from a full page..

A smaller example, originally 400 characters long...

#087,117,118,
"The diamonds are in Morocco, the rubies in Denver" you said... they said, "Let's go get them, choose which one first!"/
*Morocco, Denver\

This one i think is under 150...

So, yes it will be a struggle keeping the volume down... flipping thru the book, most lean towards the smaller example in size, not very many to half page or beyond...

Also, there are probably over 20 full-page graphics, so the 130-page book is probably only 100-some pages of text...

also, there are 35 pages that simply say goto another page to continue reading, and i can condense these as well... saving 35 pages...

And of course there are about 40 THE END pages, most not ending well, so those should be very small in size ...


Taking all into account, maybe only need 65 pages, 40 being very small, leaving only 25 that might be large....


was8bit 2020-12-10 23:00 (Edited)



40x100=4,000
25x300=7,500

Maybe 11,500 characters... hmmm....
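A quick check of the estimate, using the page counts from the previous post and the storage figure quoted earlier in the thread (two ROM files of 15,000 characters each). The variable names are just for illustration.

```python
small_pages, small_size = 40, 100   # very small pages (endings and the like)
large_pages, large_size = 25, 300   # pages that might be large
rom_capacity = 2 * 15_000           # two ROM files, 15,000 characters each

total = small_pages * small_size + large_pages * large_size
print(total)                 # 11500
print(rom_capacity - total)  # 18500
```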


was8bit 2020-12-10 23:06

Here is one ending..

#089,
A girl gives you the leash to her dog and runs away. The dog is a metalic, "boom" it explodes... :(\

Reduced from 400 down to 100 characters.... :)


was8bit 2020-12-10 23:18

I guess i just need to go ahead, trimming the text to its extreme for this first book, and see what happens ...


nathanielbabiak 2020-12-11 00:10

Sounds good - it's a plan then! I think there's enough space - you're right that you'll save a bunch of space on the "goto" pages, since buttons could functionally replace them.


was8bit 2020-12-11 01:47

Wish me luck...i will post any progress :)


Greenpilloz 2020-12-20 17:00 (Edited)

That's a nice project ;) Here are some ideas of mine:
* Did you consider also adding a "page break" char?
* Why not use ASCII codes directly (line breaks and all are already there in the first 256 chars)? Just have the text as a bunch of single bytes and display only printable characters
* One approach for text compression is to use a dictionary. You take like the 1000 most used English words and you store them in like 2 or 3 ROMs (1000 * 5 B = 5 KB). So you lose a bit of space with this, but then you can write the book by referring only to the words. For words not in the dict you can have a prefix, like $00 means the following bytes code for letters and $FF means the following bytes are words


was8bit 2020-12-21 04:56 (Edited)

Great ideas :) i am way too busy with work to wrap my brain around this lately, but you have inspired me :)

Researching the top 50 commonly used English words, i came up with a scheme that saves over 50 characters on the list... this will be a straight substitution only in the ROM files... converting from micro to normal and back...


H=THE
O=OF
D=AND
N=IN
S=IS
Y=YOU
T=IT
W=WAS
F=FOR
R=ARE
V=HAVE
FR=FROM
U=BUT
C=CAN
EA=EACH
CH=WHICH
EI=THEIR
ER=THERE
SA=SAID
UR=YOUR
EN=WHEN
WR=WERE
WH=WHAT
WI=WITH
TH=THAT
EY=THEY
TI=THIS







was8bit 2020-12-21 04:58 (Edited)

So "YOU HAVE BUT THEY ARE" in the book becomes "Y V U EY R" in the ROM file ;)

21->10
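A minimal sketch of the straight substitution, using only a handful of entries from the list. `shrink` and `restore` are hypothetical names, and the sketch assumes words are separated by single spaces with no punctuation stuck to them.

```python
# A few entries from the list above (full word -> short key).
ABBREV = {
    "THE": "H", "AND": "D", "YOU": "Y", "HAVE": "V",
    "BUT": "U", "ARE": "R", "THEY": "EY",
}
EXPAND = {short: word for word, short in ABBREV.items()}


def shrink(text):
    """Replace each listed word by its short key."""
    return " ".join(ABBREV.get(word, word) for word in text.split(" "))


def restore(text):
    """Replace each short key by its full word again."""
    return " ".join(EXPAND.get(word, word) for word in text.split(" "))


book_text = "YOU HAVE BUT THEY ARE"
rom_text = shrink(book_text)
print(len(book_text), "->", len(rom_text), rom_text)
```

Two things would need care in a real version: words with punctuation attached ("YOU," or "ARE.") are not matched here, and every key has to be unique and must never occur as a normal word in the story, otherwise `restore` expands it by mistake.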


was8bit 2020-12-21 05:02

Focusing on the most commonly used words should help ....


Greenpilloz 2020-12-23 11:49

Nice system! Glad to help. Maybe it would be simpler to use keys like "W00", "WF2", etc... instead of the keys you used. You'll lose in readability though.

Another way to go would be to store all the words used in your novel in a list when compressing and store the book content as just numbers:
- dictionary: "TO;BE;OR;NOT;..."
- book content: "01 02 03 04 01 02 05" -> "TO BE OR NOT TO BE ADVENTUROUS"

The problem is that it will take time to compress, since you have to check each word in the text against all words in the dict to see if you already have it, and add it if not... However the number of pages that can be stored could be huge. (It would be like replacing all words by 2 letter ones)

By the way, while researching how to compress text I found a technique to pack ASCII chars in less than one byte (https://programmingpraxis.com/2014/05/06/packed-ascii/) to achieve 25% compression on word storage. Hope that helps ;-)
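The word-list approach could be sketched as follows. `build` and `decode` are made-up names, and the numbers are 1-based to match the "01 02 03 ..." example. The sketch uses a Python dict for the lookups, which sidesteps the slow search; a BASIC version would likely need its own search through the word list.

```python
def build(text):
    """Return (dictionary, codes): each word is stored once, the text as numbers."""
    words = []   # dictionary, in first-seen order
    index = {}   # word -> 1-based position in the dictionary
    codes = []
    for word in text.split():
        if word not in index:
            words.append(word)
            index[word] = len(words)
        codes.append(index[word])
    return words, codes


def decode(words, codes):
    """Rebuild the text from the dictionary and the list of numbers."""
    return " ".join(words[code - 1] for code in codes)


words, codes = build("TO BE OR NOT TO BE ADVENTUROUS")
print(";".join(words))
print(" ".join(f"{code:02d}" for code in codes))
```

With up to 255 distinct words each number fits in one byte; a larger vocabulary would need two bytes per word, which is the "2 letter" case mentioned above.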


was8bit 2020-12-23 11:52

Ooo, thanks for sharing... will research the link :)

