High quality audio player

Discussion

SP4CEBAR 2024-01-12 21:59 (Edited)

Since any sound can be defined as a spectrum of frequencies, it is possible (with external tools) to find the spectrum for each of NX's sounds:

the triangle wave,
the sawtooth wave,
the pulse wave with each of the 16 pulse width values,
and the noise sound.
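
As a rough illustration of what "finding the spectrum" could look like, here is a minimal sketch that builds one period of an idealized sawtooth, triangle, or pulse wave and measures its first few harmonics with a direct DFT. The waveform shapes, the `duty` parameter, and the function names are assumptions for illustration; NX's actual output (and how its 16 pulse width values map to a duty cycle) would have to be measured from recordings. Noise is left out because it has no harmonic series.

```python
import cmath
import math

def one_period(kind, n=256, duty=0.5):
    """Return n samples of one period of an idealized waveform in [-1, 1]."""
    samples = []
    for i in range(n):
        phase = i / n  # 0.0 <= phase < 1.0
        if kind == "sawtooth":
            samples.append(2.0 * phase - 1.0)
        elif kind == "triangle":
            samples.append(1.0 - 4.0 * abs(phase - 0.5))
        elif kind == "pulse":
            # duty is the fraction of the period spent high (assumed mapping)
            samples.append(1.0 if phase < duty else -1.0)
        else:
            raise ValueError(f"unknown waveform: {kind}")
    return samples

def harmonic_magnitudes(samples, count=8):
    """Magnitudes of harmonics 1..count, computed with a direct (slow) DFT."""
    n = len(samples)
    mags = []
    for k in range(1, count + 1):
        total = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        mags.append(2.0 * abs(total) / n)
    return mags

square = harmonic_magnitudes(one_period("pulse", duty=0.5))
saw = harmonic_magnitudes(one_period("sawtooth"))
print("square:", [round(m, 3) for m in square])
print("sawtooth:", [round(m, 3) for m in saw])
```

For real recordings a proper FFT library would be much faster; the direct DFT is only used here to keep the sketch dependency-free.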

This information can be used to turn an audio file into a series of combinations of NX sounds, each chosen to match the spectrum of the corresponding moment in the file as closely as possible. This series could even be encoded in one or more NX music files. The conversion can be done in NX or entirely with external tools.

SP4CEBAR 2024-01-13 10:39 (Edited)

Such external tools would likely be a programming language with libraries or packages to do the following:

read audio files (which are the audio files that you want to encode and recordings of NX sounds)
perform fast Fourier transforms on the audio files
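
For the reading step, Python's standard `wave` module might already be enough. The sketch below assumes 16-bit PCM WAV files and mixes all channels down to mono; the function name `read_wav_mono` is made up for this example. The resulting samples could then be handed to an FFT routine (for example `numpy.fft.rfft`, if NumPy is available).

```python
import struct
import wave

def read_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (sample_rate, samples in [-1, 1))."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Average the channels of each frame down to one mono sample.
    mono = [sum(values[i:i + channels]) / channels / 32768.0
            for i in range(0, len(values), channels)]
    return rate, mono
```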

The program itself would need to be able to

match the spectrums of the samples of the NX sounds with the spectrums of each time interval of the audio file that is to be encoded.
- I think the easiest way to do that is to check all combinations of the NX sounds on four voices
  - ((4 sounds + 15 other pulse widths) * 96 tones * 16 volume levels) ^ (4 voices) = 29184^4, which is about 7.254*10^17, which is probably going to take a while
- another way to do that is to predict the effect of volume levels on the spectrum, which would cut the amount down to 1824^4 which is about 1.107*10^13
  - predicting the pitch as well would cut the amount down to 19^4 = 130321
  - predicting the pulse width as well would bring it down to 4^4 = 256
- or to repeatedly try all 4^4 = 256 combinations of instruments with different parameters to find which direction of the parameters is more favorable; the parameters should then move in that direction so that they converge to the right values.
- or to use a hand-crafted template that gives control over specific regions by changing certain known parameters of the sounds used in the template
encode the resulting combination of sounds and parameters to an NX music file
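
The search-space sizes in the list above can be checked with a few lines of arithmetic. This is only a sanity check of the counts (19 wave types, 96 tones, 16 volume levels, 4 voices); the variable names are made up for this example.

```python
WAVE_TYPES = 4 + 15   # 4 sounds + 15 other pulse widths
TONES = 96
VOLUMES = 16
VOICES = 4

counts = {
    "brute force": (WAVE_TYPES * TONES * VOLUMES) ** VOICES,
    "volume predicted": (WAVE_TYPES * TONES) ** VOICES,
    "volume and pitch predicted": WAVE_TYPES ** VOICES,
    "pulse width predicted too": 4 ** VOICES,
}

for label, count in counts.items():
    print(f"{label}: {count} (about {count:.3e})")
```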

McPepic 2024-01-13 16:02

I was actually thinking about this problem before and was thinking -
Wouldn’t this problem be suited to a neural network?

It would then be possible to hook up an audio file to the input nodes and the sound parameters to the output nodes. The program would then be able to train itself by reproducing the resulting wave from the predicted parameters, compare that wave to the input wave, and adjust the weights based on the discrepancy.

Usually, however, a neural network has a fixed number of input nodes, so maybe you could break the input data into chunks that could be processed individually by the neural network. Then, on playback, NX could cycle through the parameters from the neural network at the start of each chunk.
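
The chunking part of this idea could be sketched as below. It only shows how to cut the samples into equal-length pieces so that each piece fits a fixed number of input nodes; the function name and the choice to zero-pad the last chunk are assumptions, and the network itself is not shown.

```python
def split_into_chunks(samples, chunk_size):
    """Split samples into equal-length chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), chunk_size):
        chunk = list(samples[start:start + chunk_size])
        # Pad a short final chunk so every chunk has exactly chunk_size values.
        chunk += [0.0] * (chunk_size - len(chunk))
        chunks.append(chunk)
    return chunks

print(split_into_chunks([0.1, 0.2, 0.3, 0.4, 0.5], 2))
```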

Just some ideas, though.

SP4CEBAR 2024-01-13 16:13

Yeah, that would work. Maybe each sound can be divided up into a low number (like 8) of frequency bands. This low number likely wouldn't affect the final sound that much, since it probably still isn't going to be high quality by today's standards.
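
Reducing a spectrum to a handful of bands could look roughly like this. The sketch assumes the spectrum is already available as a list of FFT bin magnitudes and uses equal-width bands, which is just the simplest choice; bands that widen toward higher frequencies might match hearing better. The function name is made up for this example.

```python
def band_intensities(magnitudes, bands=8):
    """Sum FFT bin magnitudes into `bands` equal-width frequency bands."""
    n = len(magnitudes)
    result = []
    for b in range(bands):
        lo = b * n // bands        # first bin of this band
        hi = (b + 1) * n // bands  # one past the last bin of this band
        result.append(sum(magnitudes[lo:hi]))
    return result

print(band_intensities([1.0] * 16, bands=8))
```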

SP4CEBAR 2024-02-01 22:37

I have a linear algebra exam tomorrow, and right now my mind is loaded with formulas; this gave me a new take on this matter:
1. fast Fourier transform each sample into a set of N frequency bands
2. make a vector with N dimensions for each sample to store the intensities of its frequency bands in
3. the span of these sample vectors forms a vector space
4. find the multiples of each of these vectors to arrive at a target vector (which is the set of frequency band intensities that make up your target audio's sound)

However, this approach relies on being able to generate the frequency bands of each NX sound.

SP4CEBAR 2024-02-02 09:01

You can solve this math problem (a system of equations) using the reduced row echelon form of a matrix in which each column is one of the vectors, with the target vector added as the last column.
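
A minimal sketch of that row reduction, using exact fractions to avoid rounding trouble. The two "sample" vectors and the target below are made-up numbers with only 2 frequency bands, just to show the mechanics: the columns are sample 1, sample 2, and the target, and the last column of the result holds the multiples.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix (a list of rows)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivot_row = 0
    for col in range(len(rows[0])):
        if pivot_row == len(rows):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column is dependent on earlier ones
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * p
                           for x, p in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows

# Hypothetical data: sample_1 = (2, 1), sample_2 = (1, 3), target = (5, 5).
augmented = [[2, 1, 5],
             [1, 3, 5]]
solved = rref(augmented)
print([row[-1] for row in solved])  # multiples of sample_1 and sample_2
```

Here the result says target = 2*sample_1 + 1*sample_2. With measured band intensities an exact solution will rarely exist, so a least-squares fit may be the more practical variant.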

SP4CEBAR 2024-02-04 16:50 (Edited)

The system to be solved will look something like this:
vec_target_audio = a*vec_sample_1 + a*f*vec_sample_1_frequency + b*vec_sample_2 + b*f*vec_sample_2_frequency + ...

All single-letter variables are scalars that can be solved for, as long as there are no more scalars than vector dimensions and the vectors aren't dependent (a dependent vector adds no useful new information to the system).

f is the frequency, using the full 16 bits that NX offers.

The other scalars are the volumes of the sounds: they will be reduced to a 4-bit-value after the system has been solved.

Each vector (vec) contains the information of an audio sample, or it contains the changes that happen as the frequency increases. All vectors have as many dimensions as there are frequency bands (this number is chosen, and will likely be 8 or less).

Each sample is a wave type. Including all pulse widths, I count 4+15=19 wave types, so to solve this system I may need a lot more frequency bands, and I need to make sure that enough vectors add new information to the system.

This expression assumes that raising the volume of a sample makes all frequency bands louder by the same factor.

The two vectors of each sample probably have to be integrated into a single vector with elements like: "2 + 5*f", otherwise it may be harder to solve.

I think the system can almost be solved with exactly 19 frequency bands (vector dimensions) and 19 independent vectors. If some of the vectors are dependent, then the number of vectors and frequency bands needs to be reduced. The problem is that:

having more vectors than dimensions means that there are too many scalars for this system and that some of them can't be solved
having more dimensions than vectors means that there are not enough scalars to be able to solve all dimensions, and there will be no solution to this system.

So this system can be solved like this for all scalars except "f"

SP4CEBAR 2024-02-06 03:51

Having too many solutions is more favorable than having no solutions. It isn't that much of a problem, as it gives some play to the function that truncates the volumes down to 4-bit values and limits the voices to 4.
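
Such a truncating function might look something like the sketch below. It assumes the solved volumes are normalized to the range 0.0 to 1.0 and simply keeps the loudest voices; the function name, the sound labels, and that selection rule are all made up for illustration, and a smarter version could use the spare solutions to pick a better set.

```python
def quantize_voices(volumes, max_voices=4, levels=16):
    """Keep the loudest voices and round their volumes to integer levels.

    `volumes` maps a sound name to a solved volume, assumed to be 0.0..1.0.
    """
    loudest = sorted(volumes.items(), key=lambda item: item[1], reverse=True)
    result = {}
    for name, volume in loudest[:max_voices]:
        clamped = min(max(volume, 0.0), 1.0)  # guard against out-of-range values
        result[name] = round(clamped * (levels - 1))  # 16 levels -> 0..15
    return result

solved_volumes = {"saw": 1.0, "tri": 0.4, "noise": 0.2,
                  "pulse8": 0.8, "pulse2": 0.05}
print(quantize_voices(solved_volumes))
```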

McPepic 2024-02-06 16:41

I don’t really know much about it, but would the LFO be useful at all in a program like this?

SP4CEBAR 2024-02-08 16:32

If you encoded it as an NX music file, the LFO would allow you to reach more frequency values than you could otherwise. Outside of music files, the LFO isn't that useful because we have access to the 16-bit frequency registers.