perceptual loss...



well, for anyone that cares:
I discovered that in my tweaked out jpeg encoder (where I was getting
notably smaller filesizes, and didn't yet have a working decoder), there was
a bug that was likely messing up the results, as it would tend to omit a lot
of the coefficients...

true results are much less impressive...

then again, one never knows for sure what really works and what doesn't
until there is a working decompressor...


now, here is what I was battling against:
trying to figure out how I can get tolerable compression from a variation of
a linear predictor absent perceptible loss (at the "high-quality" end).

easier said than done it seems.
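
for reference, a minimal sketch of the general idea (not my actual codec; the fixed second-order predictor, the `step` parameter, and the function names are all assumptions picked for illustration). each sample is predicted from the previous two reconstructed samples, and only the quantized prediction error is kept:

```python
def encode(samples, step=1):
    """Return quantized prediction residuals for integer samples.

    step=1 is lossless; a larger step quantizes harder (lossy).
    """
    prev2, prev1 = 0, 0
    residuals = []
    for s in samples:
        p = 2 * prev1 - prev2          # straight-line extrapolation
        q = round((s - p) / step)      # quantized prediction error
        residuals.append(q)
        recon = p + q * step           # what the decoder will reconstruct
        prev2, prev1 = prev1, recon    # predict from decoder-visible data
    return residuals


def decode(residuals, step=1):
    """Invert encode() by running the same predictor."""
    prev2, prev1 = 0, 0
    out = []
    for q in residuals:
        p = 2 * prev1 - prev2
        recon = p + q * step
        out.append(recon)
        prev2, prev1 = prev1, recon
    return out
```

with `step=1` the round trip is exact; with a larger `step` each reconstructed sample is off by at most `step / 2`, and that error is what one ends up hearing. the residuals would still need entropy coding to actually save any space.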

it seems hearing is very sensitive to even minor tweaks (such as sample
rate, mono vs stereo, ...), and even simple filtering tricks (for example,
partly averaging the left and right channels, or applying a "blur" filter
on the input samples) are noticeable.
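
a rough sketch of what these two tricks look like (the function names, the `amount` parameter, and the [1, 2, 1] / 4 kernel are assumptions for illustration, not necessarily what I used):

```python
def blend_channels(left, right, amount=0.5):
    """Pull each channel toward the mid signal (L+R)/2.

    amount=0 leaves the stereo image intact, amount=1 collapses to mono.
    """
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2
        out_l.append(l + amount * (mid - l))
        out_r.append(r + amount * (mid - r))
    return out_l, out_r


def blur(samples):
    """3-tap [1, 2, 1] / 4 smoothing filter, with the edges clamped."""
    n = len(samples)
    out = []
    for i in range(n):
        a = samples[max(i - 1, 0)]
        c = samples[min(i + 1, n - 1)]
        out.append((a + 2 * samples[i] + c) / 4)
    return out
```

both make the signal easier to predict, but the blur also wipes out the highest frequencies (a sample-to-sample alternating input like 1, -1, 1, -1 comes out as 0 in the middle), which fits with it being audible on music.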


so, I guess, if the goal is music, and the goal is being perceptibly
lossless, then maybe this is not a great task for linear filtering.

it seems to be fine when "quality" is not a concern (or one's input is
either voice, or mechanical noise, or explosions, ...).

alternatively, it should be ok if one assumes that one is watching generic
tv type stuff (talking, ...), on generic tv speakers. in this case, the loss
should be fairly well hidden.

main trick then will just be to downsample to 11kHz and hope no one
notices...
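
a minimal sketch of that downsampling step, assuming 44.1kHz input and a factor of 4 (which gives 11.025kHz); the box average and the `downsample` name are assumptions for illustration:

```python
def downsample(samples, factor=4):
    """Average each run of `factor` samples into one output sample.

    Any leftover samples at the end that do not fill a whole run are dropped.
    """
    out = []
    for i in range(0, len(samples) - factor + 1, factor):
        block = samples[i:i + factor]
        out.append(sum(block) / factor)
    return out
```

a box average is only a crude low-pass filter, so a proper resampler would filter more carefully before dropping samples. either way, everything above roughly 5.5kHz (half the new sample rate) is gone afterwards.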


so, why the issue:
it is my general observation that most mdct codecs are rather complicated...

if one has doubts, they can look over the source for any such codec.

this does not seem like something one can pull off in a few hundred loc
(for comparison: at my starting point, both the encoder and decoder fit
within about 400 loc, and this later version fits in about 700 loc,
excluding most of the code for managing the bitstream and entropy coding).


but what is gained from mdct:
being perceptibly lossless at reasonable bitrates, for high-quality inputs.

now, I guess it is very different at the other end, as many of the mdct
codecs seem to break down at 4kbps, at least with voice. linear prediction
seems better here (mdct gives a broken glass sound, linear prediction gives
muffled buzzing).

just, at 200kbps (and 44.1kHz, 16bit stereo) a linear predictor apparently
can't pull off music that isn't muffled and gritty, which is lame...
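
to put the 200kbps figure in perspective, here is the raw PCM arithmetic as a small sketch (the `pcm_kbps` helper is just a made-up name for illustration):

```python
def pcm_kbps(rate_hz, bits, channels):
    """Bitrate of uncompressed PCM in kbps (1 kbps = 1000 bits per second)."""
    return rate_hz * bits * channels / 1000


cd_rate = pcm_kbps(44100, 16, 2)   # 44.1kHz, 16bit stereo
ratio = cd_rate / 200              # compression needed to hit 200kbps
low_rate = pcm_kbps(8000, 8, 1)    # 8kHz 8bit mono
```

so 44.1kHz 16bit stereo is 1411.2kbps raw, and 200kbps means squeezing it by about 7:1. for comparison, raw 8kHz 8bit mono is only 64kbps to begin with.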


and so, my best bet, at least for voice and sound effects, is the arguably
lame codec I had before. it will encode music at about 30kbps, and it sounds
kind of lame, but I can just rest easy because I am not using it for storing
music, and at least for what it does do, it sounds "better" than mp3...

my old codec, apparently at least, seems perceptually lossless if one's
input happens to be 8kHz 8bit mono (the default minor levels of quantization
being more or less unnoticeable...).

one can also go lower, eg, 4 or 2kHz, but 8kHz is poor enough quality...


so, my conclusion:

likely, the 2 technologies are different enough that they can't be
meaningfully compared directly, or at least outside some particular usage
domain.

at the high-quality end, it seems mdct codecs are superior;
at the low-quality end, I still opt for linear predictors (as at least they
are free of the broken glass sound).


others may think differently, but this still seems to be my experience...


or something...

