Re: Next set of vetting
- From: Thomas Richter <thor@xxxxxxxxxxxxxxxxx>
- Date: Tue, 26 Aug 2008 09:30:08 +0200
Einstein wrote:
This is some stuff to further vet, so that I can get past this stuff.
Patent Pending, so I am protected, but I am sticking to my Wednesday
release date for the method. This is to help get stuff out of the way
for when the release is done for fastest understanding. Note for most
in this forum the principles of some of this will be too easy, but
frankly I WANT TO MAKE SURE I CAN GET THE CORE TO BE FOCUSED ON!!!
The Second Fundamental is Pascal's triangle.
Pascal's triangle has applications in many, many different fields. For
our purposes the Triangle is a perfect tool for identifying the odds
of certain outcomes in certain sized files.
About two years ago we had a guy here who presented a nice method for
optimally encoding an arrangement of k set bits within a total of n bits
(i.e. encoding one of the (n over k) possible arrangements). The
algorithm was neat, but required a lot of buffering, and its major flaw
was that it was rather unclear how to apply it - i.e. there was no easy
way to design a model for a data source that fits it. I did a couple of
tests with it, and found that even for primitive audio coding an MQ
coder could easily outperform it. By that I mean that you should
probably check for prior art.
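For readers who have not seen this kind of coder, here is a minimal sketch of ranking a bit pattern with k ones among all (n over k) patterns of the same length, which is what Pascal's triangle buys you. This is the textbook enumerative scheme, not necessarily the method that was posted here back then; the function names are made up for illustration, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def rank_combination(bits):
    """Index of `bits` among all equal-length sequences with the same
    number of ones, in lexicographic order with 0 sorting before 1."""
    n, k = len(bits), sum(bits)
    rank = 0
    for i, b in enumerate(bits):
        remaining = n - i - 1  # positions after this one
        if b:
            # Every sequence with a 0 here sorts earlier; those still have
            # to place all k ones into the remaining positions.
            rank += comb(remaining, k)
            k -= 1
    return rank

def unrank_combination(rank, n, k):
    """Inverse of rank_combination for sequences of length n with k ones."""
    bits = []
    for i in range(n):
        remaining = n - i - 1
        zeros_first = comb(remaining, k)  # sequences with a 0 at position i
        if k > 0 and rank >= zeros_first:
            bits.append(1)
            rank -= zeros_first
            k -= 1
        else:
            bits.append(0)
    return bits

def index_bits(n, k):
    """Bits needed for a fixed-length index into the (n over k) patterns."""
    return (comb(n, k) - 1).bit_length()

pattern = [0, 1, 1, 0, 1, 0, 0, 0]
r = rank_combination(pattern)
print(r, index_bits(len(pattern), sum(pattern)))
print(unrank_combination(r, len(pattern), sum(pattern)) == pattern)
```

Note that the decoder must already know n and k, which is exactly the kind of side-information that has to be paid for somewhere.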
The Third Fundamental is Separate Filing
This is a new concept to some extent. In typical compression the usage
of multiple files, of multiple sizes, of multiple possible names,
results in a potential compression method, for extremely imbalanced in
ratio levels information. It is extremely ineffective, and unlikely to
compress. The problem ultimately is an issue of the command section,
or real size of a factual file, to hold the data separated.
The numbers feel impressive, for this can be an example:
Assumption, up to 3 files, 1 to 3 bits in each. Each file therefore
has 14 outcomes. This is 14 * 14 * 14 + 14 * 14 + 14 * 14 + 14 * 14 +
14 + 14 + 14, or a total of 3374. This is 11.7202 bits in ratio. Where
we had 1 to 3 bits, with an average of 2.57 bits per file, of 1 to 3
files, and a predictable number of around 2.5 files averaged. This is
7.19 bits averaged! It seems wonderful... until you need to count how
many bits are in which file. The command section requires 1 bit per file
to say if it is active, 1 bit per active file to identify if there
is 1 bit or 2 bits, and this adds up fast. So fast that it outdoes
any statistical gains, and renders itself moot.
However if a means existed to sort information to different files,
without an overhead cost to keep track of this information then a huge
increase would occur in the possible data being stored per bit. As
would the ability to have different numbers of bits in a file tracked
without cost. Either way would be an increase, together would be a
dramatic increase in the capacity for our purposes.
However, if 1 = 0, I would fly to the moon. The point is that the
counting argument ensures that there is no such means, so this point is
moot in the first place. If you start with an assumption that is false,
you can prove anything (ex falso quodlibet). You need to keep track of
side-information somehow, and that's a matter for the operating system,
and if you do not count this side-information, you get wrong results,
clearly.
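To put numbers on this, here is a small sketch that recounts the quoted example (up to 3 files, 1 to 3 bits each) and compares the information in the whole arrangement with what the file contents alone can hold. The constant and variable names are mine, and the model assumes three distinguishable file slots, which is what the quoted sum implies.

```python
from math import ceil, comb, log2

MAX_FILES = 3
LENGTHS = (1, 2, 3)  # allowed payload sizes in bits

# Contents one file can hold: 2 + 4 + 8 = 14.
per_file = sum(2 ** n for n in LENGTHS)

# Distinguishable states: pick which slots exist, then fill each of them.
states = sum(comb(MAX_FILES, used) * per_file ** used
             for used in range(1, MAX_FILES + 1))

# Any lossless fixed-length description of one state needs this many bits.
needed_bits = ceil(log2(states))

# The most the file contents themselves can ever carry.
max_payload_bits = MAX_FILES * max(LENGTHS)

# Worst-case shortfall: information that has to live in file names,
# lengths and directory entries, i.e. in side-information.
hidden_bits = needed_bits - max_payload_bits

print(states, log2(states), needed_bits, max_payload_bits, hidden_bits)
```

Whatever is left over after subtracting the payload is stored by the file system, not saved, so it has to be charged to the compressed size.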
You should probably look into things like multi-channel coding, maybe
that's related. (Yes, first get in touch with existing technology).
The Sixth Fundamental is Ratios
Ratios can play an important part in compression. It is possible to
have anywhere from 100% ones to 0% ones, versus 0% zeros to 100%
zeros. This can lead to wildly different variables when talking about
two-bit outcomes, even if both are 50%. An example: you might have 50/50
1's to 0's, but 20% - 11, 30% - 10, 30% - 01, 20% - 00. This would be a
ratio imbalance.
It is possible to create an imbalance from a pure statistical evenness
via any number of means. For instance a Huffman code with 11 = 111, 10 =
110, 01 = 10, 00 = 0 will result in a statistical imbalance of 66.7%
total 1's versus 33.3% total 0's.
You forget here that the input of the Huffman code must be "out of
balance" in the first place to make Huffman worth applying. If the input
is IID with a flat distribution, then the Huffman code above would indeed
cause an imbalance of ones and zeros, but you wouldn't apply Huffman to
that data source in the first place. If you apply Huffman to the
source it was modelled for, you will find that the result is in balance.
However, Huffman is not ideal for all sources (arithmetic coding is), so
you can find for "most" sources an out-of-balance output, which is just
another way of expressing the non-optimality of the code. For an optimal
code, no such out-of-balance situation arises (proof: if it were out
of balance, a simple binary arithmetic compressor would shorten the
output file size, making the combined algorithm perform better, which
contradicts the assumed optimality of the initial code. q.e.d.)
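This is easy to check for the code table quoted above. The sketch below computes the expected share of 1 bits in the output, once for a flat source and once for the source this code is matched to, which I take to be the dyadic one with P(symbol) = 2^-(codeword length); that choice of "matched" source and the helper names are my assumptions.

```python
from fractions import Fraction

# Code table from the quoted example: 2-bit symbol -> codeword.
CODE = {"11": "111", "10": "110", "01": "10", "00": "0"}

def expected_length(probs):
    """Expected codeword length in bits per input symbol."""
    return sum(p * len(CODE[s]) for s, p in probs.items())

def ones_fraction(probs):
    """Expected share of 1 bits in the coded output."""
    ones = sum(p * CODE[s].count("1") for s, p in probs.items())
    return ones / expected_length(probs)

# Flat source: every 2-bit symbol has probability 1/4.
flat = {s: Fraction(1, 4) for s in CODE}

# Source the code is matched to: P(symbol) = 2 ** -len(codeword).
matched = {s: Fraction(1, 2 ** len(w)) for s, w in CODE.items()}

for name, probs in (("flat", flat), ("matched", matched)):
    print(name, expected_length(probs), ones_fraction(probs))
```

On the flat source the code also spends more than 2 bits per 2-bit symbol on average, so the imbalance there comes together with an expansion of the data, not a compression.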
I will not go into Arithmetic encoding, bijective encoding, and other
compression techniques because frankly I do not need to. These are
better than Huffman is, by a good measure, but they are not needed to
validate the method.
They are needed from a theoretical standpoint to debunk your method.
Now, please go back to the drawing board.
Thanks,
Thomas