Next set of vetting
- From: Einstein <michaelhh@xxxxxxxxx>
- Date: Mon, 25 Aug 2008 17:35:14 -0700 (PDT)
This is some stuff to further vet, so that I can get past this stuff.
Patent Pending, so I am protected, but I am sticking to my Wednesday
release date for the method. This is to help get stuff out of the way
for when the release is done for fastest understanding. Note for most
in this forum the principles of some of this will be to easy, but
frankly I WANT TO MAKE SURE I CAN GET THE CORE TO BE FOCUSED ON!!!
A break down of the different portions of the invention, known as
Harrington's Compression Method, or HCM for the layman.
~~~~~~~~
The First Fundamental is the Huffman.
A Huffman is a way for a computer to change a data line, and be able
to go back and change it back with out any errors, loss, or the likes.
It does this through identifying the following bit in a sequence of
bits until it recognizes a pattern.
This can be a typical Huffman
11 = 0
10 = 10
01 = 110
00 = 111
A Huffman can be patterned in many ways, and many lengths. The above
example works on switching 1's for 0's, and if 11 results outnumber 01
and 00 results it will also result in compression. A Huffman is a very
adaptable format, with many advantages to compression.
In the example above if the Huffman sees a 0 as the first bit it is
currently examining, it know it is actually a result and can end
identifying this group there. If the next bit is a 1, then it knows to
seek the following bit out and identify it. If it is a 0, then the
final result is a 10, and it ends that group there. Each is
identifiable in turn. Reversal of a Huffman is both easy and exacting.
There is no errors so long as no corruption occurred in the code.
~~~~~~~~~~
The Second Fundamental is Pascals triangle.
Pascals triangle has applications in many many different fields. For
our purposes the Triangle is a perfect tool for identifying the odds
of certain outcomes in certain sized files.
A triangle is made rather easily. The first two results are 11, and
are applied to the top of the triangle.
1-1
The left most, and right most result are always 1, however for those
used to a table, we can say the left is always a 1.
Therefore our Triangle starts to look like this:
1-1
1-
1-
1-
1-
now the second row down, 2nd column and on, operate in this manner:
Add the bit directly above the desired result, and the bit to the left
of that one.
1-1
1-2-1
1-3-3-1
1-4-6-4-1
1-5-10-10-5-1
This will always result in a SUM count over a row equal to the total
possible outcomes of that many bits, as represented in column 2. This
is always true. The row can be looked at like this. Assuming 0 is the
left, and 1 is the right, the first number represents the number of
outcomes with all 0's, the next is the number of bits, -1, that are
0's with +1 of the 1's. The 2nd column is all zero's minus two, and
plus two ones. This goes on until there is no more 0's. Therefore at
100 bits the area with 30 total 0's and 70 total 1's will have around
29,372,339,821,610,900,000,000,000 (with some rounding occurring
(darned maximums of OpenOffice and Excel) outcomes. As we progress in
total number of bits we end up with a odds of occurrence of most
outcomes (to a degree which rounds to 100% just by the decimal
displacement generated) to lay in/close to, the 50% range, with a
deviation of less than 1% in either direction.
This is a basic law of large software, so long as it is entropic, or
random. If the software is text based, largely non-random, or is
highly imbalanced in a low bit ratio, then it will fall outside this
norm.
~~~~~~~
The Third Fundamental is Separate Filing
This is a new concept to some extent. In typical compression the usage
of multiple files, of multiple sizes, of multiple possible names,
results in a potential compression method, for extremely imbalanced in
ratio levels information. It is extremely ineffective, and unlikely to
compress. The problem ultimately is an issue of the command section,
or real size of a factual file, to hold the data separated.
The numbers feel impressive, for this can be an example:
Assumption, up to 3 files, 1 to 3 bits in each. Each file therefore
has 14 outcomes. This is 14 * 14 * 14 + 14 * 14 + 14 * 14 + 14 * 14 +
14 + 14 + 14, or a total of 3374. This is 11.7202 bits in ratio. Where
we had 1 to 3 bits, with an average of 2.57 bits per file, of 1 to 3
files, and a predictable number of around 2.5 files averaged. This is
7.19 bits averaged! It seems wonderful... until you need to count how
many bits per which file. The command section requires 1 bit per file
to say if it is active, 1 bit per an active file to identify if there
is 1 bit or 2 bits, and this adds up fast. So fast that it out does
any gains statistically, and renders itself moot.
However if a means existed to sort information to different files,
without an overhead cost to keep track of this information then a huge
increase would occur in the possible data being stored per bit. As
would the ability to have different numbers of bits in a file tracked
without cost. Either way would be an increase, together would be a
dramatic increase in the capacity for our purposes.
~~~~~~~~~
The Fourth Fundamental is perfect tracking
If Step A tells you to do 00 = 11 then step B tells you 11 = 10, and
your command section is AB, then if your code was 001110 you get
instead 100011 as your final results. Reversing it is as simple as
doing the reverse, BA.
~~~~~~~~~
The Fifth Fundamental is Lossy versus Lossless
A lossy code system can mistake two results for each other. An example
of a lossy encoder would be one which takes a word, or words, it knows
in context, and tries to repair them. An example is “If then” type
statements. If the If then is in a programming context then the
software may be able to just predict a lot of the system, but if it
happens to be in a text portion it may lose some of the text to
errors.
A lossless system never makes a mistake in the code. There is no means
to create a mistake as steps are clearly written to produce step B
after step A, and with perfect accountability. A Huffman is a lossless
algorithm due to the fact it can be flawlessly reversed.
~~~~~~~~~~~~
The Sixth Fundamental is Ratio's
Ratio's can play an important part in compression. It is possible to
have anywhere from 100% ones, to 0% ones, versus a 0% zero's to 100%
zero's. This can lead to wildly different variables when talking two
bit outcomes, even if both are 50%. An example. You might have 50/50
1's to 0's, but 20% - 11, 30% - 10, 30% - 01, 20% 11. This would be a
ratio imbalance.
It is possible to create an imbalance from a pure statistical evenness
via any number of means. For instance a Huffman with 11 = 111, 10 =
110, 01 = 10, 00 = 0 will result in a statistical imbalance of 66.7%
total 1's versus 33.3% total 0's
~~~~~~~~~~~~
I will not go into Arithmetic encoding, bijective encoding, and other
compression techniques because frankly I do not need to. These are
better than Huffman is, by a good measure, but they are not needed to
validate the method.
.
- Follow-Ups:
- Re: Next set of vetting
- From: Mark Nelson
- Re: Next set of vetting
- From: Thomas Richter
- Re: Next set of vetting
- From: Einstein
- Re: Next set of vetting
- Prev by Date: Re: A new thing, Peer review time
- Next by Date: Re: Next set of vetting
- Previous by thread: Preliminary vetting
- Next by thread: Re: Next set of vetting
- Index(es):
Relevant Pages
|