Re: Random Ideas



'Faqs' thread for compression queries.
http://www.faqs.org/faqs/compression-faq/part1/section-8.html

From Wiki article
[http://en.wikipedia.org/wiki/Lossless_data_compression]

Limitations:-

Lossless data compression algorithms cannot guarantee compression for
all input
data sets. In other words, for any (lossless) data compression
algorithm, there will be
an input data set that does not get smaller when processed by the
algorithm.
This is easily proven with elementary mathematics using a counting
argument, as follows:

* Assume that each file is represented as a string of bits of some
arbitrary length.
* Suppose that there is a compression algorithm that transforms
every file into a distinct file
which is no longer than the original file, and that at least one file
will be compressed
into something that is shorter than itself.
* Let M be the least number such that there is a file F with
length M bits
that compresses to something shorter.
Let N be the length (in bits) of the compressed version of F.
* Because N < M, every file of length N keeps its size during
compression.
There are 2N such files. Together with F, this makes 2N + 1 files
which all compress into one of the 2N files of length N.
* But 2N is smaller than 2N + 1, so by the pigeonhole principle
there must be some file of length N which is simultaneously the outpu
of the compression function on two different inputs.
That file cannot be decompressed reliably (which of the two originals
should that yield?),
which contradicts the assumption that the algorithm was lossless.
* We must therefore conclude that our original hypothesis
(that the compression function makes no file longer) is necessarily
untrue.

Any lossless compression algorithm that makes some files
shorter must necessarily make some files longer,
but it is not necessary that those files become very much longer.
Most practical compression algorithms provide an "escape" facility
that can turn off the normal coding for files that would become
longer
by being encoded. Then the only increase in size is a few bits to tell
the
decoder that the normal coding has been turned off for the entire
input.
For example, DEFLATE compressed files never need to grow by more
than
5 bytes per 65,535 bytes of input.

In fact, if we consider files of length N, if all files were equally
probable,
then for any lossless compression that reduces the size of some file,
the expected length of a compressed file (averaged over all possible
files of length N) must necessarily be greater than N. So if we know
nothing about the properties of the data we are compressing,
we might as well not compress it at all. A lossless compression
algorithm is only useful when we are more likely to compress
certain types of files than others; then the algorithm could be
designed to compress those types of data better.

Thus, the main lesson from the argument is not that one risks big
losses,
but merely that one cannot always win. To choose an algorithm always
means implicitly to select a subset of all files that will become
usefully shorter.
This is the theoretical reason why we need to have different
compression
algorithms for different kinds of files: there cannot be any
algorithm that is good for all kinds of data.

The "trick" that allows lossless compression algorithms,
used on the type of data they were designed for,
to consistently compress such files to a shorter form is that the
files
the algorithm are designed to act on all have some form of
easily-modeled redundancy that the algorithm is designed to remove,
and thus belong to the subset of files that that algorithm can make
shorter,
whereas other files would not get compressed or even get bigger.
Algorithms are generally quite specifically tuned to a particular
type of file:
for example, lossless audio compression programs do not work well on
text files, and vice versa.

In particular, files of [ RANDOM] (section1.see below for my
explanation )
data cannot be consistently
compressed by any conceivable lossless data compression algorithm:
indeed, this result is used to define the concept of randomness in
algorithmic complexity theory.

There have been many claims through the years of companies achieving
'perfect-compression'
where an arbitrary number of random bits can always be compressed to
N-1 bits.
This is, of course, impossible: if such an algorithm existed,
it could be applied repeatedly to losslessly reduce any file to length
0.
These kinds of claims can be safely discarded without even looking at
any
further details regarding the purported compression scheme.

An algorithm that is asserted to be able to losslessly
compress any data stream is provably impossible.[2] In a more general
sense,
any compression algorithm whose proposed properties contradict
fundamental laws of mathematics may be called magic.

Mathematical background:-

Any compression algorithm can be viewed as a function that maps
sequences of units (normally octets)
into other sequences of the same units.
Compression is successful if the resulting sequence is shorter than
the original sequence.
In order for a compression algorithm to be considered lossless,
there needs to exist a reverse mapping from compressed
bit sequences to original bit sequences; that is to say,
the compression method would need to encapsulate a bijection between
"plain" and "compressed" bit sequences.

The sequences of length N or less are clearly a strict superset
of the sequences of length N-1 or less. It follows that there are
more
sequences of length N or less than there sequences of length N-1 or
less.
It therefore follows from the pigeonhole principle that it is not
possible to map
every sequence of length N or less to a unique sequence of length N-1
or less.
Therefore it is not possible to produce an algorithm that reduces the
size of every possible input sequence.

Psychological background:-

Most everyday files are relatively 'sparse' in an information entropy
sense, and thus,
most lossless algorithms a layperson is likely to apply on
regular files compress them relatively well.
This may, through misapplication of intuition,
lead some individuals to conclude that a well-designed compression
algorithm
can compress any input, thus, constituting a magic compression
algorithm.

Points of application in real compression theory

Real compression algorithm designers accept that streams of high
information entropy cannot be compressed, and accordingly,
include facilities for detecting and handling this condition.
An obvious way of detection is applying a raw compression
algorithm and testing if its output is smaller than its input.
Sometimes,
detection is made by heuristics; for example, a compression
application may consider
files whose names end in ".zip", ".arj" or ".lha" uncompressible
without any more
sophisticated detection. A common way of handling this situation is
quoting input, or
uncompressible parts of the input in the output, minimising the
compression overhead.
For example, the zip data format specifies the 'compression method'
of 'Stored' for input files that have been copied into the archive
verbatim.[3]

The Million Random Number Challenge

Mark Nelson, frustrated over many cranks trying to claim having
invented a magic
compression algorithm appearing in comp.compression,
has constructed a 415,241 byte binary file ([1]) of highly entropic
content,
and issued a public challenge of $100 to anyone to write a program
that,
together with its input, would be smaller
than his provided binary data yet be able to reconstitute
("decompress") it without error.[4]

The FAQ for the comp.compression newsgroup contains a challenge
by Mike Goldman offering $5,000 for a program that can compress
random data.
Patrick Craig took up the challenge, but rather than compressing the
data,
he split it up into separate files all of which ended in the number '5
' which was not stored as part of the file. Omitting this character
allowed
the resulting files (plus, in accordance with the rules, the size of
the program
that reassembled them) to be smaller than the original file. However,
no actual compression took place, and the information stored in the
names of the files was necessary in order to reassemble them in the
correct order into the original file, and this information was not
taken into
account in the file size comparison. The files themselves are thus
not
sufficient to reconstitute the original file, the file names are also
necessary.
A full history of the event, including discussion on whether or not
the challenge
was technically met, is on Patrick Craig's web site
--------------------------------------------------------------------------------------------
//**This is from wiki article**//
-------------------------------------------------------------------------------------------
One notorious thing you have to concentrate is about data is
'random'

My explanation for the section1.
let's first see this data
case1)1 1 1 1 0 0 1 0 1 1 0 1 0 1 0 0 0 1 0 1 0 0 1 1 [no doubt total
random.]

case2) 4 8 10 12 16 20 22 24 26 30 32 36 40 44 48 5
[what will you call this type of data?]
observe it carefully and write your conclusion for the case 2.

Thanks.
will contd........
.



Relevant Pages

  • Re: IFS Recess
    ... The new jpeg standard rejected fractal compression in favor of other ... it was appropriate to revisit the lossless coding mode within JPEG. ... This mode had been a late addition to the standard, ... A number of contenders for a suitable algorithm were proposed, ...
    (sci.fractals)
  • Recompressing poorly compressed files
    ... compression algorithm was not very good, and the 2nd one is very ... just compressing with a good algorithm in the first place of course. ... algorithms for compression and decompression, or even if the're a bit ... reversible (e.g. 'decompressing' a gif into a bitmap means losing some ...
    (comp.compression)
  • Re: Computer being developed modeled after human brain
    ... innate skills. ... run-length-encoding compression module. ... If the "edge detection" algorithm is not useful for ... edge detection algorithm will be replaced. ...
    (comp.ai.philosophy)
  • =?windows-1252?Q?Re=3A_Artificial_intelligence_and_Landauer=92s_princip?= =?windows-1252?Q?l
    ... Not so much a small amount of hardware rather a lot ... This is basically a data compression problem applied ... to a learning algorithm. ... understand how evolution evolved. ...
    (comp.ai.philosophy)
  • Re: Lossy image recompression
    ... compression and then to recompress it with no loss using this same ... and reflect JPEG images. ... These operations are truly lossless, but they achieve this by never decompressing far enough to reach a lossy phase of the algorithm. ...
    (comp.compression)

Loading