Re: Some questions



Does this size change if it
(contents) is written as text, or as binary?
In all cases the file is made of 0s and 1s. "Text" and "binary" only
describe how those bits are interpreted, so the way the data is
organized does not by itself change the file size.

2) What about the title of the file for space as well? Is there a
specific size in bits each character is represented as? Perhaps 8? In
the new file I mean.
That depends on the filesystem: it decides which characters are
allowed in file names and how they are represented. One byte per
character is enough for ASCII, but modern systems use more flexible
encoding schemes (you can find good articles on UTF and Unicode).
Briefly, some encodings use a fixed 2 or even 4 bytes per character,
while others use variable lengths. With those, the most common
characters (for West European languages) fit in the first byte, and
extended characters, ideograms, pictograms etc. take 2, 3 or even 4
bytes. This has the double benefit of saving space when encoding the
most common words (from a West European perspective) and, in theory,
leaving room for any conceivable need for more alphabets.
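
To make the fixed versus variable length point concrete, here is a small sketch in Python. The helper name encoded_sizes and the four sample characters are my own choices for illustration, and what a given filesystem actually stores for a name may differ.

```python
def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),      # variable length: 1 to 4 bytes
        len(ch.encode("utf-16-le")),  # 2 bytes, or 4 outside the basic plane
        len(ch.encode("utf-32-le")),  # always 4 bytes
    )

# Sample characters (illustrative): plain ASCII, an accented Latin letter,
# the euro sign, and an emoji outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8, utf16, utf32 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: utf-8={utf8} utf-16={utf16} utf-32={utf32}")
```

The ASCII letter takes one byte in UTF-8 and the emoji takes four, while UTF-32 spends four bytes on every character.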

3) Is there a limit to file name sizes?
It depends on the filesystem. Many common filesystems limit a name to
255 bytes or 255 characters, but this is not universal.
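
If you want to ask the system rather than look it up, something like the following may work on Unix-like systems. It is only a sketch: max_name_length is a name I made up, os.pathconf is not available on every platform, and the value reported is in bytes, not characters.

```python
import os

def max_name_length(directory="."):
    """Return the file name limit (in bytes) for `directory`, or None if unknown."""
    if not hasattr(os, "pathconf"):
        # os.pathconf is Unix-only, so give up elsewhere
        return None
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (OSError, ValueError):
        # the filesystem or platform does not report this limit
        return None

limit = max_name_length(".")
print("file name limit for the current directory:", limit)
```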

4) I understand entropy fully now, so this does have a factor. But can
this statement be true versus entropy? "A file can be made up of
compressible and non-compressible portions (aka strings) and if there
is a way to sort the compressible portions separate of the non-
compressible you can achieve compression on most files."
5) What is the highest known 'compressible rate' for a random binary
sequence? IE, if all data is random, is there a program that can
handle 5%, 10%, 20%, 40% or even 49.99%?
The very problem is that in a purely entropic file you cannot predict
the next bit from the previous ones, so any method you use to sort the
file (or otherwise give it an alternative representation) will be less
efficient than copying the file itself, and the file will be
incompressible.
Most random files have very high entropy and follow no predictable
model, so they will not compress at all but will instead grow in size.
A compression scheme can check for this case and store the data
uncompressed, but the size still increases slightly, since it has to
encode the information "the following data is not compressed" for the
decompressor.
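
You can try this yourself. The sketch below uses Python's zlib module as one example compressor (my choice, not the only option) and compares random bytes from os.urandom with a highly repetitive buffer of the same length. The sizes and the helper name compressed_size are arbitrary.

```python
import os
import zlib

def compressed_size(data):
    """Return the number of bytes zlib needs for `data` at maximum effort."""
    return len(zlib.compress(data, 9))

n = 100_000
random_data = os.urandom(n)          # high entropy: no pattern to exploit
repetitive_data = b"AB" * (n // 2)   # low entropy: trivially predictable

random_out = compressed_size(random_data)
repetitive_out = compressed_size(repetitive_data)

print("random:     %d -> %d bytes" % (n, random_out))
print("repetitive: %d -> %d bytes" % (n, repetitive_out))
```

On the random input the output should come out a little larger than the input, which is the "not compressed" bookkeeping mentioned above, while the repetitive input shrinks to a small fraction of its size.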
