Re: Employing a set of data files guaranteed to be available at the time of decompression
- From: industrial_one@xxxxxxxxxxx
- Date: Fri, 13 Jul 2007 20:50:35 -0700
On Jul 13, 6:54 pm, Reality Handbook <realityhandb...@xxxxxxxxx>
wrote:
On Jul 13, 3:48 pm, industrial_...@xxxxxxxxxxx wrote:
[soapbox] Errrmmm...I'll do my best to take that to have been intended
as a compliment, but I'll admit remarks like this are one of my pet
peeves. They seem unnecessary and just serve to ruffle feathers.
lol, dude, I wasn't being sarcastic. You just strike me as a
generally intelligent guy who's new to the field of data compression
(What the hell, so am I, I'm still in high school for fux sake) so I
thought people would definitely benefit if you mastered data
compression and improved the current faulty solid compression
schemes.
Compression is a somewhat niche topic that I only know the basics
about, which is why I came to solicit the experience of the group.
But I know a lot of math, engineering, and yes--programming. I'm
grateful for your shared experience, but...please don't condescend
while offering it! [/soapbox]
Ok, bygones. Back on topic!
No condescension intended. Sorry if I sounded like some assholes we
get on comp.compression *cough*Guenther*cough* every now and then.
Now *that's* the kind of information I am most curious about and why
I'm posting here. Do you have any idea why these solid compression
methods are failing to achieve the goal? Which programs were you
trying to use that failed? Even if the file were large and had
I was using WinRAR and 7-Zip, the only good archivers I know that
feature solid archiving. Both of them failed, even with more
aggressive settings.
repeating patterns, it seems if they were using a small moving window
or not enough passes, they might have missed the repetitions in the
ROMs because they were too far apart inside the file to exploit.
Yeah, that would be my guess too. These 2 files that failed to
compress were not ROMs though, they were EXE patches to another larger
program. The first (older) one is roughly 23000 KB and the other
(updated) one is 23015 KB. I examined them with Hex Workshop to
identify the precise differences between them, and I only got about
150 differing regions (I think), averaging 300 bytes each.
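For anyone without a hex editor handy, here is a minimal Python sketch of that kind of comparison. It is not what Hex Workshop does internally: it leans on the standard library's difflib.SequenceMatcher, which is only practical on small buffers (it would be far too slow on 23 MB files). The helper name count_diff_runs and the synthetic data are made up for illustration.

```python
import difflib
import random


def count_diff_runs(old: bytes, new: bytes) -> list:
    """Return (kind, size) for every region where the two buffers differ.

    kind is 'replace', 'delete' or 'insert'; size is the larger of the two
    byte counts involved, so an insertion counts by its inserted length.
    """
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    runs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            runs.append((tag, max(i2 - i1, j2 - j1)))
    return runs


# Synthetic stand-ins for the two patches: byte values 0..199 only, so the
# edited regions (values 200..255) can never match the original by accident.
rng = random.Random(0)
old = bytes(rng.randrange(200) for _ in range(4000))
changed = bytes(rng.randrange(200, 256) for _ in range(10))
added = bytes(rng.randrange(200, 256) for _ in range(5))
new = old[:1000] + changed + old[1010:3000] + added + old[3000:]

runs = count_diff_runs(old, new)
print(len(runs), "differing regions:", runs)
```

Unlike a plain byte-by-byte comparison, this resynchronizes after an insertion, which matters here because the two patches are not the same length.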
Individually they cannot be reduced because they have already been
compressed, but since the second file is a nearly identical version of
the first, the pair SHOULD compress to about half its combined size
with solid compression. But it still only compressed like 1%.
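The window guess can be tried out without the real patches. By the figures above, only about 150 x 300 bytes (roughly 45 KB) differ between two 23 MB files, so a compressor that can still "see" the first file while coding the second should land near 50%. The Python sketch below uses random bytes as a stand-in for the already-compressed patches and the standard lzma module with two dictionary sizes. The sizes, the helper name packed_size and the synthetic data are assumptions for illustration, not a reconstruction of what WinRAR or 7-Zip were actually configured to do.

```python
import lzma
import random


def packed_size(data: bytes, dict_size: int) -> int:
    """Compressed size of data using LZMA2 with the given dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


rng = random.Random(1)
size = 256 * 1024
# Incompressible stand-in for the older patch.
old = rng.getrandbits(8 * size).to_bytes(size, "little")
# "Updated" patch: same bytes except for a few small overwritten regions.
new = bytearray(old)
for offset in (10_000, 90_000, 200_000):
    new[offset:offset + 300] = rng.getrandbits(8 * 300).to_bytes(300, "little")
solid = old + bytes(new)

small = packed_size(solid, 64 * 1024)    # dictionary shorter than one file
large = packed_size(solid, 1024 * 1024)  # dictionary spans both files
print(f"input {len(solid)}  64 KiB dict: {small}  1 MiB dict: {large}")
```

If the real archives behave like the small-dictionary case, that would point at the dictionary being smaller than the distance between the two copies rather than at solid mode as such.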
I'd be very interested to see more empirical data about people who've
tried to solidly compress very "obviously" redundant data (like your
ROMs) and seen little benefit. Perhaps there is a comparison table
Speaking of which, you reminded me of something: I DO have a romset of
48 files at 32 MB each, about 1.5 GB in total. Each one compresses to
about 50% individually, so a non-solid archive would be about 750 MB.
Since most of the files are almost identical, I tried solid
compression just to see what I would get. To my surprise WinRAR only
reduced it to 100+ MB; I had expected a much bigger reduction than
that. I then realized I had overestimated the RAR -solid technique
once I gained more in-depth knowledge about it. It was simpler than I
thought: there is no clever choice of grouping when dealing with the
files... rather it just combines the files and applies the same
LZ-type dictionary algorithm over them all, with additional steps to
be able to distinguish the files on decompression. Think of it like a
dictionary file containing thousands of rows of words: if they are
not sorted alphabetically the .txt file compresses 40%, but when
sorted alphabetically it reduces 60%. So because RAR -solid is an
early prototype of the technique, it's faulty. 7-Zip, however, is more
advanced, as it seems to use certain files as the base, and if the
following files do not compress well, a new base is set. So I tried
7-Zip with aggressive settings and got 1.5 GB down to... 55 MB, about
28:1. I was still not satisfied, I expected not more than 32, but
since one of the 48 files had about 25% of its contents deviating from
the typical range, I thought it was realistic.
I'm still not convinced, though, why the *** can't I compress these 2
patch.exe files???
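The sorted-versus-unsorted word list effect is easy to try. A rough Python sketch, assuming a made-up word list built from syllables (not a real dictionary file) and zlib's DEFLATE rather than RAR's algorithm, so the percentages it prints will not match the 40% and 60% quoted above:

```python
import itertools
import random
import zlib

# Made-up "dictionary": every three-syllable combination, one word per line.
syllables = ["ba", "ce", "di", "fo", "gu", "ha", "je", "ki", "lo", "mu", "na", "pe"]
words = ["".join(parts) for parts in itertools.product(syllables, repeat=3)]

shuffled = words[:]
random.Random(2).shuffle(shuffled)


def packed_percent(lines: list) -> float:
    """Percentage by which DEFLATE shrinks the newline-joined word list."""
    raw = "\n".join(lines).encode("ascii")
    return 100.0 * (1 - len(zlib.compress(raw, 9)) / len(raw))


unsorted_saving = packed_percent(shuffled)
sorted_saving = packed_percent(sorted(words))
print(f"unsorted: {unsorted_saving:.0f}% smaller, sorted: {sorted_saving:.0f}% smaller")
```

Sorting puts words with shared prefixes next to each other, so the compressor finds its matches at short, repetitive distances. The same reasoning applies to the order in which files are fed into a solid archive.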
somewhere of how different algorithms have fared in showing benefits
from solid compression? It would be easy enough to make one by
picking a few sets of ROMs, PDFs, TXTs and seeing what the gains from
enabling solid compression were using popular methods. If no one has
made such a table I might give it a try.
-R
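A comparison like the one proposed above could start from something as small as the following Python sketch. It compares compressing each file on its own against compressing the concatenation (a crude stand-in for a solid archive) using the standard lzma module. The function name solid_vs_separate and the synthetic "ROM set" are placeholders invented here; a real table would use actual ROMs, PDFs and TXTs and each archiver's own solid option.

```python
import lzma
import random


def solid_vs_separate(files: dict) -> tuple:
    """Return (separate_total, solid_total) compressed sizes in bytes.

    'separate' compresses every file on its own; 'solid' compresses the
    files concatenated in name order, so later files can reference earlier ones.
    """
    separate = sum(len(lzma.compress(data)) for data in files.values())
    solid = len(lzma.compress(b"".join(files[name] for name in sorted(files))))
    return separate, solid


# Synthetic set: four 64 KiB "ROMs" that share most of their content.
rng = random.Random(3)
base = bytearray(rng.getrandbits(8 * 65536).to_bytes(65536, "little"))
romset = {}
for n in range(4):
    variant = bytearray(base)
    variant[n * 1000:n * 1000 + 200] = bytes(200)  # small per-file change
    romset[f"rom{n}.bin"] = bytes(variant)

separate, solid = solid_vs_separate(romset)
print(f"separate: {separate} bytes  solid: {solid} bytes  gain: {separate / solid:.1f}x")
```

Running the same function over several file sets and collecting the two numbers per set would give the rows of the table.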
I conducted some experiments back on May 6, 2005, the day I discovered
this awesome feature, and have run at least 1000 tests since.
Unfortunately, all of it was lost when my HD malfunctioned; I had
backed up EVERYTHING except that :(
So, by all means you should. If you want I'll even send you the two
fuckers who refuse to be reduced.
-The Industrial One