Re: Random access to content of archived files without extraction - how? compression scheme? algorithm? source code? tutorial? software?
- From: "giorgio.tani@xxxxxxxx" <giorgio.tani@xxxxxxxx>
- Date: 30 Jun 2006 00:29:05 -0700
> still failed to find a compression scheme which allows me to tell the
> extractor:
> get from file X.dat in archive Y.ext a 100 byte large section
> beginning at offset 0xFFEEDDCC
> where file X.dat is e.g. a 10 GByte large file (with huge sections
> filled with same bytes) compressed into the archive Y.ext .

(maybe a silly question) Does the offset known by the user refer
to uncompressed data or to compressed data?
In the first case, which seems to me the more probable from the user's
point of view, you can write a very simple compression scheme on top
of whichever compression primitive works best on your data: compress
a fixed-size chunk of input data, store the resulting compressed size
in a variable, and write that variable to the output file followed by
the compressed data.
The compressed data may be structured in blocks like this:
|- compressed size in bytes (a uint) -|- compressed data block -|- etc..
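
As a rough sketch of that layout, something like the following could
write such a file. Note the assumptions: zlib stands in for "the
compression primitive that works best on your data", the size field is
a 4-byte little-endian unsigned integer, and the names compress_blocks
and BLOCK_SIZE (and the 64 KiB value) are made up for illustration.

```python
import struct
import zlib

BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block (assumed value)


def compress_blocks(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst as a sequence of [4-byte size][compressed block] records."""
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        # Each block is compressed on its own, so no state is carried over.
        packed = zlib.compress(chunk)
        # First field: compressed size as a 4-byte little-endian unsigned int.
        dst.write(struct.pack("<I", len(packed)))
        # Second field: the compressed data itself.
        dst.write(packed)
```

Both src and dst are expected to be binary file objects, e.g. opened
with open(name, "rb") and open(name, "wb").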
When you need to recover data at a certain point (expressed as the
position the data occupies in the original uncompressed stream) you
simply read the first field (the size of the block, stored in a
fixed-size field as wide as the variable type you use, 4 bytes in this
example) and then read (and discard) the n bytes of the second field
of the block (the compressed data); after that you are at the first
field of the next block, and so on.
In this way you can parse the first field of each block (and quickly
discard the second field) until you reach the block where the desired
offset (in uncompressed data) begins, then uncompress only the blocks
containing the data you need. Since every block holds the same fixed
amount of uncompressed data, the number of blocks to skip is just the
offset divided by the block size.
This works because, if the compression scheme doesn't keep state from
one block to another (doesn't learn from previous data), you don't
need to uncompress all the previous blocks to uncompress an arbitrary
block in the sequence; you simply "walk" through the compressed stream
using a scheme like this one and then expand only the block(s) where
your data is, saving a lot of computing time.
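
The walk described above might look roughly like this. It is only a
sketch under the same assumptions as the layout (4-byte little-endian
size field, zlib as the primitive, a fixed uncompressed block size);
read_range and pack are hypothetical names, and it seeks past the
compressed data instead of reading and discarding it.

```python
import io
import struct
import zlib


def pack(data, block_size):
    """Build an in-memory archive of [4-byte size][compressed block] records."""
    out = io.BytesIO()
    for i in range(0, len(data), block_size):
        packed = zlib.compress(data[i:i + block_size])
        out.write(struct.pack("<I", len(packed)))
        out.write(packed)
    return out.getvalue()


def read_range(f, offset, length, block_size):
    """Return `length` uncompressed bytes starting at uncompressed `offset`."""
    f.seek(0)
    # Walk over the blocks that lie entirely before the requested offset.
    for _ in range(offset // block_size):
        header = f.read(4)
        if len(header) < 4:
            return b""  # offset lies beyond the end of the archive
        (size,) = struct.unpack("<I", header)
        f.seek(size, io.SEEK_CUR)  # skip the compressed data unread
    # Expand only the block(s) that overlap the requested range.
    start = offset % block_size
    need = start + length
    buf = bytearray()
    while len(buf) < need:
        header = f.read(4)
        if len(header) < 4:
            break  # end of archive: return what is available
        (size,) = struct.unpack("<I", header)
        buf += zlib.decompress(f.read(size))
    return bytes(buf[start:need])


data = bytes(range(256)) * 1000
archive = io.BytesIO(pack(data, 4096))
section = read_range(archive, 5000, 100, 4096)
```

Here section should hold the same 100 bytes as data[5000:5100], and
only one block was actually uncompressed to get them.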
The scheme itself also puts no limit on the output file size (but the
underlying filesystem will!).
You could do it even more efficiently by building an index of the
blocks at the beginning of the compressed file (a region that stores
all the sizes that the previous scheme kept in the first field of each
block). That will save you some time when searching for the block with
the data you want, basically because you have less data to read from
disk, which is a serious bottleneck (if the file is small enough to be
held in memory the gain may not be so noticeable), but this is
trickier to program.
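
One possible shape for the indexed variant, again only as a sketch:
the layout here ([4-byte block count][one 4-byte size per block][the
blocks]) is my own assumption, as are the names write_indexed and
read_indexed. The tricky part shows up in the writer, which must know
every compressed size before it can write the index; this version
simply keeps all compressed blocks in memory, which would not do for a
10 GByte input (there you would need two passes or an index at the
end).

```python
import io
import struct
import zlib
from itertools import accumulate


def write_indexed(data, dst, block_size):
    """Write [count][size of each block][compressed blocks] to dst."""
    blocks = [zlib.compress(data[i:i + block_size])
              for i in range(0, len(data), block_size)]
    dst.write(struct.pack("<I", len(blocks)))
    dst.write(struct.pack(f"<{len(blocks)}I", *(len(b) for b in blocks)))
    for b in blocks:
        dst.write(b)


def read_indexed(f, offset, length, block_size):
    """Return `length` uncompressed bytes starting at uncompressed `offset`."""
    f.seek(0)
    (count,) = struct.unpack("<I", f.read(4))
    sizes = struct.unpack(f"<{count}I", f.read(4 * count))
    data_start = 4 + 4 * count
    # positions[k] is the file position where compressed block k begins.
    positions = [data_start] + [data_start + s for s in accumulate(sizes)]
    first = offset // block_size
    last = min((offset + length - 1) // block_size, count - 1)
    buf = bytearray()
    for k in range(first, last + 1):
        f.seek(positions[k])  # jump straight to the block, no walking
        buf += zlib.decompress(f.read(sizes[k]))
    start = offset % block_size
    return bytes(buf[start:start + length])


data = bytes(range(256)) * 1000
archive = io.BytesIO()
write_indexed(data, archive, 4096)
section = read_indexed(archive, 5000, 100, 4096)
```

With the index read in one go, finding a block costs a single seek
instead of one small read per preceding block.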
Obviously this approach has a drawback: since the compressor works
only on single blocks without learning from previous data, the
compression ratio will be reduced.
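
To get a feeling for that cost you could compare the two sizes on a
sample of your own data. A small sketch, assuming zlib as the
primitive and a 4-byte size field per block; blocked_size is a
hypothetical helper and the repetitive test data is made up:

```python
import zlib


def blocked_size(data, block_size):
    """Total archive size when each block is compressed independently."""
    return sum(4 + len(zlib.compress(data[i:i + block_size]))
               for i in range(0, len(data), block_size))


data = b"abcdefgh" * 10000           # highly repetitive sample input
whole = len(zlib.compress(data))     # one stream over the entire input
blocked = blocked_size(data, 1024)   # independent 1 KiB blocks
print(whole, blocked)
```

On repetitive input like this the blocked total comes out larger than
the single stream; bigger blocks narrow the gap, at the price of
uncompressing more data per random access.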
(PS: you will obviously also need a mechanism to check the data
against accidental or malicious corruption, depending on your
scenario, but that's OT)