Re: copying large compressed files



On Mon, 04 May 2009 13:52:58 -0400, "John B. Matthews"
<nospam@xxxxxxxxxxxxxx> wrote, quoted or indirectly quoted someone who
said :

Copying a large compressed file in Java is quite efficient. You just
read it as a long stream of bytes, in whacking huge chunks or with a big
buffer. Don't decompress or recompress it.
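
As a rough illustration of the "whacking huge chunks" approach, here is a
minimal sketch. It assumes Java 7 or later (java.nio.file and
try-with-resources); the class name ChunkCopy and the 1 MiB chunk size are
made up for the example, not taken from HunkIO or FileTransfer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkCopy {
    // Hypothetical chunk size; tune it for your own hardware.
    private static final int CHUNK_SIZE = 1 << 20; // 1 MiB

    /** Copies source to target as raw bytes and returns the byte count. */
    public static long copy(Path source, Path target) throws IOException {
        byte[] chunk = new byte[CHUNK_SIZE];
        long total = 0;
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            int n;
            // read() may return fewer bytes than requested, so write only n.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

The bytes of a .zip or .gz file pass through untouched, since nothing in
the loop interprets them.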

However, if you do this with a Windows compressed file, the OS
transparently decompresses and recompresses -- a lot of wasted work
and RAM. Presumably there is a way to get the OS either to copy the
file for you efficiently or to hand you the raw bytes on read/write.

Is this a platform [1], virus scanner [2], differential compression [3]
or other problem?

[1]<http://it.slashdot.org/article.pl?sid=07/03/27/038227>
[2]<http://www.ads-links.com/index.php/how-to-fix-windows-vista-slow-network-transfer.html>
[3]<http://www.vistarevisited.com/2007/09/16/remote-differential-compression-and-your-vista-network/>

I presume other OSes may have a similar problem.

I haven't seen this with any of several BSD or GNU/Linux flavors.

The issue I am trying to raise would happen with any OS that supports
transparent compression. It is not about HOW Vista implemented I/O
handling. To read a file you need to decompress it. To write a file
you need to compress it. Both are CPU-intensive operations. However, to
copy a file you SHOULD not have to do either; just copy the compressed
bits unmodified. If there were a way to do this, you would save:

1. overhead of decompressing.

2. overhead of fatter RAM buffers to hold the same amount of info.

3. overhead of recompressing.

If you had a native method that could copy a file, bypassing the
decompress/compress, you could use the API in any OS, for all files,
compressed or normal. You would get a guaranteed fast copy of some kind,
perhaps optionally even on a separate thread.

The default implementation would simply read bytes and write bytes,
which would not save you anything but the work of writing your own
version of HunkIO/FileTransfer. A smart version would copy a
compressed file without decompressing/recompressing it in the OS
driver.
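
To make the shape of such an API concrete, here is a hedged sketch,
assuming Java 8 or later. The names FastCopy, Copier, DEFAULT and
copyAsync are hypothetical, invented for this example. The default
implementation just hands the job to Files.copy; whether that avoids the
decompress/recompress on any particular OS is not something this sketch
guarantees. A smart native version would sit behind the same interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CompletableFuture;

public class FastCopy {
    /** Hypothetical API: one copy call for all files, compressed or normal. */
    public interface Copier {
        void copy(Path source, Path target) throws IOException;
    }

    /** Default implementation: let the JDK copy the bytes. */
    public static final Copier DEFAULT = (source, target) ->
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);

    /** Optional: run the copy on a separate thread (the common pool). */
    public static CompletableFuture<Void> copyAsync(
            Copier copier, Path source, Path target) {
        return CompletableFuture.runAsync(() -> {
            try {
                copier.copy(source, target);
            } catch (IOException e) {
                // runAsync takes a Runnable, so wrap the checked exception.
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Callers would write FastCopy.copyAsync(FastCopy.DEFAULT, src, dst).join()
and never need to know which implementation did the work.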

http://mindprod.com/products1.html#HUNKIO
http://mindprod.com/products1.html#FILETRANSFER



--
Roedy Green Canadian Mind Products
http://mindprod.com

"Species evolve exactly as if they were adapting as best they could to a changing world, and not at all as if they were moving toward a set goal."
~ George Gaylord Simpson