Re: real time internet traffic compression




"kamal" <kamal.kc@xxxxxxxxx> wrote in message
news:1133854684.297990.264800@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> dear everybody,
>
> i have been developing compression schemes for real time internet
> traffic. I have been obtaining compression ratios of 10 to 15%
> in the overall result. the compression algorithm i have used
> is adaptive LZW (same as compress utility). i compress data
> on packet basis so that at any time i cannot have more than 1480
> bytes to compress. i got this result by developing two compression/
> decompression pc and deploying on one of the proxies used by my
> company(ISP).
>
> i am planning to implement deflate as next choice of compression
> scheme.
>
> i want to know what effective compression ratio can be obtained for
> real time internet traffic which richly consists of multimedia content,
> installer downloads, and already compressed data ??
>
> what is the expected result with respect to different compression
> schemes or irrespective of the compression schemes ??
>
> Is there any theoretical description of this topic ??
>
> i think maybe somebody out there can help me.
> Pointers to any related documents or materials will be
> highly appreciated.
>

just my thoughts.

lzw or deflate may work (at the packet level) if the data is uncompressed to
begin with (consider all the xml and html going back and forth), but is
unlikely to help with already compressed data, and the gains from
individually compressing such small amounts of data are likely to be minor.
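
a rough way to check this for yourself is sketched below. it is only an
illustration: the two 1480-byte payloads are made up (repetitive markup, and
random bytes standing in for already compressed data), and zlib's deflate is
used in place of whatever codec the proxy actually runs.

```python
import os
import zlib

def deflate_ratio(payload: bytes) -> float:
    """Compressed size divided by original size for one packet compressed alone."""
    return len(zlib.compress(payload, 9)) / len(payload)

# Hypothetical payloads, both cut to the 1480-byte packet size from the question.
markup = (b"<tr><td class='item'>value</td></tr>\n" * 64)[:1480]
random_like = os.urandom(1480)  # stand-in for already compressed data

print("markup:", round(deflate_ratio(markup), 3))
print("random:", round(deflate_ratio(random_like), 3))
```

real traffic will sit somewhere between these two extremes, depending on how
much of it is text and how much is media or archives.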

assuming it is a point-to-point thing (eg: various mixed network packets
traveling between 2 nodes, possibly connecting 2 networks), one possibility
may be to try to exploit redundancy between the packets, rather than within
packets.

a simple example of this would be globbing the packets together in a buffer
and using something like markov prediction (or lz77) to locate previous
occurrences of similar data. this way, you only need to send the new parts
of the packets plus references to previous occurrences, and the other end,
having regenerated the same buffer from previous packets, will be able to
reassemble the new ones.
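
a minimal sketch of the shared-buffer idea, assuming both ends see every
packet, in the same order. it leans on zlib's preset-dictionary support
(the zdict argument) to do the lz77 matching against earlier packets, so the
usable history is capped at 32kB. the class name HistoryCodec and the
constant are made up for the example.

```python
import zlib

HISTORY_LIMIT = 32 * 1024  # deflate cannot reference data further back than this

class HistoryCodec:
    """One instance per endpoint; both must process the same packets in order."""

    def __init__(self) -> None:
        self.history = b""  # most recent packet bytes, oldest first

    def _remember(self, packet: bytes) -> None:
        self.history = (self.history + packet)[-HISTORY_LIMIT:]

    def encode(self, packet: bytes) -> bytes:
        # Prime the compressor with earlier packets so matches can point into them.
        if self.history:
            comp = zlib.compressobj(level=6, zdict=self.history)
        else:
            comp = zlib.compressobj(level=6)
        blob = comp.compress(packet) + comp.flush()
        self._remember(packet)
        return blob

    def decode(self, blob: bytes) -> bytes:
        # The receiver rebuilds the same history, so the references resolve.
        if self.history:
            dec = zlib.decompressobj(zdict=self.history)
        else:
            dec = zlib.decompressobj()
        packet = dec.decompress(blob) + dec.flush()
        self._remember(packet)
        return packet
```

a lost or reordered packet leaves the two histories out of step, so this
only makes sense over a reliable, ordered link between the two nodes.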

alternatively, one can pull off about the same effect by using deflate on
the packets, treating the flow of packets as a single stream. the problem
is, deflate only looks back 32kB (about 22 packets of 1480 bytes), and would
thus mostly exploit redundancy within packets and between very recent ones,
rather than across the traffic as a whole.
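
the single-stream variant could look something like the sketch below, using
python's zlib module. a sync flush after each packet makes the compressor
emit everything needed to decode that packet right away, while keeping its
window for the next one. the class names are hypothetical, and framing of
the chunks on the wire is left out.

```python
import zlib

class StreamSender:
    """Compresses every packet through one long-lived deflate stream."""

    def __init__(self) -> None:
        self._comp = zlib.compressobj(level=6)

    def send(self, packet: bytes) -> bytes:
        # Z_SYNC_FLUSH pushes out all pending output without resetting the window.
        return self._comp.compress(packet) + self._comp.flush(zlib.Z_SYNC_FLUSH)

class StreamReceiver:
    """Mirror of StreamSender; chunks must arrive complete and in order."""

    def __init__(self) -> None:
        self._dec = zlib.decompressobj()

    def receive(self, chunk: bytes) -> bytes:
        return self._dec.decompress(chunk)
```

each flush costs a few bytes of overhead per packet, which matters when the
packets are small.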

it would likely be better to use a much larger buffer (say, 16MB) and use
markov prediction instead (lz77 slows down notably with larger windows,
whereas markov prediction is much more immune to this).
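
one possible reading of "markov prediction" here is an lzp-style scheme: the
last few bytes are used as a context, and a table maps each context to the
position that followed it last time. that is an assumption on my part, not
the only interpretation. the sketch below follows that reading; ORDER, the
token format and the class name are invented for the example, and the buffer
is left unbounded where a real one would be capped (eg: at 16MB).

```python
ORDER = 4  # context length in bytes; an arbitrary choice for this sketch

class PredictorCodec:
    """One instance per endpoint; both sides build identical buffers and tables."""

    def __init__(self) -> None:
        self.buf = bytearray()  # everything seen so far (unbounded in this sketch)
        self.table = {}         # context bytes -> position that followed it last time

    def _predict(self, pos: int):
        """Return the predicted source position for pos and record pos for next time."""
        if pos < ORDER:
            return None
        ctx = bytes(self.buf[pos - ORDER:pos])
        pred = self.table.get(ctx)
        self.table[ctx] = pos
        return pred

    def encode(self, packet: bytes) -> list:
        pos = len(self.buf)
        self.buf += packet
        end = len(self.buf)
        tokens = []
        while pos < end:
            pred = self._predict(pos)
            n = 0
            if pred is not None:
                # Count how many bytes the prediction gets right.
                while pos + n < end and self.buf[pred + n] == self.buf[pos + n]:
                    n += 1
            if n > 0:
                tokens.append(("match", n))  # no offset needed, the decoder predicts it
                pos += n
            else:
                tokens.append(("lit", self.buf[pos]))
                pos += 1
        return tokens

    def decode(self, tokens: list) -> bytes:
        start = len(self.buf)
        for kind, value in tokens:
            pred = self._predict(len(self.buf))
            if kind == "match":
                for k in range(value):  # byte by byte, so overlapping copies work
                    self.buf.append(self.buf[pred + k])
            else:
                self.buf.append(value)
        return bytes(self.buf[start:])
```

the point of the design is that each lookup is a single table probe, so the
cost per byte does not grow with the size of the buffer the way an lz77
match search does. the tokens would still need an entropy coder or a compact
bit format before going on the wire.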

it may also be sensible to (explicitly) have the buffer wrap around (in a
manner similar to, say, lzss). in my experience, this is less work to
implement and is also faster (with restrictions, eg, that referenced strings
can't wrap around the end of the window, ...).

sliding windows are, in my experience, a little bit more work to manage.
implementing them as rotating windows is slightly slower, since the rotating
window possibly needs to be larger than the sliding window (data being
copied into the window may overlap the data being referenced in some cases),
and one has to deal with occurrences like strings wrapping around the ends
of the window, ...

these problems largely go away if one either uses a single large buffer for
the output (sane for some files, not as much for streams) or uses an
approach such as shifting the data in the buffer or similar (eg: copying the
data from the end of the buffer back to the start), but these may cause
other problems (former: impractical for large data, latter: copying data is
not free).
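
for the shifting variant, a minimal sketch (the class name and sizes are made
up, and it assumes capacity is at least window plus the largest chunk):

```python
class ShiftingWindow:
    """Flat buffer; when it fills, the last `window` bytes are copied to the start."""

    def __init__(self, window: int, capacity: int) -> None:
        assert capacity > window
        self.window = window
        self.buf = bytearray(capacity)
        self.end = 0  # number of valid bytes in buf

    def append(self, chunk: bytes) -> None:
        if self.end + len(chunk) > len(self.buf):
            # Shift: keep only the most recent `window` bytes (this copy is the cost).
            keep = min(self.window, self.end)
            self.buf[0:keep] = self.buf[self.end - keep:self.end]
            self.end = keep
        assert self.end + len(chunk) <= len(self.buf), "chunk too large for capacity"
        self.buf[self.end:self.end + len(chunk)] = chunk
        self.end += len(chunk)

    def view(self) -> bytes:
        """All valid data, always contiguous, so matches never wrap."""
        return bytes(self.buf[:self.end])
```

a bigger capacity relative to the window means fewer shifts, so the copying
cost can be traded against memory.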

maybe people who know more about existing implementations can comment more
on this, I am just speaking from my personal experience.


a large buffer with markov prediction would likely be much better at finding
similarities in packet contents, and is also likely to be computationally
cheaper than using deflate.

dunno...

> thanks,
> kamal
>

