Re: general compression with awk



Loki Harfagr wrote:
>
> If you still want to try awk, here's a very
> timid starter.
> The commented prints are for looking at the engine
> being clumsy (It doesn't pair doubletons like "a b").
>
Discovering (and compressing) recurrent patterns IS the whole point of
this exercise.

> With your first original sample data it gives :
> a 4
> b 1
> a 1
> b 1
> a 1
> b 1
> c 1
> d 2
>
> Maybe someone (maybe me) will find some time to improve
> the poor thing :D)
>
> $ cat RLEinawk.awk
> BEGIN{
> ### print "========="
> }
> (imp[$0]>0){
> pot=$0
> imp[pot]++;
> ### print "["FNR"] put "$0" in accu, count "imp[$0]
> next
> }
> {
> ### print "["FNR"] We read a newt "$0", first depot "pot" ->"imp[pot]
> if(FNR>1)print pot" "imp[pot]
> imp[pot]=0
> pot=$0
> ### print "Then put the newt "pot" into accu"
> imp[pot]++
> next
> }
> END{
> ### print "In the end, depot the rest"
> if(FNR>1)print pot" "imp[pot]
> ### print "========="
> }
>

Your code does `uniq -c` in an obscure way.

$ uniq -c example
4 a
1 b
1 a
1 b
1 a
1 b
1 c
2 d
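
For comparison, here is a minimal sketch of the same run-length count
in a few lines of awk. The function name `rle` and the variables `prev`
and `n` are my own choices, not anything from the quoted script, and
unlike `uniq -c` it does not pad the counts.

```shell
# rle: run-length encode adjacent identical lines, printing "count line".
# Hypothetical helper for illustration; not part of the quoted script.
rle() {
    awk '
        NR > 1 && $0 != prev { print n, prev; n = 0 }  # run ended: flush it
        { prev = $0; n++ }                             # extend the current run
        END { if (NR > 0) print n, prev }              # flush the final run
    ' "$@"
}
```

Called as `rle example`, it should give the same counts as the
`uniq -c` output above, assuming `example` holds the sample data.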

Maybe I did not quite see your point. Looking forward to your
improvement...

As Ed pointed out,

$ cat example example | awk -f RLEinawk.awk

should print something like
2 example

if you understand what I mean.
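
To make that concrete, here is a rough sketch of the simplest case
only: the whole input is one block of lines repeated end to end. It is
nowhere near a general pattern compressor, and the name `repeats` and
the output wording are made up for illustration.

```shell
# repeats: if the whole input is one block of lines repeated k times,
# print k and the length of the smallest such block.
# Hypothetical helper; handles only exact whole-input repetition.
repeats() {
    awk '
        { line[NR] = $0 }                       # keep every line in memory
        END {
            for (p = 1; p <= NR; p++) {         # try block lengths, smallest first
                if (NR % p) continue            # block length must divide the input
                ok = 1
                for (i = p + 1; i <= NR && ok; i++)
                    if (line[i] != line[i - p]) ok = 0
                if (ok) { print NR / p, "x block of", p, "lines"; exit }
            }
        }
    ' "$@"
}
```

With a 12-line `example`, `cat example example | repeats` should report
2 repetitions of a 12-line block. Finding repeats nested inside the
block (the "a b" pairs) would need a further pass over the block itself.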

RTF
.


