Re: *really* shortest match in awk - possible?



Hi Tomasz, hello netlanders,

On Wed, 21 Nov 2007 19:18:33 +0100, Tomasz Chmielewski wrote:

Yes, everything is possible in awk, but not for mere mortals, right?

I have a mbox file (see below for an example) with advertisements
attached to the end of every message. I would like to remove these ads.
Ads are placed between -------- and _________ characters.

I tried using something like:

awk '/^----/,/^____/{next}{print}'

but it eats up a bit too much, and one could argue if it's really the
shortest match.

Here is an example mbox (note that the length of ------ and ______ can
vary):


From - Sun Sep 18 12:55:25 2005
(...)
Some text I want to keep
Some text I want to keep

-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development.....
________________________________________
this text should stay
email@address
https://address/should/stay


From - Sun Sep 18 12:58:18 2005
(...)
2 Some text I want to keep 2
2 Some text I want to keep 2

A cool diagram which needs to stay:
-------------------------------------
|This will be gone, too if I use    |
|awk '/^----/,/^____/{next}{print}' |
|because of "------" above          |
-------------------------------------

2 Some text I want to keep 2
2 Some text I want to keep 2


--------------------
SF.Net email is sponsored by:
some other
advertisement
_______________________________________________
this text should stay
email@address
https://address/should/stay


From - Thu Sep 22 16:00:04 2005
(...)
3 Some text I want to keep 3
3 Some text I want to keep 3

-----------------------------------
SF.Net email is sponsored by:
_________________________________________
this text should stay
email@address
https://address/should/stay

Here is a working POSIX awk script, coded according to Janis's ideas:

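# A line of underscores closes an ad: print every buffered block
# except the last one (the ad itself), then reset the buffer.
/^_+$/ { pr(a, e-1)
         del(a)
         e = 0
         next }
# A line of dashes may open an ad: start a new buffered block.
/^-+$/ { ++e; j = 1 }
# While a block is open, buffer the line instead of printing it.
e      { a[e, j++] = $0; next }
# Outside any block, print the line unchanged.
1
# Flush blocks that were never closed by an underscore line.
END    { pr(a, e) }

# Print blocks 1..e of a; i and j are local variables.
function pr(a, e,    i, j) {
    for (i = 1; i <= e; ++i) {
        for (j = 1; (i,j) in a; ++j)
            print a[i,j]
    } }

# Empty the array one element at a time.
function del(a,    k) {
    for (k in a)
        delete a[k]
}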
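
The script above can be tried on a shortened sample. This is only a sketch: the wrapper name strip_ads and the sample lines are made up for illustration, and the sample assumes that the dash and underscore lines stand alone on their lines, as the patterns /^-+$/ and /^_+$/ require.

```sh
# strip_ads is a hypothetical wrapper name; the awk program inside
# follows the same logic as the script above.
strip_ads() {
    awk '
    /^_+$/ { pr(a, e - 1); del(a); e = 0; next }
    /^-+$/ { ++e; j = 1 }
    e      { a[e, j++] = $0; next }
    1
    END    { pr(a, e) }
    function pr(a, e,    i, j) {
        for (i = 1; i <= e; ++i)
            for (j = 1; (i, j) in a; ++j)
                print a[i, j]
    }
    function del(a,    k) { for (k in a) delete a[k] }
    ' "$@"
}

# Made-up sample: one diagram (must stay) followed by one ad (must go).
printf '%s\n' \
    'keep 1' \
    '-----' \
    '|diagram|' \
    '-----' \
    'keep 2' \
    '--------' \
    'SF.Net email is sponsored by:' \
    '________' \
    'this text should stay' |
strip_ads
```

On this sample the diagram and both "keep" lines survive, while the block from the last dash line through the underscore line is dropped. On a real mailbox the call would be something like strip_ads mbox > mbox.clean, with both file names being placeholders.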

Janis's tac-awk solution is more elegant, but not as efficient as the
script above. Note that Janis's on-topic awk script doesn't work for
multiple ads.
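
Janis's tac-awk script is not quoted in this post, so the following is only a guess at the idea, not the original code. Read backwards, an ad starts at the underscore line and ends at the first dash line that follows, so a plain range pattern already gives the shortest match. The name strip_ads_tac is made up, and tac is assumed to be installed (it comes with GNU coreutils and is not part of POSIX).

```sh
# strip_ads_tac is a hypothetical name; requires tac (GNU coreutils).
strip_ads_tac() {
    # 1. Reverse the line order of the input.
    # 2. Drop everything from an underscore line to the next dash line.
    # 3. Reverse again to restore the original order.
    cat "$@" | tac | awk '/^_+$/,/^-+$/ { next } { print }' | tac
}

# Made-up sample: one diagram (must stay) followed by one ad (must go).
printf '%s\n' \
    'keep 1' \
    '-----' \
    '|diagram|' \
    '-----' \
    'keep 2' \
    '--------' \
    'SF.Net email is sponsored by:' \
    '________' \
    'this text should stay' |
strip_ads_tac
```

The awk part shrinks to a one-liner, but the data has to pass through tac twice, which is one reason such a pipeline may be slower than the single buffering script.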

Hope I could help,

Steffen "goedel" Schuler