Re: *really* shortest match in awk - possible?
- From: Steffen Schuler <schuler.steffen@xxxxxxxxxxxxxx>
- Date: 22 Nov 2007 23:50:19 GMT
Hi Tomasz, hello netlanders,
On Wed, 21 Nov 2007 19:18:33 +0100, Tomasz Chmielewski wrote:
Yes, everything is possible in awk, but not for mere mortals, right?
I have an mbox file (see below for an example) with advertisements
attached to the end of every message. I would like to remove these ads.
Ads are placed between a line of -------- characters and a line of _________ characters.
I tried using something like:
awk '/^----/,/^____/{next}{print}'
but it eats up a bit too much, and one could argue whether it's really the
shortest match.
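A range pattern such as /^----/,/^____/ switches on at the first line matching the start pattern and stays on until the next line matching the end pattern; another start match while the range is already on does not restart it. The sketch below feeds the one-liner a hypothetical miniature mailbox (the function names and sample lines are illustrative only, not the real mbox) to show the effect: the diagram border opens the range early, so the diagram and the text after it vanish along with the ad.

```shell
#!/bin/sh
# Hypothetical miniature mailbox (illustration only): a diagram whose
# border starts with "----", followed later by a real ad.
sample() {
  printf '%s\n' \
    'keep 1' \
    '----------' \
    'diagram line' \
    '----------' \
    'keep 2' \
    '-----' \
    'ad text' \
    '_____' \
    'footer'
}

# The original one-liner: skip every line from a line starting with "----"
# through the next line starting with "____", print everything else.
range_strip() {
  awk '/^----/,/^____/{next}{print}'
}

# Prints only "keep 1" and "footer": the range opened at the diagram
# border and did not close until the ad's underscore line.
sample | range_strip
```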
Here is an example mbox (note that the lengths of the ------ and ______ lines can
vary):
From - Sun Sep 18 12:55:25 2005
(...)
Some text I want to keep
Some text I want to keep
-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development.....
________________________________________
this text should stay
email@address
https://address/should/stay
From - Sun Sep 18 12:58:18 2005
(...)
2 Some text I want to keep 2
2 Some text I want to keep 2
A cool diagram which needs to stay:
-------------------------------------
|This will be gone, too if I use     |
|awk '/^----/,/^____/{next}{print}' |
|because of "------" above          |
-------------------------------------
2 Some text I want to keep 2
2 Some text I want to keep 2
--------------------
SF.Net email is sponsored by:
some other
advertisement
_______________________________________________
this text should stay
email@address
https://address/should/stay
From - Thu Sep 22 16:00:04 2005
(...)
3 Some text I want to keep 3
3 Some text I want to keep 3
-----------------------------------
SF.Net email is sponsored by:
_________________________________________
this text should stay
email@address
https://address/should/stay
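Here is a working POSIX awk script, coded according to Janis's ideas:
# Usage (file names are only examples): awk -f strip-ads.awk mbox > mbox.clean
# An all-underscores line closes an ad: print every buffered block except
# the last one (the ad), then drop the underscore line itself.
/^_+$/ { pr(a, e-1)
         del(a)
         e = 0
         next }
# An all-dashes line opens a new buffered block e; j counts its lines.
/^-+$/ { ++e; j = 1 }
# While at least one block is open, store the line instead of printing it.
e      { a[e, j++] = $0; next }
# Outside any block, print the line unchanged.
1
# No underscore line followed the open blocks, so none was an ad: flush all.
END    { pr(a, e) }
# Print blocks 1..e in order, line by line.
function pr(a, e,    i, j) {
    for (i = 1; i <= e; ++i)
        for (j = 1; (i, j) in a; ++j)
            print a[i, j]
}
# Empty the array element by element (for awks without "delete a").
function del(a,    k) {
    for (k in a)
        delete a[k]
}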
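As a quick check, this sketch inlines the buffering script in a shell function and runs it on the kind of input that trips up the range one-liner. The function names and the miniature sample are hypothetical, chosen only for the demonstration. The diagram, whose border lines consist only of dashes, survives; only the last dashes block before the underscore line is dropped.

```shell
#!/bin/sh
# Hypothetical miniature mailbox (illustration only): a diagram bordered
# by dash lines, then a real ad between a dashes and an underscores line.
sample() {
  printf '%s\n' \
    'keep 1' \
    '----------' \
    'diagram line' \
    '----------' \
    'keep 2' \
    '-----' \
    'ad text' \
    '_____' \
    'footer'
}

# The buffering script, inlined so this sketch runs on its own.
buffer_strip() {
  awk '
    /^_+$/ { pr(a, e - 1); del(a); e = 0; next }
    /^-+$/ { ++e; j = 1 }
    e      { a[e, j++] = $0; next }
    1
    END    { pr(a, e) }
    function pr(a, e,    i, j) {
      for (i = 1; i <= e; ++i)
        for (j = 1; (i, j) in a; ++j)
          print a[i, j]
    }
    function del(a,    k) {
      for (k in a)
        delete a[k]
    }'
}

# Keeps the diagram and "keep 2"; drops "-----", "ad text" and "_____".
sample | buffer_strip
```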
Janis's tac-and-awk solution is more elegant, but not as efficient as the
script above, which reads its input only once. Janis's on-topic (awk-only)
script doesn't work for multiple ads.
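For comparison, here is a sketch of the general reverse-then-range idea behind a tac-plus-awk approach. It is not Janis's script, which is not quoted here; it only assumes that idea. Once the input is reversed, each underscore line comes before its ad, so the nearest dashes line after it in the reversed stream marks the ad's start, and a plain range pattern becomes the shortest match. tac comes from GNU coreutils and is not part of POSIX; the function names and sample are hypothetical.

```shell
#!/bin/sh
# Hypothetical miniature mailbox (illustration only).
sample() {
  printf '%s\n' \
    'keep 1' \
    '----------' \
    'diagram line' \
    '----------' \
    'keep 2' \
    '-----' \
    'ad text' \
    '_____' \
    'footer'
}

# Reverse the input, skip each range running from an underscores line to
# the nearest following dashes line (the ad, seen backwards), reverse back.
tac_strip() {
  tac | awk '/^_+$/, /^-+$/ { next } { print }' | tac
}

# Same result as the buffering script: only the ad block is removed.
sample | tac_strip
```

The cost is that tac must read the whole input before it can emit anything, and the data passes through three processes instead of one.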
Hope I could help,
Steffen "goedel" Schuler