Re: [Dialog] Option to retrieve bodies of new posts does not work reliably
- From: VanguardLH <V@xxxxxxxxx>
- Date: Fri, 13 Mar 2009 11:14:41 -0500
VanguardLH wrote:
Under the default group options, in the Retrieving category of options,
I enabled the option "Retrieve bodies for all new posts". All groups use
the default group options. Yet when there are new posts to retrieve, and
after their headers have been retrieved, there are still many posts for
which either:
- The bodies are not downloaded.
- They are only marked for downloading but don't get downloaded.
I only participate in text groups. Not all headers are available for
scoring when retrieving just the headers (i.e., all you get are the
overview headers). Just the overview headers are not sufficient to
identify particular posters or types of posters. So I have filters that
test on non-overview headers. This only works if the body of the post
gets pulled during a poll so that all headers are retrieved and are
therefore available for testing. But those filters will fail (and
incorrectly flag a post as Ignored) if the non-overview headers are not
available which is the case when Dialog neglects to honor the option to
download the bodies of all new posts.
I'll give an example. Someone was impersonating Bruce Hagen. Although
they were posting from the same ISP, Bruce posts from a San Diego branch
whereas the imposter was posting through a Nevada branch, as I recall,
of the same ISP. So I wanted to ignore any posts that said they were
from Bruce but which did not come through the same regional hub for his
ISP. My filter looks like (all on one line but split here for clarity):
!setcolor(maroon;white),ignore,markread
From {\bBruce.*\bHagen}
-@Header:{^NNTP-Posting-Host:.*\.sd\.cox\.net}
Anyone claiming to be "Bruce Hagen" but who did not post using a host
connected to the San Diego region for Cox would get ignored as an
imposter. Bruce always posts from sd.cox.net. I believe the imposter
might've decided to cease his impostering (I would have to check) but
the point is that the NNTP-Posting-Host is *not* an overview header so
this filter only works if the option in Dialog to download the bodies of
new posts was actually honored so the filters could test on the non-
overview headers. Since Dialog is not honoring its own option but
instead does not download some bodies of new posts, the above filter
would result in flagging genuine posts from Bruce as Ignored. Since the
non-overview headers aren't available, the filter doesn't see the
NNTP-Posting-Host header. "And Not" (the -@ prefix) also triggers when
the header is absent, not just when the header is present but the
string is not found in it.
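
To make the failure concrete, here is a rough Python model of the filter
above. It is only a sketch of the behavior as described in this post, not
Dialog's actual implementation: the function name, the sample addresses
and host names are made up for illustration, and case-sensitive matching
is an assumption.

```python
import re

# The two patterns from the filter above.
FROM_RE = re.compile(r"\bBruce.*\bHagen")
HOST_RE = re.compile(r"^NNTP-Posting-Host:.*\.sd\.cox\.net", re.MULTILINE)


def is_ignored(from_field, raw_headers):
    """Hypothetical model: From matches AND NOT (host header line matches).

    If the NNTP-Posting-Host line is missing from raw_headers, the negated
    test is still true, which is the behavior described in the post.
    """
    claims_bruce = FROM_RE.search(from_field) is not None
    host_matches = HOST_RE.search(raw_headers) is not None
    return claims_bruce and not host_matches


# Only overview headers are known: no NNTP-Posting-Host line at all.
overview_only = "Subject: Re: test\nFrom: Bruce Hagen <bh@example.invalid>"

# Full headers of a genuine post (host name is illustrative).
full_genuine = overview_only + "\nNNTP-Posting-Host: ip-0-0-0-1.sd.cox.net"

# Full headers of a post sent through some other host (illustrative).
full_other = overview_only + "\nNNTP-Posting-Host: host.example.net"

print(is_ignored("Bruce Hagen <bh@example.invalid>", overview_only))
print(is_ignored("Bruce Hagen <bh@example.invalid>", full_genuine))
print(is_ignored("Bruce Hagen <bh@example.invalid>", full_other))
```

With only the overview headers the genuine post is treated exactly like
the one sent through the other host: both come out as ignored.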
So can I make Dialog obey its "Retrieve bodies for all new posts"
default group option? All groups are configured to use the default
group options. Since I only participate in text groups (36 of them), I
would rather wait for new posts to get downloaded so that I could test
on the non-overview headers rather than give up some decent filtering of
a lot of noise in several groups.
Might've found the cause: Dialog scores TWICE. From its help:
"Usenet articles are scored twice in Dialog. When you get headers in a
group the scoring rules are applied to the available, limited number of
headers, however when you retrieve the complete body of the message, the
message is scored again and this time all headers can be scored."
Oh, goody. That means a filter that tests on a non-overview header
(one that WILL be available after downloading the body) gets evaluated
on the first pass, when only the headers have been downloaded, and it
can (and does) screw up the filters run on the second pass when the
bodies are downloaded.
Apparently if an article is flagged Ignored, Dialog won't download its
body. That explains why some articles are not downloaded despite the
option to get them downloaded. It also means that any filters that rely
on non-overview headers will screw up. Those non-overview headers won't
be available in the first pass when only the headers are downloaded.
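
The interaction can be sketched as a small simulation. This is a guess at
the flow based on the help text and on what I observed, not Dialog's real
code: the function names and the sample article are hypothetical, and the
set of overview fields is the usual NNTP one (servers can expose more).

```python
import re

# Typical overview fields; anything else needs the full article.
OVERVIEW_FIELDS = {"Subject", "From", "Date", "Message-ID",
                   "References", "Bytes", "Lines"}


def filter_ignores(headers):
    """From claims Bruce AND NOT a San Diego Cox posting host."""
    claims_bruce = re.search(r"\bBruce.*\bHagen",
                             headers.get("From", "")) is not None
    host = headers.get("NNTP-Posting-Host")
    host_ok = host is not None and re.search(r"\.sd\.cox\.net", host) is not None
    return claims_bruce and not host_ok


def poll(article):
    """Assumed flow: score on overview, skip the body if already Ignored."""
    overview = {k: v for k, v in article.items() if k in OVERVIEW_FIELDS}
    if filter_ignores(overview):
        # First pass flagged it, so the body is never fetched and the
        # second pass (with all headers) never gets a chance to run.
        return {"ignored": True, "body_downloaded": False}
    # Second pass: all headers are available now.
    return {"ignored": filter_ignores(article), "body_downloaded": True}


genuine = {
    "From": "Bruce Hagen <bh@example.invalid>",
    "Subject": "Re: test",
    "NNTP-Posting-Host": "ip-0-0-0-1.sd.cox.net",  # illustrative host name
}
print(poll(genuine))
```

The genuine post carries the right posting host, yet it ends up ignored
with no body, because the first pass never sees that header.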
Well, other than providing regex support, this relegates Dialog to the
same dumb filtering as available in OE, Thunderbird, and many other
newsreaders. I cannot define filters in Dialog that will not have side
effects (from the first pass which exercises the filters against only
the overview headers). Because of this double pass through the filters,
and with one of them on just the overview headers, filters can only
specify the overview headers on which to test which makes it impossible
to detect certain posters or types of posters.
Of course, if the -@ (And Not) prefix did NOT trigger when the header
was absent then the problem wouldn't have come up (until I hit a
different situation of side effects produced by running the SAME filters
against one set of articles with only overview headers and again against
another set of articles with all headers).
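
The difference between the two possible meanings of "And Not" can be
written down in a few lines. Again this is only an illustration: the
function and its trigger_when_absent switch are hypothetical, and Dialog
offers no such switch as far as I know.

```python
import re


def and_not(headers, name, pattern, trigger_when_absent):
    """Negated header test with a selectable rule for a missing header."""
    value = headers.get(name)
    if value is None:
        # Header not retrieved (e.g. overview-only pass).
        return trigger_when_absent
    return re.search(pattern, value) is None


overview = {"From": "Bruce Hagen <bh@example.invalid>"}
pattern = r"\.sd\.cox\.net"

# Behavior described above: an absent header counts as "not matching".
print(and_not(overview, "NNTP-Posting-Host", pattern, True))

# Alternative: an absent header leaves the test untriggered.
print(and_not(overview, "NNTP-Posting-Host", pattern, False))
```

Under the second rule the first pass would leave the post alone, and the
test would only take effect once the full headers had been downloaded.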