Re: [Dialog] Option to retrieve bodies of new posts does not work reliably



VanguardLH wrote:

Under the default group options, in the Retrieving category of options,
I enabled the option "Retrieve bodies for all new posts". All groups use
the default group options. Yet when new posts arrive, after their
headers are retrieved, many of them end up in one of two states:

- Their bodies are not downloaded at all.
- They are marked for downloading but never actually get downloaded.

I only participate in text groups. Not all headers are available for
scoring when retrieving just the headers (i.e., all you get are the
overview headers), and the overview headers alone are not sufficient to
identify particular posters or types of posters. So I have filters that
test on non-overview headers. This only works if the body of the post
gets pulled during a poll, so that all headers are retrieved and are
therefore available for testing. Those filters fail (and incorrectly
flag a post as Ignored) when the non-overview headers are unavailable,
which is exactly what happens when Dialog neglects to honor the option
to download the bodies of all new posts.

I'll give an example. Someone was impersonating Bruce Hagen. Although
both posted through the same ISP, Bruce posts from its San Diego branch
whereas the imposter, as I recall, was posting through a Nevada branch.
So I wanted to ignore any posts that claimed to be from Bruce but did not
come through his ISP's San Diego regional hub. My filter looks like this
(all on one line but split here for clarity):

!setcolor(maroon;white),ignore,markread
From {\bBruce.*\bHagen}
-@Header:{^NNTP-Posting-Host:.*\.sd\.cox\.net}

Anyone claiming to be "Bruce Hagen" but not posting through a host in
Cox's San Diego region gets ignored as an imposter. Bruce always posts
from sd.cox.net. I believe the imposter may have stopped the
impersonation (I would have to check), but the point is that
NNTP-Posting-Host is *not* an overview header. This filter therefore
works only if Dialog actually honors its option to download the bodies
of new posts, so that the filters can test the non-overview headers.
Because Dialog does not honor its own option and skips the bodies of
some new posts, the filter above ends up flagging Bruce's genuine posts
as Ignored: without the non-overview headers, the filter never sees the
NNTP-Posting-Host header, and "And Not" (the -@ prefix) triggers when
the header is absent, not only when the header is present but fails to
match.
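To make the failure concrete, here is a minimal Python model of the filter above, assuming the "-@" condition is satisfied whenever no header line matches its pattern, including when the header is missing entirely. The function name `is_ignored` and any example header values are hypothetical; this is a sketch of the behaviour described in this post, not Dialog's actual scoring code.

```python
import re

# Hypothetical model of the Dialog rule quoted above; NOT Dialog's real code.
FROM_RE = re.compile(r"\bBruce.*\bHagen")
HOST_RE = re.compile(r"^NNTP-Posting-Host:.*\.sd\.cox\.net")


def is_ignored(headers: dict[str, str]) -> bool:
    """Return True if the modelled rule would flag the article as Ignored."""
    # First condition: the From header must claim to be Bruce Hagen.
    if not FROM_RE.search(headers.get("From", "")):
        return False
    # Rebuild raw "Name: value" lines, as a whole-header search would see them.
    lines = [f"{name}: {value}" for name, value in headers.items()]
    host_matched = any(HOST_RE.search(line) for line in lines)
    # Assumed "And Not" semantics: true when nothing matched, whether the
    # header carries the wrong host or is simply absent.
    return not host_matched
```

With only the overview headers available, a genuine post from Bruce is flagged, because the NNTP-Posting-Host line is simply not there to match; with the full headers, the same post passes.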

So, can I make Dialog obey its "Retrieve bodies for all new posts"
default group option? All groups are configured to use the default
group options. Since I only participate in text groups (36 of them), I
would rather wait for new posts to download so that I can test on the
non-overview headers than give up decent filtering of a lot of noise in
several groups.

I might have found the cause: Dialog scores TWICE. From its help:

"Usenet articles are scored twice in Dialog. When you get headers in a
group the scoring rules are applied to the available, limited number of
headers, however when you retrieve the complete body of the message, the
message is scored again and this time all headers can be scored."

Oh, goody. That means that if a filter tests a non-overview header
(which WILL be available after the body is downloaded), the first pass,
run when only the headers are downloaded, can (and does) screw up the
results of the second pass, run when the bodies are downloaded.

Apparently, if an article is flagged Ignored, Dialog won't download its
body. That explains why some articles are not downloaded despite the
option to download them. It also means that any filter relying on
non-overview headers will screw up, because those headers aren't
available in the first pass, when only the overview headers have been
downloaded. Other than providing regex support, this relegates Dialog to
the same dumb filtering available in OE, Thunderbird, and many other
newsreaders. I cannot define filters in Dialog that are free of side
effects from the first pass, which runs the filters against only the
overview headers. Because of this double pass, with one pass seeing
only the overview headers, filters can safely test only the overview
headers, which makes it impossible to detect certain posters or types of
posters.
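The two-pass behaviour can be sketched the same way. This minimal Python model assumes, as inferred in this post, that an article flagged Ignored in the first pass never has its body fetched, so the second pass never runs for it. The names `OVERVIEW`, `Article`, `poll` and `rule_ignores` are hypothetical, and the host check is simplified to a suffix test.

```python
import re
from dataclasses import dataclass

# Header fields in the RFC 3977 default overview format (OVER/XOVER);
# the :bytes and :lines metadata items are omitted here. Everything else,
# such as NNTP-Posting-Host, arrives only with the full article.
OVERVIEW = {"Subject", "From", "Date", "Message-ID", "References"}


def rule_ignores(headers: dict) -> bool:
    """Hypothetical rule: ignore "Bruce Hagen" unless posted via sd.cox.net."""
    claims_bruce = re.search(r"\bBruce.*\bHagen", headers.get("From", ""))
    host = headers.get("NNTP-Posting-Host", "")
    # "And Not": an absent header ("") counts as "did not match".
    return bool(claims_bruce) and not host.endswith(".sd.cox.net")


@dataclass
class Article:
    headers: dict
    ignored: bool = False
    body_downloaded: bool = False


def poll(article: Article, retrieve_all_bodies: bool = True) -> Article:
    # Pass 1: score using only the overview headers.
    overview = {k: v for k, v in article.headers.items() if k in OVERVIEW}
    article.ignored = rule_ignores(overview)
    # Assumption: Ignored articles never get their bodies fetched.
    if retrieve_all_bodies and not article.ignored:
        article.body_downloaded = True
        # Pass 2: rescore now that every header is available.
        article.ignored = rule_ignores(article.headers)
    return article
```

Under these assumptions every post claiming to be from Bruce ends up Ignored with no body, genuine or not: the second pass that would have cleared the genuine ones is never reached.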

Of course, if the -@ (And Not) prefix did NOT trigger when the header
was absent, the problem wouldn't have come up (until I hit some other
side effect of running the SAME filters once against a set of articles
with only overview headers and again against a set with all headers).
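For comparison, here is a hypothetical variant of the "And Not" semantics wished for above, in which an absent header yields "unknown" rather than counting as a non-match. It is a minimal Python sketch, not an option Dialog offers; `and_not_header` and `rule_ignores` are made-up names.

```python
import re
from typing import Optional


def and_not_header(headers: dict, name: str, pattern: str) -> Optional[bool]:
    """Hypothetical "-@" variant: None (unknown) when the header is absent."""
    if name not in headers:
        return None  # cannot decide until the full headers are available
    return re.search(pattern, headers[name]) is None


def rule_ignores(headers: dict) -> bool:
    if not re.search(r"\bBruce.*\bHagen", headers.get("From", "")):
        return False
    mismatch = and_not_header(headers, "NNTP-Posting-Host", r"\.sd\.cox\.net$")
    # Ignore only on a known mismatch; an absent header never triggers.
    return mismatch is True
```

With this variant the first pass leaves Bruce's post unflagged, so its body would be downloaded and the second pass, which does see NNTP-Posting-Host, makes the real decision. It still leaves the other side effects of scoring twice untouched.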