Re: [Dialog] Cracking Richard



On Fri, 13 Jun 2008 17:39:00 -0500, Ron Ford wrote:

Thanks for your reply, Bernd.

You're welcome. :-)

I seem to have liked every Bernd I've met.

I doubt this observation holds well enough to make it a law of nature. ;-)

I did what you said and ended up with 524K of Richard's posts in RMain.txt!
That seems impossible to me as I cleared comp.lang.fortran yesterday.

The msg*.dat files also contain all deleted messages until "Compact
Database" is run. If you have an old setup of Dialog, you may even
have "undeletable" messages (as I do), which are stored in the
database but appear nowhere in Dialog. Some early beta versions
messed up parts of the database while still leaving it functional.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

Please avoid posting a sig-separator line unaltered inside quotes.
Some people use special formats for sigs or even set sigs to be
hidden. This way they may miss what you write after this line.

IMHO, the best way to paste a citation from an external text is by
using <Edit><Paste as custom quote> with a special quote char
like |. If this isn't an option, delete/mung the sig separator
inside the quote.

2D 20 4D 61 72 6B 20 54 77 61 69 6E 0D 0A 7F 53 - Mark Twain...S

I can see in the hex editor that the control character that precedes
Subject is DEL or hex 7F.
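
For anyone without a hex editor at hand, a dump line like the one above
can be reproduced with a few lines of Python. This is only a sketch: the
helper name hex_line is made up for illustration, and the input bytes
are simply the sixteen bytes shown in the dump.

```python
def hex_line(chunk: bytes) -> str:
    """Format bytes as 'HH HH ... text', with non-printable bytes shown as dots."""
    hex_part = " ".join(f"{b:02X}" for b in chunk)
    # Printable ASCII is 0x20..0x7E; everything else (CR, LF, DEL) becomes '.'
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{hex_part} {text_part}"

chunk = b"- Mark Twain\r\n\x7fS"
line = hex_line(chunk)
del_offset = chunk.index(0x7F)  # position of the DEL byte inside the chunk
print(line)
print("DEL found at offset", del_offset)
```

The DEL byte sits directly after the CR LF that ends the signature and
directly before the "S" of "Subject".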

I saw that you already sifted out the true position of the
separator. (I commented on this in a follow-up to that posting.)

Can you say a few words on the syntax you used for the command line
for mtr.exe?

Hm. The short version: Read minitrue.txt!

The longer one:
mtr.exe -xo^^$i:sig.txt msg*.dat > RMain.txt

I decided to use a separate sig.txt, since the text you're looking
for was too long to be used on the command line. If you would like
to filter all messages containing:
"From: nospam@xxxxxxxxxxxxx (Richard Maine)"
you may as well execute this directly (all on one line):

mtr.exe -xo^^$ msg*.dat "[^\x7F]*\r\nFrom: nospam@xxxxxxxxxxxxx (Richard Maine)\r\n[^\x7F]*\x7F" > RMain.txt

All options are concatenated. You need to look each one up separately
in the MiniTrue documentation.
-x   ... search string contains regular expressions
-o^^ ... output text will not be prefixed with the file name (where the
         text was found); please note that the double caret is
         necessary on the Windows command line, because a single caret
         is interpreted as an escape character
-o$  ... no contextual lines in output (the RegEx is built in a way
         to fetch the whole text; therefore no context is necessary)

The most important parts of the RegEx are [^\x7F]* and \x7F. While
the former catches all chars *except* 0x7F, the latter finds *only*
that character.
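
If MiniTrue is not available, the same idea can be sketched in Python.
Please take this as an illustration only: it assumes, as observed above,
that each message in msg*.dat ends with a 0x7F byte; the function name
extract_messages and the sample data are invented for the example, and
nothing here is taken from Dialog's or MiniTrue's internals.

```python
import re

def extract_messages(data: bytes, from_line: bytes) -> list[bytes]:
    """Return each 0x7F-terminated chunk that contains from_line as a whole header line."""
    pattern = re.compile(
        rb"[^\x7F]*\r\n"        # everything before the header line, but never across 0x7F
        + re.escape(from_line)  # the literal header line; escaping keeps ( ) literal
        + rb"\r\n[^\x7F]*\x7F"  # the rest of the message up to and including 0x7F
    )
    return pattern.findall(data)

# Two made-up messages, each terminated by 0x7F.
sample = (
    b"Subject: one\r\nFrom: a@example.invalid (Alice)\r\n\r\nhello\r\n\x7F"
    b"Subject: two\r\nFrom: b@example.invalid (Bob)\r\n\r\nworld\r\n\x7F"
)
hits = extract_messages(sample, b"From: b@example.invalid (Bob)")
for hit in hits:
    print(hit)
```

In Python's RegEx syntax, parentheses are grouping operators, so the
header line is passed through re.escape before it is built into the
pattern. Because a negated class like [^\x7F] also matches CR and LF,
no multiline flag is needed here either.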

And just another thought: If a matching message is the first message
inside a msg*.dat file, the header of the msg*.dat (01 00 00 00) will
appear in the output. You can prevent this by replacing [^\x7F] with
[^\x00\x01\x7F]. Please note that you will encounter problems with all
these search strings if the excluded characters appear inside the
message, e.g. when the messages carry attachments.
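
Both effects can be tried out in a small Python sketch. Again, this is
an assumption-laden illustration: the 01 00 00 00 header and the 0x7F
terminator are taken from the description above, while the function
name extract_without_header and the sample bytes are invented.

```python
import re

def extract_without_header(data: bytes, from_line: bytes) -> list[bytes]:
    """Like a plain [^\\x7F] search, but the match may not contain 0x00 or 0x01 either."""
    body = rb"[^\x00\x01\x7F]*"
    pattern = re.compile(
        body + rb"\r\n" + re.escape(from_line) + rb"\r\n" + body + rb"\x7F"
    )
    return pattern.findall(data)

sender = b"From: a@example.invalid (Alice)"

# First message in a file: preceded by the assumed 4-byte file header.
first = b"\x01\x00\x00\x00Subject: first\r\n" + sender + b"\r\n\r\nhi\r\n\x7F"
clean_hits = extract_without_header(first, sender)

# A message whose body contains a 0x00 byte (e.g. a binary attachment).
binary = b"Subject: x\r\n" + sender + b"\r\n\r\nbin\x00ary\r\n\x7F"
binary_hits = extract_without_header(binary, sender)

print(clean_hits)   # header bytes are no longer part of the match
print(binary_hits)  # the message with the 0x00 byte is not found at all
```

The first search returns the message without the file header; the
second returns nothing, which is the attachment problem mentioned above.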

The nice thing about MiniTrue is that it can search with a multiline
RegEx without any special effort. That is why I chose this tool.

Bernd