Re: Orphaned OS/2 (Another View)



Yes ..

Andrew Stephenson wrote:
> In article <ZUXci.23106$vT6.12762@edtnps90>
> dave.r.yeo@xxxxxxxxx "Dave Yeo" writes:
>
> > I talked to the Authors about Antiword handling text. They did
> > not want to implement it. They said it was too hard to tell if
> > a DOC was just plain text.
>
> Obviously they are likely to know far more about the Word (spit)
> formats than I. But I do have a vague memory, from around 1990,
> that Word (spit) uses a header which of course plain text won't.
> If they can find a "fingerprint" (common even in simple formats)
> there, that could distinguish the two file types. Such a marker
> is likely to appear _very_ early in the file.
                      ^^^^^^

You betcha .. Heck, even WordStar files have their header right at the top of the file, and you can identify one from the first 128 bytes or so.


For example, here are the first 128 bytes of a WordStar file:


000000 1D 7D 00 00 70 4C 51 35 37 30 00 20 20 00 00 00  .}..pLQ570.  ...
000010 80 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000070 00 00 00 00 00 00 00 00 00 00 00 00 00 7D 00 1D  .............}..


Here are the first 80 bytes of a M/S .DOC file:


000000 D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00  ................
000010 00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00  ........>.......
000020 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
000030 44 00 00 00 00 00 00 00 00 10 00 00 45 00 00 00  D...........E...
000040 01 00 00 00 FE FF FF FF 00 00 00 00 43 00 00 00  ............C...


Here are the first 80 bytes of another M/S .DOC file:


000000 D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00  ................
000010 00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00  ........>.......
000020 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
000030 30 00 00 00 00 00 00 00 00 10 00 00 32 00 00 00  0...........2...
000040 01 00 00 00 FE FF FF FF 00 00 00 00 31 00 00 00  ............1...


What are the chances that any pure text .DOC file of old is going to start with those same first few bytes? That's why I suggested what I did.
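
To make the idea concrete, here is a rough sketch in Python of the sort of check I mean. It is not anything Antiword actually does. The function names and the "no NUL bytes" guess for plain text are my own assumptions, and the two signatures are simply the leading bytes of the dumps above. Older Word formats that predate this 8-byte signature would need their own fingerprints.

```python
# Signatures copied from the dumps above; everything else is hypothetical.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # first 8 bytes of both .DOC dumps
WORDSTAR_LEAD = bytes.fromhex("1D7D")           # first 2 bytes of the WordStar dump


def sniff_doc(head: bytes) -> str:
    """Guess a file type from its leading bytes (a heuristic, not a parser)."""
    if head.startswith(OLE2_MAGIC):
        return "word-ole2"
    if head.startswith(WORDSTAR_LEAD):
        return "wordstar"
    # Assumption: a plain text file of old contains no NUL bytes up front.
    if head and b"\x00" not in head:
        return "probably-text"
    return "unknown"


def sniff_file(path: str, count: int = 128) -> str:
    """Read only the first `count` bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_doc(f.read(count))
```

Something like sniff_file("LETTER.DOC") would then only need to touch the first 128 bytes of the file before deciding whether to hand it to the Word converter or treat it as text.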

--


--> Sleep well; OS2's still awake! ;)

Mike Luther