Re: Orphaned OS/2 (Another View)
- From: mike luther <mike.luther@xxxxxxxxxx>
- Date: Sun, 17 Jun 2007 04:25:06 +0000
Yes ..
Andrew Stephenson wrote:
In article <ZUXci.23106$vT6.12762@edtnps90>
dave.r.yeo@xxxxxxxxx "Dave Yeo" writes:
I talked to the authors about Antiword handling text. They did
not want to implement it. They said it was too hard to tell if
a DOC was just plain text.
Obviously they are likely to know far more about the Word (spit)
formats than I. But I do have a vague memory, from around 1990,
that Word (spit) uses a header which of course plain text won't.
If they can find a "fingerprint" (common even in simple formats)
there, that could distinguish the two file types. Such a marker
is likely to appear _very_ early in the file.
^^^^^^
You betcha .. Heck, even WordStar files have their header right at the top of the file, and you can identify them from the first 128 bytes or so.
For example, here are the first 128 bytes of a WordStar file:
000000 1D 7D 00 00 70 4C 51 35 37 30 00 20 20 00 00 00  .}..pLQ570.  ...
000010 80 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000070 00 00 00 00 00 00 00 00 00 00 00 00 00 7D 00 1D  .............}..
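A minimal Python sketch of that kind of header check, assuming (from this one dump) that WordStar files of this vintage begin with the bytes 0x1D 0x7D followed by a 128-byte header. The names `WS_SAMPLE` and `looks_like_wordstar` are hypothetical, and the check only recognizes files that actually carry such a header.

```python
# First 128 bytes of the WordStar file shown in the dump above.
WS_SAMPLE = (
    bytes.fromhex("1D7D0000704C51353730002020000000")  # offset 0x00
    + bytes([0x80, 0x01]) + bytes(14)                  # offset 0x10
    + bytes(16 * 5)                                    # offsets 0x20-0x6F (all zero)
    + bytes(13) + bytes([0x7D, 0x00, 0x1D])            # offset 0x70
)

# Assumption: 0x1D 0x7D at offset 0 marks a WordStar-style 128-byte header.
WORDSTAR_LEAD = b"\x1d\x7d"
WORDSTAR_HEADER_LEN = 128


def looks_like_wordstar(data: bytes) -> bool:
    """True if data is long enough and starts with the assumed WordStar lead bytes."""
    return len(data) >= WORDSTAR_HEADER_LEN and data.startswith(WORDSTAR_LEAD)


print(looks_like_wordstar(WS_SAMPLE))         # True
print(looks_like_wordstar(b"Dear Sir,\r\n"))  # False
```

A plain-text file would have to begin with two control bytes (0x1D, 0x7D) to be misread here, which is the point being made: the fingerprint sits in the very first bytes.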
Here are the first 80 bytes of an M/S .DOC file:
000000 D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00  ................
000010 00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00  ........>.......
000020 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
000030 44 00 00 00 00 00 00 00 00 10 00 00 45 00 00 00  D...........E...
000040 01 00 00 00 FE FF FF FF 00 00 00 00 43 00 00 00  ............C...
Here are the first 80 bytes of another M/S .DOC file:
000000 D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00  ................
000010 00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00  ........>.......
000020 06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
000030 30 00 00 00 00 00 00 00 00 10 00 00 32 00 00 00  0...........2...
000040 01 00 00 00 FE FF FF FF 00 00 00 00 31 00 00 00  ............1...
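The eight bytes D0 CF 11 E0 A1 B1 1A E1 that both files share are the signature of Microsoft's OLE2 "Compound File Binary" container, which these Word .DOC files are stored in (so are .XLS and .PPT files, so the signature identifies the container rather than Word itself). A minimal Python sketch, with field offsets taken from my reading of the published MS-CFB layout; `ole2_header_fields` is a hypothetical helper name:

```python
import struct
from typing import Optional

# First 80 bytes of the two .DOC files dumped above.
DOC_A = bytes.fromhex(
    "D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00 "
    "06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 "
    "44 00 00 00 00 00 00 00 00 10 00 00 45 00 00 00 "
    "01 00 00 00 FE FF FF FF 00 00 00 00 43 00 00 00"
)
DOC_B = bytes.fromhex(
    "D0 CF 11 E0 A1 B1 1A E1 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 3E 00 03 00 FE FF 09 00 "
    "06 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 "
    "30 00 00 00 00 00 00 00 00 10 00 00 32 00 00 00 "
    "01 00 00 00 FE FF FF FF 00 00 00 00 31 00 00 00"
)

OLE2_SIGNATURE = bytes.fromhex("D0 CF 11 E0 A1 B1 1A E1")


def ole2_header_fields(data: bytes) -> Optional[dict]:
    """Decode a few little-endian compound-file header fields, or None if absent."""
    if len(data) < 0x34 or not data.startswith(OLE2_SIGNATURE):
        return None
    # 0x18: minor version, major version, byte order, sector shift, mini sector shift
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 0x18)
    # 0x30: sector number where the directory starts (varies per file)
    (first_dir_sector,) = struct.unpack_from("<I", data, 0x30)
    return {
        "minor_version": minor,
        "major_version": major,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "first_dir_sector": first_dir_sector,
    }


for name, data in (("A", DOC_A), ("B", DOC_B)):
    print(name, ole2_header_fields(data))

# The two headers agree up to offset 0x30, the per-file directory location.
first_diff = next(i for i, (a, b) in enumerate(zip(DOC_A, DOC_B)) if a != b)
print(hex(first_diff))  # 0x30
```

Under that reading, everything the two files share is fixed format data (signature, version 3, byte-order mark 0xFFFE, 512-byte sectors), while the bytes that differ are per-file sector numbers.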
What are the chances that any old pure-text .DOC file is going to start with bytes like those? That's why I suggested what I did.
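A rough Python sketch of the resulting decision: check the known signatures first, and only then fall back to a plain-text guess. Everything here is an assumption for illustration, not Antiword's logic: the helper names are hypothetical, and the 512-byte sample, the 95% threshold, and the set of tolerated control bytes are arbitrary choices.

```python
import sys

# Signatures taken from the dumps above.
OLE2_SIGNATURE = bytes.fromhex("D0 CF 11 E0 A1 B1 1A E1")
WORDSTAR_LEAD = b"\x1d\x7d"

# Assumed control bytes tolerated in old plain text:
# tab, LF, FF, CR and Ctrl-Z (the DOS end-of-file marker).
TEXT_CONTROLS = frozenset({0x09, 0x0A, 0x0C, 0x0D, 0x1A})


def looks_like_plain_text(data: bytes, sample: int = 512, min_ratio: float = 0.95) -> bool:
    """Guess whether data is 8-bit plain text by inspecting its first bytes."""
    head = data[:sample]
    if b"\x00" in head:   # NUL bytes are not expected in 8-bit plain text
        return False
    if not head:          # an empty file is trivially "text"
        return True
    # Count printable ASCII, high-bit code-page characters, and allowed controls.
    good = sum(1 for b in head if (b >= 0x20 and b != 0x7F) or b in TEXT_CONTROLS)
    return good / len(head) >= min_ratio


def classify_doc(data: bytes) -> str:
    """Return 'ole2', 'wordstar', 'text' or 'unknown' for a .DOC file's leading bytes."""
    if data.startswith(OLE2_SIGNATURE):
        return "ole2"     # compound file, e.g. a Word .DOC (also .XLS/.PPT)
    if data.startswith(WORDSTAR_LEAD):
        return "wordstar"
    if looks_like_plain_text(data):
        return "text"
    return "unknown"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, classify_doc(f.read(512)))
```

Because the signature tests run first and a text file would need those exact leading bytes to be misclassified, the fuzzy text heuristic only has to handle files that match no known header.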
--
--> Sleep well; OS2's still awake! ;)
Mike Luther