Re: Pages vs. MS Word
- From: gtr <xxx@xxxxxxx>
- Date: Sat, 13 Sep 2008 15:38:29 -0700
On 2008-09-13 12:20:29 -0700, Wes Groleau <groleau+news@xxxxxxxxxxxxx> said:
gtr wrote: So you produced an output file, then examined it to find out what its native encoding was and found that encoding to be "Mac OS Roman". I ask again: how did you establish this was the output encoding?
Do three different things and see whether they agree:
Excellent! Thanks for the input.
1. At a Terminal prompt, type
file <filename> and see what that says about the file type
I have another page, again in Japanese and romaji that I saved as a text file on my desktop. I did the "file" cmd from within unix and got this:
Big-endian UTF-16 Unicode English character data, with CR line terminators
This seems to imply that Pages will indeed save in UTF16. Note that I copied text into this Pages document from another document (DevonThink, not that it matters). But this makes me wonder exactly where the encoding takes place. Nevertheless if I put a UTF16-encoded copy-block into the Pages document, it will save it thataway.
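One clue that can be checked by hand is the byte-order mark (BOM) at the very start of the file. The following is a minimal Python sketch of that single check; it is not a description of how "file" works internally, and the function name sniff_bom and the sample bytes are made up for illustration.

```python
import codecs

# Known byte-order marks. The UTF-32 marks come first because the
# UTF-32 little-endian mark begins with the same two bytes as the
# UTF-16 little-endian mark.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32 little-endian"),
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),
]


def sniff_bom(data: bytes) -> str:
    """Return the name of the BOM that starts `data`, if there is one."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no BOM found"


# Hypothetical sample standing in for the file saved from Pages.
sample = codecs.BOM_UTF16_BE + "日本語 nihongo\r".encode("utf-16-be")
print(sniff_bom(sample))
```

To try it on a real file, read the first few bytes in binary mode, for example open(path, "rb").read(4), and pass them to sniff_bom. A file without a BOM can still be Unicode, so "no BOM found" settles nothing by itself.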
2. Launch TextEdit, discard the blank window and pick File->Open
Select the file and then one of the encoding methods. If you
don't see the right characters, close without saving and try
another encoding.
I've repeatedly proved to myself that "Automatic" is not automatic and so I have less than the greatest confidence in dealing with TextEdit and Japanese text files. It's given me a lot of trouble by being confused over file-encoding. Nevertheless I opened it as you indicated, first allowing "Automatic" and it read fine. Then I opened it again selecting UTF16, UTF8, Western (Mac OS Roman), Western (Windows Latin 1), Japanese (Mac OS X), Japanese (Windows, DOS), Japanese (ISO-2022 JP), Japanese (Shift JIS), and Korean (Mac OS). All of them loaded the text file appropriately, and all of the loads displayed just fine.
That again confirms to me that TextEdit is a deaf-mute in a telephone booth.
3. If you get the right characters showing, save it to a different
file name and explicitly save as the same encoding. Compare the
two files with 'diff' or 'sum' or 'od' and see whether they are
the same.
I saved it from TextEdit, explicitly stating the output to be UTF16 (in which I have no confidence). Checked this against the original text file output from Pages using diff, and they showed no differences.
I then did the same thing but saved the file from TextEdit as UTF8 and diff informed me that the two files differ. I then issued a "file" command against the UTF8 file and got this:
DOS executable (COM)
Ain't life funny!
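The comparison in step 3 can also be tried without TextEdit in the loop: decode the bytes under a candidate encoding, encode them again, and see whether the same bytes come back. This is only a rough Python sketch; the function name and sample data are invented for illustration, and it says nothing about why "file" labelled the UTF8 copy the way it did.

```python
import codecs


def same_after_roundtrip(data: bytes, encoding: str) -> bool:
    """True if decoding then re-encoding with `encoding` reproduces `data`."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # The bytes are not valid in this encoding at all.
        return False


# Hypothetical sample standing in for the file saved from Pages:
# a big-endian UTF-16 byte-order mark followed by big-endian UTF-16 text.
sample = codecs.BOM_UTF16_BE + "日本語 nihongo\r".encode("utf-16-be")

print(same_after_roundtrip(sample, "utf-16-be"))  # candidate matches
print(same_after_roundtrip(sample, "utf-8"))      # candidate does not match
```

The explicit "utf-16-be" name is used on purpose. Python's plain "utf-16" codec writes a BOM in the machine's own byte order when encoding, so on a little-endian machine it would not reproduce a big-endian file byte for byte. A passing round trip is also weak evidence when taken alone, since more than one encoding can pass for the same bytes.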
Optional: If you are still unsure, put some non-English characters
in a text file in TextEdit and/or TextWrangler, and save as various
types. If the document has characters that a particular encoding
cannot do, that one will be ghosted in the menu. Examine the bytes
of those characters with 'od -xc' and compare with the bytes in
the document you're wondering about.
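The same idea as the 'od -xc' comparison can be sketched in Python: take one non-English character, encode it several ways, and look at the bytes. The helper name hex_bytes and the choice of sample character are just for illustration. An encoding that cannot represent the character raises an error, which is roughly the situation where the menu entry would be ghosted.

```python
from typing import Optional


def hex_bytes(text: str, encoding: str) -> Optional[str]:
    """Return the encoded bytes of `text` as spaced hex, or None if
    `encoding` cannot represent the text."""
    try:
        encoded = text.encode(encoding)
    except UnicodeEncodeError:
        return None
    return " ".join(f"{byte:02x}" for byte in encoded)


# Python codec names for some of the encodings mentioned in this thread.
for name in ("utf-8", "utf-16-be", "shift_jis", "mac_roman"):
    result = hex_bytes("日", name)
    print(f"{name:10} {result if result is not None else '(cannot encode)'}")
```

For the character 日 the UTF-8 bytes are e6 97 a5 and the big-endian UTF-16 bytes are 65 e5, so the two are easy to tell apart in a dump of the file you're wondering about.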
Last resort: e-mail the document to me and I'll tell you the
encoding method. Might take me a while....
Would you be using the same tools, Pages, TextEdit and Unix? If so, I'd think you'd get the same results.
Anyway, having gotten the feedback from "file" makes me feel comfortable enough. Thanks a bunch for your input.
--
Thank you and have a nice day.