Re: Pages vs. MS Word



On 2008-09-13 12:20:29 -0700, Wes Groleau <groleau+news@xxxxxxxxxxxxx> said:

gtr wrote:
So you produced an output file, then examined it to find out what its native encoding was and found that encoding to be "Mac OS Roman". I ask again: how did you establish this was the output encoding?

Do three different things and see whether they agree:

Excellent! Thanks for the input.

1. At a Terminal prompt, type
file <filename> and see what that says about the file type

I have another page, again in Japanese and romaji, that I saved as a text file on my desktop. I ran the "file" command from the Unix shell and got this:

Big-endian UTF-16 Unicode English character data, with CR line terminators

This seems to imply that Pages will indeed save in UTF-16. Note that I copied text into this Pages document from another document (DevonThink, not that it matters), which makes me wonder exactly where the encoding takes place. Nevertheless, if I paste a UTF-16-encoded block of text into the Pages document, Pages will save it that way.
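For the curious, here is a minimal Python sketch of the kind of check that yields a report like the one above. It assumes the file starts with a byte-order mark (FE FF for big-endian UTF-16), which is one common way a UTF-16 file announces its byte order. The helper name `sniff` and the sample text are made up for illustration; the real `file` command applies many more heuristics than this.

```python
import codecs


def sniff(data: bytes) -> str:
    """Hypothetical helper: report BOM-declared encoding and line terminators.

    Only byte-order marks are checked; a file without one is not guessed at.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        enc, text = "UTF-16BE", data[2:].decode("utf-16-be")
    elif data.startswith(codecs.BOM_UTF16_LE):
        enc, text = "UTF-16LE", data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF8):
        enc, text = "UTF-8 (with BOM)", data[3:].decode("utf-8")
    else:
        return "no BOM: encoding must be guessed from content"

    # Classic Mac OS text used bare CR; Windows uses CRLF; Unix uses LF.
    if "\r\n" in text:
        eol = "CRLF"
    elif "\r" in text:
        eol = "CR"
    elif "\n" in text:
        eol = "LF"
    else:
        eol = "no"
    return f"{enc}, {eol} line terminators"


# Sample bytes: big-endian UTF-16 with a BOM and CR line endings.
sample = codecs.BOM_UTF16_BE + "日本語\rnihongo\r".encode("utf-16-be")
print(sniff(sample))  # UTF-16BE, CR line terminators
```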

2. Launch TextEdit, discard the blank window and pick File->Open
Select the file and then one of the encoding methods. If you
don't see the right characters, close without saving and try
another encoding.

I've repeatedly proved to myself that "Automatic" is not automatic, so I have less than the greatest confidence in TextEdit when dealing with Japanese text files. It's given me a lot of trouble by getting confused over file encoding. Nevertheless, I opened the file as you indicated, first letting "Automatic" choose, and it read fine. Then I opened it again selecting, in turn, UTF-16, UTF-8, Western (Mac OS Roman), Western (Windows Latin 1), Japanese (Mac OS X), Japanese (Windows, DOS), Japanese (ISO-2022 JP), Japanese (Shift JIS), and Korean (Mac OS). Every one of them loaded the text file, and every load displayed just fine.

That again confirms to me that TextEdit is a deaf-mute in a telephone booth.
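One reason an "it opened fine" result proves little: decoding can succeed without being correct. A minimal Python sketch, assuming a sample of big-endian UTF-16 with a BOM and a hand-picked list of codecs (not TextEdit's actual logic), decodes the same bytes several ways and reports which codecs reject them outright. Single-byte codecs such as mac_roman map every byte value to some character, so they never fail, which is why eyeballing the result still matters.

```python
import codecs
from typing import Dict, Optional

# Hypothetical subset standing in for an editor's encoding menu.
CANDIDATES = ["utf-16", "utf-8", "mac_roman", "shift_jis", "iso2022_jp"]


def try_decodings(data: bytes) -> Dict[str, Optional[str]]:
    """Decode `data` with each candidate codec; None means the codec raised an error."""
    results: Dict[str, Optional[str]] = {}
    for enc in CANDIDATES:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Sample input: big-endian UTF-16 with a BOM (not the poster's actual file).
data = codecs.BOM_UTF16_BE + "日本語".encode("utf-16-be")

for enc, text in try_decodings(data).items():
    # mac_roman "succeeds" here too, but the characters it produces are garbage.
    print(f"{enc:10} {'decode error' if text is None else repr(text)}")
```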

3. If you get the right characters showing, save it to a different
file name and explicitly save as the same encoding. Compare the
two files with 'diff' or 'sum' or 'od' and see whether they are
the same.

I saved it from TextEdit, explicitly setting the output encoding to UTF-16 (a setting in which I have no confidence). I checked this against the original text file output from Pages using diff, and diff showed no differences.

I then did the same thing but saved the file from TextEdit as UTF-8, and diff reported that the two files differ. I then ran the "file" command on the UTF-8 file and got this:

DOS executable (COM)

Ain't life funny!
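Step 3 boils down to asking whether two files hold the same bytes. Here is a minimal Python sketch with made-up file names and sample text: re-saving in the same encoding gives identical bytes, while a UTF-8 copy of the same characters is a different byte sequence, so a diff between a UTF-16 file and a UTF-8 file is expected to report differences. The BOM on the UTF-16 file is an assumption of the sketch, and the SHA-256 digest only stands in for `sum`, which uses a different checksum.

```python
import codecs
import hashlib
import tempfile
from pathlib import Path


def same_bytes(a: Path, b: Path) -> bool:
    """True if the two files contain exactly the same bytes, the question `cmp` or `diff` answers."""
    return a.read_bytes() == b.read_bytes()


def digest(path: Path) -> str:
    """SHA-256 of the file contents; a stand-in for comparing `sum` output."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


text = "日本語 nihongo\r"  # hypothetical sample text with a CR line ending

with tempfile.TemporaryDirectory() as d:
    original = Path(d, "from-pages.txt")
    resaved16 = Path(d, "resaved-utf16.txt")
    resaved8 = Path(d, "resaved-utf8.txt")

    utf16_bytes = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    original.write_bytes(utf16_bytes)
    resaved16.write_bytes(utf16_bytes)          # same characters, same encoding
    resaved8.write_bytes(text.encode("utf-8"))  # same characters, different encoding

    utf16_matches = same_bytes(original, resaved16)
    utf8_matches = same_bytes(original, resaved8)
    digests_match = digest(original) == digest(resaved16)

print(utf16_matches, utf8_matches, digests_match)  # True False True
```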

Optional: If you are still unsure, put some non-English characters
in a text file in TextEdit and/or TextWrangler, and save as various
types. If the document has characters that a particular encoding
cannot do, that one will be ghosted in the menu. Examine the bytes
of those characters with 'od -xc' and compare with the bytes in
the document you're wondering about.
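The byte comparison suggested above can also be done without saving any files. A minimal Python sketch, using a hypothetical list of encodings and two sample characters, prints the bytes each character becomes in each encoding, much like reading `od -xc` output for files saved in each one. An encoding that cannot represent a character raises UnicodeEncodeError, which roughly corresponds to the ghosted menu entries.

```python
from typing import Dict, Optional

# Hypothetical list standing in for a Save dialog's encoding menu.
ENCODINGS = ["utf-8", "utf-16-be", "shift_jis", "euc_jp", "mac_roman", "cp1252"]


def byte_table(ch: str) -> Dict[str, Optional[str]]:
    """Hex bytes of `ch` in each encoding, or None if the encoding cannot represent it."""
    table: Dict[str, Optional[str]] = {}
    for enc in ENCODINGS:
        try:
            table[enc] = ch.encode(enc).hex(" ")  # e.g. "e6 97 a5"
        except UnicodeEncodeError:
            table[enc] = None  # the encoding has no byte sequence for this character
    return table


for ch in ("日", "é"):
    for enc, hexbytes in byte_table(ch).items():
        shown = hexbytes if hexbytes is not None else "(cannot encode)"
        print(f"{ch}  {enc:10} {shown}")
```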

Last resort: e-mail the document to me and I'll tell you the
encoding method. Might take me a while....

Would you be using the same tools: Pages, TextEdit and Unix? If so, I'd think you'd get the same results.

Anyway, getting the feedback from "file" makes me feel comfortable enough. Thanks a bunch for your input.
--
Thank you and have a nice day.
