Re: Text file format



Howard Brazee <howard@xxxxxxxxxx> wrote:
I have used Text Wrangler to change the character coding and line ends
of some documents that I have imported from Windows systems but which
did not display correctly in Text Edit.

I saw a lot of options, and decided to look at an OS X generated text
file to model my documents.

But the number of options Text Wrangler has leads me to ask advice
about different formats for different uses. For instance, a document
that I need to copy directly back and forth between OS X and Windows
might be different from one I use elsewhere.

Any recommendations about when a particular character coding and/or
LF/CR settings are appropriate?

Line endings are easy:

LF: Unix, including Mac OS X
CR+LF: DOS, Windows
CR: Old Mac OS

Where LF is a line feed and CR is a carriage return (DOS uses two
characters).

Since the pattern is so simple, there are no real dramas converting
between one and the other. Just use whichever is most convenient.
I always use LF since all the unix tools will have no dramas and it
is simple to add or change to a CR for apps that don't get it.
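
To show how simple the conversion is, here is a rough Python sketch.
The function name convert_line_endings is made up for illustration,
and it assumes an ASCII-compatible encoding (UTF-8, Latin 1, Mac OS
Roman and the like); it would not be safe on UTF-16 data, where each
character takes two or more bytes.

```python
def convert_line_endings(data: bytes, target: bytes = b"\n") -> bytes:
    """Convert CR+LF, CR and LF line endings in data to target.

    Hypothetical helper: assumes an ASCII-compatible encoding.
    """
    # Collapse CR+LF first so the lone-CR pass does not double up lines,
    # then turn any remaining CR (old Mac OS) into LF.
    normalised = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Everything is LF now; switch to the requested ending if different.
    if target != b"\n":
        normalised = normalised.replace(b"\n", target)
    return normalised


if __name__ == "__main__":
    mixed = b"one\r\ntwo\rthree\n"
    print(convert_line_endings(mixed))            # to Unix (LF)
    print(convert_line_endings(mixed, b"\r\n"))   # to DOS/Windows (CR+LF)
```

Working on bytes rather than decoded text avoids any automatic
newline translation the language might otherwise do when reading
or writing files.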

Encodings I don't know much about. As I understand it, if you want
to be universal (lots of characters, lots of support), go with a
Unicode encoding like UTF-8 or UTF-16. If you specifically want to
support a certain language, go for the encoding for that language
(e.g. ISO-2022-JP (JIS) for Japanese). If you want a good, widely
compatible encoding that covers most of the characters used in
Western Europe but doesn't have the complexity of UTF, use
ISO 8859-1 (Latin 1). There are other parts of this standard (-2,
-3, -4 etc.) that cover other groups of languages instead.

Finally, use Mac OS Roman or Windows-1252 only if you want to be
compatible with those OSes.
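
If you do need to move text between encodings, the usual approach is
to decode the bytes using the encoding they were written in and then
encode them again in the one you want. A minimal Python sketch,
assuming you already know the source encoding (the function name
transcode is made up; the codec names are Python's own, e.g.
"mac_roman", "latin-1", "utf-8", "utf-16", "iso2022_jp"):

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode data from the source encoding into the target encoding.

    Hypothetical helper: the caller must know the source encoding.
    Raises UnicodeEncodeError if the target cannot represent a character.
    """
    text = data.decode(source)   # bytes -> Unicode text
    return text.encode(target)   # Unicode text -> bytes


if __name__ == "__main__":
    # An accented "e" is byte 0x8E in Mac OS Roman but 0xE9 in Latin 1.
    mac_bytes = b"caf\x8e"
    print(transcode(mac_bytes, "mac_roman"))             # as UTF-8
    print(transcode(mac_bytes, "mac_roman", "latin-1"))  # as Latin 1
```

Going to UTF-8 should always succeed; going to a smaller encoding
like Latin 1 fails if the text has characters it doesn't cover.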

HTH

--
*--------------------------------------------------------*
| ^Nothing is foolproof to a sufficiently talented fool^ |
| Heath Raftery, HRSoftWorks _\|/_ |
*______________________________________m_('.')_m_________*