Re: Non ASCII characters in CFString literal, HELP !



In article <1173048739.956532@xxxxxxxxxxxxxxxxxx>,
Michael Ash <mike@xxxxxxxxxxx> wrote:

Sean McBride <cwatson@xxxxxxx> wrote:
In article <1172720951.797048@xxxxxxxxxxxxxxxxxx>,
Michael Ash <mike@xxxxxxxxxxx> wrote:

As others have said, the problem is gcc. Other than the workarounds
already suggested, you might look at using another compiler. Your
choices are: CodeWarrior (discontinued, thus not recommended), icc (from
Intel), and xlc (from IBM). I don't know if any of them accept
better-than-ASCII source files.

xlc is PPC-only and last I heard it did not do Objective-C. Same for icc
except it's x86-only. CodeWarrior is PPC-only and very dead, although it
does Objective-C, but I don't know if it does it convincingly enough to
actually work with Cocoa.

Oh, and they're all quite expensive, particularly compared to gcc's $free.

I think this is swatting a fly with a machine gun....

Maybe, that's for the OP to decide. Just trying to suggest options....

Also, you might not think it such a drastic solution if you couldn't
code in your mother tongue. Imagine gcc were a Japanese invention and
did not support English characters! :)

I would be careful calling it a "solution" when it doesn't actually fix
the OP's problem.

Quite right. I should have said 'potential maybe-solution'. :) As I
said way way back: "you might look at using another compiler. [...] I
don't know if any of them accept better-than-ASCII source files." We
have now looked at it, and clearly it won't help the OP.

The problem is non-ASCII NSString literals (the error
refers to CF but actually means both CF and NS) but one or two of the
compilers listed don't even support Objective-C.

It is indeed lamentable that we have pretty much only one compiler
choice on our platform (at least for anything that needs Obj-C, and more
and more stuff needs it these days).

The one that does
(CodeWarrior) probably doesn't reliably support non-ASCII NSString
literals anyway. Even if it did, it won't support characters outside of
MacRoman because of how NSConstantString is implemented.

It is 2007 after all, and gcc's ASCII-only-ness is exceedingly lame.

For this particular case, blame C or Cocoa, not gcc. Gcc is perfectly
happy to take C string constants in any encoding and just pass them
through to the other side. So as long as you make sure your files are
UTF-8 you can do something like this:

[NSString stringWithUTF8String:"Arbitrary Unicode Goes Here"]

However, the C standard does not guarantee any particular behavior in this
case, so it's bad practice to rely on it in your code. Likewise, for ObjC
string constants, gcc just passes the data through. The problem is that
only MacRoman is accepted so for most people the data gets corrupted. It's
also not officially supported by the language/libraries, so once again
it's bad to rely on it. From what I can see, gcc is already doing as much
as it can in this department.
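
One way to sidestep the source-encoding question is to spell the non-ASCII
bytes with hex escapes, so the file itself stays pure ASCII. A minimal
sketch in plain C, not an official recipe: the names kCafeUTF8 and
utf8_code_point_count are made up for illustration, and the counter assumes
its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* "café" with U+00E9 (é) written as its two UTF-8 bytes, 0xC3 0xA9.
 * The literal is split in two so the escapes stand apart from the
 * plain text; hex escapes are greedy and would swallow any hex digit
 * that directly followed them. */
static const char kCafeUTF8[] = "caf" "\xC3\xA9";

/* Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
 * Hypothetical helper: assumes s is valid, NUL-terminated UTF-8 and
 * performs no validation of its own. */
size_t utf8_code_point_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

In Cocoa code the same array could then be handed to
[NSString stringWithUTF8String:kCafeUTF8], since the bytes are UTF-8 no
matter how the source file was saved.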

Interesting. A quick look at gcc's man page has also set me
straight. OK, so maybe it's not gcc that sucks, but something sure
sucks! :) It's 2007, I should be able to do something like [NSString
stringWithUTF8String:"Arbitrary Unicode Goes Here"] already! Can you
sense it frustrates me?! :)

Maybe Apple will use the upcoming 64 bit ABI change to improve
NSConstantString and friends?

If you're referring to ASCII-only-ness outside of string constants (like
for identifiers and such), that certainly could be remedied and there's
little reason not to IMO, but once again the resulting code would be
unportable.

No, wasn't talking about that... I believe you are talking about
"extended identifiers" which seems to be on their todo list:
<http://gcc.gnu.org/c99status.html>. I'm no language lawyer, but it
seems to be part of C99, and so would be "portable". (Not that C99 is
widely supported by compiler vendors.)