Encoding issue under Windows



The Windows console works with an OEM code page, which is CP866 on my system, so input and output have to be transcoded to CP866.

This thread http://www.ruby-forum.com/topic/184730 suggests a nice
solution:

``````
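# External encoding: whatever the current locale reports (CP866 in the console).
Encoding.default_external = Encoding.find(Encoding.locale_charmap)
# Internal encoding: the encoding of this source file (UTF-8 here).
Encoding.default_internal = __ENCODING__

# Make the standard streams transcode between the two.
[STDIN, STDOUT, STDERR].each do |io|
  io.set_encoding(Encoding.default_external, Encoding.default_internal)
end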
``````

And it works. The text is now displayed correctly in the Windows console.

But the code crashes whenever Ruby writes a string that contains a
character missing from CP866 (like the em dash "—"):

in `write': U+2014 from UTF-8 to IBM866 (Encoding::UndefinedConversionError)

How do I overcome this error? Not using UTF-8 is not an option.

Iconv has a nice solution: append "//IGNORE" to the encoding name and
the error no longer occurs. But IO does not recognize "//IGNORE", so
that approach does not work here.
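
One thing that might be worth trying as a rough equivalent of "//IGNORE": `String#encode` accepts the `:undef`, `:invalid` and `:replace` options, and `IO#set_encoding` is documented to take the same options. The sketch below is untested on a real Windows console. The helper names `console_safe` and `make_lossy`, the "?" fallback, and the hard-coded "IBM866" default are all assumptions made for illustration.

```ruby
# Hypothetical helper: transcode a string for a legacy console, replacing
# every character the target code page cannot represent with a fallback.
def console_safe(text, console_encoding = "IBM866", fallback = "?")
  text.encode(console_encoding, undef: :replace, invalid: :replace, replace: fallback)
end

# Hypothetical helper: apply the same replacement options to a stream, so
# that writes to it substitute the fallback instead of raising
# Encoding::UndefinedConversionError.
def make_lossy(io, external = "IBM866", internal = "UTF-8", fallback = "?")
  io.set_encoding(external, internal, undef: :replace, invalid: :replace, replace: fallback)
  io
end
```

With this, `[STDOUT, STDERR].each { |io| make_lossy(io) }` would take the place of the plain `set_encoding` call, and an em dash should come out as "?" rather than crash the program.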

I can't use Iconv instead of IO for the following reason. I use
RubyMine. Unlike the Windows console, the RubyMine console works in pure
UTF-8. But whenever code run from RubyMine writes to disk, it seems to
use the Windows encoding, because the text ends up in a corrupt
encoding. :( And I do not know a way to distinguish RubyMine from the
Windows console so that my program could perform only the conversions
necessary for the current environment.
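
A possible direction, offered only as a sketch: `IO#tty?` might tell the two environments apart, and files can be opened with an explicit encoding so they do not depend on `Encoding.default_external`. The first part rests on an unverified assumption, namely that the real Windows console is reported as a TTY while the RubyMine console is a pipe. The helper names `output_encoding_for` and `write_utf8` are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: guess which encoding an output stream expects.
# Assumption (unverified): a real console is a TTY, an IDE console is not.
def output_encoding_for(io, console_encoding = "IBM866")
  io.tty? ? console_encoding : "UTF-8"
end

# Hypothetical helper: write text to disk as UTF-8, whatever
# Encoding.default_external happens to be.
def write_utf8(path, text)
  File.open(path, "w:UTF-8") { |f| f.write(text) }
end
```

The result of `output_encoding_for(STDOUT)` could then be passed as the external encoding to `STDOUT.set_encoding`, while all file output goes through `write_utf8`.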

--
Posted via http://www.ruby-forum.com/.