Re: File.new and encoding

I'm doing something like:

File.open("target","w") do |target|
    File.open("source","r") do |source|
        source.each_line do |line|
            # ... some processing ...
            target.write(line)
        end
    end
end

Have you looked at 'iconv' in the standard library?

http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/classes/Iconv.html

Assuming all your input files are ISO-8859-1 and you want your output file in UTF-8, your example might look something like this (untested):

require 'iconv'

File.open("target","w") do |target|
  Iconv.open('UTF-8', 'ISO-8859-1') do |converter|
    File.open("source","r") do |source|
      source.each_line do |line|
        # ... some processing ...
        target.write(converter.iconv(line))
      end
    end
    # Passing nil flushes the converter (resets any shift state).
    target << converter.iconv(nil)
  end
end
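Note that Iconv was deprecated in Ruby 1.9.3 and dropped from the standard library in Ruby 2.0, where String#encode covers the same ground. Below is a minimal sketch of the same loop without Iconv, assuming Ruby 2.0+ and ISO-8859-1 input; the method name transcode_file and its keyword defaults are hypothetical. line.encode(to, from) stands in for converter.iconv(line).

```ruby
# Hypothetical helper: copy source_path to target_path, converting each
# line from `from` to `to` with String#encode instead of Iconv.
def transcode_file(source_path, target_path, to: "UTF-8", from: "ISO-8859-1")
  File.open(target_path, "wb") do |target|
    # Binary mode: each line arrives as raw bytes (ASCII-8BIT).
    File.open(source_path, "rb") do |source|
      source.each_line do |line|
        # ... some processing ...
        # encode(to, from) reads the bytes as `from` and converts to `to`.
        target.write(line.encode(to, from))
      end
    end
  end
end
```

For a stateless encoding like ISO-8859-1 there is no shift state to flush, so nothing corresponds to the final converter.iconv(nil) call.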

Iconv should deal with BOMs, stripping them out or adding them in where necessary. I'm not sure whether it will complain if it finds a BOM mid-stream (when you open your second and subsequent input files); if so, you could just instantiate a new Iconv for each input file.
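The "one converter per input file" idea can be sketched with String#encode (assuming Ruby 2.0+). Here each file is decoded separately and a leading U+FEFF is dropped explicitly, so a BOM at the start of a later file never lands mid-stream in the output. The method name concat_as_utf8 and the UTF8_BOM constant are hypothetical, and reading each file whole is a simplification of the line-by-line loop above.

```ruby
# U+FEFF: what a BOM decodes to once the bytes are converted to UTF-8.
UTF8_BOM = "\uFEFF"

# Hypothetical helper: concatenate several files (all in encoding `from`)
# into one UTF-8 target, stripping a leading BOM from each file.
def concat_as_utf8(source_paths, target_path, from: "UTF-8")
  File.open(target_path, "w:UTF-8") do |target|
    source_paths.each do |path|
      # Fresh decode per file: read raw bytes, label them, convert to UTF-8.
      text = File.binread(path).force_encoding(from).encode("UTF-8")
      text = text[1..-1] if text.start_with?(UTF8_BOM)
      text.each_line do |line|
        # ... some processing ...
        target.write(line)
      end
    end
  end
end
```

Passing from: "UTF-16LE" handles files written with a little-endian UTF-16 BOM in the same way.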

HTH
alex