Re: Ada95 to Ada2005 parser - currently using lex/yacc - problem with Unicode!



On Thu, 2007-03-08 at 19:54 -0500, Tommy Nordgren wrote:
ANTLR supports Unicode, but one point to consider with ANY tool is
that you will need a module that supports converting the input text
files to canonical UTF-16.
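
For a Java-based tool that conversion step can be fairly small, since
Java strings are UTF-16 internally. Here is a minimal sketch, assuming
that "canonical" also means NFC normalization (that is my reading, not
something stated above). The class and method names are made up for
illustration:

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

// Hypothetical helper, not part of ANTLR or JavaCC.
final class SourceDecoder {
    // Decode raw bytes from the given encoding into a Java String
    // (UTF-16 code units), then apply NFC so that equivalent
    // sequences compare equal. NFC is an assumption on my part.
    static String decode(byte[] raw, Charset encoding) {
        String text = new String(raw, encoding);
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }
}
```

The resulting String can be wrapped in a java.io.StringReader and
handed to whatever lexer you end up using.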

JavaCC also handles Unicode characters; for example, this would tokenize
an optional minus sign, one or more digits, a space, and then the Unicode
character for "degrees Fahrenheit" (U+2109) or "degrees Celsius" (U+2103):

TOKEN : {
<FAHRENHEIT_TEMPERATURE : (["-"])? <DIGITS> " \u2109">
| <CELSIUS_TEMPERATURE : (["-"])? <DIGITS> " \u2103">
| <#DIGITS : ["0"-"9"](["0"-"9"])*>
}
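
If you want to sanity-check what those tokens should accept without
generating a parser, a rough equivalent with java.util.regex looks like
the sketch below. This is only an approximation of the token
definitions, not what JavaCC generates, and the class and method names
are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical regex approximation of the two JavaCC tokens.
final class TemperatureTokens {
    // Optional minus, one or more ASCII digits, a space, then U+2109.
    static final Pattern FAHRENHEIT = Pattern.compile("-?[0-9]+ \u2109");
    // Same shape, ending in U+2103.
    static final Pattern CELSIUS = Pattern.compile("-?[0-9]+ \u2103");

    // Return the token name that matches the whole input, if any.
    static String classify(String text) {
        if (FAHRENHEIT.matcher(text).matches()) return "FAHRENHEIT_TEMPERATURE";
        if (CELSIUS.matcher(text).matches()) return "CELSIUS_TEMPERATURE";
        return "NO_MATCH";
    }
}
```

Note that the space before the degree character is part of the token,
so input without it will not match.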

JavaCC doesn't yet handle supplementary characters (those outside the
Basic Multilingual Plane). But that's on our radar, so we shall see...
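
To illustrate why supplementary characters are awkward for any lexer
that works one Java char at a time: in UTF-16 they occupy two code
units (a surrogate pair), so a single character shows up as two chars.
A small sketch in plain Java, with hypothetical names, nothing
JavaCC-specific:

```java
// Hypothetical demo class showing code units versus code points.
final class SupplementaryDemo {
    // Number of UTF-16 code units, which is what a char-based lexer sees.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of actual Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if any code point lies outside the Basic Multilingual Plane.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }
}
```

For instance, U+1D11E (musical symbol G clef), built with
new String(Character.toChars(0x1D11E)), has two code units but one code
point, whereas U+2103 has one of each.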

Yours,

Tom
