Re: Ada95 to Ada2005 parser - currently using lex/yacc - problem with Unicode!
- From: Tom Copeland <tom@xxxxxxxxxxxxx>
- Date: 16 Mar 2007 03:16:42 -0400
On Thu, 2007-03-08 at 19:54 -0500, Tommy Nordgren wrote:
ANTLR supports Unicode, but one point to consider with ANY tool is
that you will need a module that converts the input text
files to canonical UTF-16.
JavaCC also handles Unicode characters; for example, this would tokenize
an optional minus sign followed by one or more digits, a space, and the
Unicode code point for "degree Fahrenheit" (U+2109) or "degree Celsius"
(U+2103):
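TOKEN : {
  // optional "-", digits, one space, then U+2109 (degree Fahrenheit)
  <FAHRENHEIT_TEMPERATURE : (["-"])? <DIGITS> " \u2109">
  // same shape, ending in U+2103 (degree Celsius)
| <CELSIUS_TEMPERATURE : (["-"])? <DIGITS> " \u2103">
  // private helper token: one or more decimal digits
| <#DIGITS : ["0"-"9"](["0"-"9"])*>
}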
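If you want to sanity-check which strings those two tokens should accept before generating a parser, a rough stand-in with plain java.util.regex might look like the sketch below. This is not JavaCC output; the class name TemperatureTokens and the classify helper are made up for illustration, and the patterns are my hand translation of the token definitions above.

```java
import java.util.regex.Pattern;

// Illustrative stand-in for the TOKEN definitions above. This is plain
// java.util.regex, not code generated by JavaCC; all names are hypothetical.
final class TemperatureTokens {
    // Optional "-", one or more digits, one space, then U+2109 (degree Fahrenheit).
    static final Pattern FAHRENHEIT = Pattern.compile("-?[0-9]+ \u2109");
    // Same shape, ending in U+2103 (degree Celsius).
    static final Pattern CELSIUS = Pattern.compile("-?[0-9]+ \u2103");

    // Returns the name of the token that matches the whole input, or NO_MATCH.
    static String classify(String text) {
        if (FAHRENHEIT.matcher(text).matches()) return "FAHRENHEIT_TEMPERATURE";
        if (CELSIUS.matcher(text).matches()) return "CELSIUS_TEMPERATURE";
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        System.out.println(classify("-40 \u2109"));
        System.out.println(classify("21 \u2103"));
        System.out.println(classify("21\u2103")); // no space before the symbol
    }
}
```

Note that the space before the symbol is part of the token, so input without it falls through to NO_MATCH.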
JavaCC doesn't yet handle supplementary characters (those outside the
Basic Multilingual Plane). But that's on our radar, so we shall see...
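To show why supplementary characters are awkward for any lexer that works one Java char at a time, here is a small sketch using only the standard java.lang.Character and String APIs. The class and helper names are hypothetical, and U+1D11E (musical symbol G clef) is just an arbitrary example of a code point outside the BMP.

```java
// Shows how BMP and supplementary code points differ in UTF-16.
// Class and method names are hypothetical, chosen for illustration only.
final class SupplementaryDemo {
    // Number of UTF-16 code units (Java chars) needed for one code point.
    static int utf16Units(int codePoint) {
        return Character.charCount(codePoint);
    }

    // True if the code point lies outside the Basic Multilingual Plane.
    static boolean isSupplementary(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int fahrenheit = 0x2109;  // inside the BMP
        int gClef = 0x1D11E;      // outside the BMP

        String s = new String(Character.toChars(gClef));
        System.out.println(utf16Units(fahrenheit));          // code units for U+2109
        System.out.println(utf16Units(gClef));               // code units for U+1D11E
        System.out.println(s.length());                      // chars in the string
        System.out.println(s.codePointCount(0, s.length())); // code points in the string
        System.out.println(Character.isHighSurrogate(s.charAt(0)));
    }
}
```

A char-based tokenizer sees the two halves of the surrogate pair as separate input characters, which is why ranges above U+FFFF cannot simply be written as a single character in a token definition.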
Yours,
Tom