Re: IDN Domain name - ACE Punycode



I found PHP code for this, but I do not know enough about PHP to convert it.
See next posting.

Basically I just need one step of this routine; the rest is clear to me.


Wikipedia describes this step. If anyone understands what I would have to
do in Clipper, any help is greatly appreciated!

http://en.wikipedia.org/wiki/Punycode

Encoding of non-ASCII character insertions as code numbers
To understand the next part of the encoding process we first need to
understand the behaviour of the decoder. The decoder is a state machine with
two state variables i and n. i is an index into the string ranging from zero
(representing a potential insertion at the start) to the current length of
the extended string (representing a potential insertion at the end).

i starts at zero while n starts at 128 (the first non-ASCII code point). The
state progression is monotonic. A state change either increments i or if i
is at its maximum resets i to zero and increments n. At each state change
either the code point denoted by "n" is inserted or it is not inserted.
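The state machine above can be sketched in a few lines. This is only an illustration in Python (not Clipper), assuming the code numbers have already been decoded into plain integers; the name apply_deltas is made up for this example. It follows the decoding procedure in RFC 3492 but leaves out its overflow checks.

```python
def apply_deltas(basic, deltas, initial_n=128):
    """Replay the decoder state machine: each delta says how many
    (n, i) possibilities to skip before the next insertion."""
    out = list(basic)      # the ASCII characters, in order
    n = initial_n          # current code point candidate
    i = 0                  # current insertion position
    for delta in deltas:
        i += delta
        slots = len(out) + 1        # positions 0 .. len(out)
        n += i // slots             # every full pass over the string bumps n
        i %= slots                  # position within the current pass
        out.insert(i, chr(n))
        i += 1                      # continue after the inserted character
    return "".join(out)


print(apply_deltas("bcher", [745]))
```

With the "bücher" example from the text, skipping 745 possibilities lands on n = 252 and i = 1, which is the insertion of "ü" after the "b".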

The code numbers generated by the encoder encode how many possibilities the
decoder should skip before an insertion is made. "ü" has code point 252. So
before we get to the possibility of inserting ü in position one it is
necessary to skip over six potential insertions of each of the 124 preceding
non-ASCII code points and one possible insertion (at position zero) of code
point 252. That is why it is necessary to tell the decoder to skip a total
of (6 × 124) + 1 = 745 possible insertions before getting to the one
required.
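The encoder side of the same step, that is, working out the skip counts, might look like the sketch below. It is written in Python and follows the outline of the encoding procedure in RFC 3492, without the overflow checks and without the later re-encoding as ASCII digits. The function name insertion_deltas is an assumption for this example.

```python
def insertion_deltas(text):
    """Return the list of skip counts (code numbers) for the
    non-ASCII characters in text, in the order the decoder needs them."""
    handled = sum(1 for c in text if ord(c) < 128)  # ASCII chars are copied as-is
    n = 128        # first non-ASCII code point
    delta = 0
    deltas = []
    while handled < len(text):
        # Jump straight to the smallest code point still to be inserted.
        m = min(ord(c) for c in text if ord(c) >= n)
        delta += (m - n) * (handled + 1)   # skip whole passes over the string
        n = m
        for c in text:
            if ord(c) < n:
                delta += 1                 # one more position skipped
            elif ord(c) == n:
                deltas.append(delta)       # insertion happens here
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return deltas


print(insertion_deltas("bücher"))
```

For "bücher" the first jump contributes (252 - 128) × 6 = 744, and passing over the "b" adds 1, which gives the 745 described above.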


Re-encoding of code numbers as ASCII sequences
Punycode uses generalized variable-length integers to represent these
values. For example, this is how "kva" is used to represent the code number
745:

A number system with little-endian ordering is used which allows
variable-length codes without separate delimiters: a digit lower than a
threshold value marks that it is the most-significant digit, hence the end
of the number. The threshold value depends on the position in the number and
also on previous insertions, to increase efficiency. Correspondingly the
weights of the digits vary.

In this case a number system with 36 "digits" is used, with the
case-insensitive 'a' through 'z' equal to the numbers 0 through 25, and '0'
through '9' equal to 26 through 35. Thus "kva" corresponds to "10 21 0".
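The digit mapping is easy to express in code. A minimal Python sketch, with digit_value as a made-up helper name:

```python
def digit_value(ch):
    """Map a Punycode digit character to its value:
    a-z (either case) -> 0..25, 0-9 -> 26..35."""
    ch = ch.lower()
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError("not a Punycode digit: %r" % ch)


print([digit_value(c) for c in "kva"])
```

For "kva" this gives the three values 10, 21 and 0 mentioned in the text.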

To decode this string of "digits", the threshold starts out as 1 and the
weight is 1. The first digit is the units digit; 10 with a weight of 1
equals 10. After this, the threshold value is adjusted. For the sake of
simplicity, let's assume it is now 2. The second digit has a weight of 36
minus the previous threshold value, in this case, 35. Therefore the sum of
the first two "digits" is 10 × 1 + 21 × 35. Since the second "digit" is not
less than the threshold value of 2, there is more to come. The weight for
the third "digit" is the previous weight times (36 minus the new threshold
value), that is, 35 × 34. The third "digit" in this example is 0, which is less than
2, meaning that it is the last (most significant) part of the number.
Therefore "kva" represents the number 10 × 1 + 21 × 35 + 0 × 35 × 34 = 745.
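The arithmetic of this paragraph can be checked with a short sketch. Note the assumption: the thresholds (1, 2, 2) are the simplified ones used in the text and are passed in by hand. RFC 3492 derives the real thresholds from an adaptive bias, which is not modelled here. The name decode_varint is made up for this example.

```python
def decode_varint(digits, thresholds, base=36):
    """Decode one little-endian variable-length integer.
    digits and thresholds are parallel sequences of integers;
    a digit below its threshold is the last (most significant) one."""
    total = 0
    weight = 1
    for digit, t in zip(digits, thresholds):
        total += digit * weight
        if digit < t:
            return total
        weight *= base - t      # the next digit counts (base - t) times as much
    raise ValueError("ran out of digits before a terminating digit")


# "kva" -> 10 21 0, with the simplified thresholds from the text
print(decode_varint([10, 21, 0], [1, 2, 2]))   # 10*1 + 21*35 + 0*35*34 = 745
```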

For the insertion of a second special character in "bücher", the first
possibility is "büücher" with code "bcher-kvaa", the second "bücüher" with
code "bcher-kvab", etc. After "bücherü" with code "bcher-kvae" comes
"ýbücher" with code "bcher-kvaf", etc.
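These examples can be checked against an existing implementation. Python ships a "punycode" codec in its standard library, so a quick test looks like this (Python is used here only as a reference to compare a Clipper port against):

```python
# Encode with Python's built-in punycode codec and show the results.
for word in ["bücher", "büücher", "bücüher", "bücherü", "ýbücher"]:
    print(word, "->", word.encode("punycode").decode("ascii"))

# Decoding goes the other way round.
print(b"bcher-kvaf".decode("punycode"))
```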

To keep the encoding and decoding algorithms simple, no attempt has been
made to prevent some encoded values from encoding inadmissible Unicode
values; however, these should be checked for and detected during decoding.

For comparison, the ASCII 'punycoded' URL http://xn--tdali-d8a8w.lv/ encodes
a name containing the Latvian letters "u with a macron" and "n with
cedilla" in place of the unmarked base characters of http://tudalin.lv.
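A sketch of how a single label becomes its ACE form, again using Python's standard "punycode" codec. The helper name to_ace_label is made up, and the sketch assumes the label is already lower-case and normalized; it does no Nameprep processing and no length checks.

```python
def to_ace_label(label):
    """Return the ACE form of one domain label: ASCII labels pass
    through unchanged, others get the 'xn--' prefix plus Punycode."""
    if all(ord(c) < 128 for c in label):
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


# u with macron is U+016B, n with cedilla is U+0146
name = "t\u016bdali\u0146.lv"
print(".".join(to_ace_label(part) for part in name.split(".")))
```

Each label of the domain name is converted separately, so the "lv" part is left alone.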

Punycode is designed to work across all scripts, and to be self-optimizing
by attempting to adapt to the character set ranges within the string as it
operates. It is optimized for the case where the string is composed of zero
or more ASCII characters and in addition characters from only one other
script system, but will cope with any arbitrary Unicode string. Note that
for DNS use, the domain name string is assumed to have been normalized using
Nameprep and (for top-level domains) filtered against an officially
registered language table before being punycoded, and that the DNS protocol
sets limits on the acceptable lengths of the output Punycode string.



