Re: RfD: XCHAR wordset (for UTF-8 and alike)



jmdrake_98@xxxxxxxxx writes:
>
>Anton Ertl wrote:
>> I, too, originally thought that that was the way to go. But when I
>> thought about example programs, I found that there are not that many
>> examples where you have to deal with individual characters (examples
>> that I came up with were anagram or palindrome checkers, i.e., stuff
>> that's not very common).
>
>You seem to be taking the view that Chuck Moore took regarding
>ColorForth (Forth processes words, not characters). Anyway,
>there's an obvious application that seems to have slipped your
>mind: text editors. In most text editors you deal with the
>individual characters.

Sure. So if BMW or any of the other editors in Forth is to be
extended to work with xchars, they need some changes. The point is
that most applications around need changes only in a few places (or,
if they are lucky, nowhere), not everywhere where strings are handled.

>Anyway, using "variable length" Xchars turns the easiest of
>text editing tasks into the most complex. I'm talking about
>"overwrite mode". With fixed length chars that simply means
>replacing one char with another. But with variable length
>chars you'd have to do a deletion/insertion because you
>couldn't be sure ahead of time if the new char would be the
>same length as the char it was replacing.

So what? Unless the editor is optimized for overwrite-only, turning
an overwrite into a delete-forward followed by an insert should be
peanuts.
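A sketch of that delete-forward-plus-insert on a flat UTF-8 byte buffer, assuming well-formed UTF-8 and a cursor on a character boundary (C again; the function names are hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence whose lead byte is b
   (assumes well-formed UTF-8). */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Overwrite the character at byte offset pos (pos <= *len) in buf
   (length *len, capacity cap) with the encoded character newc
   (newlen bytes). Old and new widths may differ.
   Returns 0 on success, -1 if the result would not fit. */
int overwrite_char(char *buf, size_t *len, size_t cap, size_t pos,
                   const char *newc, size_t newlen)
{
    size_t oldlen = pos < *len ? utf8_seq_len((unsigned char)buf[pos]) : 0;
    if (oldlen > *len - pos)
        oldlen = *len - pos;              /* truncated sequence at the end */
    if (*len - oldlen + newlen > cap)
        return -1;
    /* delete-forward and insert in one step: shift the tail so the
       hole at pos is exactly newlen bytes wide, then fill it */
    memmove(buf + pos + newlen, buf + pos + oldlen, *len - pos - oldlen);
    memcpy(buf + pos, newc, newlen);
    *len = *len - oldlen + newlen;
    return 0;
}
```

Overwriting the "b" in "abc" with the two-byte "é" grows the buffer from 3 to 4 bytes; overwriting that "é" with "x" shrinks it back to 3.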

BTW, overwriting a character on screen does not necessarily mean
replacing that character in place in the buffer. E.g., in the buffer
representation I used in my last editor (hole at the cursor),
overwriting a character means deleting a character from the
behind-cursor part of the buffer, and inserting the replacement at the
before-cursor part, even for fixed-width characters. Alternatively,
you could replace the character in-place in the behind-cursor part,
then do a cursor-right, which in turn consists of deleting the char
from the behind-cursor part, and inserting it at the before-cursor
part. As you can see, that data structure is optimized for inserting
and deleting.
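The hole-at-the-cursor representation is usually called a gap buffer. Below is a minimal C sketch of one holding UTF-8 text, assuming well-formed input and a fixed 64-byte capacity; all names are hypothetical. Overwrite is literally delete-forward (widen the hole) followed by insert (fill the hole from its front), so it does not matter whether the old and new characters have the same width:

```c
#include <stddef.h>
#include <string.h>

/* Before-cursor text: data[0 .. gap_start); behind-cursor text:
   data[gap_end .. GB_CAP). The hole lies in between. */
#define GB_CAP 64
struct gapbuf {
    char data[GB_CAP];
    size_t gap_start;
    size_t gap_end;
};

static size_t utf8_len(unsigned char b)
{
    return b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2 : (b & 0xF0) == 0xE0 ? 3 : 4;
}

void gb_init(struct gapbuf *g) { g->gap_start = 0; g->gap_end = GB_CAP; }

/* Insert n bytes (one encoded character) before the cursor. */
int gb_insert(struct gapbuf *g, const char *s, size_t n)
{
    if (g->gap_end - g->gap_start < n) return -1;   /* no room */
    memcpy(g->data + g->gap_start, s, n);
    g->gap_start += n;
    return 0;
}

/* Delete the character behind the cursor: just widen the hole. */
void gb_delete_forward(struct gapbuf *g)
{
    if (g->gap_end < GB_CAP)
        g->gap_end += utf8_len((unsigned char)g->data[g->gap_end]);
}

/* Cursor-right: move one character from behind the cursor to the
   end of the before-cursor part. */
void gb_right(struct gapbuf *g)
{
    if (g->gap_end == GB_CAP) return;
    size_t n = utf8_len((unsigned char)g->data[g->gap_end]);
    memmove(g->data + g->gap_start, g->data + g->gap_end, n);
    g->gap_start += n;
    g->gap_end += n;
}

/* Cursor-left: skip back over continuation bytes (10xxxxxx) to the
   start of the previous character and move it behind the cursor. */
void gb_left(struct gapbuf *g)
{
    if (g->gap_start == 0) return;
    size_t i = g->gap_start - 1;
    while (i > 0 && ((unsigned char)g->data[i] & 0xC0) == 0x80)
        i--;
    size_t n = g->gap_start - i;
    g->gap_end -= n;
    memmove(g->data + g->gap_end, g->data + i, n);
    g->gap_start = i;
}

/* Overwrite = delete-forward + insert, whatever the widths. */
int gb_overwrite(struct gapbuf *g, const char *s, size_t n)
{
    gb_delete_forward(g);
    return gb_insert(g, s, n);
}

/* Copy the logical text (both parts) to out; return its length. */
size_t gb_text(const struct gapbuf *g, char *out)
{
    memcpy(out, g->data, g->gap_start);
    memcpy(out + g->gap_start, g->data + g->gap_end, GB_CAP - g->gap_end);
    return g->gap_start + (GB_CAP - g->gap_end);
}
```

With "abc" in the buffer and the cursor between "a" and "b", gb_overwrite with "é" yields "aéc"; no byte of the behind-cursor "c" is moved.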

>> In most cases you deal with strings of characters, or you can write
>> the code such that it works on strings (see the scan2 example). And
>> for that kind of code, variable-width characters (done the right way)
>> work just as well as fixed-width characters, so why deal with all the
>> trouble that widening fixed-width characters would cause?
>>
>> - anton
>
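The scan2 example itself is not reproduced here, but the underlying idea can be sketched in C (find_xchar is a hypothetical name). Once a character is represented as a string of bytes, searching for it is just a substring search, and for UTF-8 that search needs no knowledge of character widths:

```c
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of the encoded character (c, clen) in the
   string (s, len); return its byte offset, or (size_t)-1 if absent.
   This is a plain byte-wise substring search. Because UTF-8 lead bytes
   and continuation bytes are distinguishable, a match of a well-formed
   character can only start on a character boundary, so the same code
   works for 1-byte chars and for multi-byte xchars. */
size_t find_xchar(const char *s, size_t len, const char *c, size_t clen)
{
    if (clen == 0 || clen > len) return (size_t)-1;
    for (size_t i = 0; i + clen <= len; i++)
        if (memcmp(s + i, c, clen) == 0)
            return i;
    return (size_t)-1;
}
```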
>What "trouble" do you see widening fixed-width chars causing?

That depends on how you do it:

- make 1 CHARS > 1: lots of code breaks.

- add widechars: the transition from ordinary chars to widechars is
harder than the transition to xchars. And you have to transition
large parts of an application at the same time.

>It seems quite simple to me. In fact all of the words in the
>XChar wordset can be trivially implemented if you assume fixed
>char width.

Yes. However, using UTF-32 (i.e., fixed-width) xchars with 8-bit
chars would mean that a string containing ASCII characters would be
represented as xchars differently from chars. So, should string words
like READ-LINE produce an xchar string or a char string? You would
have to introduce another set of string words, like for widechars.
And while we are at it, you probably should be using widechars anyway,
because they are designed for fixed-width encodings, whereas xchars
are designed for variable-width encodings.
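To make the representation point concrete, here is a small C UTF-8 encoder (utf8_encode is a hypothetical helper, assuming valid Unicode scalar values). ASCII code points encode to exactly the bytes of the plain char string, which is why ASCII text read as chars is already a valid UTF-8 xchar string. In UTF-32, by contrast, every character occupies a 4-byte unit, so "OK" would take 8 bytes instead of 2:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out; return the byte count (1..4).
   Assumes cp <= 0x10FFFF and is not a surrogate. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```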

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.complang.tuwien.ac.at/forth/ansforth/forth200x.html
EuroForth 2005: http://www.complang.tuwien.ac.at/anton/euroforth2005/