Re: Sets and portability (was) Re: Is ISO Pascal compatible with J&W (original) Pascal ?
- From: frank@xxxxxxxx
- Date: 01 Jul 2005 11:15:41 GMT
Marco van de Voort <marcov@xxxxxxxx> wrote:
> On 2005-06-30, Jason Burgon <gvision@xxxxxxxxxxxx> wrote:
>>> > what Delphi (and FPC, and GPC?) had to do with strings. The principle
>>> > is just the same.
>>>
>>> The difference is that strings become large only if the user
>>> explicitly puts long data in them which doesn't normally happen
>>> accidentally, whereas, say, a set of all Unicode letters implicitly
>>> requires more space (probably in any representation as it's rather
>>> irregular) than in a 7/8 bit charset.
>>
>> (1) The vast majority of code in any complex program is library code (be it
>> your own or someone else's), and that needs to be as flexible as practical.
>> So library code would be better if it could handle huge sets.
I didn't say it wasn't needed -- quite the opposite actually. For
strings, the user can control the length by the data they process;
with sets the size explosion happens automatically when switching
charsets, even when processing the same data.
>> (2) The computer world is more complex than it's ever been (eg Unicode)
>> and will just get more so. So why make life even more difficult for Pascal
>> programmers by obsoleting their (eg: character) set library code? Again
>> the Delphi WideString type is a good example of providing familiar
>> mechanisms for dealing as seamlessly as is possible with the added
>> complexity of the 21st century.
>
> Clean code will indeed remain working, since they operate on the basis char
> tricks. A reference counted system like ansistring ensures some performance
> with not that optimal existing code.
>
> Widestring has as problem that it is different between the Windows and Linux
> editions of Delphi. In one it is a COM bstr, in kylix more like an
> ansistring (but then 16-bit). So I wouldn't use it as an example, unless you
> mean the Kylix version.
I'm not too familiar with those Borland/Windows particulars. Anyway,
longer strings and "wider" (16 or 32 bit) chars have never been a real
problem. The Pascal `Char' type can be this size (unlike C, it isn't
required to be 1 byte). Both standard Pascal fixed-strings and
Extended Pascal strings can be as large as the integer range. (Only
the UCSD/BP short strings were problematic, being limited to 255
chars.)
>> (3) Like your ~average~ string, a clever huge set implementation (like mine
>> ;-) of an ~average~ huge set is likely to be quite sparse or have large
>> areas of contiguous members, and wouldn't therefore use up huge amounts of
>> memory.
Not necessarily. AFAIK, Unicode letters already are rather
fragmented. Of course, and that's the good thing, a typical program
probably won't use many different sets of such kind (letters,
upper/lower case, digits, punctuation, etc.). This might save the
day.
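
To make the difference between "large contiguous areas" and "fragmented" concrete, here is a minimal sketch of a range-based sparse set. It is written in Python purely for illustration; `RangeSet` is a hypothetical name and not the implementation mentioned above. Storage grows with the number of runs, not with the number of members:

```python
from bisect import bisect_right

class RangeSet:
    """Hypothetical sparse set: sorted, non-adjacent (lo, hi) runs."""

    def __init__(self, members=()):
        self.ranges = []
        for m in sorted(set(members)):
            if self.ranges and self.ranges[-1][1] + 1 == m:
                # m extends the last run by one element
                self.ranges[-1] = (self.ranges[-1][0], m)
            else:
                # m starts a new run
                self.ranges.append((m, m))

    def __contains__(self, x):
        # Find the last run whose lower bound is <= x.
        i = bisect_right(self.ranges, (x, float("inf"))) - 1
        return i >= 0 and self.ranges[i][0] <= x <= self.ranges[i][1]

# 26 contiguous members collapse into a single run ...
contiguous = RangeSet(range(ord("A"), ord("Z") + 1))
# ... while 1000 scattered members need 1000 runs.
fragmented = RangeSet(range(0, 2000, 2))

print(len(contiguous.ranges), len(fragmented.ranges))
```

So such a representation pays off only as long as the members cluster; for a heavily fragmented set it can end up larger than a plain bitmap.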
> True. And refcounting (copy on write) would ensure that original code that
> is read-only, but passes somehow by value will still work not too shabby.
Probably. Of course, most code shouldn't even need this as it
probably won't pass such sets by value. So even a dumb implementation
might work to some extent with 16 bit charsets.
>> (4) Sure, a typical set of Unicode chars (say all uppercase characters) will
>> likely use more memory than a set of 7/8bit char (but in my case, not that
>> much more).
>
> Unicode is 32-bit, though only a (magnitude) 100000 codepoints are assigned.
AFAIK, strictly speaking Unicode is 16 bit, and UCS is 32 bit, but
that's nitpicking. According to Wikipedia, UCS has over 1.1 million
"code points" already. But even this would be livable (135 KB per
set) today. A UCS `Char' could be a 32 bit type, but with a suitable
range (instead of the full 4 billion), so even a dumb `set of Char'
could just barely work in practice.
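
As a rough check of that figure, a small sketch (Python only for illustration; `bitmap_bytes` and `RANGES` are names made up here) that computes the size of a dumb one-bit-per-element set, assuming the UCS range is 0..0x10FFFF:

```python
# Size of a naive bitmap set: one bit per possible element.
# Assumption: UCS code points run from 0 to 0x10FFFF inclusive.

def bitmap_bytes(cardinality: int) -> int:
    """Bytes needed for a bitmap with one bit per possible element."""
    return (cardinality + 7) // 8

RANGES = {
    "8 bit charset": 2**8,
    "16 bit charset": 2**16,
    "UCS (0..0x10FFFF)": 0x10FFFF + 1,
    "full 32 bit": 2**32,
}

for name, cardinality in RANGES.items():
    print(f"{name:>18}: {bitmap_bytes(cardinality):>11} bytes")
```

With the restricted range this gives 139264 bytes (136 KiB) per set, in the same ballpark as the figure above, whereas the full 32 bit range would need 512 MiB per set.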
> Note that ansi->wide conversion is codepage sensitive. I haven't reached a
> conclusion if this must be set runtime (from now on, assume all ansi->wide
> conversions are cp857 or some windows convention) or compiletime (directive,
> compiler links in correct conversion code or table).
If this means roughly the same in Windows as iso-8859-1 AKA latin1
etc. mean elsewhere, I think it should be runtime.
> The good part of doing this runtime you can make your program's user specify
> what encoding he uses for all plain text. The bad part is bloat with a few
> tens of kbs (even 100s) of conversion tables _IF_ they cannot be gotten from
> the OS or shared libs.
Yes. At least on modern Unix systems, both the tables and readily
available conversion functions exist. You might not have to
reinvent the wheel.
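
A minimal sketch of what "codepage sensitive, chosen at runtime" means in practice. It uses Python's built-in codecs as a stand-in for whatever tables the OS or a shared library provides; `ansi_to_wide` is a hypothetical helper name:

```python
def ansi_to_wide(data: bytes, codepage: str) -> str:
    """Convert 8 bit text to wide chars using a codepage named at runtime."""
    return data.decode(codepage)

# The same bytes, interpreted under two different codepages.
raw = b"caf\xe9 \x80"

as_latin1 = ansi_to_wide(raw, "latin-1")   # 0x80 -> U+0080 (a control char)
as_cp1252 = ansi_to_wide(raw, "cp1252")    # 0x80 -> U+20AC (euro sign)

print([hex(ord(c)) for c in as_latin1])
print([hex(ord(c)) for c in as_cp1252])
```

The byte 0xE9 maps to the same character in both, but 0x80 does not, so the result depends on which codepage the user (or the environment) selects when the program runs.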
> Yes, there are 4 types:
>
> 1. registers
> 2. static sets
> 3. ref counted, dynamically allocated sets.
> 4. dynamically allocated sparse sets, possibly ref counted.
>
> The order 1 -> 4 is also roughly how you would change the type if the
> amount of elements get higher.
>
> One could specify the transitions from 2->3 and from 3->4 on the cmdline,
> e.g. to mimic behaviour of a legacy pascal compiler.
I don't understand this point. I think if several set models are
provided, then automatic conversion between all of them should be
done wherever necessary. This may indeed be the hardest part.
(That's independent of charsets, of course.)
> Conversions are not necessary, since only set of x; and set of y with x<>y
> are not compatible anyway.
They are compatible if x and y are compatible. So you need the
conversions unless for sets of subranges you choose the
representation applicable to the base type; i.e., set of 1..10 would
need the same representation as set of Integer then, which is just
what you usually want to avoid by providing several
representations. So you probably will need the conversions.
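
To illustrate why the conversions are needed, a minimal sketch (in Python for illustration only; `BitmapSet`, `to_sparse` and `to_bitmap` are hypothetical names, and `frozenset` merely stands in for a sparse representation). A compact bitmap for 1..10 has to be widened before it can be combined with a set over the whole base type, and narrowed again when the result is assigned back:

```python
class BitmapSet:
    """Compact set for a small subrange lo..hi, one bit per element."""

    def __init__(self, lo, hi, members=()):
        self.lo, self.hi, self.bits = lo, hi, 0
        for m in members:
            self.include(m)

    def include(self, m):
        if not self.lo <= m <= self.hi:
            raise ValueError(f"{m} is outside {self.lo}..{self.hi}")
        self.bits |= 1 << (m - self.lo)

    def members(self):
        size = self.hi - self.lo + 1
        return [self.lo + i for i in range(size) if (self.bits >> i) & 1]

def to_sparse(s: BitmapSet) -> frozenset:
    """Widen a subrange bitmap to the base-type representation."""
    return frozenset(s.members())

def to_bitmap(sparse, lo, hi) -> BitmapSet:
    """Narrow to a subrange bitmap; fails if a member does not fit."""
    return BitmapSet(lo, hi, sparse)

small = BitmapSet(1, 10, [2, 3, 5])   # like a set of 1..10
big = frozenset({3, 100000})          # like a set of Integer

union = to_sparse(small) | big
common = to_bitmap(to_sparse(small) & big, 1, 10)
print(sorted(union), common.members())
```

The narrowing step is where a range check naturally belongs: converting a set that contains 100000 back to the 1..10 representation raises an error in this sketch.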
Frank
--
Frank Heckenbach, frank@xxxxxxxx, http://fjf.gnu.de/
GnuPG and PGP keys: http://fjf.gnu.de/plan (7977168E)
Pascal code, BP CRT bugfix: http://fjf.gnu.de/programs.html
Free GNU Pascal Compiler: http://www.gnu-pascal.de/