Re: A taxonomy of types




"Rod Pemberton" <do_not_have@xxxxxxxxxxxxx> wrote in message
news:gmcimh$jkn$1@xxxxxxxxxxx
"cr88192" <cr88192@xxxxxxxxxxx> wrote in message
news:gmbfk6$hv6$1@xxxxxxxxxxxxxxxxxxxx
"Rod Pemberton" <do_not_have@xxxxxxxxxxxxx> wrote in message
news:gmanbg$7ad$1@xxxxxxxxxxx

well, I will just say, it gets more than a little "fun" trying to cram
non-trivial types into a 32-bit integer...


I thought we were discussing how to implement "type data", a la James'
post... I definitely wouldn't try to "stuff" an "unsigned long long" into a
"long"... But he needed, for example, something to indicate that the data
was an "unsigned long long" instead of a "long".


I think there is a difference of meaning here:
I am describing how to represent the representation (and, to a lesser
extent, semantics) of a type.

this is different from how the value of a type is represented.

for example, the "type" can be a single letter: "i", and the associated data
is a 32-bit integer...
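To make the distinction concrete, here is a minimal C sketch. The TaggedValue struct and its helpers are hypothetical, not from the thread: the signature string "i" describes the type, while a separate field holds the 32-bit integer itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: the type descriptor (a signature string such as
   "i") is stored separately from the bits of the value it describes. */
typedef struct {
    const char *sig;    /* type signature; "i" = signed 32-bit int */
    union {
        int32_t i;      /* payload used when sig is "i" */
        void   *p;      /* payload for some pointer signature (unused here) */
    } u;
} TaggedValue;

/* Build a value whose type is "i" and whose data is a 32-bit integer. */
static TaggedValue make_int(int32_t v)
{
    TaggedValue tv;
    tv.sig = "i";
    tv.u.i = v;
    return tv;
}

/* Check the type descriptor without looking at the payload bits. */
static int is_int(TaggedValue tv)
{
    return strcmp(tv.sig, "i") == 0;
}
```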


remember, C also has things like arrays, function pointers, nestable
pointers, ... and all of these cases would need to be handled. one ends up
using lots of bits just to say what is in the other bits...


Most of the ugly things are handled as a pointer to object. So, you just
need one bit of "type data" to indicate a pointer.


if you are using dynamic typing, maybe...
if the system follows static typing rules (such as in a compiler), it is
necessary to fully define the type...


(Correction: C doesn't have arrays. C has the offset operator. C has array
declarations, which are designed to work with the offset operator...)


theoretically, and in terms of the functioning...

however, to actually compile the code correctly (AKA: following correct
array semantics, ...) one actually needs to have explicit array types, which
are considered distinct from the pointer-based types.

so, the hardware sees pointers and pointer arithmetic, but the compiler
needs to see an actual array type (and, more so, allow both arrays of
pointers and pointers to arrays, and sometimes pointers to arrays of
pointers... however, these latter cases only really tend to emerge inside a
compiler, and need not be representable for variables or similar...).
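A small C illustration of why the compiler has to keep array types distinct from pointer types, even though the generated code only does pointer arithmetic. The helper function names are made up for the example; the type behaviour itself is standard C.

```c
#include <stddef.h>

static int arr[10];                   /* array of 10 int                 */
static int *ptr = arr;                /* pointer to int (array decays)   */
static int (*ptr_to_arr)[10] = &arr;  /* pointer to array of 10 int      */
static int *arr_of_ptrs[4];           /* array of 4 pointers to int      */

/* sizeof sees the array type, not a pointer. */
static size_t array_bytes(void)       { return sizeof arr; }
static size_t pointer_bytes(void)     { return sizeof ptr; }
static size_t arr_of_ptrs_bytes(void) { return sizeof arr_of_ptrs; }

/* Stepping a pointer-to-array moves past a whole array of 10 ints. */
static ptrdiff_t step_bytes(void)
{
    return (char *)(ptr_to_arr + 1) - (char *)ptr_to_arr;
}
```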


a string-based representation handles all of this much more easily and
cleanly (I say this having used both approaches...). it's not like we need
a full tokenizing syntax or support for whitespace or anything (these would
eat up performance).
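As a rough sketch of a string-based representation, the C function below computes a type's size from a signature string. The base letters follow the list given later in this thread (a/h, s/t, i/j, l/m, n/o); the 'P' (pointer) and 'A<count>;' (array) prefixes are a hypothetical notation invented for this example, not the format any particular compiler uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical encoding, for illustration only:
     base letters a/h s/t i/j l/m n/o -> 1, 2, 4, 8, 16 bytes
     'P' <type>                       -> pointer to <type>
     'A' <count> ';' <type>           -> array of <count> <type>
   Returns the size in bytes (0 if malformed) and advances *sp. */
static size_t sig_size(const char **sp)
{
    const char *s = *sp;
    switch (*s) {
    case 'a': case 'h': *sp = s + 1; return 1;
    case 's': case 't': *sp = s + 1; return 2;
    case 'i': case 'j': *sp = s + 1; return 4;
    case 'l': case 'm': *sp = s + 1; return 8;
    case 'n': case 'o': *sp = s + 1; return 16;
    case 'P': {
        const char *rest = s + 1;
        if (sig_size(&rest) == 0) return 0;   /* pointee must be valid */
        *sp = rest;
        return sizeof(void *);
    }
    case 'A': {
        char *end;
        unsigned long n;
        const char *rest;
        size_t elem;
        if (s[1] < '0' || s[1] > '9') return 0;  /* count must be digits */
        n = strtoul(s + 1, &end, 10);
        if (*end != ';') return 0;               /* count ends with ';'  */
        rest = end + 1;
        elem = sig_size(&rest);
        if (elem == 0) return 0;
        *sp = rest;
        return (size_t)n * elem;
    }
    default:
        return 0;
    }
}

/* The whole string must be exactly one complete type. */
static size_t type_size(const char *sig)
{
    const char *p = sig;
    size_t n = sig_size(&p);
    return (n != 0 && *p == '\0') ? n : 0;
}
```

Because the prefixes nest, "PA10;i" (pointer to array of 10 ints) and "A4;Pi" (array of 4 pointers to int) are handled by the same few lines, which is roughly the advantage claimed above over packing everything into bits.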

I agree.


yeah...

a strict string need not be used either, but a hybrid/"value chain"
representation could potentially also be used (for example, something a
little closer to .NET "blobs"). however, of note: .NET tends to centralize
its metadata structures, which does not entirely sit well with my "design
sense".


do you expect this to really scale?...

How many types do you need? Are you just using the types defined in the C
spec, or are you also including typedef'd types?


well, there are many more types than just the fixed types, even if one does
not handle typedefs (these are actually handled in the upper compiler in my
case). for example, beyond just the possible representations of the built-in
types, one also has to handle things like structures and function pointers,
...

as noted, I use the generic type 'Block' in my compiler (this uses such a
bit-twiddled integer representation) for handling things like structs,
unions, classes, function-pointers, functions/methods, ...

the index of this block needs to be packed into the type-info-integer, along
with so many other pieces of info...
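For contrast, a sketch of the bit-twiddled integer approach. The field layout below (5-bit base code, 3-bit pointer depth, a Block flag and a 23-bit Block index) is an assumption made up for illustration, not the actual layout used in the compiler described here.

```c
#include <stdint.h>

/* Hypothetical 32-bit layout, for illustration only:
     bits 0..4   base type code (e.g. one code per signature letter)
     bits 5..7   pointer depth (0..7 levels of '*')
     bit  8      set if the type refers to a Block (struct, union,
                 function, ...)
     bits 9..31  Block index, leaving only 23 bits for it            */
#define TY_MASK(bits)   ((1u << (bits)) - 1u)
#define TY_PTR_SHIFT    5u
#define TY_BLOCK_FLAG   (1u << 8)
#define TY_INDEX_SHIFT  9u

static uint32_t ty_make_base(uint32_t base, uint32_t ptr_depth)
{
    return (base & TY_MASK(5)) | ((ptr_depth & TY_MASK(3)) << TY_PTR_SHIFT);
}

static uint32_t ty_make_block(uint32_t index, uint32_t ptr_depth)
{
    return TY_BLOCK_FLAG
         | ((ptr_depth & TY_MASK(3)) << TY_PTR_SHIFT)
         | ((index & TY_MASK(23)) << TY_INDEX_SHIFT);
}

/* Field accessors. */
static uint32_t ty_base(uint32_t t)        { return t & TY_MASK(5); }
static uint32_t ty_ptr_depth(uint32_t t)   { return (t >> TY_PTR_SHIFT) & TY_MASK(3); }
static int      ty_is_block(uint32_t t)    { return (t & TY_BLOCK_FLAG) != 0; }
static uint32_t ty_block_index(uint32_t t) { return t >> TY_INDEX_SHIFT; }
```

Every field competes for the same 32 bits: with this layout a Block index of 2^23 or more silently wraps (ty_block_index(ty_make_block(1u << 23, 0)) yields 0), and there is no room left for things like array sizes.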


in order to be actually useful for much, a good majority of the C type
system's semantics (and, very likely, a good deal more) need to be
representable...


Only what is needed for type conversion and assignment conversion, which is
represented by the types involved.


this is too simplistic...

consider a user types: 'foo->x'.

or 'foo->bar(3, 4);'

if the representation can't deal with more than base-types, the above
expressions can't be compiled...
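To compile 'foo->x', the compiler has to find x's offset and type from foo's struct type. A minimal sketch, assuming a hypothetical Block/Member descriptor filled in with offsetof (the signature letters reuse the base-type letters given later in the thread):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-struct descriptor, roughly what a 'Block' entry
   might need so that 'foo->x' can be compiled. */
typedef struct {
    const char *name;     /* member name as written in the source  */
    size_t      offset;   /* byte offset within the struct         */
    const char *sig;      /* member type as a signature string     */
} Member;

typedef struct {
    const char   *name;
    size_t        size;
    size_t        n_members;
    const Member *members;
} Block;

/* Example struct and its descriptor. */
struct Foo { signed char tag; int x; long long y; };

static const Member foo_members[] = {
    { "tag", offsetof(struct Foo, tag), "a" },
    { "x",   offsetof(struct Foo, x),   "i" },
    { "y",   offsetof(struct Foo, y),   "l" },
};

static const Block foo_block = {
    "Foo", sizeof(struct Foo),
    sizeof foo_members / sizeof foo_members[0], foo_members
};

/* Resolve 'p->name': returns the member, or NULL if there is none. */
static const Member *find_member(const Block *b, const char *name)
{
    for (size_t i = 0; i < b->n_members; i++)
        if (strcmp(b->members[i].name, name) == 0)
            return &b->members[i];
    return NULL;
}
```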


instead, one should represent the physical types, rather than the C-level
semantic types.

You don't even have to do that. The size of the type (i.e., sizeof()...)
should be sufficient to represent the type, possibly with some exceptions,
say, float or complex.


things don't work this way...

a type is not so useful if it can't tell your integer from your float from
your pointer...
almost may as well just give a raw chunk of memory and a size...
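A quick C check of the point that size alone is not enough: on common platforms float and int32_t are both 4 bytes, yet the same bits mean different things. This sketch assumes IEEE 754 single-precision floats (typical, but not required by C) and needs C11 for _Static_assert.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 4 bytes, the same size as int32_t/uint32_t. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 4-byte float");

/* Read the raw bits of a float; memcpy is the well-defined way to do it. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

With IEEE 754 floats, float_bits(1.0f) is 0x3F800000, while the integer 1 is simply 0x00000001: same size, unrelated bit patterns.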

Simple integer adjustments...

type=sizeof(token);
/* say size of 4 is 32-bit integer, but pointer is also size of 4 */
if (type==TYPE4 && !strcmp(token,"pointer"))
    type=TYPE8;
/* say size of 4 is 32-bit unsigned integer, but signed integer is also
   size of 4 */
if (type==TYPE4 && is_signed(token))
    type=TYPE6;


representing every possible combination as a unique number is not likely to
be so effective either...


I think C compilers work best when each size is double the previous one...
That is, "unsigned char" is 8 bits, "unsigned short" is 16 bits, "unsigned
long" is 32 bits, and "unsigned long long" is 64 bits. When you don't
represent the common and intermediate sizes, it seems to me you create code
implementation problems.


meaning here?...

You need to be able to represent integers of different multiples of the "C
byte" size. I.e., some compilers will skip certain sizes since they aren't
native to the CPU. E.g., skipping 16-bit or 32-bit integers because the CPU
doesn't support that size.


well, one need not handle all cases, just the common ones...


for example:
a/h: signed/unsigned 8-bit byte;

For C, don't you mean "8-bit char"?...


yes, but byte is more specific here...
for example, Java and .NET define char as 16 bits, and my system also deals
with these systems, so byte makes it clearer that I mean 8 bits...

In general, yes, clearer. For C, less clear, since the spec doesn't define
a byte as 8 bits... And C defines multibyte or wide characters for just
such situations. For networking, less clear, since they use "octet".


a byte is a byte whether it is called a byte, a char, or an octet...
what matters mostly is what the CPUs say, and the CPUs say 8-bit bytes exist...

but, alas, there is a little bit of an oddity:
C says chars are the smallest type, and they are usually 8 bits;
Java and .NET have wider chars (16 bits...), and add 'byte' as the new
smallest type (signed and unsigned byte...).
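In C terms, sizeof(char) is always 1 and CHAR_BIT is only guaranteed to be at least 8; exact 8-bit types are int8_t/uint8_t from <stdint.h>, which are optional. A small sketch, assuming a platform with 8-bit chars; the sbyte_t/ubyte_t names are hypothetical, loosely modeled on .NET's sbyte/byte.

```c
#include <limits.h>
#include <stdint.h>

/* This sketch assumes 8-bit chars, as on all mainstream CPUs. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit chars"
#endif

typedef int8_t  sbyte_t;   /* hypothetical name: signed 8-bit byte   */
typedef uint8_t ubyte_t;   /* hypothetical name: unsigned 8-bit byte */

/* Unsigned 8-bit arithmetic wraps modulo 256 when stored back. */
static int byte_wraps(void)
{
    ubyte_t b = 255;
    b++;            /* 255 + 1 = 256, converted back to uint8_t -> 0 */
    return b == 0;
}
```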


s/t: signed/unsigned 16-bit short;
i/j: signed/unsigned 32-bit int;
l/m: signed/unsigned 64-bit long;
n/o: signed/unsigned 128-bit int;
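The letter list above maps naturally onto a lookup table. A minimal C sketch; the BaseType struct and lookup_base function are hypothetical names, and the table simply transcribes the list.

```c
#include <stddef.h>

typedef struct {
    char     letter;     /* signature letter                  */
    unsigned bits;       /* width in bits                     */
    int      is_signed;  /* 1 = signed, 0 = unsigned          */
} BaseType;

/* First letter of each pair is signed, second is unsigned. */
static const BaseType base_types[] = {
    { 'a',   8, 1 }, { 'h',   8, 0 },
    { 's',  16, 1 }, { 't',  16, 0 },
    { 'i',  32, 1 }, { 'j',  32, 0 },
    { 'l',  64, 1 }, { 'm',  64, 0 },
    { 'n', 128, 1 }, { 'o', 128, 0 },
};

/* Returns the entry for a letter, or NULL if it is not a base type. */
static const BaseType *lookup_base(char c)
{
    for (size_t i = 0; i < sizeof base_types / sizeof base_types[0]; i++)
        if (base_types[i].letter == c)
            return &base_types[i];
    return NULL;
}
```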

Hmm, I guess it's time for me to go back to the C spec, since I thought for
sure that "int" had to be a "char", "short", "long", or "long long", not a
distinct type by itself as you've done... (Am I wrong?)


int and long may have the same size (at least on x86, or in MSVC on x64,
but not in Linux on x86-64, PPC64, ...),

Well, technically, I think you comply with the wording of ISO C99... I
don't have a copy of ANSI C available here to check. But, I think that
doesn't comply with K&R C:

In K&R C in A.4.2 Basic Types:

"Besides the char types, up to three sizes of integer, declared short int,
int, and long int, are available. Plain int objects have the natural size
suggested by the host machine architecture; the other sizes are provided to
meet special needs. Longer integers provide at least as much storage as
shorter ones, but the implementation may make plain integers equivalent to
either short integers, or long integers."

The critical point being an "int" is either "short" or "long", not between
the two...

Also, K&R C in 2.2 Data Types and Sizes:

"int an integer, typically reflecting the natural size of integers on the
host machine"
...
"The intent is that short and long should provide different lengths of
integers where practical; int will normally be the natural size for a
particular machine. short is often 16 bits long, and int either 16 or 32
bits. Each compiler is free to choose appropriate sizes for its own
hardware, subject only to the restriction that shorts and ints are at
least 16 bits, longs are at least 32 bits, and short is no longer than
int, which is no longer than long."
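The minimums in the quoted K&R passages can be checked at compile time. A minimal C11 sketch; note that it compares value ranges from <limits.h>, which is what the standard actually constrains, rather than sizeof.

```c
#include <limits.h>

/* Minimum ranges: short and int at least 16 bits, long at least 32. */
_Static_assert(SHRT_MAX >= 32767,       "short is at least 16 bits");
_Static_assert(INT_MAX  >= 32767,       "int is at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L, "long is at least 32 bits");

/* Ordering: short is no longer than int, which is no longer than long. */
_Static_assert(SHRT_MAX <= INT_MAX,     "short range fits in int");
_Static_assert(INT_MAX  <= LONG_MAX,    "int range fits in long");
```

Nothing here forces int to equal either short or long; an implementation with 16-bit short, 32-bit int and 64-bit long (as on Linux x86-64) passes every check.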


yes.


So, reading 2.2 together with A.4.2, it's clear that an int is either
"short" or "long", not somewhere between "short" and "long" as 2.2 alone
could be read... It'd be nice if I could check ANSI C to see whether this
understanding was "lost" between K&R C and the standardization of C as ANSI
C... Unfortunately, most of my C books have been boxed up for a few years
now, and ANSI C seems to be about the only C-related document I haven't
found a .pdf for on the net...

FYI, in pre-K&R C, Ritchie's 1974 "C reference manual":

"Integers (int) are represented in 16bit 2's complement notation."

That was prior to it being implemented on many systems.


not exactly...

"the same size as" does not mean "the same as"...


but in any case typically int is
regarded as an independent type...
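The "same size is not the same type" point can be seen with C11's _Generic, which selects a branch by type rather than by size, so int and long stay distinct even where both are 32 bits. The TYPE_NAME macro and is_named helper are made up for this example.

```c
#include <string.h>

/* Map an expression's type to a name; selection is by type, not size. */
#define TYPE_NAME(x) _Generic((x), \
    short:     "short",            \
    int:       "int",              \
    long:      "long",             \
    long long: "long long",        \
    default:   "other")

/* Small helper for comparing the selected name. */
static int is_named(const char *got, const char *want)
{
    return strcmp(got, want) == 0;
}
```

TYPE_NAME(0) selects "int" and TYPE_NAME(0L) selects "long" regardless of whether sizeof(int) == sizeof(long) on the target.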

Well, it's not what I expect, but it's not my compiler (mine is going to
work my way). :-) I expect "int" to be equivalent to one of the other
types, not between them. My concern is that your design might break code
that expects "int" to be either "short" or "long". I'm not sure where I
learned this understanding. If it was Harbison & Steele, I can't locate it.
I never read K&R "back in the day".


well, Linux on x86-64 and PPC64 does it the way I had mentioned.

likewise, Java also does this (long is 64 bits, ...).



Rod Pemberton



