Re: new IL: C (sort of...).




"Marco van de Voort" <marcov@xxxxxxxx> wrote in message
news:slrnh3783m.j4.marcov@xxxxxxxxxxxxxxxxxx
On 2009-06-13, cr88192 <cr88192@xxxxxxxxxxx> wrote:
subset, except some weirdness such as the ? operator and post- and
preincrement as an expression (a similar feature exists as a statement).
And of course the preprocessor is not mandatory.

only for "recent" Pascals (read: past 10 years), which could almost more
correctly be called Delphis than Pascals...

No, there are more; pretty much every non-16-bit version does. And Delphi
is also 15 years old.

And 16-bit x86 C compilers had similar limitations to, e.g., TP.


yeah, far pointers, ...

far pointers weren't really limited, just awkward...
only now I forget some of the rules of far-pointer usage, but it doesn't
matter much...


what I remember of Pascal, at least, was from a compiler I was using in
the late 90s, with my reference material being books from the 80s...

Which compiler? AFAIK the only other option is GNU Pascal, and that
supports arithmetic too (I wouldn't be surprised if it has since the eighties).


I remember using FreePascal / FPK Pascal back then, and a few others, only
I don't remember which one it was...

(at the time, I had also partly learned Fortran77, but noticed that
F77-style code apparently didn't work in GNU Fortran...).

Point is, it is archaic. I might as well make similar statements about
C because I dabbled a bit in C51 or an 8-bit Microchip variant, and whine
about the fact that in PIC's C you can't have arrays larger than 256
bytes.


I think the issue is that the language GNU Fortran accepts is much
newer, and so F77 is no longer accepted...

I guess newer Fortrans dispensed with the line numbers and putting
everything at particular indentations, and so F77, with its line numbers and
fixed indentation, was not accepted as such.

probably it does something different from jumping around based on line
numbers as well, but I have not cared enough about Fortran to really bother
looking (apart from noticing, when I have run across it, that newer Fortran
code tends to look a lot more Pascal-ish...).


I think it did mention "record-based magnetic disk storage" or
somesuch, though...

Speaking about archaic.....


yep, I guess this is what higher-end computers were using at the time...

(PCs then had floppy drives and a FAT-style filesystem, or at least
they did in the late 80s...).

then again, I forget what part of the 80s the book was from, as I no longer
have the book I think...

some of my books are from the 60s...
yeah, some of these I ended up getting because they were library discards...


C effectively has no string type, and its library emulation of one is
1:1 convertible to Pascal. No difference there.

C doesn't need a string type...

Well, I don't agree. At least not anymore, now that compilers have
enough memory.


but it has 'char *', which can do strings fairly nicely (never mind UTF-8
and wchar_t strings...).

in my compiler, I made wchar_t a builtin type (in most cases, aliased to
unsigned short, FWIW).
this is what is used in the (incomplete) Java and C# frontends for 'char'
(and where C's char maps to 'signed byte'...).

annoyingly at times, my signature system was originally designed relative to
the C typesystem, and the change to a more language-neutral form (AKA:
integer types are based on size, ...) left some anomalies (such as: apart
from making the frontend mildly CPU-specific, I can either not support LP64
on x86-64, or force LP64 on x86, both of which would raise issues...).

this is because 'sizeof(long)' depends on the arch; previously the sig
handling code for 'long' made it variable-sized, but now I have
designated it as a fixed 64 bits, which creates a problem on x86 (where
much code may end up assuming a 32-bit long...).


of course, the new IL could help with this, as this little piece of
information could be kept in the 'BMC' compiler, which could more safely be
allowed to know which arch it is targeting, and thus make the appropriate
adjustments.

however, this would mean that on x86:
void foo(int x);
and:
void foo(long x);

would have the same signature, which 'could' pose a problem if this
distinction is used when overloading functions in the C++ frontend (granted,
probably no one would do this anyway...).

this will not matter for C# and Java, which define long as always being 64
bits.


FWIW:
I may use a "modified" typesystem in BMC; namely, it will still accept C's
typenames, but may also accept another set of builtin type names:
__int8
__uint8
__int16
__uint16
__int32
__uint32
__int64
__uint64
__int128
__uint128
....
__float16
__float32
__float64
__float80
__float128
....
as well as misc:
__char8
__char16

the reason then would be to reduce ambiguity, for example:
the C frontend may emit its long as 'long';
the C# frontend may emit its long as '__int64'.
....

the reason being that, when the input is coming from multiple possible
languages, it may get confusing as to which exact language's conventions are
being followed in the IL (the IL is C-based, but need not be C-specific,
although being "in keeping with C" is helpful, as otherwise I can be almost
certain I will be the only one ever to have an implementation of it...).


C# is C-ish, or at least a lot more so than Java is, in both syntax and
semantics...
after all, C# has structs and pointers for those who want them...

Its pointers are rudimentary. Its semantics are closer to Delphi than to C.
No wonder, since it was designed by Delphi's author. It is roughly Delphi
syntax with curly braces and a few C operators.


odd, I had thought C# had grabbed C's pointer system as-is, but granted I
had not looked at this aspect much...

in my implementation though, since the parser and a lot of other things are
shared between languages, I will probably support full C-style pointers (as
is, my implementation is 'unsafe' by default, so the unsafe keyword is
presently sort of a no-op...).


apart from a few minor differences, the object system seemed about like the
one from Java.

....

so, you are saying, they are close enough so one could probably just do an
alternative parser and compile it along with these other C-family
languages?...

maybe interesting, but I don't know if this is something I would
actually do (enough other things to do, FWIW...).


in theory, each address is a different object (and no two objects hold
the same address).

This goes for nearly any statically compiled imperative language, including
Pascal, Modula-2, Ada, etc. It is not really a unique C selling point.


yes, ok.


it can also be noted that I end up using SSE for all sorts of things,
many of which are not strictly supported by the processor (and a lot of code
goes into simulating a higher level of "orthogonality" than actually
exists...).

The main use of SSE in non-specialised code is to make numeric benchmarks
go faster (think shootout) and to improve the basic memory move.


yep...

I use them for memory copying as well...

but they are used for many other tasks:
floating point;
vectors and quats;
128 bit integers and floats;
128 bit pointers;
'long long' (x86);
....

recently, I have added SSE half-register support, which could be useful
mainly for long-long (x86), and possibly register spillover (x86 and
x86-64...).

in the long-long case, it would likely be primarily beneficial as an LL would
only take about 1/2 the register space (presently, an LL uses an entire SSE
register). I am a little less certain though, as certain LL operations may
be made less efficient (since the LL may be in the upper half of the reg,
and otherwise has to "cooperate" with whatever other value it may be sharing
the register with, meaning I can't just do full-register operations in these
cases...).


I had considered the prospect of using them for pointers (basically, so
certain memory-based types could refrain from using up the precious GPRs on
x86), but then realized the instruction sequences for doing so may not be
cheap...

it may or may not be justified vs, say, the relative cost of performing a
matmult, but I am not sure...

the alternative is, of course, to just pass the pointer to the matrix/... in
a GPR.

this could mean, however, performing matrix operations in a slightly less
efficient 2-address form, mostly because 3-address matrix ops with the
pointers held in GPRs would very likely exceed the number of available GPRs
on x86... (there are 5 usable GPRs, 3 of which would be needed for the
pointers, and 2 or more of which would likely be needed in the "compound
op").

using SSE regs for the pointers would take 1.5 SSE regs, and would be less
likely to pose a problem (I can be more flexible with how many GPRs are used
within the operation), and if I use the "magic operation" trick (calling an
ASM thunk to do the operation), this overhead is negligible ('movhps mem,
xmm' and 'mov mem, reg' are not much different in terms of clocks...).


note, this only really applies to mat3 and mat4, as mat2 (like vec4 and
quat) can be passed (and manipulated) entirely in SSE registers.


- C requires all C objects to map onto a contiguous sequence of C
characters.
- C implements an "offset operator" which indexes contiguous sequences of C
characters.

I don't get these; could you explain what you mean here, and what the
relevance for lowlevel is? I think you mean that most constructs are either
static structures or pointers to them.


I think the point is that C bases its fundamental operating model on a
byte-mapped address space, which many other languages do not adhere to (as
such, C is much closer to the Von Neumann ideal than are many other
languages...).

All languages that I named do it in the same way. Stop reading eighties
books :-)


ok.


While it maybe saves on the compiler in a 4k environment, I'm more
interested in lowlevel TARGET than HOST. So while this may be true, could you
please explain why having a normal array in addition to a pointer is a bad
thing?


I am not sure of the intent of this point exactly...
for C99 at least, one needs an actual array type (in addition to pointer
types); it just happens that arrays can be converted to pointers fairly
easily...

Contrary to popular perception, a static array does not have a pointer in
the language sense. There is no room allocated for a precalculated pointer
object, and you cannot change it.

C is indeed a bit obscure about the difference between pointers and
arrays, but I never really considered it a feature. I always attributed it
to some minor memory saving in the original compiler or to its funky
preprocessor architecture.


I don't know the reason.

in my case, pointers and arrays "look about the same", but are handled
differently internally.

amusingly, my low-level codegen (or, more correctly, the codegen in the
process of being rewritten) now supports pass-by-value usage of arrays,
although C is not so likely to use this.


in C, it would be as if you could write something like:

float a[16], b[16];

....
put something in b...
....
a=b;

I might make my IL also support this...


note, however, that FWIW one can add their own "string" type via library
code...

No, you can't. A string type and emulating it are two different things. The
compiler can't check and enforce your string code.


yeah, maybe, only the runtime does it at runtime...

well, I could very well include builtin "managed strings" in the new IL.

it can be noted that due to technical issues, my strings will not be
strictly in conformance with the JVM's definitions, primarily in that my
framework's strings are not instances of "java.lang.String" (AFAIK, JVM
implementations tend to use internal loading trickery to make this work...).

in my case, I am likely to do something else: likely, this class will wrap
strings, rather than be strings.
that, or strings can be secretly marshalled (my Java support has not
actually gotten that far yet, as I am still wrestling with the endless
intricacies of getting all this various stuff to work together in my
framework...).


I guess eliminating buffer overflow could require bounds-checking and
similar...

Hard to do on dynamically allocated types without explicit compiler
support.


yes, but it is worth noting that I also have .NET style "jagged" and
"square" arrays, which do include bounds checking...

of course, like objects, full IL and compiler support for them is
incomplete...


I have yet to decide on whether or not to expose them in the C frontend (as
a compiler extension):
int foo[64]; //C-style array
int[64] foo; //managed array


even then, some trickery in the compiler could still pull off (limited)
bounds checking even without dynamically allocating the arrays. this would
be limited in that things like passing an array as a pointer would likely
break such support, apart from far more expensive trickery, namely doing a
pointer-based lookup to find the array definition in order to do the bounds
check (worse for stack-based arrays, which would require back-tracking and
having the compiler keep track of metadata for things like stack layout, ...
that or me implementing proper debug info...), ...


but, for whatever reason, many newer languages think it a good idea to
waste lots of memory with 16-bit Unicode strings (vs UTF-8 strings...).

That is a hard and emotional subject. I had a lot of opinions about it,
but the more I dove into it, the more confused I got.


I guess the big issue is whether one wants to regard strings more as atoms,
streams, or as arrays.

I tend to regard strings as either atoms or streams, and so UTF-8 makes the
most sense (after all, IME ASCII is the most common character set, even with
lots of text originating nowhere near the US...).

I guess if one wants to regard a string as an array, then UTF-8 is awkward
since characters may be different sizes, and so it is felt the
closer-to-uniform indexing is justified (since indexing by char in a UTF-8
string requires scanning through the string).

however, it is not difficult to support both varieties, and this is actually
what I do at present...


although, one can still debate if Delphi is really Pascal, or more like
Pascal++...

It's both, just like C++ also includes C.

yes, but C is not C++...


as I see it, Pascal is the older definition, which may include all the
stuff, and varieties, generally accepted to exist to begin with.

Delphi and newer Delphi-alikes may be a superset of Pascal, and far more
normalized than the rest of Pascal land, however, there is much in Pascal
land which is not Delphi, such as Ada, many older varieties, ...

it is like, how C++ and Objective-C are both OO-based C supersets, but each
is a very different beast from the other...


the C camp has actually put much emphasis on standardization, such that
nearly any compiler can be written "to the standards", and have code
generally work for it (source works between compilers, often binary code
will link between compilers, ...). one is safe so long as they don't rely too
heavily on compiler extensions...


Pascal has much weaker standards, and so people resort to an alternative
means:
each implementation clones other popular implementations...

this would be much the same as if, in C land, as opposed to people writing
compilers following the standards, everyone just started using gcc and MSVC
as their reference implementations...


