Re: extended operators
- From: "cr88192" <cr88192@xxxxxxxxxxx>
- Date: Fri, 9 Nov 2007 23:54:04 +1000
"Charlie Gordon" <news@xxxxxxxxxxx> wrote in message
news:473442f2$0$28947$426a74cc@xxxxxxxxxxxxxxx
"cr88192" <cr88192@xxxxxxxxxxx> wrote in message news:
a904f$4733d8ce$ca8010a3$17527@xxxxxxxxxxxxx
<snip>
yes, true.
it is likely that this case would need to be caught and handled.
if(!c)return(v);
Yes, but there is a better fix, with neither a test nor a jump:
uint32_t u32rotl(uint32_t v, int c) {
c &= 31;
return (v << c) | (v >> (31 - c) >> 1);
}
uint32_t u32rotr(uint32_t v, int c) {
c &= 31;
return (v >> c) | (v << (31 - c) << 1);
}
uint64_t u64rotl(uint64_t v, int c) {
c &= 63;
return (v << c) | (v >> (63 - c) >> 1);
}
uint64_t u64rotr(uint64_t v, int c) {
c &= 63;
return (v >> c) | (v << (63 - c) << 1);
}
ok, interesting approach.
#ifdef UINT128_MAX
uint128_t u128rotl(uint128_t v, int c) {
c &= 127;
return (v << c) | (v >> (127 - c) >> 1);
}
uint128_t u128rotr(uint128_t v, int c) {
c &= 127;
return (v >> c) | (v << (127 - c) << 1);
}
#endif
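For comparison, a minimal sketch of another branch-free form that is commonly used for fixed-width rotates: instead of splitting the second shift in two, mask the negated count. The names rotl32_alt and rotr32_alt are made up for this illustration and do not come from the thread. The sketch assumes uint32_t does not promote to a signed type (true when int is 32 bits or narrower).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, not from the thread.
   For a count of 0, (0 - n) & 31 is also 0, so neither shift ever
   reaches 32 and no test for the zero case is needed. */
uint32_t rotl32_alt(uint32_t v, int c) {
    unsigned n = (unsigned)c & 31u;          /* reduce the count to 0..31 */
    return (v << n) | (v >> ((0u - n) & 31u));
}

uint32_t rotr32_alt(uint32_t v, int c) {
    unsigned n = (unsigned)c & 31u;
    return (v >> n) | (v << ((0u - n) & 31u));
}
```

Both forms avoid shifting by the full width of the type, which is undefined behaviour in C; which one a given compiler turns into a single rotate instruction has to be checked case by case.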
There was also another bug in your variable width rotators, as you forgot
to cast 1 to the proper type when computing the bitmask:
// variable width rotation
#ifdef UINT128_MAX
uint128_t u128vrotl(uint128_t v, int c, int w) {
assert(w > 0 && w <= 128);
c %= w;
return ((v << c) | (v >> (w - c - 1) >> 1)) &
((uint128_t)-1 >> (128 - w));
}
uint128_t u128vrotr(uint128_t v, int c, int w) {
assert(w > 0 && w <= 128);
c %= w;
return ((v >> c) | (v << (w - c - 1) << 1)) &
((uint128_t)-1 >> (128 - w));
}
#endif
yes, ok.
Equivalent versions for smaller uint sizes can be derived directly from
the code above by changing 128 to 64 or 32, or even 16
ok.
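As an illustration of that derivation, here is a sketch of the 32-bit variable width rotators obtained by changing 128 to 32. Two details are additions made for this sketch and are not in the code above: negative counts are normalized after the % (in C99 the result of % takes the sign of the dividend), and v is masked to w bits first, so that stray high bits cannot leak into the result. The helper name u32mask is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low w bits set, 1 <= w <= 32. */
static uint32_t u32mask(int w) {
    return (uint32_t)-1 >> (32 - w);
}

/* Rotate the low w bits of v left by c positions. */
uint32_t u32vrotl(uint32_t v, int c, int w) {
    assert(w > 0 && w <= 32);
    c %= w;
    if (c < 0) c += w;      /* addition for this sketch: keep c in 0..w-1 */
    v &= u32mask(w);        /* addition for this sketch: drop bits above w */
    return ((v << c) | (v >> (w - c - 1) >> 1)) & u32mask(w);
}

/* Rotate the low w bits of v right by c positions. */
uint32_t u32vrotr(uint32_t v, int c, int w) {
    assert(w > 0 && w <= 32);
    c %= w;
    if (c < 0) c += w;
    v &= u32mask(w);
    return ((v >> c) | (v << (w - c - 1) << 1)) & u32mask(w);
}
```

For example, rotating the 8-bit value 0x81 left by one position within a width of 8 gives 0x03, and rotating it right by one gives 0xC0.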
also something else:
it seems that even if the compiler has 128 bit integers (mine does, and I
am pretty sure also gcc, but I have not verified), there is little to say
that the types are included in 'stdint.h' (which in the standard, only
requires types up to 64 bits).
Availability for this type should be tested with the preprocessor:
#ifdef UINT128_MAX
#endif
I was just looking at the headers, but this makes sense...
of course, one has to compute UINT128_MAX...
I guess one can represent it in hex:
0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
however, my compiler currently doesn't handle constants this large...
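One way around the missing large constants is to compute the maximum from a cast instead of writing it as a literal. The sketch below assumes a GCC or Clang style compiler that provides unsigned __int128 and predefines __SIZEOF_INT128__; the names my_uint128_t, MY_UINT128_MAX and MY_HAVE_UINT128 are hypothetical and chosen so as not to collide with anything a real stdint.h might define.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __SIZEOF_INT128__               /* predefined by GCC and Clang when available */
typedef unsigned __int128 my_uint128_t; /* hypothetical name */
#define MY_UINT128_MAX (~(my_uint128_t)0) /* all 128 bits set, no literal needed */
#define MY_HAVE_UINT128 1
#else
#define MY_HAVE_UINT128 0
#endif

/* Returns 1 if the computed maximum is all ones, or if there is
   no 128-bit type to check on this compiler. */
int my_uint128_max_is_all_ones(void) {
#if MY_HAVE_UINT128
    my_uint128_t m = MY_UINT128_MAX;
    return (uint64_t)m == UINT64_MAX
        && (uint64_t)(m >> 64) == UINT64_MAX
        && m + 1 == 0;                 /* unsigned arithmetic wraps to zero */
#else
    return 1;
#endif
}
```

A macro defined this way works with #ifdef, but because it contains a cast it cannot be used inside a #if expression, unlike the limit macros the standard requires for the other types.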
for example, the mingw stdint.h only specified up to 64 bits (since for
my compiler, I am mostly using the mingw headers, I used the existing
header as a template and added the extra types).
You have to make the C library consistent with your compiler features.
I am largely reusing the existing libraries and headers from mingw.
as such, the only library stuff I have to make consistent is my own
extension libs, and also a few of the headers (I am actually keeping my
custom headers in a separate include directory, as such, I patch over the
existing headers, rather than fully customizing the originals).
oh well, as of 2 days ago, my compiler now handles the 'static' keyword,
and a few other arguably basic features.
Good! It is a wise move to support the "basic" features before extending
the language syntax and feature set.
well, I have my fair amount of extensions as well, just also getting the
'basic' stuff working is kind of a goal, though at present full conformance
on a few points may be asking a bit much...
often, it is the fine details that are most painful...
this morning, mostly bug fixing:
fixing some issues with the 'proxy' system (also, it now interfaces some
with the windows DLL import mechanism, allowing me to more directly
replace functions imported from dlls);
Do you support or plan to support a linux target ?
linux proper, not at present, although it shouldn't be much work to
make it work on 32-bit linux (most of the internals are fairly architecture
neutral).
I have before built and tested parts of it on linux x86-64, but I don't have
a working x86-64 version as of yet, as, once again, there is a bit of pain
in the details...
dealing with buffer issues (groan, I am thinking I may need to beat
together a "cleaner" way of dealing with temporary text buffers, rather
than either making them really large or risking overflow, or the ugly
hackwork needed to handle dynamically resizing them, and vsnprintf being
a little less helpful than ideal...).
Does your compiler compile itself yet ?
If so, you can start bootstrapping and use your own extensions in the
compiler, including some fancy dynamic string objects. Bootstrapping non
standard features is a non trivial iterative process, you might have fun
just trying it out (hint: for portability, you could use an intermediary
interpretive target ;-).
no, as of yet, it is not my plan to make the thing self-compiling.
the major issue is that its primary target is as an incremental/JIT-based
compiler, and only really has a static compiler as an afterthought.
as such, it itself is compiled with gcc...
a more complicated issue is related to parsing, for example, how to
implement a "generic" token stream (issue: what exactly classifies as a
'token' depends on context, and neither BNF nor regexes seems a terribly
good match).
There are very few context dependencies for the token parsing rules in
Standard C. I can only think of the #include <filename> hack, and the
handling of white space including end of line in preprocessor directives.
what I meant by 'context' was that in different situations, we have
different sets of tokens. for example, x86 assembler, C, Scheme, and XML,
have very different sets of tokens...
in the past, I have mostly always just used copy/paste/edit to beat together
new tokenizers, but here I would need something more general purpose (having
a mass of slightly different tokenizers, though very possible, is wasteful
and inflexible).
so, the issues are:
what types of tokens are there;
what tokens fall into each set;
how to handle open-ended tokens (such as numeric types and symbol names);
....
ideally, nearly all this would be parameterized in some way, rather than
just having a bunch of different tokenizers.
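A rough sketch of what such parameterization could look like: one scanner, driven by a small description of the token set. Everything here (lexspec_t, next_token, count_tokens, the two example specs) is hypothetical and heavily simplified; it only covers identifiers, numbers and a punctuator table with longest-match lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one token set. */
typedef struct {
    const char *const *puncts;  /* NULL-terminated list of punctuators */
    const char *ident_extra;    /* identifier characters besides alphanumerics */
} lexspec_t;

typedef enum { TOK_END, TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_ERROR } tokkind_t;

/* Scans one token; on return *pp points just past it and *len is its
   length, so the token text starts at *pp - *len. */
tokkind_t next_token(const lexspec_t *spec, const char **pp, size_t *len) {
    const char *p = *pp;
    while (isspace((unsigned char)*p)) p++;
    const char *start = p;
    tokkind_t kind = TOK_ERROR;
    if (*p == '\0') {
        kind = TOK_END;
    } else if (isdigit((unsigned char)*p)) {
        while (isalnum((unsigned char)*p)) p++;       /* open-ended: digits, suffixes */
        kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*p) || strchr(spec->ident_extra, *p)) {
        while (*p && (isalnum((unsigned char)*p) || strchr(spec->ident_extra, *p))) p++;
        kind = TOK_IDENT;
    } else {
        size_t best = 0;                              /* longest matching punctuator */
        for (const char *const *q = spec->puncts; *q; q++) {
            size_t n = strlen(*q);
            if (n > best && strncmp(p, *q, n) == 0) best = n;
        }
        if (best) { p += best; kind = TOK_PUNCT; }
        else p++;                                     /* skip it so callers make progress */
    }
    *len = (size_t)(p - start);
    *pp = p;
    return kind;
}

static const char *const c_puncts[] = { "<<=", "<<", "<", "+", "-", "=", "(", ")", ";", NULL };
static const char *const scheme_puncts[] = { "(", ")", "'", NULL };
const lexspec_t c_spec = { c_puncts, "_" };
const lexspec_t scheme_spec = { scheme_puncts, "_-+*/<=>!?" };

/* Counts the tokens in s (error tokens included). */
size_t count_tokens(const lexspec_t *spec, const char *s) {
    size_t n = 0, len;
    while (next_token(spec, &s, &len) != TOK_END) n++;
    return n;
}
```

With these two specs the same input is split differently: "x-y" is three tokens for c_spec and a single symbol for scheme_spec. String literals, comments and number syntax would need further parameters or callbacks.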
If you add support for regular expression literals, you will need context
feedback into the token parser to determine if / starts a division
operator (/ or /= when expecting a binary operator) or a regex literal
(when expecting a primary expression), but that's rather easy to hack.
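A minimal sketch of that feedback, assuming the parser can tell the token scanner whether it is currently expecting an operand. The names classify_slash and regex_literal_length, and the /.../ syntax with backslash escapes, are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SLASH_DIVIDE, SLASH_DIVIDE_ASSIGN, SLASH_REGEX } slashkind_t;

/* p points at a '/'. expect_operand is nonzero when the parser is at a
   position where a primary expression may start. */
slashkind_t classify_slash(const char *p, int expect_operand) {
    assert(p[0] == '/');
    if (expect_operand)
        return SLASH_REGEX;
    return p[1] == '=' ? SLASH_DIVIDE_ASSIGN : SLASH_DIVIDE;
}

/* Length of a /.../ literal starting at p, including both slashes,
   or 0 if it is not terminated on the same line. */
size_t regex_literal_length(const char *p) {
    size_t i = 1;
    while (p[i] != '\0' && p[i] != '\n') {
        if (p[i] == '\\' && p[i + 1] != '\0') {  /* skip an escaped character */
            i += 2;
            continue;
        }
        if (p[i] == '/')
            return i + 1;
        i++;
    }
    return 0;
}
```

The parser would set the flag after a binary operator, an opening parenthesis or the start of a statement, and clear it after an identifier, a literal or a closing parenthesis.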
If you plan to add user defined tokens and operators as we discussed
elsethread, these should not be context dependent, just dynamically
defined: the token parser needs to be updated to include support for each
newly defined (or undefined) token on the fly.
I don't think regular expressions are a generic solution for this problem.
I would favor an extensible finite state machine with ad-hoc special cases
for string/char literals, numbers, identifiers. It may involve regular
expressions as a way to extend the syntax for numbers (for binary
literals, _ grouping, complex and quaternion literals, decimal floats,
bignums...), strings (borrowing syntax from ruby, python or even lua), and
identifiers (adding $ or ::)
This approach requires a lot of dynamic support in the compiler: just
redefining the tokens on the fly is not enough, the compiler needs to
compile and execute custom parsing and marshalling code at compile time to
handle the new syntaxes for literals.
I wasn't talking about filling up the language syntax with magic here, more
just trying to find a general way to resolve the "bunches of different
parser and printer interfaces" problem, along with the good old buffer
management issues (another use is because, in a very different part of my
main project, I am likely to need similar, though to deal with textual data
serialization issues).
better to solve the problem a good way one time, than to continue endlessly
solving it half-assedly every time.
having a separate set of parsers and printers, for more or less every
compiler stage (the present situation) is wasteful...
buffer management:
because doing a 1MB malloc, if only ever to be the target of various sprintf
calls, is wasteful, and because vsnprintf is a little less helpful than
ideal (seems to always return -1 in my case if any of the string parameters
are large...).
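For reference, a sketch of a formatting helper that copes with both behaviours of vsnprintf: a C99 library returns the length that would have been needed, while some older runtimes return a negative value when the output is truncated. The name xasprintf and the 16 MB cap are assumptions for this sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: formats into a freshly allocated string that the
   caller must free(). Returns NULL on allocation failure or when the
   result would exceed the cap. */
char *xasprintf(const char *fmt, ...) {
    size_t size = 64;                   /* small first guess */
    char *buf = malloc(size);
    while (buf != NULL) {
        va_list ap;
        va_start(ap, fmt);
        int n = vsnprintf(buf, size, fmt, ap);
        va_end(ap);
        if (n >= 0 && (size_t)n < size)
            return buf;                 /* it fit, terminator included */
        /* C99: n is the length needed. Older runtimes: n < 0, so double. */
        size_t newsize = (n >= 0) ? (size_t)n + 1 : size * 2;
        if (newsize > ((size_t)1 << 24)) {  /* arbitrary cap, also stops real errors */
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, newsize);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        size = newsize;
    }
    return NULL;
}
```

The va_list is restarted on every attempt because it may not be reused after vsnprintf has consumed it.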
so, the goal would be this:
the preprocessor, C parser, RPNIL compiler, and assembler, each share a
singular set of generalized parsing, printing, and buffer-management
functions, rather than each stage (as is presently) half-assedly rolling its
own version (which adds to code bloat and fragmentation, and also wastes
memory due to the current crude practice of allocating bunches of
ridiculously large buffers in trying to deal with the rare edge cases that
smaller buffers get overflowed).
a simple example:
for parsing nearly each and every token, I have a 256 byte token buffer;
are most tokens ever likely to be this large? no. most are, at most, a few
tens of bytes...
now, a smarter API will be able to figure out how to use smaller buffers,
and thus, in general, waste far less memory...
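One possible shape for such an API, as a sketch only: a token buffer that starts with a small inline array and moves to the heap only when a token outgrows it. The type tokbuf_t, its functions and the 32-byte inline size are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;      /* points at small[] until the token outgrows it */
    size_t len;      /* characters stored, terminator not counted */
    size_t cap;      /* bytes available at data */
    char small[32];  /* covers the common case of short tokens */
} tokbuf_t;

void tokbuf_init(tokbuf_t *b) {
    b->data = b->small;
    b->len = 0;
    b->cap = sizeof b->small;
    b->small[0] = '\0';
}

/* Appends one character; returns 0 on success, -1 if out of memory. */
int tokbuf_putc(tokbuf_t *b, char ch) {
    if (b->len + 2 > b->cap) {          /* need room for ch and the terminator */
        size_t ncap = b->cap * 2;
        char *n;
        if (b->data == b->small) {
            n = malloc(ncap);           /* first spill to the heap */
            if (n != NULL) memcpy(n, b->small, b->len);
        } else {
            n = realloc(b->data, ncap);
        }
        if (n == NULL) return -1;
        b->data = n;
        b->cap = ncap;
    }
    b->data[b->len++] = ch;
    b->data[b->len] = '\0';
    return 0;
}

/* Releases any heap storage and resets the buffer for reuse. */
void tokbuf_free(tokbuf_t *b) {
    if (b->data != b->small) free(b->data);
    tokbuf_init(b);
}
```

Because data may point into the structure itself, a tokbuf_t must not be copied by value while it is in use.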
The next step would lead to a substantial revamping of the "preprocessor"
where all the constructs of the language including statements, functions
and data just compiled could be used at parse time. We may need some
syntax sugar to implement dynamic token generation (like the ` , lisp
trick):
Just for fun, and to further delve into off topic wilderness, let's
introduce new syntax for:
- quoting tokens: `{ `, `int `++ `123 ... of type token_t
- token composition: function syntax, but could use ## or another operator
- generating tokens: [[ tokenlist ]]
- generating dynamic tokens: [[ @variable @(expression) ]]
static inline int bitcount_helper(unsigned x) {
return x == 0 ? 0 : (x & 1) + bitcount_helper(x >> 1);
}
int const bitcount_array[] = {
for (int i = 0; i < 256; i++) {
[[ @bitcount_helper(i), ]];
}
};
/* a potentially simpler alternative would be: */
int const bitcount_array[256];
for (int i = 0; i < countof(bitcount_array); i++) {
/* modifying const objects is OK at compile time */
bitcount_array[i] = bitcount_helper(i);
}
/* Furthermore, bitcount_helper could be defined inside the
local scope of the array initializer or the for block so
as not to clobber the global namespace and be discarded
immediately after use. */
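For comparison, the same table can be produced in today's C with a well-known preprocessor trick, where each macro level adds two more bits. This sketch is standard C and does not use the hypothetical syntax above; the names bitcount_table and bitcount_table_ok are made up here, and bitcount_helper is repeated only to check the table.

```c
#include <assert.h>

/* Each level lists the bit counts for two more bits: B2 covers the
   four 2-bit values, B4 sixteen values, B6 sixty-four. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)

static const unsigned char bitcount_table[256] = {
    B6(0), B6(1), B6(1), B6(2)
};

/* Reference implementation, as in the example above. */
static int bitcount_helper(unsigned x) {
    return x == 0 ? 0 : (int)(x & 1) + bitcount_helper(x >> 1);
}

/* Returns 1 if every table entry matches the reference. */
int bitcount_table_ok(void) {
    for (unsigned i = 0; i < 256; i++) {
        if (bitcount_table[i] != bitcount_helper(i))
            return 0;
    }
    return 1;
}
```

It works for this one table, but the macros encode the recurrence by hand, which is the kind of contortion the proposed syntax is meant to avoid.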
/* We could also generate the rotation functions this way: */
for (int size = 16; size <= 128; size <<= 1) {
token_t type = tokenpaste(`uint, size, `_t);
token_t rotr = tokenpaste(`u, size, `rotr);
token_t rotl = tokenpaste(`u, size, `rotl);
if (typedef(type)) [[
@type @rotr(@type v, int c) {
c &= @(size - 1);
return (v >> c) | (v << (@(size - 1) - c) << 1);
}
@type @rotl(@type v, int c) {
c &= @(size - 1);
return (v << c) | (v >> (@(size - 1) - c) >> 1);
}
]];
}
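The closest equivalent with the existing preprocessor is a function-generating macro using ## token pasting. The sketch below is standard C; the macro name DEFINE_ROTATORS is made up. Unlike the loop above it cannot test whether the type exists, so each instantiation has to be listed (and guarded) by hand. It assumes int is at least 32 bits wide, so that the promoted 16-bit shifts cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: defines u<bits>rotl and u<bits>rotr for uint<bits>_t. */
#define DEFINE_ROTATORS(bits)                                              \
    static inline uint##bits##_t u##bits##rotl(uint##bits##_t v, int c) { \
        c &= (bits) - 1;                                                   \
        return (uint##bits##_t)((v << c) | (v >> ((bits) - 1 - c) >> 1)); \
    }                                                                      \
    static inline uint##bits##_t u##bits##rotr(uint##bits##_t v, int c) { \
        c &= (bits) - 1;                                                   \
        return (uint##bits##_t)((v >> c) | (v << ((bits) - 1 - c) << 1)); \
    }

DEFINE_ROTATORS(16)
DEFINE_ROTATORS(32)
DEFINE_ROTATORS(64)
```

The cast on the return value matters for the 16-bit case, where the operands are promoted to int and the left shift produces bits above bit 15.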
I guess it's not easy to come up with a viable alternative to the
preprocessor that's as powerful, provides better features, and stays
simple or at least readable.
actually, I had been feeling tempted before by the idea of adding in a hybrid
between the C++ template system and a LISP-style macro system...
likely, the macros would run a kind of special-purpose mini-C, rather than
the full language. using the full language would require a good deal of
additional complexity, but something simpler, such as a direct interpreter,
is much more reasonable. this means, of course, that macros are unlikely to
have access to program state or system libraries (and may be semantically
different in some cases), but this is hardly much real loss...
however, at present, this issue is non-trivial (it will likely wait until a
time when I somewhat rework how the upper stages of the compiler
operate...).
--
Chqrlie.