Re: compiler bugs



Chris F Clark <cfc@xxxxxxxxxxxxxxxxxxxx> writes:
anton@xxxxxxxxxxxxxxxxxxxxxxxxxx (Anton Ertl) writes:

"Christopher Glaeser" <cdg@xxxxxxxxxxxxx> writes:
[Anton Ertl:]
the optimizer should preserve the behaviour of the non-optimizing
compiler.

In a language like C, the behavior of programs that reference
uninitialized variables can be affected by the slightest change to
register assignments.

Not if the unoptimizing compiler initializes all variables to some
known value.

The problem with this solution is that there is at least 1 well-known
algorithmic trick that doesn't work in this case. It is a way to
initialize a large unbounded array efficiently, when you only use a
portion of the array in your algorithhm.

The issue under discussions was that the optimizations I mentioned
would change the register allocation, and that would result in changed
values for uninitialized variables and thus altered behaviour. To fix
that one actually does not need to initialize all variables (unlike
what I wrote), only variables that may be born in registers,
i.e. scalars. I don't think that's impractical.

However, it depends upon being able to not actually initialize the
array for the unused sections. You find a way to map the array such
that you can track the "highest" element used and when accessing the
array, you compare the element you are about to use with the highest
element initialized so far, and if it is beyond the bound, you
initialize enough elements to cover the element you are accessing and
update your tracker.

I don't know of any optimizers that will insert that kind of trick in
ones code.

That should be easy to implement, though. The hard part is deciding
when to use that vs. when to initialize everything up-front and avoid
the run-time overhead of checking the bound.

There is an even more extreme case where you don't even need (and
want) initialization when you access it, because you have another way
of validating the contents [briggs&torczon93]. However, AFAIK at that
data structure has not been used much (at least for this purpose),
because there are good alternatives that are less wasteful of memory.

Anyway, at least for the optimizations I mentioned it is not necessary
to initialize arrays.

And yes, as I mentioned, there are certain cases (like out-of-bounds
array accesses) where preservation of behaviour in all cases is
impractical, and the benefits derived from it are vanishingly small.
But still, that's a good ideal for optimizer writers to strive for,
even if they cannot achieve it in all cases.

So, if you have an algorithm that needs a trick like the above to meet
its performance promises, then the unoptimizing compiler breaks your
algorithm (and we don't have a well-known optimization technique that
fixes that breakage).

For the case you mention, always performing the on-demand
initialization in the compiler would preserve the asymptotic
complexity, but is probably impossible in general in a language like C
that has arbitrary pointer arithmetic.

If the array is really large, a good approach is to just allocate it
with mmap (the run-time system of the language could do this by
itself, if the programmer does not do it explicitly). Then the OS
will perform the on-demand initialization.

- anton

@Article{briggs&torczon93,
author = "Preston Briggs and Linda Torczon",
title = "An Efficient Representation for Sparse Sets",
journal = "ACM Letters on Programming Languages and Systems",
year = "1993",
volume = "2",
number = "1--4",
pages = "59--69",
annote = "Describes a set representation that is
asymptotically more
time-efficient than the classical bit-vector for
operations like \emph{clear-set} and \emph{forall},
but needs much more memory. The paper also presents
empirical data from micro-benchmarks that shows that
the \emph{member}, \emph{add-member} and
\emph{delete-member} operations are about three
times slower on an RS/6000 with the new
representation than with the bit-vector
representation. It also presents empirical data from
compilations of several routines, that shows that
the new representation significantly reduces the
register allocation time for these routines."
}
--
M. Anton Ertl
anton@xxxxxxxxxxxxxxxxxxxxxxxxxx
http://www.complang.tuwien.ac.at/anton/

.



Relevant Pages

  • Bounds checked arrays
    ... As everybody knows, the C language lacks ... When the state of this toggle is ON, the compiler ... Important is to know that the array updates ... We have just to allow him/her to specify what to do ...
    (comp.lang.c)
  • Re: Bounds checked arrays
    ... > more at each array access will not make any ... But for those applications which are real-time ... > When the state of this toggle is ON, the compiler ... The C language has a long-standing tradition ...
    (comp.lang.c)
  • Re: Bounds checked arrays
    ... > more at each array access will not make any ... > this problem in the C language to do their dirty ... > When the state of this toggle is ON, the compiler ... > We have just to allow him/her to specify what to do ...
    (comp.lang.c)
  • Re: compiler bugs
    ... While I hate disagreeing with you Anton, ... Not if the unoptimizing compiler initializes all variables to some ... portion of the array in your algorithhm. ... being able to not actually initialize the array for the unused ...
    (comp.compilers)
  • Re: Teaching new tricks to an old dog (C++ -->Ada)
    ... > attributes helps both the writer and the compiler. ... You can build algorithms around language provided ... > Maybe, if one gets used to template programming, the basis of the ... > Try telling someone that useful array attributes, ...
    (comp.lang.ada)

Loading