Re: HardBound and SoftBound



nmm1@xxxxxxxxx writes:
Thirdly, I cannot make head or tail of what you believe HardBound
and SoftBound specify is legal C, and so what they check.

I'm going to take another stab at briefly describing the semantics that technologies such as HardBound and SoftBound enable, in their tightest checking modes, for programming languages such as C and C++.

(This somewhat diffident phraseology is intended to emphasize that technologies such as HardBound and SoftBound do not specify the programming language semantics; they just implement a semantics that the researchers believe is compatible either with the official semantics or with what the vast majority of users expect. It is also intended to emphasize that, as others have pointed out, HardBound is a hardware technology and SoftBound a compiler technology, and either can be used to implement several different levels of checking: e.g. different levels of checking for C and C++, and/or checking for other languages.)

(I had hoped that my earlier post on the subject could be understood by Nick, since I said that I was using an option that Nick himself had proposed (but then rejected), with modifications. Nevertheless, I will try again. If Nick cannot understand this, well, as he says "the mind boggles".)


Objects are associated with an "extent".

By "objects" I mean primitive data types (char, int, etc.) of non-register storage class (is "register" still supported?), as well as structs and arrays; and, in C++, objects of class type.

For an object O with char* A = (char*)&O,
the "extent" is defined to be the interval [lwb,upb),
where lwb = A = (char*)&O,
and upb = A+sizeof(O) = ((char*)&O)+sizeof(O).
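A minimal sketch of that definition in C, assuming an extent is represented simply as a pair of char pointers. The type `extent`, the macro `EXTENT_OF` and the helper `extent_size` are hypothetical names for illustration, not anything HardBound or SoftBound defines.

```c
#include <stddef.h>

/* Hypothetical representation of an extent: the half-open interval
   [lwb, upb) of bytes occupied by an object. */
typedef struct {
    char *lwb;  /* (char*)&O */
    char *upb;  /* (char*)&O + sizeof(O), one past the last byte */
} extent;

/* Extent of an object O, exactly as defined above. O must be an
   lvalue naming the whole object (not a pointer to it). */
#define EXTENT_OF(O) ((extent){ (char *)&(O), (char *)&(O) + sizeof(O) })

/* Number of bytes covered by an extent. */
size_t extent_size(extent e) { return (size_t)(e.upb - e.lwb); }
```

For an `int x`, `EXTENT_OF(x)` covers `sizeof(int)` bytes starting at `&x`; for `double a[8]` it covers all of `sizeof a`.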

Pointers to objects are similarly associated with "extents".
Pointer extents may be said to have three sorts of values:
a) an interval [lwb,upb), such as defined above,
which will be used in bounds checking
b) "unbounded", which will be used to pass (disable)
all bounds checking
c) "inaccessible", which fails all bounds checking
The "unbounded" and "inaccessible" values may be understood as equivalent to unbounded=[lwb=-infinity,upb=+infinity), and inaccessible=[0,0) or some other extent that has no interior. However, it may be more convenient not to use such equivalents, e.g. if the address space is not linear.
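One way to avoid relying on -infinity/+infinity or an empty interval is to carry an explicit tag. This is a sketch under that assumption; `extent_kind`, `ptr_extent` and `extent_permits` are hypothetical names, and comparing pointers into different objects is formally undefined in ISO C, so a real implementation would compare raw addresses.

```c
#include <stdbool.h>
#include <stddef.h>

/* The three sorts of pointer extent values described above. */
typedef enum { EXT_BOUNDED, EXT_UNBOUNDED, EXT_INACCESSIBLE } extent_kind;

typedef struct {
    extent_kind kind;
    const char *lwb;  /* meaningful only when kind == EXT_BOUNDED */
    const char *upb;
} ptr_extent;

/* Does the byte range [p, p+size) pass the bounds check for extent e? */
bool extent_permits(ptr_extent e, const char *p, size_t size) {
    switch (e.kind) {
    case EXT_UNBOUNDED:    return true;   /* passes all bounds checks */
    case EXT_INACCESSIBLE: return false;  /* fails all bounds checks */
    case EXT_BOUNDED:
        /* p must start inside [lwb, upb] and size bytes must fit before upb */
        return p >= e.lwb && p <= e.upb && size <= (size_t)(e.upb - p);
    }
    return false;
}
```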

In general, pointer extents are associated with pointer expressions (and subexpressions), as well as with pointer variables and constants and parameters.

---++ Checking an Extent on Memory References

Pointers may be dereferenced using expressions such as
ptr->field
*ptr
ptr[index]
etc.

These expressions all have an extent, calculated as explained in subsequent sections. Let us call this [lwb,upb).

In addition, these expressions all have a base address and a size.

Let us call any such expression "deref". Each deref has a base address and a size:

The base address is (char*)&(deref), e.g.
(char*)&(ptr->field)
(char*)&(*ptr)
(char*)&(ptr[index])

The size is sizeof(deref), e.g.
sizeof(ptr->field)
sizeof(*ptr)
sizeof(ptr[index])

Then an "in-bounds check" is the result of the following computation, "cond":

bool cond = true;
char* p = (char*)&(deref);
for( size_t i = 0; i < sizeof(deref); i++ ) {
    if( lwb <= (p+i) && (p+i) < upb ) {
        // okay
    } else {
        // out of bounds
        cond = false;
        break;
    }
}

Any memory dereference whose "in-bounds check" is false is illegal.
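The same check packaged as a function, with a closed form alongside. This is a sketch: `in_bounds_bytewise` and `in_bounds_fast` are hypothetical names, `deref_addr`/`deref_size` stand for (char*)&(deref) and sizeof(deref), and the two-comparison form is my assumption about how one would collapse the per-byte loop (valid for deref_size > 0), not a description of either paper's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Literal transcription of "cond": every byte of the deref must be in [lwb, upb). */
bool in_bounds_bytewise(const char *lwb, const char *upb,
                        const char *deref_addr, size_t deref_size) {
    for (size_t i = 0; i < deref_size; i++) {
        const char *q = deref_addr + i;
        if (!(lwb <= q && q < upb))
            return false;  /* this byte is out of bounds */
    }
    return true;
}

/* Equivalent closed form: check the first byte, then that the rest fits. */
bool in_bounds_fast(const char *lwb, const char *upb,
                    const char *deref_addr, size_t deref_size) {
    return lwb <= deref_addr && deref_addr < upb &&
           deref_size <= (size_t)(upb - deref_addr);
}
```

An access to a[3] of `int a[4]` passes; an int-sized access starting one byte later straddles upb and fails both checks.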

Present definitions of languages such as C or C++ do not specify what should happen on such an illegal memory dereference. Behavior could include
a) doing nothing (what most systems do nowadays)
b) killing the program with a signal or abort or core dump
c) throwing an exception
d) doing nothing except recording the violation, e.g. incrementing a counter, and then continuing execution.

It is expected that programming languages designed to support technologies such as HardBound or SoftBound may eventually specify exactly what exceptions to throw. But legacy C and C++ do not.

---++ Taking address of, &object or &object.field or &(ptr->field)

Pointer extents are created when the address of an object is taken via an address-of, &:

E.g. assigning to a pointer "ptr" for a global object "obj" of type T:

T obj;
T* ptr = &obj;

ptr's extent will be the same as the extent of obj, i.e. [lwb,upb)
where lwb=(char*)&obj, and upb=((char*)&obj)+sizeof(T).

Similarly when taking the address of a local variable (auto).
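A sketch of address-of creating a pointer extent, assuming a "fat pointer" that carries the address together with its extent. `bptr` and `ADDR_OF` are hypothetical; SoftBound-style schemes keep such metadata per pointer, but this particular layout is only for illustration. The same macro works for a global or a local (auto) object.

```c
#include <stddef.h>

/* Hypothetical fat pointer: address plus extent [lwb, upb). */
typedef struct {
    void *addr;
    char *lwb, *upb;
} bptr;

/* ptr = &obj: the pointer's extent is the extent of obj itself. */
#define ADDR_OF(obj) \
    ((bptr){ (void *)&(obj), (char *)&(obj), (char *)&(obj) + sizeof(obj) })

typedef struct { int a; double b; } T;  /* example type T */
T g_obj;                                 /* a global object of type T */
```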

---+++ Arrays

Similarly when taking the address of an array (whether global or local):

T obj[n];
T (*ptr)[n] = &obj;

ptr's extent will be the same as the extent of obj, i.e. [lwb,upb)
where lwb=(char*)&obj, and upb=((char*)&obj)+sizeof(obj).

However, when taking the address of an array element, the pointer extent shall be the extent of the entire array:

T obj[n];
T* ptr = &(obj[i]);

ptr's extent will be the same as the extent of obj, i.e. [lwb,upb)
where lwb=(char*)&obj, and upb=((char*)&obj)+sizeof(obj).
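Sketching the array-element rule with the same hypothetical fat pointer: the address is that of element i, but the extent spans the whole array, so stepping to neighbouring elements stays checkable. `ADDR_OF_ELEM` is a made-up name, and it only works when `arr` is a true array, since it relies on sizeof(arr) being the array's size.

```c
#include <stddef.h>

typedef struct { void *addr; char *lwb, *upb; } bptr;  /* hypothetical fat pointer */

/* ptr = &(arr[i]): address of element i, extent of the entire array. */
#define ADDR_OF_ELEM(arr, i)                                    \
    ((bptr){ (void *)&(arr)[i], (char *)&(arr),                 \
             (char *)&(arr) + sizeof(arr) })
```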

---+++ Fields

Similarly when taking the address of a structure field, whether the structure is global or local:

struct S { ... T field; ... };
S obj;
T* ptr = &(obj.field);

ptr's extent will be the same as the extent of obj.field, i.e. [lwb,upb)
where lwb=(char*)&(obj.field), and upb=((char*)&(obj.field))+sizeof(obj.field).
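For a field of a directly named struct, the extent is just the field's own bytes, so a pointer to a char array member cannot be used to reach the members after it. `ADDR_OF_FIELD` and `struct S` below are hypothetical illustrations using the same assumed fat-pointer layout.

```c
#include <stddef.h>

typedef struct { void *addr; char *lwb, *upb; } bptr;  /* hypothetical fat pointer */

/* ptr = &(obj.field): extent is [&obj.field, &obj.field + sizeof(obj.field)). */
#define ADDR_OF_FIELD(obj, field)                              \
    ((bptr){ (void *)&(obj).field, (char *)&(obj).field,       \
             (char *)&(obj).field + sizeof((obj).field) })

struct S { int head; char name[16]; int tail; };  /* example struct */
```

With this rule, `ADDR_OF_FIELD(s, name)` covers exactly the 16 bytes of `name`; `tail` lies outside it.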

However, when taking the address of a field accessed through a pointer, the resulting extent is the intersection of the accessing pointer's extent and the field's extent:

struct S { ... T field; ... };
S obj;
...
S* sptr = &obj;
...
T* fptr = &(sptr->field);

fptr's extent will be the intersection of the field extent,
i.e. [lwb,upb)
where lwb=(char*)&(sptr->field),
and upb=((char*)&(sptr->field))+sizeof(sptr->field),
and the accessing pointer's extent,
i.e. [lwb',upb')
where lwb'=(char*)&(obj),
and upb'=((char*)&(obj))+sizeof(S).

Although in this example the intersection is straightforward - the field is completely within the object - in other situations, particularly in the presence of casts, the pointer extent and the field extent may overlap, without being nested. Or they may not overlap at all.
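The intersection itself is ordinary interval arithmetic. This sketch assumes extents are held as integer addresses (uintptr_t) so that extents of unrelated objects can be compared without undefined pointer comparisons; when the two extents do not overlap, the result is normalized to an empty, i.e. "inaccessible", extent. `extent_intersect` and `extent_is_empty` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uintptr_t lwb, upb; } extent;  /* hypothetical, half-open [lwb, upb) */

/* Intersection of two extents; disjoint inputs give an empty extent. */
extent extent_intersect(extent a, extent b) {
    extent r;
    r.lwb = a.lwb > b.lwb ? a.lwb : b.lwb;
    r.upb = a.upb < b.upb ? a.upb : b.upb;
    if (r.upb < r.lwb)
        r.upb = r.lwb;  /* no overlap: empty, fails every bounds check */
    return r;
}

bool extent_is_empty(extent e) { return e.lwb == e.upb; }
```

The three cases in the text: nested (field inside object) gives the field extent, partially overlapping (e.g. after a cast) gives the overlap, and disjoint gives nothing accessible.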

---+++ malloc (and new)

The extent of the pointer returned by a successful malloc, e.g.
char* p = malloc(size);
is defined to be [lwb=p, upb=p+size).
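A sketch of a checked allocation wrapper under the fat-pointer assumption. `bounded_malloc` is hypothetical, and the handling of a failed malloc (a null pointer with an empty extent, so every access fails) is my assumption; the text above only specifies the successful case.

```c
#include <stdlib.h>

typedef struct { void *addr; char *lwb, *upb; } bptr;  /* hypothetical fat pointer */

/* Successful malloc(size) yields extent [p, p+size). */
bptr bounded_malloc(size_t size) {
    char *p = malloc(size);
    if (p == NULL)
        return (bptr){ NULL, NULL, NULL };  /* assumed: empty extent on failure */
    return (bptr){ p, p, p + size };
}
```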

---+++ expressions

Pointer extents are propagated through pointer value expressions, but are never changed.

E.g. the extent for ptr_out is the same as for ptr_in in

ptr_out = ptr_in + int_val;
ptr_out = ptr_in - int_val;

Pointer extents are not associated with non-pointer expressions, such as the difference of two pointers:

ptrdiff_t diff = ptr_in1 - ptr_in2;
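A sketch of the propagation rule with a hypothetical fat pointer to int: adding an integer moves the address but copies the extent unchanged, and the check happens only on dereference. `bptr_int`, `bptr_add`, `bptr_diff` and `bptr_can_deref` are made-up names; note that in ISO C merely forming an address outside the array (other than one past the end) is already undefined, so the example stays within that limit.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int *addr; const char *lwb, *upb; } bptr_int;  /* hypothetical */

/* ptr_out = ptr_in + n: new address, same extent. */
bptr_int bptr_add(bptr_int p, ptrdiff_t n) {
    bptr_int r = p;       /* extent copied unchanged */
    r.addr = p.addr + n;
    return r;
}

/* ptr_in1 - ptr_in2 is a plain integer: no extent. */
ptrdiff_t bptr_diff(bptr_int a, bptr_int b) { return a.addr - b.addr; }

/* The in-bounds check performed on *p. */
bool bptr_can_deref(bptr_int p) {
    const char *q = (const char *)p.addr;
    return p.lwb <= q && q < p.upb && sizeof(int) <= (size_t)(p.upb - q);
}
```

Stepping to one past the end of the array is fine as an expression; it is only the dereference that fails the check.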

(NOTE: this means that, strictly speaking, bit operations on pointers, such as rounding them to a cache line boundary, might be considered to "strip" a pointer of its extent, e.g.

T* ptr_maybe_unaligned;
T* ptr_aligned = (T*)((unsigned long long)(ptr_maybe_unaligned)
                      & ~0x03Full);

Such idioms arise because languages, at least in the past, had no way of expressing such logic operations on pointers (and/or because the "official" way of expressing such operations, e.g.

T* ptr_maybe_unaligned;
ptrdiff_t delta = ((unsigned long long)(ptr_maybe_unaligned)
                   & 0x03Full);
T* ptr_aligned = (T*)(
                     (char*)(ptr_maybe_unaligned)
                     - delta
                 );

was clumsy).

It is expected that compilers for technologies such as HardBound and SoftBound will implement optional checking modes that propagate extents through such expressions - i.e. checking modes that are stronger than the language definition, strictly speaking, prescribes.)
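Under the rules above, the "official" form keeps the extent because the new address comes from pointer minus integer. A sketch with a hypothetical fat pointer: the integer mask is used only to compute the offset, and the 64-byte (0x3F mask) line size matches the example above. `align_down_64` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { char *addr; char *lwb, *upb; } bptr;  /* hypothetical fat pointer */

/* Round p down to a 64-byte boundary without stripping its extent. */
bptr align_down_64(bptr p) {
    size_t delta = (size_t)((uintptr_t)p.addr & 0x3Fu);  /* offset within line */
    bptr r = p;            /* extent carried along unchanged */
    r.addr = p.addr - delta;
    return r;
}
```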

---+++ casts

Casts between pointer types do not affect extents.

(Although here I think Nick's suggestion of "narrowing" on casts to arrays of specified size would be a good one.)

---+++ pointer parameters

Passing pointer parameters does not affect extents.

(Ditto with respect to sized arrays.)
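Both rules amount to "the extent travels with the address". In this sketch (hypothetical `bptr` and `bytes_reachable`), the pointer arrives as a parameter carrying the caller's extent, and casting it to `unsigned char *` changes how the bytes are viewed but not which bytes are reachable.

```c
#include <stddef.h>

typedef struct { void *addr; char *lwb, *upb; } bptr;  /* hypothetical fat pointer */

/* p arrives with the caller's extent; the cast below leaves it unchanged. */
size_t bytes_reachable(bptr p) {
    unsigned char *bytes = (unsigned char *)p.addr;  /* cast: same extent */
    return (size_t)(p.upb - (char *)bytes);          /* bytes left before upb */
}
```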

---+++ Library functions

It will be necessary to specify how library functions such as memcpy affect extents.

E.g.
void * memcpy ( void * destination, const void * source, size_t num );

The return value is a pointer.
The return value pointer is the destination pointer.
The return value pointer extent is the destination pointer extent.

num bytes are copied from source to destination.

Any pointers stored entirely within the source region have their associated extents copied along with them to the corresponding locations in the destination.
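A sketch of a checked memcpy under the fat-pointer assumption: both the source and destination ranges must lie within their extents, and the result is the destination pointer with the destination's extent. `checked_memcpy` is hypothetical. Copying the extents of pointers stored inside the source region would need per-address metadata (SoftBound keeps pointer metadata in a separate metadata space), which this sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { void *addr; char *lwb, *upb; } bptr;  /* hypothetical fat pointer */

/* Do n bytes starting at p.addr fit inside p's extent? */
static bool range_ok(bptr p, size_t n) {
    char *a = (char *)p.addr;
    return p.lwb <= a && a <= p.upb && n <= (size_t)(p.upb - a);
}

/* Returns dest (with dest's extent); *ok reports whether the copy was allowed. */
bptr checked_memcpy(bptr dest, bptr src, size_t num, bool *ok) {
    *ok = range_ok(dest, num) && range_ok(src, num);
    if (*ok)
        memcpy(dest.addr, src.addr, num);
    return dest;
}
```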


---++ Summary - Narrowing vs. Widening

One might summarize these rules as "pointer extents are only narrowed, not widened".

One can imagine language extensions that allow pointer extents to be widened, e.g. widen( [lwb1,upb1), [lwb2,upb2) ) => [lwb1,upb2), where upb1 = lwb2.
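A sketch of such a hypothetical widen() over integer-address extents. Refusing to merge non-adjacent extents is my assumption, so that widening can never silently cover a gap between two objects.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uintptr_t lwb, upb; } extent;  /* hypothetical, half-open */

/* widen([lwb1,upb1), [lwb2,upb2)) => [lwb1,upb2), only when upb1 == lwb2. */
bool extent_widen(extent a, extent b, extent *out) {
    if (a.upb != b.lwb)
        return false;  /* not adjacent: refuse to widen */
    out->lwb = a.lwb;
    out->upb = b.upb;
    return true;
}
```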



---+ Conclusion

This is a fairly precise, although somewhat verbose, specification of the programming language semantics a compiler taking advantage of a technology such as HardBound or SoftBound may implement, for a language such as C or C++.

The purpose of this post is not to be a formal spec. That would be even longer and more verbose.





--
The content of this message is my personal opinion only.
Although I am an employee - currently of Intel,
in the past of other computer companies such as AMD, Motorola, and Gould
- I reveal this only so that the reader may account
for any possible bias I may have towards my employer's products.
The statements I make here in no way represent my employer's position,
nor am I authorized to speak on behalf of my employer.

In fact, this posting may not even represent my personal opinion,
since occasionally I play devil's advocate.
.


