Re: HardBound and SoftBound (was "The State of Software")
- From: Andy "Krazy" Glew <ag-news@xxxxxxxxxxxxxxx>
- Date: Thu, 06 Aug 2009 21:36:33 -0700
nmm1@xxxxxxxxx wrote:
In article <4A7AEDC0.7050106@xxxxxxxxxxxxxxx>,
Andy "Krazy" Glew <"ag-news AT patten DASH my-last-name DOT net"> wrote:
Sigh. I will try to explain to you one last time what the problem
area is. Yes, OF COURSE, there are cases where checking is possible;
but being able to check only the cases that are trivially checkable
is not an interesting property, either theoretically or practically.
Sigh.
I will explain to you one last time that the problem is not whether a compiler can generate checks for all buffer overflows.
The question is what fraction of buffer overflows can be detected.
What fraction of true positives (real bugs that can be turned into security holes) is detected?
What is the fraction of false positives (code that is not buggy, but that is incorrectly indicated as having a bug)?
It's numbers that matter here. Not pontification. And not worst case.
I hope that Milo Martin will continue his research, and be able to make such numbers public.
Consider an object like:
typedef struct {int A[10]; double B[5]; int C[10];} P;
P *Q = malloc(sizeof(P));
memset(Q->A,0,sizeof(P));
fred(Q->A,(offsetof(P,C)-offsetof(P,A)));
In another module, far, far away:
void fred (int *X, size_t Y) {
int *Z = memset((char *)X+Y,1,sizeof(int));
joe(Z);
}
You might like to say that is clearly illegal, or does not occur in
real code, but you would be wrong on both counts. In particular,
that construction is a CRITICAL part of the design of the INTERFACE
of the X Windowing System (and, if I recall, the Microsoft one, too).
So it CANNOT be changed without rewriting all code that uses them.
It is also used in a fair number of other codes.
So what bounds does the checking software pass when fred is called?
By the way, in the last example of such code that I saw, it had been changed to use the address of the struct, not of the first element of the struct:
memset((int*)Q,0,sizeof(P));
fred((int*)Q,(offsetof(P,C)-offsetof(P,A)));
which happens to make it all work hunky dory.
But I agree: in traditional C, the address of a struct is the address of the first member thereof. So having the compiler enforce bounds blindly leads to false positives. Steps that can be taken:
1) Rewrite the code, to take address of struct rather than address of first element. Compile appropriately. If you feel confident, advertize your code as "New! Improved! Better Security! Fewer Buffer Overflows!"
Note that it is not a major rewrite.
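To make the rewrite in step 1 concrete, here is a minimal sketch in plain C. The names fred_sketch and whole_struct_demo are mine, invented for illustration; they are not from any real code base. It only shows that the pointer handed to the fred-like function is derived from the whole struct, so a checker that assigns bounds at the point of derivation would see all of P rather than just A.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int A[10]; double B[5]; int C[10]; } P;

/* Same shape as fred() in the example: touch one int at byte offset Y from X.
 * (Hypothetical name; joe() is left out.) */
int *fred_sketch(int *X, size_t Y) {
    assert(X != NULL);
    return memset((char *)X + Y, 1, sizeof(int));
}

/* Returns 1 if the write made through the whole-struct pointer landed on C[0]. */
int whole_struct_demo(void) {
    P *Q = malloc(sizeof(P));
    if (Q == NULL) return 0;
    memset(Q, 0, sizeof(P));

    /* The pointer is derived from Q itself, not from Q->A, so the natural
     * bounds for it are [Q, Q + sizeof(P)). */
    int *Z = fred_sketch((int *)Q, offsetof(P, C) - offsetof(P, A));

    /* A struct and its first member share an address; the offset reaches C. */
    int ok = ((void *)Q == (void *)Q->A) && (Z == &Q->C[0]);
    free(Q);
    return ok;
}
```

Nothing here does any checking; it is only the calling pattern that a bounds-checking compiler would have an easy time with.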
If there are too many such false positives to rewrite code, then you take steps such as the following:
0) Perhaps you compile all of your code without bounds checking. But then you deserve any loss of sales to a competitor who has changed the code, and/or loss of lawsuits to customers whose systems were broken into because of the flaw.
1) Perhaps you compile some of your code without bounds checking, but some of it with.
2) Perhaps the compiler adds idioms to detect address of first element of struct, and/or address of last (for that other classic, struct { int size; char data[1]; }).
Note that for this last, changes to the language standard were made (struct { int size; char data[]; }), and more and more compilers do not allow code to access data[1] past its one declared element. Change happens... just slowly.
3) Perhaps the compiler or library vendor adds annotations or a database indicating what interfaces have what. Microsoft already has such an annotation system. It is used extensively, particularly for standard libraries like the windowing system. However, we cannot rely on new programmers always doing the right thing.
Annotation systems are a crutch.
For important, widely used libraries like the X Window System, annotations can be added - *because* they are important and widely used, and the effort is paid back.
If a library is not important and not widely used - well, I still care, because bugs find such code to break in. But, if it is not widely used, then by definition there won't be too many false positives.
It is numbers that matter here.
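For the struct { int size; char data[]; } form mentioned in step 2, a minimal sketch of how the size field gives a checker (or the programmer) something to check against. Buf, buf_new and buf_put are hypothetical names of my own, and the explicit check in buf_put is hand-written; it stands in for what a bounds-checking compiler might insert, it is not what any particular compiler does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: data[] has no declared length of its own. */
typedef struct {
    int  size;    /* number of bytes actually allocated for data */
    char data[];
} Buf;

/* Allocate a Buf with room for n bytes of data (hypothetical helper). */
Buf *buf_new(int n) {
    if (n < 0) return NULL;
    Buf *b = malloc(sizeof(Buf) + (size_t)n);
    if (b == NULL) return NULL;
    b->size = n;
    memset(b->data, 0, (size_t)n);
    return b;
}

/* Checked store: 0 on success, -1 if index i is outside data. */
int buf_put(Buf *b, int i, char c) {
    assert(b != NULL);
    if (i < 0 || i >= b->size) return -1;
    b->data[i] = c;
    return 0;
}
```

With the old data[1] declaration, the declared type claims one element, so a blind check on the declared bounds flags every legitimate access past index 0.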
If that problem were limited to 'leaf' functions like memset, the
problem could be hacked up by eliminating checks in those or even
by passing BOTH sets of bounds and using the 'whole allocation'
ones for such uses. But it isn't. In many important programs,
such tricks are used to create a pointer which is then passed on.
So what bounds does the checking software pass when joe is called?
This has an unambiguous answer: the bounds of X.
void fred (int *X, size_t Y) {
int *Z = memset((char *)X+Y,1,sizeof(int));
joe(Z);
}
fred is called with pointer argument X, with associated bounds.
memset is called with pointer argument X+Y, which is derived from X, and hence has the bounds of X. There has been no narrowing operation.
memset returns its first argument. There is no narrowing.
Hence memset returns the bounds of X, and Z is assigned the bounds of X, and joe is called with the bounds of X.
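The propagation just described can be sketched with an explicit "fat pointer" in ordinary C. This is an illustration under my own assumptions, not the HardBound or SoftBound implementation: BPtr, bp_make, bp_add, bp_in_bounds, bp_memset and fred_bp are all hypothetical names, and the metadata is carried by hand where a real system would carry it in hardware or in compiler-generated shadow state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int A[10]; double B[5]; int C[10]; } P;

/* Hypothetical fat pointer: the object it was derived from, plus an offset. */
typedef struct {
    char  *base;  /* start of the object the bounds refer to */
    size_t size;  /* size of that object in bytes */
    size_t off;   /* current position, as a byte offset from base */
} BPtr;

/* Bounds are chosen once, when the pointer is first created. */
BPtr bp_make(void *obj, size_t nbytes) {
    assert(obj != NULL);
    BPtr p = { (char *)obj, nbytes, 0 };
    return p;
}

/* Pointer arithmetic moves the offset; base and size are untouched
 * (there is no narrowing). */
BPtr bp_add(BPtr p, size_t delta) { p.off += delta; return p; }

/* Would an n-byte access at p stay inside the object? */
int bp_in_bounds(BPtr p, size_t n) {
    return p.off <= p.size && n <= p.size - p.off;
}

/* memset analogue: check, write, and return the first argument unchanged. */
BPtr bp_memset(BPtr dst, int c, size_t n) {
    if (!bp_in_bounds(dst, n)) abort();
    memset(dst.base + dst.off, c, n);
    return dst;
}

/* fred analogue: the returned Z is what would be passed on to joe. */
BPtr fred_bp(BPtr X, size_t Y) {
    BPtr Z = bp_memset(bp_add(X, Y), 1, sizeof(int));
    return Z;
}
```

If X is created with bp_make(&obj, sizeof obj) for some P obj, then Z carries the same base and size as X. If X is instead created with bp_make(obj.A, sizeof obj.A), the same offset fails bp_in_bounds, which is exactly the false positive under discussion; the propagation rule is the same in both cases, only the initial bounds differ.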
The question is not how to propagate bounds from function parameters, across computations, to other subfunction parameters.
The question is what bounds to associate with a pointer initially. Perhaps with special pleading for the first time a pointer to an object is passed as a parameter.
This is as it should be. After all, it is (or should be) part of the semantics of a function how much memory it can tread on.
Nick and Wilco (and others) are correct in saying that C and C++ do not provide mechanisms that allow the programmer to precisely specify how much memory is modified.
For that matter, neither C nor C++ provides a mechanism that allows a compiler to figure out precisely what can be placed in ROM, and what must be placed in RAM. Nevertheless, we have tools that do this.
It would be nice if C and C++ made a distinction between a pointer to a scalar of type T, and a pointer into an array of type T. Milo and I have discussed language extensions that might do this - notations such as T&& ptr_to_scalar_of_T, instead of T* ptr_into_array_of_T etc. I am afraid that we do not have a perfect solution. (Actually, C++ comes close, with T& ref_to_scalar_of_T.)
--
The content of this message is my personal opinion only.
Although I am an employee - currently of Intel,
in the past of other computer companies such as AMD, Motorola, and Gould
- I reveal this only so that the reader may account
for any possible bias I may have towards my employer's products.
The statements I make here in no way represent my employer's position,
nor am I authorized to speak on behalf of my employer.
In fact, this posting may not even represent my personal opinion,
since occasionally I play devil's advocate.