Re: Secure C library



P.J. Plauger wrote:
> "David Wagner" <daw@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > In general, the interface doesn't seem like it has been designed; it
> > seems like someone just took the existing string-related functions and
> > tweaked them with a max str len. But some of the old unbounds-checked
> > string functions don't make much sense once you add bounds-checking
> > (if they ever did). It seems like if one had put some thought into
> > designing an interface just for the purpose of reducing the frequency
> > of security bugs, and studied the history of problems in the current
> > interfaces, one could do better.
>
> Well, that's in fact exactly what Microsoft did. They came up with a
> library that could be incrementally retrofit to existing C code to
> make buffer size decisions more visible, in response to their extensive
> data about the commonest sources of bugs and security gaffes. They
> then spent a couple of years refining the library *and converting
> millions of lines of C code to use it*.

You are looking to Microsoft for leadership on making code safer? Even
if you believe they know something, Microsoft has only *very recently*
started paying serious attention to security. They are relative
newcomers.

The fact that they could convert 7 million lines of code is a testament
to the amount of proprietary code they own and to their ability to push
a mantra throughout the corporation, but I can't tell whether this is
being driven by a PR message or by a genuinely deep technical motivation.

By comparison, from what I know, David has been looking very closely
at security for about a decade (remember the Netscape random number
generator exploit? I believe he was involved in that). So amongst
security experts who don't work for Microsoft, what kind of feedback
are you getting on this proposal? I'm not an expert, but my reaction
to the proposal itself is basically identical to David's.

> Maybe an opaque buffer type would be safer in the end, but it's a
> bigger jump from existing C practice.

OK, first of all, the "smaller change" (TR 24731) will give you a
smaller benefit. The buffer overflow protection in this proposal relies
on keeping the extra rsize_t parameter accurately synchronized with the
actual buffer size.

As we all know, buffer overflows come from lazy or just generally poor
programming. One simple *lazy* thing a programmer can do is pass
RSIZE_MAX as the rsize_t parameter and then use the library exactly as
they used the original C functions. Of course this defeats the whole
point and puts you back where you started. And none of this addresses
the endless kinds of "cut-and-paste" errors that can desynchronize the
declared length from the real one. If the buffer comes from strange
sources, or has been malloc'ed to some dynamic length, then using these
new library functions correctly still requires real grey cells anyway.

David's proposal has a lot of merit: the size of the buffer is
associated with the buffer, so why not just stick the two in a struct?
Rather than passing around pointers to strings, we can pass
"length-prefixed" buffers. Here is an interesting way of declaring a
buffer containing a constant string, for example:

#define bufferStrConst(s) {sizeof ("" s ""), "" s ""}
/* e.g., buf_t x = bufferStrConst ("Hello, World"); */

Or create a writable buffer:

#define bufferAlloc(var,n) do { (var).content = malloc (n); \
    (var).size = (var).content ? (n) : 0; } while (0)

Otherwise, you would use functions with a syntax similar to the C
library's. The enhanced functions would check the buffer lengths and, if
you allowed or desired it, "realloc" the buffers so they grow to match
the necessary data lengths. Or perhaps the data could simply be cut off
at the available length; maybe there could actually be both kinds of
functions.

> [...] That raises the cost of
> retrofitting code, and lowers the chance that subgroups will budget
> to do it.

I'm not sure David's approach is actually a *bigger jump*, since he
doesn't need to add extra parameters that must stay synchronized with
some other magic value in the source. The only problems are that pointer
arithmetic doesn't really work the same way any more, and something
needs to be done about constant strings being non-resizable and
non-modifiable. But at least this is a credible attempt.

> [...] Maybe you can come up with a better design than Microsoft's,
> but they at least did this experiment on a rather large scale and
> then saw fit to share their results with the rest of us.

Right, so you are saying that, as a barrier to entry, anyone who has an
idea must first get 7 million lines of code ported to it?

I don't know anything about the process of getting something proposed
to the ISO C standard, and I have no idea how receptive the committee
is, but I've had a solution to the *string* problem for some time now:

http://bstring.sf.net/ ("The Better String Library" / "Bstrlib")

This actually *does* require people to change their code. But the
costs are not what you think. Because the library is very easy to use
and extremely functional, code size and complexity generally decrease.
Bstrlib also interoperates well with char *'s, so porting can be quite
incremental. So the cost of the "big jump" is actually not that severe.

Of course it's very safe, very fast (I've got benchmarks!), easy to
understand and highly functional. Bstrlib is a real solution to the
buffer overflow problem, and it covers a superset of what David's idea
covers (Bstrlib blocks modification of constant strings, has a superior
alternative to pointer arithmetic, supports write protection, has
utilities for secure input, etc.).

Now of course, I don't have 7 million lines of code ported to it. If
there are 100K lines, I'd be kind of surprised (but not *that*
surprised: the people who use the library have a bad habit of not
telling me about it!). I can tell you that I haven't had any negative
feedback from anyone who has actually used it.

And my idea of how Bstrlib should be specified may seem incongruous with
how the rest of the C language library is specified (the source should
be available, no combination of independently legal parameters passed
to a Bstrlib function may lead to undefined behavior, several modules
are optional, parameters may alias in the C99 sense, etc.). Also, I
don't attempt to solve some of the reentrancy problems that the TR
24731 proposal does.

So let's compare the simplest examples:

/* Bstrlib */
err = bstrcat (p = bfromcstr ("Hello,"), q = bfromcstr (" World"));
bdestroy (q);

/* TR 24731 */
err = -1; /* Some value to indicate malloc failure */
if (NULL != (p = malloc (1000))) {
    err = strcpy_s (p, 1000, "Hello,") ||
          strcat_s (p, 1000, " World");
}

The two code fragments are functionally very similar. The idea is that
after either fragment you can check p against NULL and the value of err
to determine whether or not an error occurred.

What you should notice is that the TR 24731 solution requires explicit
flow control (the if and the ||) to capture errors and halt further
processing, so that errors don't compound or lead to UB. There is an
ugly err = -1; at the beginning, since a NULL return from malloc is an
error for us, but we don't really know what error number to assign it.
Notice also how the 1000 value has to be replicated redundantly by hand;
the value is also rather arbitrary, just some number that we expect will
let the program work.

The Bstrlib version needs very little comment. Bstrlib functions detect
NULL (the failure output of other Bstrlib functions) when it is passed
in, so failure accumulation is not a problem; in fact it is a feature
leveraged quite effectively here. Notice how magic size values are
nowhere to be seen. (At the same time, since Bstrlib has deterministic
behavior, if we care, we can know that barring errors, p will have a
buffer length of 16.)

Because code size and complexity generally decrease when using Bstrlib,
bstrings feel much more like primitive data types, and programmer
productivity rises to levels closer to those of higher-level languages.

I've twice been told to consider submitting Bstrlib to ISO for
standardization. I wouldn't ordinarily bother, except that it looks to
me like ISO is going to go down a really suboptimal path with this TR
24731 proposal. Is this something I can/should do? Or do I need to port
7 million lines of code to it first?

----

But on the issue of straight security, no more will be accomplished by
adding secure paths than can be accomplished by removing insecure paths.
Besides the gets() function, there are a number of non-reentrant
functions that you are trying to move programmers away from with TR
24731. Is there going to be any consideration of deprecating those
functions? You can build a Star Wars defence shield, or you can just
get rid of the nukes.

(The tmpnam(_s) issue that David brings up sounds complicated. To make
sure there's no window in which an asynchronous action between fclose()
and unlink() could take place, you would have to give tmpnam(_s)
semantics close to tmpfile(_s), where the file is deleted upon closing
(or, in Unix, deleted as it's opened). But that changes it
fundamentally, so I don't know exactly what semantics are desired from
tmpnam(_s). If you know the file name, you are practically giving up on
atomicity of the file over the lifetime of the program anyway, so why
isn't tmpfile(_s) really what you want? I.e., if a tmpfile_ex()
function were designed that also popped out a filename, how would you
ever use that filename, given that the file will be deleted before you
can do anything useful with it? Perhaps I just don't understand the
issue.)

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
