Buffer overflows and asctime()



A buffer overflow in the C standard document
--------------------------------------------

Examples of sloppy programming abound, but it was for me a surprise to discover that the C standard itself propagates this same kind of “who cares?” attitude.

In the official C standard of 1999 we find the specifications of the “asctime” function, page 341:

char *asctime(const struct tm *timeptr)
{
static const char wday_name[7][3] = {
"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};
static const char mon_name[12][3] = {
"Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};
static char result[26];
sprintf(result, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
wday_name[timeptr->tm_wday],
mon_name[timeptr->tm_mon],
timeptr->tm_mday, timeptr->tm_hour,
timeptr->tm_min, timeptr->tm_sec,
1900 + timeptr->tm_year);
return result;
}

This function is supposed to output a character string of 26 positions at most, including the terminating zero. If we count the length indicated by the format directives we arrive at 25:
o 3 for the day of the week +
o 1 space +
o 3 for the month +
o 3 for the day in the month +
o 1 space +
o 8 for the hour in hh:mm:ss format +
o 1 space +
o 4 for the year +
o 1 newline.
It makes 25 characters, and taking into account the terminating zero, the calculation seems OK.

But it is not.

The problem is that the format %d of the printf specification doesn’t allow for a maximum size. When you write %.3d it means that at least 3 characters will be output, but it could be much more if, for instance, the input is bigger than 999.
In that case, the buffer allocated for asctime is too small to contain the printf result, and a buffer overflow will happen. The consequences of that overflow will change from system to system, depending on what data is stored beyond that buffer, the alignment used for character strings, etc. If, for instance, a function pointer is stored just beyond the buffer the consequences could be catastrophic in the sense that it would be very difficult to trace where the error is happening.

Getting rid of buffer overflows
-------------------------------
How much buffer space we would need to protect asctime from buffer overflows in the worst case?
This is very easy to calculate. We know that in all cases, %d can't output more characters than the maximum numbers of characters an integer can hold. This is INT_MAX, and taking into account the possible negative sign we know that:
Number of digits N = 1 + ceil(log10((double)INT_MAX));
For a 32 bit system this is 11, for a 64 bit system this is 21.
In the asctime specification there are 5 %d format specifications, meaning that we have as the size for the buffer the expression:
26+5*N bytes
In a 32 bit system this is 26+55=81.
This is a worst case oversized buffer, since we have already counted some of those digits in the original calculation, where we have allowed for 3+2+2+2+4 = 13 characters for the digits. A tighter calculation can be done like this:
Number of characters besides specifications (%d or %s) in the string: 6.
Number of %d specs 5
Total = 6+5*11 = 61 + terminating zero 62.
The correct buffer size for a 32 bit system is 62.

Buffer overflows are not inevitable.
-----------------------------------
As we have seen above, a bit of reflection and some easy calculations allows us to determine how much we need exactly. Most buffer overflows come from people not doing those calculations because “C is full of buffer overflows anyway” attitude. What is absolutely incredible however, is to find a buffer overflow in the text of the last official C standard.
An international standard should present code that is 100% right. There are so few lines of code in the standard, that making those lines “buffer overflow free” is not a daunting task by any means.

The attitude of the committee
-----------------------------
I am not the first one to point out this problem. In a “Defect Report” filed in 2001, Clive Feather proposed to fix it. The answer of the committee was that if any of the members of the input argument was out of range this was “undefined behavior”, and anything was permitted, including corrupting memory.
Corrupting memory provokes bugs almost impossible to trace. The symptoms vary, and the bug can appear and disappear almost at random, depending on the linker and what data was exactly beyond the overflowed buffer. This means that the asctime function can’t be relied upon. Any small mistake like passing it an uninitialized variable will provoke no immediate crash but a crash later, in some completely unrelated program point.

---------------------------------------------------------------
Defect Report #217
Submitter: Clive Feather (UK)
Submission Date: 2000-04-04
Reference Document: N/A
Version: 1.3
Date: 2001-09-18 15:51:36
Subject: asctime limits

Summary
The definition of the asctime function involves a sprintf call writing into a buffer of size 26. This call will have undefined behavior if the year being represented falls outside the range [-999, 9999]. Since applications may have relied on the size of 26, this should not be corrected by allowing the implementation to generate a longer string. This is a defect because the specification is not self-consistent and does not restrict the domain of the argument.

Suggested Technical Corrigendum
Append to 7.23.3.1[#2]:
except that if the value of timeptr->tm_year is outside the range [-2899, 8099] (and thus the represented year will not fit into four characters) it is replaced by up to 4 implementation-defined characters.
-----------------------------------------------------------------------------------------------
Committee Response
From 7.1.4 paragraph 1:
If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.
Thus, asctime() may exhibit undefined behavior if any of the members of timeptr produce undefined behavior in the sample algorithm (for example, if the timeptr->tm_wday is outside the range 0 to 6 the function may index beyond the end of an array).
As always, the range of undefined behavior permitted includes:
Corrupting memory
Aborting the program
Range checking the argument and returning a failure indicator (e.g., a null pointer)
Returning truncated results within the traditional 26 byte buffer.
There is no consensus to make the suggested change or any change along this line.


Conclusion:
-----------

This decision of the standard committee must be reconsidered. It is
very simple to just test the input values and write a sequence of
'*' if there is an overflow. The buffer length could remain the
same.

--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
.



Relevant Pages