Re: Writing a unittest against thread (un)safe ref-counter pointer



Hi!

Henrik Goldman wrote:
Recently I had a bug report on our software which led to 14 days of frustration before the bug was discovered.
It turned out to be a thread (un)safe ref-counter pointer which would eventually do a double-delete.

So as a programmer I wanted to reproduce this problem as a unit-test to avoid this kind of situation in the future.
However it turned out to be harder than imagined.
In a simplistic scenario I simply cannot reproduce it anymore.

There is no way to create test cases that are guaranteed to fail on undefined behaviour. It is in the nature of undefined behaviour that it may just as well /not/ fail.

However, as Dmitriy said, there are usually ways to increase the probability of the failure case. But as long as they are intrusive (as in the example), they won't help you much.

In this case, creating a few dozen threads that randomly create, copy and discard references in a common array of references for a while may do the job. If your memory is clean afterwards and the program did not crash, it is likely that your reference counter is correct.
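
A minimal sketch of such a stress test, under some assumptions of mine: std::shared_ptr stands in for the pointer class under test (swap in your own via the typedef), and Tracked and stressRefCounter are hypothetical names. The common array is only read while the workers run, so the test itself does not race; only the reference counters are contended. A passing run proves nothing, it only makes a broken counter more likely to show up.

```cpp
#include <atomic>
#include <memory>
#include <random>
#include <thread>
#include <vector>

// Hypothetical payload that counts constructions and destructions.
struct Tracked {
    static std::atomic<long> created;
    static std::atomic<long> destroyed;
    Tracked() { ++created; }
    ~Tracked() { ++destroyed; }
};
std::atomic<long> Tracked::created(0);
std::atomic<long> Tracked::destroyed(0);

// Assumption: std::shared_ptr stands in for the pointer under test.
typedef std::shared_ptr<Tracked> Ref;

// Returns true if every object created during the run was destroyed
// exactly once. Precondition: slotCount > 0.
bool stressRefCounter(unsigned threadCount, unsigned slotCount,
                      unsigned iterations) {
    const long createdBefore = Tracked::created.load();
    const long destroyedBefore = Tracked::destroyed.load();
    {
        // Common array of references; read-only while the workers run.
        std::vector<Ref> common;
        for (unsigned i = 0; i < slotCount; ++i)
            common.push_back(Ref(new Tracked));
        const std::vector<Ref>& shared = common;

        std::vector<std::thread> workers;
        for (unsigned t = 0; t < threadCount; ++t) {
            workers.emplace_back([&shared, slotCount, iterations, t] {
                std::mt19937 rng(t + 1);          // fixed seed per thread
                std::vector<Ref> local(slotCount); // thread-private copies
                for (unsigned i = 0; i < iterations; ++i) {
                    const unsigned src = rng() % slotCount;
                    const unsigned dst = rng() % slotCount;
                    switch (rng() % 3) {
                    case 0: local[dst] = shared[src];      break; // copy
                    case 1: local[dst] = Ref(new Tracked); break; // create
                    default: local[dst].reset();           break; // discard
                    }
                }
            });
        }
        for (std::thread& w : workers) w.join();
    } // the last references are dropped here

    return Tracked::created.load() - createdBefore ==
           Tracked::destroyed.load() - destroyedBefore;
}
```

A call such as stressRefCounter(32, 16, 100000) from the test's main is a starting point; more threads than cores and a long run both raise the chance of hitting a bad interleaving.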

In your case two thread switches have to occur close together, because once m_p is NULL the second delete turns into a no-op. So a debug build that checks that m_p is not NULL before the delete may be more helpful than a test case (if this is a valid assumption in your case).
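
Roughly what I mean by such a debug check, as a sketch only. I do not know your actual class, so Holder and Payload are hypothetical; only the name m_p is taken from your code. The assert compiles away when NDEBUG is defined, so release builds keep their current behaviour.

```cpp
#include <cassert>

// Hypothetical payload that counts how often it is destroyed.
struct Payload {
    static int destroyed;
    ~Payload() { ++destroyed; }
};
int Payload::destroyed = 0;

// Hypothetical owner class; only the member name m_p is from the thread.
class Holder {
public:
    explicit Holder(Payload* p) : m_p(p) {}
    ~Holder() { if (m_p) release(); }

    void release() {
        // Debug builds only: a second release that arrives after m_p was
        // reset would otherwise pass silently as "delete NULL".
        assert(m_p != nullptr && "release() on an already released pointer");
        delete m_p;
        m_p = nullptr;
    }

    bool empty() const { return m_p == nullptr; }

private:
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;

    Payload* m_p;
};
```

The check is not synchronized itself, so it does not catch the tight race where both threads read m_p before either resets it. It only turns the far more frequent late second release from a silent no-op into a loud failure.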

In fact, I would not create a test case for exactly this problem at all now, because it is really unlikely that exactly this happens again.
So far I have not found anything better than experience and great care to avoid synchronization issues in the core libraries of multi-threaded applications. Test cases like the above can contribute, but they only check for some faults and are expensive to build. Another method is a code review by a second (experienced) programmer who is only marginally involved in the project. In such reviews I almost always find some potential synchronization issues, even in code that has been in production for years.
A clean code design that decouples synchronization and resource management as far as possible from the application logic is helpful too. If the critical parts are in a few small but heavily used classes, bugs are more likely to be found soon. Using tested implementations such as the Boost libraries is good advice too. Documenting the thread-safety of each class (and sometimes of each method) is required as well. Debug builds might use assertions to check some of these requirements.
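
One way such an assertion might look, for a class documented as "use only from the thread that created it". This is a sketch with hypothetical names (SingleThreadGuard, Counter), not something taken from a particular library.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: remembers which thread constructed it.
class SingleThreadGuard {
public:
    SingleThreadGuard() : owner_(std::this_thread::get_id()) {}

    // True if the calling thread is the one that constructed the guard.
    bool calledOnOwner() const {
        return std::this_thread::get_id() == owner_;
    }

private:
    const std::thread::id owner_;
};

// Thread-safety: NOT thread-safe. Use only from the creating thread.
class Counter {
public:
    Counter() : value_(0) {}

    void increment() {
        // Debug builds only: enforce the documented requirement.
        assert(guard_.calledOnOwner() && "Counter used from a foreign thread");
        ++value_;
    }

    int value() const { return value_; }

private:
    SingleThreadGuard guard_;
    int value_;
};
```

This checks the documented contract rather than the synchronization itself, but a violated contract is often where the synchronization bug starts.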


Marcel