Re: YACD - level 40 power level 68



On Jul 28, 12:07 pm, camlost <joshua.middend...@xxxxxxxxxxxxx> wrote:
> If there is trouble while waiting to exit self knowledge, then the
> problem must be buried in the inkey function somewhere.

Well this is suspicious:

/* Hack -- get bored */
if (!Term->never_bored)
{
    /* Process random events */
    (void)Term_xtra(TERM_XTRA_BORED, 0);
}

It's in Term_inkey. I wonder if it b0rks if it happens while viewing a
temp file instead of while at the main game UI? This section starting
with "/* Hack" is definitely suggestive... :)

This leads by a convoluted path including function pointers to

Term_xtra_win_event(0)

on Windows (the only affected port so far as I am aware), and thus to

if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
{
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

There are only two occurrences of the substring PeekMessage in the
entire source directory (recursive), both of them calls in main-win.c,
the other being in Term_xtra_win_flush. Neither is a definition, so it
must be a library call. And it's not one familiar to me.

Indeed all three functions look to be Windoze API calls.
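
For anyone following along, the "convoluted path including function
pointers" boils down to something like the sketch below. It is a
stripped-down illustration, not Sangband's actual code: the struct
layout, the XTRA_* constants, demo_xtra_hook and Term_xtra_demo are all
made-up stand-ins, and the only idea taken from the source is that the
generic call forwards to whatever hook the port installed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request codes, standing in for the TERM_XTRA_* values. */
enum { XTRA_EVENT = 1, XTRA_BORED = 2 };

/* Hypothetical, trimmed-down terminal structure. */
typedef struct term {
    int never_bored;
    int (*xtra_hook)(int n, int v);   /* installed by the port at startup */
} term;

static term *Term;              /* the currently active terminal */
static int events_processed;    /* bookkeeping for this demo only */

/* What a port might plug in (body is invented for illustration). */
static int demo_xtra_hook(int n, int v)
{
    (void)v;
    switch (n) {
    case XTRA_EVENT:
    case XTRA_BORED:
        events_processed++;     /* a real port would pump its event queue */
        return 0;
    default:
        return 1;               /* unknown request */
    }
}

/* Generic entry point: forwards to the hook, or fails if none is set. */
static int Term_xtra_demo(int n, int v)
{
    assert(Term != NULL);
    if (!Term->xtra_hook) return -1;
    return (*Term->xtra_hook)(n, v);
}
```

The point is only that nothing in the generic layer knows which
platform call it ends up in; that is decided by whichever hook the port
stored in the structure.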

The actual waiting loop seems to be:

while (Term->key_head == Term->key_tail)
{
    /* Process events (wait for one) */
    (void)Term_xtra(TERM_XTRA_EVENT, TRUE);
}

in Term_inkey(). This leads to almost identical winapi calls:

if (GetMessage(&msg, NULL, 0, 0))
{
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}

Trouble is, there's nothing here to suggest a problem, and
especially not one localized to the specific case of being in a text
browser instead of the game proper.
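
To spell out what that waiting loop relies on, here is a small
self-contained sketch of a key queue where "head == tail" means "no key
yet". Everything in it is assumed for illustration (the queue size, the
kq_* names, the demo_pump stand-in for the event call); it is not the
real Term code, just the same shape.

```c
#include <assert.h>

#define KEY_QUEUE_SIZE 8    /* hypothetical; the real size may differ */

typedef struct key_queue {
    char buf[KEY_QUEUE_SIZE];
    int head;               /* next slot to write */
    int tail;               /* next slot to read */
} key_queue;

/* The queue is empty exactly when the two indices meet. */
static int kq_empty(const key_queue *q) { return q->head == q->tail; }

/* Called from an event handler; returns 0 on success, 1 on overflow. */
static int kq_push(key_queue *q, char k)
{
    int next = (q->head + 1) % KEY_QUEUE_SIZE;
    if (next == q->tail) return 1;
    q->buf[q->head] = k;
    q->head = next;
    return 0;
}

/* Wait loop: keep pumping events until a key shows up, then pop it. */
static char kq_wait(key_queue *q, void (*pump)(key_queue *))
{
    assert(pump != 0);
    while (kq_empty(q)) pump(q);
    char k = q->buf[q->tail];
    q->tail = (q->tail + 1) % KEY_QUEUE_SIZE;
    return k;
}

static int pump_calls;      /* bookkeeping for this demo only */

/* Demo pump: the third "event" is a keypress, the earlier ones are not. */
static void demo_pump(key_queue *q)
{
    if (++pump_calls == 3) kq_push(q, 'x');
}
```

Note that the loop itself never touches the platform. Anything that
goes wrong while it spins has to happen inside the pump, i.e. inside
whatever the event call dispatches to.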

Self-knowledge seems to go through something called pause_line to
display its prompt. This calls inkey(ALLOW_CLICK), which leads right
back to Term_inkey.

The main-sdl version of Term_xtra_foo_event seems to be similar to the
regular-windows version but with

if (SDL_WaitEvent(&event))
{
    /* Handle it */
    error = sdl_HandleEvent(&event);
}
else return (1);

for the blocking version and a nearly-identical PollEvent version for
nonblocking. If it's failing here it's an SDL bug. (I don't know
whether the people experiencing crashes are playing the SDL version or
not.)

Text-file viewing and help-viewing leads eventually to prt("[Press
Space to advance, direction keys to move, ESC for previous, '?' to
exit.]", hgt - 1, 0); which I believe is where Phil said he saw his
crashes in the "Sangband spontaneous exits" thread. That function call
merely displays the prompt; it doesn't actually wait for a keypress.
Right after it is this alarming code:

/* Hide the cursor */
inkey_cursor_hack[TERM_MAIN] = -1;

I wonder if this starts the bomb ticking? Then again it's only an
assignment statement with no function calls...

The actual waiting is once again in inkey(ALLOW_CLICK).


Upshot of all this: I can be fairly certain the abend is happening
inside either winapi GetMessage or inside SDL_WaitEvent. What causes
it only in these particular places, when every single wait-for-a-key
goes through the same code, remains a mystery, but inkey_cursor_hack
might have something to do with it as it makes inkey do funny things
before calling Term_inkey.

A separate tack is to look for calls to "exit". And I immediately
found this in main-sdl.c:

/*
 * Display error message and quit (see "z-util.c")
 */
static void hack_quit(cptr str)
{
    /* Display nothing (because we don't have a surface yet) */
    (void)str;

    /* Shut down the TTF library */
    TTF_Quit();

    /* Shut down the SDL library */
    SDL_Quit();

    /* Exit */
    exit(0);
}

That look like the smoking gun to you too? But it's supposed to only
be used before the main window is created. Still if it's getting
called when it shouldn't...

Further down is hook_quit, which is also capable of causing a silent
exit when called as hook_quit(NULL): our second strong culprit
candidate. A little more diving and we're looking for quit(NULL) and
somewhat interested in something called plog().

The former leads us right to the SIGQUIT handler which seems to have
a nasty hack to try to stop accidental quitting from miskeys. Problem
is there's a global variable for the number of times it was hit that
counts up endlessly and after some threshold it calls -- you guessed
it -- quit(NULL). A helpful comment snippet:

* To prevent messy accidents, we should reset this global variable
* whenever the user enters a keypress, or something like that.

But evidently they don't. My guess is that whenever it's at certain
prompts it sees phantom ^C or some such inputs that cause SIGINT or
SIGQUIT to get raised from time to time. The global variable creeps up
as you spend more time in a session, particularly in the help browser
or similar places, and eventually Sangband goes nuclear.
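
Here is the failure mode I'm guessing at, reduced to a toy. This is not
the Sangband handler: QUIT_THRESHOLD, the variable names and
note_keypress are invented, and the "quit" is just a flag so the sketch
can be run safely. It shows a counter that only ever goes up unless
something resets it, which is the reset the source comment asks for.

```c
#include <assert.h>
#include <signal.h>

#define QUIT_THRESHOLD 5    /* hypothetical; the real threshold may differ */

static volatile sig_atomic_t signal_count;  /* interrupts seen so far */
static volatile sig_atomic_t would_quit;    /* stands in for quit(NULL) */

/* Count each interrupt; past the threshold, give up and "quit". */
static void handle_interrupt(int sig)
{
    /* Re-arm the handler, as old-style signal code usually does */
    signal(sig, handle_interrupt);

    signal_count++;
    if (signal_count >= QUIT_THRESHOLD) would_quit = 1;
}

/* The missing piece: forget old interrupts whenever a real key arrives. */
static void note_keypress(void)
{
    signal_count = 0;
}
```

Without a call like note_keypress() anywhere in the input path, stray
interrupts spread over a long session add up just the same as five
quick ones in a row.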

However, judging by the code it should only quit(NULL) if it thinks no
character is being played (e.g. at birth menu). Otherwise it should
actually kill the character(!) and exit the process with the word
"interrupt". I don't think this is what's happening; spontaneously
dead characters aren't mentioned in the complaints, and Phil's necro
doesn't seem to be dead in particular! It also tries to make warning
noises which nobody reports getting spuriously.

I think the SIGQUIT handler is not in fact the culprit here.

SIGKILL and SIGABRT have another handler that can quit(NULL) from the
birth menu and the like, but prints the infamous gruesome software bug
message. Nobody has reported this in connection with the bogus exits.

Other quit(NULL)s are all either during game load or character
creation ... except for one in a familiar function. One named
sdl_HandleEvent. It's apparently not an SDL library function after
all. It's in main-sdl.c and calls quit(NULL) if an SDL_QUIT event is
generated. This occurrence is however supposed to save the game to
judge by immediately preceding code and comments. It's probably the
behavior when the X in the game window is clicked, rather than the
crash people are reporting. The other quit(NULL) in main-sdl.c is the
result of a normal exit from play_game() in main().

The main-win port OTOH handles WM_QUIT with a plain quit(NULL) -- no
attempt to save the game. This is our prime suspect now.

None of Sangband's code actually can post a WM_QUIT (or even SDL_QUIT)
message (including via PostQuitMessage). So it may be an interaction
with a Windows bug generating a spurious WM_QUIT event. In fact it
almost has to be, since inkey() doesn't seem to be guilty of anything
and it happens during inkey(), meaning an event or signal handler is
to blame.







