Re: MVS 4 minute 'outage'



While we wait for the EREP, it might be fun to look at the presented evidence and do some speculating.

I don't think a single task can do this, even at dispatching priority X'FF'. Interrupts still occur, and a task running under the control of the dispatcher can't own the whole box (even a uniprocessor) exclusively for that long.

That would point to something deeper. I like the spin loop scenario. Some task grabbed a spin lock and the hardware had to step in to break its grip.

A hole in that logic is the complete absence of recognizable SYSLOG messages both before and after the event. Another avenue: any time the hardware takes some drastic action, there is almost always a 'phone home' event. I'd want to look at the HMC/SE logs and call the support center to get their perspective.
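
To pin down exactly where each log goes quiet, a quick scan for timestamp gaps can help line up the SYSLOG, CICS, IMS and DB2 views of the event. Below is a minimal sketch in Python, assuming each log line has been exported as text and carries a wall-clock time as HH:MM:SS. That layout is an assumption; real record formats differ and would need their own parsing. The names find_gaps and threshold_seconds are hypothetical, and the sketch ignores midnight rollover.

```python
import re

# Assumption: every line of interest contains a time of day as HH:MM:SS.
TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def find_gaps(lines, threshold_seconds=60):
    """Return (last_seen, next_seen, gap_seconds) for each silence longer
    than threshold_seconds between consecutive timestamped lines."""
    gaps = []
    prev_secs = None
    prev_text = None
    for line in lines:
        match = TIME_RE.search(line)
        if not match:
            continue  # line has no recognizable timestamp
        hours, minutes, seconds = (int(part) for part in match.groups())
        if hours > 23 or minutes > 59 or seconds > 59:
            continue  # matched digits that are not a valid time of day
        cur_secs = hours * 3600 + minutes * 60 + seconds
        cur_text = match.group(0)
        # Negative deltas (out-of-order lines, midnight rollover) are skipped.
        if prev_secs is not None and cur_secs - prev_secs > threshold_seconds:
            gaps.append((prev_text, cur_text, cur_secs - prev_secs))
        prev_secs, prev_text = cur_secs, cur_text
    return gaps
```

Running the same function over each log and comparing the start and end times of the gaps would show whether everything stopped and resumed at the same instant, which is what a whole-box stall would look like.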

If it were my call, I'd take the shop to sev 2 right now. If I don't have a satisfactory explanation in short order, I'd raise it to sev 1 (nobody goes home) and have everyone start thinking about having to pull a DR trigger. I would have to assume that the next time the box goes to sleep, it may not wake up.

Just my $0.02 US (before Taxes)


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@xxxxxxxxxxx] On Behalf Of Eric Bielefeld
Sent: Wednesday, January 07, 2009 12:06 PM
To: IBM-MAIN@xxxxxxxxxxx
Subject: Re: MVS 4 minute 'outage'

I would suggest that some very high priority task got in a loop - something that runs at dispatching priority x'FF'. But then if it were truly in a loop, why did it stop after 4 minutes?

Do you just have 1 processor?

Good luck in finding the problem. Let us know if it happens again. I'm sure you have IBM involved - that sure sounds like a Sev 1 to me.

Eric

---- JE Thinnes <jethinnes@xxxxxxx> wrote:
We just experienced a 4 minute 'outage' on our z/OS system. (single image
z/OS 1.9 system).

By 'outage', I mean we could not communicate with MVS through TSO or the
z/OS consoles. There is a 4 minute gap in SYSLOG. The same for CICS, IMS
and DB2 logs.

There were no system dumps or other indicators.

We reviewed SYSLOG for the 15 minutes that preceded the 'outage' and did
not find anything. TMONMVS had a 4 minute gap in the collector during
the 'outage'.

Any suggestions how we can determine what happened?

--
Eric Bielefeld
Systems Programmer
Washington University
St Louis, Missouri
314-935-3418

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to listserv@xxxxxxxxxxx with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html



