Re: MVS 4 minute 'outage'
- From: HMerritt@xxxxxxxxxxxxx (Hal Merritt)
- Date: 7 Jan 2009 10:38:36 -0800
While we wait for the EREP, it might be fun to look at the presented evidence and do some speculating.
I don't think a single task can do this, even at dispatching priority x'FF'. Interrupts still occur, and a task under the control of the dispatcher can't own the whole box (even a uniprocessor) exclusively for that long.
That would point to something deeper. I like the spin loop scenario. Some task grabbed a spin lock and the hardware had to step in to break its grip.
A hole in that logic is the complete absence of recognizable SYSLOG messages both before and after the event. Another angle is that any time the hardware takes some drastic action, there is almost always a 'phone home' event. I'd want to look at the HMC/SE logs, and call the support center to get their perspective.
If it were my call, I'd take the shop to sev 2 right now. If I don't have a satisfactory explanation in short order, then I'd up the bar to sev 1 (nobody goes home) and have everyone start thinking about having to pull a DR trigger. I would have to assume that the next time the box went to sleep it may not wake up.
Just my $0.02 US (before Taxes)
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@xxxxxxxxxxx] On Behalf Of Eric Bielefeld
Sent: Wednesday, January 07, 2009 12:06 PM
To: IBM-MAIN@xxxxxxxxxxx
Subject: Re: MVS 4 minute 'outage'
I would suggest that some very high priority task got in a loop - something that runs at dispatching priority x'FF'. But then if it were truly in a loop, why did it stop after 4 minutes?
Do you just have 1 processor?
Good luck in finding the problem. Let us know if it happens again. I'm sure you have IBM involved - that sure sounds like a Sev 1 to me.
Eric
---- JE Thinnes <jethinnes@xxxxxxx> wrote:
We just experienced a 4 minute 'outage' on our z/OS system. (single image
z/OS 1.9 system).
By 'outage', I mean we could not communicate with MVS through TSO or the
z/OS consoles. There is a 4 minute gap in SYSLOG. The same for CICS, IMS
and DB2 logs.
There were no system dumps or other indicators.
We reviewed SYSLOG for the 15 minutes that preceded the 'outage' and did
not find anything. TMONMVS had a 4 minute gap in the collector during
the 'outage'.
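
One way to check whether the gaps in SYSLOG, the subsystem logs and the monitor collector line up is to pull the record timestamps from each source and list the silent intervals. The sketch below is only an illustration: `find_gaps` is a hypothetical helper, the sample times are invented, and extracting timestamps from the real SYSLOG, CICS, IMS or DB2 records is left out.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(minutes=1)):
    """Return (start, end, length) for every silent interval longer than threshold.

    `timestamps` is any iterable of datetime objects taken from one log source.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each record time with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        length = later - earlier
        if length > threshold:
            gaps.append((earlier, later, length))
    return gaps


# Invented sample times, not taken from the system discussed in this thread.
sample = [
    datetime(2009, 1, 7, 9, 0, 0),
    datetime(2009, 1, 7, 9, 0, 20),
    datetime(2009, 1, 7, 9, 4, 25),
    datetime(2009, 1, 7, 9, 4, 40),
]

for start, end, length in find_gaps(sample):
    print(f"no records between {start} and {end} ({length})")
```

Running the same function over each source and comparing the start and end of the reported intervals would show whether everything went quiet at the same moment or whether one component stalled first.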
Any suggestions how we can determine what happened?
--
Eric Bielefeld
Systems Programmer
Washington University
St Louis, Missouri
314-935-3418
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to listserv@xxxxxxxxxxx with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html