Re: Broken hardware was Re: Broken Brancher (was Re: Best IEFACTRT)
- From: TrailingEdgeTechnologies <bbreynolds@xxxxxxx>
- Date: Mon, 5 Oct 2009 09:34:16 -0700 (PDT)
On Oct 2, 8:40 pm, cfmpub...@xxxxxxxxxxxxxxx (Clark Morris) wrote:
I'll try not to send the message before its time (reference to an old
wine commercial) this time. And to Scott, Sesame Street should have
called it the Cookie-eating Monster (and I resemble that).
3 incidents come to mind. The first was a 2821 print controller that
blew up error recovery by sending back Device End and Busy. Despite
MVT being in its last days, we were the site of first discovery. The
second was on a mod 65 where the CSW was getting stored x'40' or x'48'
from a 256K boundary. We were finally able to force it by using an
IEBCOPY unload with IEBCOPY brought back from SVS thanks to the
MICHMODS MVT tape. We called in the third party memory CEs who came
in and proved it wasn't their problem by some process that I forget
even though I was the person watching this for the company. I then
called IBM and the CE came in. He checked for the problem after I
showed the symptoms thinking it wasn't an IBM problem and turned up a
250 nano-second delay card in the channel that wasn't delaying things
for 250 nano-seconds. The last was under MVS when we lost an indexed
VTOC on a 3380. After rebuilding it, I checked EREP to see what was
happening at the time and found a large number of temporary write
errors to the drive at the time. The CE checked it out and found a
loose card in the controller. Reseating the card ended the problem.
Interestingly both the wacko 2821 and the loose controller card
resulted in PTFs because of the inadequate handling of error
conditions.
On 2 Oct 2009 06:05:52 -0700, in bit.listserv.ibm-main you wrote:
Many years ago the company I worked for had a 3031. We added the AP to it. Soon after, we started experiencing random 0Cx abends that made no sense when the dump was examined. The abends were in user and IBM code. CEs could find no problem so it had to be software. The PSR agreed that the data in the dump was valid. There were even samples where registers were wrong (did not match the storage that they were loaded from). HW started looking again. The problem did not occur with the AP offline. The problem was narrowed down to the TLB, with it off all was good. Replaced TLB. Still failed. An old CE came in with a data scope. The problem - the TLB was receiving the "here's data" signal 1.5ms ahead of the data, causing the TLB to load with all 1 bits. There was an optional EC that reduced a section of tri-lead by 18 inches. The EC fixed the problem.
Dennis Roach
GHG Corporation
Lockheed Martin Mission Services
Facilities Design and Operations Contract
NASA/JSC
Address:
  2100 Space Park Drive
  LM-15-4BH
  Houston, Texas 77058
Mail:
  P.O. Box 58487
  Mail Code H4C
  Houston, Texas 77258
Phone:
  Voice: (281)336-5027
  Cell:  (713)591-1059
  Fax:   (281)336-5410
E-Mail: Dennis.Ro...@xxxxxxxx
All opinions expressed by me are mine and may not agree with my employer or any person, company, or thing, living or dead, on or near this or any other planet, moon, asteroid, or other spatial object, natural or manufactured, since the beginning of time.
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-M...@xxxxxxxxxxx] On
Behalf Of Bruce Richardson
Sent: Thursday, October 01, 2009 12:54 PM
To: IBM-M...@xxxxxxxxxxx
Subject: Broken Brancher (was Re: Best IEFACTRT)
Are you sure your code didn't suffer the same fate as IEFBR14?
The story (Urban Legend?) I heard, was that IEFBR14 was originally just
a "BR 14", but that code was APAR'd to add a "SR 15,15" before the
"BR 14" to set the return code to zero. But then along came a problem
with the loader, it seems that the minimum program length has to be
8 bytes, so another APAR was opened to add two NOPRs to the code.
Your code without the second "BR 14" is just 6 bytes!
On Tue, 29 Sep 2009 21:34:13 -0500, William H. Blair
<wmhbl...@xxxxxxxxxxx> wrote:
Edward Jaffe asks:
Which is the best IEFACTRT?
I am dying to know what you meant exactly by that question.
But I'll offer my candidate (in case this is a contest):
IEFACTRT CSECT
IEFACTRT AMODE 31
IEFACTRT RMODE ANY
R1       EQU   1
R14      EQU   14
R15      EQU   15
         SR    R1,R1      Write SMF termination record
         SR    R15,R15    JOB processing is to continue
         BR    R14        Return to INITiator
         BR    R14        (just in case the brancher's broke
*                          when it executes that first BR)
         END
And, yes, at one point, I had a machine where the brancher
was broke. I had to code a Bx immediately after every Bx
in case the first Bx ended up at a certain offset in a page,
else the box ignored the Bx as if it were a NOP[R] and went
on to whatever followed, unless it was an invalid opcode,
in which case it threw an ABEND S0C4 on the Bx even if the
branch address was, in fact, good. No, the CE didn't believe
me. Nobody believed me for a week or so until some special
CE diagnostic tape flown in by IBM from POK failed to run,
red lighting the box.
The hardware guys kept telling everyone it was a software
problem, but the IBM software guys kept saying what they
saw in the dumps was impossible, so it had to be a hardware
problem. (IBM pointing fingers at itself.) Took 2 weeks to
find it. Meanwhile, everything ran fine except _my_ code,
which had the BR that elicited the error (an IEFACTRT exit,
in fact), and the odd application here and there (which the
operators just recovered and restarted on the other machine).
I remembered the incident because a frequent complaint from
some of the less experienced application programmers working
on Assembler programs (when the PSW ended up somewhere they
didn't think it should ever have gotten to) was that "the
brancher was broke." It always gave us lots of good laughs.
Well, for at least once in this world, it really was broke.
--
WB
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@xxxxxxxxxxx with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
In the realm of hardware-destroying software, I offer the
following from the mid-1970s, on an IBM 1800, with 2311s
attached to a 2848. In order to reduce maintenance costs,
eliminate the hydraulics of the IBM 2311s, and increase the
number of drives, we brought in Century Data 2311s (voice-coil
drives OEMed from Calcomp). Our application, an expansion
of IBM's Clinical Laboratory Management System, had
multiple levels of indices into the master data files, and in
an optimization done on the IBM drives, two of those indices
were located on the same logical 1810, mapped as a Form 5
DSCB on the 2311s. After about six months of operation
on the replacement drives, we started noticing strange I/O WAITs
while doing disk accesses, and then actually saw the WAIT light
come on on the drive with the above-mentioned indices, stay
on for about a second, and then have the I/O completed successfully.
That evening, I removed the 1316 from the drive, and saw that
the steel tracks on which the heads rode had indentations at
the two locations to which the seeks to the index files were
directed. The indentations were the equivalent of the condition
of an asphalt road subjected to continued braking at a stop
light on a downhill. It appears that the steel tracks were not
sufficiently case-hardened, and the software as set up had
actually destroyed the hardware. The WAIT was due to the
voice coil needing to get extra force to get the head mechanism
out of the dip in the track: it would then complete the seek
successfully.
While waiting for new tracks to be manufactured and installed,
I set about altering the file locations (not an easy task) with
a goal of cutting down continued seeks to those two locations.
Bruce B. Reynolds, Trailing Edge Technologies, Glenside PA