Re: CPU Node board failure on Origin 2000



Peter van Heusden wrote:
> Hi there. Here at SANBI we have an Origin 2000 made up of two machines
> in one cabinet. Historically they have appeared as a single 'virtual machine',

Hmm... if the two modules are still connected with 2 (max. 4) CrayLink cables, you have a single machine. You can check this by opening the right baffle and looking for the thicker cables.
I think by 'virtual machine' you mean that your systems use the partitioning feature. With it you can split a larger installation into smaller subunits. The smallest subunit in an Origin 2000 is one module, and each partition runs its own IRIX kernel (installation).

But... I have never used the partitioning feature, so I can't give more specific hints on how to deal with problems there. I doubt you can run a partitioned system without the MMSC, because the MMSC is needed to shut down and restart a specific partition.

> with the 'bottom' one playing the role of 'master'. A few weeks ago,
> the machine rebooted itself, and the 'top' machine took over as 'master'.
> And now, if you try to boot the bottom machine, or both machines, you get an error message like this:

When an Origin goes down, the system writes an FRU analysis under /var/adm/crash. You will also find some information in the SYSLOG file. If you no longer need the FRU files, you can delete them to reclaim the disk space.

> ....... DONE
> Checking partitioning information ......... DONE
> Loading BASEIO prom ....................... DONE
>
> BASEIO PROM Monitor SGI Version 6.111 built 09:43:30 AM May 24, 2002

This looks like an older IRIX installation, because the latest PROM is 6.156 from Nov 18, 2003.

> (BE64) 13 CPUs on 7 nodes found.
> ****************************************************************
> * PANIC: Boards in same module show different moduleids.       *
> * PANIC: Failed to automatically assign moduleid(s)            *
> * Please assign globally unique module id(s) at the MSC.       *
> ****************************************************************

When Origin modules are connected via CrayLink, each one needs a unique ID. In a standard installation, the lower module of rack 1 is numbered '1' and the upper one '2'. The lower module of a second rack gets '3', and so on.
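To make the numbering convention concrete, here is a small Python sketch that computes the ID a module would get under that scheme and flags IDs used by more than one module. It assumes two modules per rack, numbered lower then upper; the function names and module labels are hypothetical and this is not an SGI tool. The real IDs have to be read from and set at the MSC.

```python
from __future__ import annotations


def expected_module_id(rack: int, position: str) -> int:
    """Module ID under the standard numbering described above.

    Assumption: two modules per rack, racks numbered from 1,
    lower module first, then upper module.
    """
    if rack < 1:
        raise ValueError("racks are numbered from 1")
    offset = {"lower": 1, "upper": 2}[position]
    return 2 * (rack - 1) + offset


def duplicate_ids(assigned: dict[str, int]) -> dict[int, list[str]]:
    """Return every module ID that is claimed by more than one module."""
    seen: dict[int, list[str]] = {}
    for name, module_id in assigned.items():
        seen.setdefault(module_id, []).append(name)
    return {mid: names for mid, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    # Expected IDs for a two-rack installation: 1, 2, 3, 4
    for rack in (1, 2):
        for pos in ("lower", "upper"):
            print(rack, pos, expected_module_id(rack, pos))

    # Hypothetical IDs read off the MSC displays of a two-module system
    # in which both modules report the same ID (the PANIC case above).
    print(duplicate_ids({"rack1-lower": 1, "rack1-upper": 1}))
    # prints {1: ['rack1-lower', 'rack1-upper']}
```

Any ID that shows up in the duplicate list is one that has to be reassigned at the MSC.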

If a module loses its configuration, or after clearing the logs from POD, or after replacing the MSC, it can happen that two or more modules end up with the same ID. In that case you have to assign the IDs manually by entering command-line mode from the maintenance menu. Type 'help' to get a list of commands. I'm sure there is a command like 'moduleid' or just 'module'; with it you can display the current ID and also assign a new one.

So shut down both installations and restart only the lower module. If possible, let the system boot into IRIX. The MSC then shows you the current ID: the display reads something like 'P0M 1 C', which means "Partition 0, Module 1, Console". Shut down the system and repeat the same step with the upper module. If both use the same ID, you have to reassign one of them: restart the module, enter the maintenance menu and press '5' for the command-line menu. Type, for example, 'moduleid 1' followed by 'update' to save the new configuration. After this, restart the systems.

Something similar can happen when moving or replacing node boards from one module to another. In my case (a 2-rack system with 32 CPUs) I ran clearallogs and initlogs from POD. After this the system renumbered all nodes and modules. But don't try this until you have checked how your partitions are set up!

Take a look at
http://forums.nekochan.net/viewtopic.php?t=883&view=next

regards
Joerg


