WEEEIRD - the SRM is now on serial only, not kbd/monitor as before.
On 17 Sep 2011, at 11:46, Sampsa Laine wrote:
I did, and found a totally superfluous (to my needs) QLogic SCSI card + drive in there.
I've got a KZPAC attached with a BA365 - the machine boots, but just to a blue screen.
With the Qlogic in, it won't even boot.
Is there some magic key setting to reset the SRM ?
Sampsa
On 17 Sep 2011, at 11:40, hvlems at zonnet.nl wrote:
Possibly cards became unseated.
Open it up and remove all boards and reseat them. If that doesn't work.
------Original message------
From: Sampsa Laine
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] DS10 fucked - 3 beeps. Any ideas?
Sent: 17 September 2011 11:52
Hi,
So I just had my DS10 perfectly configured but on its side, and it fell over.
Now it just gives me 3 beeps.
What do I do, disassemble the whole thing and reseat the RAM or what?
So I just had my DS10 perfectly configured but on its side, and it fell over.
Now it just gives me 3 beeps.
If it's like other alphas, there are different beep sequences that mean
different things and the meanings of some of the more common ones are listed
in the manual.
What do I do, disassemble the whole thing and reseat the RAM or what?
Given what happened, that sounds like a good idea. Also check for connections
that have come loose and for any damage.
If you have a graphics card fitted but no keyboard or mouse plugged in, this
can cause some alphas to beep on startup, but they generally ignore the
problem and continue on booting.
Regards,
Peter Coghlan.
My apologies for the confusion Peter. But you're right to assume I meant the
cl.gr.nr.
And yes, there is obviously something wrong here. IF an autogen was done I'd
say have a look at modparams.dat
- alloclass must be unique for each node
- same for tapealloclass
The alloclass is used in forming device names and lock resource names. If it
is not set correctly, there will be problems in these areas but they should not
prevent a node from joining a cluster. I would tend to leave it at zero unless
there were good reasons for changing it. There are myriads of little rules about
how various sysgen parameters should be set but most of them can be ignored
for the moment, partly because VMS is not so fragile that having any one of
a vast number of parameters wrong will prevent a system from booting and partly
because the system picks sensible defaults that will work in typical cases
for sysgen parameters.
More important sysgen parameters to check are SCSNODE and particularly
SCSSYSTEMID. If SCSSYSTEMID was inadvertently changed, this may well cause
difficulties. SCSSYSTEMID must be the same as the decnet area number *1024 plus
the decnet node number. This number is used to calculate the ethernet address
used for cluster communications. It is also used to uniquely identify cluster
nodes to other cluster nodes.
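As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function names are made up for this example, the range checks assume DECnet Phase IV limits (areas 1 to 63, nodes 1 to 1023), and the address layout assumes the usual Phase IV convention of AA-00-04-00 followed by the 16-bit address with the low byte first. It is not taken from any VMS tool.

```python
def scssystemid(area: int, node: int) -> int:
    """Return area * 1024 + node, the value described for SCSSYSTEMID."""
    # Range limits are an assumption based on DECnet Phase IV addressing.
    if not 1 <= area <= 63:
        raise ValueError("area must be in the range 1..63")
    if not 1 <= node <= 1023:
        raise ValueError("node must be in the range 1..1023")
    return area * 1024 + node


def decnet_mac(area: int, node: int) -> str:
    """Return the Phase IV style ethernet address for a DECnet address."""
    address = scssystemid(area, node)
    low = address & 0xFF   # low byte goes first in the address
    high = address >> 8    # high byte goes last
    return f"AA-00-04-00-{low:02X}-{high:02X}"


if __name__ == "__main__":
    # Hypothetical node with DECnet address 1.13
    print(scssystemid(1, 13))
    print(decnet_mac(1, 13))
```

For DECnet address 1.13 this gives 1037, which is the number SCSSYSTEMID should hold on that node.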
If SCSNODE is changed without changing SCSSYSTEMID or vice versa, that node
will have difficulties joining a cluster as the other nodes in the cluster
will remember the previous values and complain that the new ones are not
consistent. The solution here is to shut down all nodes in the cluster so that
they are all down at the same time and then reboot each.
And check the cluster license. AFAIK the cluster license must be unique for
each node. Or one license with 0 units and in that case make sure all
nodenames are mentioned in the /INCLUDE list, again on all nodes where the
license was loaded.
I am not 100% sure on this but as far as I know, cluster licenses are not
checked when a node is attempting to join a cluster because the system is
operating at a very low level and may not be in a position to access the disk
yet where its licenses are held. I think the response to lack of cluster
license is whinges about it in the operator log rather than disallowing a
node from joining. If it were to prevent a node from joining, I would expect
to see a prominent error message mentioning a license problem.
Regards,
Peter Coghlan.
My apologies for the confusion Peter. But you're right to assume I meant the cl.gr.nr.
And yes, there is obviously something wrong here. IF an autogen was done I'd say have a look at modparams.dat
- alloclass must be unique for each node
- same for tapealloclass
And check the cluster license. AFAIK the cluster license must be unique for each node. Or one license with 0 units and in that case make sure all nodenames are mentioned in the /INCLUDE list, again on all nodes where the license was loaded.
PS
I'm travelling now so can't check my mail too often
Hans
-----Original Message-----
From: Peter Coghlan <HECNET at beyondthepale.ie>
Sender: owner-hecnet at Update.UU.SE
Date: Fri, 16 Sep 2011 12:52:13
To: <hecnet at Update.UU.SE>
Reply-To: hecnet at Update.UU.SE
Subject: Re: Fwd: Re: [HECnet] More clustering fun
2) The cluster id and cluster password are different on both nodes.
What's the cluster id?
The cluster id is 4.
Sorry - I should have said "What do you mean by the cluster id?"
I guess you mean the cluster group number but I wanted to be
sure we weren't referring to two different things.
I've confirmed that the authorize file is the same on both the alpha and
vax nodes:
$$ diff dsa0:[sys0.syscommon.sysexe]cluster_authorize.dat -
_$$ $4$dkb400:[sys0.syscommon.sysexe]cluster_authorize.dat
Number of difference sections found: 0
Number of difference records found: 0
DIFFERENCES /IGNORE=()/MERGED=1-
DSA0:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1-
$4$DKB400:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1
[snip]
It's weird that by changing the VOTES to 0 on the satellite and running an
AUTOGEN that this behaviour is experienced. Before that the satellite
boots off the served disk no problem.
I doubt that changing VOTES to 0 was responsible. To check this, you could
set VOTES back to 1 (probably manually using SYSGEN) and test again.
(A problem with running AUTOGEN for the first time is that it finds anything
that you didn't know was lurking in MODPARAMS.DAT and acts on it.)
Indeed, the 'master' disk which is a local drive in the VAXstation boots
into the cluster no problem, but again, with VOTES = 1.
Check that there are no files called SYS$SPECIFIC:[SYSEXE]CLUSTER_AUTHORIZE.DAT
on either node. If you find any, delete them.
You may have to reboot both nodes in order to get them to read the correct
CLUSTER_AUTHORIZE.DAT files after copying them or deleting errant ones.
Regards,
Peter Coghlan.
On Fri, 16 Sep 2011, Peter Coghlan wrote:
2) The cluster id and cluster password are different on both nodes.
What's the cluster id?
The cluster id is 4.
I've confirmed that the authorize file is the same on both the alpha and vax nodes:
$$ diff dsa0:[sys0.syscommon.sysexe]cluster_authorize.dat -
_$$ $4$dkb400:[sys0.syscommon.sysexe]cluster_authorize.dat
Number of difference sections found: 0
Number of difference records found: 0
DIFFERENCES /IGNORE=()/MERGED=1-
DSA0:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1-
$4$DKB400:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1
The cluster group number and the cluster password must be the same
on all nodes in the cluster. The easiest way to achieve this is
to copy sys$common:[sysexe]cluster_authorize.dat from one to the
other.
I set the right values when I ran CLUSTER_CONFIG_LAN.COM on the VAX.
If the cluster group numbers are different, you will end up
trying to form two different clusters.
If the passwords are different, the results will probably be
something like you are seeing, nodes failing to completely
join the cluster. There may also be some messages in the
operator.log
It's weird that by changing the VOTES to 0 on the satellite and running an AUTOGEN that this behaviour is experienced. Before that the satellite boots off the served disk no problem.
Indeed, the 'master' disk which is a local drive in the VAXstation boots into the cluster no problem, but again, with VOTES = 1.
Regards, Mark
2) The cluster id and cluster password are different on both nodes.
What's the cluster id?
The cluster group number and the cluster password must be the same
on all nodes in the cluster. The easiest way to achieve this is
to copy sys$common:[sysexe]cluster_authorize.dat from one to the
other.
If the cluster group numbers are different, you will end up
trying to form two different clusters.
If the passwords are different, the results will probably be
something like you are seeing, nodes failing to completely
join the cluster. There may also be some messages in the
operator.log
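The two failure cases described above can be written down as a small decision function. This is only a sketch in Python: ClusterAuth and compare_auth are hypothetical names, and the real check happens inside VMS using the contents of cluster_authorize.dat, whose format is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterAuth:
    """Hypothetical stand-in for one node's cluster group number and password."""
    group_number: int
    password: str


def compare_auth(a: ClusterAuth, b: ClusterAuth) -> str:
    """Describe the expected outcome when two nodes try to cluster."""
    if a.group_number != b.group_number:
        # Different group numbers: each node ends up in its own cluster.
        return "different group numbers: two separate clusters"
    if a.password != b.password:
        # Same group, different password: the join does not complete.
        return "same group, different passwords: join fails"
    return "match: nodes can join the same cluster"


if __name__ == "__main__":
    # Illustrative values only
    vax = ClusterAuth(group_number=4, password="example")
    alpha = ClusterAuth(group_number=4, password="other")
    print(compare_auth(vax, alpha))
```

Copying one cluster_authorize.dat to the other node, as suggested above, is what guarantees the "match" case.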
Regards,
Peter.