On 17/09/11 18:15, Sampsa Laine wrote:
Guys,
I think CHIMPY is running again, please try it out (8.401)...
Sampsa
Well done Sampsa! Woot-woot!
Glad to hear you've had some luck.
I was getting system errors from my VAXstation 4000/60. I've just stripped it down and cleaned it of all dust bunnies, reseated the RAM and contact cleaned all connectors. It was one of the few machines that I didn't do this to when I got it.
So far so good. Just need to find a CDROM drive that works!
Picked up a MicroVAX II today. Have yet to muster up the strength to remove it from the car!
Mark.
WEEEIRD - the SRM is now on serial only, not kbd/monitor as before.
On 17 Sep 2011, at 11:46, Sampsa Laine wrote:
I did, and found a totally superfluous (to my needs) QLogic SCSI card + drive in there.
I've got a KZPAC attached with a BA365 - the machine boots, but just to a blue screen.
With the Qlogic in, it won't even boot.
Is there some magic key setting to reset the SRM ?
Sampsa
On 17 Sep 2011, at 11:40, hvlems at zonnet.nl wrote:
Possibly cards became unseated.
Open it up and remove all boards and reseat them. If that doesn't work.
------Original Message------
From: Sampsa Laine
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] DS10 fucked - 3 beeps. Any ideas?
Sent: 17 September 2011 11:52
Hi,
So I just had my DS10 perfectly configured but on its side, and it fell over.
Now it just gives me 3 beeps.
What do I do, disassemble the whole thing and reseat the RAM or what?
So I just had my DS10 perfectly configured but on its side, and it fell over.
Now it just gives me 3 beeps.
If it's like other alphas, there are different beep sequences that mean
different things and the meanings of some of the more common ones are listed
in the manual.
What do I do, disassemble the whole thing and reseat the RAM or what?
Given what happened, that sounds like a good idea. Also check for connections
that have come loose and for any damage.
If you have a graphics card fitted but no keyboard or mouse plugged in, this
can cause some alphas to beep on startup, but they generally ignore the
problem and continue on booting.
Regards,
Peter Coghlan.
My apologies for the confusion, Peter. But you're right to assume I meant the
cluster group number.
And yes, there is obviously something wrong here. If an autogen was done I'd
say have a look at modparams.dat
- alloclass must be unique for each node
- same for tapealloclass
The alloclass is used in forming device names and lock resource names. If it
is not set correctly, there will be problems in these areas but they should not
prevent a node from joining a cluster. I would tend to leave it at zero unless
there were good reasons for changing it. There are myriads of little rules about
how various sysgen parameters should be set but most of them can be ignored
for the moment, partly because VMS is not so fragile that having any one of
a vast number of parameters wrong will prevent a system from booting and partly
because the system picks sensible defaults that will work in typical cases
for sysgen parameters.
More important sysgen parameters to check are SCSNODE and particularly
SCSSYSTEMID. If SCSSYSTEMID was inadvertently changed, this may well cause
difficulties. SCSSYSTEMID must be the same as the decnet area number *1024 plus
the decnet node number. This number is used to calculate the ethernet address
used for cluster communications. It is also used to uniquely identify cluster
nodes to other cluster nodes.
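The SCSSYSTEMID rule above is simple arithmetic, so a small sketch may help when checking a value by hand. The function names are hypothetical, and the range checks assume the usual DECnet Phase IV limits (areas 1 to 63, nodes 1 to 1023). The address 4.249 is used only because 4345 is the system ID shown for SLAVE elsewhere in this thread; treat that mapping as illustrative.

```python
def scssystemid(area: int, node: int) -> int:
    """Return the SCSSYSTEMID for DECnet address area.node (area * 1024 + node)."""
    # Assumed DECnet Phase IV limits; not stated in the thread itself.
    if not 1 <= area <= 63:
        raise ValueError("DECnet area must be in the range 1..63")
    if not 1 <= node <= 1023:
        raise ValueError("DECnet node must be in the range 1..1023")
    return area * 1024 + node


def decnet_address(system_id: int) -> tuple[int, int]:
    """Split an SCSSYSTEMID back into its (area, node) pair."""
    return divmod(system_id, 1024)


print(scssystemid(4, 249))   # 4345
print(decnet_address(4345))  # (4, 249)
```

Running the inverse on the value currently set in SYSGEN is a quick way to see whether it still matches the node's DECnet address.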
If SCSNODE is changed without changing SCSSYSTEMID or vice versa, that node
will have difficulties joining a cluster as the other nodes in the cluster
will remember the previous values and complain that the new ones are not
consistent. The solution here is to shut down all nodes in the cluster so that
they are all down at the same time and then reboot each.
And check the cluster license. AFAIK the cluster license must be unique for
each node. Or one license with 0 units and in that case make sure all
nodenames are mentioned in the /INCLUDE list, again on all nodes where the
license was loaded.
I am not 100% sure on this but as far as I know, cluster licenses are not
checked when a node is attempting to join a cluster because the system is
operating at a very low level and may not be in a position to access the disk
yet where its licenses are held. I think the response to lack of cluster
license is whinges about it in the operator log rather than disallowing a
node from joining. If it were to prevent a node from joining, I would expect
to see a prominent error message mentioning a license problem.
Regards,
Peter Coghlan.
My apologies for the confusion, Peter. But you're right to assume I meant the cluster group number.
And yes, there is obviously something wrong here. If an autogen was done I'd say have a look at modparams.dat
- alloclass must be unique for each node
- same for tapealloclass
And check the cluster license. AFAIK the cluster license must be unique for each node. Or one license with 0 units and in that case make sure all nodenames are mentioned in the /INCLUDE list, again on all nodes where the license was loaded.
PS
I'm travelling now so can't check my mail too often
Hans
-----Original Message-----
From: Peter Coghlan <HECNET at beyondthepale.ie>
Sender: owner-hecnet at Update.UU.SE
Date: Fri, 16 Sep 2011 12:52:13
To: <hecnet at Update.UU.SE>
Reply-To: hecnet at Update.UU.SE
Subject: Re: Fwd: Re: [HECnet] More clustering fun
2) The cluster id and cluster password are different on both nodes.
What's the cluster id?
The cluster id is 4.
Sorry - I should have said "What do you mean by the cluster id?"
I guess you mean the cluster group number but I wanted to be
sure we weren't referring to two different things.
I've confirmed that the authorize file is the same on both the alpha and
vax nodes:
$$ diff dsa0:[sys0.syscommon.sysexe]cluster_authorize.dat -
_$$ $4$dkb400:[sys0.syscommon.sysexe]cluster_authorize.dat
Number of difference sections found: 0
Number of difference records found: 0
DIFFERENCES /IGNORE=()/MERGED=1-
DSA0:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1-
$4$DKB400:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1
[snip]
It's weird that by changing the VOTES to 0 on the satellite and running an
AUTOGEN that this behaviour is experienced. Before that the satellite
boots off the served disk no problem.
I doubt that changing VOTES to 0 was responsible. To check this, you could
set VOTES back to 1 (probably manually using SYSGEN) and test again.
(A problem with running AUTOGEN for the first time is that it finds anything
that you didn't know was lurking in MODPARAMS.DAT and acts on it.)
Indeed, the 'master' disk which is a local drive in the VAXstation boots
into the cluster no problem, but again, with VOTES = 1.
Check that there are no files called SYS$SPECIFIC:[SYSEXE]CLUSTER_AUTHORIZE.DAT
on either node. If you find any, delete them.
You may have to reboot both nodes in order to get them to read the correct
CLUSTER_AUTHORIZE.DAT files after copying them or deleting errant ones.
Regards,
Peter Coghlan.
On Fri, 16 Sep 2011, Peter Coghlan wrote:
2) The cluster id and cluster password are different on both nodes.
What's the cluster id?
The cluster id is 4.
I've confirmed that the authorize file is the same on both the alpha and vax nodes:
$$ diff dsa0:[sys0.syscommon.sysexe]cluster_authorize.dat -
_$$ $4$dkb400:[sys0.syscommon.sysexe]cluster_authorize.dat
Number of difference sections found: 0
Number of difference records found: 0
DIFFERENCES /IGNORE=()/MERGED=1-
DSA0:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1-
$4$DKB400:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1
The cluster group number and the cluster password must be the same
on all nodes in the cluster. The easiest way to achieve this is
to copy sys$common:[sysexe]cluster_authorize.dat from one to the
other.
I set the right values when I ran CLUSTER_CONFIG_LAN.COM on the VAX.
If the cluster group numbers are different, you will end up
trying to form two different clusters.
If the passwords are different, the results will probably be
something like you are seeing, nodes failing to completely
join the cluster. There may also be some messages in the
operator.log
It's weird that by changing the VOTES to 0 on the satellite and running an AUTOGEN that this behaviour is experienced. Before that the satellite boots off the served disk no problem.
Indeed, the 'master' disk which is a local drive in the VAXstation boots into the cluster no problem, but again, with VOTES = 1.
Regards, Mark
On 16/09/11 11:22, hvlems at zonnet.nl wrote:
Did you adjust EXPECTED_VOTES on the alpha server?
From: Mark Wickens <mark at wickensonline.co.uk>
Sender: owner-hecnet at Update.UU.SE
Date: Fri, 16 Sep 2011 11:09:04 +0100
To: <hecnet at Update.UU.SE>
ReplyTo: hecnet at Update.UU.SE
Subject: Fwd: Re: [HECnet] More clustering fun
Just to add to the picture - if I reduce VOTES on the satellite to 0 I get this happening:
-------- Original Message --------
Subject: Re: [HECnet] More clustering fun
Date: Fri, 16 Sep 2011 09:33:47 +0000
From: hvlems at zonnet.nl
Reply-To: hvlems at zonnet.nl
To: Mark Wickens <mark at wickensonline.co.uk>
All I can think of is this:
1) SLAVE and ALEPH both use the same VMSCLUSTER license
2) The cluster id and cluster password are different on both nodes.
On an Alpha you can modify this in SYSMAN (use HELP in SYSMAN to find the correct command). On a VAX the command is buried in SYSGEN.
Hans
-----Original Message-----
From: Mark Wickens <mark at wickensonline.co.uk>
Date: Fri, 16 Sep 2011 10:11:41
Cc: <hvlems at zonnet.nl>
Subject: Re: [HECnet] More clustering fun
Hi Hans,
I didn't want to pre-empt what I thought happened last time as I wasn't
sure I'd got it right, but it has happened again so it's definitely an
issue.
I've updated VOTES in the MODPARAMS.DAT on ALEPH (the satellite), run
AUTOGEN and rebooted the cluster.
Now when ALEPH attempts to join the cluster I get these messages repeatedly:
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
and I see this on SLAVE:
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) received VMScluster membership
request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) completed VMScluster state transition
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) received VMScluster membership
request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) completed VMScluster state transition
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) received VMScluster membership
request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) completed VMScluster state transition
This just repeats forever.
Some more information from SLAVE (the ALPHA server):
SHOW CLUSTER:
View of Cluster from system ID 4345 node: SLAVE 16-SEP-2011 10:06:13
+--------------------------------------------------------+---------+
|                        SYSTEMS                         | MEMBERS |
+--------+--------------------------------+--------------+---------+
| NODE   | HW_TYPE                        | SOFTWARE     | STATUS  |
+--------+--------------------------------+--------------+---------+
| SLAVE  | AlphaServer 1000A 5/300        | VMS V8.3     | MEMBER  |
| ALEPH  | VAXstation 4000-VLC            | VMS V7.3     | NEW     |
+--------+--------------------------------+--------------+---------+
+------------------------------------------------------------------------------------+
|                                      CLUSTER                                       |
+--------+-----------+----------+------------+-------------------+-------------------+
| CL_EXP | CL_QUORUM | CL_VOTES | CL_MEMBERS | FORMED            | LAST_TRANSITION   |
+--------+-----------+----------+------------+-------------------+-------------------+
|      1 |         1 |        1 |          1 | 16-SEP-2011 09:56 | 16-SEP-2011 09:56 |
+--------+-----------+----------+------------+-------------------+-------------------+
SYSGEN> SHOW EXPECTED_VOTES
%CNXMAN, Completing VMScluster state transition
Parameter Name            Current    Default     Min.       Max.     Unit  Dynamic
--------------            -------    -------    -------    -------   ----  -------
EXPECTED_VOTES                  1          1          1        127 Votes
Any ideas why this is going wrong?
Thanks for the help, much appreciated,
Mark.
On 16/09/11 09:40, hvlems at zonnet.nl wrote:
> Regarding the alphaserver: check the value of expectedvotes in sysgen.
> In a cluster with non-voting satellites only, its value must be less than Votes+1
> Hans
> ------Original Message------
> From: Mark Wickens
> Sender: owner-hecnet at Update.UU.SE
> To: hecnet at Update.UU.SE
> Reply-To: hecnet at Update.UU.SE
> Subject: [HECnet] More clustering fun
> Sent: 16 September 2011 10:14
>
> I've now refreshed the VAX satellite's system drive and installed it in the
> ALPHA server. The one problem I have remaining is that the VOTES the
> satellite is contributing to the cluster is 1. I believe for a proper
> satellite this should be 0.
>
> Is this a case of updating the MODPARAMS.DAT on the satellite and autogen
> and reboot? Do I need to do anything with the ALPHA server's configuration?
>
> Presumably I will need to reboot the ALPHA server as well.
>
> Thanks for the help,
>
> Kind regards, Mark.
>
EXPECTED_VOTES on the ALPHA is set to 1, which I assume is the value we would expect to work?
Even when both the server and the satellite have VOTES set to 1, EXPECTED_VOTES remains at 1 on the server, although SHOW CLUSTER shows that the quorum is set to 2:
View of Cluster from system ID 4345 node: SLAVE 16-SEP-2011 11:21:15
+--------------------------------------------------------+---------+
| SYSTEMS | MEMBERS |
+--------+--------------------------------+--------------+---------+
| NODE | HW_TYPE | SOFTWARE | STATUS |
+--------+--------------------------------+--------------+---------+
| SLAVE | AlphaServer 1000A 5/300 | VMS V8.3 | MEMBER |
| ALEPH | VAXstation 4000-VLC | VMS V7.3 | MEMBER |
+--------+--------------------------------+--------------+---------+
+-------------------------------------------------------------------------------
| CLUSTER
+--------+-----------+----------+------------+-------------------+--------------
| CL_EXP | CL_QUORUM | CL_VOTES | CL_MEMBERS | FORMED | LAST_TRANSIT
+--------+-----------+----------+------------+-------------------+--------------
| 2 | 2 | 2 | 2 | 16-SEP-2011 11:04 | 16-SEP-2011 1
+--------+-----------+----------+------------+-------------------+--------------
With the satellite's VOTES set to 0 (which causes the endless %CNXMAN messages), if I turn off the satellite at that point I get the following:
$$
%CNXMAN, Quorum lost, blocking activity
$$
%CNXMAN, Timed-out lost connection to system ALEPH
%CNXMAN, Proposing reconfiguration of the VMScluster
%CNXMAN, Discovered system ALEPH
%CNXMAN, Removed from VMScluster system ALEPH
%CNXMAN, Completing VMScluster state transition
%CNXMAN, Established connection to system ALEPH
and the server hangs. I end up having to reboot the server, because the satellite never joins the cluster successfully.
It's all fun!
Regards, Mark.
I've now refreshed the VAX satellite's system drive and installed it in the
ALPHA server. The one problem I have remaining is that the VOTES the
satellite is contributing to the cluster is 1. I believe for a proper
satellite this should be 0.
The number of votes a node has determines what happens to it when it
loses contact with other members of the cluster. If each node in a two
node cluster has one vote, then the cluster quorum is two votes. If
something happens to either node, the other notices that quorum has been
lost and will hang until quorum is reestablished. The reason for the hang
is that all each node knows is that it can't see the other node. It doesn't
know whether the other has shut down or is still running and might become
visible again shortly, in which case, everything can resume.
If your diskless satellite should die unexpectedly, it is somewhat
irritating if it also ends up hanging the other node, for no good reason.
If the satellite has lost contact with the node with the disks, then it can
do nothing anyway. Hence the reason for recommending zero votes for diskless
satellites. No other ill effects will result from leaving votes set to one.
Whatever the number of votes each node has, when shutting down a voting
member of the cluster, REMOVE_NODE should be specified in order to avoid
hanging the rest of the cluster after the shutdown completes. This specifies
that the remaining cluster nodes should recompute quorum taking into account
the loss of the node being shut down.
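The arithmetic behind the paragraphs above can be sketched in a few lines of Python. This is only an illustration, assuming the formula given in the OpenVMS Cluster documentation, quorum = (EXPECTED_VOTES + 2) / 2 rounded down. The function names are made up for the example, and a real cluster also takes quorum disks and the actual sum of votes into account.

```python
def quorum(expected_votes: int) -> int:
    """Quorum as (EXPECTED_VOTES + 2) / 2, rounded down (simplified model)."""
    return (expected_votes + 2) // 2


def has_quorum(visible_votes: int, expected_votes: int) -> bool:
    """A node keeps running only while the votes it can see reach quorum."""
    return visible_votes >= quorum(expected_votes)


# Two nodes with one vote each: quorum is 2, so the survivor hangs if one is lost.
two_voters = quorum(2)
survivor_runs = has_quorum(visible_votes=1, expected_votes=2)

# Server with one vote, satellite with zero: quorum is 1, the server carries on alone.
server_only = quorum(1)
server_runs = has_quorum(visible_votes=1, expected_votes=1)

# Shutdown with REMOVE_NODE: the remaining member recomputes quorum
# without the departing node's vote, so it keeps running.
after_remove = has_quorum(visible_votes=1, expected_votes=2 - 1)

print(two_voters, survivor_runs, server_only, server_runs, after_remove)
```

With the satellite at zero votes the server never depends on it for quorum, which is the reason for the recommendation.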
Regards.
Peter Coghlan.
Regarding the AlphaServer: check the value of EXPECTED_VOTES in SYSGEN.
In a cluster with non-voting satellites only, its value must be less than VOTES+1.
Hans
------Original message------
From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] More clustering fun
Sent: 16 September 2011 10:14
I've now refreshed the VAX satellite's system drive and installed it in the
ALPHA server. The one problem I have remaining is that the VOTES the
satellite is contributing to the cluster is 1. I believe for a proper
satellite this should be 0.
Is this a case of updating MODPARAMS.DAT on the satellite, running AUTOGEN
and rebooting? Do I need to do anything with the ALPHA server's configuration?
Presumably I will need to reboot the ALPHA server as well.
Thanks for the help,
Kind regards, Mark.
Correct. VOTES is not dynamic, so you have two options:
1) Modify modparams.dat and run autogen with the reboot option
2) Modify modparams.dat,
   run sysgen to modify VOTES and write current,
   reboot
There is a lazy third option: the same as 2) but without modifying modparams.dat, which is sloppy work, to put it mildly.
------Original message------
From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] More clustering fun
Sent: 16 September 2011 10:14
I've now refreshed the VAX satellite's system drive and installed it in the
ALPHA server. The one problem I have remaining is that the VOTES the
satellite is contributing to the cluster is 1. I believe for a proper
satellite this should be 0.
Is this a case of updating MODPARAMS.DAT on the satellite, running AUTOGEN
and rebooting? Do I need to do anything with the ALPHA server's configuration?
Presumably I will need to reboot the ALPHA server as well.
Thanks for the help,
Kind regards, Mark.
I guess so, but remember that a SCSI cluster was never possible with a VAX!
The reason for the behaviour is that disk data and cluster data follow different paths in a SCSI cluster, which isn't possible on VAXes (I think).
Then again it may be a bug in 8.3 of course!
------Original message------
From: Peter Coghlan
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: Re: [HECnet] Satellite configuration/NCP crash
Sent: 16 September 2011 00:37
It should happen before disk access across the CI bus is even possible.
I think we'll have to agree to disagree on this one :-)
I've just booted my VAXstation 2000 into an NI cluster with my
VAXstation 3100. I gave the commands:
NCP SET CIRCUIT SVA-0 STATE OFF
NCP SET CIRCUIT SVA-0 STATE ON
on each node. The only effects were an OPCOM message complaining about
adjacency down and the loss of my DECnet connections. Both nodes
continued to function quite happily while the circuit was off. No crash.
Anyway, it looks like avoiding issuing the SET CIRCUIT STATE OFF
command by making the changes to the permanent database and rebooting
was a good way of sidestepping the problem. Hopefully the bug is only
triggered when SET CIRCUIT STATE OFF is issued in Mark's setup.
Apologies to gerry77. I missed your reply earlier and ended up repeating
much of what you said.
Regards,
Peter Coghlan.
On 2011-09-16 00:30, Mark Wickens wrote:
Just thinking aloud,
Can I run a bridge between the dongle-based network and my own home
network?
If the bridge program on my network is wrapped with a script which
checks the DynDNS hostname periodically against the recorded IP address,
can I not reconfigure the bridge on my home network and restart it if
required?
If this is possible it keeps my configuration in a realm I've got half a
chance of working out myself and saves me from hassling anyone else.
Presumably at home I would need to run two bridges on different ports -
with the current bridge linking back to Johnny as normal.
Short answer: yes.
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt at softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
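The wrapper idea in the question above (poll the dynamic DNS name, compare it with the recorded address, reconfigure and restart the bridge when it changes) might look roughly like this in Python. It is a minimal sketch under stated assumptions: `restart_bridge` is a hypothetical callback standing in for whatever rewrites the bridge configuration and restarts the program, and the one-minute polling interval is arbitrary.

```python
import socket
import time
from typing import Callable, Optional


def check_once(hostname: str, last_ip: Optional[str],
               restart_bridge: Callable[[str], None],
               resolve: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Resolve hostname; if the address differs from last_ip, restart the bridge.

    Returns the address to remember for the next check.
    """
    try:
        current_ip = resolve(hostname)
    except OSError:
        # Lookup failed (for example the 3G link is down): keep the old
        # address and try again on the next poll.
        return last_ip
    if current_ip != last_ip:
        # restart_bridge is hypothetical: it should rewrite the bridge
        # configuration for current_ip and restart the bridge program.
        restart_bridge(current_ip)
    return current_ip


def watch(hostname: str, restart_bridge: Callable[[str], None],
          interval_seconds: float = 60.0) -> None:
    """Poll forever, restarting the bridge whenever the remote address changes."""
    last_ip: Optional[str] = None
    while True:
        last_ip = check_once(hostname, last_ip, restart_bridge)
        time.sleep(interval_seconds)
```

Something like `watch("hecnet.no-ip.org", my_restart_function)` would then run at the home end. The first poll always calls the callback, which doubles as the initial bridge start.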
Mark,
Forget the bridge. Run Multinet at the event and either connect to your
end via Multinet (you are already running that), or connect to mine.
Either way it will just work. Personally, I would use the dedicated
events area on the machines at the event and be done with it!
I can either supply you with my code for your end, or you can connect to
my end. Your choice. In the second scenario, all I have to do is add
one (1) entry in the existing database and off we go!
-Steve
-----Original Message-----
From: owner-hecnet at Update.UU.SE [mailto:owner-hecnet at Update.UU.SE] On
Behalf Of Mark Wickens
Sent: Thursday, September 15, 2011 18:30
To: hecnet at Update.UU.SE
Subject: [HECnet] Another thought regarding dynamic bridge setup
Just thinking aloud,
Can I run a bridge between the dongle-based network and my own home
network?
If the bridge program on my network is wrapped with a script which
checks the DynDNS hostname periodically against the recorded IP address,
can I not reconfigure the bridge on my home network and restart it if
required?
If this is possible it keeps my configuration in a realm I've got half a
chance of working out myself and saves me from hassling anyone else.
Presumably at home I would need to run two bridges on different ports -
with the current bridge linking back to Johnny as normal.
Regards, Mark
Actually, who cares. All of this happens at this end. I can tell this
end to do the lookup every second if I wish. In reality it is usually
done in some number of minutes. As for the reboot, that's why it is a
dedicated machine. It can reboot as often as it needs to - I don't
depend on it except to provide connectivity to this site from remote
sites that have dynamic IP addresses. The actual reboot is only a few
minutes. This entire process is automated and it is able to inform via
email (delayed using NMAIL) anyone necessary that there was a change in
state (address).
-Steve
-----Original Message-----
From: owner-hecnet at Update.UU.SE [mailto:owner-hecnet at Update.UU.SE] On
Behalf Of Johnny Billquist
Sent: Thursday, September 15, 2011 17:46
To: hecnet at Update.UU.SE
Subject: Re: [HECnet] Alternatives to bridge when running via a dynamic
IP address
The question is how long does it take to detect, and is a reboot every
time acceptable?
Johnny
On 2011-09-15 23:40, Steve Davidson wrote:
This issue has been dealt with here. The declab.net site runs a
dedicated VAX with Multinet that polls the other end at predetermined
intervals. When the other end's address changes, this end makes the change
to the Multinet database and reboots. This software has been in use
for well over a year and appears to work very well.
It does require the use of DynDNS at the end that changes addresses so
that this end can do the lookup.
-Steve
-----Original Message-----
From: owner-hecnet at Update.UU.SE [mailto:owner-hecnet at Update.UU.SE] On
Behalf Of Mark Wickens
Sent: Thursday, September 15, 2011 17:14
To: hecnet at Update.UU.SE
Subject: Re: [HECnet] Alternatives to bridge when running via a
dynamic IP address
On 15/09/11 22:03, hvlems at zonnet.nl wrote:
Won't the IP address remain unchanged as long as you keep the router
up and running?
Or would that be too costly?
------Original message------
From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] Alternatives to bridge when running via a dynamic
IP address
Sent: 15 September 2011 22:53
Guys,
Might as well explore this as an option.
The 3G-enabled Vigor 2800 router that I will be using at the
DEC Legacy event appears to be working reasonably well, but
unsurprisingly there does appear to be some fluctuation in the IP
address.
This means that an alternative to Johnny's bridge program might be
called for. I have a dynamic DNS name, hecnet.no-ip.org, set up and
now updating periodically. The question is: what can I use as an
alternative to the bridge program which will cope with IP address
changes?
I seem to remember Multinet tunnels being mentioned in the past. Do
they support DynDNS, or any other such technology?
Thanks for the help,
Mark.
Unfortunately not. I've seen three IP address changes in about the
same number of hours.
I don't know whether this poor performance is down to the completely
haphazard location of the dongle, or whether it's always this rubbish
but you don't realise when all you're doing is browsing the internet.
My experience of using a 3G dongle for anything other than web
browsing is very limited!
Regards
Mark.