On 16/09/11 11:22, hvlems at zonnet.nl wrote:
Did you adjust EXPECTED_VOTES on the alpha server?
From: Mark Wickens <mark at wickensonline.co.uk>
Sender: owner-hecnet at Update.UU.SE
Date: Fri, 16 Sep 2011 11:09:04 +0100
To: <hecnet at Update.UU.SE>
ReplyTo: hecnet at Update.UU.SE
Subject: Fwd: Re: [HECnet] More clustering fun
Just to add to the picture: if I reduce VOTES on the satellite to 0, this is what
happens:
-------- Original Message --------
Subject: Re: [HECnet] More clustering fun
Date: Fri, 16 Sep 2011 09:33:47 +0000
From: hvlems at zonnet.nl
Reply-To: hvlems at zonnet.nl
To: Mark Wickens <mark at wickensonline.co.uk>
All I can think of is this:
1) SLAVE and ALEPH both use the same VMSCLUSTER license.
2) The cluster ID and cluster password differ between the two nodes.
On an Alpha you can modify these in SYSMAN (use HELP in SYSMAN to find the correct
command). On a VAX the command is buried in SYSGEN.
Hans
-----Original Message-----
From: Mark Wickens <mark at wickensonline.co.uk>
Date: Fri, 16 Sep 2011 10:11:41
Cc: <hvlems at zonnet.nl>
Subject: Re: [HECnet] More clustering fun
Hi Hans,
I didn't want to pre-empt what I thought happened last time as I wasn't
sure I'd got it right, but it has happened again so it's definitely an
issue.
I've updated VOTES in the MODPARAMS.DAT of ALEPH (the satellite), run
AUTOGEN and rebooted the cluster.
Now when ALEPH attempts to join the cluster I get these messages repeatedly:
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
and I see this on SLAVE:
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) received VMScluster membership request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) completed VMScluster state transition
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) received VMScluster membership request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) completed VMScluster state transition
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) received VMScluster membership request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) completed VMScluster state transition
This just repeats forever.
Some more information from SLAVE (the ALPHA server):
SHOW CLUSTER:
View of Cluster from system ID 4345  node: SLAVE                16-SEP-2011 10:06:13
+--------------------------------------------------------+---------+
|                        SYSTEMS                         | MEMBERS |
+--------+--------------------------------+--------------+---------+
|  NODE  |            HW_TYPE             |   SOFTWARE   | STATUS  |
+--------+--------------------------------+--------------+---------+
| SLAVE  | AlphaServer 1000A 5/300        | VMS V8.3     | MEMBER  |
| ALEPH  | VAXstation 4000-VLC            | VMS V7.3     | NEW     |
+--------+--------------------------------+--------------+---------+
+------------------------------------------------------------------------------------+
|                                      CLUSTER                                       |
+--------+-----------+----------+------------+-------------------+-------------------+
| CL_EXP | CL_QUORUM | CL_VOTES | CL_MEMBERS |      FORMED       |  LAST_TRANSITION  |
+--------+-----------+----------+------------+-------------------+-------------------+
|      1 |         1 |        1 |          1 | 16-SEP-2011 09:56 | 16-SEP-2011 09:56 |
+--------+-----------+----------+------------+-------------------+-------------------+
SYSGEN> SHOW EXPECTED_VOTES
%CNXMAN, Completing VMScluster state transition
Parameter Name            Current    Default     Min.      Max.     Unit  Dynamic
--------------            -------    -------    -------   -------   ----  -------
EXPECTED_VOTES                  1          1          1       127   Votes
Any ideas why this is going wrong?
Thanks for the help, much appreciated,
Mark.
On 16/09/11 09:40, hvlems at zonnet.nl wrote:
Regarding the AlphaServer: check the value of
EXPECTED_VOTES in SYSGEN.
In a cluster whose satellites are all non-voting, its value must be less than VOTES+1.
Hans
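Hans's rule can be checked against the usual OpenVMS quorum formula, quorum = (EXPECTED_VOTES + 2) / 2 with any fraction dropped. The Python sketch below is illustrative only, not VMS code, and the helper names are hypothetical. It also assumes that the cluster-wide expected votes (the CL_EXP column of SHOW CLUSTER) is the larger of the EXPECTED_VOTES parameter and the total VOTES of the current members; that assumption is chosen because it fits the SHOW CLUSTER output in this thread (CL_EXP 1 with one voting node, CL_EXP 2 with two).

```python
# Illustrative sketch of OpenVMS cluster quorum arithmetic (not VMS source).
# Assumption: effective CL_EXP = max(EXPECTED_VOTES parameter, sum of member VOTES).


def cluster_expected_votes(expected_votes_param: int, member_votes: list[int]) -> int:
    """Hypothetical helper: effective CL_EXP under the stated assumption."""
    return max(expected_votes_param, sum(member_votes))


def quorum(cl_exp: int) -> int:
    """Quorum = (EXPECTED_VOTES + 2) / 2, with the fraction truncated."""
    return (cl_exp + 2) // 2


# SLAVE (VOTES=1) plus ALEPH as a non-voting satellite (VOTES=0), EXPECTED_VOTES=1
print(quorum(cluster_expected_votes(1, [1, 0])))  # 1
# SLAVE and ALEPH both with VOTES=1, EXPECTED_VOTES=1 on SLAVE
print(quorum(cluster_expected_votes(1, [1, 1])))  # 2
```

Under this model, a non-voting satellite leaves both CL_EXP and quorum at 1, so SLAVE's single vote is enough on its own; giving ALEPH a vote raises quorum to 2.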
------Original message------
From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] More clustering fun
Sent: 16 September 2011 10:14
I've now refreshed the VAX satellite's system drive and installed it in the
ALPHA server. The one problem I have remaining is that the satellite is
contributing 1 vote (VOTES=1) to the cluster. I believe for a proper
satellite this should be 0.
Is this just a case of updating MODPARAMS.DAT on the satellite, running AUTOGEN
and rebooting? Do I need to do anything with the ALPHA server's configuration?
Presumably I will need to reboot the ALPHA server as well.
Thanks for the help,
Kind regards, Mark.
EXPECTED_VOTES on the ALPHA is set to 1, which I assume is the value we would
expect to work?
Even when both the server and the satellite have VOTES set to 1, EXPECTED_VOTES remains
at 1 on the server, although if you examine the cluster with SHOW CLUSTER it shows that the
quorum is 2:
View of Cluster from system ID 4345  node: SLAVE                16-SEP-2011 11:21:15
+--------------------------------------------------------+---------+
|                        SYSTEMS                         | MEMBERS |
+--------+--------------------------------+--------------+---------+
|  NODE  |            HW_TYPE             |   SOFTWARE   | STATUS  |
+--------+--------------------------------+--------------+---------+
| SLAVE  | AlphaServer 1000A 5/300        | VMS V8.3     | MEMBER  |
| ALEPH  | VAXstation 4000-VLC            | VMS V7.3     | MEMBER  |
+--------+--------------------------------+--------------+---------+
+-------------------------------------------------------------------------------
|                                      CLUSTER
+--------+-----------+----------+------------+-------------------+--------------
| CL_EXP | CL_QUORUM | CL_VOTES | CL_MEMBERS |      FORMED       |  LAST_TRANSIT
+--------+-----------+----------+------------+-------------------+--------------
|      2 |         2 |        2 |          2 | 16-SEP-2011 11:04 | 16-SEP-2011 1
+--------+-----------+----------+------------+-------------------+--------------
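With CL_EXP and CL_QUORUM both at 2, as in the output above, the remaining members must hold at least 2 votes. The hypothetical Python sketch below (not VMS code) applies the quorum formula to show what that means if one of two voting members leaves. It assumes CL_EXP does not fall back automatically when a member departs and stays at the value SHOW CLUSTER reported until it is adjusted.

```python
# Illustrative sketch: does a cluster keep quorum after a member leaves?
# Assumption: CL_EXP stays at its reported value after a member departs.


def quorum(cl_exp: int) -> int:
    """Quorum = (CL_EXP + 2) / 2, fraction truncated."""
    return (cl_exp + 2) // 2


def has_quorum(votes_present: int, cl_exp: int) -> bool:
    """Hypothetical helper: True if the remaining members' votes meet quorum."""
    return votes_present >= quorum(cl_exp)


cl_exp = 2  # CL_EXP from SHOW CLUSTER with SLAVE and ALEPH both at VOTES=1
print(has_quorum(2, cl_exp))  # True: both members present
print(has_quorum(1, cl_exp))  # False: ALEPH gone, only SLAVE's vote remains
```

This is the arithmetic behind a "Quorum lost, blocking activity" message in a cluster with two voting members; it is offered as background, not as a diagnosis of the VOTES=0 behaviour that follows.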
With the satellite's VOTES set to 0 (which causes the endless %CNXMAN messages), if I turn
off the satellite at that point I get the following:
$$
%CNXMAN, Quorum lost, blocking activity
$$
%CNXMAN, Timed-out lost connection to system ALEPH
%CNXMAN, Proposing reconfiguration of the VMScluster
%CNXMAN, Discovered system ALEPH
%CNXMAN, Removed from VMScluster system ALEPH
%CNXMAN, Completing VMScluster state transition
%CNXMAN, Established connection to system ALEPH
and the server hangs. I end up having to reboot the server because the satellite never
successfully joins the cluster.
It's all fun!
Regards, Mark.