- HECnet - lists @ Update

by mark＠wickensonline.co.uk

On 17/09/11 18:15, Sampsa Laine wrote:
> Guys, I think CHIMPY is running again, please try it out (8.401)...
> Sampsa

Well done Sampsa! Woot-woot! Glad to hear you've had some luck.

I was getting system errors from my VAXstation 4000/60. I've just stripped it down and cleaned it of all dust bunnies, reseated the RAM and contact-cleaned all connectors. It was one of the few machines that I didn't do this to when I got it. So far so good. Just need to find a CDROM drive that works!

Picked up a MicroVAX II today. Have yet to muster up the strength to remove it from the car!

Mark.

13 years, 10 months


CHIMPY IS REBORN!

by sampsa＠mac.com

Guys, I think CHIMPY is running again, please try it out (8.401)...

Sampsa

13 years, 10 months


DS10 fucked - 3 beeps. Any ideas?

by sampsa＠mac.com

WEEEIRD - the SRM is now on serial only, not kbd/monitor as before.

On 17 Sep 2011, at 11:46, Sampsa Laine wrote:
> I did, and found a totally superfluous (to my needs) QLogic SCSI card + drive in there.
>
> I've got a KZPAC attached with a BA365 - the machine boots, but just to a blue screen. With the QLogic in, it won't even boot.
>
> Is there some magic key setting to reset the SRM?
>
> Sampsa
>
> On 17 Sep 2011, at 11:40, hvlems at zonnet.nl wrote:
>> Possibly cards became unseated. Open it up and remove all boards and reseat them.
>> If that doesn't work.
>>
>> ------Original message------
>> From: Sampsa Laine
>> Sender: owner-hecnet at Update.UU.SE
>> To: hecnet at Update.UU.SE
>> Reply-To: hecnet at Update.UU.SE
>> Subject: [HECnet] DS10 fucked - 3 beeps. Any ideas?
>> Sent: 17 September 2011 11:52
>>
>> Hi,
>> So I just had my DS10 perfectly configured but on its side, and it fell over. Now it just gives me 3 beeps.
>> What do I do, disassemble the whole thing and reseat the RAM or what?

13 years, 10 months


DS10 fucked - 3 beeps. Any ideas?

by sampsa＠mac.com

I did, and found a totally superfluous (to my needs) QLogic SCSI card + drive in there.

I've got a KZPAC attached with a BA365 - the machine boots, but just to a blue screen. With the QLogic in, it won't even boot.

Is there some magic key setting to reset the SRM?

Sampsa

On 17 Sep 2011, at 11:40, hvlems at zonnet.nl wrote:
> Possibly cards became unseated. Open it up and remove all boards and reseat them.
> If that doesn't work.
>
> ------Original message------
> From: Sampsa Laine
> Sender: owner-hecnet at Update.UU.SE
> To: hecnet at Update.UU.SE
> Reply-To: hecnet at Update.UU.SE
> Subject: [HECnet] DS10 fucked - 3 beeps. Any ideas?
> Sent: 17 September 2011 11:52
>
> Hi,
> So I just had my DS10 perfectly configured but on its side, and it fell over. Now it just gives me 3 beeps.
> What do I do, disassemble the whole thing and reseat the RAM or what?

13 years, 10 months


DS10 fucked - 3 beeps. Any ideas?

by hvlems＠zonnet.nl

Possibly cards became unseated. Open it up and remove all boards and reseat them.
If that doesn't work.

------Original message------
From: Sampsa Laine
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] DS10 fucked - 3 beeps. Any ideas?
Sent: 17 September 2011 11:52

Hi,
So I just had my DS10 perfectly configured but on its side, and it fell over. Now it just gives me 3 beeps.
What do I do, disassemble the whole thing and reseat the RAM or what?

13 years, 10 months


DS10 fucked - 3 beeps. Any ideas?

by HECNET＠beyondthepale.ie

> So I just had my DS10 perfectly configured but on its side, and it fell over. Now it just gives me 3 beeps.

If it's like other alphas, there are different beep sequences that mean different things and the meanings of some of the more common ones are listed in the manual.

> What do I do, disassemble the whole thing and reseat the RAM or what?

Given what happened, that sounds like a good idea. Also check for connections that have come loose and for any damage.

If you have a graphics card fitted but no keyboard or mouse plugged in, this can cause some alphas to beep on startup, but they generally ignore the problem and continue on booting.

Regards,
Peter Coghlan.

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by HECNET＠beyondthepale.ie

> My apologies for the confusion Peter. But you're right to assume I meant the cl.gr.nr.
> And yes, there is obviously something wrong here.
> IF an autogen was done I'd say have a look at modparams.dat
> - alloclass must be unique for each node
> - same for tapealloclass

The alloclass is used in forming device names and lock resource names. If it is not set correctly, there will be problems in these areas but they should not prevent a node from joining a cluster. I would tend to leave it at zero unless there were good reasons for changing it.

There are myriads of little rules about how various sysgen parameters should be set but most of them can be ignored for the moment, partly because VMS is not so fragile that having any one of a vast number of parameters wrong will prevent a system from booting and partly because the system picks sensible defaults that will work in typical cases for sysgen parameters.

More important sysgen parameters to check are SCSNODE and particularly SCSSYSTEMID. If SCSSYSTEMID was inadvertently changed, this may well cause difficulties. SCSSYSTEMID must be the same as the decnet area number *1024 plus the decnet node number. This number is used to calculate the ethernet address used for cluster communications. It is also used to uniquely identify cluster nodes to other cluster nodes.

If SCSNODE is changed without changing SCSSYSTEMID or vice versa, that node will have difficulties joining a cluster as the other nodes in the cluster will remember the previous values and complain that the new ones are not consistent. The solution here is to shut down all nodes in the cluster so that they are all down at the same time and then reboot each.

> And check the cluster license. AFAIK the cluster license must be unique for each node. Or one license with 0 units and in that case make sure all nodenames are mentioned in the /INCLUDE list, again on all nodes where the license was loaded.

I am not 100% sure on this but as far as I know, cluster licenses are not checked when a node is attempting to join a cluster because the system is operating at a very low level and may not be in a position to access the disk yet where its licenses are held. I think the response to lack of cluster license is whinges about it in the operator log rather than disallowing a node from joining. If it were to prevent a node from joining, I would expect to see a prominent error message mentioning a license problem.

Regards,
Peter Coghlan.
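
The SCSSYSTEMID rule above is plain arithmetic, so it can be checked with a short script. What follows is a minimal Python sketch that is not part of the original thread; the function names are made up for illustration. It assumes the DECnet Phase IV convention that a node's Ethernet address is AA-00-04-00 followed by the 16-bit address with the low byte first, and that the "system ID 4345" reported by SHOW CLUSTER elsewhere in this thread is the SCSSYSTEMID of SLAVE.

```python
def scssystemid(area: int, node: int) -> int:
    """Return area * 1024 + node for a DECnet Phase IV address area.node."""
    if not 1 <= area <= 63:
        raise ValueError("DECnet Phase IV area must be in 1..63")
    if not 1 <= node <= 1023:
        raise ValueError("DECnet Phase IV node must be in 1..1023")
    return area * 1024 + node


def decnet_address(system_id: int) -> tuple[int, int]:
    """Split a SCSSYSTEMID back into (area, node)."""
    return divmod(system_id, 1024)


def phase_iv_mac(system_id: int) -> str:
    """Assumed Phase IV station address: AA-00-04-00, then low byte, then high byte."""
    low = system_id & 0xFF
    high = (system_id >> 8) & 0xFF
    return f"AA-00-04-00-{low:02X}-{high:02X}"


if __name__ == "__main__":
    area, node = decnet_address(4345)
    print(f"SCSSYSTEMID 4345 is DECnet address {area}.{node}")
    print("Station address:", phase_iv_mac(4345))
```

By this arithmetic 4345 splits into area 4, node 249, since 4 * 1024 + 249 = 4345. If SCSSYSTEMID on a node does not match the value computed from its DECnet address, that is the kind of mismatch described above.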

13 years, 10 months


DS10 fucked - 3 beeps. Any ideas?

by sampsa＠mac.com

Hi, So I just had my DS10 perfectly configured but on its side, and it fell over. Now it just gives me 3 beeps. What do I do, disassemble the whole thing and reseat the RAM or what?

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by hvlems＠zonnet.nl

My apologies for the confusion Peter. But you're right to assume I meant the cl.gr.nr.
And yes, there is obviously something wrong here.
IF an autogen was done I'd say have a look at modparams.dat
- alloclass must be unique for each node
- same for tapealloclass

And check the cluster license. AFAIK the cluster license must be unique for each node. Or one license with 0 units and in that case make sure all nodenames are mentioned in the /INCLUDE list, again on all nodes where the license was loaded.

PS I'm travelling now so can't check my mail too often

Hans

-----Original Message-----
From: Peter Coghlan <HECNET at beyondthepale.ie>
Sender: owner-hecnet at Update.UU.SE
Date: Fri, 16 Sep 2011 12:52:13
To: <hecnet at Update.UU.SE>
Reply-To: hecnet at Update.UU.SE
Subject: Re: Fwd: Re: [HECnet] More clustering fun

>>> 2) The cluster id and cluster password are different on both nodes.
>>
>> What's the cluster id?
>
> The cluster id is 4.

Sorry - I should have said "What do you mean by the cluster id?" I guess you mean the cluster group number but I wanted to be sure we weren't referring to two different things.

> I've confirmed that the authorize file is the same on both the alpha and vax nodes:
>
> $$ diff dsa0:[sys0.syscommon.sysexe]cluster_authorize.dat -
> _$$ $4$dkb400:[sys0.syscommon.sysexe]cluster_authorize.dat
> Number of difference sections found: 0
> Number of difference records found: 0
>
> DIFFERENCES /IGNORE=()/MERGED=1-
>     DSA0:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1-
>     $4$DKB400:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1

[snip]

> It's weird that by changing the VOTES to 0 on the satellite and running an AUTOGEN that this behaviour is experienced. Before that the satellite boots off the served disk no problem.

I doubt that changing VOTES to 0 was responsible. To check this, you could set VOTES back to 1 (probably manually using SYSGEN) and test again.

(A problem with running AUTOGEN for the first time is that it finds anything that you didn't know was lurking in MODPARAMS.DAT and acts on it.)

> Indeed, the 'master' disk which is a local drive in the VAXstation boots into the cluster no problem, but again, with VOTES = 1.

Check that there are no files called SYS$SPECIFIC:[SYSEXE]CLUSTER_AUTHORIZE.DAT on either node. If you find any, delete them. You may have to reboot both nodes in order to get them to read the correct CLUSTER_AUTHORIZE.DAT files after copying them or deleting errant ones.

Regards,
Peter Coghlan.

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by HECNET＠beyondthepale.ie

>>> 2) The cluster id and cluster password are different on both nodes.
>>
>> What's the cluster id?
>
> The cluster id is 4.

Sorry - I should have said "What do you mean by the cluster id?" I guess you mean the cluster group number but I wanted to be sure we weren't referring to two different things.

> I've confirmed that the authorize file is the same on both the alpha and vax nodes:
>
> $$ diff dsa0:[sys0.syscommon.sysexe]cluster_authorize.dat -
> _$$ $4$dkb400:[sys0.syscommon.sysexe]cluster_authorize.dat
> Number of difference sections found: 0
> Number of difference records found: 0
>
> DIFFERENCES /IGNORE=()/MERGED=1-
>     DSA0:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1-
>     $4$DKB400:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1

[snip]

> It's weird that by changing the VOTES to 0 on the satellite and running an AUTOGEN that this behaviour is experienced. Before that the satellite boots off the served disk no problem.

I doubt that changing VOTES to 0 was responsible. To check this, you could set VOTES back to 1 (probably manually using SYSGEN) and test again.

(A problem with running AUTOGEN for the first time is that it finds anything that you didn't know was lurking in MODPARAMS.DAT and acts on it.)

> Indeed, the 'master' disk which is a local drive in the VAXstation boots into the cluster no problem, but again, with VOTES = 1.

Check that there are no files called SYS$SPECIFIC:[SYSEXE]CLUSTER_AUTHORIZE.DAT on either node. If you find any, delete them. You may have to reboot both nodes in order to get them to read the correct CLUSTER_AUTHORIZE.DAT files after copying them or deleting errant ones.

Regards,
Peter Coghlan.

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by mark＠wickensonline.co.uk

On Fri, 16 Sep 2011, Peter Coghlan wrote:
>> 2) The cluster id and cluster password are different on both nodes.
>
> What's the cluster id?

The cluster id is 4.

I've confirmed that the authorize file is the same on both the alpha and vax nodes:

$$ diff dsa0:[sys0.syscommon.sysexe]cluster_authorize.dat -
_$$ $4$dkb400:[sys0.syscommon.sysexe]cluster_authorize.dat
Number of difference sections found: 0
Number of difference records found: 0

DIFFERENCES /IGNORE=()/MERGED=1-
    DSA0:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1-
    $4$DKB400:[SYS0.SYSCOMMON.SYSEXE]CLUSTER_AUTHORIZE.DAT;1

> The cluster group number and the cluster password must be the same on all nodes in the cluster. The easiest way to achieve this is to copy sys$common:[sysexe]cluster_authorize.dat from one to the other.

I set the right values when I ran CLUSTER_CONFIG_LAN.COM on the VAX.

> If the cluster group numbers are different, you will end up trying to form two different clusters. If the passwords are different, the results will probably be something like you are seeing, nodes failing to completely join the cluster. There may also be some messages in the operator.log

It's weird that by changing the VOTES to 0 on the satellite and running an AUTOGEN that this behaviour is experienced. Before that the satellite boots off the served disk no problem.

Indeed, the 'master' disk which is a local drive in the VAXstation boots into the cluster no problem, but again, with VOTES = 1.

Regards, Mark

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by HECNET＠beyondthepale.ie

> 2) The cluster id and cluster password are different on both nodes.

What's the cluster id?

The cluster group number and the cluster password must be the same on all nodes in the cluster. The easiest way to achieve this is to copy sys$common:[sysexe]cluster_authorize.dat from one to the other.

If the cluster group numbers are different, you will end up trying to form two different clusters. If the passwords are different, the results will probably be something like you are seeing, nodes failing to completely join the cluster. There may also be some messages in the operator.log.

Regards,
Peter.

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by mark＠wickensonline.co.uk

On 16/09/11 11:22, hvlems at zonnet.nl wrote:
> Did you adjust EXPECTED_VOTES on the alpha server?
>
> [snip]

The EXPECTED_VOTES on the ALPHA is set to 1 - which I assume is the value we would anticipate would work?

Even when both the server and the satellite have VOTES set to 1 the expected votes remains at 1 on the server, although if you examine the cluster via SHOW CLUSTER it shows that the quorum is set to 2:

View of Cluster from system ID 4345  node: SLAVE            16-SEP-2011 11:21:15
+--------------------------------------------------------+---------+
|                        SYSTEMS                         | MEMBERS |
+--------+--------------------------------+--------------+---------+
|  NODE  |            HW_TYPE             |   SOFTWARE   | STATUS  |
+--------+--------------------------------+--------------+---------+
| SLAVE  | AlphaServer 1000A 5/300        | VMS V8.3     | MEMBER  |
| ALEPH  | VAXstation 4000-VLC            | VMS V7.3     | MEMBER  |
+--------+--------------------------------+--------------+---------+
+-------------------------------------------------------------------------------
|                                      CLUSTER
+--------+-----------+----------+------------+-------------------+--------------
| CL_EXP | CL_QUORUM | CL_VOTES | CL_MEMBERS |      FORMED       |  LAST_TRANSIT
+--------+-----------+----------+------------+-------------------+--------------
|      2 |         2 |        2 |          2 | 16-SEP-2011 11:04 | 16-SEP-2011 1
+--------+-----------+----------+------------+-------------------+--------------

With the satellite's VOTES set to 0 (which causes the endless %CNXMAN messages), if I turn off the satellite at that point I get the following:

$$
%CNXMAN, Quorum lost, blocking activity
$$
%CNXMAN, Timed-out lost connection to system ALEPH
%CNXMAN, Proposing reconfiguration of the VMScluster
%CNXMAN, Discovered system ALEPH
%CNXMAN, Removed from VMScluster system ALEPH
%CNXMAN, Completing VMScluster state transition
%CNXMAN, Established connection to system ALEPH

and the server hangs. I end up having to reboot the server, because the satellite never joins the cluster successfully.

It's all fun!

Regards, Mark.

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by hvlems＠zonnet.nl

Did you adjust EXPECTED_VOTES on the alpha server?

From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
Date: Fri, 16 Sep 2011 11:09:04 +0100
To: <hecnet at Update.UU.SE>
Reply-To: hecnet at Update.UU.SE
Subject: Fwd: Re: [HECnet] More clustering fun

Just to add to the picture - if I reduce VOTES on the satellite to 0 I get this happening:

[snip]

13 years, 10 months


Fwd: Re: [HECnet] More clustering fun

by mark＠wickensonline.co.uk

Just to add to the picture - if I reduce VOTES on the satellite to 0 I get this happening:

-------- Original Message --------
Subject: Re: [HECnet] More clustering fun
Date: Fri, 16 Sep 2011 09:33:47 +0000
From: hvlems at zonnet.nl
Reply-To: hvlems at zonnet.nl
To: Mark Wickens

All I can think of is this:
1) Both slave and aleph both use the same VMSCLUSTER license
2) The cluster id and cluster password are different on both nodes.
On an alpha you can modify this in sysman (use help in sysman to find the correct command). On a Vax the command is buried in sysgen.
Hans

-----Original Message-----
From: Mark Wickens
Date: Fri, 16 Sep 2011 10:11:41
Cc: <hvlems at zonnet.nl>
Subject: Re: [HECnet] More clustering fun

Hi Hans,

I didn't want to pre-empt what I thought happened last time as I wasn't sure I'd got it right, but it has happened again so it's definitely an issue.

I've updated the VOTES in ALEPH (the satellite) MODPARAMS.DAT, ran AUTOGEN and rebooted the cluster.

Now when ALEPH attempts to join the cluster I get these messages repeatedly:

%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE
%CNXMAN, sending VAXcluster membership request to system SLAVE

and I see this on SLAVE:

$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) received VMScluster membership request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:35.79 %%%%%%%%%%%
10:05:35.79 Node SLAVE (csid 00010001) completed VMScluster state transition
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) received VMScluster membership request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:39.04 %%%%%%%%%%%
10:05:39.04 Node SLAVE (csid 00010001) completed VMScluster state transition
$$
%CNXMAN, Received VMScluster membership request from system ALEPH
%CNXMAN, Proposing addition of system ALEPH
%CNXMAN, Completing VMScluster state transition
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) received VMScluster membership request from node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) proposed addition of node ALEPH
%%%%%%%%%%% OPCOM 16-SEP-2011 10:05:43.02 %%%%%%%%%%%
10:05:43.02 Node SLAVE (csid 00010001) completed VMScluster state transition

This just repeats forever.

Some more information from SLAVE (the ALPHA server):

SHOW CLUSTER:

View of Cluster from system ID 4345  node: SLAVE            16-SEP-2011 10:06:13
+--------------------------------------------------------+---------+
|                        SYSTEMS                         | MEMBERS |
+--------+--------------------------------+--------------+---------+
|  NODE  |            HW_TYPE             |   SOFTWARE   | STATUS  |
+--------+--------------------------------+--------------+---------+
| SLAVE  | AlphaServer 1000A 5/300        | VMS V8.3     | MEMBER  |
| ALEPH  | VAXstation 4000-VLC            | VMS V7.3     | NEW     |
+--------+--------------------------------+--------------+---------+
+------------------------------------------------------------------------------------+
|                                      CLUSTER                                       |
+--------+-----------+----------+------------+-------------------+-------------------+
| CL_EXP | CL_QUORUM | CL_VOTES | CL_MEMBERS |      FORMED       |  LAST_TRANSITION  |
+--------+-----------+----------+------------+-------------------+-------------------+
|      1 |         1 |        1 |          1 | 16-SEP-2011 09:56 | 16-SEP-2011 09:56 |
+--------+-----------+----------+------------+-------------------+-------------------+

SYSGEN> SHOW EXPECTED_VOTES
%CNXMAN, Completing VMScluster state transition
Parameter Name            Current    Default     Min.      Max.    Unit   Dynamic
--------------            -------    -------    -------   -------  ----   -------
EXPECTED_VOTES                  1          1          1       127  Votes

Any ideas why this is going wrong?

Thanks for the help, much appreciated,

Mark.

On 16/09/11 09:40, hvlems at zonnet.nl wrote:
> Regarding the alphaserver: check the value of expectedvotes in sysgen.
> In a cluster with non-voting satellites only, its value must be less than Votes+1
> Hans
> ------Original message------
> From: Mark Wickens
> Sender: owner-hecnet at Update.UU.SE
> To: hecnet at Update.UU.SE
> Reply-To: hecnet at Update.UU.SE
> Subject: [HECnet] More clustering fun
> Sent: 16 September 2011 10:14
>
> I've now refreshed the VAX satellite's system drive and installed it in the
> ALPHA server. The one problem I have remaining is that the VOTES the
> satellite is contributing to the cluster is 1. I believe for a proper
> satellite this should be 0.
>
> Is this a case of updating the MODPARAMS.DAT on the satellite and autogen
> and reboot? Do I need to do anything with the ALPHA server's configuration?
>
> Presumably I will need to reboot the ALPHA server as well.
>
> Thanks for the help,
>
> Kind regards, Mark.

13 years, 10 months

More clustering fun

by HECNET＠beyondthepale.ie

> I've now refreshed the VAX satellite's system drive and installed it in the
> ALPHA server. The one problem I have remaining is that the VOTES the
> satellite is contributing to the cluster is 1. I believe for a proper
> satellite this should be 0.

The number of votes a node has determines what happens to it when it loses contact with other members of the cluster.

If each node in a two-node cluster has one vote, then the cluster quorum is two votes. If something happens to either node, the other notices that quorum has been lost and will hang until quorum is reestablished. The reason for the hang is that all each node knows is that it can't see the other node. It doesn't know whether the other has shut down or is still running and might become visible again shortly, in which case everything can resume.

If your diskless satellite should die unexpectedly, it is somewhat irritating if it also ends up hanging the other node for no good reason. If the satellite has lost contact with the node with the disks, then it can do nothing anyway. Hence the reason for recommending zero votes for diskless satellites. No other ill effects will result from leaving votes set to one.

Whatever the number of votes each node has, when shutting down a voting member of the cluster, REMOVE_NODE should be specified in order to avoid hanging the rest of the cluster after the shutdown completes. This specifies that the remaining cluster nodes should recompute quorum, taking into account the loss of the node being shut down.

Regards,
Peter Coghlan.
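
The quorum arithmetic described in this message can be sketched in a few lines of Python. This is only an illustration: it assumes the usual OpenVMS rule that quorum is (EXPECTED_VOTES + 2) / 2 rounded down, and the function names are made up for the example rather than taken from any VMS interface.

```python
def quorum(expected_votes: int) -> int:
    """Quorum under the assumed OpenVMS rule: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def has_quorum(visible_votes: int, expected_votes: int) -> bool:
    """A node keeps running only while the votes it can see reach quorum."""
    return visible_votes >= quorum(expected_votes)


# Two nodes with one vote each: quorum is 2, so losing either node
# leaves the survivor with 1 visible vote and it hangs.
two_voters_survivor_ok = has_quorum(visible_votes=1, expected_votes=2)

# Server with one vote plus a zero-vote satellite: quorum is 1, so the
# server carries on alone if the satellite disappears.
zero_vote_satellite_ok = has_quorum(visible_votes=1, expected_votes=1)

print(quorum(2), two_voters_survivor_ok)   # quorum and survivor state, two voters
print(quorum(1), zero_vote_satellite_ok)   # quorum and server state, 0-vote satellite
```

The second case matches the SHOW CLUSTER output quoted earlier in the thread, where CL_EXP, CL_QUORUM and CL_VOTES are all 1.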

13 years, 10 months

More clustering fun

by hvlems＠zonnet.nl

Regarding the AlphaServer: check the value of EXPECTED_VOTES in SYSGEN.
In a cluster with non-voting satellites only, its value must be less than VOTES+1.

Hans

------Original message------
From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] More clustering fun
Sent: 16 September 2011 10:14

I've now refreshed the VAX satellite's system drive and installed it in the ALPHA server. The one problem I have remaining is that the VOTES the satellite is contributing to the cluster is 1. I believe for a proper satellite this should be 0.

Is this a case of updating the MODPARAMS.DAT on the satellite and autogen and reboot? Do I need to do anything with the ALPHA server's configuration?

Presumably I will need to reboot the ALPHA server as well.

Thanks for the help,

Kind regards, Mark.

13 years, 10 months

More clustering fun

by hvlems＠zonnet.nl

Correct. VOTES is not dynamic, so you have two options:

1) Modify MODPARAMS.DAT and run AUTOGEN with the reboot option.
2) Modify MODPARAMS.DAT
   Run SYSGEN to modify VOTES and WRITE CURRENT
   Reboot

There is a lazy third option: same as 2) but skip modifying MODPARAMS.DAT. Which is sloppy work, to put it mildly.

------Original message------
From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] More clustering fun
Sent: 16 September 2011 10:14

I've now refreshed the VAX satellite's system drive and installed it in the ALPHA server. The one problem I have remaining is that the VOTES the satellite is contributing to the cluster is 1. I believe for a proper satellite this should be 0.

Is this a case of updating the MODPARAMS.DAT on the satellite and autogen and reboot? Do I need to do anything with the ALPHA server's configuration?

Presumably I will need to reboot the ALPHA server as well.

Thanks for the help,

Kind regards, Mark.

13 years, 10 months

More clustering fun

by mark＠wickensonline.co.uk

I've now refreshed the VAX satellite's system drive and installed it in the ALPHA server. The one problem I have remaining is that the VOTES the satellite is contributing to the cluster is 1. I believe for a proper satellite this should be 0.

Is this a case of updating the MODPARAMS.DAT on the satellite and autogen and reboot? Do I need to do anything with the ALPHA server's configuration?

Presumably I will need to reboot the ALPHA server as well.

Thanks for the help,

Kind regards, Mark.

13 years, 10 months

Satellite configuration/NCP crash

by hvlems＠zonnet.nl

I guess so, but remember that a SCSI cluster was never possible with a VAX! The reason for the behaviour is that disk data and cluster data follow different paths in a SCSI cluster. Which isn't possible on VAXes (I think). Then again, it may be a bug in 8.3, of course!

------Original message------
From: Peter Coghlan
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: Re: [HECnet] Satellite configuration/NCP crash
Sent: 16 September 2011 00:37

> It should happen before disk access across the CI bus is even possible.

I think we'll have to agree to disagree on this one :-)

I've just booted my VAXstation 2000 into an NI cluster with my VAXstation 3100. I gave the commands:

  NCP SET CIRCUIT SVA-0 STATE OFF
  NCP SET CIRCUIT SVA-0 STATE ON

on each node. The only effects were an OPCOM message complaining about adjacency down and the loss of my DECnet connections. Both nodes continued to function quite happily while the circuit was off. No crash.

Anyway, it looks like avoiding issuing the SET CIRCUIT STATE OFF command by making the changes to the permanent database and rebooting was a good way of sidestepping the problem. Hopefully the bug is only triggered when SET CIRCUIT STATE OFF is issued in Mark's setup.

Apologies to gerry77. I missed your reply earlier and ended up repeating much of what you said.

Regards,
Peter Coghlan.

13 years, 10 months

Another thought regarding dynamic bridge setup

by bqt＠softjar.se

On 2011-09-16 00:30, Mark Wickens wrote:
> Just thinking aloud,
>
> Can I run a bridge between the dongle-based network and my own home network?
>
> If the bridge program on my network is wrapped with a script which checks the dyndns hostname periodically against the recorded IP address, can I not reconfigure the bridge on my home network and restart it if required?
>
> If this is possible it keeps my configuration in a realm I've got half a chance of working out myself and saves me from hassling anyone else.
>
> Presumably at home I would need to run two bridges on different ports - with the current bridge linking back to Johnny as normal.

Short answer: yes.

Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt at softjar.se          ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

13 years, 10 months

Another thought regarding dynamic bridge setup

by jeep＠scshome.net

Mark,

Forget the bridge. Run Multinet at the event and either connect to your end via Multinet (you are already running that), or connect to mine. Either way it will just work. I personally would use the dedicated events area on the machines at the event and be done with it!

I can either supply you with my code for your end, or you can connect to my end. Your choice. In the second scenario, all I have to do is add one (1) entry in the existing database and off we go!

-Steve

-----Original Message-----
From: owner-hecnet at Update.UU.SE [mailto:owner-hecnet at Update.UU.SE] On Behalf Of Mark Wickens
Sent: Thursday, September 15, 2011 18:30
To: hecnet at Update.UU.SE
Subject: [HECnet] Another thought regarding dynamic bridge setup

Just thinking aloud,

Can I run a bridge between the dongle-based network and my own home network?

If the bridge program on my network is wrapped with a script which checks the dyndns hostname periodically against the recorded IP address, can I not reconfigure the bridge on my home network and restart it if required?

If this is possible it keeps my configuration in a realm I've got half a chance of working out myself and saves me from hassling anyone else.

Presumably at home I would need to run two bridges on different ports - with the current bridge linking back to Johnny as normal.

Regards, Mark

13 years, 10 months

Satellite configuration/NCP crash

by HECNET＠beyondthepale.ie

> It should happen before disk access across the CI bus is even possible.

I think we'll have to agree to disagree on this one :-)

I've just booted my VAXstation 2000 into an NI cluster with my VAXstation 3100. I gave the commands:

  NCP SET CIRCUIT SVA-0 STATE OFF
  NCP SET CIRCUIT SVA-0 STATE ON

on each node. The only effects were an OPCOM message complaining about adjacency down and the loss of my DECnet connections. Both nodes continued to function quite happily while the circuit was off. No crash.

Anyway, it looks like avoiding issuing the SET CIRCUIT STATE OFF command by making the changes to the permanent database and rebooting was a good way of sidestepping the problem. Hopefully the bug is only triggered when SET CIRCUIT STATE OFF is issued in Mark's setup.

Apologies to gerry77. I missed your reply earlier and ended up repeating much of what you said.

Regards,
Peter Coghlan.

13 years, 10 months

Alternatives to bridge when running via a dynamic IP address

by jeep＠scshome.net

Actually, who cares. All of this happens at this end. I can tell this end to do the lookup every second if I wish. In reality it is usually done in some number of minutes. As for the reboot, that's why it is a dedicated machine. It can reboot as often as it needs to - I don't depend on it except to provide connectivity to this site from remote sites that have dynamic IP addresses. The actual reboot is only a few minutes. This entire process is automated and it is able to inform via email (delayed using NMAIL) anyone necessary that there was a change in state (address).

-Steve

-----Original Message-----
From: owner-hecnet at Update.UU.SE [mailto:owner-hecnet at Update.UU.SE] On Behalf Of Johnny Billquist
Sent: Thursday, September 15, 2011 17:46
To: hecnet at Update.UU.SE
Subject: Re: [HECnet] Alternatives to bridge when running via a dynamic IP address

The question is how long does it take to detect, and is a reboot every time acceptable?

Johnny

On 2011-09-15 23:40, Steve Davidson wrote:

This issue has been dealt with here. The declab.net site runs a dedicated VAX with Multinet that polls the other end at predetermined intervals. When the other end's address changes, this end makes the change to the Multinet database and reboots. This software has been in use for well over a year and appears to work very well. It does require the use of DynDNS at the end that changes addresses so that this end can do the lookup.

-Steve

-----Original Message-----
From: owner-hecnet at Update.UU.SE [mailto:owner-hecnet at Update.UU.SE] On Behalf Of Mark Wickens
Sent: Thursday, September 15, 2011 17:14
To: hecnet at Update.UU.SE
Subject: Re: [HECnet] Alternatives to bridge when running via a dynamic IP address

On 15/09/11 22:03, hvlems at zonnet.nl wrote:

Won't the IP address remain unchanged as long as you keep the router up and running? Or would that be too costly?

------Original message------
From: Mark Wickens
Sender: owner-hecnet at Update.UU.SE
To: hecnet at Update.UU.SE
Reply-To: hecnet at Update.UU.SE
Subject: [HECnet] Alternatives to bridge when running via a dynamic IP address
Sent: 15 September 2011 22:53

Guys,

Might as well explore this as an option. The 3G network enabled Vigor 2800 router that I will be using at the DEC Legacy event appears to be working reasonably well, but unsurprisingly there does appear to be some fluctuation in the IP address. This means that an alternative to Johnny's bridge program might be called for.

I have a dynamic DNS reference set up and now updating periodically, hecnet.no-ip.org. The question is what can I use as an alternative to the bridge program which will cope with IP address changes? I seem to remember Multinet tunnels being mentioned in the past; do they support dyndns, or any other technology?

Thanks for the help, Mark.

Unfortunately not. I've seen three IP address changes in about the same number of hours. I don't know whether this poor performance is down to the completely haphazard location of the dongle, or whether it's always this rubbish but you don't realise when all you're doing is browsing the internet. My experience of using a 3G dongle for anything other than web browsing is very limited!

Regards
Mark.

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt at softjar.se          ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

13 years, 10 months

Another thought regarding dynamic bridge setup

by mark＠wickensonline.co.uk

Just thinking aloud,

Can I run a bridge between the dongle-based network and my own home network?

If the bridge program on my network is wrapped with a script which checks the dyndns hostname periodically against the recorded IP address, can I not reconfigure the bridge on my home network and restart it if required?

If this is possible it keeps my configuration in a realm I've got half a chance of working out myself and saves me from hassling anyone else.

Presumably at home I would need to run two bridges on different ports - with the current bridge linking back to Johnny as normal.

Regards, Mark
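
A wrapper script of the kind described here might look roughly like the Python sketch below. It is only an outline under stated assumptions: `restart_bridge` is a hypothetical callback (how the bridge configuration is rewritten and the program restarted is not covered in the thread), and the 60-second interval is an arbitrary choice. Only the standard-library `socket.gethostbyname` lookup is relied on.

```python
import socket
import time
from typing import Callable, Optional


def check_once(
    hostname: str,
    last_ip: Optional[str],
    restart_bridge: Callable[[str], None],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Resolve hostname once; call restart_bridge(new_ip) if the address changed.

    Returns the address to remember for the next check.
    """
    try:
        current_ip = resolve(hostname)
    except OSError:
        # Lookup failed (socket.gaierror is an OSError): keep the old
        # address and try again on the next pass.
        return last_ip
    if current_ip != last_ip:
        # Hypothetical hook: rewrite the bridge config and restart it.
        restart_bridge(current_ip)
    return current_ip


def watch(
    hostname: str,
    restart_bridge: Callable[[str], None],
    interval_seconds: float = 60.0,
) -> None:
    """Poll forever. The first successful lookup also triggers restart_bridge."""
    last_ip: Optional[str] = None
    while True:
        last_ip = check_once(hostname, last_ip, restart_bridge)
        time.sleep(interval_seconds)
```

For example, `watch("hecnet.no-ip.org", my_restart)` would poll the dynamic DNS name mentioned elsewhere in this thread, with `my_restart` standing in for whatever actually reconfigures and restarts the bridge.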

13 years, 10 months
