I was merging some of MRC's late changes into my version of PANDA. A
(very) few of these were not documented with a change number, so I was
caught by surprise. While doing this, I stumbled over some
defensive coding that he had put in to keep even an enabled WHEEL or
OPERATOR from doing something /Really/ /Bad/.
I got to wondering about this, which led me to think about the use of
the OPERATOR id, which is analogous to the [1,2] PPN under Tops-10 and
perhaps to having administrator rights under Windows; not as bad as root.
Tops-20, like most mainframe operating systems of the day, assumed a
trained operations staff (as in trained to "DON'T TOUCH") and a trained
systems programming staff. However, this all-or-nothing approach
eventually became more granular with the implementation of SEMI-OPR
capability. I had implemented this at Columbia several years earlier
with SC%LOP, or Little Operator capability (this was known colloquially
as l'opr). We had a number of training issues with our operators (some
of them literally could not read), so this allowed them to use Galaxy to
mount tapes and paper forms, but nothing more. PANDA provides
administrator capability (SC%ADM), which allows a certain amount of file
system administration.
The PANDA access control job (ACJ) already puts certain restrictions on
OPERATOR, mainly in the area of login. It occurred to me that the above
assumptions concerning training simply are not true in the current
hobbyist universe. You can actually remove OPERATOR (SC%OPR) capability
from the OPERATOR id, which will break a rather large number of things.
As a matter of fact, you can delete the OPERATOR id entirely. I'm not
sure you could complete boot up into a sensible state after that; I
guess you'd need to turn on debugging and go directly to KDDT to then
recreate the id by hand. That's training; maybe Monitor Internals level
training...
And of course, even trained individuals can do dumb things... (GUILTY!)
So I modified the ACJ further to implement the following policies on a
CRDIR% of OPERATOR:
1. May not delete OPERATOR.
2. May not remove SC%OPR from OPERATOR.
3. May not set OPERATOR not login-able
* FILES-ONLY
* INVALID
* CHARGE-LIMITED
4. May not set OPERATOR FTP-ONLY (no way to get to the EXEC, the normal
time-sharing user interface)
5. May not set 'junk' (i.e., undocumented) bits in OPERATOR directory
protection word
6. May not set OPERATOR directory protection to allow world read, write
or connect
7. May not clear the OPERATOR password (set to NUL)
If you're winning enough to defeat these restrictions, then maybe you're
winning enough to understand the ramifications of doing them.
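The seven policies above can be pictured as a simple table of checks. This is only a sketch in Python, not the actual ACJ code (which is not Python): the `CrdirRequest` class, its field names and `violated_policies` are all hypothetical stand-ins for whatever the real CRDIR% argument block carries.

```python
from dataclasses import dataclass


@dataclass
class CrdirRequest:
    """Hypothetical, simplified view of a CRDIR% request (not the real argument block)."""
    directory: str
    delete: bool = False
    keeps_sc_opr: bool = True
    files_only: bool = False
    invalid: bool = False
    charge_limited: bool = False
    ftp_only: bool = False
    junk_protection_bits: bool = False
    world_read_write_or_connect: bool = False
    clears_password: bool = False


def violated_policies(req):
    """Return the numbers (1-7, as listed above) of the policies the request breaks."""
    if req.directory != "OPERATOR":
        return []  # the policies only guard the OPERATOR directory
    checks = [
        (1, req.delete),
        (2, not req.keeps_sc_opr),
        (3, req.files_only or req.invalid or req.charge_limited),
        (4, req.ftp_only),
        (5, req.junk_protection_bits),
        (6, req.world_read_write_or_connect),
        (7, req.clears_password),
    ]
    return [number for number, broken in checks if broken]
```

In this sketch a request is allowed only when the returned list is empty; a real ACJ would presumably deny the CRDIR% and log the attempt.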
Comments for other DEC OSes? I don't remember what RSTS, RSX and VMS do.
After working on the DTEKPA issue and coming up with a workaround, I
went searching to see if the problem had ever been reported. In fact,
it had been. By me. ...Well over a decade ago...
Basically, the default behavior for Tops-20 is to note that a front-end
counter isn't incrementing and--if on the next check it still hasn't
changed--to declare the associated PDP-11 front end down and to initiate
a reboot. This gives the 11 about a millisecond to get its act together
before the KL whacks it.
Of course, that will never work in KLH10 (see below and previous);
you're hung because the KLH10 DTE emulator doesn't implement code to
simulate a reboot action, so the KL loops forever looking for the
response. Of course, why would it? There should be no reason for
Tops-20 to ever think it's down. Seeing as I like writing assembler
more than C, I decided not to tweak the KLH10 code (I have tweaked it
for other cases and might rethink this).
The workaround is to define some additional functions for the BOOT% JSYS
to set some variables in resident storage in STG and modify DTESRV to
use them. You can now set an elapsed time to wait before declaring the
PDP-11 down. I default to five minutes of non-incrementing keep-alive.
Depending on how hard I am beating on things, the front end 'appears' to
go away for anywhere from 5 to 15 seconds.
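To make the difference between the default check and the workaround concrete, here is a rough model in Python. It is not the DTESRV code (that is monitor assembler); the class name `KeepAliveWatch`, its fields and the idea of passing in the time explicitly are my own assumptions for illustration.

```python
class KeepAliveWatch:
    """Declare the front end down once its counter has stayed unchanged for grace_seconds.

    The grace period is measured from the first check that sees no change.
    With grace_seconds=0 this reduces to the default behavior: the second
    consecutive check that sees no change declares the front end down.
    """

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.last_count = None
        self.stuck_since = None

    def check(self, count, now):
        """Return True if the front end should be declared down at time `now`."""
        if count != self.last_count:
            # The counter moved (or this is the first sample): all is well.
            self.last_count = count
            self.stuck_since = None
            return False
        if self.stuck_since is None:
            self.stuck_since = now  # first check that saw no change
        return now - self.stuck_since >= self.grace_seconds
```

With a grace period of 300 seconds, a stall of 5 to 15 seconds never gets near the threshold, which is the whole point of the workaround.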
I think probably the real fix is to not depend on an OS interrupt to
increment the counter. A thread should be spawned which uses
nanosleep() to bump the counter every 500 microseconds, no matter what
the rest of KLH10 might be doing. KLH10 is already using multiple forks
for the disks, tape, NI, etc. (one reason I've preferred it over SimH),
so maybe this won't be a big deal.
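The shape of such a ticker thread might look like the following. KLH10 is written in C, so this Python version is only an illustration of the idea: a timed wait on an event stands in for nanosleep(), and `KeepAliveTicker` and `run_for` are made-up names. Actual sleep granularity depends on the host, so the 500 microsecond interval is a target, not a guarantee.

```python
import threading
import time


class KeepAliveTicker:
    """Bump a counter on its own thread, independent of whatever the main loop is doing."""

    def __init__(self, interval=0.0005):  # 500 microseconds
        self.interval = interval
        self.count = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout, so the loop ticks once per
        # interval and exits promptly once stop() sets the event.
        while not self._stop.wait(self.interval):
            self.count += 1  # only this thread ever writes the counter

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def run_for(seconds):
    """Run a ticker for roughly `seconds` and return how many times it ticked."""
    ticker = KeepAliveTicker()
    ticker.start()
    time.sleep(seconds)
    ticker.stop()
    return ticker.count
```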
> ------------------------------------------------------------------------
> *From*: Mark Crispin <MRC at Lingling.Panda.COM>
> *Subject*: Re: KLH10 front-end reload??
> *To*: Thomas DeBellis <slogin at acedsl.com>
> *Date*: Sun, 29 Nov 2009 11:20:46 -0800 (PST)
> *In-Reply-To*: <4B11CFEA.4020906 at acedsl.com>
> *Message-ID*: <alpine.OSX.2.00.0911291100070.245 at hsinghsing.panda.com>
>
> KLH10 implements enough for the front end DTE protocol for TOPS-20 to
> think that it is talking to a front end, albeit one with just a CTY
> (no KLINIK, DL11 lines, or DECnet).
>
> There is a keepalive timer in both TOPS-20 (to reboot the front end
> when the front end crashes) and in RSX-11F (to reboot TOPS-20 when it
> crashes).
>
> The front end also keeps time well enough to set TOPS-20's clock
> following a crash-reboot; this is superseded in KLH10 as the timebase
> instructions get the time from the host OS. In addition, Panda
> monitors try to run a program called TIMCHK which will synchronize
> with NTP servers.
>
> So, what happened was that the DTE protocol stopped for some reason.
> TOPS-20 tried to reboot the front end in an attempt to get it going,
> but of course that was futile.
>
> Here's something that may help:
>
> On some Linux systems the esoteric real-time interrupt mechanisms in
> KLH10 don't work well. So, it may be necessary to set
> KLH10_ITIME_SYNC instead of the default KLH10_ITIME_INTRP. Note that
> doing so will make KLH10 burn much more CPU on the host system.
>
> Usually, though, if you need to do this, it becomes pretty obvious at
> once, with nasty DTE errors from KLH10 shortly after booting (and any
> time you type on the CTY).
>
> One reason why I haven't upgraded Lingling's host CPU is that most of
> the newer machines that I've run KLH10 on have required doing this.
> It's quite annoying.
>
>> ------------------------------------------------------------------------
>> *From*: Thomas DeBellis <slogin at acedsl.com>
>> *Subject*: KLH10 front-end reload??
>> *To*: Tops-20 Wizards <TOPS-20 at lingling.panda.com>
>> *Date*: Sat, 28 Nov 2009 20:35:38 -0500
>> *Message-ID*: <4B11CFEA.4020906 at acedsl.com>
>>
>> Tommy Timesharing hung earlier today; it had been up over 175 days.
>> I got an error around 1:27PM-EST that the front end had hung and was
>> rebooted. By the time I noticed at 4:08, the system was completely
>> wedged.
>>
>> I couldn't get in on the CTY, but KLH10 appeared to be working.
>> However, in the process of poking around, I completely destroyed some
>> information, so I am unable to determine exactly what was going on.
>> Sigh...
>>
>> As I had been up since early June (and the middle of March before
>> that because of a power failure), this does not appear to be of
>> immediate concern. The system had not had a single issue during all
>> this time (not even a BUGINF).
>>
>> However ... Ideas, anyone? Should I think about getting nervous? I
>> mean, there IS no front-end on KLH10, right?
>> ________________________________________________________________________
>>
>> ************************************************************************
>> TOPS-20 BUGHLT-BUGCHK
>>  Logged on Sat 28 Nov 2009 13:27:08      Monitor uptime was 175 days 19:54:26
>>     Detected on system # 3699.
>>     Record sequence number:    17527.
>> ************************************************************************
>>
>> Error information:
>>     Date/Time of error:    Sat 28 Nov 2009 13:27:05
>>     Errors since reload:   1.
>>     Fork # & Job #:        777777,777777
>>     User's logged in dir:  unknown
>>     Program name:
>>     Error:                 BUGINF
>>     Address of error:      1137031
>>     Name:                  DTEKPA
>>     Description:           DTE keep alive fail
>>     CONI APR:              007740,,000003 = No error bits detected
>>     CONI PAG:              000000,,660151
>>     DATAI PAG:             700101,,002750
>>     Contents of ACs:
>>              0:    000000,,575700
>>              1:    777777,,000000
>>              2:    000000,,000000
>>              3:    000000,,277242
>>              4:    000100,,206260
>>              5:    000000,,247445
>>              6:    000000,,000000
>>              7:    000000,,000000
>>             10:    777775,,000002
>>             11:    000000,,000000
>>             12:    000000,,614101
>>             13:    777772,,000012
>>             14:    777777,,777650
>>             15:    777305,,353304
>>             16:    620012,,000000
>>             17:    777115,,246540
>>     PI status:             000000,,000175
>>     Additional data items: 1
>>                 000000,,000000
>>
>>     ERA:                   000000,,000000 = word #0 Memory read
>>     Base phyiscal memory
>>      address at failure:   0
>>
>> ************************************************************************
>> FRONT END RELOADED
>>  Logged on Sat 28 Nov 2009 13:28:04      Monitor uptime was 175 days 19:55:21
>>     Detected on system # 3699.
>>     Record sequence number:    17528.
>> ************************************************************************
>>     CPU # :,,Front end #:    0,0
>>     Status at reload:     No error bits detected
>>     Retries:    3
>>     Filename for DUMP: <SYSTEM>0DMP11.BIN.1,28-Nov-2009 13:27:05
Hey. Update is virtually exhibiting at the VCF Berlin right now. As a
part of that, there is a tour of Update's stuff and computers, which
people might find amusing.
Among other things, you can see Magica (1.1) (a PDP-11/70) for a while,
up and running on HECnet. You can see the video at
https://wiki.vcfb.de/2020/start?id=en:update
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt at softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
Hello Keith,
Indeed, there is a way to make the white Alphaservers run VMS. The difference between the servers themselves is minimal, but sometimes the adapters and peripherals in the white Alphas are different and don't work with VMS. Therefore you should check which adapters there are in the box. Examples of ones that don't work are Adaptec SCSI adapters and various graphics adapters.
You need to enable the SRM firmware instead of the AlphaBIOS and then create a short script which is run at boot time.
If you want to do it, I can send you the detailed instructions.
Kari
-------- Original message --------
From: Keith Halewood <Keith.Halewood at pitbulluk.org>
Date: 10/8/20 14:00 (GMT+02:00)
To: hecnet at Update.UU.SE
Subject: [HECnet] VMS on AlphaServer 5000
Hi,
I've trawled through HECnet emails for enlightenment around alpha systems running VMS and I believe I'm not repeating the question. So here goes.
There's a DEC Alphaserver 5000 (white case) (EV56) floating a few miles away from where I live. It's running NT, so there's an AlphaBIOS. Is there a 'hidden' SRM in that machine and therefore is it able to run current (HPE at least) versions of VMS? I don't want to end up with a machine running NT - my life has enough horror in it already. I had a look on HP's FTP site at firmware updates and the Alphaserver 5000 isn't even mentioned, or at least not in plain text.
I nearly bought a reasonably loaded Itanic RX2600 a few days ago but chickened out at the last minute. What I really should do is abandon all of this and buy a new motorbike.
Keith
Hi,
I've trawled through HECnet emails for enlightenment around alpha systems running VMS and I believe I'm not repeating the question. So here goes.
There's a DEC Alphaserver 5000 (white case) (EV56) floating a few miles away from where I live. It's running NT, so there's an AlphaBIOS. Is there a 'hidden' SRM in that machine and therefore is it able to run current (HPE at least) versions of VMS? I don't want to end up with a machine running NT - my life has enough horror in it already. I had a look on HP's FTP site at firmware updates and the Alphaserver 5000 isn't even mentioned, or at least not in plain text.
I nearly bought a reasonably loaded Itanic RX2600 a few days ago but chickened out at the last minute. What I really should do is abandon all of this and buy a new motorbike.
Keith
Gentlepeople,
I committed rev 560, which may be of interest to a number of you.
It's basically a change of internal machinery, no functional changes. The main ones are:
1. Restructured the point to point (DDCMP and Multinet) implementations. Error recovery is much cleaner now and should be faster. Retry for persistent problems has holdoff on it that starts out pretty fast and slows down to (typically) one retry per two minutes.
2. Ethernet in PCAP mode now uses the PCAP filter mechanism to request only the frames we want. This means the Python code no longer spends a bunch of CPU time tossing away non-DECnet frames. If your machine runs other traffic in significant quantities, as mine does, this can be quite useful.
3. Improved the performance of packet logging.
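For readers wondering what the retry holdoff in item 1 looks like in practice, here is a generic sketch. It is not taken from the PyDECnet source: the one-second starting delay and the doubling factor are assumptions chosen for illustration, and only the two-minute ceiling comes from the description above.

```python
from itertools import islice


def holdoff_delays(initial=1.0, factor=2.0, ceiling=120.0):
    """Yield retry delays in seconds: quick at first, growing until they reach the ceiling.

    `initial` and `factor` are illustrative guesses; `ceiling` is the
    "one retry per two minutes" steady state.
    """
    delay = initial
    while True:
        yield min(delay, ceiling)
        delay = min(delay * factor, ceiling)


# The first few delays a persistently failing circuit would see.
first_delays = list(islice(holdoff_delays(), 9))
```

A successful connection would simply start a fresh generator, so the next failure is retried quickly again.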
This code has been running on PYTHON for several days now without problems (except for one bug that has been fixed). You're invited to give it a try; as always, problem reports or other comments are welcome.
paul
Hi, guys!
I'm trying to boot the latest SIMH (4.0-current) as an LAVC satellite
node over a link with a latency of about 100ms.
The boot node is VAX/VMS 5.5.2 with the latest PEDRIVER ECO.
Result: the MOP part of the boot sequence works without a hitch, but the
SCS part fails miserably.
The most frequent result:
The SIMH console fills with 10 x " %VAXcluster-W-NOCONN, No connection to disk
server " messages, then halts with
"%VAXcluster-F-CTRLERR, boot driver virtual circuit to SCSSYSTEM 0000
Failed"
Sometimes it gets a little further:
...
%VAXcluster-W-RETRY, Attempting to reconnect to a disk server
%VAXcluster-W-NOCONN, No connection to disk server
%VAXcluster-W-RETRY, Attempting to reconnect to a disk server
%VAXcluster-W-NOCONN, No connection to disk server VULCAN
%VAXcluster-W-RETRY, Attempting to reconnect to a disk server
%VAXcluster-W-NOCONN, No connection to disk server
%VAXcluster-W-RETRY, Attempting to reconnect to a disk server
%VAXcluster-I-CONN, Connected to disk server VULCAN
%VAXcluster-W-NOCONN, No connection to disk server VULCAN
%VAXcluster-W-RETRY, Attempting to reconnect to a disk server
...
It then halts after a minute or so of filling the console with those messages.
Whenever I set up throttling in SIMH at 2500K ops/s, the node boots
successfully, joins the cluster successfully and works flawlessly, but
slowly.
The boot process takes about half an hour. After boot, changing the
throttle value to 3500K ops/s still works.
Increasing the throttle value further breaks the system, with the same
messages about the disk server.
Throttled SIMH performance is about 5 VUPs.
The only information about maximum channel latency restrictions that I
found in the "Guidelines for OpenVMS Cluster Configurations" manual is:
"When an FDDI is used for OpenVMS Cluster communications, the ring latency
when the FDDI ring is idle should not exceed 400 ms."
So I suppose that a 100ms latency link should be good enough for booting
satellite nodes over it.
My understanding of the situation is that the combination of
PEDRIVER/[PEBTDRIVER within NISCS_LOAD] with fast hardware
and slow links is the primary reason for this behavior. Please correct me if
I'm wrong.
Does anyone have experience with booting VMS clusters over slow links? OS
version recommendations?
Perhaps some VMS tunable parameters exist for making PEDRIVER happy on
fast hardware?
Having the PEDRIVER listings could shed light on this buggy PEDRIVER behavior.
Link details:
Two Cisco 1861 routers connected to the Internet via ADSL on one side and 3G
HSDPA on the other side.
TCP/IP between the sites is routed over an IPSec site-to-site VPN. Ping between
the sites is about 100ms.
Over that VPN, a bridge for the DECnet family (eth.type = 0x6000..0x600F)
is built with an L2TPv3 VPN.
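For anyone wanting to check which frames fall inside that bridged range, the test amounts to the following. This is a sketch of the classification only, not the Cisco configuration; the function names are made up, and it assumes plain Ethernet II frames with no VLAN tag, so the type field sits at byte offset 12.

```python
import struct

# The range bridged over the L2TPv3 tunnel, as quoted above.
DEC_FIRST, DEC_LAST = 0x6000, 0x600F


def ethertype(frame):
    """Return the 16-bit type field of an untagged Ethernet II frame."""
    (etype,) = struct.unpack_from("!H", frame, 12)
    return etype


def is_bridged(frame):
    """True if the frame's type falls in the DEC range 0x6000..0x600F."""
    return DEC_FIRST <= ethertype(frame) <= DEC_LAST
```

MOP (0x6001, 0x6002), DECnet routing (0x6003), LAT (0x6004) and the cluster SCS traffic (0x6007) all fall inside the range, while IP (0x0800) does not.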
--
BR, Vladimir Machulsky
Gentlepeople,
Currently the details of what PyDECnet circuits connect to are not displayed. So you can see that a Multinet circuit is up and the other end is node 42.73, but you don't see the IP addresses or the like.
When things are working that's fine; when they are broken it might be helpful to see what something is trying to talk to.
On the other hand, hiding IP addresses is arguably a security feature. So I have this question:
1. Should the addressing info (basically, what's in the --device config argument) be shown in the PyDECnet web interface?
2. Should the addressing info be visible via NCP / NML?
The difference is that #1 can be limited to be local only, if you use an internal address for the web service. That's what I do for my nodes except for the mapper, though perhaps there isn't a strong argument why it should be so restrictive. #2, on the other hand, is visible to all HECnet users assuming you haven't disabled NML in your config settings.
I'd be interested in comments. Am I too concerned about hiding information, or is it sensible to be cautious?
paul