Does anybody know how I could find out what the PDP-11 front end running
RSX20F is supposed to do when it is forced to reboot by the PDP-10?
What else does it do besides go into secondary protocol and signal the
10 that it is alive?
The klh10 micro-engine does not currently handle this and it is
affecting my ability to take a crash dump. The saga is below.
------------------------------------------------------------------------
*The Background*:
I ran into an unexpected issue when reworking some of the logic in
TTYSRV to 'normalize' the arguments to the GETOK%. Again, the ACJ can't
do much directly with a JFN as it is a small number that has no meaning
outside of job context. I'm not sure that you could (easily) figure out
what the JFN means with SNOOP% and PEEK%, either, because the active JSB
is the ACJ's and not that of the fork pending on the GETOK%, which means
you have to map that and then chase down the OFN (which might not be so
bad).
Other monitor GETOK%'s that ask about file-related JFN's (say, for
Secure files and directories) resolve the JFN to the actual text of the
file specification. Resolving the terminal device before feeding the
GETOK% therefore seemed like the right thing to do, so I reworked some
code, which was basically some reordering. The monitor
assembled just fine, no .PSECT complaints. There are 25 pages free in
section zero, but everything I've done is all in the swappable monitor
(NRCOD .PSECT), so I didn't expect any problems there, either. So
everything looks fine, I've done it a bunch of times before, yet:
BOOT>tlinkm
[BOOT: Loading] [OK]
[TOMMYT mounted]
VENTI2, PANDA TOPS-20 Monitor 7.1(21745)-5 Internet: Loading host
names [OK]
System restarting, wait...
Date and time is: Friday, 8-April-2022 8:32PM
Why reload? other debug
Run CHECKD? no
[KLH10: Illegal exec mode op, PC = 0: 0 0,0]
**********
*BUGHLT "ILLUUO"
*ADDITIONAL DATA: 300000000001, 000000000001, 000000000000
**********
[DTE: Unsupp sts bits 240020,,32463]
[HALTED: Program Halt, PC = 1043534]
Well! That's not very promising. Actually, kind of stunning. The
monitor has transferred control into accumulator zero, which contains a
zero, which is an illegal opcode. That can show up as an Illegal UUO
(I'll get to the DTE bits in a minute). I couldn't believe it was my
code, so I did the obvious thing and break-pointed it with Executive DDT
(EDDT), viz:
BOOT>TLINKM/L     Loads the monitor, does not start it
BOOT>/g141        Starts monitor Executive DDT (SYSDDT)
EDDTF[ 1          Keeps EDDT locked in memory
DBUGSW[ 2         Write-enable swappable monitor
GOTSWM$B          Breakpoint after swappable monitor loaded
147$G             Normal start up
When it hits the breakpoint:
CALL SWPMLK$X     Lock swappable monitor
CALL SWPMWE$X     Write-enable swappable monitor
Set a breakpoint in the TLINK% JSYS entry and proceed:
.TLINK$B
$P
And... Same thing. So that means my code is effectively dark at the
point of the crash because the breakpoint is never hit. It would appear
that the additional 93 (decimal) words I wrote pushed something
someplace, causing an address space overwrite. That is puzzling, as I
would have thought that LINK or the postlude would have caught that.
Whatever it is, it means that it's time for a crash dump and some
quality time with FILDDT and other utilities. I dusted off MAKDMP, duly
created an 8,193 page DUMP.EXE and rebooted, crashed with an ILLUUO,
reloaded BOOT and started it to get the crash dump (with the /D
switch). And I got that odd DTE message and a hung machine.
KLH10> load boot.sav
Using word format "c36"...
Loaded "boot.sav":
Format: DEC-CSAV
Data: 4630, Symwds: 0, Low: 040000, High: 054641, Startaddress: 040000
Entvec: JRST (120 ST: 0147, 124 RE: 0, 137 VR: 500701,,21745)
KLH10> go
Starting KN10 at loc 040000...
[DTE: Unsupp sts bits 240020,,32463]
KLH10>> halt
[HALTED: FE command, PC = 772446]
KLH10>
Some more poking around shows that BOOT is hung in the following loop,
waiting for a response from the 11 that never comes.
SETEP4: MOVEI Q1,.DTMMN ;MONITOR MODE ON
MOVEM Q1,DTECMD ; STORE IN COMMAND
J5: MOVEI Q1,TO11DB ;RING THE BELL
XCT DTCNOW ;DO IT
J6: SKIPN DTEFLG ;WAIT FOR COMPLETION
J7: JRST .-1 ; ...
BOOT is forcing the 11 into what Tops-20 calls 'monitor mode'. This
does not happen in the normal first-boot case because at
that time, the enumeration of devices has not taken place and the DTE is
not known nor is it connected to the priority interrupt system. When
BOOT is reloaded, the PDP-10 hardware is already conditioned and BOOT
detects this.
What the KLH10 DTE code sees is that a request has come in with bit
200000 high in the left half word, which is the LOAD-11 indicator. And
it doesn't have code to handle that, viz:
    if (op10m_tlnn(w, (DTE_CMT_PWF|DTE_CMT_L11|DTE_CMT_INI))) {
        fprintf(dt->dt_dv.dv_dbf, "[DTE: Unsupp sts bits %lo,,%lo]",
                (long)LHGET(w), (long)RHGET(w));
        return;
    }
That's the message that appears above. The PDP-11 side is never
signaled that anything has happened, so it waits for something to
happen, while dvdte simply returns and waits for another command, which
the 10 will never issue.
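To check the bit arithmetic against the logged status word, here is a
minimal C sketch. It only assumes what is stated above: that 200000
(octal) in the left half word is the LOAD-11 indicator. The macro and
function names are hypothetical and are not the KLH10 definitions, and
the meaning of the other bits is deliberately left undecoded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. Only the LOAD-11 bit value comes from the post. */
#define LH_LOAD11     0200000u   /* octal: LOAD-11 indicator, left half */
#define HALFWORD_MASK 0777777u   /* a PDP-10 half word is 18 bits */

/* Return nonzero if the LOAD-11 bit is set in an 18-bit left half word. */
int dte_lh_has_load11(uint32_t lh)
{
    assert(lh <= HALFWORD_MASK);
    return (lh & LH_LOAD11) != 0;
}

/* Return the left-half bits that remain once LOAD-11 is masked off. */
uint32_t dte_lh_other_bits(uint32_t lh)
{
    assert(lh <= HALFWORD_MASK);
    return lh & ~LH_LOAD11 & HALFWORD_MASK;
}
```

For the left half 240020 from the log, dte_lh_has_load11() is true and
dte_lh_other_bits() leaves 040020, so LOAD-11 is not the only bit set in
that half word.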
I have seen this behavior before. Sometimes, the klh10 micro-engine can
get so active that a clock interrupt from the host operating system
doesn't come in time for the keep alive counter to be incremented. The
20 declares the 11 down, issues a DTEPKA BUGINF, attempts to restart it
and ...everything hangs...
At the time I addressed the issue with a hack: I put programmable waits
into DTESRV (set via the BOOT% JSYS) and gave the micro-engine a chance
to catch its breath before Tops-20 lowers the boom.
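The general idea of tolerating a late keep-alive can be modeled in C as
below. This is not the DTESRV change (that one used programmable waits
in monitor code); it is a hypothetical model in which the checker
allows a configurable number of consecutive missed increments before it
declares the peer down. All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical keep-alive state, not taken from Tops-20 or KLH10. */
struct keepalive {
    unsigned long last_seen; /* counter value seen at the previous check */
    unsigned misses;         /* consecutive checks with no increment */
    unsigned grace;          /* missed checks tolerated before giving up */
};

/* Return true if the peer should now be declared down. */
bool keepalive_check(struct keepalive *ka, unsigned long counter)
{
    assert(ka != NULL);
    if (counter != ka->last_seen) {
        /* The counter moved: the peer is alive, so reset the miss count. */
        ka->last_seen = counter;
        ka->misses = 0;
        return false;
    }
    ka->misses++;
    return ka->misses > ka->grace;
}
```

With grace set to 0 this behaves like a strict check. A larger value
gives a busy emulator a few extra intervals before the restart logic
kicks in.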
So one way to address the issue is to hack BOOT not to wait forever.
It's a hack, but it will get the job done. I think... The other is to
change dvdte.c to do what a regular 11 running RSX20F would do.
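For the first option, the shape of the change is a bounded wait instead
of the unconditional spin at J6/J7. The sketch below is a C model of
that idea, not BOOT code: the function name and the poll limit are
hypothetical, and the real fix would have to be written in MACRO-10
inside BOOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the wait on DTEFLG: poll a completion flag, but
 * give up after max_polls tries instead of looping forever.
 * Returns true if the flag became nonzero, false on timeout.
 */
bool wait_for_flag(const volatile int *flag, unsigned long max_polls)
{
    assert(flag != NULL);
    while (max_polls-- > 0) {
        if (*flag != 0)
            return true;
    }
    return false;
}
```

On a false return the caller could skip the monitor-mode handshake and
carry on with the dump. Whether BOOT can safely proceed without the 11
in that state is exactly the open question here.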