Does anybody know how I could find out what the PDP-11 front end running RSX20F is supposed to do when it is forced to reboot by the PDP-10?

What else does it do besides go into secondary protocol and signal the 10 that it is alive?

The klh10 micro-engine does not currently handle this and it is affecting my ability to take a crash dump.  The saga is below.


The Background:

I ran into an unexpected issue when reworking some of the logic in TTYSRV to 'normalize' the arguments to the GETOK%.  Again, the ACJ can't do much directly with a JFN as it is a small number that has no meaning outside of job context.  I'm not sure that you could (easily) figure out what the JFN means with SNOOP% and PEEK%, either, because the active JSB is the ACJ's and not that of the fork pending on the GETOK%, which means you have to map that and then chase down the OFN (which might not be so bad).

Since other monitor GETOK%'s which are asking about file related JFN's (say for Secure files and directories) resolve the JFN to the actual text of the file specification, having the terminal device resolved before feeding the GETOK% seemed like the right thing to do and I reworked some code, which was basically some reordering.  The monitor assembled just fine, no .PSECT complaints.  There are 25 pages free in section zero, but everything I've done is all in the swappable monitor (NRCOD .PSECT), so I didn't expect any problems there, either.  So everything looks fine, I've done it a bunch of times before, yet:

BOOT>tlinkm

[BOOT: Loading] [OK]

[TOMMYT mounted]
VENTI2, PANDA TOPS-20 Monitor 7.1(21745)-5 Internet: Loading host names [OK]

System restarting, wait...
Date and time is: Friday, 8-April-2022 8:32PM
Why reload? other debug
Run CHECKD? no

[KLH10: Illegal exec mode op, PC = 0: 0 0,0]
**********
*BUGHLT "ILLUUO"
*ADDITIONAL DATA: 300000000001, 000000000001, 000000000000
**********
[DTE: Unsupp sts bits 240020,,32463]

[HALTED: Program Halt, PC = 1043534]

Well!  That's not very promising.  Actually, kind of stunning.  The monitor has transferred control into accumulator zero, which contains a zero, which is an illegal opcode.  That can show up as an Illegal UUO (I'll get to the DTE bits in a minute).  I couldn't believe it was my code, so I did the obvious thing and break-pointed it with Executive DDT (EDDT), viz:

BOOT>TLINKM/L           Loads the monitor, does not start it
BOOT>/g141              Starts monitor Executive DDT (SYSDDT)

EDDTF[  1               Keeps EDDT locked in memory
DBUGSW[ 2               Write-enable swappable monitor
GOTSWM$B                Breakpoint after swappable monitor loaded
147$G                   Normal start up

When it hits the breakpoint:

CALL SWPMLK$X           Lock swappable monitor
CALL SWPMWE$X           Write-enable swappable monitor

Set a breakpoint in the TLINK% JSYS entry and proceed

.TLINK$B
$P

And...  Same thing.  So that means my code is effectively dark at the point of the crash because the breakpoint is never hit.  It would appear that the additional 93 (decimal) words I wrote pushed something someplace, causing an address space overwrite.  That is puzzling, as I would have thought that LINK or the postlude would have caught that.

Whatever it is, it means that it's time for a crash dump and some quality time with FILDDT and other utilities.  I dusted off MAKDMP, duly created an 8,193 page DUMP.EXE and rebooted, crashed with an ILLUUO, reloaded BOOT and started it to get the crash dump (with the /D switch).  And I got that odd DTE message and a hung machine.

KLH10> load boot.sav   
Using word format "c36"...
Loaded "boot.sav":
Format: DEC-CSAV
Data: 4630, Symwds: 0, Low: 040000, High: 054641, Startaddress: 040000
Entvec: JRST (120 ST: 0147, 124 RE: 0, 137 VR: 500701,,21745)
KLH10> go
Starting KN10 at loc 040000...
[DTE: Unsupp sts bits 240020,,32463]
KLH10>> halt  
[HALTED: FE command, PC = 772446]
KLH10>

Some more poking around shows that BOOT is hung in the following loop, waiting for a response from the 11 that never comes. 

SETEP4: MOVEI Q1,.DTMMN         ;MONITOR MODE ON
        MOVEM Q1,DTECMD         ; STORE IN COMMAND
J5:     MOVEI Q1,TO11DB         ;RING THE BELL
        XCT DTCNOW              ;DO IT
J6:     SKIPN DTEFLG            ;WAIT FOR COMPLETION
J7:     JRST .-1                ; ...

BOOT is forcing the 11 into what Tops-20 calls 'monitor mode'.  This does not happen in the normal first-boot case because at that time the enumeration of devices has not taken place, so the DTE is not yet known and is not connected to the priority interrupt system.  When BOOT is reloaded, the PDP-10 hardware is already conditioned and BOOT detects this.

What the KLH10 DTE code sees is that a request has come in with bit 200000 high in the left half word, which is the LOAD-11 indicator.  And it doesn't have code to handle that, viz:

    if (op10m_tlnn(w, (DTE_CMT_PWF|DTE_CMT_L11|DTE_CMT_INI))) {
        fprintf(dt->dt_dv.dv_dbf, "[DTE: Unsupp sts bits %lo,,%lo]",
            (long)LHGET(w), (long)RHGET(w));
        return;
    }

That's the message that appears above.  The upshot is that the PDP-11 is never signaled that anything has happened, so it keeps waiting, while dvdte is now waiting for another command which the 10 will never issue.
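
For anyone who wants to check the bits in that message by hand, here is a minimal sketch of splitting a 36-bit word into halves and testing the one bit I identified above.  The helper names (left_half, right_half, make_word, load11_requested) and the LH_LOAD11 constant are made up for this illustration; they are not the klh10 macros, and the only bit meaning assumed is the 200000 LOAD-11 indicator already mentioned.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers for illustration only (not the klh10 macros).
 * A 36-bit PDP-10 word is kept in the low 36 bits of a uint64_t. */
#define HALF_MASK 0777777ULL

/* Bit 200000 (octal) in the left half: the LOAD-11 indicator,
 * as described in the post.  No other bit meanings are assumed. */
#define LH_LOAD11 0200000U

static uint32_t left_half(uint64_t w)
{
    return (uint32_t)((w >> 18) & HALF_MASK);
}

static uint32_t right_half(uint64_t w)
{
    return (uint32_t)(w & HALF_MASK);
}

/* Build a word from two 18-bit halves, e.g. the "240020,,32463" form. */
static uint64_t make_word(uint32_t lh, uint32_t rh)
{
    return (((uint64_t)lh & HALF_MASK) << 18) | ((uint64_t)rh & HALF_MASK);
}

/* Nonzero if the LOAD-11 bit is set in the left half. */
static int load11_requested(uint64_t w)
{
    return (left_half(w) & LH_LOAD11) != 0;
}

/* Print a word in the same LH,,RH octal style the emulator uses. */
static void print_halves(uint64_t w)
{
    printf("%lo,,%lo\n", (unsigned long)left_half(w),
           (unsigned long)right_half(w));
}
```

Feeding make_word(0240020, 032463) to load11_requested() returns nonzero, which matches the bit the DTE code is complaining about.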

I have seen this behavior before.  Sometimes, the klh10 micro-engine can get so active that a clock interrupt from the host operating system doesn't come in time for the keep alive counter to be incremented.  The 20 declares the 11 down, issues a DTEPKA BUGINF, attempts to restart it and ...everything hangs...

At the time I addressed the issue with a hack: I put programmable waits into DTESRV (set via the BOOT% JSYS) and gave the micro-engine a chance to catch its breath before Tops-20 lowers the boom.
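
To illustrate the general idea of tolerating a stalled keep-alive counter for a while before declaring the other side down, here is a rough sketch.  It is not the DTESRV code (which is MACRO and uses waits set via BOOT%); the structure, the names ka_init and ka_check, and the "grace" count are all assumptions made up for the example.

```c
#include <stdint.h>

/* Hypothetical watchdog state, for illustration only. */
struct keepalive_watch {
    uint32_t last_seen;  /* counter value observed at the previous check */
    unsigned missed;     /* consecutive checks with no change            */
    unsigned grace;      /* programmable: stalled checks to tolerate     */
};

static void ka_init(struct keepalive_watch *k, uint32_t initial,
                    unsigned grace)
{
    k->last_seen = initial;
    k->missed = 0;
    k->grace = grace;
}

/* Call once per check interval with the peer's current counter.
 * Returns 1 when the peer should be declared down, 0 otherwise. */
static int ka_check(struct keepalive_watch *k, uint32_t current)
{
    if (current != k->last_seen) {
        /* Counter moved: the peer is alive, forget earlier misses. */
        k->last_seen = current;
        k->missed = 0;
        return 0;
    }
    /* Counter stalled: only give up once the grace count is exceeded. */
    k->missed++;
    return k->missed > k->grace;
}
```

With a grace of 2, two stalled checks in a row are forgiven and the third one trips the watchdog; any change in the counter resets the count.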

So one way to address the issue is to hack BOOT not to wait forever.  It's a hack, but it will get the job done.  I think...  The other is to change dvdte.c to do what a regular 11 running RSX20F would do.
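
The first option boils down to replacing the SKIPN/JRST .-1 spin with a wait that gives up after some number of polls.  Here is a minimal sketch of that shape, written in C rather than MACRO just to show the logic.  The names (wait_bounded, poll_fn, countdown_poll) and the poll limit are hypothetical; what BOOT should do after giving up is exactly the open question, so the sketch only reports the timeout.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero once the completion flag is set. */
typedef int (*poll_fn)(void *ctx);

/* Poll at most max_polls times.
 * Returns 1 if the flag was seen, 0 if we gave up (timeout). */
static int wait_bounded(poll_fn poll, void *ctx, unsigned long max_polls)
{
    unsigned long i;

    for (i = 0; i < max_polls; i++) {
        if (poll(ctx))
            return 1;
    }
    return 0;
}

/* Stand-in for the flag in this example: it reports "done" only
 * after a fixed number of unsuccessful polls. */
struct countdown {
    unsigned long remaining;
};

static int countdown_poll(void *ctx)
{
    struct countdown *c = (struct countdown *)ctx;

    if (c->remaining == 0)
        return 1;
    c->remaining--;
    return 0;
}
```

A countdown of 3 with a limit of 10 succeeds on the fourth poll, while a countdown of 100 with a limit of 5 returns 0, which is the case where BOOT would stop waiting on the 11 and carry on with the dump.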