I discovered a pernicious boot race condition in FAL. Early in system startup, only the boot structure (which may not be the login structure) is recognized by Tops-20. MOUNTR has to start and identify the other structures that should be made available, along with their settings, which it does by reading them from where they are persisted in the SYSTEM:DEVICE-STATUS.BIN file. Until MOUNTR has completed structure identification, any RCDIR% might fail for a non-boot structure. This is exactly what happened to FAL, which runs asynchronously and sometimes crashed during startup.
One quick and dirty hack was simply to delay starting FAL, which I did, but that has its own
problems, which I won't detail here. I tried some other hacks,
such as doing the mount from the EXEC, which had drawbacks
of their own.
The real thing to do is to issue a mount request for the structure from FAL itself. Once Galaxy is up and running and the request has been processed, RCDIR% is safe to do (or safer, anyway). At any rate, there should be no timing issue in this case. As Columbia's Galaxy programmer in the 1980s, and having done some minor work in LCG's Galaxy group in the late 1970s, I actually knew how to do such a request, having written some subroutines to issue queue requests for our ID system in the 1980s.
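To make the timing argument concrete, here is a toy model of the race in Python. It is only a sketch: nothing in it is a real Tops-20 or Galaxy call, and every name (System, lookup, request_mount, run_mounter) is made up for illustration. The lookup stands in for RCDIR%, and run_mounter stands in for MOUNTR finishing its work.

```python
class System:
    """Toy model: only the boot structure is known until the mounter runs."""

    def __init__(self, boot, persisted):
        self.known = {boot}               # structures recognized so far
        self.persisted = list(persisted)  # stands in for the saved device status
        self.pending = []                 # mount requests not yet processed

    def lookup(self, structure):
        # Stands in for a directory lookup; fails for unrecognized structures.
        if structure not in self.known:
            raise LookupError(f"{structure}: structure not mounted")
        return f"{structure}:<dir>"

    def request_mount(self, structure):
        # Queue a mount request and return a callable that reports completion.
        self.pending.append(structure)
        return lambda: structure in self.known

    def run_mounter(self):
        # The mounter recognizes persisted structures and pending requests.
        for structure in self.persisted + self.pending:
            self.known.add(structure)
        self.pending.clear()


def racy_start(system, structure):
    # Looks up the structure immediately; fails if the mounter has not run.
    return system.lookup(structure)


def safe_start(system, structure, wait):
    # Asks for the mount first, then waits until the request is processed.
    done = system.request_mount(structure)
    while not done():
        wait()  # in this toy model, waiting lets the mounter run
    return system.lookup(structure)
```

In this model, racy_start fails whenever it runs before run_mounter, while safe_start cannot reach the lookup until its own request has been processed, which is the property the mount request is meant to buy.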
So I decided to fix FAL to 'do the right thing' and wound up down another rabbit hole. I should emphasize "knew how to do", as that kind of Galaxy IPCF request is kind of hairy, particularly if you haven't done it in four decades. After looking at it for a while, I despaired of having to relearn it.
Incredibly, DEC defined a new JSYS called QUEUE%
which seemed to do the trick. Briefly, it performs certain Galaxy
requests on behalf of the user, converting its simpler calling
format to an internal Galaxy format, getting a PID, and returning
the response. QUEUE% does a lot
of cool things, including mount requests. Or at least it's
documented to...
Besides QUEUE% having its own bugs, there is no code in Galaxy to handle a mount request from QUEUE%... The simpler format that QUEUE% uses is actually something that Quasar internally calls a 'short' queue request, which I had known about back in the day. The QSRQUE module converts it into the regular ('long') format and away you go. This works fine for printing, plotting, punching, and other kinds of spooling requests, including batch jobs.
There is no short format for mounting, and QSRQUE has no code for one, because mount requests are dispatched to MOUNTR... Nothing is persisted in the failsoft file. So I wrote a conversion routine to do exactly that for mount requests, before the request makes it to QSRMDA and thence to MOUNTR.
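The shape of that short-to-long conversion can be sketched in Python. This is a toy illustration only: the record layouts, field names, and destinations below are invented and are not Galaxy's actual message formats or QSRQUE's code. It just shows a per-type converter table with no mount entry, and what adding one changes.

```python
from dataclasses import dataclass, field


@dataclass
class ShortRequest:
    kind: str   # e.g. "print", "batch", "mount" (hypothetical names)
    args: dict


@dataclass
class LongRequest:
    kind: str
    destination: str                 # which component gets the request
    fields: dict = field(default_factory=dict)


def convert_spool(req):
    # Spooling-style requests expand into a queue entry.
    return LongRequest(req.kind, "queue",
                       {"file": req.args["file"],
                        "copies": req.args.get("copies", 1)})


def convert_mount(req):
    # The added piece: a mount request is routed to the mounter instead.
    return LongRequest("mount", "mounter",
                       {"structure": req.args["structure"]})


# The table as it stood: spooling kinds only, nothing for mounts.
CONVERTERS = {"print": convert_spool, "batch": convert_spool}


def expand(req, converters=CONVERTERS):
    # Convert a short request into long form, or reject unknown kinds.
    try:
        convert = converters[req.kind]
    except KeyError:
        raise ValueError(f"no short-format handler for {req.kind!r}") from None
    return convert(req)
```

With the original table, expand rejects a short mount request outright; passing a table that also maps "mount" to convert_mount yields a long request addressed to the mounter.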
Naturally, doing all that resulted in a hailstorm of other issues, as I pushed things a little further than they had been designed for.
It happens. I'm working through everything, bit by bit.