It might not have anything in particular to do with the node number
(2.298); it could be that it is the n:th entry being poked at, or
possibly that it redefines an existing node.
Try clearing the node out and then defining it?
Adding a dummy extra entry in the file before this one is another idea.
Johnny
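
The dummy-entry experiment can be prepared offline by rewriting the node file before SETNOD reads it. A minimal sketch in Python, assuming the file consists of "set nod AREA.NUMBER name NAME" lines like the T20.FIX excerpts quoted further down. The function name, the dummy address 63.1023 and the name DUMMY are made up for illustration and are not part of SETNOD; pick an address that is really unused.

```python
def add_dummy_before(lines, target_addr, dummy="set nod 63.1023 name DUMMY"):
    """Return a copy of lines with a dummy entry placed before target_addr.

    The dummy address and name are placeholders (assumptions), not
    anything taken from the real HECnet node list.
    """
    out = []
    for line in lines:
        fields = line.split()
        # Expected shape: set nod <area.node> name <NAME>
        if len(fields) == 5 and fields[:2] == ["set", "nod"] and fields[2] == target_addr:
            out.append(dummy)
        out.append(line)
    return out


node_file = [
    "set nod 13.3 name RED",
    "set nod 2.298 name RSX11M",
    "set nod 1.306 name RSX124",
]
shifted = add_dummy_before(node_file, "2.298")
print("\n".join(shifted))
```

If the failure follows 2.298 to its new position, the position in the list is probably not the trigger; if it stays at the same item count, it probably is.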
On 2021-05-05 22:05, Thomas DeBellis wrote:
I got annoyed at the thought of having to wait a few more months for the
error condition to show up and, instead of having the batch job run more
frequently (and thus beating on poor MIM::), I wrote another batch job
which took every single file that I have /ever/ downloaded from MIM::
and inserted it.  So that's 75 files and it failed on number 54.
15:36:51 USER   SETNOD>**T OLDS:NODE-DATA.TXT.54*
15:36:52 USER
15:36:52 USER   SETNOD>**List Total*
15:36:52 USER
15:36:52 USER
15:36:52 USER   TOTAL NODES FOUND: 869
15:36:52 USER
15:36:52 USER   SETNOD>**Insert*
15:36:52 USER
15:36:52 USER   ?SETNOD: Failed at node RSX11M (2.298), Item 650 of 869, Error: _-11_
15:36:52 USER   SETNOD>
It is interesting that it is failing on node 2.298, but this is before
that number had been reassigned to REACH::. The negative 11 error
return means "Component in Wrong State" (aka NF.CWS), which I didn't
find immediately informative.  However, now I've got something to look
around for.
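
For anyone scanning old batch logs for the same failure, the extended message is regular enough to pick apart mechanically. A rough sketch in Python, assuming the line layout shown in the log above; the only error code in the table is the -11 / NF.CWS pair given in this message, and the names `parse_failure` and `KNOWN_ERRORS` are invented for the example.

```python
import re

# Only -11 comes from the thread; other codes should be filled in from
# the monitor sources rather than guessed.
KNOWN_ERRORS = {-11: "Component in Wrong State (NF.CWS)"}

FAILURE = re.compile(
    r"\?SETNOD: Failed at node (?P<name>\S+) \((?P<addr>\d+\.\d+)\), "
    r"Item (?P<item>\d+) of (?P<total>\d+)"
    r"(?:,\s*Error: _?(?P<code>-?\d+)_?)?"
)


def parse_failure(line):
    """Extract node, list position and error code from a SETNOD failure line.

    Returns None when the line is not a failure message. The error code
    is optional because the older SETNOD did not print one.
    """
    m = FAILURE.search(line)
    if m is None:
        return None
    code = int(m["code"]) if m["code"] is not None else None
    return {
        "name": m["name"],
        "addr": m["addr"],
        "item": int(m["item"]),
        "total": int(m["total"]),
        "code": code,
        "meaning": KNOWN_ERRORS.get(code, "unknown"),
    }


line = "15:36:52 USER   ?SETNOD: Failed at node RSX11M (2.298), Item 650 of 869, Error: -11"
print(parse_failure(line))
```

Running this over all 75 saved logs would show whether the failing item is always the same node or always the same position.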
I still can't imagine why there would be anything particularly
diabolical about the number 2.298.
> ------------------------------------------------------------------------
>
> On 5/5/21 12:38 AM, Thomas DeBellis wrote:
>
> I finished the modifications to SCLINK to properly return error values
> which are negative and JNTMAN to return the error value in AC3 if
> .NDINT doesn't succeed inserting all the nodes.  Then I modified
> SETNOD to get this extended error information and print it.  I put the
> new monitor and SETNOD up, rebooted *AND*...
>
> SETNOD>set nod 2.298 name REACH
> SETNOD>ins
> SETNOD>
>
> It works perfectly because, of course it does?
>
> So, as usual, Johnny's guess is pretty close to the mark, even if he
> isn't a 36 bit'er.  "Slightly broken"?  Yeah, 'slightly' enough so
> that it can't be easily reproduced?
>
> The only thing I can think of is that the system had been up over 15
> weeks when I saw this.  I had looked at the storage space utilization
> with SYSDPY and didn't notice anything maxing out.  I restarted the
> GETNOD batch job on VENTI2::.  Maybe in another 15 weeks, it will
> break again.
>
> /Annoyed/?
>
>> ------------------------------------------------------------------------
>> On 5/4/21 10:31 PM, Thomas DeBellis wrote:
>>
>> Personally, I don't see how it could /possibly/ be anything to do
>> with the REACH:: node definition, but I have been known to
>> occasionally overlook the utterly obvious, particularly when it's
>> near night-night.  Maybe not this time.
>>
>> Right now, the way to figure it out is to get the minor error data
>> and see where that takes things.  So I'm making a change to JNTMAN to
>> have .NDINT return the lower level code on an incomplete insert.
>> SCLINK appears to have a problem in that it is mangling return values,
>> which I'm currently investigating.
>>
>> You can't just blithely assume somebody got it wrong and 'fix'
>> things; sometimes it's a certain way for a reason.
>>
>> On 5/4/21 8:46 PM, Johnny Billquist wrote:
>>> On 2021-05-05 00:54, Mike Kostersitz wrote:
>>>> Ouch, that is one of my nodes. @Johnny Billquist
>>>> <mailto:bqt at softjar.se> anything you could think of since we just
>>>> renamed my old RSX11M node to REACH?
>>>
>>> Well. It is something slightly broken in Tops-20, so there isn't
>>> really anything we can do about it.
>>>
>>> Except hope that Thomas can figure it out and fix it.
>>>
>>>   Johnny
>>>
>>>>
>>>> Mike
>>>>
>>>> *From: *Thomas DeBellis <mailto:tommytimesharing at gmail.com>
>>>> *Sent: *Tuesday, May 4, 2021 15:16
>>>> *To: *HECnet <mailto:hecnet at update.uu.se>
>>>> *Subject: *[HECnet] Re: Tops-20 SETNOD Failure
>>>>
>>>> I fixed a few things in SETNOD to get some more information about
>>>> the error.  In particular,
>>>>
>>>>   * Allow listing of AREA 1 (this was specifically disallowed, I don't
>>>>     know why)
>>>>   * More consistent error reporting (via ESOUT%)
>>>>   * List more than one node when doing an area list (it would only list
>>>>     a single node)
>>>>   * List nodes with more than three digits in the node number when doing
>>>>     columnar output
>>>>
>>>> So now you get the expected results:
>>>>
>>>>     SETNOD>lis a 1
>>>>     [Area 1]
>>>>     A1RTR   1023    ATHENA   620    ATLE     605    AURORA   606    BANAI    770
>>>>     BANX25   771    BEA       19    BIZET    800    BJARNE     7    BLINKY   266
>>>>     CATWZL   302    CLYDE    269    COOPER   263    CRISPS   201    CYGNUS   259
>>>>     DAVROS   254    DBIT     351    DE1RSX   450    DE1RSY   452    DOCTOR   252
>>>>     ELIN     616    ELMER    617    ERNIE      2    ERSATZ   350    FLETCH   100
>>>>     FNATTE     3    FREJ     608    GAXP     730    GNAT      16    GNOME      6
>>>>     GOBLIN     4    GVAX     731    HAGMAN   262    HARPER   261    HORSE    150
>>>>     HUGIN    602    HYUNA    500    INKY     268    JIMIN    501    JOCKE     21
>>>>     JOSSE     17    KLIO     451    KRILLE     8    LOKE     607    MACARO   303
>>>>     MACRA    258    MAGICA     1    MASTER   251    MIM       13    MUNIN    603
>>>>     NIPPER   202    NOMAD    610    NOXBIT   720    ORACLE   301    PACMAN   265
>>>>     PAI      541    PALLAS   621    PAMINA    18    PIDP11   560    PINKY    267
>>>>     PISTON   520    PLINTH   200    PMAVS2   510    PONDUS    15    PONY      12
>>>>     PUFF      22    QEMUNT   151    REI      540    ROCKY     11    ROJIN    542
>>>>     RSX124   306    RSX145   304    RSX170   305    RSX184   307    RUTAN    255
>>>>     SHARPE   260    SIDRAT   253    SIGGE     10    SPEEDY    24    TARDIS   250
>>>>     TEMPO      9    THOROS   257    TINA      14    TIPSY    604    TONGUE   264
>>>>     TOPSY    601    VALAR    400    VAROS    256    WXP       20    WXP2      23
>>>>     YMER     609    ZEKE       5
>>>>     Total nodes in area 1: 92
>>>>     SETNOD>exit
>>>>
>>>> Regarding the error, I have reproduced it with a single entry, viz:
>>>>
>>>>     !setnod
>>>>     SETNOD>_set nod 2.298 name REACH_
>>>>     SETNOD>_insert_
>>>>     ?SETNOD: Failed at node REACH (2.298), Item 0 of 1
>>>>     SETNOD>
>>>>
>>>> The high level code to do the entry is in JNTMAN.  It loops through
>>>> the table passed to it via .NDINT, calling a lower level routine
>>>> called SCTAND in SCLINK.  An error here is passed up to JNTMAN, but
>>>> it is not passed back to the user. There are some other problems in
>>>> SCLINK pertaining to negative return values, so some minor work is
>>>> necessary there, also.
>>>>
>>>> I'll make some changes to these two modules, generate a new monitor
>>>> for VENTI2 and see what happens in a few days.
>>>>
>>>> Right now, if any Tops-20 user is using SETNOD to update DECnet
>>>> tables, this appears to fail.  If anybody else is seeing it or can
>>>> reproduce it, I'd like to hear about it.
>>>>
>>>>     On 5/4/21 11:15 AM, Thomas DeBellis wrote:
>>>>
>>>>     Has anybody ever seen SETNOD fail to insert the entire node list?  I
>>>>     just did.
>>>>
>>>>     Shortly after I put my 20's up on HECnet, I wrote a reoccurring
>>>>     batch job that fires once a week on Sundays to pull the latest node
>>>>     list (T20.FIX) from MIM::.  I use the highly venerable FILCOM
>>>>     program to do a difference of it with the previous week's list.  I
>>>>     don't do anything in particular with the output except save it in
>>>>     case I feel like looking at it for some reason.
>>>>
>>>>     The batch job always inserts the entire list, rewriting whatever
>>>>     might be in the monitor's data base.  I have always been unsatisfied
>>>>     with doing things that way because it seemed to me to be inefficient
>>>>     as the node list grew.  The HECnet node list count was 716 on
>>>>     9-Jun-19 and it's now up to 884 as of the latest version that I've
>>>>     pulled, 30-Apr-21.  The other problem is the microscopic possibility
>>>>     that a node is in Tops-20's monitor database (a hash table) that
>>>>     isn't in the HECnet node list.
>>>>
>>>>     Nodes can get removed, although I think that infrequent.  Nodes
>>>>     could get inserted outside of the batch job, but I think that most
>>>>     unlikely in my situation.  Nodes can get renamed, as evidenced by
>>>>     2.299 below, which went from THEPIT to THEARK.  None of this should
>>>>     or has broken anything.
>>>>
>>>>     However, it's been in the back of my mind to do two enhancements,
>>>>     one to Tops-20 and one to SETNOD.  The NODE% JSYS should have an
>>>>     additional feature to return the current monitor data base.  The
>>>>     SETNOD program should be enhanced to take that to compute the set
>>>>     difference with the new list.  This would show additions, renames
>>>>     and deletions.  That would bring the update operation down from some
>>>>     hundred items to less than ten, on average.  This would obviously
>>>>     make more of a difference on huge DECnets in the tens of thousands
>>>>     of nodes.  Another NODE% feature should probably be to whack the
>>>>     entire monitor database except for the local node, which would be
>>>>     useful for troubleshooting.
>>>>
>>>>     Last Sunday, the batch job failed with the following error:
>>>>
>>>>     18:33:40 USER   SETNOD>*TAKE SYSTEM:NODE-DATA.TXT.0
>>>>     18:33:40 USER
>>>>     18:33:40 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.TXT.1 for reading]
>>>>     18:33:41 USER   SETNOD>*SAVE
>>>>     18:33:41 USER
>>>>     18:33:41 USER   [Fork SETNOD opening <SYSTEM>NODE-DATA.BIN.74 for reading, writing]
>>>>     18:33:41 USER   SETNOD>*INSERT
>>>>     18:33:41 USER
>>>>     18:33:41 USER   *?SETNOD: Failed at node REACH*
>>>>     18:33:41 USER   SETNOD>
>>>>
>>>>     I had a look at the SETNOD source and the HECnet node list and have
>>>>     discovered and concluded a few things.  First, there doesn't seem to
>>>>     be anything syntactically wrong with REACH::'s definition: "set nod
>>>>     2.298 name REACH".  Second, there don't appear to be any semantic
>>>>     issues.  2.298 wasn't in use and it shouldn't matter if it was.
>>>>
>>>>     In the case of INSERT, there are two kinds of errors from NODE%, a
>>>>     general failure of the JSYS and an incomplete insertion.  The error
>>>>     is from the second case.  Unfortunately, SETNOD isn't reporting
>>>>     enough information about the error, so I have to make some changes
>>>>     there.  It's also possible that SETNOD is building an inconsistent
>>>>     database for the monitor to swallow; at least the LIST command is
>>>>     giving me some odd results, viz:
>>>>
>>>>         SETNOD>list arEA 2
>>>>
>>>>         [AREA 2]
>>>>         A2RTR
>>>>
>>>>         TOTAL NODES FOUND: 1
>>>>
>>>>         SETNOD>
>>>>
>>>>     That's clearly wrong, viz:
>>>>
>>>>         !i dec
>>>>          Local DECNET node: VENTI2.  Nodes reachable: 7.
>>>>          Accessible DECNET nodes are:    A2RTR    BOINGO    LEGATO
>>>>          TOMMYT    VENTI2    VENTI    ZITI
>>>>
>>>>     The Exec output should probably be changed to say, "Nodes reachable
>>>>     in local area" and "Online nodes in area are:"
>>>>
>>>>     Anybody have any ideas?  Hunches?  Clues?
>>>>
>>>> File 1) OLDF:[4,120]    created: 1241 15-Apr-21
>>>> File 2) NEWF:[1,1]      created: 0102 30-Apr-21
>>>>
>>>> 1)1     set nod 44.9 name OSMIUM
>>>> ****
>>>> 2)1     set nod 2.292 name OSIRIS
>>>> 2)      set nod 44.9 name OSMIUM
>>>> **************
>>>> 1)1     set nod 13.3 name RED
>>>> ****
>>>> 2)1     *set nod 2.298 name REACH*
>>>> 2)      set nod 13.3 name RED
>>>> **************
>>>> 1)1     set nod 2.298 name RSX11M
>>>> 1)      set nod 1.306 name RSX124
>>>> ****
>>>> 2)1     set nod 1.306 name RSX124
>>>> **************
>>>> 1)1     set nod 42.5 name SPARKY
>>>> ****
>>>> 2)1     set nod 2.291 name SPARK
>>>> 2)      set nod 42.5 name SPARKY
>>>> **************
>>>> 1)1     set nod 2.299 name THEPIT
>>>> 1)      set nod 35.70 name THOMAS
>>>> ****
>>>> 2)1     set nod 2.299 name THEARK
>>>> 2)      set nod 35.70 name THOMAS
>>>> **************
>>>>
>>>
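
The set-difference enhancement proposed for SETNOD in the original message can be tried out today on two saved node files, without waiting for a NODE% extension. A sketch in Python under that assumption: both sides are parsed from text in the "set nod AREA.NUMBER name NAME" form, whereas the real thing would get the "current" side from the monitor. The function names are invented for the example; the sample entries are the ones from the FILCOM output above.

```python
def parse_nodes(lines):
    """Map 'area.node' -> NAME for every 'set nod A.N name NAME' line."""
    nodes = {}
    for line in lines:
        fields = line.split()
        if (len(fields) == 5 and fields[0] == "set"
                and fields[1] == "nod" and fields[3] == "name"):
            nodes[fields[2]] = fields[4]
    return nodes


def diff_nodes(current, new):
    """Return (added, renamed, deleted) between two address -> name maps."""
    added = {a: n for a, n in new.items() if a not in current}
    deleted = {a: n for a, n in current.items() if a not in new}
    renamed = {
        a: (current[a], new[a])
        for a in current.keys() & new.keys()
        if current[a] != new[a]
    }
    return added, renamed, deleted


old = parse_nodes([
    "set nod 44.9 name OSMIUM",
    "set nod 13.3 name RED",
    "set nod 2.298 name RSX11M",
    "set nod 1.306 name RSX124",
    "set nod 42.5 name SPARKY",
    "set nod 2.299 name THEPIT",
    "set nod 35.70 name THOMAS",
])
new = parse_nodes([
    "set nod 2.292 name OSIRIS",
    "set nod 44.9 name OSMIUM",
    "set nod 2.298 name REACH",
    "set nod 13.3 name RED",
    "set nod 1.306 name RSX124",
    "set nod 2.291 name SPARK",
    "set nod 42.5 name SPARKY",
    "set nod 2.299 name THEARK",
    "set nod 35.70 name THOMAS",
])
added, renamed, deleted = diff_nodes(old, new)
print("added:  ", sorted(added.items()))
print("renamed:", sorted(renamed.items()))
print("deleted:", sorted(deleted.items()))
```

Keying on the address rather than the name is what lets a rename such as 2.298 show up as one change instead of a deletion plus an addition.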
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt at softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol