Thanks! If I'm understanding your interpretation of the DECnet
specifications, I think you may be right; so this is definitely
something /To Do/.
A couple of things are bothering me about Tops-20's handling of the
situation. Many of the changes that we (universities) made over the
years were either to enable the system to better handle problems or to
better signal them. The lore is that the access control job was a
Columbia suggestion, but MIT and Stanford had plenty of others. Two that
DEC did were routing BUG alerts through Galaxy so more people would
see them and generalizing error capturing with Dump-on-Bug.
/Nothing/ came up on the operator consoles about this, even though the
error file was increasing by pages a day, eventually swelling to
gargantuan size. It's the kind of thing that could make the system
effectively unusable to somebody who wasn't paying attention. I had
noted the disk usage increase, but had naively attributed this to the
large amount of data I was accumulating, downloaded updated sources, etc.
I think some better introductory documentation might be written as to
how to configure DECnet, with cautions, provisos and examples. I have
some notes, and I guess I'll tidy those up.
Meanwhile, the frame length error reporting code needs to be enhanced in
two ways:
1. In addition to the Ethernet MAC addresses, it ought to report the
differences in frame and buffer sizes.
2. It ought to have some throttling:
* After 'X' time, do a BUGINF
* After 'X' time, stop logging the same frame error
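The two enhancements above can be sketched roughly as follows. This is only an illustration in Python, not monitor code: the class name, the thresholds, and the message formats are all made up for the example, and the one-time summary line merely stands in for a BUGINF.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FrameErrorThrottle:
    """Rate-limit repeated frame length errors, tracked per source MAC.

    All names and thresholds here are hypothetical, chosen for illustration.
    """
    summary_after: float = 60.0    # seconds until a one-time summary (stand-in for BUGINF)
    silence_after: float = 300.0   # seconds until this error is no longer logged
    first_seen: dict = field(default_factory=dict)
    summarized: set = field(default_factory=set)

    def report(self, src_mac, frame_len, buffer_len, now=None):
        """Return the log line to emit, or None if the error is suppressed."""
        now = time.monotonic() if now is None else now
        start = self.first_seen.setdefault(src_mac, now)
        elapsed = now - start
        if elapsed >= self.silence_after:
            return None  # stop logging the same frame error
        excess = frame_len - buffer_len  # the size difference the log lacks today
        if elapsed >= self.summary_after and src_mac not in self.summarized:
            self.summarized.add(src_mac)
            return (f"SUMMARY {src_mac}: repeated frame length errors, "
                    f"frame exceeds buffer by {excess} bytes")
        return f"{src_mac}: frame {frame_len} > buffer {buffer_len} (excess {excess})"
```

A caller would emit whatever string comes back and drop the event when the result is None. Keying on the source address keeps one noisy router from silencing reports about a different station.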
Since I've shut off logging of the error (by stomping NIEVTB+4 to -1),
my log files aren't increasing as drastically. I'm getting a number of
node-offline-online events, but I'm not sure if I caused those with some
other things I was doing.
------------------------------------------------------------------------
On 1/12/21 8:51 PM, Paul Koning wrote:
By all means go ahead. I think for me to change what I did is the right thing, but
I'll hold off until you're finished with what you want to test.
paul
> ------------------------------------------------------------------------
> On Jan 12, 2021, at 5:49 PM, Thomas DeBellis <tommytimesharing at gmail.com>
wrote:
>
> Before you fix that (if you do), would you mind my testing some changes I made
first?
>> ------------------------------------------------------------------------
>> On 1/12/21 4:21 PM, Paul Koning wrote:
>> That's certainly sufficient but the precise rule is somewhat more lenient.
>>
>> Segment buffer size controls the largest packet NSP will generate. Buffer size
specifies the largest packet the routing layer can forward. The requirement for two nodes
to be able to communicate is that the Buffer Size value for each of the routers on the
path between the two must be >= the smaller of the two segment buffer size values at
the NSP end points.
>>
>> For sanity, this translates into: all routers must have a Buffer Size >= the
value of any Segment Buffer Size. But it's perfectly legal for the segment buffer
size to be smaller, or inconsistent. Also, it's perfectly legal for an end node to
have a small Buffer Size, and some small systems indeed do this. The reason is that end
nodes don't forward, so the messages they are dealing with can only ever be as large
as their segment buffer size, since that's exchanged during NSP connection
establishment.
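The rule quoted above can be written down as a small check. This is a sketch in Python under the assumption that the sizes are plain integers in bytes; the function name and argument layout are invented for the example and do not come from any DECnet implementation.

```python
def path_can_carry(seg_size_a, seg_size_b, router_buffer_sizes):
    """Check the rule described above for one path between two NSP end points.

    NSP uses the smaller of the two Segment Buffer Size values, so every
    router on the path needs a Buffer Size at least that large.
    """
    negotiated = min(seg_size_a, seg_size_b)
    return all(size >= negotiated for size in router_buffer_sizes)


# One end point has a larger segment buffer size, but the smaller value wins.
print(path_can_carry(576, 1400, [576, 591]))   # True
# Both end points use 1400, which a 576-byte router cannot forward.
print(path_can_carry(1400, 1400, [576, 1500])) # False
```

The "for sanity" version of the rule is the stricter case where every router's Buffer Size is at least the largest Segment Buffer Size anywhere, which makes the check pass for every pair of end points.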
>>
>> Reading carefully in the various DECnet specs I think I got the buffer size
parameter wrong for Ethernet circuits. It looks like I should limit the routing packets
to the Buffer Size (the common parameter, i.e., 576) rather than using a larger value on
Ethernet. That should cure the issue Thomas discovered.
>>
>> paul
>>> ------------------------------------------------------------------------
>>> On Jan 12, 2021, at 3:29 PM, Peter Lothberg <roll at stupi.com> wrote:
>>>
>>> The DECnet segment size has to be the same "network wide".
>>>
>>> If I remember right DECnet looks at the two end nodes and uses the smallest
segment size,
>>> so if there is any transit node in the path with a small segment size things
will not work as
>>> it will drop packets bigger than its size.
>>>
>>> The only SW/HW combination I knew of that has other than 576 is MRC/Stu
DECnet for
>>> Tops20 4.x on DEC2020.
>>>
>>> -P
>>>> ------------------------------------------------------------------------
>>>> *From*: "tommytimesharing" <tommytimesharing at
gmail.com>
>>>> *To*: "hecnet" <hecnet at Update.UU.SE>
>>>> *Sent*: Monday, January 11, 2021 11:58:56 PM
>>>> *Subject*: Re: [HECnet] Thousands of DECnet errors on Tops-20
>>>>
>>>> Yes, I had seen this and had wondered about it after I had reflected on
the output of a SHOW EXECUTOR CHARACTERISTICS command (clipped):
>>>>
>>>> Executor Node = 2.520 (TOMMYT)
>>>>
>>>> Identification = Tommy Timesharing
>>>> Management Version = 4.0.0
>>>> CPU = DECSYSTEM1020
>>>> Software Identification = Tops-20 7.1 PANDA
>>>>
>>>> .
>>>> .
>>>> .
>>>> Buffer Size = 576
>>>> Segment Buffer Size = 576
>>>>
>>>> So it would appear that the 20's implementation of NICE knows of this
differentiation. I can parse for both SET EXECUTOR SEGMENT BUFFER SIZE and SET EXECUTOR
BUFFER SIZE. Both fail, of course; again, once DECnet is initialized, they are locked.
>>>>
>>>> However, when one looks at the DECnet initialization block (IBBLK), it
only contains a field for buffer size (IBBSZ), nothing about segment size. Further, the
NODE% JSYS' set DECnet initialization parameters function (.NDPRM) only contains a
sub-function for buffer size (.NDBSZ) and SETSPD will only parse for DECNET BUFFER-SIZE.
I'm hopeful to test that this weekend after I've looked further through the error
log.
>>>>
>>>> The receive code in the low level NI driver (PHYKNI) only checks to see
whether what was received will fit into the buffer specified. It returns a length error
(UNLER%) to DNADLL, but not the actual difference.
>>>>
>>>> I have yet to puzzle out how the segment size is derived, but it is
apparently set on a line basis.
>>>>
>>>> On 1/11/21 8:24 PM, Johnny Billquist wrote:
>>>>
>>>> Thomas, I wonder if you might be experiencing the effects of the Ethernet
packet size being different from the DECnet segment buffer size.
>>>> This is a little hard to explain, as I don't have all the proper
DECnet naming correct.
>>>>
>>>> But, based on RSX, there are two sizes relevant. One is the actual buffer
size the line is using. The other is the DECnet segment buffer size.
>>>>
>>>> The DECnet segment buffer size is the maximum size of packets you can
expect DECnet itself to ever use.
>>>> However, at least with RSX, when it comes to the exchange of information
at the line level, which includes things like hello messages, RSX is actually using a
system buffer size setting, which might be very different from the DECnet segment buffer
size.
>>>>
>>>> I found out that VMS has a problem here in that if the hello packets
coming in are much larger than the DECnet segment buffer size, you never even get
adjacency up, while RSX can deal with this just fine.
>>>>
>>>> It sounds like you might be seeing something similar in Tops-20. In which
case you would need to tell the other end to reduce the size of these hello and routing
information packets for Tops-20 to be happy, or else find a way to accept larger packets.
>>>>
>>>> After all, ethernet packets can be up to 1500 bytes of payload.
>>>>
>>>> And to explain it a bit more from an RSX point of view. RSX will use the
system buffer size when creating these hello messages. So, if that is set to 1500, you
will get hello packets up to 1500 bytes in size, which contain routing vectors and so on.
>>>>
>>>> But actual DECnet communication will be limited to what the DECnet
segment buffer size says, so once you have adjacency up, when a connection is established
between two programs, those packets will never be larger than the DECnet segment buffer
size, which is commonly 576 bytes.
>>>>
>>>> Johnny
>>>>> ------------------------------------------------------------------------
>>>>> On 2021-01-11 23:43, Thomas DeBellis wrote:
>>>>>
>>>>> Paul,
>>>>>
>>>>> Lots of good information. For right now, I did an experiment and
went into MDDT and stubbed out the XWD UNLER%,^D5 entry in the NIEVTB: table in the
running monitor on VENTI2. Since then (about an hour or so ago), TOMMYT's ERROR.SYS
file has been increasing as usual (a couple of pages an hour) while VENTI2's
hasn't changed at all. So that particular fire hose is plugged for the time being.
>>>>>
>>>>> I don't believe I have seen this particular error before,
however, there are probably some great reasons for that. In the 1980's, CCnet may not
have had Level-2 routers on it while Columbia's 20's were online. We did have a
problem with the 20's complaining about long Ethernet frames from an early version of BSD
4.2 that was being run on some VAX 11/750's in the Computer Science department's
research lab. They got taught how to not do that and all was well.
>>>>>
>>>>> Tops-20's multinet implementation was first done at BBN and then
later imported. I am not sure that it will allow me to change the frame size. 576 was
what was used for the Internet, so I don't know where that might be hardwired.
I'll check.
>>>>>
>>>>> I think there are two forensics to perform here:
>>>>>
>>>>> 1. Investigate when the errors started happening; whether they
predate
>>>>> Bob adopting PyDECnet
>>>>> 2. Investigate what the size difference is; I don't believe
that is
>>>>> going into the error log, but I'll have to look more
carefully with
>>>>> SPEAR.
>>>>>
>>>>> A *warning* for anyone also looking to track this down: if you do the
retrieve in SPEAR on KLH10 and you don't have my time out changes for DTESRV, you
will probably crash your 20. This will happen both with a standard DEC monitor and
PANDA.
>>>>> ------------------------------------------------------------------------
>>>>> On 1/11/21 4:41 PM, Paul Koning wrote:
>>>>>> ------------------------------------------------------------------------
>>>>>> On Jan 11, 2021, at 4:22 PM, Thomas DeBellis <tommytimesharing
at gmail.com> wrote:
>>>>>>
>>>>>> OK, I guess that's probably a level 2 router broadcast coming
over the bridge. There is no way Tops-10 or Tops-20 could currently be generating that
because there is no code to do so; they're level 1, only
>>>>> Yes, unfortunately originally both multicasts used the same address.
That was changed in Phase IV Plus, but that still sends to the old address for backwards
compatibility and it isn't universally implemented.
>>>>>> I started looking at the error; it starts out in DNADLL when it
is detected on a frame that has come back from NISRV (the Ethernet Interface driver). The
error is then handed off to NTMAN where the actual logging is done. So, there are two
quick hacks to stop all the errors:
>>>>>>
>>>>>> * I could stub out the length error entry (XWD UNLER%,^D5)
in the NIEVTB: table in DNADLL.MAC.
>>>>>> * I could put in a filter ($NOFIL) for event class 5 in the
NMXFIL: table in NTMAN.MAC.
>>>>>>
>>>>>> That will stop the deluge for the moment. Meanwhile, I have to
understand what's actually being detected; even the full SPEAR entry is short on
details (like how long the frame was).
>>>>>> The thing to look for is the buffer size (frame size) setting of
the stations on the Ethernet. It should match; if not someone may send a frame small
enough by its settings but too large for someone else who has a smaller value. Routing
messages tend to cause that problem because they are variable length; the Phase IV rules
have the routers send them (the periodic ones) as large as the line buffer size permits.
>>>>> Note that DECnet by convention doesn't use the full max Ethernet
frame size, because DECnet has no fragmentation so the normal settings are
chosen to make for consistent NSP packet sizes throughout the network. The router
sending the problematic messages is 2.1023 (not 63.whatever, Rob, remember that addresses
are little endian) which has its Ethernet buffer size set to 591. That matches the VMS
conventional default of 576 when accounting for the "long header" used on
Ethernet vs. the "short header" on point to point (DDCMP etc.) links. But
VENTI2 has its block size set to 576. If you change it to 591 it should start working.
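To illustrate the little-endian remark above, here is a small Python sketch. It relies on the Phase IV convention that a station's Ethernet address is AA-00-04-00 followed by the 16-bit node address (area in the top 6 bits, node number in the low 10 bits) stored low byte first. The function names are made up, and the example MAC address is computed from 2.1023 rather than copied from any log.

```python
import struct

DECNET_MAC_PREFIX = b"\xaa\x00\x04\x00"  # Phase IV high-order address bytes


def _last_two_octets(mac):
    """Parse a MAC string and return the two bytes that hold the node address."""
    octets = bytes(int(part, 16) for part in mac.replace(":", "-").split("-"))
    if len(octets) != 6 or octets[:4] != DECNET_MAC_PREFIX:
        raise ValueError(f"not a DECnet Phase IV style address: {mac}")
    return octets[4:]


def decnet_addr_from_mac(mac):
    """Decode (area, node) correctly, reading the 16-bit value little endian."""
    (value,) = struct.unpack("<H", _last_two_octets(mac))
    return value >> 10, value & 0x3FF


def misread_big_endian(mac):
    """Decode (area, node) with the wrong byte order, to show the mistake."""
    (value,) = struct.unpack(">H", _last_two_octets(mac))
    return value >> 10, value & 0x3FF


mac = "AA-00-04-00-FF-0B"          # what 2.1023 works out to
print(decnet_addr_from_mac(mac))   # (2, 1023)
print(misread_big_endian(mac))     # (63, 779)
```

Reading the last two bytes in the wrong order is what turns 2.1023 into an address in area 63.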
>>>>>
>>>>> Perhaps I should change PyDECnet to have a way to send shorter than
max routing messages.
>>>>>
>>>>> paul