[HECnet] Thousands of DECnet errors on Tops-20

12 Jan 2021

That's certainly sufficient but the precise rule is somewhat more lenient.

Segment buffer size controls the largest packet NSP will generate.  Buffer size specifies
the largest packet the routing layer can forward.  The requirement for two nodes to be
able to communicate is that the Buffer Size value for each of the routers on the path
between the two must be >= the smaller of the two segment buffer size values at the NSP
end points.

For sanity, this translates into: all routers must have a Buffer Size >= the value of
any Segment Buffer Size.  But it's perfectly legal for the segment buffer size to be
smaller, or inconsistent.  Also, it's perfectly legal for an end node to have a small
Buffer Size, and some small systems indeed do this.  The reason is that end nodes
don't forward, so the messages they are dealing with can only ever be as large as
their segment buffer size, since that's exchanged during NSP connection
establishment.
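
As a rough illustration of the rule above, here is a minimal Python sketch. The function name and the sample values are made up for the example; it only encodes the comparison described in the text and is not taken from any DECnet implementation.

```python
def can_communicate(seg_a, seg_b, router_buffer_sizes):
    """Check the rule stated above for one path between two NSP end points.

    seg_a, seg_b: Segment Buffer Size at each NSP end point.
    router_buffer_sizes: Buffer Size of every router on the path.
    (All names here are hypothetical, chosen for illustration.)
    """
    # NSP ends up using the smaller of the two segment buffer sizes.
    negotiated = min(seg_a, seg_b)
    # Every router on the path must be able to forward a packet that large.
    return all(size >= negotiated for size in router_buffer_sizes)


if __name__ == "__main__":
    # Inconsistent segment sizes are fine as long as the smaller one fits.
    print(can_communicate(576, 1400, [576, 591]))
    # A router with too small a Buffer Size breaks the path.
    print(can_communicate(576, 576, [500]))
```

End nodes do not appear in the router list at all, which is why a small Buffer Size on an end node does no harm under this rule.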

Reading carefully in the various DECnet specs I think I got the buffer size parameter
wrong for Ethernet circuits.  It looks like I should limit the routing packets to the
Buffer Size (the common parameter, i.e., 576) rather than using a larger value on
Ethernet.  That should cure the issue Thomas discovered.
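
To make the idea concrete, this is a sketch of splitting a routing table into messages that each fit within a size limit. It is not PyDECnet's actual code; the function name, the fixed overhead, and the per-entry size are all placeholder assumptions, and the real message layout is defined by the routing spec.

```python
def split_routing_entries(entries, max_message_size, fixed_overhead, entry_size):
    """Split routing entries into chunks that each fit in one message.

    max_message_size: the limit to respect (e.g. the common Buffer Size).
    fixed_overhead:   bytes of header/trailer per message (assumed value).
    entry_size:       bytes per routing entry (assumed value).
    """
    per_message = (max_message_size - fixed_overhead) // entry_size
    if per_message < 1:
        raise ValueError("message size too small to carry any entry")
    return [entries[i:i + per_message]
            for i in range(0, len(entries), per_message)]


if __name__ == "__main__":
    # Hypothetical numbers: 1024 entries, 16 bytes overhead, 2 bytes per entry.
    chunks = split_routing_entries(list(range(1024)), 576, 16, 2)
    for chunk in chunks:
        print(len(chunk), "entries ->", 16 + 2 * len(chunk), "bytes")
```

With a smaller limit the same table simply goes out as more, shorter messages.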

	paul

...
  On Jan 12, 2021, at 3:29 PM, Peter Lothberg <roll at stupi.com> wrote:

 The DECnet segment size has to be the same "network wide".

 If I remember right, DECnet looks at the two end nodes and uses the smallest segment size,
 so if there is any transit node in the path with a small segment size, things will not
 work, as it will drop packets bigger than its size.

 The only SW/HW combination I knew of that has other than 576 is MRC/Stu DECnet for 
 Tops20 4.x on DEC2020.

 -P

 From: "tommytimesharing" <tommytimesharing at gmail.com>
 To: "hecnet" <hecnet at Update.UU.SE>
 Sent: Monday, January 11, 2021 11:58:56 PM
 Subject: Re: [HECnet] Thousands of DECnet errors on Tops-20
 Yes, I had seen this and had wondered about it after I had reflected on the output of a
SHOW EXECUTOR CHARACTERISTICS command (clipped):

 Executor Node = 2.520 (TOMMYT)

   Identification = Tommy Timesharing
   Management Version = 4.0.0
   CPU = DECSYSTEM1020
   Software Identification = Tops-20 7.1 PANDA

 .
 .
 .
   Buffer Size = 576
   Segment Buffer Size = 576

 So it would appear that the 20's implementation of NICE knows of this
differentiation.  I can parse for both SET EXECUTOR SEGMENT BUFFER SIZE and SET EXECUTOR
BUFFER SIZE.  Both fail, of course; again, once DECnet is initialized, they are locked.

 However, when one looks at the DECnet initialization block (IBBLK), it only contains a
field for buffer size (IBBSZ), nothing about segment size.  Further, the NODE% JSYS'
set DECnet initialization parameters function (.NDPRM) only contains a sub-function for
buffer size (.NDBSZ) and SETSPD will only parse for DECNET BUFFER-SIZE.  I hope
to test that this weekend after I've looked further through the error log.

 The receive code in the low level NI driver (PHYKNI) only checks to see whether what was
received will fit into the buffer specified.  It returns a length error (UNLER%) to
DNADLL, but not the actual difference.

 I have yet to puzzle out how the segment size is derived, but it is apparently set on a
line basis.

 On 1/11/21 8:24 PM, Johnny Billquist wrote:

 Thomas, I wonder if you might be experiencing the effects of the Ethernet packet size
being different from the DECnet segment buffer size.
 This is a little hard to explain, as I don't have all the proper DECnet naming
correct. 

 But, based on RSX, there are two sizes relevant. One is the actual buffer size the line is
using. The other is the DECnet segment buffer size. 

 The DECnet segment buffer size is the maximum size of packets you can ever expect DECnet
itself to ever use. 
 However, at least with RSX, when it comes to the exchange of information at the line
level, which includes things like hello messages, RSX is actually using a system buffer
size setting, which might be very different from the DECnet segment buffer size. 

 I found out that VMS has a problem here in that if the hello packets coming in are much
larger than the DECnet segment buffer size, you never even get adjacency up, while RSX can
deal with this just fine. 

 It sounds like you might be seeing something similar in Tops-20. In which case you would
need to tell the other end to reduce the size of these hello and routing information
packets for Tops-20 to be happy, or else find a way to accept larger packets. 

 After all, ethernet packets can be up to 1500 bytes of payload. 

 And to explain it a bit more from an RSX point of view. RSX will use the system buffer
size when creating these hello messages. So, if that is set to 1500, you will get hello
packets up to 1500 bytes in size, which contain routing vectors and so on. 

 But actual DECnet communication will be limited to what the DECnet segment buffer size
says, so once you have adjacency up, when a connection is established between two programs,
those packets will never be larger than the DECnet segment buffer size, which is commonly
576 bytes. 

   Johnny 
 On 2021-01-11 23:43, Thomas DeBellis wrote: 

 Paul, 

 Lots of good information.  For right now, I did an experiment and went into MDDT and
stubbed out the XWD UNLER%,^D5 entry in the NIEVTB: table in the running monitor on
VENTI2.  Since then (about an hour or so ago), TOMMYT's ERROR.SYS file has been
increasing as usual (a couple of pages an hour) while VENTI2's hasn't changed at
all.  So that particular fire hose is plugged for the time being. 

 I don't believe I have seen this particular error before, however, there are probably
some great reasons for that.  In the 1980's, CCnet may not have had Level-2 routers on
it while Columbia's 20's were online.  We did have a problem with the 20's
complaining about long Ethernet frames from an early version of BSD 4.2 that was being run on
some VAX 11/750's in the Computer Science department's research lab.  They got
taught how to not do that and all was well. 

 Tops-20's multinet implementation was first done at BBN and then later imported.  I
am not sure that it will allow me to change the frame size.  576 was what was used for the
Internet, so I don't know where that might be hardwired.  I'll check. 

 I think there are two forensics to perform here: 

  1. Investigate when the errors started happening; whether they predate 
     Bob adopting PyDECnet 
  2. Investigate what the size difference is; I don't believe that is 
     going into the error log, but I'll have to look more carefully with 
     SPEAR. 

 A *warning* for anyone also looking to track this down: if you do the retrieve in SPEAR
on KLH10 and you don't have my time out changes for DTESRV, you will probably
crash your 20.  This will happen both with a standard DEC monitor and PANDA. 

 ------------------------------------------------------------------------ 
 On 1/11/21 4:41 PM, Paul Koning wrote: 

 On Jan 11, 2021, at 4:22 PM, Thomas DeBellis <tommytimesharing at gmail.com> wrote:

 OK, I guess that's probably a level 2 router broadcast coming over the bridge.  There
is no way Tops-10 or Tops-20 could currently be generating that because there is no code
to do so; they're level 1 only.

 Yes, unfortunately originally both multicasts used the same address.  That was changed in
Phase IV Plus, but that still sends to the old address for backwards compatibility and it
isn't universally implemented. 

 I started looking at the error; it starts out in DNADLL when it is detected on a frame
that has come back from NISRV (the Ethernet Interface driver).  The error is then handed
off to NTMAN where the actual logging is done.  So, there are two quick hacks to stop all
the errors: 

     * I could stub out the length error entry (XWD UNLER%,^D5) in the NIEVTB: table in
DNADLL.MAC.
     * I could put in a filter ($NOFIL) for event class 5 in the NMXFIL: table in
NTMAN.MAC.

 That will stop the deluge for the moment.  Meanwhile, I have to understand what's
actually being detected; even the full SPEAR entry is short on details (like how long the
frame was).

 The thing to look for is the buffer size (frame size) setting of the stations on the
Ethernet.  It should match; if not someone may send a frame small enough by its settings
but too large for someone else who has a smaller value.  Routing messages tend to cause
that problem because they are variable length; the Phase IV rules have the routers send
them (the periodic ones) as large as the line buffer size permits. 
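
The mismatch described above can be sketched in a few lines of Python. The station names and sizes below are illustrative assumptions, not data from any real network; the point is only that a frame legal for the sender can exceed another station's receive buffer.

```python
def oversize_receivers(frame_length, receiver_buffer_sizes):
    """Return the stations whose buffer size is too small for this frame.

    receiver_buffer_sizes: dict mapping a station name to its buffer size.
    (Names and values are hypothetical, for illustration only.)
    """
    return sorted(name for name, size in receiver_buffer_sizes.items()
                  if frame_length > size)


if __name__ == "__main__":
    stations = {"ROUTER-A": 591, "HOST-B": 576, "HOST-C": 591}
    # A router fills its periodic routing message up to its own limit.
    sender_limit = stations["ROUTER-A"]
    print(oversize_receivers(sender_limit, stations))
```

Any station listed in the result would see such a frame as a length error.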

 Note that DECnet by convention doesn't use the full max Ethernet frame size,
because DECnet has no fragmentation, so the normal settings are chosen to make for
consistent NSP packet sizes throughout the network.  The router sending the problematic
messages is 2.1023 (not 63.whatever, Rob, remember that addresses are little endian), which
has its Ethernet buffer size set to 591.  That matches the VMS conventional default of 576
when accounting for the "long header" used on Ethernet vs. the "short
header" on point to point (DDCMP etc.) links.  But VENTI2 has its block size set to
576.  If you change it to 591 it should start working.
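
The little endian remark can be checked with a short Python sketch. A Phase IV address is a 16-bit value with the area in the top 6 bits and the node number in the low 10 bits, sent low byte first. The byte values below are derived from 2.1023 for the example, not copied from a trace, and the function name is made up.

```python
def decode_decnet_address(first_byte, second_byte):
    """Decode a 16-bit DECnet address from its two bytes in wire order.

    The first byte on the wire is the low-order byte (little endian).
    Returns (area, node).
    """
    value = first_byte | (second_byte << 8)
    return value >> 10, value & 0x3FF


if __name__ == "__main__":
    # 2.1023 is 2 * 1024 + 1023 = 0x0BFF, so the wire bytes are FF 0B.
    area, node = decode_decnet_address(0xFF, 0x0B)
    print(f"{area}.{node}")
    # Reading the same two bytes in the wrong order gives area 63.
    area, node = decode_decnet_address(0x0B, 0xFF)
    print(f"{area}.{node}")
```

Swapping the byte order is what turns 2.1023 into an address in area 63.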

 Perhaps I should change PyDECnet to have a way to send shorter than max routing messages.

     paul 
