[HECnet] Re: L2 redundancy

9 Sep 2022

Sounds impressive, yet I'm not sure I'm clear about the figures being 
touted here.  Isn't 32 TB a monthly cap on total data transfers before 
data rates are decreased?  Have you done any throughput tests?  What 
have you seen?

When I think of bandwidth, I think of bits over time.  For example, my 
FIOS interface was recently increased from 1 Gbps to 2 Gbps.  I haven't 
noticed any difference except less of a degradation during the day.  
Optimum Online (another ISP in my area) is now quoting 5 Gbps.  I'm not 
sure what to think about that, but I'm fairly certain both figures have 
data caps on them which are probably unpublished.
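
To put a monthly transfer figure and a line rate on the same footing, the volume can be converted into the average sustained rate it allows, and the other way around. This is only a rough sketch: it assumes the 32 TB really is a monthly cap, a 30-day month, and decimal terabytes (10^12 bytes). The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def cap_to_avg_mbps(cap_terabytes, days=30):
    """Average rate (Mbit/s) that would use up a monthly cap exactly."""
    bits = cap_terabytes * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6

def rate_to_monthly_tb(rate_gbps, days=30):
    """Data moved (TB) by running a link flat out for a whole month."""
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * days * SECONDS_PER_DAY / 1e12

print(round(cap_to_avg_mbps(32), 1))   # 98.8
print(round(rate_to_monthly_tb(2)))    # 648
```

Under those assumptions, 32 TB per month averages out to just under 100 Mbps sustained, while a 2 Gbps line run flat out for a month would move about 648 TB.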

DECnet over Ethernet is so fast for NRTs that I was forced to re-code 
Kermit-20's timing routines to be microsecond-based. Although I 
calculate data rates out to terabits (DUDE!!), I haven't been able to 
push more than about a megabit through Kermit.  However, this is still 
alpha code.
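
The need for microsecond timing comes down to simple arithmetic: a short transfer over Ethernet can finish inside one tick of a coarse clock, so the elapsed time reads as zero and no rate can be computed. The Python sketch below is illustrative only and is not the Kermit-20 code; the function name and the sample numbers are assumptions.

```python
def rate_bps(nbytes, start_us, end_us):
    """Data rate in bits per second from two microsecond timestamps.

    Returns None when no time appears to have elapsed, which is what a
    clock that is too coarse for the transfer reports.
    """
    elapsed_us = end_us - start_us
    if elapsed_us <= 0:
        return None
    return nbytes * 8 * 1_000_000 / elapsed_us

# Hypothetical sample: 1500 bytes timed at 120 microseconds.
print(rate_bps(1500, 0, 120))            # 100000000.0

# The same transfer seen through a millisecond clock: both timestamps
# truncate to the same tick, so the elapsed time is zero.
start_ms, end_ms = 0 // 1000, 120 // 1000
print(rate_bps(1500, start_ms * 1000, end_ms * 1000))   # None
```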

I was wondering if any kind of statistics are being kept on total data 
going through HECnet?  I'm probably hogging my fair share due to all the 
transfer tests I'm running against Tops-10 Kermit.

------------------------------------------------------------------------
On 9/8/22 3:35 PM, Brian Hechinger wrote:
...
 I've got a 4-node Kubernetes cluster that I set up recently in 
 Germany. It's got tons of bandwidth (32 TB per month per node) and I would 
 be more than happy to host (a) routing node(s) there. I was already 
 thinking about putting simh there so it's not much of a stretch to 
 make it a hub. It's not all that far from Sweden though so it may not 
 be super useful.

 I can easily set up PyDECnet nodes. One per k8s node if we wanted to 
 get crazy. 🤣

 -brian
 On Sep 8, 2022, 15:27 +0100, Johnny Billquist <bqt(a)softjar.se> wrote:

     Hi. A couple of comments.

     On 2022-09-08 16:04, Paul Koning wrote:

         I just had a conversation with one of the operators of a
         HECnet L2 router who has demoted it to L1 because of
         connectivity issues to other areas. That prompted some
         thinking about the DECnet requirements for fault tolerance.

         The basic principle of DECnet Phase IV is that L1 routing
         (within an area) involves only routers of that area, and L2
         routing (across areas) involves only L2 routers. Phase V
         changes that to some extent, but HECnet is not Phase V and
         isn't likely to be. :-) In addition, L1 routers send out of
         area traffic to some L2 router in their area, but without any
         awareness of the L2 topology.

     Agreed on all of the things mentioned.

         This has several consequences:

         1. If some of the L2 routers in your area can't see the
         destination area but others can, you may not be able to
         communicate even though it would seem that there is a way to
         get there from here.

     This is basically a broken network. *All* L2 routers should be able to
     communicate with all other L2 routers without anything but L2 routers
     in between. If that is broken, then the whole DECnet is split.

     The solution to this is really to have multiple connections. You could
     argue that a full connection mesh would be ideal, but that means way
     too many links. But ideally every L2 router should have connections to
     at least two other L2 routers. That way you should never have a single
     point of failure. Some more links are probably good. And a suggestion
     is really to have them to different areas in different places, to
     distribute/disperse the whole connectivity graph.

     This does not hurt in the slightest from a performance point of view.
     DECnet will always route along the most efficient path, even if other
     alternatives exist.

     As a supplemental comment to that, "efficient path" is always a bit
     subjective. But if people adhere to the cost suggestions I have
     written, packets will be traveling along a good path.
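
The point that an extra link costs nothing while the primary path is up can be illustrated with a small least-cost search. DECnet Phase IV routing is distance-vector; Dijkstra's algorithm is swapped in here only because it finds the same least-cost path on a static graph. The router names A, B, C and the circuit costs are hypothetical.

```python
import heapq

def least_cost(links, src, dst):
    """Return (cost, path) of the cheapest route, or None if unreachable.

    links maps an (a, b) pair of router names to a circuit cost; every
    link is treated as usable in both directions.
    """
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, link_cost in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + link_cost, nxt, path + [nxt]))
    return None

# Hypothetical topology: a cheap path A-B-C plus an expensive backup A-C.
links = {("A", "B"): 3, ("B", "C"): 3, ("A", "C"): 10}
print(least_cost(links, "A", "C"))       # (6, ['A', 'B', 'C'])

# With A-B down, traffic falls back to the backup link.
degraded = {k: v for k, v in links.items() if k != ("A", "B")}
print(least_cost(degraded, "A", "C"))    # (10, ['A', 'C'])
```

The backup link is ignored while the cheaper path exists and only carries traffic once that path is gone.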

         2. If your area is split, i.e., some of its L2 routers can see
         one subset of the nodes in the area and other L2 routers can
         see a different subset, then out of area traffic inbound to
         that area may not reach its destination -- if it enters at the
         "wrong" L2 entry point.

     An area split actually has nothing to do with L2 routers, but what you
     describe is a fallout from the area split. An area split happens if
     the L1 routers inside one area do not have connectivity with all other
     L1 routers within the area without passing through anything other than
     L1 routers in the area.

     It really is the same thing as with point 1 above, but within one area
     instead of inter-area. And again, the best solution is really to have
     multiple L1 routers, and have links to several other L1 routers within
     the area, so that you avoid a single point of failure. More links are
     better, but again, it has to be kept to reasonable numbers.

     This will minimize the risks of ever getting an area split.
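
The area-split condition described above can be checked mechanically: follow only links whose two ends are both in the area, and see whether every router ends up in one group. This is a minimal sketch; the router names and the link list format are assumptions, not anything taken from a real HECnet configuration.

```python
from collections import deque

def area_fragments(area_nodes, links):
    """Split an area's routers into groups that can reach each other
    using only links that stay inside the area."""
    neighbours = {n: set() for n in area_nodes}
    for a, b in links:
        # Links that leave the area do not count for intra-area routing.
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)
    fragments, seen = [], set()
    for start in sorted(area_nodes):
        if start in seen:
            continue
        fragment, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            fragment.add(node)
            for other in neighbours[node] - seen:
                seen.add(other)
                queue.append(other)
        fragments.append(fragment)
    return fragments

# Hypothetical area: R3 is reachable only through X, which is outside.
area = {"R1", "R2", "R3"}
links = [("R1", "R2"), ("R2", "X"), ("X", "R3")]
print(len(area_fragments(area, links)))                    # 2
print(len(area_fragments(area, links + [("R2", "R3")])))   # 1
```

More than one fragment means the area is split; adding a single intra-area link repairs it in this example.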

     And there are no penalties with this. As long as costs are set
     properly, packets will travel along a good path.

     However, what people also need to understand is that when packets
     travel to another area, the risk that they take a slightly less
     optimal path is unavoidable. This has to do with how DECnet routes
     packets, and it is not something that can be "improved". It also means
     that packets might travel very different ways in the two directions.

     If someone really wants more explanations about that topic, I'm happy
     to explain why it is this way. I'm sure Paul already knows as well.
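
One way to see where the sub-optimal path comes from, based on Paul's note that L1 routers hand out-of-area traffic to an L2 router without any awareness of the L2 topology: the exit router is chosen on intra-area cost alone. The sketch below is a simplified model, not a DECnet implementation; the router names P and Q and all costs are hypothetical.

```python
def chosen_exit(l1_cost_to_l2):
    """Exit picked from inside the area: simply the nearest L2 router."""
    return min(l1_cost_to_l2, key=l1_cost_to_l2.get)

def total_cost(exit_router, l1_cost_to_l2, cost_beyond):
    """Cost to reach the exit plus the remaining cost from there on."""
    return l1_cost_to_l2[exit_router] + cost_beyond[exit_router]

# Hypothetical numbers: P is closest, but Q has the better onward path.
l1_cost_to_l2 = {"P": 1, "Q": 2}     # known inside the area
cost_beyond = {"P": 10, "Q": 2}      # not visible to L1 routing

taken = chosen_exit(l1_cost_to_l2)
best = min(l1_cost_to_l2,
           key=lambda r: total_cost(r, l1_cost_to_l2, cost_beyond))
print(taken, total_cost(taken, l1_cost_to_l2, cost_beyond))   # P 11
print(best, total_cost(best, l1_cost_to_l2, cost_beyond))     # Q 4
```

The same rule is applied independently at the other end for return traffic, so the two directions can easily use different exits and therefore different paths.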

         I believe the issue I mentioned at the top was #1: one of the
         L2 routers went down and the remaining L2 routers of that area
         ended up at two sides of a partitioned L2 network.

     If the L2 network is partitioned, it is basically a broken DECnet in
     general, and for every node just a subset of the DECnet is reachable,
     depending on which side of the partition it ends up on.
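
The failure Paul describes, where losing one L2 router leaves the rest on two sides of a partition, can be looked for in advance by removing each router in turn and checking whether the remainder is still connected. This is a brute-force sketch, fine for a network of HECnet's size; the router names and link list are hypothetical.

```python
def is_connected(nodes, links):
    """True if every node can reach every other using the given links."""
    nodes = set(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in neighbours[node]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen == nodes

def single_points_of_failure(nodes, links):
    """Routers whose loss would partition the remaining network."""
    return sorted(r for r in nodes
                  if not is_connected(set(nodes) - {r}, links))

# Hypothetical L2 backbone: a simple chain A-B-C-D.
routers = {"A", "B", "C", "D"}
chain = [("A", "B"), ("B", "C"), ("C", "D")]
print(single_points_of_failure(routers, chain))                 # ['B', 'C']

# Closing the chain into a ring removes every single point of failure.
print(single_points_of_failure(routers, chain + [("D", "A")]))  # []
```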

         Obviously HECnet isn't a production network, but still it
         would be nice for it to be tolerant of outages. Especially
         since we can insert additional routers easily with PyDECnet or
         Robert Jarratt's C router. The HECnet map can be set to show
         just the L2 network (using the layers menu, accessible via the
         layers icon in the top right corner of the map). It's easy to
         see a number of L2 routers that have only one connection to
         the rest of HECnet. It's also clear that a large fraction of
         the connectivity is via Sweden, which certainly is a fine
         option but it's a bit odd for a node in, say, western Canada
         to have only that one connection and none to nodes much closer
         to it.

     I would agree that my preference is that we should improve the
     resilience of HECnet. In addition, I would in general recommend that
     L2 routers should have multiple connections, and they should have some
     to nearby L2 routers in other areas, for multiple reasons.

     It doesn't make sense to route through Sweden unless it's actually on
     the way towards the destination.

         The map display doesn't give a visual clue about
         singly-connected area routers for which there is no location
         information in the database (the ones plotted at Inaccessible
         Island). The data is there in the map data table; it wouldn't
         be too hard to do some post-processing on that data to find
         cases of no redundancy.
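
The post-processing mentioned here could be as simple as counting distinct L2 neighbours per router. The sketch below assumes the map data can be reduced to a list of (router, neighbour) pairs; that format, the function name, and the router names are assumptions, since the real table layout is not shown in this thread.

```python
from collections import defaultdict

def weakly_connected(l2_links, minimum=2):
    """Return routers with fewer than `minimum` distinct L2 neighbours,
    mapped to the neighbours they do have."""
    neighbours = defaultdict(set)
    for a, b in l2_links:
        if a != b:
            # Sets collapse a link that is listed once per direction.
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {router: sorted(peers)
            for router, peers in sorted(neighbours.items())
            if len(peers) < minimum}

# Hypothetical link list; A-B appears in both directions.
l2_links = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "D")]
print(weakly_connected(l2_links))   # {'D': ['C']}
```

A router with no links at all never appears in the link list, so it would have to be found by comparing against the full node list.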

     I think a bigger problem is that the Cisco router links are not
     visible at all on the map right now, making the picture rather skewed.
     Not sure how well the bridge links are shown either...

         I'm curious if people would be interested in trying to make
         HECnet more fault tolerant. My router (PYTHON) can definitely
         help, especially for North American nodes, and I'm sure there
         are a number of others that feel the same.

     Like I said. I'm all for it.

     Johnny

     --
     Johnny Billquist                  || "I'm on a bus
                                       ||  on a psychedelic trip
     email: bqt(a)softjar.se           ||  Reading murder books
     pdp is alive!                     ||  tryin' to stay hip" - B. Idol
     _______________________________________________
     HECnet mailing list -- hecnet(a)lists.dfupdate.se
     To unsubscribe send an email to hecnet-leave(a)lists.dfupdate.se

