On May 27, 2014, at 10:48 AM, <Paul_Koning at Dell.com> wrote:
On May 27, 2014, at 10:35 AM, Bob Armstrong <bob at jfcl.com> wrote:
On 2014-05-27 07:07, Paul_Koning at Dell.com wrote:
In other words, the listen timeout for
Ethernet is 3 * hello time; for point to point it is 2 * hello time.
But can you clarify who is dropping whom (i.e. which end is timing out)? The message on LEGATO says "dropped by adjacent node", not "dropping adjacent node", which makes it sound as if MIM is timing out, not LEGATO (and then somehow telling LEGATO that, which is another mystery).
Or is this just a poorly worded message?
No, the message is very precise.
If you see a message on LEGATO which reports adjacency MIM down for that reason, it means LEGATO received an Ethernet router hello message from MIM that no longer lists LEGATO as one of the routers that MIM can see.
MIM will have a reason for not reporting LEGATO any longer. It should have logged a message stating that reason.
Let me spell out the sequence of protocol exchanges a bit; that may help make this clear.
Suppose I have an Ethernet with two routers on it (and nothing else, to keep it simple).
1. Start A. It will periodically send router hello messages with an empty R/S List (see the Phase IV routing spec, page 92).
2. Start B. It will send out a router hello message with an empty R/S List.
3. A receives that hello. It adds B to the list of routers it has heard from. It sends out (immediately, usually) a router hello with one R/S List entry: B, NOT known to be 2-way.
4. B receives that hello from A. It adds A to the list of routers it has heard from. It adds A to the R/S List, known 2-way (because A says it has heard B). It sends that updated hello. It also generates an Adjacency Up for A.
5. A receives that hello from B. It sees itself mentioned in the R/S List, so it changes its own R/S list entry for B to say that it now has 2-way connectivity. It sends out that updated router hello, and generates an Adjacency Up event for B.
So the key point is that the adjacency with B is not up at A unless A hears from B that B can hear A. When that stops being true, you get the "dropped by adjacent node" event.
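To make the two-way check concrete, here is a minimal Python sketch of the five steps above for two routers on one Ethernet. It is a toy model under stated assumptions: the `Router` class, its `hello()`/`receive()` methods and the event strings are hypothetical illustrations, hellos are delivered instantly and reliably, and timers, priorities and endnodes are ignored. It is not taken from any DECnet implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# A router hello here is just (sender name, snapshot of its R/S List).
Hello = Tuple[str, Dict[str, bool]]


@dataclass
class Router:
    """Toy Phase IV Ethernet router (hypothetical model, not real DECnet code)."""
    name: str
    # R/S List: neighbor name -> True once 2-way connectivity is confirmed.
    rs_list: Dict[str, bool] = field(default_factory=dict)
    adjacencies_up: Set[str] = field(default_factory=set)

    def hello(self) -> Hello:
        return self.name, dict(self.rs_list)

    def receive(self, hello: Hello) -> Optional[str]:
        sender, their_rs_list = hello
        # 2-way: the sender lists us, i.e. it says it has heard us.
        two_way = self.name in their_rs_list
        self.rs_list[sender] = two_way  # we have heard the sender either way
        if two_way and sender not in self.adjacencies_up:
            self.adjacencies_up.add(sender)
            return f"{self.name}: Adjacency Up for {sender}"
        if not two_way and sender in self.adjacencies_up:
            self.adjacencies_up.discard(sender)
            return f"{self.name}: Adjacency Down for {sender}, dropped by adjacent node"
        return None


a, b = Router("A"), Router("B")
events = [
    a.receive(b.hello()),  # step 3: A hears B's empty-list hello; lists B, not 2-way
    b.receive(a.hello()),  # step 4: B sees itself in A's list -> Adjacency Up for A
    a.receive(b.hello()),  # step 5: A sees itself in B's list -> Adjacency Up for B
]
# Later, B stops listing A (B has its own reason, which B would log).
del b.rs_list["A"]
events.append(a.receive(b.hello()))
for event in events:
    if event:
        print(event)
# Prints:
# B: Adjacency Up for A
# A: Adjacency Up for B
# A: Adjacency Down for B, dropped by adjacent node
```

The last event mirrors the LEGATO/MIM case: the router that logs the drop is the one still listening; the reason for the drop lives on the other router.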
paul
On 2014-05-27 07:07, Paul_Koning at Dell.com wrote:
In other words, the listen timeout for
Ethernet is 3 * hello time; for point to point it is 2 * hello time.
But can you clarify who is dropping whom (i.e. which end is timing out)? The message on LEGATO says "dropped by adjacent node", not "dropping adjacent node", which makes it sound as if MIM is timing out, not LEGATO (and then somehow telling LEGATO that, which is another mystery).
Or is this just a poorly worded message?
Thanks
Bob
On May 25, 2014, at 10:20 PM, Johnny Billquist <bqt at softjar.se> wrote:
On 2014-05-26 04:17, Johnny Billquist wrote:
On 2014-05-26 03:47, Bob Armstrong wrote:
Increased it to 32, ...
That's probably a good idea, but I don't actually think that's the
problem. Like I said, I see this happening with several nodes, all on the
bridge QNA: MIM, PONDUS, A5RTR, SGC, etc.
Yeah. I checked some more, and I actually only have 18 adjacent nodes at
MIM:: so this is definitely not the problem. From my point of view, it
only seems to be LEGATO:: that is currently acting like a yo-yo...
I would guess that either some packets are lost, or else the packet trip times vary a *lot*. We should investigate more, but it's really getting late for me...
Varying round-trip time itself doesn't matter. Routing layer hellos are sent out periodically; the round-trip time is not considered. What matters is that they are delivered reliably enough. For Ethernet, two lost packets are allowed; for point-to-point links, only one. (Beware of braindead tunnel protocols that look like point-to-point links but run over UDP.) In other words, the listen timeout for Ethernet is 3 * hello time; for point-to-point it is 2 * hello time.
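The rule above reduces to simple arithmetic. A minimal Python sketch, assuming the listen timeout is exactly k * hello time (k = 3 for Ethernet, k = 2 for point-to-point) and ignoring timing jitter; the function names and the 15-second hello time in the example are illustrative choices, not values from the thread:

```python
# Listen timeout multipliers as stated above (k * hello time).
LISTEN_MULTIPLIER = {"ethernet": 3, "point-to-point": 2}


def listen_timeout(hello_time: float, circuit: str) -> float:
    """Seconds without a hello before the adjacency is declared down."""
    return LISTEN_MULTIPLIER[circuit] * hello_time


def max_consecutive_losses(circuit: str) -> int:
    """Hellos that may be lost in a row: two on Ethernet, one point-to-point."""
    return LISTEN_MULTIPLIER[circuit] - 1


# Example with an assumed (illustrative) hello time of 15 seconds.
for circuit in ("ethernet", "point-to-point"):
    print(circuit, listen_timeout(15, circuit), max_consecutive_losses(circuit))
# Prints:
# ethernet 45 2
# point-to-point 30 1
```

Round-trip time appears nowhere in this calculation; only the number of consecutive lost hellos matters.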
paul
On May 25, 2014, at 9:33 PM, Johnny Billquist <bqt at softjar.se> wrote:
On 2014-05-26 03:06, Bob Armstrong wrote:
Do you maybe have max routers set too low on your machine?
We have quite a few routers on the bridge segment.
$ NCP SHOW EXEC CHAR
...
Max broadcast nonrouters = 512
Max broadcast routers = 128
...
$ NCP TELL MIM SHOW EXEC CHAR
...
Max broadcast nonrouters = 64
Max broadcast routers = 20
...
Maybe it's too low on MIM?? The message actually makes it sound like MIM
is dropping LEGATO, not the other way around.
Could be... In fact you are probably right. Just checked, and MIM has a MAX of 20 right now, which I believe is too low here. Increased it to 32, but I need to reboot for it to take effect.
In DECnet, we do not assume that "I can hear A" means "A can hear me". Instead, the protocol explicitly tests for that (in the case of routers). If the test fails, you don't get an adjacency. If there was an adjacency before, and then the test fails, that adjacency goes down.
The way this is done is that the Ethernet router hello message contains a list of routers the sender has heard. If a router doesn't see itself listed, it doesn't bring up the adjacency with the sender of that router hello. If it was listed and then goes away, you get the "dropped by adjacent node" event.
To find out why the adjacent router stopped mentioning you, you need to look in its event log. If the reason is "too many routers", there should be an event that says so. If the reason is something else, the event should say what it is. For example, it might be "adjacency listener timeout", meaning no hello messages were seen in 3 * hello time.
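A minimal Python sketch of the receiving side, showing why the two reasons end up in different logs. It replays router hellos from one neighbor as seen by LEGATO on an Ethernet: a hello that omits LEGATO produces a "dropped by adjacent node" event (the real reason is then in the neighbor's log), while silence longer than 3 * hello time produces a listener timeout locally. The function name, input format and reason strings are hypothetical paraphrases of the thread, not exact DECnet event text; a single neighbor and exact timestamps are assumed.

```python
from typing import List, Optional, Sequence, Tuple


def adjacency_events(me: str, hello_time: float,
                     hellos: Sequence[Tuple[float, Sequence[str]]],
                     end_time: float) -> List[Tuple[float, str]]:
    """Replay one neighbor's router hellos as seen by router `me` on Ethernet.

    hellos: (arrival time in seconds, names listed in that hello's R/S List),
    sorted by arrival time. Returns (time, event) pairs.
    """
    timeout = 3 * hello_time  # Ethernet listen timeout
    events: List[Tuple[float, str]] = []
    up = False
    last_heard: Optional[float] = None

    def check_timeout(now: float) -> None:
        nonlocal up
        if up and last_heard is not None and now - last_heard > timeout:
            events.append((last_heard + timeout, "adjacency listener timeout"))
            up = False

    for t, rs_list in hellos:
        check_timeout(t)   # did the neighbor go silent for too long?
        last_heard = t     # any hello restarts the listen timer
        if me in rs_list and not up:
            up = True
            events.append((t, "adjacency up"))
        elif me not in rs_list and up:
            up = False     # neighbor no longer lists us
            events.append((t, "dropped by adjacent node"))
    check_timeout(end_time)
    return events


# Hypothetical trace: MIM lists LEGATO, then omits it once, then goes silent.
trace = [(0, ["LEGATO"]), (15, ["LEGATO"]), (30, []), (45, ["LEGATO"])]
for when, what in adjacency_events("LEGATO", 15, trace, end_time=200):
    print(when, what)
# Prints:
# 0 adjacency up
# 30 dropped by adjacent node
# 45 adjacency up
# 90 adjacency listener timeout
```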
paul
Sent from mobile device that advertises itself for no good reason
On 26 May 2014, at 08:48, "Jerome H. Fine" <jhfinedp3k at compsys.to> wrote:
Cory Smelosky wrote:
On Mon, 19 May 2014, Jerome H. Fine wrote:
Since you are obviously using either V05.06 or V05.07
Yup. 5.07. It has Mentec branding!
(only the last two versions of RT-11 have the RT11ZM
Monitor), you can use the VRUN command to support
giving LINK all 64 KB of memory. Naturally, you
will probably need at least 256 KB of total physical
memory on even a PDP-11/23 (to run RT11XM as well as
to provide the needed extended memory to provide
LINK with the full 64 KB to run in) although 128 KB
might do in a pinch depending on which device you use
for the system device.
Thanks. I'll look into VRUN.
Any good results (or bad) to report?
Got sidetracked with other projects.
How much physical memory do you have?
256 kilowords. Had more, but half the board is bad. :(
Jerome Fine
[ Summary: File transfers between two simh PDPs hang;
DECnet reports Data overruns and Response timeouts ]
At 2:14 AM +0200 26/5/14, Johnny Billquist wrote:
This is a problem inside of DECnet on the simulated host. It gets packets
faster than it can process them, so some packets are dropped.
Unfortunately DECnet deals very badly with systematic packet loss like this.
You get retransmissions, and after a while the retransmission timeout backs
off until you have more than a minute between retransmission attempts.
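To see how quickly a backing-off retransmission timer reaches the one-minute range, here is a minimal Python sketch of plain exponential backoff. It assumes simple doubling from a hypothetical 1-second starting interval; NSP's actual timer computation is different and is not reproduced here, so this only illustrates the effect described above.

```python
from typing import List


def backoff_intervals(initial: float, factor: float = 2.0,
                      limit: float = 60.0) -> List[float]:
    """Successive retransmission intervals, stopping at the first one > limit."""
    if initial <= 0 or factor <= 1:
        raise ValueError("need initial > 0 and factor > 1")
    intervals = []
    interval = initial
    while True:
        intervals.append(interval)
        if interval > limit:
            return intervals
        interval *= factor


steps = backoff_intervals(1.0)
print(steps)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
print(sum(steps))  # 127.0
```

Under these assumptions, six consecutive failed retries push the gap past a minute. With systematic loss, each retry is likely to be dropped as well, so the link spends most of its time waiting.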
Anyway, if you can get simh to throttle the ethernet interface, that might help you.
(I don't remember offhand whether it supports such functionality.)
The service polling timer can be adjusted:
    SET XQ POLL={DEFAULT|4..2500}
It is set to 100 by default.
Changing the polling timer makes a huge difference. Have a look at:
http://pastebin.com/AZ1U6bh3
Although it still hangs sometimes, reliability is vastly improved over the erratic behavior we started with. Remember, the completion time was about 3 minutes.
We're almost there :)
This turns into an interesting challenge: tune the XQ service timer to keep overruns as low as possible. This depends on many factors, among them the data sink bandwidth.
You may get a flawless copy to TI:, but it will fail to disk. The terminal is actually throttling the transfer. Disks are faster, and emulated disks are orders of magnitude faster than the original ones. Emulation is pushing DECnet to speeds it was never designed for.
I'm running as low as 10 polls/sec here. Maybe 50 would be optimal, and what about 500? I need metrics. And tools. Here, I am using AT. to time the transfer of a file of 100. blocks. Overruns and timeouts still rise slowly, but DECnet recovers happily most of the time.
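As a starting point for such metrics, a minimal Python sketch of the two numbers involved. It treats the POLL value as polls per second (as used above), checks the 4..2500 range quoted for SET XQ POLL, and assumes 512-byte RT-11 blocks for throughput. The function names are hypothetical, and the elapsed times you feed it have to come from your own runs (for example the AT. timing).

```python
RT11_BLOCK_BYTES = 512  # one RT-11 block = 512 bytes (256 words)


def poll_interval_ms(polls_per_sec: int) -> float:
    """Time between XQ service polls, in milliseconds."""
    if not 4 <= polls_per_sec <= 2500:
        raise ValueError("SET XQ POLL accepts 4..2500")
    return 1000.0 / polls_per_sec


def throughput_bytes_per_sec(blocks: int, seconds: float) -> float:
    """Effective transfer rate for `blocks` RT-11 blocks moved in `seconds`."""
    return blocks * RT11_BLOCK_BYTES / seconds


for polls in (10, 50, 100, 500):
    print(f"POLL={polls}: one poll every {poll_interval_ms(polls):.1f} ms")
# Prints:
# POLL=10: one poll every 100.0 ms
# POLL=50: one poll every 20.0 ms
# POLL=100: one poll every 10.0 ms
# POLL=500: one poll every 2.0 ms
```

Recording `throughput_bytes_per_sec(100, measured_seconds)` together with the overrun and timeout counts for each POLL setting would allow the settings to be compared on more than elapsed time alone.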
--
Jean-Yves Bernier
On 2014-05-26 04:17, Johnny Billquist wrote:
On 2014-05-26 03:47, Bob Armstrong wrote:
Increased it to 32, ...
That's probably a good idea, but I don't actually think that's the
problem. Like I said, I see this happening with several nodes, all on the
bridge QNA: MIM, PONDUS, A5RTR, SGC, etc.
Yeah. I checked some more, and I actually only have 18 adjacent nodes at
MIM:: so this is definitely not the problem. From my point of view, it
only seems to be LEGATO:: that is currently acting like a yo-yo...
I would guess that either some packets are lost, or else the packet trip times vary a *lot*. We should investigate more, but it's really getting late for me...
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt at softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol