On May 27, 2014, at 3:00 PM, Bob Armstrong <bob at jfcl.com> wrote:
... The hello timer on circuit QNA-1 on LEGATO
(which is what I assume to be the relevant parameter here) is set to 15
seconds. I don't know what the corresponding timeout is on the MIM side,
but maybe Johnny will tell us. In any case, changing the QNA-1 hello timer
to 5 seconds just for fun didn't seem to make much difference, so the
problem is likely that packets are being lost somewhere.
Also, in Phase IV the hello timer value is carried in the hello message, and the receiver uses that value times 3 (Ethernet) or 2 (point-to-point) as its listen timer. So the timeout at MIM means it heard no hellos from LEGATO for 45 seconds.
paul
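A minimal sketch of the listen-timer arithmetic described above. It assumes the multipliers quoted in this thread (3 on Ethernet, 2 on point-to-point). The function and circuit-type names are hypothetical and are not taken from any DECnet implementation.

```python
# Hypothetical helper: multipliers follow the thread (3x Ethernet, 2x point-to-point).
LISTEN_MULTIPLIER = {"ethernet": 3, "point-to-point": 2}


def listen_timeout(hello_timer_s: int, circuit_type: str) -> int:
    """Seconds without a hello before the receiver declares the adjacency down.

    In Phase IV the sender's hello timer travels inside the hello message,
    so the receiver derives its listen timer from the advertised value.
    """
    return hello_timer_s * LISTEN_MULTIPLIER[circuit_type]


# LEGATO's QNA-1 hello timer of 15 s, as seen by MIM on the bridged Ethernet:
print(listen_timeout(15, "ethernet"))  # 45
# The 5 s experiment:
print(listen_timeout(5, "ethernet"))   # 15
```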
Here LEGATO gives adjacent node listener receive timeouts too.
Probably not surprising: all HECnet Ethernet traffic from LEGATO goes
through Johnny's bridge and Psilo, so nobody on HECnet would see any better
results than MIM.
I don't exactly know how to go about figuring out where the bottleneck is.
"ping Psilo.Update.UU.SE" and traceroute give trip times of about 190ms,
which doesn't seem outrageous. The hello timer on circuit QNA-1 on LEGATO
(which is what I assume to be the relevant parameter here) is set to 15
seconds. I don't know what the corresponding timeout is on the MIM side,
but maybe Johnny will tell us. In any case, changing the QNA-1 hello timer
to 5 seconds just for fun didn't seem to make much difference, so the
problem is likely that packets are being lost somewhere.
Bob
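A rough illustration of why shortening the hello timer need not help. This sketch assumes each hello is lost independently with the same probability p, which is an idealization because real loss is often bursty. It also assumes the listener gives up after k = 3 consecutive missing hellos, matching the Ethernet multiplier above. Both function names are hypothetical.

```python
def window_loss_probability(p: float, k: int = 3) -> float:
    """Chance that k consecutive hellos are all lost (independent-loss assumption)."""
    return p ** k


def expected_drops_per_hour(p: float, hello_timer_s: float, k: int = 3) -> float:
    """Expected adjacency drops per hour under the same assumption.

    One drop is counted for each run of at least k losses, i.e. each
    delivered hello that is followed by k lost ones.
    """
    slots_per_hour = 3600 / hello_timer_s
    return slots_per_hour * (1 - p) * p ** k
```

In this model the multiplier k, not the hello period, fixes how many consecutive losses are tolerated. A shorter period only means more chances per hour for such a run to occur.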
On May 27, 2014, at 7:23 PM, Peter Lothberg <roll at Stupi.SE> wrote:
Hum...
I dunno if this matters, but some nodes, like 1.13, always send two
hellos back to back. (I deleted the hello-sending messages for this box.)
...
May 27 15:17:16: DNET-ADJ: Level 2 hello from 1.13
May 27 15:17:16: DNET-ADJ: Level 2 hello from 1.13
May 27 15:17:16: DNET-ADJ: Level 2 hello from 47.556
May 27 15:17:16: DNET-ADJ: Level 2 hello from 47.556
Two possible reasons.
1. Router hellos from the designated router go to both the All Endnodes and the All Routers address.
2. Router hellos from Phase IV+ L2 routers go to both the All Routers address and the (new in IV+) All L2 Routers address.
paul
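A sketch of the two cases Paul lists. It models which multicast groups a router addresses in one round of Ethernet router hellos. The destinations are symbolic names only, because the actual Ethernet multicast addresses are not given here. The class and function names are hypothetical. A monitor that logs every hello it receives on more than one of these groups could plausibly record back-to-back pairs like those in the log above.

```python
from dataclasses import dataclass

# Symbolic destination names (hypothetical labels, not real MAC addresses).
ALL_ROUTERS = "All Routers"
ALL_ENDNODES = "All Endnodes"
ALL_L2_ROUTERS = "All L2 Routers (Phase IV+)"


@dataclass
class RouterState:
    is_designated_router: bool
    is_phase_iv_plus_l2: bool


def hello_destinations(state: RouterState) -> list[str]:
    """Multicast destinations for one round of Ethernet router hellos."""
    dests = [ALL_ROUTERS]
    if state.is_designated_router:
        # Reason 1: the designated router also addresses the end nodes.
        dests.append(ALL_ENDNODES)
    if state.is_phase_iv_plus_l2:
        # Reason 2: Phase IV+ level 2 routers also address the new L2 group.
        dests.append(ALL_L2_ROUTERS)
    return dests
```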
Here LEGATO gives adjacent node listener receive timeouts too.
I've just tried dnping under Linux (just because I remembered it
existed) and found RTTs to MIM of min/avg/max = 93.6/95.0/96.1 ms
and to LEGATO of 256.0/487.5/981.9 ms. Perhaps this info is of help,
Erik
On Tue, May 27, 2014 at 05:11:23PM +0200, Johnny Billquist wrote:
On 2014-05-27 17:06, Johnny Billquist wrote:
On 2014-05-27 16:48, Paul_Koning at Dell.com wrote:
On May 27, 2014, at 10:35 AM, Bob Armstrong <bob at jfcl.com> wrote:
On 2014-05-27 07:07, Paul_Koning at Dell.com wrote:
In other words, the listen timeout for
Ethernet is 3 * hello time, for point to point it is 2 * hello time.
But can you clarify who is dropping who (i.e. which end is timing
out)? The message on LEGATO says "dropped by adjacent node" not
"dropping adjacent node", which makes it sound as if MIM is timing
out, not LEGATO (and then somehow telling LEGATO that, which is
another mystery).
Or is this just a poorly worded message?
No, the message is very precise.
If you see a message on LEGATO which reports adjacency MIM down for
that reason, it means LEGATO received an Ethernet router hello message
from MIM that no longer lists LEGATO as one of the routers that MIM
can see.
MIM will have a reason for not reporting LEGATO any longer. It should
have logged a message stating that reason.
Right. And MIM says "Adjacency listener receive timeout."
So it would seem packets are lost from Bob/LEGATO...
By the way, note that LEGATO is the only node MIM had this problem
with yesterday. All other nodes were stable. So it's not something
local to the Ethernet at Update, nor to the bridge as such.
It's either the network between Bob and Update, or the bridge on
Bob's end, or something on the network at his end...
Johnny
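A minimal sketch of the two ways an Ethernet router adjacency can go down, as described above. In the first case, the local listen timer expires, which is what MIM logged. In the second case, a hello arrives from the neighbor that no longer lists us, which is what LEGATO logged. The class and method names are hypothetical. Node names stand in for DECnet addresses, and the reason strings simply echo the wording quoted in this thread rather than any implementation's event text.

```python
from typing import Optional


class RouterAdjacency:
    """Hypothetical model of one Ethernet router adjacency, seen from one end.

    Starts in the "up" state; node names stand in for DECnet addresses.
    """

    def __init__(self, neighbor: str, hello_timer_s: int, now: float) -> None:
        self.neighbor = neighbor
        # Listen timer derived from the neighbor's advertised hello timer (3x on Ethernet).
        self.listen_timeout_s = 3 * hello_timer_s
        self.last_heard = now
        self.up = True

    def on_hello(self, my_name: str, listed_routers: set[str], now: float) -> Optional[str]:
        """Process a router hello from the neighbor; return a reason if we go down."""
        self.last_heard = now
        if my_name in listed_routers:
            self.up = True  # the neighbor still sees us
            return None
        if self.up:
            # The neighbor stopped listing us: it has dropped us on its side.
            self.up = False
            return "dropped by adjacent node"
        return None

    def check_timer(self, now: float) -> Optional[str]:
        """Return a reason if no hello arrived within the listen timeout."""
        if self.up and now - self.last_heard > self.listen_timeout_s:
            self.up = False
            return "adjacency listener receive timeout"
        return None


# LEGATO's view of MIM: a later hello from MIM no longer lists LEGATO.
at_legato = RouterAdjacency("MIM", hello_timer_s=15, now=0.0)
print(at_legato.on_hello("LEGATO", {"LEGATO"}, now=10.0))  # None
print(at_legato.on_hello("LEGATO", set(), now=60.0))       # dropped by adjacent node

# MIM's view of LEGATO: nothing heard for more than 45 s.
at_mim = RouterAdjacency("LEGATO", hello_timer_s=15, now=0.0)
print(at_mim.check_timer(now=46.0))  # adjacency listener receive timeout
```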
Hum...
I dunno if this matters, but some nodes, like 1.13, always send two
hellos back to back. (I deleted the hello-sending messages for this box.)
May 27 15:17:16: DNET-ADJ: Level 2 hello from 62.637
May 27 15:17:16: DNET-ADJ: Level 2 hello from 1.13
May 27 15:17:16: DNET-ADJ: Level 2 hello from 1.13
May 27 15:17:16: DNET-ADJ: Level 2 hello from 47.556
May 27 15:17:16: DNET-ADJ: Level 2 hello from 47.556
May 27 15:17:16: DNET-ADJ: Level 2 hello from 28.41
May 27 15:17:16: DNET-ADJ: Level 2 hello from 11.2
May 27 15:17:17: DNET-ADJ: Level 2 hello from 19.41
May 27 15:17:17: DNET-ADJ: Level 2 hello from 8.400
May 27 15:17:17: DNET-ADJ: Level 2 hello from 28.41
May 27 15:17:17: DNET-ADJ: Level 2 hello from 11.2
May 27 15:17:17: DNET-ADJ: Level 2 hello from 5.1023
May 27 15:17:17: DNET-ADJ: Level 2 hello from 4.248
May 27 15:17:18: DNET-ADJ: Level 2 hello from 19.41
May 27 15:17:18: DNET-ADJ: Level 2 hello from 8.400
May 27 15:17:18: DNET-ADJ: Level 2 hello from 28.41
May 27 15:17:18: DNET-ADJ: Level 2 hello from 11.2
May 27 15:17:18: DNET-ADJ: Level 2 hello from 4.248
May 27 15:17:19: DNET-ADJ: Level 2 hello from 19.41
May 27 15:17:19: DNET-ADJ: Level 2 hello from 8.400
May 27 15:17:19: DNET-ADJ: Level 2 hello from 4.248
May 27 15:17:20: DNET-ADJ: Level 2 hello from 42.1022
May 27 15:17:20: DNET-ADJ: Level 2 (IV+) hello from 42.1022
May 27 15:17:20: DNET-ADJ: Level 2 hello from 42.1022
May 27 15:17:21: DNET-ADJ: Level 2 hello from 23.1023
May 27 15:17:21: DNET-ADJ: Level 2 (IV+) hello from 23.1023
May 27 15:17:22: DNET-ADJ: Level 2 hello from 19.40
May 27 15:17:23: DNET-ADJ: Level 2 hello from 19.40
May 27 15:17:25: DNET-ADJ: Level 2 hello from 59.57
May 27 15:17:25: DNET-ADJ: Level 2 (IV+) hello from 59.57
May 27 15:17:27: DNET-ADJ: Level 2 hello from 59.60
May 27 15:17:27: DNET-ADJ: Level 2 (IV+) hello from 59.60
May 27 15:17:28: DNET-ADJ: Level 2 hello from 5.1023
May 27 15:17:30: DNET-ADJ: Level 2 hello from 1.15
May 27 15:17:31: DNET-ADJ: Level 2 hello from 62.637
May 27 15:17:31: DNET-ADJ: Level 2 hello from 62.637
May 27 15:17:31: DNET-ADJ: Level 2 hello from 1.13
May 27 15:17:31: DNET-ADJ: Level 2 hello from 1.13
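To see which senders produce back-to-back hellos in a log like the one above, a small parser can count hellos per node per second. This sketch assumes the exact line format shown above. It treats any node with two or more hellos, plain or IV+, in the same second as a back-to-back sender. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed format, e.g. "May 27 15:17:16: DNET-ADJ: Level 2 hello from 1.13"
HELLO_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d): DNET-ADJ: Level 2 (\(IV\+\) )?hello from (\d+\.\d+)$"
)


def back_to_back_senders(lines: list[str]) -> dict[str, int]:
    """Map each node to the number of seconds in which it logged 2+ hellos."""
    counts: Counter = Counter()
    for line in lines:
        m = HELLO_RE.match(line.strip())
        if m:
            timestamp, _iv_plus, node = m.groups()
            counts[(timestamp, node)] += 1
    result: dict[str, int] = {}
    for (_timestamp, node), n in counts.items():
        if n > 1:
            result[node] = result.get(node, 0) + 1
    return result
```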
On 2014-05-27 17:06, Johnny Billquist wrote:
On 2014-05-27 16:48, Paul_Koning at Dell.com wrote:
On May 27, 2014, at 10:35 AM, Bob Armstrong <bob at jfcl.com> wrote:
On 2014-05-27 07:07, Paul_Koning at Dell.com wrote:
In other words, the listen timeout for
Ethernet is 3 * hello time, for point to point it is 2 * hello time.
But can you clarify who is dropping who (i.e. which end is timing
out)? The message on LEGATO says "dropped by adjacent node" not
"dropping adjacent node", which makes it sound as if MIM is timing
out, not LEGATO (and then somehow telling LEGATO that, which is
another mystery).
Or is this just a poorly worded message?
No, the message is very precise.
If you see a message on LEGATO which reports adjacency MIM down for
that reason, it means LEGATO received an Ethernet router hello message
from MIM that no longer lists LEGATO as one of the routers that MIM
can see.
MIM will have a reason for not reporting LEGATO any longer. It should
have logged a message stating that reason.
Right. And MIM says "Adjacency listener receive timeout."
So it would seem packets are lost from Bob/LEGATO...
By the way, note that LEGATO is the only node MIM had this problem with yesterday. All other nodes were stable. So it's not something local to the Ethernet at Update, nor to the bridge as such.
It's either the network between Bob and Update, or the bridge on Bob's end, or something on the network at his end...
Johnny
On 2014-05-27 17:01, Peter Lothberg wrote:
On May 27, 2014, at 10:35 AM, Bob Armstrong <bob at jfcl.com> wrote:
On 2014-05-27 07:07, Paul_Koning at Dell.com wrote:
In other words, the listen timeout for
Ethernet is 3 * hello time, for point to point it is 2 * hello time.
But can you clarify who is dropping who (i.e. which end is timing out)? The message on LEGATO says "dropped by adjacent node" not "dropping adjacent node", which makes it sound as if MIM is timing out, not LEGATO (and then somehow telling LEGATO that, which is another mystery).
Or is this just a poorly worded message?
No, the message is very precise.
If you see a message on LEGATO which reports adjacency MIM down for that reason, it means LEGATO received an Ethernet router hello message from MIM that no longer lists LEGATO as one of the routers that MIM can see.
MIM will have a reason for not reporting LEGATO any longer. It should have logged a message stating that reason.
paul
When we were chasing another problem, it was found that packets
"disappear" in the Ethernet fabric at Update, sometimes.
I think you are referring to when we played with transfers and the performance dropped to the floor. Those packets were dropped because of interface speed differences. They do not really "disappear"; there is only so much that can be queued up in the fabric between interfaces with different speeds.
Maybe Johnny can make a small map of the current topology with all
involved things and the LAN speeds?
That would actually be useful either way, but I'm not sure I can do it easily.
Johnny
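Johnny's point about speed differences can be illustrated with a toy tail-drop model. In this model, a burst arrives on a fast port and must leave through a slower one via a fixed-size buffer. The port rates, burst size, and buffer depth are made-up parameters for illustration, not measurements of the Update fabric. The function name is hypothetical.

```python
def simulate_drops(burst_packets: int, in_per_tick: int, out_per_tick: int, buffer: int) -> int:
    """Count packets dropped when a burst crosses from a fast to a slow port.

    Each tick, up to in_per_tick packets of the burst arrive, then up to
    out_per_tick queued packets are transmitted. Arrivals that find the
    buffer full are dropped (tail drop). Toy model with made-up parameters.
    """
    if out_per_tick < 1:
        raise ValueError("out_per_tick must be at least 1 so the queue drains")
    queue = 0
    remaining = burst_packets
    dropped = 0
    while remaining > 0 or queue > 0:
        arriving = min(in_per_tick, remaining)
        remaining -= arriving
        accepted = min(arriving, buffer - queue)
        dropped += arriving - accepted
        queue += accepted
        queue -= min(out_per_tick, queue)
    return dropped


# A 10:1 speed mismatch with a 20-packet buffer versus matched speeds.
print(simulate_drops(100, in_per_tick=10, out_per_tick=1, buffer=20))   # 71
print(simulate_drops(100, in_per_tick=10, out_per_tick=10, buffer=20))  # 0
```

Nothing is "lost" in transit in this model; packets are discarded only because the buffer between mismatched ports fills during a burst.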
On 2014-05-27 16:48, Paul_Koning at Dell.com wrote:
On May 27, 2014, at 10:35 AM, Bob Armstrong <bob at jfcl.com> wrote:
On 2014-05-27 07:07, Paul_Koning at Dell.com wrote:
In other words, the listen timeout for
Ethernet is 3 * hello time, for point to point it is 2 * hello time.
But can you clarify who is dropping who (i.e. which end is timing out)? The message on LEGATO says "dropped by adjacent node" not "dropping adjacent node", which makes it sound as if MIM is timing out, not LEGATO (and then somehow telling LEGATO that, which is another mystery).
Or is this just a poorly worded message?
No, the message is very precise.
If you see a message on LEGATO which reports adjacency MIM down for that reason, it means LEGATO received an Ethernet router hello message from MIM that no longer lists LEGATO as one of the routers that MIM can see.
MIM will have a reason for not reporting LEGATO any longer. It should have logged a message stating that reason.
Right. And MIM says "Adjacency listener receive timeout."
So it would seem packets are lost from Bob/LEGATO...
Johnny
On May 27, 2014, at 10:35 AM, Bob Armstrong <bob at jfcl.com> wrote:
On 2014-05-27 07:07, Paul_Koning at Dell.com wrote:
In other words, the listen timeout for
Ethernet is 3 * hello time, for point to point it is 2 * hello time.
But can you clarify who is dropping who (i.e. which end is timing out)? The message on LEGATO says "dropped by adjacent node" not "dropping adjacent node", which makes it sound as if MIM is timing out, not LEGATO (and then somehow telling LEGATO that, which is another mystery).
Or is this just a poorly worded message?
No, the message is very precise.
If you see a message on LEGATO which reports adjacency MIM down for that reason, it means LEGATO received an Ethernet router hello message from MIM that no longer lists LEGATO as one of the routers that MIM can see.
MIM will have a reason for not reporting LEGATO any longer. It should have logged a message stating that reason.
paul
When we were chasing another problem, it was found that packets
"disappear" in the Ethernet fabric at Update, sometimes.
Maybe Johnny can make a small map of the current topology with all
involved things and the LAN speeds?
--P