On Monday, May 26, 2014 at 7:09 AM, Jean-Yves Bernier wrote:
[ Summary : File transfers between two simh PDP hang, DECnet reports Data
overruns and Response timeouts ]
At 2:14 AM +0200 26/5/14, Johnny Billquist wrote:
This is a problem inside of DECnet on the simulated host. It gets
packets faster than it can process them, so some packets are dropped.
Unfortunately, DECnet deals very badly with systematic packet loss like this.
You get retransmissions, and after a while the retransmission
timeout backs off until you have more than a minute between
retransmission attempts.
Anyway, if you can get simh to throttle the ethernet interface, that
might help you.
(I don't remember offhand if it supports such functionality.)
The service polling timer can be adjusted:
SET XQ POLL={DEFAULT|4..2500}
It is set to 100 by default.
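For instance, slowing the interface to 10 polls/sec from the simh console
would look something like this (assuming the device is XQ, as above):

sim> SET XQ POLL=10
sim> SHOW XQ

where SHOW XQ lets you verify the device's current settings.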
Changing the polling timer makes a huge difference. Have a look at:
http://pastebin.com/AZ1U6bh3
Although it still hangs sometimes, reliability has vastly improved over the
erratic behavior I saw at the beginning. Remember, the completion time was
about 3 minutes.
We're almost there :)
This turns into an interesting challenge: optimizing the XQ service timer to
keep overruns as low as possible. This depends on many factors, among them
the bandwidth of the data sink.
You may have a flawless copy to TI:, but it will fail to disk. The terminal is
actually throttling the transfer. Disks are faster, and emulated disks are orders
of magnitude faster than the original ones.
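(Concretely, that's the difference between something like

NFT TI:=REMOTE::TEST.DAT

and

NFT TEST.DAT=REMOTE::TEST.DAT

where REMOTE:: and the file names are placeholders, and assuming NFT accepts
a terminal output spec the way PIP does.)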
Emulation is pushing DECnet to speeds it was never designed for.
I'm running here as low as 10 polls/sec. Maybe 50 would be optimal, and
what about 500? I need metrics. And tools. Here, I am using AT.
to time a 100.-block file transfer. Overruns and timeouts still rise slowly, but
DECnet recovers happily most of the time.
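The timing harness is just an indirect command file, something like this
sketch (TIM and NFT are standard on RSX / DECnet-RSX; REMOTE:: and the file
names are placeholders for my setup):

TIM
NFT TEST.TMP=REMOTE::TEST.DAT
TIM

Run it as @TIMEIT, and the difference between the two TIM timestamps brackets
the transfer.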
If you're going to drive this deeply into testing, it would really be best if you ran
with the latest code, since your results may ultimately suggest code changes, AND the
latest code may behave significantly differently than the older versions.
Independent of which codebase you're running, and since you're now tweaking the
behavior of the simulated hardware, you may want to look at sim> SHOW XQ STATS and try
to analyze the relationship between these stats and the ones the OS sees.
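For example, you could snapshot both views around a transfer:

sim> SHOW XQ STATS
NCP> SHOW LINE QNA-0 COUNTERS

and see whether the receive drops simh counts line up with the data overruns
DECnet reports. (The NCP counter syntax here is standard Phase IV; the counter
names on the two sides won't match one-for-one.)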
Also, you may want to explore what happens if:
NCP> DEFINE LINE QNA-0 RECEIVE BUFFER 32
is done prior to starting your network... The limit of 32 may be different on
different Operating Systems...
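(If memory serves, DEFINE writes the permanent database, so the new buffer
count takes effect the next time the network is started; the volatile
equivalent would be:

NCP> SET LINE QNA-0 RECEIVE BUFFER 32

though some systems may refuse to change buffering on a line that is
already up.)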
Once again the latest code is available from:
https://github.com/simh/simh/archive/master.zip
Good Luck,
- Mark