Thank you all for your support!
So here are some more detailed notes on getting DECnet for Linux working with a
recent longterm kernel, 4.9.77. How stable it is remains to be seen...
Changes for other kernel versions are likely to be similar. But because of
many API changes, the decnet module sources differ considerably between
kernel versions, so patch files are not very useful. Fortunately, only a few
small edits are needed.
The patches below were tested with Slackware 14.1 (which doesn't use systemd),
where it is relatively easy to build a custom kernel. Also, decnet startup
was done by hand rather than using a script:
# ifconfig eth0 mtu 576 # see below
# modprobe decnet
# echo -n area.node > /proc/sys/net/decnet/node_address
# dnetd
# phoned
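The area.node string written to node_address packs into one 16-bit Phase IV address: the area occupies the upper 6 bits (1..63) and the node the lower 10 bits (1..1023). A Phase IV node's Ethernet address is AA-00-04-00 followed by that 16-bit value, low byte first. The sketch below validates the string and derives the matching MAC. It runs in user space, and the helper names are hypothetical (not taken from dnprogs or the kernel).

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

#define DN_AREA_MAX 63    /* 6-bit area number */
#define DN_NODE_MAX 1023  /* 10-bit node number */

/*
 * Parse "area.node" (the format written to node_address) into a
 * 16-bit Phase IV address. Returns 0 on success, -1 on malformed or
 * out-of-range input. Hypothetical helper, not from dnprogs.
 */
int dn_parse_addr(const char *s, uint16_t *addr)
{
    char *end;
    unsigned long area, node;

    if (!isdigit((unsigned char)*s))
        return -1;
    area = strtoul(s, &end, 10);
    if (*end != '.' || !isdigit((unsigned char)end[1]))
        return -1;
    node = strtoul(end + 1, &end, 10);
    if (*end != '\0')
        return -1;
    if (area < 1 || area > DN_AREA_MAX || node < 1 || node > DN_NODE_MAX)
        return -1;
    *addr = (uint16_t)((area << 10) | node);
    return 0;
}

/* Phase IV Ethernet address: AA-00-04-00 plus the address, low byte first. */
void dn_addr_to_mac(uint16_t addr, unsigned char mac[6])
{
    mac[0] = 0xAA;
    mac[1] = 0x00;
    mac[2] = 0x04;
    mac[3] = 0x00;
    mac[4] = (unsigned char)(addr & 0xFF);
    mac[5] = (unsigned char)(addr >> 8);
}
```

For example, "28.4" packs to 28*1024 + 4 = 28676 (0x7004), which gives AA:00:04:00:04:70. That is the address the dneigh output below lists for rullfl.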
[Kernel 4.14.15 gives "BUG: unable to handle kernel NULL pointer dereference",
which seems to be related to xfrm_lookup(). A kernel warning about dst.h:256
can be avoided by using dst_use_noref() instead of dst_use() in two places.]
--- MTU ---
It seems the decnet module can generate packets which are one byte larger than
intended. This can be fixed (?) in af_decnet.c:dn_mss_from_pmtu() by adding
one line:
*/
mtu -= (21 + DN_MAX_NSP_DATA_HEADER + 16);
}
+ mtu--; /* probably due to padding */
if (mtu > mss)
mss = mtu;
return mss;
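The effect of the extra line can be seen in isolation with a user-space model of just the lines visible in the patch. The usable payload is the MTU minus the header overhead, but never less than a fixed floor. The overhead and the floor are hypothetical parameters standing in for the kernel's constants (DN_MAX_NSP_DATA_HEADER and the default mss), which are not reproduced here.

```c
#include <stdbool.h>

/*
 * Model of the tail of af_decnet.c:dn_mss_from_pmtu().
 * overhead and mss_floor are hypothetical stand-ins for the kernel
 * constants; one_byte_fix applies the added "mtu--;" line.
 */
int mss_from_pmtu(int mtu, int overhead, int mss_floor, bool one_byte_fix)
{
    int mss = mss_floor;

    mtu -= overhead;        /* link, routing and NSP headers */
    if (one_byte_fix)
        mtu--;              /* the extra line from the patch */
    if (mtu > mss)
        mss = mtu;
    return mss;
}
```

Whenever the unadjusted result is above the floor, the adjustment shrinks the computed mss by exactly one byte. A value already clamped to the floor is unaffected.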
The dneigh utility gives
Node    HWtype  HWaddress          Flags  MTU    Iface
rullfl  loop    AA:00:04:00:04:70  ---    65533  lo
rullfs  ether   AA:00:04:00:29:70  -2-    1498   eth0
gorvax  ether   AA:00:04:00:90:21  -2-    1498   eth0
pdxvax  ether   AA:00:04:00:98:F2  -2-    1498   eth0
dimma   ether   AA:00:04:00:0B:EC  -2-    576    eth0
jocke   ether   AA:00:04:00:15:04  -2-    576    eth0
mim     ether   AA:00:04:00:0D:04  -2-    1500   eth0
hub     ether   AA:00:04:00:FE:AB  -2-    576    eth0
bitxoz  ether   AA:00:04:00:51:1C  1--    1492   eth0
skhngw  ether   AA:00:04:00:04:38  -2-    576    eth0
a44rtr  ether   AA:00:04:00:FF:B3  -2-    1498   eth0
Interestingly, several nodes report an MTU of 576, while MIM reports 1500
(communication with MIM nevertheless works well).
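The HWaddress column encodes each neighbour's DECnet address, so area.node can be read straight off the table: take the last two bytes, low byte first, then split them into the 6-bit area and the 10-bit node. The sketch below uses a hypothetical helper name and is for illustration only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert a Phase IV Ethernet address as printed by dneigh
 * ("AA:00:04:00:04:70") into "area.node". Returns 0 on success, -1 if
 * the string is malformed, lacks the AA:00:04:00 prefix, or does not fit.
 */
int dn_mac_to_area_node(const char *mac, char *out, size_t outlen)
{
    unsigned int b[6];
    int consumed = 0;

    if (sscanf(mac, "%2x:%2x:%2x:%2x:%2x:%2x%n",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &consumed) != 6
        || mac[consumed] != '\0')
        return -1;
    if (b[0] != 0xAA || b[1] != 0x00 || b[2] != 0x04 || b[3] != 0x00)
        return -1;

    unsigned int addr = b[4] | (b[5] << 8);  /* low byte comes first */
    unsigned int area = addr >> 10;          /* upper 6 bits */
    unsigned int node = addr & 0x3FF;        /* lower 10 bits */
    int n = snprintf(out, outlen, "%u.%u", area, node);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Applied to the table, rullfl (AA:00:04:00:04:70) decodes to 28.4, and a44rtr (AA:00:04:00:FF:B3) decodes to 44.1023.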
MTUs or blksizes are computed in dn_neigh.c, accompanied by the remark:
/*
* Make an estimate of the remote block size by assuming that its
* two less then the device mtu, which it true for ethernet (and
* other things which support long format headers) since there is
* an extra length field (of 16 bits) which isn't part of the
* ethernet headers and which the DECnet specs won't admit is part
* of the DECnet routing headers either.
*
* If we over estimate here its no big deal, the NSP negotiations
* will prevent us from sending packets which are too large for the
* remote node to handle. In any case this figure is normally updated
* by a hello message in most cases.
*/
dn->blksize = dev->mtu - 2;
The mtu--; adjustment above seems sufficient; subtracting an additional two does not seem necessary.
For HECnet, it seems best to set the MTU to 576, to be able to reach nodes in
different areas. This can be done with ifconfig.
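The same MTU change can be made programmatically with the SIOCSIFMTU ioctl, the request net-tools ifconfig issues for "mtu". The sketch below is minimal. It needs root (CAP_NET_ADMIN), and the helper name is hypothetical.

```c
#define _DEFAULT_SOURCE  /* struct ifreq in glibc's <net/if.h> */
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set an interface MTU, like "ifconfig eth0 mtu 576".
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_if_mtu(const char *ifname, int mtu)
{
    struct ifreq ifr;
    size_t len;
    int fd, rc, saved;

    if (ifname == NULL || (len = strlen(ifname)) == 0 || len >= IFNAMSIZ) {
        errno = EINVAL;
        return -1;
    }
    memset(&ifr, 0, sizeof ifr);
    memcpy(ifr.ifr_name, ifname, len + 1);
    ifr.ifr_mtu = mtu;

    /* Any socket serves as a handle for interface ioctls. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    rc = ioctl(fd, SIOCSIFMTU, &ifr);
    saved = errno;
    close(fd);
    if (rc < 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

A call would look like set_if_mtu("eth0", 576), followed by perror() on failure to report, for instance, EPERM when run without root.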
Optionally, a /proc/sys/net/decnet/mtu entry could be added to the sysctl table in sysctl_net_decnet.c:
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "mtu",
+ .data = &decnet_mtu,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
},
{ }
};
where decnet_mtu needs to be defined in this source file and declared in
linux/include/net/dn.h as:
extern int decnet_mtu;
The variable could then be used in various places instead of dst_mtu(). [But
perhaps the blksize can be requested from the remote host.]
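If the mtu entry were added, user space could read and set it like any other integer sysctl, since proc_dointvec reads and prints plain decimal integers. The sketch below uses hypothetical helper names. Note that /proc/sys/net/decnet/mtu only exists with the sysctl patch applied, and writing to it needs root.

```c
#include <stdio.h>

/* Read one integer from a /proc/sys file. Returns 0 or -1. */
int read_proc_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Write one integer, like "echo 576 > /proc/sys/net/decnet/mtu". */
int write_proc_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fprintf(f, "%d\n", value) > 0);
    /* stdio buffers the text; the actual write (and any error) happens here */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```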
--- "No buffer space available" message by a dnprogs utility ---
At some point, sock_alloc_send_skb() changed, causing a problem in
af_decnet.c:dn_alloc_send_pskb(). It can be fixed by adding the second "+"
line (the first one seems to be common practice, but may be unnecessary):
if (skb) {
skb->protocol = htons(ETH_P_DNA_RT);
skb->pkt_type = PACKET_OUTGOING;
+ skb->sk = sk;
+ *errcode = 0;
}
return skb;
}
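One plausible mechanism, as far as I can tell from alloc_skb_with_frags() in 4.x kernels, is the following. The allocator stores -ENOBUFS in *errcode before allocating and does not clear it on success. A caller that checks the error code before the returned pointer then reports "No buffer space available" even though the buffer was allocated. The user-space model below illustrates this. All names are hypothetical stand-ins, and the "never cleared" behaviour is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for sock_alloc_send_skb(): -ENOBUFS is stored up front and,
 * by assumption, not cleared again when the allocation succeeds. */
static void *alloc_buf(size_t len, int *errcode)
{
    *errcode = -ENOBUFS;
    return malloc(len);
}

/* Stand-in for dn_alloc_send_pskb(); apply_fix adds "*errcode = 0;". */
static void *alloc_send_buf(size_t len, int *errcode, bool apply_fix)
{
    void *p = alloc_buf(len, errcode);

    if (p != NULL && apply_fix)
        *errcode = 0;
    return p;
}

/* Caller that checks the error code before the pointer, as a sendmsg
 * loop might. Returns 0 or a negative errno. */
int send_one(size_t len, bool apply_fix)
{
    int err = 0;
    void *buf = alloc_send_buf(len, &err, apply_fix);

    free(buf);          /* free(NULL) is harmless */
    return err;         /* -ENOBUFS means "No buffer space available" */
}
```

Without the fix, send_one() fails with -ENOBUFS despite a successful allocation. With the fix, it returns 0.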
--- "Destroying alive neighbour" kernel message ---
This issue could be related to dneigh entries that are neither local nor routers.
The following two changes seem to help (based on comparing various sources):
The check of __refcnt in dn_route.c:dn_dst_check_expire():
spin_lock(&dn_rt_hash_table[i].lock);
while ((rt = rcu_dereference_protected(*rtp,
lockdep_is_held(&dn_rt_hash_table[i].lock))) != NULL) {
- if (atomic_read(&rt->dst.__refcnt) ||
+ if (atomic_read(&rt->dst.__refcnt) > 1 ||
(now - rt->dst.lastuse) < expire) {
rtp = &rt->dst.dn_next;
continue;
And the setting of initial_ref in dn_route.c:dn_route_output_slow():
if (dev_out->flags & IFF_LOOPBACK)
flags |= RTCF_LOCAL;
- rt = dst_alloc(&dn_dst_ops, dev_out, 0, DST_OBSOLETE_NONE, DST_HOST);
+ rt = dst_alloc(&dn_dst_ops, dev_out, 1, DST_OBSOLETE_NONE, DST_HOST);
if (rt == NULL)
goto e_nobufs;
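The two changes appear to work together. Here is one reading of them, not a confirmed analysis. A new route cache entry now starts with one reference held by the cache itself (initial_ref = 1). An entry may therefore be expired only when nobody else holds a reference (count not above 1) and it has been idle long enough. The model below captures that rule. The types are hypothetical; the kernel uses struct dst_entry and jiffies.

```c
#include <stdbool.h>

/* Hypothetical model of a route cache entry. */
struct route_entry {
    int refcnt;
    unsigned long lastuse;
};

/* Like dst_alloc(..., 1, ...): the cache holds the first reference. */
void route_entry_init(struct route_entry *rt, unsigned long now)
{
    rt->refcnt = 1;
    rt->lastuse = now;
}

/* Same test as the patched dn_dst_check_expire() loop: keep the entry if
 * someone besides the cache holds it, or if it was used recently. */
bool route_entry_expired(const struct route_entry *rt,
                         unsigned long now, unsigned long expire)
{
    if (rt->refcnt > 1 || (now - rt->lastuse) < expire)
        return false;
    return true;
}
```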
--- "WARNING: CPU: 0 PID: nnn at include/net/dst.h:188" kernel message ---
This seems to be benign, and can be avoided by the following two changes
in dn_route.c, in dn_dst_update_pmtu() and dn_rt_set_next_hop() respectively:
- if (dst_metric(dst, RTAX_MTU) > mtu && mtu >= min_mtu) {
+ if (dst_metric_raw(dst, RTAX_MTU) > mtu && mtu >= min_mtu) {
- if (dst_metric(&rt->dst, RTAX_MTU) > rt->dst.dev->mtu)
+ if (dst_metric_raw(&rt->dst, RTAX_MTU) > rt->dst.dev->mtu)
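As far as I can tell, in these kernels dst_metric() is simply dst_metric_raw() plus a WARN_ON_ONCE() when asked for RTAX_MTU (or RTAX_HOPLIMIT/RTAX_ADVMSS), because callers are expected to use dst_mtu() instead. That would explain why the warning is benign: both calls return the same stored value. The model below illustrates the difference. The names are hypothetical, and the message it prints is not the kernel's.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the RTAX_* indices and the metrics array. */
enum { MODEL_RTAX_MTU = 2, MODEL_RTAX_MAX = 16 };

struct dst_model {
    unsigned int metrics[MODEL_RTAX_MAX];
};

static int warned;  /* set once, mimicking WARN_ON_ONCE() */

/* Like dst_metric_raw(): return the stored value, no checks. */
unsigned int model_metric_raw(const struct dst_model *d, int m)
{
    return d->metrics[m];
}

/* Like dst_metric(): same value, but complains once about the MTU. */
unsigned int model_metric(const struct dst_model *d, int m)
{
    if (m == MODEL_RTAX_MTU && !warned) {
        warned = 1;
        fprintf(stderr, "model: MTU read through the checked accessor\n");
    }
    return model_metric_raw(d, m);
}
```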
--- Segfault in ctermd (login from remote node) ---
dnprogs 2.65 was used for testing. The segfault happens in
ctermd.c:cterm_reset(), where the variable line is used even though it is
not assigned when DNETUSE_DEVPTS is defined. Wrapping the code that uses
line in this #ifndef helps:
#ifndef DNETUSE_DEVPTS
p=line+sizeof("/dev/")-1;
setutent();
memcpy(entry.ut_line,p,strlen(p));
entry.ut_line[strlen(p)]='\0';
lentry=getutline(&entry);
if (lentry != NULL) { /* getutline() returns NULL if no entry matches */
    lentry->ut_type=DEAD_PROCESS;
    memset(lentry->ut_line,0,UT_LINESIZE);
    memset(lentry->ut_user,0,UT_NAMESIZE);
    lentry->ut_time=0;
    pututline(lentry);
}
(void)chmod(line,0666);
(void)chown(line,0,0);
*p='p'; /* /dev/ttyXY -> /dev/ptyXY (pty master) */
(void)chmod(line,0666);
(void)chown(line,0,0);
#endif
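The non-devpts path relies on the old BSD pty naming, in which the master /dev/ptyXY pairs with the slave /dev/ttyXY. Since p points just past "/dev/", *p='p' turns the slave name into the master name before the second chmod/chown. This also shows why the block makes no sense with DNETUSE_DEVPTS: Unix98 /dev/pts/N slaves have no /dev/pty* counterpart. The hypothetical helper below performs the same rename on a copy.

```c
#include <string.h>

/*
 * "/dev/ttyp0" -> "/dev/ptyp0". Returns 0 on success, -1 if the name
 * does not start with "/dev/tty" or does not fit in out.
 */
int tty_to_pty(const char *tty, char *out, size_t outlen)
{
    const char prefix[] = "/dev/tty";
    size_t len = strlen(tty);

    if (strncmp(tty, prefix, sizeof prefix - 1) != 0 || len + 1 > outlen)
        return -1;
    memcpy(out, tty, len + 1);
    out[sizeof("/dev/") - 1] = 'p';  /* same offset the ctermd code uses */
    return 0;
}
```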
--- Using local node with dnprogs crashes the machine ---
This has not been solved.