Jerome,
    I find your discussion of multiword precision math very interesting, although I
don't have any applications that need that sort of thing. The idea of extending the
instruction set to do high-speed external math routines reminds me of the folks using
GPUs to do high-speed math operations.
   I recently visited the University of Illinois NCSA facility, where the new "Blue
Waters" system went into operation late last summer (see
https://bluewaters.ncsa.illinois.edu/hardware-summary).
  It was designed by Cray and has 22,640 Cray XE6 nodes and 4,228 Cray XK7 nodes that
include NVIDIA graphics processor acceleration. It can achieve 13+ petaflops, but the
power bill is a killer: it draws 24 megawatts, although that figure includes the water
chillers and the 9,000 gallons per minute of cooling water flowing through it.
  It is very neat that Ersatz-11 has that kind of extensibility built in. I would love to
play with its multiprocessor capabilities sometime with RSX-11M+, as Johnny B. has done.
  Good luck with your project.
Mark
On Jun 23, 2015, at 8:05 AM, Jerome H. Fine wrote:
  About 10 years ago, I was using an algorithm which required more than 15 digits
 of precision.  I wrote some PDP-11 assembler code which could handle unsigned
 values below 2^512 (2^512 is about 1.34 x 10^154), plus integer/fractional
 numbers of 1024 bits in which the precision to the right of the binary point
 equalled that of the integer portion: 512 bits each.  Actually, there were
 three levels of precision: 128 bits, 256 bits and the maximum of 512 bits.
 The FORTRAN 77 names for the integer variables were LU..., MU... and NU...,
 while the corresponding integer/fractional variables were LX..., MX... and
 NX...; all were allocated as CHARACTER *n variables.
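
 To make the word-level layout concrete, here is a minimal Python sketch (not
 the original PDP-11 code, which is not shown) of unsigned multi-precision ADD
 and SUBTRACT on values stored as 16-bit words, least significant word first.
 The word order and the helper names to_words, from_words, mp_add and mp_sub
 are assumptions for illustration only.  A 512-bit integer occupies 32 words;
 a 1024-bit integer/fractional value can be treated as a 64-word integer that
 is implicitly scaled by 2^512, so the same carry and borrow loops serve for it
 unchanged.

```python
# Hypothetical layout: unsigned multiword values as lists of 16-bit words,
# least significant word first (the original routines use CHARACTER *n).
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def to_words(value: int, nwords: int) -> list[int]:
    """Split a non-negative int into nwords 16-bit words, low word first."""
    if value < 0 or value >> (WORD_BITS * nwords):
        raise OverflowError("value does not fit in %d words" % nwords)
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(nwords)]


def from_words(words: list[int]) -> int:
    """Reassemble a Python int from 16-bit words, low word first."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def mp_add(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Word-by-word add with carry (like an ADD/ADC loop); equal lengths."""
    result, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        result.append(s & WORD_MASK)
        carry = s >> WORD_BITS
    return result, carry  # carry == 1 means the sum overflowed


def mp_sub(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Word-by-word subtract with borrow (like a SUB/SBC loop); equal lengths."""
    result, borrow = [], 0
    for x, y in zip(a, b):
        d = x - y - borrow
        result.append(d & WORD_MASK)
        borrow = 1 if d < 0 else 0
    return result, borrow  # borrow == 1 means a < b (result wrapped around)
```

 A returned carry or borrow of 1 plays the same role as the C bit after the
 last ADC or SBC on a real PDP-11.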
 
 These subroutines are designed to be called from FORTRAN 77, so programs under
 any PDP-11 operating system (such as RT-11 or RSX-11) can easily use them.
 While these routines include ADD, SUBTRACT and MULTIPLY, DIVISION is not
 available, although that is easily remedied with a FORTRAN 77 subroutine
 which arrives at the result via the standard approximation algorithm.  ENCODE
 and DECODE routines are also available to convert between internal binary
 and external decimal values.  In addition, there are routines to convert back
 and forth between all six sizes of variables and DOUBLE PRECISION (REAL *8)
 floating point variables.
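
 The posting does not say how ENCODE and DECODE work internally.  One common way
 to go from binary to decimal is repeated short division: divide the whole
 multiword value by 10000 (the largest power of ten below 2^16), keep the
 remainder as four decimal digits, and repeat until the value is zero.  Each
 step divides a 32-bit partial dividend by a 16-bit divisor and always yields a
 16-bit quotient, which is the operation proposed further down as UDIV16.  The
 Python sketch below assumes a hypothetical low-word-first list of 16-bit words;
 short_divide and to_decimal are illustrative names, not the original routines.

```python
# Hypothetical layout: 16-bit words, least significant word first.
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1
CHUNK = 10_000  # largest power of 10 below 2**16


def short_divide(words: list[int], divisor: int) -> tuple[list[int], int]:
    """Divide a multiword value by a 16-bit divisor; return (quotient, rem).

    Each step divides (remainder << 16 | next word), a 32-bit value, by a
    16-bit divisor, i.e. a 32/16 unsigned divide.  Because the remainder is
    always below the divisor, every quotient word fits in 16 bits.
    """
    assert 0 < divisor <= WORD_MASK
    quotient = [0] * len(words)
    rem = 0
    for i in reversed(range(len(words))):  # most significant word first
        dividend = (rem << WORD_BITS) | words[i]
        quotient[i], rem = divmod(dividend, divisor)
    return quotient, rem


def to_decimal(words: list[int]) -> str:
    """Convert a multiword binary value to decimal, four digits at a time."""
    groups = []
    while any(words):
        words, rem = short_divide(words, CHUNK)
        groups.append(rem)  # least significant group first
    if not groups:
        return "0"
    # Only the most significant group is printed without leading zeros.
    return str(groups[-1]) + "".join("%04d" % g for g in reversed(groups[:-1]))
```

 For a 512-bit value this takes at most 39 passes (155 decimal digits, four per
 pass), each pass making one 32/16 divide per word.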
 
 Recently, I realized that signed variables are also required, so I have begun
 to consider what is needed.  ALSO, because I so often run the PDP-11 code under
 the Ersatz-11 emulator, I will consider supporting six additional PDP-11
 instructions (for each, ONLY one combination of registers will be used as
 operands; Ersatz-11 supports a user-written DLL):
UMUL16  -  unsigned multiply of two 16 bit variables
MUL32   -  signed multiply of two 32 bit variables
UMUL32  -  unsigned multiply of two 32 bit variables
UDIV16  -  unsigned divide of a 32 bit variable by a 16 bit variable
DIV32   -  signed divide of a 64 bit variable by a 32 bit variable
UDIV32  -  unsigned divide of a 64 bit variable by a 32 bit variable
 The UMUL16 and UMUL32 instructions are especially important for performing
 multi-precision MULTIPLY.  I will also consider a single PDP-11 instruction
 that performs multi-precision arithmetic on values held in memory, using the
 ability of the Ersatz-11 emulator to LOAD a user-written DLL.  That would turn
 many of the PDP-11 multi-precision assembler subroutines into single PDP-11
 instructions, executed as x86 instructions at a much higher speed, sort of
 like a CIS (Commercial Instruction Set) for multi-precision variables.  In
 that case, much larger variables could easily be supported thanks to the much
 higher speed of execution.  In addition, the approximately 16KB of subroutine
 instruction/data memory within the emulated PDP-11 could be substantially
 reduced.
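
 Why UMUL16 matters can be seen from the inner step of a schoolbook
 multi-precision MULTIPLY, which needs an unsigned 16 x 16 to 32 bit product
 for every pair of words.  The standard PDP-11 EIS MUL instruction is signed,
 so each unsigned product has to be patched: the low word is already right,
 but the high word needs b added when a's sign bit is set, and a added when
 b's sign bit is set.  The Python sketch below models that fix-up in a
 hypothetical umul16 helper and builds a word-list multiply on top of it; a
 native UMUL16 would replace the fix-up with a single instruction.  It is an
 illustration under an assumed low-word-first layout, not the author's code.

```python
MASK16 = 0xFFFF


def to_signed16(x: int) -> int:
    """Interpret a 16-bit pattern as a two's-complement value."""
    return x - 0x10000 if x & 0x8000 else x


def mul_signed(a: int, b: int) -> tuple[int, int]:
    """Model of the signed 16 x 16 -> 32 EIS MUL: returns (high, low) words."""
    p = (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF
    return p >> 16, p & MASK16


def umul16(a: int, b: int) -> tuple[int, int]:
    """Unsigned 16 x 16 -> 32 product built from the signed multiply.

    The low word is unaffected by signedness; the high word is corrected
    by adding b if a's sign bit is set and a if b's sign bit is set.
    """
    hi, lo = mul_signed(a, b)
    if a & 0x8000:
        hi = (hi + b) & MASK16
    if b & 0x8000:
        hi = (hi + a) & MASK16
    return hi, lo


def mp_mul(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook multiply of word lists (low word first).

    Returns len(a) + len(b) words, so the product can never overflow.
    """
    result = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            hi, lo = umul16(x, y)
            t = result[i + j] + lo + carry
            result[i + j] = t & MASK16
            carry = hi + (t >> 16)  # always fits in 16 bits
        result[i + len(b)] = carry  # this word is still untouched here
    return result
```

 At 512 bits (32 words) one MULTIPLY makes 1,024 word products, each of which
 needs the sign-bit checks above when only the signed MUL is available.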
 
 If there is sufficient interest and support, complete algorithms might be
 implemented which directly use the gigabytes of memory on the x86 host to
 solve particular problems, sort of like a SLAR auxiliary processor (which,
 for example, performed an FFT on a KB-sized array in virtual memory)
 implemented in software rather than hardware.
 
 I hope that some interest is expressed.  Commercial inquiries for a specific
 algorithm would obviously receive priority, but hobby users are explicitly
 encouraged to describe all of their needs as well.
 
 Jerome Fine