A reconfigurable floating-point multiply-adder was introduced in [14] to reduce latency in robot control. The reconfiguration relies on direct hardware connections between the multipliers and the adders. A parallel VLSI processor composed of several processor elements (PEs) was proposed. In each PE, switching hardware changes the connections between the multipliers and the adders, so that a multiply-adder with the desired number of multipliers can be constructed.
Each PE consists of two multipliers, two adders, a local memory (LM) and a switch circuit (SC) as shown in Figure 9. The inner connection of the SC is changed every clock cycle to reconfigure the multiply-adder. Figure 10 shows an example of a reconfigured multiply-adder that contains four multipliers.
Figure 9: Reconfigurable parallel VLSI processor.
Figure 10: Reconfiguration of a multi-operand multiply-adder.
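The chaining idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit of [14]: the class `ProcessorElement`, the function `reconfigured_multiply_add`, and the way the second adder folds in a neighbor's partial sum are hypothetical choices made for illustration.

```python
class ProcessorElement:
    """Toy PE with two multipliers and two adders (hypothetical model)."""

    def multiply_add(self, pair0, pair1, partial_in=0.0):
        p0 = pair0[0] * pair0[1]   # multiplier 1
        p1 = pair1[0] * pair1[1]   # multiplier 2
        local = p0 + p1            # adder 1: sum of the two local products
        return local + partial_in  # adder 2: fold in the neighbor's partial sum


def reconfigured_multiply_add(pairs):
    """Chain enough PEs to form a multiply-adder with len(pairs) multipliers.

    Returns the sum of products and the number of PEs used.
    """
    pairs = list(pairs)
    if len(pairs) % 2:
        pairs.append((0.0, 0.0))  # pad the unused multiplier with zeros
    partial = 0.0
    pes_used = 0
    for i in range(0, len(pairs), 2):
        # Each loop step stands for one PE wired to its neighbor's output.
        partial = ProcessorElement().multiply_add(pairs[i], pairs[i + 1], partial)
        pes_used += 1
    return partial, pes_used
```

In this model, four operand pairs occupy two chained PEs, which corresponds to the four-multiplier configuration mentioned above.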
The following examples show the speed improvement obtained with this processor. The latency of the differential inverse kinematics (DIK) computation for a twelve-DOF manipulator is about 180 times shorter than that of a parallel-processor approach using general-purpose microprocessors. Likewise, the latency of resolved acceleration control for a twelve-DOF manipulator is about 60 times shorter than that of a parallel-processor approach using conventional DSPs.
Figure 11 shows the reconfigured floating-point multi-operand multiply-adder. There is a pre-normalize circuit before each stage of the addition, but only one post-normalize circuit, located in the final-stage adder. Compared with the conventional multi-operand adder shown in Figure 12, this reduces the time needed for pre- and post-normalization of the operands by about one half.
Figure 11: Reconfiguration for the floating-point multi-operand multiply-adder.
Figure 12: Conventional floating-point multi-operand multiply-adder.
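The difference between the two normalization schemes can be sketched in software. The model below is an assumption made for illustration, not the circuit of [14]: operands are nonnegative `(mantissa, exponent)` pairs with value `mantissa * 2**exponent`, the toy width `MANT_BITS` and all function names are hypothetical, and the stages are arranged as a simple chain.

```python
MANT_BITS = 8  # toy mantissa width (assumption; not taken from the source)


def pre_normalize(a, b):
    """Align two operands to the larger exponent (low bits are truncated)."""
    (ma, ea), (mb, eb) = a, b
    e = max(ea, eb)
    return ma >> (e - ea), mb >> (e - eb), e


def post_normalize(m, e):
    """Shift the mantissa back into MANT_BITS bits and adjust the exponent."""
    if m == 0:
        return 0, 0
    while m >= (1 << MANT_BITS):
        m >>= 1
        e += 1
    while m < (1 << (MANT_BITS - 1)):
        m <<= 1
        e -= 1
    return m, e


def add_single_post(operands):
    """Pre-normalize before every stage, post-normalize once at the end."""
    acc = operands[0]
    for op in operands[1:]:
        ma, mb, e = pre_normalize(acc, op)
        acc = (ma + mb, e)  # intermediate mantissa may exceed MANT_BITS
    return post_normalize(*acc)


def add_post_every_stage(operands):
    """Pre- and post-normalize at every stage."""
    acc = operands[0]
    for op in operands[1:]:
        ma, mb, e = pre_normalize(acc, op)
        acc = post_normalize(ma + mb, e)
    return acc


def value(x):
    """Convert a (mantissa, exponent) pair to a Python float."""
    m, e = x
    return m * 2.0 ** e
```

For `n` operands, `add_single_post` calls `post_normalize` once, while `add_post_every_stage` calls it `n - 1` times. The price in this model is that the intermediate mantissa must be allowed to grow wider than `MANT_BITS` until the final stage.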
To perform multiplication in one clock cycle, the PE has pipeline registers, as shown in Figure 13. Figure 14 shows a reconfigurable parallel VLSI processor for matrix operations. In this configuration, each PE has seven sixty-four-bit-wide I/O channels to construct a two-dimensional linear array processor. Three I/O channels are provided for common data buses. The other four connect to the neighboring PEs for reconfiguration.
Figure 13: Structure of the PE.
Figure 14: Reconfigurable parallel VLSI for matrix operations.
Figure 15 shows the chip layout of the PE, and Figure 16 shows the features of this chip.
Figure 15: Chip layout of the PE.
Figure 16: Features of the PE.