TY - JOUR
T1 - Designing VLSI network nodes to reduce memory traffic in a shared memory parallel computer
AU - Dickey, Susan
AU - Gottlieb, Allan
AU - Kenner, Richard
AU - Liu, Yue Sheng
PY - 1987/6
Y1 - 1987/6
N2 - Serialization of memory access can be a critical bottleneck in shared memory parallel computers. The NYU Ultracomputer, a large-scale MIMD (multiple instruction stream, multiple data stream) shared memory architecture, may be viewed as a column of processors and a column of memory modules connected by a rectangular network of enhanced 2×2 buffered crossbars. These VLSI nodes enable the network to combine multiple requests directed at the same memory location. Such requests include a new coordination primitive, fetch-and-add, which permits task coordination to be achieved in a highly parallel manner. Processing within the network is used to reduce serialization at the memory modules. To avoid large network latency, the VLSI network nodes must be high-performance components. Design tradeoffs between architectural features, asymptotic performance requirements, cycle time, and packaging limitations are complex. This report sketches the Ultracomputer architecture and discusses the issues involved in the design of the VLSI enhanced buffered crossbars which are the key element in reducing serialization.
AB - Serialization of memory access can be a critical bottleneck in shared memory parallel computers. The NYU Ultracomputer, a large-scale MIMD (multiple instruction stream, multiple data stream) shared memory architecture, may be viewed as a column of processors and a column of memory modules connected by a rectangular network of enhanced 2×2 buffered crossbars. These VLSI nodes enable the network to combine multiple requests directed at the same memory location. Such requests include a new coordination primitive, fetch-and-add, which permits task coordination to be achieved in a highly parallel manner. Processing within the network is used to reduce serialization at the memory modules. To avoid large network latency, the VLSI network nodes must be high-performance components. Design tradeoffs between architectural features, asymptotic performance requirements, cycle time, and packaging limitations are complex. This report sketches the Ultracomputer architecture and discusses the issues involved in the design of the VLSI enhanced buffered crossbars which are the key element in reducing serialization.
UR - http://www.scopus.com/inward/record.url?scp=0023577237&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0023577237&partnerID=8YFLogxK
U2 - 10.1007/BF01598960
DO - 10.1007/BF01598960
M3 - Article
AN - SCOPUS:0023577237
SN - 0278-081X
VL - 6
SP - 217
EP - 238
JO - Circuits, Systems, and Signal Processing
JF - Circuits, Systems, and Signal Processing
IS - 2
ER -