Designing VLSI network nodes to reduce memory traffic in a shared memory parallel computer

Susan Dickey, Allan Gottlieb, Richard Kenner, Yue Sheng Liu

Research output: Contribution to journal › Article › peer-review


Serialization of memory access can be a critical bottleneck in shared memory parallel computers. The NYU Ultracomputer, a large-scale MIMD (multiple instruction stream, multiple data stream) shared memory architecture, may be viewed as a column of processors and a column of memory modules connected by a rectangular network of enhanced 2×2 buffered crossbars. These VLSI nodes enable the network to combine multiple requests directed at the same memory location. Such requests include a new coordination primitive, fetch-and-add, which permits task coordination to be achieved in a highly parallel manner. Processing within the network is used to reduce serialization at the memory modules. To avoid large network latency, the VLSI network nodes must be high-performance components. The design tradeoffs among architectural features, asymptotic performance requirements, cycle time, and packaging limitations are complex. This report sketches the Ultracomputer architecture and discusses the issues involved in designing the VLSI enhanced buffered crossbars, which are the key element in reducing serialization.
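The combining idea in the abstract can be illustrated with a small software model. The sketch below is not taken from the article: the names `Memory` and `combine_at_switch` are hypothetical, and it assumes the commonly described combining scheme in which a switch merges two fetch-and-add requests for the same address into one request carrying the summed increment, remembers the first increment, and reconstructs both replies from the single value that memory returns.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical memory module that counts how many requests reach it."""
    cells: dict = field(default_factory=dict)
    accesses: int = 0

    def fetch_and_add(self, addr: int, inc: int) -> int:
        # Return the old value and add inc to the cell in one step.
        self.accesses += 1
        old = self.cells.get(addr, 0)
        self.cells[addr] = old + inc
        return old


def combine_at_switch(memory: Memory, addr: int, inc_a: int, inc_b: int):
    """Model one switch combining two fetch-and-add requests to addr.

    Assumption: both requests arrive at the switch together and target
    the same address, so they can be merged into a single request.
    """
    saved = inc_a                                    # kept at the switch until the reply returns
    old = memory.fetch_and_add(addr, inc_a + inc_b)  # one memory access instead of two
    reply_a = old                                    # as if request A had been served first
    reply_b = old + saved                            # as if request B had been served after A
    return reply_a, reply_b


mem = Memory({0x10: 5})
reply_a, reply_b = combine_at_switch(mem, 0x10, 3, 4)
print(reply_a, reply_b, mem.cells[0x10], mem.accesses)
```

In this model the two processors see the same replies they would have received had memory served the requests one after the other, yet only one request reaches the memory module, which is the reduction in serialization the abstract describes.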

Original language: English (US)
Pages (from-to): 217-238
Number of pages: 22
Journal: Circuits, Systems, and Signal Processing
Issue number: 2
State: Published - Jun 1987

ASJC Scopus subject areas

  • Signal Processing
  • Applied Mathematics

