Parallel processing on networks of workstations: a fault-tolerant, high performance approach

Partha Dasgupta, Zvi M. Kedem, Michael O. Rabin

Research output: Contribution to conference › Paper › peer-review

Abstract

One of the most sought-after software innovations of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and reliability of supercomputers. Using completely novel techniques (eager scheduling, evasive memory layouts, and dispersed data management), it is possible to build an execution environment for parallel programs on workstation networks. These techniques were originally developed in a theoretical framework for an abstract machine that models a shared-memory asynchronous multiprocessor. The network-of-workstations platform presents an inherently asynchronous environment for the execution of a parallel program. This gives rise to substantial problems of correctness of the computation and of proper automatic load balancing of the work among the processors, so that a slow processor will not hold up the total computation. A limiting case of asynchrony is when a processor becomes infinitely slow, i.e., fails. Our methodology copes with all these problems, as well as with memory failures. An interesting feature of this system is that it is neither a fault-tolerant system extended for parallel processing nor a parallel processing system extended for fault tolerance. The same novel mechanisms ensure both properties.

Original language: English (US)
Pages: 467-474
Number of pages: 8
State: Published - 1995
Event: Proceedings of the 15th International Conference on Distributed Computing Systems - Vancouver, Can
Duration: May 30 1995 to Jun 2 1995

Other

Other: Proceedings of the 15th International Conference on Distributed Computing Systems
City: Vancouver, Can
Period: 5/30/95 to 6/2/95

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
