Workload-cognizant concurrent error detection in the scheduler of a modern microprocessor

Naghmeh Karimi, Michail Maniatakos, Abhijit Jas, Chandrasekharan Tirumurti, Yiorgos Makris

Research output: Contribution to journal › Article › peer-review

Abstract

We present a Concurrent Error Detection (CED) scheme for the Scheduler of a modern microprocessor. The proposed CED scheme is based on monitoring a set of invariances imposed through added hardware, violation of which signifies the occurrence of an error. The novelty of our solution stems from the workload-cognizant way in which these invariances are selected so that they leverage the application-level error masking inherent in program execution. Specifically, in order to ensure cost-effectiveness of the hardware employed to construct these invariances, we make use of information regarding the type and frequency of errors affecting the typical workload of the microprocessor. We thereby identify the most susceptible aspects of instruction execution and distribute CED resources accordingly to protect them. Our approach is demonstrated on the Scheduler of an Alpha-like superscalar microprocessor with dynamic scheduling, hybrid branch prediction and out-of-order execution capabilities. Using an extensive fault-simulation infrastructure that we developed around this microprocessor, we profile the impact of Scheduler faults across a variety of SPEC2000 benchmarks. Based on the results, we construct a CED scheme which monitors the time and location of instruction execution, the executed operation, the utilized resources, as well as the executed and retired sequence of instructions. At a hardware cost of only 32 percent of the Scheduler, the corresponding CED scheme detects over 85 percent of its faults that affect the architectural state of the microprocessor. Furthermore, over 99.5 percent of these faults are detected before they corrupt the architectural state, while the average detection latency for the remaining faults is on the order of a few clock cycles, implying that efficient recovery methods can be developed.
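
To make the idea of invariance-based CED concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes a hypothetical event stream from a scheduler model (dispatch, execute, retire events tagged with a program-order sequence number) and checks three invariances loosely modeled on properties the abstract says are monitored: the executed operation matches the dispatched one, the operation ran on a functional unit able to execute it, and instructions retire in program order. The names `ALLOWED_UNITS`, `Dispatched`, `Executed`, `InvarianceChecker` and the unit labels are illustrative assumptions; timing and location checks are not modeled.

```python
from dataclasses import dataclass

# Hypothetical mapping from operation to the functional units allowed to run it.
ALLOWED_UNITS = {
    "add": {"ALU0", "ALU1"},
    "mul": {"MUL"},
    "load": {"LSU"},
}


@dataclass
class Dispatched:
    seq: int  # program-order sequence number assigned at dispatch
    op: str   # operation recorded at dispatch


@dataclass
class Executed:
    seq: int   # sequence number of the instruction that executed
    op: str    # operation actually performed
    unit: str  # functional unit that performed it


class InvarianceChecker:
    """Records an error whenever one of the monitored invariances is violated."""

    def __init__(self):
        self.dispatched = {}  # seq -> op recorded at dispatch
        self.next_retire = 0  # next sequence number expected to retire
        self.errors = []

    def on_dispatch(self, d: Dispatched):
        self.dispatched[d.seq] = d.op

    def on_execute(self, e: Executed):
        # Invariance 1: the executed operation equals the dispatched one
        # (an instruction that was never dispatched also fails this check).
        if self.dispatched.get(e.seq) != e.op:
            self.errors.append(("operation mismatch", e.seq))
        # Invariance 2: the operation ran on a unit capable of executing it.
        if e.unit not in ALLOWED_UNITS.get(e.op, set()):
            self.errors.append(("illegal unit", e.seq))

    def on_retire(self, seq: int):
        # Invariance 3: instructions retire in program order, none skipped or repeated.
        if seq != self.next_retire:
            self.errors.append(("out-of-order retire", seq))
        self.next_retire = seq + 1


chk = InvarianceChecker()
chk.on_dispatch(Dispatched(0, "add"))
chk.on_dispatch(Dispatched(1, "mul"))
chk.on_execute(Executed(0, "add", "ALU1"))
chk.on_execute(Executed(1, "mul", "ALU0"))  # simulated fault: wrong functional unit
chk.on_retire(0)
chk.on_retire(1)
print(chk.errors)  # [('illegal unit', 1)]
```

In hardware, each such check would correspond to comparators and small lookup tables alongside the Scheduler; the workload-cognizant step described above, choosing which invariances are worth their hardware cost based on fault-simulation profiles, is not represented in this sketch.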

Original language: English (US)
Article number: 5669287
Pages (from-to): 1274-1287
Number of pages: 14
Journal: IEEE Transactions on Computers
Volume: 60
Issue number: 9
State: Published - 2011

Keywords

  • Concurrent error detection
  • invariance
  • microprocessor
  • scheduler

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

