Fine-Grained Checkpoint Recovery for Application-Specific Instruction-Set Processors

Tuo Li, Muhammad Shafique, Jude Angelo Ambrose, Jorg Henkel, Sri Parameswaran

Research output: Contribution to journal › Article › peer-review

Abstract

Checkpoint recovery (CR) is a classic fault-tolerance technique that enables computing systems to execute correctly even when affected by transient faults. Although a number of software- and hardware-based approaches to CR exist, they are usually too large or too slow, or they require extensive modifications to the software and the caching/memory schemes. In this paper, we propose a novel CR approach based on re-engineering the instruction set of a target processor. We take the base instruction set and augment its native micro-operations, described in an architectural description language (ADL), with additional micro-operations that perform checkpointing at the granularity of basic blocks. The recovery mechanism is realized by three custom instructions, which undo the corruption caused by transient faults during instruction execution by restoring incorrectly modified values in general-purpose registers, data memory, and special-purpose registers (PC, status registers, etc.). Our checkpoint storage is sized according to the application program executed. The experimental results show that our approach degrades system performance by just 0.76 percent when there is no fault, and introduces an area overhead of 44 percent on average and 79 percent in the worst case. During the fault injection test with the benchmark applications, recovery took at most 62 clock cycles.
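
The idea of checkpointing at basic-block granularity and undoing corrupted writes can be illustrated with a small software model. The sketch below is only an illustration and is not the hardware mechanism of the paper: the class `CheckpointedMachine`, its methods, and the undo-log layout are all hypothetical names and assumptions introduced here. It records the old value of every register and memory write made since the start of the current basic block, and on recovery replays that log in reverse and restores the PC.

```python
class CheckpointedMachine:
    """Toy machine (hypothetical) with an undo log reset at each basic block."""

    def __init__(self, num_regs=4):
        self.regs = [0] * num_regs   # general-purpose registers
        self.mem = {}                # sparse data memory: address -> value
        self.pc = 0                  # program counter
        self._undo = []              # (kind, key, old_value) for the current block
        self._block_pc = 0           # PC saved at the start of the current block

    def begin_block(self):
        """Start a new checkpoint: earlier writes become permanent."""
        self._undo.clear()
        self._block_pc = self.pc

    def write_reg(self, idx, value):
        """Write a register, logging the value it held before."""
        self._undo.append(("reg", idx, self.regs[idx]))
        self.regs[idx] = value

    def write_mem(self, addr, value):
        """Write memory, logging the old value (None if the address was unset)."""
        self._undo.append(("mem", addr, self.mem.get(addr)))
        self.mem[addr] = value

    def rollback(self):
        """Undo every write since begin_block, newest first, and restore the PC."""
        while self._undo:
            kind, key, old = self._undo.pop()
            if kind == "reg":
                self.regs[key] = old
            elif old is None:
                self.mem.pop(key, None)   # address did not exist before the block
            else:
                self.mem[key] = old
        self.pc = self._block_pc


# Usage: state written before the block survives; writes inside it are undone.
m = CheckpointedMachine()
m.write_reg(0, 7)
m.write_mem(0x10, 1)
m.pc = 8
m.begin_block()
m.write_reg(0, 99)       # assume a transient fault corrupted this block
m.write_mem(0x10, 5)
m.write_mem(0x20, 3)
m.pc = 12
m.rollback()
```

In this model the log never grows beyond the number of writes in the longest basic block, which loosely mirrors the abstract's remark that checkpoint storage is sized according to the application program.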

Original language: English (US)
Pages (from-to): 647-660
Number of pages: 14
Journal: IEEE Transactions on Computers
Volume: 66
Issue number: 4
State: Published - Apr 1 2017

Keywords

  • ASIP
  • checkpoint recovery
  • reliability

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics
