In this paper, we propose a new, low hardware overhead solution for permanent fault detection at the microarchitecture/instruction level. The proposed technique is based on an ultra-reduced instruction set co-processor (URISC) that, in its simplest form, executes only one Turing complete instruction - the subleq instruction. Thus, any instruction on the main core can be redundantly executed on the URISC using a sequence of subleq instructions, and the results can be compared, also on the URISC, to detect faults. A number of novel software and hardware techniques are proposed to decrease the performance overhead of online fault detection while keeping the error detection latency bounded including: (i) URISC routines and hardware support to check both control and data flow instructions; (ii) checking only a subset of instructions in the code based on a novel check window criterion; and (iii) URISC instruction set extensions. Our experimental results, based on FPGA synthesis and RTL simulations, illustrate the benefits of the proposed techniques.