Springer, 2006. — XIV, 228 p. — ISBN: 978-0-387-32937-6.
Software-Implemented Hardware Fault Tolerance addresses the innovative topic of software-implemented hardware fault tolerance (SIHFT), i.e., how to deal with faults affecting the hardware by acting only (or mainly) on the software.
The first SIHFT techniques were proposed and adopted several decades ago, but they have attracted renewed interest in the past few years, mainly because of the need to develop low-cost safety-critical computer-based applications in fields such as automotive, biomedicine, and telecommunications. As a result, several new approaches to detecting and, when possible, correcting transient and permanent faults in the hardware have recently been proposed. These approaches improve on those proposed in the past in that they are more widely applicable (often starting from the source-level code of an application) and more general, being capable of coping with many different fault types. The book presents the theory behind software-implemented hardware fault tolerance, as well as the practical aspects of putting it to work on real examples. By accurately evaluating the advantages and disadvantages of the available approaches, the book provides a guide for developers who want to adopt software-implemented hardware fault tolerance in their applications. Moreover, the book identifies open issues for researchers who want to improve the available techniques.
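To give a flavour of what acting at the source level can mean, the following is a minimal sketch of one common idea: keeping two copies of a variable and comparing them after each write. It is an illustration only and is not taken from the book. The names `checked_sum`, `FaultDetected`, and `inject_fault_at` are hypothetical, and the bit flip merely simulates a transient hardware fault.

```python
class FaultDetected(Exception):
    """Raised when the two copies of a duplicated variable disagree."""


def checked_sum(values, inject_fault_at=None):
    """Sum integer `values` in two independent accumulators and compare them.

    `inject_fault_at` is a hypothetical test hook: at that index, one bit of
    the primary accumulator is flipped to mimic a transient hardware fault.
    """
    total_a = 0  # primary copy
    total_b = 0  # shadow copy, updated by a replicated operation
    for i, v in enumerate(values):
        total_a += v
        total_b += v
        if i == inject_fault_at:
            total_a ^= 1 << 3  # simulated single-bit upset in the primary copy
        # Consistency check after each write: the two copies must match.
        if total_a != total_b:
            raise FaultDetected(f"copies diverged at index {i}")
    return total_a
```

A check like this only detects the mismatch. Correcting the fault would need additional redundancy or a recovery mechanism.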
Background.
Definitions.
Error models for hardware and software components.
Origin of single-event effects.
Redundancy techniques.
Hardening the data.
Computation duplication.
Executable assertions.
Hardening the control flow.
Background.
Path identification.
CFE detection in sequential and parallel programs.
BSSC and ECI.
Exploiting instruction level parallelism: ARC technique.
VASC.
ECCA.
Plain inter-block errors detection.
CFC via regular expressions resorting to IPC.
CFCSS.
ACFC.
YACCA.
SIED and its enhancements.
Achieving fault tolerance.
Design diversity.
Checkpointing.
Algorithm-based fault tolerance (ABFT).
Duplication.
Hybrid techniques.
Control flow checking.
Memory access checking.
Reasonableness checking.
Combined techniques.
Fault injection techniques.
The FARM model.
Assumptions.
The fault injection environments.