Design of Fault-Tolerant Systems (ID2218, PhD F2B5472)

March-May 2017


Fault tolerance is the ability of a system to continue performing its intended function despite faults. In a broad sense, fault tolerance is associated with reliability, successful operation, and the absence of breakdowns.

The goal of fault tolerance is the development of a dependable system. As society relies more and more on computer systems, the dependability of these systems becomes a critical issue. In airplanes, chemical plants, or heart pacemakers, a system failure can cost lives or cause an environmental disaster.

There are various approaches to achieving fault tolerance. Common to all of them is a certain amount of redundancy. This can be a replicated hardware component, an additional check bit attached to a string of digital data, or a few lines of program code verifying the correctness of the program's results.
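
The check-bit form of redundancy can be illustrated with a minimal sketch, assuming a single even-parity bit appended to a list of data bits. The function names `add_parity` and `check_parity` are hypothetical, chosen only for this illustration, and the scheme shown is one simple possibility rather than the specific method taught in the course.

```python
def add_parity(bits):
    """Append an even-parity check bit to a list of 0/1 data bits."""
    # The check bit makes the total number of 1s even.
    return bits + [sum(bits) % 2]


def check_parity(word):
    """Return True if the word (data bits plus check bit) has even parity."""
    return sum(word) % 2 == 0


data = [1, 0, 1, 1]
word = add_parity(data)   # data bits followed by one redundant check bit

# Simulate a single-bit fault by flipping one bit of the stored word.
corrupted = word.copy()
corrupted[2] ^= 1

print(check_parity(word))       # fault-free word passes the check
print(check_parity(corrupted))  # single-bit error is detected
```

A single parity bit detects any odd number of flipped bits but cannot locate or correct them, and an even number of flips goes unnoticed. Stronger codes spend more redundant bits to close these gaps.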


The aims of this course are:


The following is a tentative list of topics to be covered:


The evaluation will be based on seven homework assignments (20%, grade A-F), a midterm exam (20%, grade A-F) and a final exam (60%, grade A-F). For PhD students, an additional task will be to read and present a paper approved by the instructor (20 min talk).


The following lecture handouts contain the material covered in the course.


Five assignments for the course (each becomes available as its deadline approaches). Numbers refer to problems in the textbook.


The midterm exam will take place on Monday, April 24th, 13:15-14:00 in room F304 (same as lecture). You don't need to register for it. An example of last year's midterm exam.


The final exam will take place on Wednesday, June 1st, 8-12 in room 303. Don't forget to register! An example of an exam with answers. More examples without answers: exam 1, exam 2.


Basic understanding of circuits and digital logic.



Elena Dubrova
School of Information and Communication Technology
Royal Institute of Technology (KTH)
Stockholm, Sweden