Syllabus
Unit I
Introduction to Fault-Tolerance: Errors, Faults and Failures; Reliability and Availability; Dependability Measures. Hardware Fault-Tolerance: Canonical and Resilient Structures; Reliability Evaluation Techniques and Models; Processor-level Fault Tolerance; Byzantine Failures and Agreements.
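As an illustration of the dependability measures listed in Unit I, the following is a minimal sketch rather than course material. It assumes a constant failure rate (exponential lifetime model) and a perfect voter for triple modular redundancy (TMR); the function names are hypothetical.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def tmr_reliability(r: float) -> float:
    """Reliability of a 2-out-of-3 (TMR) structure with a perfect voter.

    The system works if all three modules work (r^3) or exactly two
    work (3 * r^2 * (1 - r)), which simplifies to 3r^2 - 2r^3.
    """
    return 3 * r**2 - 2 * r**3


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)
```

Under these assumptions, TMR improves on a single module only while the module reliability stays above 0.5; at exactly 0.5 the two are equal.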
Unit II
Information Redundancy: Error Detection/Correction Codes (Hamming, Parity, Checksum, Berger, Cyclic, Arithmetic); Encoding/Decoding circuits; Resilient Disk Systems (RAID). Fault-Tolerant Networks: Network Topologies and their Resilience; Fault-tolerant Routing.
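To make the error-correcting codes named in Unit II concrete, here is a small sketch of a Hamming(7,4) code with single-error correction. It is an illustrative example, not taken from the course; the bit layout (parity bits at positions 1, 2 and 4) is one common convention, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(word):
    """Return (corrected codeword, error position), position 0 if no error.

    The syndrome bits, read as a binary number, give the 1-based
    position of a single flipped bit.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    pos = s1 + 2 * s2 + 4 * s4
    if pos:
        w[pos - 1] ^= 1  # flip the faulty bit back
    return w, pos


def hamming74_decode(word):
    """Correct at most one bit error and return the 4 data bits."""
    w, _ = hamming74_correct(word)
    return [w[2], w[4], w[5], w[6]]
```

This sketch corrects any single-bit error per codeword; two or more flipped bits are miscorrected, which is why extended variants add an overall parity bit.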
Unit III
Software Fault-Tolerance: Single-Version Fault Tolerance; N-Version Programming; Recovery Approach; Exception and Conditional (Assert) Handling; Reliability Models. Checkpointing: Optimal Checkpointing; Checkpointing in Distributed and Shared-memory Systems.
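Two of the Unit III topics lend themselves to a short sketch, offered here only as an illustration. The first function is a simple majority voter of the kind used in N-version programming; the second uses Young's first-order approximation for the checkpoint interval, which is one well-known model and an assumption here, not necessarily the one the course adopts. Both function names are hypothetical.

```python
import math
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of versions.

    Returns None when no value has more than half of the votes.
    Assumes a non-empty list of hashable outputs.
    """
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def young_checkpoint_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's approximation: T_opt ~ sqrt(2 * C * MTBF).

    checkpoint_cost (C) and mtbf must use the same time unit, and the
    approximation assumes C is small compared with the MTBF.
    """
    return math.sqrt(2 * checkpoint_cost * mtbf)
```

For example, with a checkpoint cost of 2 minutes and a mean time between failures of 100 minutes, the approximation suggests a checkpoint roughly every 20 minutes.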
Objectives and Outcomes
Course Objectives
- The objective of this course is to identify the requirements of fault tolerant systems, their algorithms and design principles.
Course Outcomes
CO1: To understand the risk of computer failures and how it compares with that of other equipment failures.
CO2: To understand the advantages and limits of fault avoidance and fault tolerance techniques.
CO3: To gain knowledge of the sources of faults and of their prevention and forecasting.
CO4: To analyze fault-tolerant and non-fault-tolerant systems on the basis of dependability requirements.
CO-PO Mapping
| CO | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CO1 | 3 | 3 | 1 |  | 1 | 2 |  |  | 2 |  |  | 3 | 3 | 2 |
| CO2 | 3 | 3 | 1 |  | 1 | 2 |  |  | 2 |  |  | 3 | 3 | 2 |
| CO3 | 3 | 3 | 1 | 2 | 1 | 2 |  |  | 2 |  |  | 3 | 3 | 2 |
| CO4 | 3 | 3 | 2 | 2 | 3 | 2 |  |  | 2 |  |  | 3 | 3 | 2 |
Evaluation Pattern
Evaluation Pattern: 70:30 (Internal : End Semester)
| Assessment | Internal | End Semester |
| --- | --- | --- |
| MidTerm Exam | 20 |  |
| Continuous Assessment – Theory (*CAT) | 10 |  |
| Continuous Assessment – Lab (*CAL) | 40 |  |
| **End Semester |  | 30 (50 Marks; 2 hours exam) |
*CAT – Can be Quizzes, Assignments, and Reports
*CAL – Can be Lab Assessments, Project, and Report
**End Semester can be theory examination/ lab-based examination/ project presentation
Text Books / References
Textbook(s)
Israel Koren, C. Mani Krishna, “Fault-Tolerant Systems”, 2nd Edition, Morgan Kaufmann, 2020.
Reference(s)
Kishor S. Trivedi, “Probability and Statistics with Reliability, Queuing and Computer Science Applications”, John Wiley & Sons Inc., 2016.
P. Jalote, “Fault Tolerance in Distributed Systems”, Prentice-Hall Inc., 1994.
D. K. Pradhan, “Fault-Tolerant Computing: Theory and Techniques”, Prentice-Hall, 1998.