
A hierarchical fault detection and recovery in a computational grid using watchdog timers

Publication Type : Conference Paper

Publisher : INCOCCI-2010

Source : Proceedings of 2010 International Conference on Communication and Computational Intelligence, INCOCCI-2010, Perundurai, Erode, p.467-471 (2010)

Url : http://www.scopus.com/inward/record.url?eid=2-s2.0-79954594711&partnerID=40&md5=c6bf54e2d77d2241df70ccc6e8025d98

ISBN : 9788183713696

Keywords : Algorithms, Artificial intelligence, cluster, Cluster computing, Computational grids, Divide-and-conquer algorithm, Efficient algorithm, Fault, Fault detection, Fault tolerance, Fault-tolerant, Grid computing, load balancing, Local state, Node failure, Parallel architectures, Quality assurance, recovery, Recovery strategies, watch dog timer, Watchdog timers, Work-flows

Campus : Coimbatore

Department : Department of Information Technology

Year : 2010

Abstract : Grid computing applies the resources of many networked computers to a single problem or task at the same time. A drawback is that the computers performing the calculations are not always reliable and may fail from time to time. The larger the number of nodes in the grid, the greater the probability that some node fails. To execute workflows in a fault-tolerant manner, fault detection and recovery strategies are therefore required. This paper proposes a method in which an instantaneous snapshot of the local state of the processes within each node is recorded. An efficient algorithm is introduced for detecting node failures using watchdog timers. For recovery, a divide-and-conquer algorithm is used that avoids redoing already completed jobs, enabling faster recovery. © 2010 Kongu Engineering College.
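
Illustration : The abstract does not give the paper's algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each node resets ("kicks") a software watchdog timer with a heartbeat, that a node whose timer expires is treated as failed, and that the recorded snapshot can be reduced to a set of completed job identifiers. The names `Watchdog`, `failed_nodes`, `jobs_to_redo`, and `redistribute` are hypothetical, and the recursive halving in `redistribute` is one possible reading of "divide and conquer".

```python
import time
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Software watchdog for one node (hypothetical helper).

    The timer counts as expired when it has not been kicked
    for more than `timeout` seconds.
    """
    timeout: float
    last_reset: float = field(default_factory=time.monotonic)

    def kick(self, now=None):
        # Called when a heartbeat arrives from the node.
        self.last_reset = time.monotonic() if now is None else now

    def expired(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_reset > self.timeout


def failed_nodes(watchdogs, now=None):
    """Return the names of nodes whose watchdog has expired, sorted."""
    return sorted(name for name, wd in watchdogs.items() if wd.expired(now))


def jobs_to_redo(assigned, completed):
    """Keep only jobs the snapshot does not list as completed (assumed format)."""
    return [job for job in assigned if job not in completed]


def redistribute(jobs, nodes):
    """Recursively halve the pending jobs and the surviving nodes.

    Assumption: this halving scheme stands in for the paper's
    divide-and-conquer recovery, whose details the abstract omits.
    """
    if not jobs:
        return {}
    if not nodes:
        raise ValueError("no surviving nodes to take over the jobs")
    if len(nodes) == 1:
        return {nodes[0]: list(jobs)}
    mid_n, mid_j = len(nodes) // 2, len(jobs) // 2
    plan = redistribute(jobs[:mid_j], nodes[:mid_n])
    plan.update(redistribute(jobs[mid_j:], nodes[mid_n:]))
    return plan
```

In this sketch, a coordinator would poll `failed_nodes` periodically, pass the failed node's assigned jobs and its last snapshot to `jobs_to_redo`, and hand the remainder to `redistribute` together with the list of surviving nodes, so that completed jobs are never rerun.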

Cite this Research Publication : A. H. Bhagyashree, Pradeep, D., Jayanthy, N., Mounica, K. V., Nivejaa, S., and P. Dharani, S., “A hierarchical fault detection and recovery in a computational grid using watchdog timers”, in Proceedings of 2010 International Conference on Communication and Computational Intelligence, INCOCCI-2010, Perundurai, Erode, 2010, pp. 467-471.
