Abstract
Service Level Agreements (SLAs) have been introduced into the Grid in order to build a basis for its commercial uptake. The challenge for Grid providers in agreeing and operating SLA-bound jobs is to ensure their fulfillment even in the case of failures. Hence, fault-tolerance mechanisms are an essential means of the provider?s SLA management. The high utilization of commercial operated clusters leads to scenarios in which typically a job migration effects other jobs scheduled. The effects result from the unavailability of enough free resources which would be needed to catch all resource outages. Consequently before initiating a migration, its effects for other jobs have to be compared and the initiation of fault tolerance(FT-) mechanisms have to be evaluated recursively. This paper presents a measurement for the benefit of initiating a FT mechanism,the recursive evaluation, and termination condition. Performing such an impact evaluation of an initiated chain of FT-mechanisms is often more profitable than performing a single FT-mechanism and accordingly this is important for the Grid commercialization.