Abstract
Application failures characterized by the phrases, "it worked yesterday, but it doesn?t work today" and "it worked on that machine, but it doesn?t work on this machine" are a major source of computer user frustration and a major component in the total cost of ownership. The typical symptom-based troubleshooting approach relies too much on creative thinking and may lead users or support technicians in directions far from the actual root cause. In this paper, we propose a state-based troubleshooting approach for configuration failures that aims at making the diagnostic process as mechanical as possible. In the narrow-down phase, we use checkpoint comparison and application tracing to determine which pieces of persistent state have changed and are affecting current application execution; ongoing self-monitoring of persistent-state changes by the machine is used to help eliminate false positives. In the solution-query phase, state-to-task mapping and searches of online databases are used to translate low-level state information into high-level user interfaces and articles. We describe the design and implementation of a troubleshooter that uses this state-based approach and present preliminary results to demonstrate its effectiveness in diagnosing several actual configuration failures.