Abstract
Most reported experience with software reliability models comes from a project's testing phases, during which researchers have little control over the failure data. Since failure data can be noisy and distorted, reported procedures for determining model applicability may be incomplete. To gain additional insight into this problem, we generated forty data sets by drawing samples from two distributions and used them as inputs to six different software reliability models. We applied several methods to analyze the applicability of the models. We expected each model to perform best on the data sets created to comply with its assumptions, but initially found that this was not always the case. More detailed examination showed that a model applied to a data set created to satisfy its assumptions tended to have better prequential likelihood, bias, and bias trend measures, although the Kolmogorov-Smirnov test might not be a reliable indicator of the best model. These results indicate that more than one measure should be used to determine model applicability, and that, for greater accuracy, the measures should be evaluated in sequence rather than simultaneously.