Confidence & Hypothesis Testing


In an operational test we typically have one or more research questions in mind: “Is the new system more survivable than the old system?”, “Does the reliability of the system meet the threshold?”, or, most importantly, “Does system performance vary across the operational envelope?” When answering these questions with statistical rigor, the concept of confidence is crucial. Confidence is best understood in the context of a specific hypothesis test. For example, consider a new information technology system designed to process repair requests quickly. To test whether this system has adequate performance across a range of operationally relevant request sizes, we use the following hypotheses:
H0 (Null Hypothesis): Request size does not affect the time to process requests.
Ha (Alternative Hypothesis): Request size does affect the time to process requests.
If the data show that processing time varies across the operational envelope, we will reject this null hypothesis. Although it is not stated in the hypotheses, if we reject the null hypothesis we will also be able to estimate the effect that request size has on processing time and compare that estimate to the requirements across the operational envelope.
The confidence level of our test governs how often we wrongly reject a null hypothesis that is actually true. For example, if request size does not affect processing time across operationally representative request sizes, but we reject the null hypothesis anyway, the result of the test is a false positive, or Type I error. The confidence level of our test controls the Type I error rate: if we perform a test with 95% confidence, then when the null hypothesis is true we will get a false positive 100% - 95% = 5% of the time. Similarly, with 80% confidence we will get a false positive 100% - 80% = 20% of the time. In general, a desired Type I error rate of α is achieved by using a test with 100(1-α)% confidence.
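As a rough illustration of what the Type I error rate means, the sketch below repeats a simulated test many times in a world where the null hypothesis is true by construction, and counts how often it is rejected anyway. It assumes processing time is linear in request size with normally distributed noise of known standard deviation, so a two-sided z-test on the regression slope applies. The request sizes, baseline time, noise level, and the helpers `slope_z_test` and `false_positive_rate` are all hypothetical, chosen only for illustration.

```python
import random
from statistics import NormalDist


def slope_z_test(sizes, times, sigma, confidence):
    """Two-sided z-test of H0: slope = 0 in time = b0 + b1 * size + noise.

    Assumes the noise standard deviation `sigma` is known, so the test
    statistic is exactly standard normal when H0 is true.
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx                  # least-squares slope estimate
    se = sigma / sxx ** 0.5            # its standard error (sigma known)
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(slope / se) > z_crit    # True means "reject H0"


def false_positive_rate(confidence, trials=20_000, seed=1):
    """Fraction of simulated tests that reject H0 when H0 is true."""
    rng = random.Random(seed)
    sizes = [1, 2, 4, 8, 16] * 4  # hypothetical request sizes (MB)
    sigma = 2.0                   # hypothetical known noise SD (seconds)
    rejections = 0
    for _ in range(trials):
        # H0 holds: processing time does not depend on request size
        times = [10.0 + rng.gauss(0, sigma) for _ in sizes]
        rejections += slope_z_test(sizes, times, sigma, confidence)
    return rejections / trials


for conf in (0.95, 0.80):
    rate = false_positive_rate(conf)
    print(f"{conf:.0%} confidence -> false positive rate ~ {rate:.3f}")
```

Under these assumptions the printed rates should land close to 0.05 and 0.20. In a real test the noise level is unknown, so a t-test or regression F-test would normally replace the z-test used here.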
However, this doesn’t necessarily mean we should always choose a high level of confidence for our test. The tradeoff for a high level of confidence is that we will fail to reject the null hypothesis more often, even when it is false and we should reject it. Suppose now that request size does affect processing time, but only by 5 percent over the range of operationally relevant request sizes. Unless the test is large, it is likely that we will fail to reject the null hypothesis, resulting in a false negative, or Type II error.
Figure: Error Chart
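To make this tradeoff concrete, the following sketch computes the power (1 minus the Type II error rate) of a two-sided z-test for a small effect. It assumes we compare mean processing time at the smallest and largest request sizes with n runs at each, that run-to-run noise is normal with a known standard deviation of 2 s, and that the baseline is 10 s, so a 5 percent effect is 0.5 s. These numbers and the helper `power_two_sided_z` are hypothetical, not taken from any real system or library.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_z(effect, se, confidence):
    """Probability that a two-sided z-test rejects H0 when the true effect is `effect`.

    `se` is the standard error of the estimated effect; running a larger
    test shrinks `se` and therefore raises power.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - (1 - confidence) / 2)
    shift = effect / se
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# Hypothetical scenario: 10 s baseline, 5% (0.5 s) slower for the largest
# requests, known noise SD of 2 s, n runs at each of the two request sizes.
effect = 0.05 * 10.0
sigma = 2.0

for n in (5, 20, 100, 400):
    se = sigma * sqrt(2 / n)  # standard error of a difference of two means
    for conf in (0.95, 0.80):
        power = power_two_sided_z(effect, se, conf)
        print(f"n={n:>3}, {conf:.0%} confidence: "
              f"power={power:.2f}, Type II error={1 - power:.2f}")
```

Under these assumptions, dropping from 95% to 80% confidence raises power at every sample size, and only the largest of these hypothetical tests detects the 5 percent effect with power above 0.9. Setting `effect` to 0 returns α itself, which ties the two error types together.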