Statistical Power


One of the primary goals of Design of Experiments is to determine whether the proposed statistical analysis and test design are adequate to assess the effectiveness of the system under test.  One component of this process is to consider the amount of testing, in terms of the number of test points, and where those points are placed across the test region.  Statistical power is a quantitative measure that can inform this decision.  Power is critical to right-sizing a test: it helps maximize the information gained while minimizing the resources spent. This section introduces the general concept of power, along with applications of power analysis to evaluation scenarios.

Statistical Power Defined


Statistical power refers to the probability of rejecting the null hypothesis when a specific alternative hypothesis is true.  Put another way, statistical power is the likelihood of detecting that a system is effective, given that the system is effective in the real world.

Power Example - System Improvement


Consider a notional example in which the U.S. Navy is assessing the effects of a modified mine detection sensor (New Sensor) compared to the older version of the sensor (Legacy Sensor) on distance to mine detection.  Mine detection distance is the distance, in yards, between the sensor and a faux mine when the sensor successfully detects the mine.  Statistical power here refers to the probability that we will conclude the New Sensor has a longer detection distance, if it is indeed better than the Legacy Sensor.  If statistical power is low, the test team should reconsider the number of test points and the overall design so that adequate power is achieved.  A simulation-based power estimate for this scenario is sketched below.
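The following is a minimal Monte Carlo sketch of this power calculation in Python. All of the numbers (mean detection distances, standard deviation, sample size) are illustrative assumptions, not values from an actual test:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    legacy_mean = 100.0   # assumed mean detection distance (yards), Legacy Sensor
    new_mean = 110.0      # assumed mean detection distance (yards), New Sensor
    sigma = 20.0          # assumed common standard deviation (yards)
    n = 15                # assumed runs per sensor
    alpha = 0.05          # significance level
    trials = 10_000       # number of simulated tests

    rejections = 0
    for _ in range(trials):
        legacy = rng.normal(legacy_mean, sigma, n)
        new = rng.normal(new_mean, sigma, n)
        # One-sided test: is the New Sensor's mean detection distance longer?
        _, p = stats.ttest_ind(new, legacy, alternative="greater")
        if p < alpha:
            rejections += 1

    # Power is the fraction of simulated tests that correctly reject the null.
    print(f"Estimated power: {rejections / trials:.3f}")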

The goal of test design is to plan an experiment that maximizes the ability to conclude that there is statistical support for the efficacy of a system when the system is indeed effective, while minimizing the likelihood of concluding that there is statistical support for a system when, in reality, it is not effective (Type I error). To further explore the concept of power, Figure 1 depicts the four possible outcomes of a hypothesis test:
  1. In reality, the new system is better, but the conclusion is that it is not better (Type II error).
  2. In reality, the new system is no better or worse, and our test is consistent with reality.
  3. In reality, the new system is better, and our test is consistent with reality.
  4. In reality, the new system is no better or worse, but the conclusion is that it is better (Type I error).
Figure 1. Hypothesis Testing Decision Matrix

Conceptual View of Power


Power is the likelihood of concluding a system works, when it actually does work in the real world.  We must make educated assumptions about the real-world effects of the system; if we knew the true effects of a system, we would not need to test it. In hypothesis testing, we start with two dueling viewpoints on the effects of the system (a code sketch of the resulting test follows the list):
  1. Null Hypothesis:  The New system is the same as or worse than the Legacy system. (i.e., The average detection distance is the same for the New and Legacy systems.)
  2. Alternative Hypothesis:  The New system is better than the Legacy system. (i.e., The average detection distance is longer for the New system than for the Legacy system.)
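Here is a minimal sketch of the one-sided test implied by these hypotheses, using scipy and made-up detection distances in yards (all values are illustrative):

    from scipy import stats

    legacy = [95, 102, 88, 110, 97, 105, 92, 100]   # notional Legacy Sensor runs
    new = [112, 108, 120, 99, 115, 118, 104, 121]   # notional New Sensor runs

    # H0: mean(new) <= mean(legacy);  H1: mean(new) > mean(legacy)
    t_stat, p_value = stats.ttest_ind(new, legacy, alternative="greater")
    print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
    # A p-value below alpha (e.g., 0.05) would reject the null hypothesis.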

Factors that Affect Power


In general, four factors affect statistical power, and these factors can be summarized with the acronym B.E.A.N. A code sketch illustrating these relationships follows the list.
        1. Beta Error (1 – Power): Beta error, or Type II error, is the probability a test outcome will fail to reject the null hypothesis when the null hypothesis is actually false (i.e., there is an effect, but the evaluators concluded there was not).  As beta error decreases, power increases, all else being equal.
        2. Effect Size: A standardized value indicating the magnitude of the phenomenon being tested (e.g., the strength of a relationship between variables or the magnitude of a difference between responses at different factor levels).  All else being equal, as the effect size increases, power increases.
        3. Alpha Error (1 - Confidence): Alpha error, or Type I error, is the probability of rejecting the null hypothesis when it is actually true (i.e., concluding there is an effect when in reality there is none).  All else being equal, as alpha error increases, power increases.
        4. N (Sample Size): The number of runs or test points in a design.  The variability of sample statistics decreases as your sample size increases.  Thus, as sample size increases, power increases, all else being equal.  In practice, sample size is optimized to maximize power while also minimizing the use of test resources.  This is accomplished by selecting the smallest sample size needed to attain a desired level of power.
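The sketch below uses the statsmodels power tools for a two-sample t-test to illustrate these relationships; the effect size, alpha, and sample sizes are arbitrary illustrative choices:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # N: power rises with sample size (effect size and alpha held fixed).
    for n in (10, 20, 40):
        power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05,
                               alternative="larger")
        print(f"n per group = {n:2d} -> power = {power:.3f}")

    # Effect size: larger effects are easier to detect at the same n.
    for es in (0.2, 0.5, 0.8):
        power = analysis.power(effect_size=es, nobs1=20, alpha=0.05,
                               alternative="larger")
        print(f"effect size = {es:.1f} -> power = {power:.3f}")

    # Smallest n per group that achieves 80% power for a medium effect,
    # i.e., right-sizing the test as described above.
    n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05,
                                    alternative="larger")
    print(f"Required n per group: {n_needed:.1f}")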

Links and Resources


To interactively explore the effects of sample size and other factors on power, visit the Statistical Power Tools Page.  For more information on statistical power, see the following resources:
  1. Tutorial on Power Calculations in JMP (view or download this PDF document)
  2. Case Study on Determining Sample Size and Power (download this Word document)
  3. Applications for Visualizing Power and Estimating Sample Sizes
