Collinearity


When designing an experiment, collinearity describes the degree of linear relationship between two or more factors.  A well-designed experiment minimizes collinearity between factors.  Two or more factors are considered collinear if they move together linearly (e.g., as one increases, so does the other).

An example from operational testing is the use of multiple factors to describe the geometric location of an aircraft.  Consider Figure 1, below.  An Apache helicopter engages an enemy tank at ten locations along two different flight profiles.  The response variable is weapon accuracy, and the factors are slant range and altitude.  Slant range is collinear with altitude because there is a near-linear relationship between the two.  The red best-fit line in the bottom left of Figure 1 has a positive slope, showing a positive linear relationship between the two factors; hence, altitude and slant range are, to some degree, collinear.

[Figure 1: Collinearity, Apache attack angle]

While selecting a different flight profile could mitigate the collinearity between slant range and altitude, the two factors are mathematically related, so adjusting the flight profile cannot eliminate the collinearity completely.  What we can do to break the collinearity is replace altitude with engagement angle, as shown in the bottom right of Figure 1.  Using the same 10 points as before, the bottom right of Figure 1 shows that we can create a similar experiment in which the factors are not collinear.  Notice that the red best-fit line is horizontal, indicating that there is no linear relationship between the factors (that is, the factors are orthogonal).

Analysis of data containing highly collinear factors can be misleading, confusing, and imprecise.  When factors are highly collinear, the variances of the coefficient estimates become greatly inflated, which reduces the precision of the test.  The inflated variances in turn inflate p-values, so that real effects can appear non-significant (Type II errors).
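The size of this variance inflation can be written down explicitly. The expression below is the standard result for a linear model with an intercept; the symbols are assumptions introduced here for illustration and do not come from the original text: \sigma^2 is the error variance, x_{ij} is the setting of factor j on run i, and R_j^2 is the coefficient of determination obtained when factor j is regressed on the other factors.

```latex
\[
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_i (x_{ij} - \bar{x}_j)^2} \cdot \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]
% With only two factors, R_j^2 equals r^2, the squared correlation between them:
\[
r = 0 \;\Rightarrow\; \mathrm{VIF} = 1,
\qquad
r = 0.9 \;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.81} \approx 5.3,
\qquad
r = 0.95 \;\Rightarrow\; \mathrm{VIF} = \frac{1}{1 - 0.9025} \approx 10.3
\]
```

Orthogonal factors (r = 0) leave the variance unchanged, while a correlation of 0.9 between two factors multiplies the variance of each coefficient estimate by roughly five.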
Additionally, collinearity can lead to false positives.  When a response is regressed on two highly collinear factors, an analysis of variance might report that both factors are significant.  Yet if only one of the factors is included in the model, the analysis of variance may indicate that the factor is not significant.  Finally, using a model containing highly collinear factors to extrapolate, or to interpolate between design points, can yield estimates with large uncertainty.

Correlation coefficients and Variance Inflation Factors (VIFs) can be used to help avoid collinearity while planning a DOE.  Both measures of merit are calculated and monitored before an experiment is executed.  They are functions of the number of runs, the factors and levels in the experiment, and how those factors vary from run to run.  They are not functions of the data collected from the test; rather, they serve as tools for establishing the merit of an experiment and can be used to compare DOEs.
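As a rough illustration of how these measures can be computed from the design alone, the sketch below builds a hypothetical ten-run design loosely modeled on the Apache example and compares the two factor choices. The angles, ranges, and function names are assumptions made for this example and do not come from the original test. The altitude formula assumes flat terrain with the target at ground level, and the VIF shortcut applies only to a model with exactly two factors.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_factors(r):
    """VIF when the model has exactly two factors with correlation r."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical design: two flight profiles (engagement angles, in degrees)
# crossed with five slant ranges (in km), giving ten engagement locations.
angles_deg = [10.0, 20.0]
ranges_km = [1.0, 2.0, 3.0, 4.0, 5.0]

slant_range, angle, altitude = [], [], []
for a in angles_deg:
    for r in ranges_km:
        slant_range.append(r)
        angle.append(a)
        # Assumed geometry: altitude = slant range * sin(engagement angle).
        altitude.append(r * math.sin(math.radians(a)))

# No response data is needed: both measures depend only on the factor settings.
r_alt = pearson(slant_range, altitude)
r_ang = pearson(slant_range, angle)

print(f"slant range vs altitude: r = {r_alt:.2f}, VIF = {vif_two_factors(r_alt):.2f}")
print(f"slant range vs angle:    r = {r_ang:.2f}, VIF = {vif_two_factors(r_ang):.2f}")
```

In this sketch the slant range and altitude columns are positively correlated, so their VIF exceeds 1, while the slant range and engagement angle columns form a balanced factorial with zero correlation and a VIF of 1. Designs with more than two factors would need the general VIF, which regresses each factor on all of the others.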
