Common Test & Evaluation Analyses

Defense systems are complex, combining a variety of software and hardware components. They consistently push the limits of scientific understanding and introduce new and unique interfaces among software, hardware, and users.

Testing these complex, multi-functional systems, especially given the range of system types that must be tested, demands an analytical process that is both structured and flexible. Thorough testing must ensure that defense systems will work in a variety of complex combat environments and scenarios.

Characteristics of Test and Evaluation Goals

To ensure thorough testing, defense system testing occurs in multiple phases – developmental and operational. Both use similar analytical tools, but emphasize different goals.

Developmental Testing

Developmental testing focuses on understanding system capabilities and limitations while the system is still in development. This often means tests are conducted on critical subsystems before they are integrated into the full system. Additionally, in contrast to operational testing, developmental testing often takes place in a more controlled environment, making it the best time to use design of experiments (DOE) to learn the key reasons why system performance changes as a function of test conditions.

Key focus areas for developmental testing include:

  • System Performance
  • Reliability
  • Interoperability
  • Cybersecurity

Operational Testing

In contrast to developmental testing, operational testing focuses on the effectiveness and suitability of systems when used in the operational environment by operational users. Key components of effectiveness often include mission accomplishment and system performance. Suitability, on the other hand, often relies on the assessment of reliability, maintainability, and availability, as well as Human Factors or Human Systems Integration measures (e.g., workload, usability, training adequacy).

Key focus areas for operational testing include:

  • System Effectiveness
    • Performance in Operational Environment
    • Support of Mission Accomplishment
  • System Suitability
    • Reliability, Availability, & Maintainability
    • Usability, Workload, other Human Factors


Characteristics of Test & Evaluation Data

In the end, analysis of any data simply comes down to the mathematical characteristics of the data. Regardless of test phase (developmental/operational), we see common trends throughout T&E that make a certain subset of statistical analysis methods very useful to know well. The following are the most commonly occurring data characteristics and their corresponding analyses.

T&E Data Characteristics & Analyses

Columns: Issues in T&E | Implications for Data | Implications for Analysis | Common Statistical Techniques
Complex Tests

Implications for Data: T&E probes wide ranges of expected operating conditions and performance capabilities. Ideally an experimental design was used to specifically select conditions of interest, but even in cases where “free play” methodologies are used, tests often strive for detailed coverage of multiple operating modes crossed with several environmental factors, meaning the resulting data are categorized by or associated with factor setting combinations.

Implications for Analysis: Simple summary statistics (means, medians, etc.) are almost never appropriate for complex tests, as they fail to capture the outcomes of tests at each condition. Regression-based techniques allow analysts to show how performance, mission accomplishment, and other outcomes change across a variety of conditions.

Common Statistical Techniques:
  • Regression-based Analyses
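
As a minimal sketch of a regression-based analysis of a designed test, the following fits two main effects and their interaction for a balanced two-factor design. The factor names, the -1/+1 coding, and every response value are hypothetical and invented for illustration. Because the design is balanced and coded -1/+1, each least-squares coefficient reduces to the average of the coded column times the response; that shortcut does not hold for unbalanced data, where a general regression routine would be needed.

```python
# Hypothetical balanced 2x2 test, two replicates per condition.
# mode: -1 = mode A, +1 = mode B; env: -1 = day, +1 = night.
# Responses are made-up detection ranges in km.
runs = [
    (-1, -1, 10.0), (-1, -1, 11.0),
    (+1, -1, 14.0), (+1, -1, 15.0),
    (-1, +1, 7.0), (-1, +1, 8.0),
    (+1, +1, 9.0), (+1, +1, 10.0),
]
n = len(runs)


def coefficient(column):
    """Least-squares coefficient for a -1/+1 column in a balanced design."""
    return sum(c * y for c, (_, _, y) in zip(column, runs)) / n


coef = {
    "intercept": coefficient([1] * n),
    "mode": coefficient([m for m, _, _ in runs]),
    "env": coefficient([e for _, e, _ in runs]),
    "mode:env": coefficient([m * e for m, e, _ in runs]),
}


def predict(mode, env):
    """Predicted response at a coded (mode, env) condition."""
    return (coef["intercept"] + coef["mode"] * mode
            + coef["env"] * env + coef["mode:env"] * mode * env)


for mode in (-1, 1):
    for env in (-1, 1):
        print(f"mode={mode:+d} env={env:+d} predicted={predict(mode, env):.2f}")
```

The intercept equals the overall mean, which on its own would hide the fact that the predicted response differs at every condition.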
High-performance Systems Assessments

Implications for Data: Military systems are typically engineered to accomplish tasks faster, more accurately, and with fewer problems. Common performance measures such as timeliness and accuracy are often skewed because, for example, a system cannot complete a process in zero time or be more than 100% accurate. Thus, their distributions have "tails" to one side or the other.

Implications for Analysis: Many traditional parametric analyses assume normally distributed data and are invalid when distributions are too skewed. Instead, T&E analysts commonly transform data before analysis or use generalized linear models.

Common Statistical Techniques:
  • Generalized Linear Modeling
  • Data Transformation
Unavoidable Pass-Fail Criteria

Implications for Data: Pass/fail metrics clearly reflect the choices decision makers must face and are commonly used in testing. Interoperability metrics, for example, are often best characterized by whether or not systems can communicate with one another. When a continuous alternative measure is not possible or not desirable to decision makers, these metrics result in binary data. Traditional regression analyses assume continuous responses, which are more statistically powerful and informative than binary ones.

Implications for Analysis: T&E analysts often use logistic regression to make such data suitable for regression techniques. Logistic regression models the log-odds (a continuous quantity) of observing one category of the binary variable, allowing continuous and categorical variables to predict those odds.

Common Statistical Techniques:
  • Logistic Regression
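
To make the log-odds idea concrete, here is a small sketch that fits a one-predictor logistic regression by Newton's method using only the Python standard library. The predictor (`levels`, imagined as an interference level) and the pass/fail outcomes (`delivered`) are hypothetical, and a real analysis would normally rely on an established statistics package rather than a hand-rolled fit.

```python
import math

# Hypothetical data: interference level and whether a message was delivered (1) or not (0).
levels = [1, 2, 3, 4, 5, 6, 7, 8]
delivered = [1, 1, 1, 0, 1, 0, 0, 0]


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of log-odds = b0 + b1 * x via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(levels, delivered)


def predict_prob(x):
    """Predicted probability of a pass at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


print(f"log-odds = {b0:.3f} + {b1:.3f} * level")
print(f"odds ratio per unit of level: {math.exp(b1):.3f}")
print(f"P(pass) at level 2: {predict_prob(2):.3f}, at level 7: {predict_prob(7):.3f}")
```

The model is linear on the log-odds scale, so `math.exp(b1)` is the multiplicative change in the odds of a pass for each one-unit increase in the predictor.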
High Priority on Reliability

Implications for Data: Reliability is one of the primary aspects of a system’s operational suitability; military systems must be built to withstand extreme conditions and operate consistently without failure. Thus, it is critical that reliability is designed into a system in the earliest stages and assessed throughout all phases of testing.

Reliability analysts must choose how to convert down-time, up-time, and other relevant data into an index of reliability, and must also choose which distribution best models the data.

Additionally, when a system is highly reliable or test time is limited, failures may not be observed at all.

Implications for Analysis: Defense test evaluators often summarize reliability as Mean Time Between Failures (MTBF) and model the times between failures with Weibull or exponential distributions. They then compute reliability estimates and confidence intervals for those estimates using methods appropriate for these non-normal distributions.

Often data are limited; for example, only one failure occurred, or none at all. In these cases, analysts must make advanced analytical decisions about which parameters are appropriate to estimate and whether the data should be treated as censored. For example, when no failures were observed, a point estimate of MTBF cannot be calculated, and under an exponential model only a one-sided lower confidence bound is available.

When multiple estimates are computed, analysts can also model failure rates across system variants, modes of operation, and test phases in a generalized linear regression.

Common Statistical Techniques:
  • Reliability Estimators & Functions
  • Weibull & Exponential Confidence Interval Methods
  • Censored Data
  • Generalized Linear Modeling
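
One way to illustrate an exponential confidence interval method is the one-sided lower bound on MTBF for a test that ends at a fixed time. The sketch below assumes a constant failure rate, so that the failure count over the test is Poisson, and finds the bound by bisection with only the standard library. The test hours, failure counts, and 80% confidence level are hypothetical, and the function names are illustrative rather than taken from any particular tool.

```python
import math


def poisson_cdf(k, mu):
    """P(N <= k) for a Poisson count with mean mu."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def mtbf_lower_bound(total_hours, failures, confidence):
    """One-sided lower confidence bound on MTBF for a time-terminated test,
    assuming exponentially distributed times between failures."""
    alpha = 1.0 - confidence
    # Find the expected failure count mu at which seeing this few failures
    # has probability alpha; poisson_cdf decreases as mu grows.
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return total_hours / hi


# Hypothetical test: 500 operating hours, 80% confidence.
point_estimate = 500 / 2                       # MTBF point estimate with 2 failures
lower_two = mtbf_lower_bound(500, 2, 0.80)
lower_zero = mtbf_lower_bound(500, 0, 0.80)    # no point estimate exists here
print(f"2 failures: point estimate {point_estimate:.1f} h, lower bound {lower_two:.1f} h")
print(f"0 failures: lower bound {lower_zero:.1f} h")
```

With zero failures the bound has the closed form total hours divided by -ln(1 - confidence), which the bisection reproduces. A Weibull model would need a different method.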
Maintainability and Availability Assessments

Implications for Data: Military systems are maintained on a regular and rigorous schedule, but must also be available for operations without excessive down times. Thus, it is important to assess how accessible and easy to maintain a system's components are, which is often measured by how long maintenance activities and repairs take. Many activities can be accomplished quickly without a large impact on the availability of the system, while some can require thorough disassembly or be slowed by waiting for parts. These variable times often create skewed distributions.

Implications for Analysis: Time to completion is often skewed or lognormally distributed. That is, the distribution of times has a long positive tail because some tasks drag on, but most are completed within a shorter time frame. Analyses require transforming the data or using lognormal regression, a form of generalized linear model.

Common Statistical Techniques:
  • Lognormal Regression
  • Generalized Linear Modeling
  • Log & Exponential Transformations
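
The effect of the log transformation can be sketched with an intercept-only lognormal fit: take logs, estimate the mean and standard deviation on the log scale, then transform back. The repair times below are hypothetical. A lognormal regression would extend this by letting the log-scale mean depend on predictors such as task type.

```python
import math
import statistics

# Hypothetical repair times in hours; most are short, a few drag on.
repair_hours = [0.5, 0.7, 0.8, 1.0, 1.1, 1.3, 1.6, 2.0, 3.5, 9.0]

log_times = [math.log(t) for t in repair_hours]
mu = statistics.mean(log_times)        # mean on the log scale
sigma = statistics.stdev(log_times)    # standard deviation on the log scale

# Back-transform to the original time scale.
median_est = math.exp(mu)                              # lognormal median
mean_est = math.exp(mu + sigma ** 2 / 2)               # lognormal mean
z90 = statistics.NormalDist().inv_cdf(0.90)
p90_est = math.exp(mu + z90 * sigma)                   # 90th percentile

print(f"sample mean    : {statistics.mean(repair_hours):.2f} h")
print(f"sample median  : {statistics.median(repair_hours):.2f} h")
print(f"fitted median  : {median_est:.2f} h")
print(f"fitted mean    : {mean_est:.2f} h")
print(f"fitted 90th pct: {p90_est:.2f} h")
```

The fitted median sits well below the sample mean because the long right tail pulls the mean upward, which is why a single average repair time can misrepresent typical maintenance effort.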
Multiple Test Phases

Implications for Data: Systems are tested multiple times throughout developmental and operational testing. Often, earlier data are informative to later assessments. Techniques to combine data from multiple test phases are increasingly investigated and used.

Implications for Analysis: Traditional frequentist statistics are limited to making statements about data observed under the same conditions and belonging to the same distribution. There are often so many variables to control and log that comparable conditions across tests cannot be achieved.

A Bayesian approach, on the other hand, can update and improve the accuracy of existing estimates with new data, or integrate prior knowledge into new test results, without the same constraints.

For example, in reliability assessments, where hundreds of hours of operation are often required but not feasible, Bayesian methods are increasingly being investigated and applied to combine reliability observations across multiple phases of testing.

Common Statistical Techniques:
  • Bayesian Analysis
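
A simple illustration of combining reliability data across phases is the conjugate gamma update for a constant failure rate. The sketch assumes exponential times between failures and, more strongly, that both phases share the same failure rate; the prior and the failure counts are hypothetical. In practice analysts often down-weight earlier data or model phase differences instead of pooling directly.

```python
def update(shape, rate, failures, hours):
    """Gamma(shape, rate) prior on the failure rate plus Poisson failure data
    gives a Gamma(shape + failures, rate + hours) posterior."""
    return shape + failures, rate + hours


# Hypothetical weak prior, roughly "one failure in 100 hours".
prior = (1.0, 100.0)

# Hypothetical developmental test: 3 failures in 400 hours.
after_dt = update(*prior, failures=3, hours=400.0)

# Hypothetical operational test: 1 failure in 250 hours.
after_ot = update(*after_dt, failures=1, hours=250.0)

shape, rate = after_ot
rate_post_mean = shape / rate            # posterior mean failure rate (per hour)
mtbf_post_mean = rate / (shape - 1)      # posterior mean of MTBF, valid for shape > 1

print(f"posterior after DT: shape={after_dt[0]}, rate={after_dt[1]}")
print(f"posterior after OT: shape={shape}, rate={rate}")
print(f"posterior mean failure rate: {rate_post_mean:.5f} per hour")
print(f"posterior mean MTBF: {mtbf_post_mean:.1f} hours")
```

The posterior from the developmental phase serves as the prior for the operational phase, which is the updating step described above.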
Limited Test Resources

Implications for Data: Time, personnel and equipment availability, and other test resources are commonly stretched thin, leaving a constant desire for more data.

Implications for Analysis: Small samples are often poorly fit by assumed distributions. Some statistical tests are robust, that is, only slightly affected by such violations of assumptions, but only with larger sample sizes. Bootstrapping is a common method for generating estimates from non-standard distributions.

Bayesian methods are also useful if more data are available from another test phase.
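
As a rough sketch of bootstrapping with a small sample, the following computes a percentile bootstrap interval for the median. The data values, the 80% confidence level, the number of resamples, and the seed are all hypothetical choices. Percentile intervals can be unreliable for very small samples, so this is an illustration rather than a recommended procedure.

```python
import random
import statistics

# Hypothetical small sample, e.g., times to complete a task in minutes.
data = [12.1, 9.8, 15.3, 11.0, 30.2, 10.4, 13.7, 9.9, 18.5, 11.6]


def bootstrap_ci(values, stat, n_resamples=2000, confidence=0.80, seed=1):
    """Percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


ci_low, ci_high = bootstrap_ci(data, statistics.median)
print(f"sample median: {statistics.median(data):.2f}")
print(f"80% bootstrap interval: ({ci_low:.2f}, {ci_high:.2f})")
```

No distribution is assumed for the data; the spread of the resampled medians stands in for the sampling distribution.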
