This paper summarizes sensitivity test methods commonly employed in the Department of Defense. A comparison study shows that modern methods such as Neyer’s method and Three-Phase Optimal Design are improvements over historical methods.
|Thomas Johnson, Laura J. Freeman, Janice Hester, Jonathan Bell||Research Paper|
Bootstrapping is a powerful nonparametric tool for conducting statistical inference with many applications to data from operational testing. Bootstrapping is most useful when the population sampled from is unknown or complex or the sampling distribution of the desired statistic is difficult to derive. Careful use of bootstrapping can help address many challenges in analyzing operational test data.
|Matthew Avery||Technical Briefing|
The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and offer an alternative, multi-method approach that is more defensible, yields richer insights into how operators interact with weapon systems, and provides practical implications for identifying when the quality of human-system interactions warrants correction through either operator training or redesign.
|Dean Thomas, Heather Wojton, Chad Bieber, Daniel Porter||Research Paper|
Sequential analysis concerns statistical evaluation in situations in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends upon the information acquired throughout the course of the investigation. Expanding the use of sequential analysis has the potential to reduce both cost and test time (National Research Council, 1998). This paper summarizes the literature on sequential analysis and offers fundamental information for providing recommendations for its use in DoD test and evaluation.
|Rebecca Medlin, John Dennis, Keyla Pagán-Rivera, Leonard Wilkins, Heather Wojton||Research Paper|
There are many situations where the requirements of a standard experimental design do not fit the research requirements of the problem. Three such situations occur when the problem requires unusual resource restrictions, when there are constraints on the design region, and when a non-standard model is expected to be required to adequately explain the response.
|Douglas C. Montgomery, Bradley A. Jones, Rachel T. Johnson||Research Paper|
This paper emphasizes the use of fundamental statistical techniques – design of experiments, statistical modeling, and propagation of uncertainty – in the context of a combat scenario that depicts a ground vehicle being engaged by indirect artillery.
|Mark Couch, Thomas Johnson, John Haman, Heather Wojton, Benjamin Turner, David Higdon||Other|
As the Department of Defense acquires new systems with artificial intelligence (AI) and autonomous (AI&A) capabilities, the test and evaluation (T&E) community will need to adapt to the challenges that these novel technologies present. The goals listed in this AI Roadmap address the broad range of tasks that the T&E community will need to achieve in order to properly test, evaluate, verify, and validate AI-enabled and autonomous systems. The roadmap includes issues that are unique to AI and autonomous systems, as well as legacy T&E shortcomings that will be compounded by newer technologies.
|Brian Vickers, Daniel Porter, Rachel Haga, Heather Wojton||Technical Briefing|
One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. This feature can be particularly valuable in assessing the reliability of systems where testing is limited. At their most basic, Bayesian methods for reliability develop informative prior distributions using expert judgment or similar systems. Appropriate models allow the incorporation of many other sources of information, including historical data, information from similar systems, and computer models. We introduce the Bayesian approach to reliability using several examples and point to open problems and areas for future work.
|Alyson Wilson, Kassandra Fronczyk||Research Paper|
Binomial metrics like probability-to-detect or probability-to-hit typically do not provide the maximum information from testing. Continuous metrics such as time to detect provide more information but do not account for non-detects. Censored data analysis allows us to account for both pieces of information simultaneously.
|V. Bram Lillard||Technical Briefing|
This briefing defines human-machine teaming, describes new challenges in evaluating HMTs, and provides a framework for the categories of metrics that are important for the T&E of HMTs.
|Heather Wojton, Brian Vickers, Kristina Carter, David Sparrow, Leonard Wilkins, Caitlan Fealing||Technical Briefing|
This paper illustrates the construction of D-optimal second order designs for situations when the response is either binomial (pass/fail) or Poisson (count data).
|Rachel T. Johnson, Douglas C. Montgomery||Research Paper|
Space filling designs are a common choice of experimental design strategy for computer experiments. This paper compares space filling design types based on their theoretical prediction variance properties with respect to the Gaussian Process model. https://www.tandfonline.com/doi/abs/10.1080/08982112.2012.758284
|Rachel T. Johnson, Douglas C. Montgomery, Bradley Jones, Chris Gotwalt||Research Paper|
The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and broad-stroke reforms needed to get ready for the test challenges of the next century.
|Heather Wojton, Daniel Porter||Technical Briefing|
This paper presents the underlying tenets of design of experiments, as applied in the Department of Defense, focusing on factorial, fractional factorial and response surface design and analyses. The concepts of statistical modeling and sequential experimentation are also emphasized.
|Rachel T. Johnson, Douglas C. Montgomery, James R. Simpson||Research Paper|
Advances in computational power have allowed both greater fidelity and more extensive use of computer models. Numerous complex military systems have corresponding models that simulate their performance in the field. In response, the DoD needs defensible practices for validating these models. DOE and statistical analysis techniques are the foundational building blocks for validating the use of computer models and quantifying uncertainty in that validation. Recent developments in uncertainty quantification have the potential to benefit the DoD in using modeling and simulation to inform operational evaluations.
|Heather Wojton, Kelly Avery, Laura Freeman, Thomas Johnson||Other|
This paper illustrates the construction of Bayesian D-optimal designs for nonlinear models and compares the relative efficiency of standard designs to these designs for several models and prior distributions on the parameters.
|Rachel T. Johnson, Douglas C. Montgomery||Research Paper|
This paper describes holistic progress in answering the question of “How much testing is enough?” It covers areas in which the T&E community has made progress, areas in which progress remains elusive, and issues that have emerged since 1994 that provide additional challenges. The selected case studies used to highlight progress are especially interesting examples, rather than a comprehensive look at all programs since 1994.
|Rebecca Medlin, Matthew Avery, James Simpson, Heather Wojton||Research Paper|
In this paper we compare data from a fairly large legacy wind tunnel test campaign to smaller, statistically-motivated experimental design strategies. The comparison, using Monte Carlo sampling methodology, suggests a tremendous opportunity to reduce wind tunnel test efforts without losing test information.
|Raymond R. Hill, Derek A. Leggio, Shay R. Capehart, August G. Roesener||Research Paper|
This handbook focuses on methods for data-driven validation to supplement the vast existing literature for Verification, Validation, and Accreditation (VV&A) and the emerging references on uncertainty quantification (UQ). The goal of this handbook is to aid the test and evaluation (T&E) community in developing test strategies that support model validation (both external validation and parametric analysis) and statistical UQ.
|Heather Wojton, Kelly Avery, Laura J. Freeman, Samuel Parry, Gregory Whittier, Thomas Johnson, Andrew Flack||Handbook||handbook, statistics|
This tutorial provides an overview of experimental design for modeling and simulation. Pros and cons of each design methodology are discussed.
|Rachel Johnson Silvestrini||Technical Briefing|
This paper shows how Bayesian methods are ideal for assessing complex system reliability. Several examples illustrate the methodology.
|Kassandra Fronczyk, Laura J. Freeman||Research Paper|
Trust is a key determinant of whether people rely on automated systems in the military and the public. However, there is currently no standard for measuring trust in automated systems. In the present studies we propose a scale to measure trust in automated systems that is grounded in current research and theory on trust formation, which we refer to as the Trust in Automated Systems Test (TOAST). We evaluated both the reliability of the scale structure and criterion validity using independent, military-affiliated and civilian samples. In both studies we found that the TOAST exhibited a two-factor structure, measuring system understanding and performance (respectively), and that factor scores significantly predicted scores on theoretically related constructs demonstrating clear criterion validity. We discuss the implications of our findings for advancing the empirical literature and in improving interface design.
|Heather Wojton, Daniel Porter, Stephanie Lane, Chad Bieber, Poornima Madhavan||Research Paper|
This guide provides both a general explanation of power analysis and specific guidance to successfully interface with two software packages, JMP and Design Expert (DX).
|James Simpson, Thomas Johnson, Laura J. Freeman||Handbook|
This paper investigates regularization for continuously observed covariates that resemble step functions. Two approaches for regularizing these covariates are considered, including a thinning approach commonly used within the DoD to address autocorrelated time series data.
|Matthew Avery, Mark Orndorff, Timothy Robinson, Laura J. Freeman||Research Paper|
This document presents arguments and methods for using space-filling designs (SFDs) to plan modeling and simulation (M&S) data collection.
|Han Yi, Curtis Miller, Kelly Avery||Research Paper|
In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its intended environment with military users. Oftentimes new or complex analysis techniques are needed to support the goal of characterizing or predicting system performance across the operational space. Statistical design and analysis techniques are essential for rigorous evaluation of these models.
|Dean Thomas, Kelly Avery, Laura Freeman||Research Paper|
This paper describes the benefits of using parametric statistical models to combine information across multiple testing events. Both frequentist and Bayesian inference techniques are employed, and they are compared and contrasted to illustrate different statistical methods for combining information.
|Rebecca Dickinson, Laura J. Freeman, Bruce Simpson, Alyson Wilson||Research Paper|
This paper summarizes a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&V) of autonomous military systems.
|Heather Wojton, Daniel Porter, John Dennis||Research Paper|
All systems undergo operational testing before fielding or full-rate production. While contractor and developmental testing tend to be requirements-driven, operational testing focuses on mission success. The goal is to evaluate operational effectiveness and suitability in the context of a realistic environment with representative users. This brief will first provide an overview of operational testing and discuss example defense applications of, and key differences between, classical and space-filling designs. It will then present several challenges (and possible solutions) associated with implementing space-filling designs and associated analyses in the defense community.
|Rebecca Medlin, Kelly Avery, Curtis Miller||Technical Briefing|
In this document, we present part one of our two-part roadmap. We discuss the challenges and possible solutions to assessing system effectiveness.
|Daniel Porter, Michael McAnally, Chad Bieber, Heather Wojton, Rebecca Medlin||Handbook|
This briefing, developed for a presentation at the 2021 Quality and Productivity Research Conference, includes two case studies that highlight why statistical engineers are necessary for successful T&E. These case studies center on the important theme of improving methods to integrate testing and data collection across the full system life cycle – a large, unstructured, real-world problem. Integrated testing supports efficient test execution, potentially reducing cost.
|Rebecca Medlin, Keyla Pagán-Rivera, Monica Ahrens||Technical Briefing|