Title | Authors | Type | Tags |
---|---|---|---|
A Comparison of Ballistic Resistance Testing Techniques in the Department of Defense This paper summarizes sensitivity test methods commonly employed in the . . . This paper summarizes sensitivity test methods commonly employed in the Department of Defense. A comparison study shows that modern methods such as Neyer's method and Three-Phase Optimal Design are improvements over historical methods. | Thomas Johnson, Laura J. Freeman, Janice Hester, Jonathan Bell | Research Paper | |
A First Step into the Bootstrap World Bootstrapping is a powerful nonparametric tool for conducting statistical . . . Bootstrapping is a powerful nonparametric tool for conducting statistical inference with many applications to data from operational testing. Bootstrapping is most useful when the population sampled from is unknown or complex or the sampling . . . | Matthew Avery | Technical Briefing | |
A Multi-Method Approach to Evaluating Human-System Interactions during Operational Testing The purpose of this paper was to identify the shortcomings of a single-method . . . The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and offer an alternative, multi-method approach that is more defensible, yields richer . . . | Dean Thomas, Heather Wojton, Chad Bieber, Daniel Porter | Research Paper | |
A Review of Sequential Analysis Sequential analysis concerns statistical evaluation in situations in which the . . . Sequential analysis concerns statistical evaluation in situations in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends upon the information acquired throughout the . . . | Rebecca Medlin, John Dennis, Keyla Pagán-Rivera, Leonard Wilkins, Heather Wojton | Research Paper | |
A team-centric metric framework for testing and evaluation of human-machine teams We propose and present a parallelized metric framework for evaluating . . . We propose and present a parallelized metric framework for evaluating human-machine teams that draws upon current knowledge of human-systems interfacing and integration but is rooted in team-centric concepts. Humans and machines working . . . | Jay Wilkins, David A. Sparrow, Caitlan A. Fealing, Brian D. Vickers, Kristina A. Ferguson, Heather Wojton | Research Paper | |
Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes . . . Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E . . . | Brian Vickers | Technical Briefing | |
An Expository Paper on Optimal Design There are many situations where the requirements of a standard experimental . . . There are many situations where the requirements of a standard experimental design do not fit the research requirements of the problem. Three such situations occur when the problem requires unusual resource restrictions, when there are . . . | Douglas C. Montgomery, Bradley A. Jones, Rachel T. Johnson | Research Paper | |
An Uncertainty Analysis Case Study of Live Fire Modeling and Simulation This paper emphasizes the use of fundamental statistical techniques – design of . . . This paper emphasizes the use of fundamental statistical techniques – design of experiments, statistical modeling, and propagation of uncertainty – in the context of a combat scenario that depicts a ground vehicle being engaged by indirect . . . | Mark Couch, Thomas Johnson, John Haman, Heather Wojton, Benjamin Turner, David Higdon | Other | |
Artificial Intelligence & Autonomy Test & Evaluation Roadmap Goals As the Department of Defense acquires new systems with artificial intelligence . . . As the Department of Defense acquires new systems with artificial intelligence (AI) and autonomous (AI&A) capabilities, the test and evaluation (T&E) community will need to adapt to the challenges that these novel . . . | Brian Vickers, Daniel Porter, Rachel Haga, Heather Wojton | Technical Briefing | |
Bayesian Reliability: Combining Information One of the most powerful features of Bayesian analyses is the ability to combine . . . One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. This feature can be particularly valuable in assessing the reliability of systems . . . | Alyson Wilson, Kassandra Fronczyk | Research Paper | |
Censored Data Analysis Methods for Performance Data: A Tutorial Binomial metrics like probability-to-detect or probability-to-hit typically do . . . Binomial metrics like probability-to-detect or probability-to-hit typically do not provide the maximum information from testing. Continuous metrics such as time to detect provide more information but do not account for non-detects. . . . | V. Bram Lillard | Technical Briefing | |
Challenges and new methods for designing reliability experiments Engineers use reliability experiments to determine the factors that drive . . . Engineers use reliability experiments to determine the factors that drive product reliability, build robust products, and predict reliability under use conditions. This article uses recent testing of a Howitzer to illustrate the challenges . . . | Laura Freeman, Thomas Johnson, Rebecca Medlin | Research Paper | |
Characterizing Human-Machine Teaming Metrics for Test & Evaluation This briefing defines human-machine teaming, describes new challenges in . . . This briefing defines human-machine teaming, describes new challenges in evaluating HMTs, and provides a framework for the categories of metrics that are important for the T&E of HMTs. | Heather Wojton, Brian Vickers, Kristina Carter, David Sparrow, Leonard Wilkins, Caitlan Fealing | Technical Briefing | |
Choice of second-order response surface designs for logistic and Poisson regression models This paper illustrates the construction of D-optimal second order designs for . . . This paper illustrates the construction of D-optimal second order designs for situations when the response is either binomial (pass/fail) or Poisson (count data). | Rachel T. Johnson, Douglas C. Montgomery | Research Paper | |
Circular prediction regions for miss distance models under heteroskedasticity Circular prediction regions are used in ballistic testing to express the . . . Circular prediction regions are used in ballistic testing to express the uncertainty in shot accuracy. We compare two modeling approaches for estimating circular prediction regions for the miss distance of a ballistic projectile. The miss . . . | Thomas H. Johnson, John T. Haman, Heather Wojton, Laura Freeman | Research Paper | |
Comparing Computer Experiments for the Gaussian Process Model Using Integrated Prediction Variance Space filling designs are a common choice of experimental design strategy for . . . Space filling designs are a common choice of experimental design strategy for computer experiments. This paper compares space filling design types based on their theoretical prediction variance properties with respect to the Gaussian . . . | Rachel T. Johnson, Douglas C. Montgomery, Bradley Jones, Chris Gotwalt | Research Paper | |
Comparing Normal and Binary D-Optimal Designs by Statistical Power In many Department of Defense test and evaluation applications, binary response . . . In many Department of Defense test and evaluation applications, binary response variables are unavoidable. Many have considered D-optimal design of experiments for generalized linear models. However, little consideration has been given to . . . | Addison D Adams | Technical Briefing | |
Data Principles for Operational and Live-Fire Testing Many DOD systems undergo operational testing, which is a field test involving . . . Many DOD systems undergo operational testing, which is a field test involving realistic combat conditions. Data, analysis, and reporting are the fundamental outcomes of operational test, which support leadership decisions. The importance of . . . | John Haman | Technical Briefing | |
Demystifying the Black Box: A Test Strategy for Autonomy The purpose of this briefing is to provide a high-level overview of how to frame . . . The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and . . . | Heather Wojton, Daniel Porter | Technical Briefing | |
Designed Experiments for the Defense Community This paper presents the underlying tenets of design of experiments, as applied . . . This paper presents the underlying tenets of design of experiments, as applied in the Department of Defense, focusing on factorial, fractional factorial and response surface design and analyses. The concepts of statistical modeling and . . . | Rachel T. Johnson, Douglas C. Montgomery, James R. Simpson | Research Paper | |
Designing Experiments for Model Validation Advances in computational power have allowed both greater fidelity and more . . . Advances in computational power have allowed both greater fidelity and more extensive use of such models. Numerous complex military systems have corresponding models that simulate their performance in the field. In response, the DoD needs . . . | Heather Wojton, Kelly Avery, Laura Freeman, Thomas Johnson | Other | |
Designing experiments for nonlinear models—an introduction This paper illustrates the construction of Bayesian D-optimal designs for . . . This paper illustrates the construction of Bayesian D-optimal designs for nonlinear models and compares the relative efficiency of standard designs to these designs for several models and prior distributions on the parameters. | Rachel T. Johnson, Douglas C. Montgomery | Research Paper | |
This paper describes holistic progress in answering the question of “How much . . . This paper describes holistic progress in answering the question of “How much testing is enough?” It covers areas in which the T&E community has made progress, areas in which progress remains elusive, and issues that have emerged since . . . | Rebecca Medlin, Matthew Avery, James Simpson, Heather Wojton | Research Paper | |
Developing AI Trust: From Theory to Testing and the Myths in Between This introductory work aims to provide members of the Test and Evaluation . . . This introductory work aims to provide members of the Test and Evaluation community with a clear understanding of trust and trustworthiness to support responsible and effective evaluation of AI systems. The paper provides a set of working . . . | Yosef Razin, Kristen Alexander, John Haman | Research Paper | Trust; AI |
This work describes the development of a statistical test created in support of . . . This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&A) efforts for modeling and simulation (M&S) environments. The test computes a Wald-type . . . | Carrington A. Metts, Curtis Miller | ||
Examining Improved Experimental Designs for Wind Tunnel Testing Using Monte Carlo Sampling Methods In this paper we compare data from a fairly large legacy wind tunnel test . . . In this paper we compare data from a fairly large legacy wind tunnel test campaign to smaller, statistically-motivated experimental design strategies. The comparison, using Monte Carlo sampling methodology, suggests a tremendous opportunity . . . | Raymond R. Hill, Derek A. Leggio, Shay R. Capehart, August G. Roesener | Research Paper | |
Handbook on Statistical Design & Analysis Techniques for Modeling & Simulation Validation This handbook focuses on methods for data-driven validation to supplement the . . . This handbook focuses on methods for data-driven validation to supplement the vast existing literature for Verification, Validation, and Accreditation (VV&A) and the emerging references on uncertainty quantification (UQ). The goal of . . . | Heather Wojton, Kelly Avery, Laura J. Freeman, Samuel Parry, Gregory Whittier, Thomas Johnson, Andrew Flack | Handbook | handbook, statistics |
This tutorial provides an overview of experimental design for modeling and . . . This tutorial provides an overview of experimental design for modeling and simulation. Pros and cons of each design methodology are discussed. | Rachel Johnson Silvestrini | Technical Briefing | |
Implementing Fast Flexible Space-Filling Designs in R Modeling and simulation (M&S) can be a useful tool when testers and . . . Modeling and simulation (M&S) can be a useful tool when testers and evaluators need to augment the data collected during a test event. When planning M&S, testers use experimental design techniques to determine how much and which . . . | Christopher Dimapasok | Technical Briefing | |
Improving Operational Test Efficiency: Sequential Methods in Operational Testing The Department of Defense develops and acquires some of the world's most . . . The Department of Defense develops and acquires some of the world's most advanced, sophisticated, and expensive systems. As new technologies emerge and are incorporated into systems, Director, Operational Test and Evaluation faces the . . . | Keyla Pagan-Rivera | Technical Briefing | |
Improving Reliability Estimates with Bayesian Statistics This paper shows how Bayesian methods are ideal for the assessment of complex . . . This paper shows how Bayesian methods are ideal for the assessment of complex system reliability. Several examples illustrate the methodology. | Kassandra Fronczyk, Laura J. Freeman | Research Paper | |
Improving Test Efficiency: A Bayesian Assurance Case Study To improve test planning for evaluating system reliability, we propose the use . . . To improve test planning for evaluating system reliability, we propose the use of Bayesian methods to incorporate supplementary data and reduce testing duration. Furthermore, we recommend Bayesian methods be employed in the analysis phase . . . | Rebecca M Medlin | Technical Briefing | |
Informing the Warfighter—Why Statistical Methods Matter in Defense Testing https://chance.amstat.org/2018/04/informing-the-warfighter/ | Laura J. Freeman and Catherine Warner | Research Paper | |
Initial Validation of the Trust of Automated Systems Test (TOAST) Trust is a key determinant of whether people rely on automated systems in the . . . Trust is a key determinant of whether people rely on automated systems in the military and the public. However, there is currently no standard for measuring trust in automated systems. In the present studies we propose a scale to measure . . . | Heather Wojton, Daniel Porter, Stephanie Lane, Chad Bieber, Poornima Madhavan | Research Paper | |
ciTools is an R package for working with model uncertainty. It gives users . . . ciTools is an R package for working with model uncertainty. It gives users access to confidence and prediction intervals for the fitted values of (log-) linear models, generalized linear models, and (log-) linear mixed models. Additionally, . . . | John Haman, Matthew Avery, Laura Freeman | Technical Briefing | |
Introduction to Design of Experiments This training provides details regarding the use of design of experiments, from . . . This training provides details regarding the use of design of experiments, from choosing proper response variables, to identifying factors that could affect such responses, to determining the amount of data necessary to collect. The . . . | Breeana Anderson, Rebecca Medlin, John T. Haman, Kelly M. Avery, Keyla Pagan-Rivera | Technical Briefing | |
Introduction to Measuring Situational Awareness in Mission-Based Testing Scenarios In FY23, OED’s Test Science group conducted research into situational awareness . . . In FY23, OED’s Test Science group conducted research into situational awareness (SA) measurement for operational testing (OT). Following our presentation at the 2023 DATAWorks conference, a representative from the Army Evaluation Command . . . | Elizabeth Green, John Haman | Technical Briefing | |
Managing T&E Data to encourage reuse Reusing Test and Evaluation (T&E) datasets multiple times at different . . . Reusing Test and Evaluation (T&E) datasets multiple times at different points throughout a program’s lifecycle is one way to realize their full value. Data management plays an important role in enabling this practice. Reuse of T&E . . . | Andrew Flack | Research Paper | Data Management |
Metamodeling Techniques for Verification and Validation of Modeling and Simulation Data Modeling and simulation (M&S) outputs help the Director, Operational Test . . . Modeling and simulation (M&S) outputs help the Director, Operational Test and Evaluation (DOT&E), assess the effectiveness, survivability, lethality, and suitability of systems. To use M&S outputs, DOT&E needs models and . . . | John T. Haman, Curtis G. Miller | Research Paper | |
On scoping a test that addresses the wrong objective Statistical literature refers to a type of error that is committed by giving the . . . Statistical literature refers to a type of error that is committed by giving the right answer to the wrong question. If a test design is adequately scoped to address an irrelevant objective, one could say that a Type III error occurs. In . . . | Thomas Johnson, Rebecca Medlin, Laura Freeman, James Simpson | Research Paper | |
Power Analysis Tutorial for Experimental Design Software This guide provides both a general explanation of power analysis and specific . . . This guide provides both a general explanation of power analysis and specific guidance to successfully interface with two software packages, JMP and Design Expert (DX). | James Simpson, Thomas Johnson, Laura J. Freeman | Handbook | |
This paper investigates regularization for continuously observed covariates that . . . This paper investigates regularization for continuously observed covariates that resemble step functions. Two approaches for regularizing these covariates are considered, including a thinning approach commonly used within the DoD to address . . . | Matthew Avery, Mark Orndorff, Timothy Robinson, Laura J. Freeman | Research Paper | |
Scientific Measurement of Situation Awareness in Operational Testing Situation Awareness (SA) plays a key role in decision making and human . . . Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. While maintaining or improving “situational awareness” . . . | Elizabeth A. Green, Miriam E. Armstrong, Janna Mantua | Research Paper | |
Space-Filling Designs for Modeling & Simulation This document presents arguments and methods for using space-filling designs . . . This document presents arguments and methods for using space-filling designs (SFDs) to plan modeling and simulation (M&S) data collection. | Han Yi, Curtis Miller, Kelly Avery | Research Paper | |
The U.S. Department of Defense uses modeling and simulation (M&S) for test . . . The U.S. Department of Defense uses modeling and simulation (M&S) for test and evaluation of systems acquired by the Services. The Director, Operational Test and Evaluation (DOT&E), who provides oversight of operational testing, . . . | Curtis G Miller | Technical Briefing | |
Statistical Methods Development Work for M&S Validation Modeling and simulation (M&S) environments feature frequently in test and . . . Modeling and simulation (M&S) environments feature frequently in test and evaluation (T&E) of Department of Defense (DoD) systems. Many M&S environments do not suffer many of the resourcing limitations associated with live test. . . . | Curtis Miller | Technical Briefing | |
Statistical Methods for Defense Testing In the increasingly complex and data‐limited world of military defense testing, . . . In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its . . . | Dean Thomas, Kelly Avery, Laura Freeman | Research Paper | |
Statistical Models for Combining Information Stryker Reliability Case Study This paper describes the benefits of using parametric statistical models to . . . This paper describes the benefits of using parametric statistical models to combine information across multiple testing events. Both frequentist and Bayesian inference techniques are employed, and they are compared and contrasted to . . . | Rebecca Dickinson, Laura J. Freeman, Bruce Simpson, Alyson Wilson | Research Paper | |
Test & Evaluation of AI-Enabled and Autonomous Systems: A Literature Review This paper summarizes a subset of the literature regarding the challenges to and . . . This paper summarizes a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&V) of autonomous military systems. | Heather Wojton, Daniel Porter, John Dennis | Research Paper | |
Test Design Challenges in Defense Testing All systems undergo operational testing before fielding or full-rate production. . . . All systems undergo operational testing before fielding or full-rate production. While contractor and developmental testing tends to be requirements-driven, operational testing focuses on mission success. The goal is to evaluate operational . . . | Rebecca Medlin, Kelly Avery, Curtis Miller | Technical Briefing | |
We present a simulation study that examines the impact of small sample sizes in . . . We present a simulation study that examines the impact of small sample sizes in both observation and nesting levels of the model on the fixed effect bias, type I error, and the power of a simple mixed model analysis. Despite the need for . . . | Kristina A. Carter, Heather M. Wojton, Stephanie T. Lane | Research Paper | |
The Purpose of Mixed-effects Models in Test and Evaluation Mixed-effects models are the standard technique for analyzing data that exhibit . . . Mixed-effects models are the standard technique for analyzing data that exhibit some grouping structure. In defense testing, these models are useful because they allow us to account for correlations between observations, a feature common in . . . | John Haman, Matthew Avery, Heather Wojton | Research Paper | Mixed models |
Trustworthy Autonomy: A Roadmap to Assurance -- Part 1: System Effectiveness In this document, we present part one of our two-part roadmap. We discuss the . . . In this document, we present part one of our two-part roadmap. We discuss the challenges and possible solutions to assessing system effectiveness. | Daniel Porter, Michael McAnally, Chad Bieber, Heather Wojton, Rebecca Medlin | Handbook | |
Why are Statistical Engineers needed for Test & Evaluation? This briefing, developed for a presentation at the 2021 Quality and Productivity . . . This briefing, developed for a presentation at the 2021 Quality and Productivity Research Conference, includes two case studies that highlight why statistical engineers are necessary for successful T&E. These case studies center on the . . . | Rebecca Medlin, Keyla Pagán-Rivera, Monica Ahrens | Technical Briefing | |
2021-03-18