Research Document Library

Title	Authors	Type	Tags
A Comparison of Ballistic Resistance Testing Techniques in the Department of Defense This paper summarizes sensitivity test methods commonly employed in the Department of Defense. A comparison study shows that modern methods such as Neyer's method and Three-Phase Optimal Design are improvements over historical methods.	Thomas Johnson, Laura J. Freeman, Janice Hester, Jonathan Bell	Research Paper
A First Step into the Bootstrap World Bootstrapping is a powerful nonparametric tool for conducting statistical inference with many applications to data from operational testing. Bootstrapping is most useful when the population sampled from is unknown or complex or the sampling . . .	Matthew Avery	Technical Briefing
A Multi-Method Approach to Evaluating Human-System Interactions during Operational Testing The purpose of this paper was to identify the shortcomings of a single-method approach to evaluating human-system interactions during operational testing and offer an alternative, multi-method approach that is more defensible, yields richer . . .	Dean Thomas, Heather Wojton, Chad Bieber, Daniel Porter	Research Paper
A Review of Sequential Analysis Sequential analysis concerns statistical evaluation in situations in which the number, pattern, or composition of the data is not determined at the start of the investigation, but instead depends upon the information acquired throughout the . . .	Rebecca Medlin, John Dennis, Keyla Pagán-Rivera, Leonard Wilkins, Heather Wojton	Research Paper
A team-centric metric framework for testing and evaluation of human-machine teams We propose and present a parallelized metric framework for evaluating human-machine teams that draws upon current knowledge of human-systems interfacing and integration but is rooted in team-centric concepts. Humans and machines working . . .	Jay Wilkins, David A. Sparrow, Caitlan A. Fealing, Brian D. Vickers, Kristina A. Ferguson, Heather Wojton	Research Paper
AI + Autonomy T&E in DoD Test and evaluation (T&E) of AI-enabled systems (AIES) often emphasizes algorithm accuracy over robust, holistic system performance. While this narrow focus may be adequate for some applications of AI, for many complex uses, T&E . . .	Brian Vickers	Technical Briefing
An Expository Paper on Optimal Design There are many situations where the requirements of a standard experimental design do not fit the research requirements of the problem. Three such situations occur when the problem requires unusual resource restrictions, when there are . . .	Douglas C. Montgomery, Bradley A. Jones, Rachel T. Johnson	Research Paper
An Uncertainty Analysis Case Study of Live Fire Modeling and Simulation This paper emphasizes the use of fundamental statistical techniques – design of experiments, statistical modeling, and propagation of uncertainty – in the context of a combat scenario that depicts a ground vehicle being engaged by indirect . . . artillery.	Mark Couch, Thomas Johnson, John Haman, Heather Wojton, Benjamin Turner, David Higdon	Other
Artificial Intelligence & Autonomy Test & Evaluation Roadmap Goals As the Department of Defense acquires new systems with artificial intelligence (AI) and autonomous (AI&A) capabilities, the test and evaluation (T&E) community will need to adapt to the challenges that these novel . . .	Brian Vickers, Daniel Porter, Rachel Haga, Heather Wojton	Technical Briefing
Bayesian Reliability: Combining Information One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. This feature can be particularly valuable in assessing the reliability of systems . . .	Alyson Wilson, Kassandra Fronczyk	Research Paper
Censored Data Analysis Methods for Performance Data: A Tutorial Binomial metrics like probability-to-detect or probability-to-hit typically do not provide the maximum information from testing. Continuous metrics such as time to detect provide more information, but do not account for non-detects. . . .	V. Bram Lillard	Technical Briefing
Challenges and new methods for designing reliability experiments Engineers use reliability experiments to determine the factors that drive product reliability, build robust products, and predict reliability under use conditions. This article uses recent testing of a Howitzer to illustrate the challenges . . .	Laura Freeman, Thomas Johnson, Rebecca Medlin	Research Paper
Characterizing Human-Machine Teaming Metrics for Test & Evaluation This briefing defines human-machine teaming, describes new challenges in evaluating HMTs, and provides a framework for the categories of metrics that are important for the T&E of HMTs.	Heather Wojton, Brian Vickers, Kristina Carter, David Sparrow, Leonard Wilkins, Caitlan Fealing	Technical Briefing
Choice of second-order response surface designs for logistic and Poisson regression models This paper illustrates the construction of D-optimal second order designs for situations when the response is either binomial (pass/fail) or Poisson (count data).	Rachel T. Johnson, Douglas C. Montgomery	Research Paper
Circular prediction regions for miss distance models under heteroskedasticity Circular prediction regions are used in ballistic testing to express the uncertainty in shot accuracy. We compare two modeling approaches for estimating circular prediction regions for the miss distance of a ballistic projectile. The miss . . .	Thomas H. Johnson, John T. Haman, Heather Wojton, Laura Freeman	Research Paper
Comparing Computer Experiments for the Gaussian Process Model Using Integrated Prediction Variance Space filling designs are a common choice of experimental design strategy for computer experiments. This paper compares space filling design types based on their theoretical prediction variance properties with respect to the Gaussian . . .	Rachel T. Johnson, Douglas C. Montgomery, Bradley Jones, Chris Gotwalt	Research Paper
Comparing Normal and Binary D-Optimal Designs by Statistical Power In many Department of Defense test and evaluation applications, binary response variables are unavoidable. Many have considered D-optimal design of experiments for generalized linear models. However, little consideration has been given to . . .	Addison D. Adams	Technical Briefing
Data Principles for Operational and Live-Fire Testing Many DOD systems undergo operational testing, which is a field test involving realistic combat conditions. Data, analysis, and reporting are the fundamental outcomes of operational test, which support leadership decisions. The importance of . . .	John Haman	Technical Briefing
Demystifying the Black Box: A Test Strategy for Autonomy The purpose of this briefing is to provide a high-level overview of how to frame the question of testing autonomous systems in a way that will enable development of successful test strategies. The brief outlines the challenges and . . .	Heather Wojton, Daniel Porter	Technical Briefing
Designed Experiments for the Defense Community This paper presents the underlying tenets of design of experiments, as applied in the Department of Defense, focusing on factorial, fractional factorial and response surface design and analyses. The concepts of statistical modeling and . . .	Rachel T. Johnson, Douglas C. Montgomery, James R. Simpson	Research Paper
Designing Experiments for Model Validation Advances in computational power have allowed both greater fidelity and more extensive use of such models. Numerous complex military systems have corresponding models that simulate their performance in the field. In response, the DoD needs . . .	Heather Wojton, Kelly Avery, Laura Freeman, Thomas Johnson	Other
Designing experiments for nonlinear models—an introduction This paper illustrates the construction of Bayesian D-optimal designs for nonlinear models and compares the relative efficiency of standard designs to these designs for several models and prior distributions on the parameters.	Rachel T. Johnson, Douglas C. Montgomery	Research Paper
Determining How Much Testing is Enough: An Exploration of Progress in the Department of Defense Test and Evaluation Community This paper describes holistic progress in answering the question of “How much testing is enough?” It covers areas in which the T&E community has made progress, areas in which progress remains elusive, and issues that have emerged since . . .	Rebecca Medlin, Matthew Avery, James Simpson, Heather Wojton	Research Paper
Developing AI Trust: From Theory to Testing and the Myths in Between This introductory work aims to provide members of the Test and Evaluation community with a clear understanding of trust and trustworthiness to support responsible and effective evaluation of AI systems. The paper provides a set of working . . .	Yosef Razin, Kristen Alexander, John Haman	Research Paper	Trust; AI
Development of Wald-Type and Score-Type Statistical Tests to Compare Live Test Data and Simulation Predictions This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&A) efforts for modeling and simulation (M&S) environments. The test computes a Wald-type . . .	Carrington A. Metts, Curtis Miller
Examining Improved Experimental Designs for Wind Tunnel Testing Using Monte Carlo Sampling Methods In this paper we compare data from a fairly large legacy wind tunnel test campaign to smaller, statistically-motivated experimental design strategies. The comparison, using Monte Carlo sampling methodology, suggests a tremendous opportunity . . .	Raymond R. Hill, Derek A. Leggio, Shay R. Capehart, August G. Roesener	Research Paper
Handbook on Statistical Design & Analysis Techniques for Modeling & Simulation Validation This handbook focuses on methods for data-driven validation to supplement the vast existing literature for Verification, Validation, and Accreditation (VV&A) and the emerging references on uncertainty quantification (UQ). The goal of . . .	Heather Wojton, Kelly Avery, Laura J. Freeman, Samuel Parry, Gregory Whittier, Thomas Johnson, Andrew Flack	Handbook	handbook, statistics
Hybrid Designs: Space Filling and Optimal Experimental Designs for Use in Studying Computer Simulation Models This tutorial provides an overview of experimental design for modeling and simulation. Pros and cons of each design methodology are discussed.	Rachel Johnson Silvestrini	Technical Briefing
Implementing Fast Flexible Space-Filling Designs in R Modeling and simulation (M&S) can be a useful tool when testers and evaluators need to augment the data collected during a test event. When planning M&S, testers use experimental design techniques to determine how much and which . . .	Christopher Dimapasok	Technical Briefing
Improving Operational Test Efficiency: Sequential Methods in Operational Testing The Department of Defense develops and acquires some of the world's most advanced, sophisticated, and expensive systems. As new technologies emerge and are incorporated into systems, Director, Operational Test and Evaluation faces the . . .	Keyla Pagán-Rivera	Technical Briefing
Improving Reliability Estimates with Bayesian Statistics This paper shows how Bayesian methods are ideal for the assessment of complex system reliability. Several examples illustrate the methodology.	Kassandra Fronczyk, Laura J. Freeman	Research Paper
Improving Test Efficiency: A Bayesian Assurance Case Study To improve test planning for evaluating system reliability, we propose the use of Bayesian methods to incorporate supplementary data and reduce testing duration. Furthermore, we recommend Bayesian methods be employed in the analysis phase . . .	Rebecca M. Medlin	Technical Briefing
Informing the Warfighter—Why Statistical Methods Matter in Defense Testing https://chance.amstat.org/2018/04/informing-the-warfighter/	Laura J. Freeman, Catherine Warner	Research Paper
Initial Validation of the Trust of Automated Systems Test (TOAST) Trust is a key determinant of whether people rely on automated systems in the military and the public. However, there is currently no standard for measuring trust in automated systems. In the present studies we propose a scale to measure . . .	Heather Wojton, Daniel Porter, Stephanie Lane, Chad Bieber, Poornima Madhavan	Research Paper
Introduction to ciTools ciTools is an R package for working with model uncertainty. It gives users access to confidence and prediction intervals for the fitted values of (log-) linear models, generalized linear models, and (log-) linear mixed models. Additionally, . . .	John Haman, Matthew Avery, Laura Freeman	Technical Briefing
Introduction to Design of Experiments This training provides details regarding the use of design of experiments, from choosing proper response variables, to identifying factors that could affect such responses, to determining the amount of data necessary to collect. The . . .	Breeana Anderson, Rebecca Medlin, John T. Haman, Kelly M. Avery, Keyla Pagán-Rivera	Technical Briefing
Introduction to Measuring Situational Awareness in Mission-Based Testing Scenarios In FY23, OED’s Test Science group conducted research into situational awareness (SA) measurement for operational testing (OT). Following our presentation at the 2023 DATAWorks conference, a representative from the Army Evaluation Command . . .	Elizabeth Green, John Haman	Technical Briefing
Managing T&E Data to encourage reuse Reusing Test and Evaluation (T&E) datasets multiple times at different points throughout a program’s lifecycle is one way to realize their full value. Data management plays an important role in enabling this practice. Reuse of T&E . . .	Andrew Flack	Research Paper	Data Management
Metamodeling Techniques for Verification and Validation of Modeling and Simulation Data Modeling and simulation (M&S) outputs help the Director, Operational Test and Evaluation (DOT&E), assess the effectiveness, survivability, lethality, and suitability of systems. To use M&S outputs, DOT&E needs models and . . .	John T. Haman, Curtis G. Miller	Research Paper
On scoping a test that addresses the wrong objective Statistical literature refers to a type of error that is committed by giving the right answer to the wrong question. If a test design is adequately scoped to address an irrelevant objective, one could say that a Type III error occurs. In . . .	Thomas Johnson, Rebecca Medlin, Laura Freeman, James Simpson	Research Paper
Power Analysis Tutorial for Experimental Design Software This guide provides both a general explanation of power analysis and specific guidance to successfully interface with two software packages, JMP and Design Expert (DX).	James Simpson, Thomas Johnson, Laura J. Freeman	Handbook
Regularization for Continuously Observed Ordinal Response Variables with Piecewise-Constant Functional Predictors This paper investigates regularization for continuously observed covariates that resemble step functions. Two approaches for regularizing these covariates are considered, including a thinning approach commonly used within the DoD to address . . . autocorrelated time series data.	Matthew Avery, Mark Orndorff, Timothy Robinson, Laura J. Freeman	Research Paper
Scientific Measurement of Situation Awareness in Operational Testing Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. While maintaining or improving “situational awareness” . . .	Elizabeth A. Green, Miriam E. Armstrong, Janna Mantua	Research Paper
Space-Filling Designs for Modeling & Simulation This document presents arguments and methods for using space-filling designs (SFDs) to plan modeling and simulation (M&S) data collection.	Han Yi, Curtis Miller, Kelly Avery	Research Paper
Space-filling experimental design and surrogate models for U.S. Department of Defense modeling and simulation evaluation The U.S. Department of Defense uses modeling and simulation (M&S) for test and evaluation of systems acquired by the Services. The Director, Operational Test and Evaluation (DOT&E), who provides oversight of operational testing, . . .	Curtis G. Miller	Technical Briefing
Statistical Methods Development Work for M&S Validation Modeling and simulation (M&S) environments feature frequently in test and evaluation (T&E) of Department of Defense (DoD) systems. Many M&S environments do not suffer many of the resourcing limitations associated with live test. . . .	Curtis Miller	Technical Briefing
Statistical Methods for Defense Testing In the increasingly complex and data‐limited world of military defense testing, statisticians play a valuable role in many applications. Before the DoD acquires any major new capability, that system must undergo realistic testing in its . . .	Dean Thomas, Kelly Avery, Laura Freeman	Research Paper
Statistical Models for Combining Information Stryker Reliability Case Study This paper describes the benefits of using parametric statistical models to combine information across multiple testing events. Both frequentist and Bayesian inference techniques are employed, and they are compared and contrasted to . . .	Rebecca Dickinson, Laura J. Freeman, Bruce Simpson, Alyson Wilson	Research Paper
Test & Evaluation of AI-Enabled and Autonomous Systems: A Literature Review This paper summarizes a subset of the literature regarding the challenges to and recommendations for the test, evaluation, verification, and validation (TEV&V) of autonomous military systems.	Heather Wojton, Daniel Porter, John Dennis	Research Paper
Test Design Challenges in Defense Testing All systems undergo operational testing before fielding or full-rate production. While contractor and developmental testing tends to be requirements-driven, operational testing focuses on mission success. The goal is to evaluate operational . . .	Rebecca Medlin, Kelly Avery, Curtis Miller	Technical Briefing
The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size We present a simulation study that examines the impact of small sample sizes in both observation and nesting levels of the model on the fixed effect bias, type I error, and the power of a simple mixed model analysis. Despite the need for . . .	Kristina A. Carter, Heather M. Wojton, Stephanie T. Lane	Research Paper
The Purpose of Mixed-effects Models in Test and Evaluation Mixed-effects models are the standard technique for analyzing data that exhibit some grouping structure. In defense testing, these models are useful because they allow us to account for correlations between observations, a feature common in . . .	John Haman, Matthew Avery, Heather Wojton	Research Paper	Mixed models
Trustworthy Autonomy: A Roadmap to Assurance -- Part 1: System Effectiveness In this document, we present part one of our two-part roadmap. We discuss the challenges and possible solutions to assessing system effectiveness.	Daniel Porter, Michael McAnally, Chad Bieber, Heather Wojton, Rebecca Medlin	Handbook
Why are Statistical Engineers needed for Test & Evaluation? This briefing, developed for a presentation at the 2021 Quality and Productivity Research Conference, includes two case studies that highlight why statistical engineers are necessary for successful T&E. These case studies center on the . . .	Rebecca Medlin, Keyla Pagán-Rivera, Monica Ahrens	Technical Briefing
