Session Title | Speaker | Type | Materials | Year |
---|---|---|---|---|
Breakout Augmenting Definitive Screening Designs (Abstract)
Jones and Nachtsheim (2011) introduced a class of three-level screening designs called definitive screening designs (DSDs). The structure of these designs results in the statistical independence of main effects and two-factor interactions; the absence of complete confounding among two-factor interactions; and the ability to estimate all quadratic effects. Because quadratic effects can be estimated, DSDs can allow for the screening and optimization of a system to be performed in one step, but only when the number of terms found to be active during the screening phase of analysis is less than about half the number of runs required by the DSD (Errore et al., 2016). Otherwise, estimation of second-order models requires augmentation of the DSD. In this paper we explore the construction of a series of augmented designs, moving from the starting DSD to designs capable of estimating the full second-order model. We use power calculations, model-robustness criteria, and model-discrimination criteria to determine the number of runs by which to augment in order to identify the active second-order effects with high probability. |
Abby Nachtsheim ASU |
Breakout | Materials | 2017 |
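As a rough illustration of the augmentation idea in the abstract above, the base-R sketch below greedily adds candidate runs from a three-level grid to a small starting design until the full second-order model is well supported, using a ridge-stabilized D-criterion. The starting design, factor count, and number of added runs are all hypothetical, and this is not the authors' construction or their power, model-robustness, and model-discrimination criteria.

```r
# Hedged sketch: greedy D-style augmentation of a small three-level design
# toward estimability of the full second-order model. Illustrative only.

set.seed(1)
k <- 4  # number of factors (hypothetical)

# Model matrix for the full quadratic model in k factors
quad_model_matrix <- function(X) {
  df <- as.data.frame(X)
  names(df) <- paste0("x", seq_len(ncol(X)))
  form <- as.formula(paste(
    "~ (", paste(names(df), collapse = " + "), ")^2 +",
    paste0("I(", names(df), "^2)", collapse = " + ")))
  model.matrix(form, df)
}

# Stand-in "starting design": 9 runs from a three-level grid (placeholder for a DSD)
start <- as.matrix(expand.grid(rep(list(c(-1, 0, 1)), k)))[sample(3^k, 9), ]

# Candidate runs: the full three-level grid
cand <- as.matrix(expand.grid(rep(list(c(-1, 0, 1)), k)))

# Ridge-stabilized log-determinant of the information matrix
log_det <- function(X) {
  M <- crossprod(quad_model_matrix(X))
  M <- M + diag(1e-6, ncol(M))
  as.numeric(determinant(M, logarithm = TRUE)$modulus)
}

# Greedily add the run that most increases the criterion
design <- start
for (i in 1:10) {
  scores <- apply(cand, 1, function(r) log_det(rbind(design, r)))
  design <- rbind(design, cand[which.max(scores), ])
}
cat("Final runs:", nrow(design), " log|X'X| =", round(log_det(design), 2), "\n")
```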
Breakout Automated Software Testing Best Practices and Framework: A STAT COE Project (Abstract)
The process for testing military systems which are largely software intensive involves techniques and procedures often different from those for hardware-based systems. Much of the testing can be performed in laboratories at many of the acquisition stages, up to operational testing. Testing software systems is not different from testing hardware-based systems in that testing earlier and more intensively benefits the acquisition program in the long run. Automated testing of software systems enables more frequent and more extensive testing, allowing for earlier discovery of errors and faults in the code. Automated testing is beneficial for unit, integrated, functional and performance testing, but there are costs associated with automation tool license fees, specialized manpower, and the time to prepare and maintain the automation scripts. This presentation discusses some of the features unique to automated software testing and offers a framework organizations can implement to make the business case for, to organize for, and to execute and benefit from automating the right aspects of their testing needs. Automation has many benefits in saving time and money, but is most valuable in freeing test resources to perform higher value tasks. |
Jim Simpson JK Analytics |
Breakout | Materials | 2017 |
Breakout Automated Software Testing Best Practices and Framework: A STAT COE Project (Abstract)
The process for testing military systems which are largely software intensive involves techniques and procedures often different from those for hardware-based systems. Much of the testing can be performed in laboratories at many of the acquisition stages, up to operational testing. Testing software systems is not different from testing hardware-based systems in that testing earlier and more intensively benefits the acquisition program in the long run. Automated testing of software systems enables more frequent and more extensive testing, allowing for earlier discovery of errors and faults in the code. Automated testing is beneficial for unit, integrated, functional and performance testing, but there are costs associated with automation tool license fees, specialized manpower, and the time to prepare and maintain the automation scripts. This presentation discusses some of the features unique to automated software testing and offers a framework organizations can implement to make the business case for, to organize for, and to execute and benefit from automating the right aspects of their testing needs. Automation has many benefits in saving time and money, but is most valuable in freeing test resources to perform higher value tasks. |
Jim Wisnowski Adsurgo |
Breakout | Materials | 2017 |
Breakout Automated Test Case Generation for Human-Machine Interaction (Abstract)
The growing complexity of interactive systems requires increasing amounts of effort to ensure reliability and usability. Testing is an effective approach for finding and correcting problems with implemented systems. However, testing is often regarded as the most intellectually demanding, time-consuming, and expensive part of system development. Furthermore, it can be difficult (if not impossible) for testers to anticipate all of the conditions that need to be evaluated. This is especially true of human-machine systems. This is because the human operator (who is attempting to achieve his or her task goals) is an additional concurrent component of the system and one whose behavior is not strictly governed by the implementation of designed system elements. To address these issues, researchers have developed approaches for automatically generating test cases. Among these are formal methods: rigorous, mathematical languages, tools, and techniques for modeling, specifying, and verifying (proving properties about) systems. These support model-based approaches (almost exclusively used in computer engineering) for creating tests that are efficient and provide guarantees about their completeness (at least with respect to the model). In particular, model checking can be used for automated test case generation. In this, efficient and exhaustive algorithms search a system model to find traces (test cases) through that model that satisfy specified coverage criteria: descriptions of the conditions the tests should encounter during execution. This talk focuses on a formal automated test generation method developed in my lab for creating test cases for human-system interaction. This approach makes use of task models. Task models are a standard human factors method for describing how humans normatively achieve goals when interacting with a system. When these models are given formal semantics, they can be paired with models of system behavior to account for human-system interaction. Formal, automated test case generation can then be performed for coverage criteria asserted over the system (for example, to cover the entire human interface) or human task (to ensure all human activities or actions are performed). Generated test cases, when manually executed with the system, can serve two purposes. First, testers can observe whether the human behavior in the test always produces the system behavior from the test. This can help analysts validate the models and, if no problems are found, be sure that any desirable properties exhibited by the model hold in the actual system. Second, testers will be able to use their insights about system usability and performance to subjectively evaluate the system under all of the conditions contained in the tests. Given the coverage guarantees provided by the process, this means that testers can be confident they have seen every system condition relevant to the coverage criteria. In this talk, I will describe this approach to automated test case generation and illustrate its utility with a simple example. I will then describe how this approach could be extended to account for different dimensions of human cognitive performance and emerging challenges in human-autonomy interaction. |
Matthew Bolton Associate Professor University at Buffalo, the State University of New York (bio)
Dr. Bolton is an Associate Professor of Industrial and Systems Engineering at the University at Buffalo (UB). He obtained his Ph.D. in Systems Engineering from the University of Virginia, Charlottesville, in 2010. Before joining UB, he worked as a Senior Research Associate at NASA’s Ames Research Center and as an Assistant Professor of Industrial Engineering at the University of Illinois at Chicago. Dr. Bolton is an expert on the use of formal methods in human factors engineering and has published widely in this area. He has successfully applied his research to safety-critical applications in aerospace, medicine, defense, and cybersecurity. He has received funding on projects sponsored by the European Space Agency, NSF, NASA, AHRQ, and DoD. This includes a Young Investigator Program Award from the Army Research Office. He is an associate editor for the IEEE Transactions on Human Machine Systems and the former Chair of the Human Performance Modeling Technical Group for the Human Factors and Ergonomics Society. He was appointed as a Senior Member of IEEE in 2015 and received the Human Factors and Ergonomics Society’s William C. Howell Young Investigator award in 2018. |
Breakout | | 2021 |
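To make the model-checking idea above concrete, the sketch below exhaustively searches a toy human-machine model (a hypothetical three-state device with invented actions) and returns an action sequence reaching each state, i.e., test cases satisfying a simple state-coverage criterion. It is a simplified stand-in for the formal, task-model-based method described in the talk.

```r
# Hedged sketch: breadth-first search of a toy human-machine model to
# generate test cases (action sequences) that cover every system state.
# States, actions, and transitions are hypothetical.

transitions <- data.frame(
  from   = c("Off",     "Standby", "Standby", "Active",  "Active"),
  action = c("power",   "start",   "power",   "stop",    "power"),
  to     = c("Standby", "Active",  "Off",     "Standby", "Off"),
  stringsAsFactors = FALSE
)

generate_tests <- function(trans, init = "Off") {
  frontier <- list(list(state = init, trace = character(0)))
  covered  <- init
  tests    <- list()
  while (length(frontier) > 0) {
    node <- frontier[[1]]; frontier <- frontier[-1]
    outgoing <- trans[trans$from == node$state, ]
    for (i in seq_len(nrow(outgoing))) {
      nxt   <- outgoing$to[i]
      trace <- c(node$trace, outgoing$action[i])
      if (!(nxt %in% covered)) {          # new state reached: record a test
        covered <- c(covered, nxt)
        tests[[nxt]] <- trace
        frontier <- c(frontier, list(list(state = nxt, trace = trace)))
      }
    }
  }
  tests
}

tests <- generate_tests(transitions)
# Each element is an action sequence a tester would execute manually
str(tests)
```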
Breakout B-52 Radar Modernization Test Design Considerations (Abstract)
Inherent system processes, restrictions on collection, or cost may impact the practical execution of an operational test. This study presents the use of blocking and split-plot designs when complete randomization is not feasible in operational test. Specifically, the USAF B-52 Radar Modernization Program test design is used to present tradeoffs of different design choices and the impacts of those choices on cost, operational relevance, and analytical rigor. |
Stuart Corbett AFOTEC |
Breakout | Materials | 2018 |
Breakout B-52 Radar Modernization Test Design Considerations (Abstract)
Inherent system processes, restrictions on collection, or cost may impact the practical execution of an operational test. This study presents the use of blocking and split-plot designs when complete randomization is not feasible in operational test. Specifically, the USAF B-52 Radar Modernization Program test design is used to present tradeoffs of different design choices and the impacts of those choices on cost, operational relevance, and analytical rigor. |
Joseph Maloney AFOTEC |
Breakout | Materials | 2018 |
Breakout Background of NASA’s Juncture Flow Validation Test |
Joseph Morrison NASA |
Breakout | | 2017 |
Breakout Bayesian Adaptive Design for Conformance Testing with Bernoulli Trials (Abstract)
Co-authors: Adam L. Pintar, Blaza Toman, and Dennis Leber. A task of the Domestic Nuclear Detection Office (DNDO) is the evaluation of radiation and nuclear (rad/nuc) detection systems used to detect and identify illicit rad/nuc materials. To obtain estimated system performance measures, such as probability of detection, and to determine system acceptability, the DNDO sometimes conducts large-scale field tests of these systems at great cost. Typically, non-adaptive designs are employed where each rad/nuc test source is presented to each system under test a predetermined and fixed number of times. This approach can lead to unnecessary cost if the system is clearly acceptable or unacceptable. In this presentation, an adaptive design with Bayesian decision theoretic foundations is discussed as an alternative to, and contrasted with, the more common single-stage design. Although the basis of the method is Bayesian decision theory, designs may be tuned to have desirable type I and II error rates. While the focus of the presentation is a specific DNDO example, the method is widely applicable. Further, since constructing the designs is somewhat compute-intensive, software in the form of an R package will be shown and is available upon request. |
Adam Pintar NIST |
Breakout | Materials | 2016 |
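The following sketch illustrates the general flavor of an adaptive Bernoulli conformance test: a Beta-Binomial posterior is updated after every trial, and testing stops once the posterior probability of meeting a requirement is decisively high or low. The requirement, prior, and stopping thresholds are invented, and this is not the presenters' decision-theoretic design or their R package.

```r
# Hedged sketch: a sequential Beta-Binomial conformance test with early
# stopping. All settings are hypothetical.

requirement <- 0.80        # required probability of detection (assumed)
a0 <- 1; b0 <- 1           # Beta(1, 1) prior (assumed)
accept_prob <- 0.95        # stop and accept if Pr(p > requirement | data) >= this
reject_prob <- 0.05        # stop and reject if it falls to or below this
max_trials  <- 200

run_adaptive_test <- function(true_p, seed = 1) {
  set.seed(seed)
  successes <- 0; n <- 0
  repeat {
    n <- n + 1
    successes <- successes + rbinom(1, 1, true_p)   # simulated trial outcome
    post <- 1 - pbeta(requirement, a0 + successes, b0 + n - successes)
    if (post >= accept_prob) return(list(decision = "accept", trials = n))
    if (post <= reject_prob) return(list(decision = "reject", trials = n))
    if (n >= max_trials)     return(list(decision = "inconclusive", trials = n))
  }
}

run_adaptive_test(true_p = 0.95)   # clearly acceptable system: stops early
run_adaptive_test(true_p = 0.50)   # clearly unacceptable system: stops early
```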
Short Course Bayesian Analysis (Abstract)
This course will cover the basics of the Bayesian approach to practical and coherent statistical inference. Particular attention will be paid to computational aspects, including MCMC. Examples/practical hands-on exercises will run the gamut from toy illustrations to real-world data analyses from all areas of science, with R implementations/coaching provided. The course closely follows P.D. Hoff’s “A First Course in Bayesian Statistical Methods” (Springer, 2009). Some examples are borrowed from two other texts, which are nice references to have: J. Albert’s “Bayesian Computation with R” (Springer, 2nd ed., 2009) and A. Gelman, J.B. Carlin, H.S. Stern, D. Dunson, A. Vehtari, and D.B. Rubin’s “Bayesian Data Analysis” (3rd ed., 2013). |
Robert Gramacy Virginia Tech |
Short Course | Materials | 2019 |
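As a taste of the course material, the sketch below implements the kind of toy exercise such a course typically starts with: a random-walk Metropolis sampler for a normal mean with known variance, checked against the conjugate posterior. Data and prior settings are made up for illustration; see Hoff (2009) for the underlying theory.

```r
# Hedged sketch: random-walk Metropolis for the mean of normal data with
# known variance, compared against the exact conjugate posterior.

set.seed(42)
y <- rnorm(20, mean = 3, sd = 2)          # simulated data, sigma = 2 known
sigma <- 2; mu0 <- 0; tau0 <- 10          # N(mu0, tau0^2) prior on the mean

log_post <- function(mu) {
  sum(dnorm(y, mu, sigma, log = TRUE)) + dnorm(mu, mu0, tau0, log = TRUE)
}

n_iter <- 10000; draws <- numeric(n_iter); mu_cur <- 0
for (i in seq_len(n_iter)) {
  mu_prop <- mu_cur + rnorm(1, 0, 0.5)                       # random-walk proposal
  if (log(runif(1)) < log_post(mu_prop) - log_post(mu_cur)) mu_cur <- mu_prop
  draws[i] <- mu_cur
}

# Conjugate posterior for comparison
tau_n2 <- 1 / (1 / tau0^2 + length(y) / sigma^2)
mu_n   <- tau_n2 * (mu0 / tau0^2 + sum(y) / sigma^2)
c(mcmc_mean = mean(draws[-(1:1000)]), exact_mean = mu_n,
  mcmc_sd = sd(draws[-(1:1000)]), exact_sd = sqrt(tau_n2))
```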
Contributed Bayesian Calibration and Uncertainty Analysis: A Case Study Using a 2-D CFD Turbulence Model (Abstract)
The growing use of simulations in the engineering design process promises to reduce the need for extensive physical testing, decreasing both development time and cost. However, as mathematician and statistician George E. P. Box said, “Essentially, all models are wrong, but some are useful.” There are many factors that determine simulation or, more broadly, model accuracy. These factors can be condensed into noise, bias, parameter uncertainty, and model form uncertainty. To counter these effects and ensure that models faithfully match reality to the extent required, simulation models must be calibrated to physical measurements. Further, the models must be validated, and their accuracy must be quantified before they can be relied on in lieu of physical testing. Bayesian calibration provides a solution for both requirements: it optimizes tuning of model parameters to improve simulation accuracy, and estimates any remaining discrepancy, which is useful for model diagnosis and validation. Also, because model discrepancy is assumed to exist in this framework, it enables robust calibration even for inaccurate models. In this paper, we present a case study to investigate the potential benefits of using Bayesian calibration, sensitivity analyses, and Monte Carlo analyses for model improvement and validation. We will calibrate a 7-parameter k-𝜎 CFD turbulence model simulated in COMSOL Multiphysics®. The model predicts coefficient of lift and drag for an airfoil defined using a 6049-series airfoil parameterization from the National Advisory Committee for Aeronautics (NACA). We will calibrate model predictions using publicly available wind tunnel data from the University of Illinois Urbana-Champaign’s (UIUC) database. Bayesian model calibration requires intensive sampling of the simulation model to determine the most likely distribution of calibration parameters, which can be a large computational burden. We greatly reduce this burden by following a surrogate modeling approach, using Gaussian process emulators to mimic the CFD simulation. We train the emulator by sampling the simulation space using a Latin Hypercube Design (LHD), and assess the accuracy of the emulator using leave-one-out Cross Validation (CV) error. The Bayesian calibration framework involves calculating the discrepancy between simulation results and physical test results. We also use Gaussian process emulators to model this discrepancy. The discrepancy emulator will be used as a tool for model validation; characteristic trends in residual errors after calibration can indicate underlying model form errors which were not addressed via tuning the model calibration parameters. In this way, we will separate and quantify model form uncertainty and parameter uncertainty. The results of a Bayesian calibration include a posterior distribution of calibration parameter values. These distributions will be sampled using Monte Carlo methods to generate model predictions, whereby new predictions have a distribution of values which reflects the uncertainty in the tuned calibrated parameter. The resulting output distributions will be compared against physical data and the uncalibrated model to assess the effects of the calibration and discrepancy model. We will also perform global, variance-based sensitivity analysis on the uncalibrated model and the calibrated models, and investigate any changes in the sensitivity indices from uncalibrated to calibrated. |
Peter Chien | Contributed | | 2018 |
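The emulator-building step described in the abstract can be illustrated with a bare-bones Gaussian process emulator, trained on a space-filling sample of a cheap stand-in "simulator" and checked with leave-one-out cross validation. The toy simulator, fixed kernel length-scale, and nugget below are assumptions; a real calibration would estimate these and use an actual LHD and CFD runs.

```r
# Hedged sketch: a minimal Gaussian process emulator with a squared-exponential
# kernel, plus leave-one-out cross validation of its accuracy.

set.seed(7)
simulator <- function(x) sin(2 * pi * x[1]) + 0.5 * x[2]^2   # toy stand-in for CFD

n <- 30; d <- 2
X <- matrix(runif(n * d), n, d)          # crude space-filling sample (random here;
y <- apply(X, 1, simulator)              # an LHD would be used in practice)

sqexp_kernel <- function(A, B, ell = 0.3) {
  D2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  exp(-D2 / (2 * ell^2))
}

gp_fit <- function(X, y, nugget = 1e-6) {
  K <- sqexp_kernel(X, X) + diag(nugget, nrow(X))
  list(X = X, alpha = solve(K, y))
}
gp_predict <- function(fit, Xnew) sqexp_kernel(Xnew, fit$X) %*% fit$alpha

# Leave-one-out cross validation error of the emulator
loo_err <- sapply(seq_len(n), function(i) {
  fit <- gp_fit(X[-i, , drop = FALSE], y[-i])
  gp_predict(fit, X[i, , drop = FALSE]) - y[i]
})
c(rmse = sqrt(mean(loo_err^2)), range_y = diff(range(y)))
```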
Breakout Bayesian Component Reliability Estimation: F-35 Case Study (Abstract)
A challenging aspect of a system reliability assessment is integrating multiple sources of information, including component, subsystem, and full-system data, previous test data, or subject matter expert opinion. A powerful feature of Bayesian analyses is the ability to combine these multiple sources of data and variability in an informed way to perform statistical inference. This feature is particularly valuable in assessing system reliability where testing is limited and only a small number of failures (or none at all) are observed. The F-35 is DoD’s largest program; approximately one-third of the operations and sustainment cost is attributed to the cost of spare parts and the removal, replacement, and repair of components. The failure rate of those components is the driving parameter for a significant portion of the sustainment cost, and yet for many of these components, poor estimates of the failure rate exist. For many programs, the contractor produces estimates of component failure rates, based on engineering analysis and legacy systems with similar parts. While these are useful, the actual removal rates can provide a more accurate estimate of the removal and replacement rates the program anticipates experiencing in future years. In this presentation, we show how we applied a Bayesian analysis to combine the engineering reliability estimates with the actual failure data to overcome the problems of cases where few data exist. Our technique is broadly applicable to any program where multiple sources of reliability information need to be combined for the best estimation of component failure rates and ultimately sustainment costs. |
V. Bram Lillard & Rebecca Medlin | Breakout | | 2019 |
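A minimal version of the idea of combining an engineering reliability estimate with sparse field data is the conjugate gamma-Poisson update sketched below. All numbers are invented, and the presenters' actual F-35 analysis is considerably more involved.

```r
# Hedged sketch: combining an engineering failure-rate estimate (as a prior)
# with observed field data using a conjugate gamma-Poisson model.

# Engineering estimate: ~2 failures per 1000 flight hours, weighted as if it
# came from 500 hours of prior experience (both numbers hypothetical).
prior_rate  <- 2 / 1000
prior_hours <- 500
a0 <- prior_rate * prior_hours   # gamma shape
b0 <- prior_hours                # gamma rate (in flight hours)

# Observed field data: few failures, limited exposure (hypothetical)
observed_failures <- 1
observed_hours    <- 800

# Conjugate update: posterior is Gamma(a0 + failures, b0 + hours)
a1 <- a0 + observed_failures
b1 <- b0 + observed_hours

post_mean <- a1 / b1
post_int  <- qgamma(c(0.025, 0.975), a1, rate = b1)
round(1000 * c(mean = post_mean, lower = post_int[1], upper = post_int[2]), 2)
# Failure rate per 1000 flight hours, with uncertainty quantified even though
# only one failure was observed.
```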
Tutorial Bayesian Data Analysis in R/STAN (Abstract)
In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that ignore all available information. The Bayesian paradigm is tailor made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&E. This tutorial briefly introduces the basic concepts of Bayesian Statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Common Analytical Laboratory System (CALS) with different agents and matrices. Examples are also presented using RStan, a high-performance open-source software for Bayesian inference on multi-level models. |
Kassandra Fronczyk IDA |
Tutorial | Materials | 2016 |
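For readers who want to try the workflow described above, here is a minimal RStan example in the same spirit: a beta-binomial reliability model for successes out of n trials. The data values are hypothetical and the LCS and CALS case studies are not reproduced; running it requires the rstan package and a working C++ toolchain.

```r
# Hedged sketch: a minimal RStan workflow for a binomial reliability model.

library(rstan)

model_code <- "
data {
  int<lower=0> n;            // number of trials
  int<lower=0, upper=n> y;   // number of successes
}
parameters {
  real<lower=0, upper=1> p;  // reliability
}
model {
  p ~ beta(1, 1);            // uniform prior; an informative prior could
  y ~ binomial(n, p);        //   encode legacy or engineering information
}
"

fit <- stan(model_code = model_code,
            data = list(n = 40, y = 36),   # hypothetical test results
            chains = 4, iter = 2000, refresh = 0)

print(fit, pars = "p", probs = c(0.05, 0.5, 0.95))
```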
Tutorial Bayesian Data Analysis in R/STAN (Abstract)
In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that ignore all available information. The Bayesian paradigm is tailor made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&E. This tutorial briefly introduces the basic concepts of Bayesian Statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Common Analytical Laboratory System (CALS) with different agents and matrices. Examples are also presented using RStan, a high-performance open-source software for Bayesian inference on multi-level models. |
James Brownlow U.S. Air Force 812TSS/ENT |
Tutorial | Materials | 2016 |
Breakout Bayesian Estimation of Reliability Growth |
Jim Brownlow U.S. Air Force 812TSS/ENT |
Breakout | Materials | 2016 |
Breakout Behavioral Analytics: Paradigms and Performance Tools of Engagement in System Cybersecurity (Abstract)
The application opportunities for behavioral analytics in the cybersecurity space are based upon simple realities. 1. The great majority of breaches across all cybersecurity venues is due to human choices and human error. 2. With communication and information technologies making for rapid availability of data, as well as behavioral strategies of bad actors getting cleverer, there is need for expanded perspectives in cybersecurity prevention. 3. Internally-focused paradigms must now be explored that place endogenous protection from security threats as an important focus and integral dimension of cybersecurity prevention. The development of cybersecurity monitoring metrics and tools as well as the creation of intrusion prevention standards and policies should always include an understanding of the underlying drivers of human behavior. As temptation follows available paths, cyber-attacks follow technology, business models, and behavioral habits. The human element will always be the most significant part in the anatomy of any final decision. Choice options – from input, to judgement, to prediction, to action – need to be better understood for their relevance to cybersecurity work. Behavioral Performance Indexes harness data about aggregate human participation in an active system, helping to capture some of the detail and nuances of this critically important dimension of cybersecurity. |
Robert Gough | Breakout | | 2019 |
Breakout Big Data, Big Think (Abstract)
The NASA Big Data, Big Think team jump-starts coordination, strategy, and progress for NASA applications of Big Data Analytics techniques, fosters collaboration and teamwork among centers and improves agency-wide understanding of Big Data research techniques & technologies and their application to NASA mission domains. The effort brings the Agency’s Big Data community together and helps define near term projects and leverages expertise throughout the agency. This presentation will share examples of Big Data activities from the Agency and discuss knowledge areas and experiences, including data management, data analytics and visualization. |
Robert Beil NASA |
Breakout | Materials | 2017 |
Breakout Blast Noise Event Classification from a Spectrogram (Abstract)
Spectrograms (i.e., squared magnitude of short-time Fourier transform) are commonly used as features to classify audio signals in the same way that social media companies (e.g., Google, Facebook, Yahoo) use images to classify or automatically tag people in photos. However, a serious problem arises when using spectrograms to classify acoustic signals, in that the user must choose the input parameters (hyperparameters), and such choices can have a drastic effect on the accuracy of the resulting classifier. Further, considering all possible combinations of the hyperparameters is a computationally intractable problem. In this study, we simplify the problem making it computationally tractable, explore the utility of response surface methods for sampling the hyperparameter space, and find that response surface methods are a computationally efficient means of identifying the hyperparameter combinations that are likely to give the best classification results. |
Edward Nykaza Army Engineering Research and Development Center, Construction Engineering Research Laboratory |
Breakout | Materials | 2017 |
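The hyperparameter-selection idea in the abstract can be sketched as fitting a second-order response surface to classifier accuracy over a small design in two spectrogram hyperparameters and then predicting where accuracy peaks. In the sketch below, `evaluate_classifier` is a placeholder: a real study would build spectrograms with the given settings, train the classifier, and return cross-validated accuracy.

```r
# Hedged sketch: response-surface selection of spectrogram hyperparameters
# (window length and overlap) instead of an exhaustive grid search.

evaluate_classifier <- function(win, overlap) {
  # Placeholder response with noise; real code would train and score a model.
  0.9 - 0.3 * (log2(win / 512))^2 - 0.2 * (overlap - 0.5)^2 + rnorm(1, 0, 0.01)
}

set.seed(3)
design <- expand.grid(win = c(128, 256, 512, 1024, 2048),
                      overlap = c(0.25, 0.5, 0.75))
design$lwin <- log2(design$win)                       # work on a log2 scale
design$acc  <- mapply(evaluate_classifier, design$win, design$overlap)

# Second-order (quadratic) response surface in the two hyperparameters
rs <- lm(acc ~ lwin + overlap + I(lwin^2) + I(overlap^2) + lwin:overlap,
         data = design)

# Predict over a fine grid and pick the most promising settings
grid <- expand.grid(lwin = seq(7, 11, by = 0.1), overlap = seq(0.2, 0.8, 0.05))
grid$pred <- predict(rs, grid)
best <- grid[which.max(grid$pred), ]
c(best_window = round(2^best$lwin), best_overlap = best$overlap)
```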
Breakout Building A Universal Helicopter Noise Model Using Machine Learning (Abstract)
Helicopters serve a number of useful roles within the community; however, community acceptance of helicopter operations is often limited by the resulting noise. Because the noise characteristics of helicopters depend strongly on the operating condition of the vehicle, effective noise abatement procedures can be developed for a particular helicopter type, but only when the noisy regions of the operating envelope are identified. NASA Langley Research Center—often in collaboration with other US Government agencies, industry, and academia—has conducted noise measurements for a wide variety of helicopter types, from light commercial helicopters to heavy military utility helicopters. While this database is expansive, it covers only a fraction of helicopter types in current commercial and military service and was measured under a limited set of ambient conditions and vehicle configurations. This talk will describe a new “universal” helicopter noise model suitable for planning helicopter noise abatement procedures. Modern machine learning techniques will be combined with the principle of nondimensionalization and applied to NASA’s helicopter noise data in order to develop a model capable of estimating the noisy operating states of any conventional helicopter under any specific ambient conditions and vehicle configurations. |
Eric Greenwood Aeroacoustics Branch |
Breakout | Materials | 2018 |
Webinar Can AI Predict Human Behavior? (Abstract)
Given the rapid increase of novel machine learning applications in cybersecurity and people analytics, there is significant evidence that these tools can give meaningful and actionable insights. Even so, great care must be taken to ensure that automated decision-making tools are deployed in such a way as to mitigate bias in predictions and promote security of user data. In this talk, Dr. Burns will take a deep dive into an open source data set in the area of people analytics, demonstrating the application of basic machine learning techniques, while discussing limitations and potential pitfalls in using an algorithm to predict human behavior. In the end, Dustin will draw a comparison between the potential to predict human behavioral propensities, such as becoming an insider threat, and the way assisted diagnosis tools are used in medicine to predict the development or recurrence of illnesses. |
Dustin Burns Senior Scientist Exponent (bio)
Dr. Dustin Burns is a Senior Scientist in the Statistical and Data Sciences practice at Exponent, a multidisciplinary scientific and engineering consulting firm dedicated to responding to the world’s most impactful business problems. Combining his background in laboratory experiments with his expertise in data analytics and machine learning, Dr. Burns works across many industries, including security, consumer electronics, utilities, and health sciences. He supports clients’ goals to modernize data collection and analytics strategies, extract information from unused data such as images and text, and test and validate existing systems. |
Webinar | Recording | 2020 |
Breakout Carrier Reliability Model Validation (Abstract)
Model Validation for Simulations of CVN-78 Sortie Generation. As part of the test planning process, IDA is examining flight operations on the Navy’s newest carrier, CVN-78. The analysis uses a model, the IDA Virtual Carrier Model (IVCM), to examine sortie generation rates and whether aircraft can complete missions on time. Before using IVCM, it must be validated. However, CVN-78 has not been delivered to the Navy, and data from actual operations are not available to validate the model. Consequently, we will validate IVCM by comparing it to another model. This is a reasonable approach when a model is used in general analyses such as test planning, but is not acceptable when a model is used in the assessment of system effectiveness and suitability. The presentation examines the use of various statistical tools – Wilcoxon Rank Sum Test, Kolmogorov-Smirnov Test, and lognormal regression – to determine whether the two models provide similar results and to quantify the magnitude of any differences. From the analysis, IDA concluded that locations and distribution shapes are consistent, and that the differences between the models are less than 15 percent, which is acceptable for test planning. |
Dean Thomas IDA |
Breakout | | 2017 |
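The statistical comparisons named in the abstract are all standard R calls; the sketch below applies them to synthetic stand-in output from two hypothetical models (not IVCM results) to show how location, distribution shape, and the size of any difference would be assessed.

```r
# Hedged sketch: comparing output from two models of the same process with a
# Wilcoxon rank-sum test, a Kolmogorov-Smirnov test, and lognormal regression.

set.seed(11)
sortie_time_A <- rlnorm(200, meanlog = 1.00, sdlog = 0.30)  # "model A" output
sortie_time_B <- rlnorm(200, meanlog = 1.08, sdlog = 0.32)  # "model B" output

# Do the two models produce similar locations / distributions?
wilcox.test(sortie_time_A, sortie_time_B)   # Wilcoxon rank-sum test
ks.test(sortie_time_A, sortie_time_B)       # Kolmogorov-Smirnov test

# Lognormal regression to quantify the size of any difference:
# the exponentiated coefficient is the multiplicative difference between models.
dat <- data.frame(time  = c(sortie_time_A, sortie_time_B),
                  model = rep(c("A", "B"), each = 200))
fit <- lm(log(time) ~ model, data = dat)
ratio <- exp(coef(fit)["modelB"])
cat(sprintf("Model B times are about %.1f%% %s than model A\n",
            100 * abs(ratio - 1), if (ratio > 1) "longer" else "shorter"))
```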
Breakout Case Studies for Statistical Engineering Applied to Powered Rotorcraft Wind-Tunnel Tests (Abstract)
Co-Authors: Sean A. Commo, Ph.D., P.E. and Peter A. Parker, Ph.D., P.E. NASA Langley Research Center, Hampton, Virginia, USA Austin D. Overmeyer, Philip E. Tanner, and Preston B. Martin, Ph.D. U.S. Army Research, Development, and Engineering Command, Hampton, Virginia, USA. The application of statistical engineering to helicopter wind-tunnel testing was explored during two powered rotor entries. The U.S. Army Aviation Development Directorate Joint Research Program Office and the NASA Revolutionary Vertical Lift Project performed these tests jointly at the NASA Langley Research Center. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small segment of the overall tests devoted to developing case studies of a statistical engineering approach. Data collected during each entry were used to estimate response surface models characterizing vehicle performance, a novel contribution of statistical engineering applied to powered rotor-wing testing. Additionally, a 16- to 47-times reduction in the number of data points required was estimated when comparing a statistically-engineered approach to a conventional one-factor-at-a-time approach. |
Sean Commo NASA |
Breakout | | 2016 |
Breakout Cases of Second-Order Split-Plot Designs (Abstract)
The fundamental principles of experiment design are factorization, replication, randomization, and local control of error. In many industries, however, departure from these principles is commonplace. Often in our experiments complete randomization is not feasible because the factor level settings are hard, impractical, or inconvenient to change or the resources available to execute under homogeneous conditions are limited. These restrictions in randomization lead to split-plot experiments. We are also often interested in fitting second-order models, leading to second-order split-plot experiments. Although response surface methodology has grown tremendously since 1951, alternatives for second-order split-plot designs remain largely unexplored. The literature and textbooks offer limited examples and provide guidelines that often are too general. This deficit of information leaves practitioners ill prepared to face the many roadblocks associated with these types of designs. This presentation provides practical strategies to help practitioners deal with second-order split-plot and, by extension, split-split-plot experiments, including an innovative approach for the construction of a response surface design referred to as the second-order sub-array Cartesian product split-plot design. This new type of design, which is an alternative to other classes of split-plot designs that are currently in use in defense and industrial applications, is economical, has a low prediction variance of the regression coefficients, and low aliasing between model terms. Based on an assessment using well-accepted design evaluation criteria, second-order sub-array Cartesian product split-plot designs perform as well as historical designs that have been considered standards up to this point. |
Luis Cortes MITRE |
Breakout | Materials | 2018 |
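A common way to analyze a second-order split-plot experiment is a mixed model with a whole-plot random effect, sketched below with simulated placeholder data (six whole plots, one hard-to-change and one easy-to-change factor). This illustrates the analysis context only; it is not the sub-array Cartesian product construction from the talk, and it requires the lme4 package.

```r
# Hedged sketch: fitting a second-order model to a split-plot experiment with
# a whole-plot random effect. With this few whole plots a singular-fit
# message from lmer is possible.

library(lme4)
set.seed(5)

# Six whole plots; the hard-to-change factor w is set once per whole plot,
# the easy-to-change factor s is varied within each whole plot.
design <- expand.grid(wp = 1:6, s = c(-1, 0, 1))
design$w <- rep(c(-1, 0, 1), each = 2)[design$wp]

wp_error <- rnorm(6, sd = 0.5)                      # whole-plot error
design$y <- 2 + 1.5 * design$w + 0.8 * design$s - 0.6 * design$w^2 +
            0.4 * design$w * design$s + wp_error[design$wp] + rnorm(18, sd = 0.3)
design$wp <- factor(design$wp)

# Whole-plot random effect accounts for the restricted randomization
fit <- lmer(y ~ w + s + I(w^2) + I(s^2) + w:s + (1 | wp), data = design)
summary(fit)
```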
Short Course Categorical Data Analysis (Abstract)
Categorical data is abundant in the 21st century, and its analysis is vital to advance research across many domains. Thus, data-analytic techniques that are tailored for categorical data are an essential part of the practitioner’s toolset. The purpose of this short course is to help attendees develop and sharpen their abilities with these tools. Topics covered in this short course will include logistic regression, ordinal regression, and classification, and methods to assess the predictive accuracy of these approaches will be discussed. Data will be analyzed using the R software package, and course content loosely follows Alan Agresti’s excellent textbook An Introduction to Categorical Data Analysis, Third Edition. |
Christopher Franck Virginia Tech |
Short Course | Materials | 2019 |
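The two workhorse models the course covers can be demonstrated on built-in R data sets so the code runs as-is; the course's own examples follow Agresti's textbook rather than these particular data.

```r
# Hedged sketch: logistic regression and proportional-odds ordinal regression
# on built-in R data sets.

# Logistic regression: probability of a manual transmission from vehicle weight
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(logit_fit)
pred_prob <- predict(logit_fit, type = "response")
mean((pred_prob > 0.5) == mtcars$am)   # naive in-sample classification accuracy

# Ordinal (proportional-odds) regression with MASS::polr
library(MASS)
ord_fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
                Hess = TRUE)
summary(ord_fit)
```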
Breakout Censored Data Analysis for Performance Data (Abstract)
Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy to interpret test outcomes. However, they are information poor metrics and extremely expensive to test. The standard power calculations to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. In addition to being expensive, the test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics. A solution is to convert to a continuous variable, such as miss distance or time-to-detect. The common objection to switching to a continuous variable is that the hit/miss or detect/non-detect binomial information is lost, when the fraction of misses/no-detects is often the most important aspect of characterizing system performance. Furthermore, the new continuous metric appears to no longer be connected to the requirements document, which was stated in terms of a probability. These difficulties can be overcome with the use of censored data analysis. This presentation will illustrate the concepts and benefits of this approach, and will illustrate a simple analysis with data, including power calculations to show the cost savings for employing the methodology. |
Bram Lillard IDA |
Breakout | Materials | 2017 |
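The censored-data idea in the abstract can be sketched with a lognormal survival regression in which "no detect" trials enter as right-censored observations, and the probability of detection within a cutoff is then recovered from the fitted continuous model. The data, cutoff, and covariate below are simulated for illustration; the survival package is required.

```r
# Hedged sketch: censored lognormal regression for time-to-detect, with
# "no detect" runs treated as right-censored rather than binomial misses.

library(survival)
set.seed(9)

n <- 60
range_km  <- runif(n, 5, 50)                       # a test condition
true_time <- rlnorm(n, meanlog = 1 + 0.03 * range_km, sdlog = 0.4)
cutoff <- 12                                       # trial ends at 12 minutes
time   <- pmin(true_time, cutoff)
event  <- as.numeric(true_time <= cutoff)          # 0 = censored (no detect)

# Lognormal regression with right censoring
fit <- survreg(Surv(time, event) ~ range_km, dist = "lognormal")
summary(fit)

# Probability of detection within the cutoff at a given range, recovered from
# the fitted continuous model
rng  <- 30
mu   <- predict(fit, newdata = data.frame(range_km = rng), type = "lp")
pdet <- plnorm(cutoff, meanlog = mu, sdlog = fit$scale)
round(pdet, 2)
```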
Breakout Certification by Analysis: A 20-year Vision for Virtual Flight and Engine Testing (Abstract)
Analysis-based means of compliance for airplane and engine certification, commonly known as “Certification by Analysis” (CbA), provides a strong motivation for the development and maturation of current and future flight and engine modeling technology. The most obvious benefit of CbA is streamlined product certification testing programs at lower cost while maintaining equivalent levels of safety. The current state of technologies and processes for analysis is not sufficient to adequately address most aspects of CbA today, and concerted efforts to drastically improve analysis capability are required to fully bring the benefits of CbA to fruition. While the short-term cost and schedule benefits of reduced flight and engine testing are clearly visible, the fidelity of analysis capability required to realize CbA across a much larger percentage of product certification is not yet sufficient. Higher-fidelity analysis can help reduce the product development cycle and avoid costly and unpredictable performance and operability surprises that sometimes happen late in the development cycle. Perhaps the greatest long-term value afforded by CbA is the potential to accelerate the introduction of more aerodynamically and environmentally efficient products to market, benefitting not just manufacturers, but also airlines, passengers, and the environment. A far-reaching vision for CbA has been constructed to offer guidance in developing lofty yet realizable expectations regarding technology development and maturity through stakeholder involvement. This vision is composed of the following four elements: (1) the ability to numerically simulate the integrated system performance and response of full-scale airplane and engine configurations in an accurate, robust, and computationally efficient manner; (2) the development of quantified flight and engine modeling uncertainties to establish appropriate confidence in the use of numerical analysis for certification; (3) the rigorous validation of flight and engine modeling capabilities against full-scale data from critical airplane and engine testing; and (4) the use of flight and engine modeling to enable Certification by Simulation. Key technical challenges include the ability to accurately predict airplane and engine performance for a single discipline, the robust and efficient integration of multiple disciplines, and the appropriate modeling of system-level assessment. Current modeling methods lack the capability to adequately model conditions that exist at the edges of the operating envelope where the majority of certification testing generally takes place. Additionally, large-scale engine or airplane multidisciplinary integration has not matured to the level where it can be reliably used to efficiently model the intricate interactions that exist in current or future aerospace products. Logistical concerns center primarily on the future High Performance Computing capability needed to perform the large number of computationally intensive simulations needed for CbA. Complex, time-dependent, multidisciplinary analyses will require a computing capacity increase several orders of magnitude greater than is currently available. Developing methods to ensure credible simulation results is critically important for regulatory acceptance of CbA. Confidence in analysis methodology and solutions is examined so that application validation cases can be properly identified.
Other means of measuring confidence such as uncertainty quantification and “validation-domain” approaches may increase the credibility and trust in the predictions. Certification by Analysis is a challenging long-term endeavor that will motivate many areas of simulation technology development, while driving the potential to decrease cost, improve safety, and improve airplane and engine efficiency. Requirements to satisfy certification regulations provide a measurable definition for the types of analytical capabilities required for success. There is general optimism that CbA is a goal that can be achieved, and that a significant amount of flight testing can be reduced in the next few decades. |
Timothy Mauery Boeing (bio)
For the past 20 years, Timothy Mauery has been involved in the development of low-speed CFD design processes. In this capacity, he has had the opportunity to interact with users and provide CFD support and training throughout the product development cycle. Prior to moving to the Commercial Airplanes division of The Boeing Company, he worked at the Lockheed Martin Aircraft Center, providing aerodynamic liaison support on a variety of military modification and upgrade programs. At Boeing, he has had the opportunity to support both future products as well as existing programs with CFD analysis and wind tunnel testing. Over the past ten years, he has been closely involved in the development and evaluation of analysis-based certification processes for commercial transport vehicles, for both derivative programs as well as new airplanes. Most recently he was the principal investigator on a NASA research announcement for developing requirements for airplane certification by analysis. Timothy received his bachelor’s degree from Brigham Young University, and his master’s degree from The George Washington University, where he was also a research assistant at NASA-Langley. |
Breakout | | 2021 |