2024 Army Test and Evaluation Command AI Challenge - Xray Image Defect Detector
Session Recording | Materials
Abstract: Developing AI solutions requires the alignment of business needs, data availability, digital tools, and subject matter expertise. The Army Test and Evaluation Command (ATEC) is an organization with extensive data and operational requirements but is still developing its AI capabilities. To address this gap, ATEC launched an annual AI Challenge, an enterprise-wide, education-focused initiative aimed at solving real-world business problems using the new ATEC Data Mesh digital ecosystem. The 2024 challenge focused on automating defect detection in X-ray scans of body armor. Over three months, 153 participants across 29 teams, including internal and external partners, developed computer vision solutions to autonomously identify manufacturing defects such as cracks, foreign debris, and voids. The winning team achieved a remarkable 92% accuracy in defect detection. This effort not only resulted in a valuable tool that enhances operational capacity and efficiency but also significantly advanced AI expertise across the organization. Participants gained hands-on experience with cloud infrastructure, while ATEC refined its methodologies for testing and evaluating AI-enabled systems. The AI Challenge exemplifies how combining educational resources and competition can foster innovation towards real capabilities.
Speaker Info: David Niblick, AI Evaluator, Army Evaluation Center
MAJ David Niblick graduated from the United States Military Academy at West Point in 2010 with a BS in Electrical Engineering. He served in the Engineer Branch as a lieutenant and captain at Ft. Campbell, KY with the 101st Airborne Division (Air Assault) and at Schofield Barracks, HI with the 130th Engineer Brigade. He deployed twice to Afghanistan ('11-'12 and '13-'14) and to the Republic of Korea ('15-'16). After company command, he attended Purdue University and received an MS in Electrical and Computer Engineering with a thesis in computer vision and deep learning. He instructed in the Department of Electrical Engineering and Computer Science at USMA, after which he transferred from the Engineer Branch to Functional Area 49 (Operations Research and Systems Analysis). He currently serves as an Artificial Intelligence Evaluator with Army Test and Evaluation Command at Aberdeen Proving Ground, MD.
Presentation | 2025

A Case Study-based Assessment of a Model-driven Testing Methodology for Applicability and
Session Recording | Materials
Abstract: The Department of Defense (DoD) Test and Evaluation (T&E) community has fully embraced digital engineering, as defined in the 2018 Digital Engineering Strategy, motivating the ongoing development and adoption of model-based testing methodologies. This article expands upon existing grey-box model-driven test design (MDTD) approaches by leveraging model-based systems engineering (MBSE) artifacts to generate flight test planning models and documents. A baseline model of a system under test (SUT) and two additional system case studies are used to assess the MDTD process. The paper illustrates the method's applicability to these case studies, assesses the benefits of MDTD by applying novel metrics of model element reuse, and discusses the relevance to operational flight testing. This approach is novel within the flight-testing community, as it is the first implementation of MDTD in USAF operational testing applications. Whereas previous studies have explored SysML model reuse in small-scale problems or product families, MBSE model management for operational tests at flight-system scale and assessment of reuse in the T&E phase of the SE lifecycle are unresearched to date. This methodology and the case studies will be of particular interest to those involved in developing, executing, and reporting on flight test plans in the context of the DoD Digital Engineering transformation.
Speaker Info: Jose Alvarado, Technical Advisor, AFOTEC Detachment 5
JOSE ALVARADO serves as a technical advisor for AFOTEC Detachment 5 at Edwards AFB, California, with over 33 years of developmental and operational test and evaluation experience. He is interested in applying MBSE concepts to the flight test engineering domain and implementing test process improvements through MBT. Jose holds a B.S. in Electrical Engineering from California State University, Fresno (1991), an M.S. in Electrical Engineering from California State University, Northridge (2002), and a Ph.D. in Systems Engineering from Colorado State University (2024). He also serves as an adjunct faculty member for the mathematics, science and engineering (MSE) and career and technical education (CTE) departments at Antelope Valley College. He is a member of the International Test and Evaluation Association, Antelope Valley Chapter.

Presentation | 2025

A Comparison of Methods for Integrated Evaluation of Complex Systems
Session Recording | Materials
Abstract: A strategic goal of the DoD test and evaluation community is to combine information from across the acquisition lifecycle, enabling better understanding of systems earlier and design of tests to maximize information later. This talk will provide a systematic comparison of methods for integrating such information, ranging from hierarchical methods to informative priors to normalized power priors, leveraging as a motivating example a notional model of a defense system for which behavior and test factors evolve as the system develops. Through large-scale simulation experiments testing a variety of situations and assumptions, we will illustrate how the techniques work and their promise for improving understanding of systems, while highlighting best practices as well as potential implementation pitfalls. The comparison illuminates best practices for integrated system evaluation and illustrates how modeling assumptions affect estimates of system parameters.
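The power prior mentioned above can be illustrated in a few lines for binary data. A minimal sketch, assuming conjugate beta-binomial updating and invented success counts; the discount factor a is the knob that distinguishes ignoring history (a=0) from fully pooling it (a=1):

```python
# Illustrative power-prior combination of historical and current binary test
# data (all counts hypothetical). Downweighting the historical likelihood by
# a in [0, 1] keeps the posterior Beta under a conjugate Beta prior, so the
# effect of the discount factor can be read off directly.
from scipy import stats

a0, b0 = 1.0, 1.0          # Beta(1,1) baseline prior
y_hist, n_hist = 45, 50    # historical successes / trials (invented)
y_curr, n_curr = 18, 20    # current-phase successes / trials (invented)

for a in (0.0, 0.25, 0.5, 1.0):
    post = stats.beta(a0 + a * y_hist + y_curr,
                      b0 + a * (n_hist - y_hist) + (n_curr - y_curr))
    lo, hi = post.interval(0.8)
    print(f"a={a:4.2f}  posterior mean={post.mean():.3f}  80% CI=({lo:.3f}, {hi:.3f})")
```

Sweeping a, as the abstract's simulation experiments do at much larger scale, shows directly how the modeling assumption (how much history to trust) moves the estimate and its uncertainty.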
Speaker Info: Justin Krometis, Research Assistant Professor, Virginia Tech
Justin Krometis is a Research Assistant Professor with the Virginia Tech National Security Institute and holds an affiliate position in the Math Department at Virginia Tech. His research is mostly in the development of theoretical and computational frameworks for Bayesian data analysis. These include approaches to incorporating and balancing data and expert opinion in decision-making, estimating model parameters, including high- or even infinite-dimensional quantities, from noisy data, and designing experiments to maximize the information gained. His research interests include: Parameter Estimation, Uncertainty Quantification, Experimental Design, High-Performance Computing, Artificial Intelligence/Machine Learning (AI/ML), and Reinforcement Learning. Prior to joining VTNSI, Dr. Krometis worked as a computational scientist supporting high-performance computing and as a transportation modeler to enhance evacuation planning and hurricane, pandemic, and other emergency preparedness. He holds Ph.D., M.S., and B.S. degrees in Math and a B.S. degree in Physics, all from Virginia Tech.

| 2025

A Framework for VV&A of Preproduction Software Environments
Session Recording | Materials
Abstract: The evolution from a traditional waterfall software release model to a continuous and iterative release model allows operational testers to conduct operational testing earlier in the development lifecycle. To enable operationally realistic software testing before deploying to users, programs create preproduction environments designed to replicate the hardware and software infrastructure of production environments. However, it can be challenging for testers to assess the similarity between the preproduction and production software environments and determine what data can be used from the preproduction environment to support operational evaluations. We present a general framework for the Verification, Validation, and Accreditation (VV&A) of preproduction environments that aims to be rigorous yet flexible enough to meet the needs of acquisition programs conducting operational testing in preproduction environments. This framework includes a three-stage VV&A process, composed of an initial VV&A, followed by a set of automated and continuous verification and validation (V&V) checks, and a VV&A renewal if major differences between the environments appear. We describe the data needed to verify and validate the environment and how to customize the VV&A process to fit the needs of each program.
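The automated V&V checks in such a framework could take many forms; a minimal sketch of one plausible check, assuming matched samples of a latency metric are available from each environment (all data, names, and thresholds hypothetical):

```python
# Hypothetical automated V&V check: compare a performance metric collected in
# the production and preproduction environments and flag when the two
# distributions diverge enough to warrant a VV&A renewal review.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
prod_latency_ms = rng.lognormal(mean=3.0, sigma=0.25, size=500)      # stand-in data
preprod_latency_ms = rng.lognormal(mean=3.05, sigma=0.25, size=500)  # stand-in data

stat, p_value = stats.ks_2samp(prod_latency_ms, preprod_latency_ms)
ALPHA = 0.01  # hypothetical renewal threshold
if p_value < ALPHA:
    print(f"KS p={p_value:.4f}: environments differ; trigger VV&A renewal review")
else:
    print(f"KS p={p_value:.4f}: no detected divergence; continue automated checks")
```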
Speaker Info: Luis Aguirre, Research Staff Member, IDA
Luis Aguirre is a Research Staff Member at the Institute for Defense Analyses. His work at IDA has focused on operational test and evaluation of Joint C3 systems and major automated information systems. Luis earned a BS in life sciences from the University of Illinois-Chicago and a PhD in organismic and evolutionary biology from the University of Massachusetts-Amherst.

| 2025

A Quantified Approach to Synthetic Dataset Evaluation
Session Recording | Materials
Abstract: The advent of advanced machine learning (ML) capabilities has dramatically increased the need for data to train, test, and validate models. At the same time, systems and models are being asked to operate in increasingly diverse environments. With the high cost of data collection and labeling, or even the complete lack of available data, synthetic data has become an attractive alternative. Synthetic data promises many potential benefits over traditional datasets, including reduced cost, improved coverage of edge cases, more balanced datasets, and reduced data collection time; in many cases, it may be the only data that is available for a target environment. At the same time, it introduces potential risks, including lack of realism, bias amplification, overfitting to potentially synthetic features, missing real-world variability, and lack of confidence in the results. The degree to which these benefits and risks manifest depends greatly on the type of problem and the way in which synthetic data is generated and utilized. In this paper we propose a principled, systematic approach to testing the effectiveness of synthetic data for specific classes of problems. We illustrate our approach on an image classifier using a flower type database. We first establish a model baseline by training and testing the classifier model with real data, then measure its performance. We then establish a synthetic dataset baseline by attempting to train binary classifiers to distinguish each synthetic dataset from the real dataset. Poor performance of the binary classifier indicates that the corresponding synthetic dataset is a better representation of the real data. We then conduct two core sets of experiments evaluating the effectiveness of the synthetic data in training (replacement and augmentation), and another set evaluating the effectiveness of synthetic data for testing. In the replacement experiments we gradually replace real data with synthetic data and measure the degradation in performance for each synthetic dataset. In the augmentation experiments we augment the available real data with additional synthetic data and measure the improvement in performance. Finally, we conduct a set of experiments to evaluate the usefulness of synthetic data for testing. We do this by comparing performance metrics calculated with different subsets of real data against different subsets of synthetic data. In addition, we perturb the model by deliberately degrading the training data (e.g., by deliberately mislabeling subsets) and verifying that the resulting degradation in performance as calculated with synthetic data tracks the degradation as calculated with real data. For each of the synthetic datasets, we compare the results with the original synthetic data quality evaluation we calculated in our baseline.
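A compact, hypothetical version of the replacement experiment described above, with Gaussian stand-ins for the real and synthetic datasets and a logistic-regression classifier; the point is the shape of the degradation curve as the synthetic fraction grows, not the specific numbers:

```python
# Sketch of the replacement experiment: train with progressively more
# synthetic data substituted for real data and track accuracy on held-out
# real data. The "synthetic" generator is given a deliberate distribution
# shift to mimic an imperfect synthetic pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shift):
    """Two-class Gaussian data; `shift` mimics synthetic-vs-real mismatch."""
    X0 = rng.normal(0.0 + shift, 1.0, size=(n // 2, 5))
    X1 = rng.normal(1.0 + shift, 1.0, size=(n // 2, 5))
    return np.vstack([X0, X1]), np.r_[np.zeros(n // 2), np.ones(n // 2)]

X_real, y_real = make_data(1000, shift=0.0)
X_syn, y_syn = make_data(1000, shift=0.3)    # imperfect synthetic generator
X_test, y_test = make_data(1000, shift=0.0)  # held-out real data

for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    n_syn = int(frac * len(y_real))
    X = np.vstack([X_real[n_syn:], X_syn[:n_syn]])
    y = np.r_[y_real[n_syn:], y_syn[:n_syn]]
    acc = LogisticRegression(max_iter=1000).fit(X, y).score(X_test, y_test)
    print(f"synthetic fraction {frac:.2f}: real-data test accuracy {acc:.3f}")
```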
Speaker Info: Jeffery Hansen, ML Research Scientist, Software Engineering Institute

Presentation | 2025

A Quest Called TRIBE: Clustering Malware Families for Enhanced Triage and Analysis
Session Recording | Materials
Abstract: According to AV-Test, roughly 450,000 new malicious programs are detected each day, adding to a total number of malware signatures that stands at almost 1.5 billion. These known signatures are analyzed and given labels by antivirus companies to classify the malware. These classifications allow security operation centers or antivirus programs to more easily take action to prevent or stop costly damage. In recent years, polymorphic malware, malware that intentionally obfuscates its behavior and signature, has seen a rise in prevalence. We aim to show that current antivirus classifications inefficiently group malware, especially in the case of polymorphic malware, that shares enough intrinsic similarities with other malware to justify consolidation into broader groupings we are calling tribes. We hypothesize that the consolidation of these labels will reduce the time it takes for analysts to classify malware, thus lowering incident response time. This generalized labeling will be implemented through the use of a transformer-based sequence-to-sequence variational autoencoder that takes in a malware binary and produces a clustering based on its distinct characteristics. We are naming this method the Tribal Relational Inferential Encoder (TRIBE). The use of autoencoders in malware classification has shown promise in accurately labeling malware. TRIBE will perform unsupervised learning to independently create these tribes, or generalized labels, and compare the results to existing labeling schemes. We estimate three outcomes from this research: a dataset of existing malware antivirus families with associated malware, a trainable autoencoder tool that will produce robust malware tribes, and a classifier that will make use of tribes to label malware.
Speaker Info: Justin Liaw, Student, United States Naval Academy
We are MIDN 1/C Justin Liaw, MIDN 1/C Michael Chen, and MIDN 1/C John Jenness, seniors at the United States Naval Academy working with Professor Dane Brown and Commander Edgar Jatho on our final capstone project. We are Computer Science and Cyber Operations dual majors, all commissioning into different occupational fields in the Navy and Marine Corps.

| 2025

A Quest for Meaningful Performance Metrics in Multi-Label Classification
Materials
Abstract: Multi-label classification is a crucial tool in various applications such as image classification and document categorization. Unlike single-label classification, the evaluation of multi-label classification performance is complex because a model's prediction can be partially correct, capturing some labels while missing others. This lack of a straightforward binary correct/incorrect outcome introduces challenges in model testing and evaluation. Model developers rely on a collection of metrics to fine-tune these models for optimal performance; however, existing metrics are mostly adapted from single-label contexts. This has led to a large array of metrics which are difficult to interpret and may not adequately reflect model performance in a practical manner. To address this issue, we designed an experiment which replaced multi-label classification algorithms with a random process to evaluate how metrics perform relative to prescribed classifier and dataset characteristics. The relationships between metrics, and how those relationships change relative to dataset attributes, were investigated with the goal of down-selecting to a spanning set. Additionally, we explored the potential for developing more interpretable metrics by incorporating dataset characteristics into model evaluation.
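The random-process experiment is easy to reproduce in miniature: score a label-wise coin flip under several common multi-label metrics and watch their baselines move with label density. A sketch, assuming scikit-learn's metric implementations:

```python
# Replace the multi-label classifier with a random process and observe how
# common metrics behave as the dataset's label density changes.
import numpy as np
from sklearn.metrics import hamming_loss, f1_score, accuracy_score

rng = np.random.default_rng(7)
n_samples, n_labels = 2000, 10

for density in (0.1, 0.3, 0.5):
    y_true = (rng.random((n_samples, n_labels)) < density).astype(int)
    y_pred = (rng.random((n_samples, n_labels)) < density).astype(int)  # random "model"
    print(f"density={density:.1f}  "
          f"hamming={hamming_loss(y_true, y_pred):.3f}  "
          f"micro-F1={f1_score(y_true, y_pred, average='micro', zero_division=0):.3f}  "
          f"subset-acc={accuracy_score(y_true, y_pred):.3f}")
```

Even in this toy run the metrics move in different directions with density, which is the interpretability problem the abstract describes.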
Speaker Info: Marie Tuft, Senior R&D Statistician, Sandia National Laboratories
Marie Tuft is a Senior Statistician at Sandia National Laboratories. Her work involves AI/ML evaluation with emphasis on human interaction with algorithms, statistical methods development for faster industrial product realization, and communication of data for risk-informed decision making. She earned an Honors BS in Mathematics from the University of Utah and a PhD in Biostatistics from the University of Pittsburgh.

Presentation | 2025

Achieving Predictable Delivery with Credible Modeling, Simulation, and Analysis
Session Recording | Materials
Abstract: Over the past three decades, an evolution has occurred in popular terminology, beginning with “simulation-based acquisition” (SBA), transitioning to “digital engineering,” and then to “model-based systems engineering.” Although these three terms emphasize different aspects, the consistent, unspoken element that underlies all of them is credible modeling, simulation, and analysis (CMSA). It is CMSA that enables predictable delivery. We use the word “delivery” to encompass three aspects that are relevant to SBA: (a) satisfying design intent, (b) meeting the cost/schedule goal, and (c) reaching the production throughput target. The word “credible” implies using the language of probability to quantify uncertainty to support decision-making within estimated risk/reward bounds to achieve predictable delivery. We introduce a Predictability Bayes Net (PBN) that represents, from a contractor’s perspective, top-level dependencies between modeling and simulation (M&S) activities and standardized workflows across a population of programs. The PBN describes how to meet SBA objectives, or to diagnose and learn why objectives were not met. For example, given failure to deliver at-cost and on-time, the PBN computes marginal probabilities that suggest the most likely sequence of events leading to this failure. The PBN does this by linking verified and validated M&S to standardized workflows, thereby transitioning from CMSA to assured physical delivery. Our previous publications focused on CMSA for design engineering, using Bayes Nets to assess compliance with system performance requirements. We have now expanded CMSA to include production cost estimation and factory throughput modeling. The PBN includes top-level elements such as mission stability, technology stretch, workforce experience level, standardized workflow adherence, and supplier responsiveness. A Bayes Net includes a set of event nodes and node state definitions. These definitions become metrics to hold ourselves accountable for product delivery. The PBN is a joint probability distribution; it includes conditional probability estimates, which are based on a combination of opinions from subject matter experts and data from developmental and operational test events. Sufficient, relevant data from these test events is crucial. It supports M&S verification and validation and, when standardized workflows are followed, embodies CMSA. Without this data, M&S is non-credible, and predictable delivery becomes impossible. The PBN is of mutual interest to both the DoD and contractors. Early phases in a program cast the largest shadow over a system’s ultimate cost, performance, and production throughput. Given imperfect, limited, or partially shared information available in early program phases, probabilistic inference becomes critical for optimal decision-making using this information. The PBN facilitates SBA through CMSA by (a) first bringing clarity to fine-grained communication using the language of probability, and (b) then quantifying the uncertainty that exists at a specific moment of decision. The PBN also serves as a starting point for building lower-level Bayes Nets to answer targeted queries regarding a program’s execution, including aspects of design, supply chain, and production. The PBN is the mechanism for achieving predictable delivery.
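To make the diagnostic query concrete, here is a toy net in the spirit of the PBN, with two invented upstream conditions and made-up probabilities; conditioning on a late delivery yields the kind of marginal probabilities the abstract describes (the real PBN has many more nodes and expert-elicited tables):

```python
# Toy diagnostic query: given a failure to deliver on time, which upstream
# condition is most likely? Structure and probabilities are invented.
from itertools import product

p_workflow = {True: 0.8, False: 0.2}    # P(standardized workflow adhered to)
p_experience = {True: 0.7, False: 0.3}  # P(experienced workforce)
# P(on-time delivery | workflow adherence, workforce experience)
p_deliver = {(True, True): 0.95, (True, False): 0.80,
             (False, True): 0.70, (False, False): 0.40}

# Enumerate the joint distribution and condition on delivery failure.
joint_fail = {}
for wf, exp in product([True, False], repeat=2):
    joint_fail[(wf, exp)] = p_workflow[wf] * p_experience[exp] * (1 - p_deliver[(wf, exp)])
z = sum(joint_fail.values())

p_wf_fail = sum(v for (wf, _), v in joint_fail.items() if not wf) / z
p_exp_fail = sum(v for (_, exp), v in joint_fail.items() if not exp) / z
print(f"P(workflow not followed | late) = {p_wf_fail:.3f}")
print(f"P(inexperienced workforce | late) = {p_exp_fail:.3f}")
```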
Speaker Info: Terril Hurst, Senior Engineering Fellow, Raytheon
Terril Hurst is a Senior Engineering Fellow at Raytheon in Tucson, Arizona. Before coming to Raytheon in 2005, Terril worked for 27 years at Hewlett-Packard Laboratories on computer data storage physics, devices, and distributed file systems. He received his Bachelors, Masters, and PhD degrees in Applied Mechanics at Brigham Young University and completed a post-doctoral appointment at Stanford University in Artificial Intelligence. At Raytheon, Dr. Hurst is responsible for teaching credible modeling, simulation, and analysis, and working with programs to assure quantitative rigor in the verification, validation, and usage of modeling and simulation. He has presented his work regularly at DATAWorks for over 15 years.

| 2025

Active Learning with Deep Gaussian Processes for Trimmed Aero Database Construction
Session Recording | Materials
Abstract: Well-characterized aerodynamic databases are necessary for accurate simulation of flight dynamics. High-fidelity CFD is computationally expensive, and thus we use a surrogate model to represent the database. By utilizing active learning, we can efficiently generate samples for the database and target areas of the design space that are most useful for flight simulation. Here we focus on trimmed aero databases, where our goal is to find regions where multiple moments are simultaneously zero, and we use a novel contour active learning approach to achieve this goal. Entropy-based methods are well explored in computer experiment research on reliability; however, sequential design for estimating multiple contours is less studied. We compare multiple entropy-based approaches for estimating contours for multiple responses simultaneously. This task requires the development of new metrics to evaluate the performance of the active learning strategies. We apply these active learning methods and metrics to both Gaussian Process and Deep Gaussian Process surrogate models. The performance of this approach is evaluated with multiple examples and applied to a reference vehicle as a simulation study.
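One common contour-targeting acquisition is the straddle heuristic, which balances proximity to the zero contour against posterior uncertainty; a single-response sketch with an ordinary GP and a stand-in moment function (the talk's entropy-based, multi-response, deep-GP methods are richer than this):

```python
# One step at a time, sample where the GP posterior straddles the zero
# contour: score = 1.96*sd - |mu| is large near the contour or where the
# surrogate is uncertain. Single response for illustration; multiple trim
# conditions could combine per-response scores.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def pitching_moment(alpha):                 # stand-in for expensive CFD
    return np.sin(3 * alpha) - 0.3 * alpha

rng = np.random.default_rng(3)
X = rng.uniform(0, 2, size=(6, 1))          # initial design
y = pitching_moment(X).ravel()

grid = np.linspace(0, 2, 400).reshape(-1, 1)
for step in range(5):
    gp = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-6,
                                  normalize_y=True).fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    score = 1.96 * sd - np.abs(mu)          # straddle acquisition
    x_next = grid[np.argmax(score)]
    X = np.vstack([X, x_next])
    y = np.append(y, pitching_moment(x_next))
    print(f"step {step}: sampled alpha = {x_next[0]:.3f}")
```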
Speaker Info: Kevin Quinlan, Lawrence Livermore National Laboratory
Kevin Quinlan is staff in the Applied Statistics Group at Lawrence Livermore National Laboratory. He completed his PhD in statistics at Penn State. His main research interests are design of computer experiments, Gaussian process modeling, and active learning.

Presentation | 2025

Addressing Ambiguity in Detection & Classification Tests
Session Recording | Materials
Abstract: Ambiguity in how to associate system detections with ground truth measurements poses difficulty in interpreting system evaluation tests. As an example, we discuss the tests performed for the Strategic Environmental Research and Development Program and the Environmental Security Technology Certification Program (SERDP/ESTCP), meant to assess novel systems which detect & classify underwater Unexploded Ordnance (UXO). Due to the larger uncertainties associated with underwater environments, these tests frequently have ambiguities. The Institute for Defense Analyses (IDA), tasked with scoring these SERDP/ESTCP tests, developed and implemented a scoring methodology which interprets tests with ambiguities. This talk will introduce the basics of non-ambiguous detection & classification scoring, discuss methods IDA has used to address ambiguity, and discuss how this approach is applied to produce graphs and tables which can be interpreted by relevant stakeholders.
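Non-ambiguous scoring of this kind often reduces to a gated assignment problem between detections and ground truth; a minimal sketch with invented coordinates (generic practice, not necessarily IDA's specific methodology):

```python
# Minimal detection-to-ground-truth association: optimal one-to-one
# assignment on distance, with a gate beyond which a pairing is disallowed.
import numpy as np
from scipy.optimize import linear_sum_assignment

truth = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])        # ground-truth UXO
detections = np.array([[0.4, -0.2], [5.5, 4.6], [20.0, 20.0]])  # system output
GATE = 2.0  # maximum association distance (halo), same units as coordinates

cost = np.linalg.norm(truth[:, None, :] - detections[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)
for t, d in zip(rows, cols):
    if cost[t, d] <= GATE:
        print(f"truth {t} matched to detection {d} (dist {cost[t, d]:.2f})")
    else:
        print(f"truth {t} missed; detection {d} scored as a false alarm")
```

Ambiguity enters when several detections fall inside one gate, or one detection inside several; the talk's methodology addresses those cases, which this sketch does not.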
Speaker Info: Tyler Pleasant, Research Associate, IDA
Tyler Pleasant is a Research Associate at the Institute for Defense Analyses (IDA) within the Science, Systems, and Sustainment division. He holds an M.S. in Chemistry from the University of Chicago and a B.S. in Physics and Mathematics from the Massachusetts Institute of Technology. In addition to his detection and classification scoring work at IDA, he works on model verification & validation, data analysis, statistical testing, and technology assessments.

Presentation | 2025

Advancing the Test Science of LLM-enabled Systems: A Survey of Factors and Conditions that
Session Recording | Materials
Abstract: Regardless of test design method (combinatoric, Design of Experiments, a narrow robustness study, etc.), a scientifically rigorous experiment must understand, manage, and control the variables that impact test outcomes. For most scientific fields, this is settled science with decades, even centuries, of formalism and honed methodology. For the emerging field of Large Language Models (LLM) in military weapon systems, it is the wild west. This presentation will survey the factors and conditions that impact LLM test outcomes, along with supporting literature and practical methods, models, and measures for your use in tests. The presentation will also highlight: 1) the statistical assumptions that underlie the common LLM performance metrics and how to test those assumptions; 2) how to evaluate a benchmark for its utility in addressing measures of performance, as well as checking the benchmark's statistical validity; 3) practical models, and supporting literature, for binning factors into levels of severity (conditions); 4) resources for ensuring a User-centered test design; and 5) incorporating selected adversarial techniques. These resources and techniques are immediately actionable (you can even try them out on your device and favorite LLM during the session) and will equip you to navigate the complexity of scientific test design for LLM-enabled systems.
Speaker Info: Karen O'Brien, Sr. Principal Data Scientist, Modern Technology Solutions, Inc
Karen O’Brien is a senior principal data scientist and AI/ML practice lead at Modern Technology Solutions, Inc. In this capacity, she leverages her 20-year Army civilian career as a scientist, evaluator, ORSA, and analytics leader to aid DoD agencies in implementing AI/ML and advanced analytics solutions. Her Army analytics career ranged ‘from ballistics to logistics’, and most of her career was at Army Test and Evaluation Command or supporting Army T&E from the Army Research Laboratory. She was a physics and chemistry nerd in the early days but now uses her M.S. in Predictive Analytics from Northwestern University to help her DoD clients tackle the toughest analytics challenges in support of the nation’s Warfighters. She is the Co-Lead of the Women in Data Huntsville Chapter, a guest lecturer in data and analytics graduate programs, and an ad hoc study committee member at the National Academy of Sciences.

| 2025

AI/ML and UQ in Systems Engineering
Session Recording | Materials
Abstract: The integration of Artificial Intelligence (AI), Machine Learning (ML), and Uncertainty Quantification (UQ) is transforming aerospace systems engineering by improving how systems are designed, tested, and operated. This presentation explores the transformative role of large language models (LLMs) and UQ in tackling the high stakes and inherent uncertainty of space systems. LLMs streamline requirements analysis, enable intelligent creation and querying of design documents, and accelerate development timelines, while UQ provides robust risk assessments, predictive modeling, and cost-saving opportunities throughout the mission lifecycle. Effectively managing uncertainty is critical at every stage of the project lifecycle, from early design formulation to on-orbit operations. This presentation highlights practical applications of UQ in space mission formulation and science data pipelines, as well as its role in assessing risk and enhancing system reliability. It also examines how LLMs improve the development and analysis of system documentation, enabling more agile and informed decision-making in complex projects. By integrating LLMs and UQ into systems engineering, aerospace teams can better manage complexity, enhance system resilience, and achieve cost-effective solutions. This presentation offers key insights, lessons learned, and future opportunities for advancing systems engineering with AI/ML and UQ.
Speaker Info: Kelli McCoy, Senior Systems Engineer, NASA JPL + USC
Kelli McCoy is currently a Senior Systems Engineer at NASA Jet Propulsion Laboratory, working to promote the infusion of Uncertainty Quantification, Machine Learning, and risk-informed decision-making practices across the Systems Engineering organization. Before joining JPL, Kelli gained valuable experience at NASA Headquarters and Kennedy Space Center. Her research interests include Statistical Learning Theory, Digital Twin analytics, and Probabilistic Risk Analysis.

Presentation | 2025

An Optimization Approach for Improved Strategic Material Shortfall Estimates
Session Recording | Materials
Abstract: Material supply chains are complex, global systems that drive the production of goods and services, from raw material extraction to manufacturing processes. As the U.S. becomes increasingly dependent on foreign production of strategic and critical materials (S&CMs), our nation's security will demand effective analyses of material supply chains to identify potential shortages and suggest ways to alleviate them. The Institute for Defense Analyses (IDA) developed the Risk Assessment and Mitigation Framework for Strategic Materials (RAMF-SM) to help the Department of Defense identify and resolve potential shortfalls of S&CMs in the National Defense Stockpile Program. The Stockpile Sizing Module (SSM) is the main computational vehicle in the RAMF-SM suite of models and is used to estimate shortfalls of S&CMs during national emergencies. This talk presents a multicommodity flow model that extends the SSM's network linear programming formulation by explicitly representing two stages of supply, commonly categorized as mining and refining, and tracking material throughout these stages while incorporating decrements to supply. This more accurate representation offers a robust framework for analyzing material production dynamics, enabling precise shortfall calculations and the identification of bottlenecks. While focused on two production stages, this work lays critical groundwork for future extensions to a comprehensive multi-stage production model.
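A toy version of the two-stage flow idea: with one material, stage capacities, and a refining yield, the maximum deliverable quantity is a small LP, and the shortfall falls out directly (all numbers invented; the SSM formulation is far richer):

```python
# Toy two-stage (mining -> refining) supply model: the maximum material that
# can be delivered is a linear program; shortfall = demand - that maximum.
from scipy.optimize import linprog

mine_cap, refine_cap, demand = 100.0, 70.0, 90.0
yield_frac = 0.8   # refining loss: output <= 0.8 * ore input

# Variables: x = [ore_mined, material_refined]; maximize refined output.
res = linprog(c=[0.0, -1.0],
              A_ub=[[-yield_frac, 1.0]],   # refined <= yield * ore
              b_ub=[0.0],
              bounds=[(0, mine_cap), (0, refine_cap)])
delivered = -res.fun
print(f"deliverable: {delivered:.1f}, shortfall: {max(0.0, demand - delivered):.1f}")
```

Here the refining stage is the binding constraint, which is exactly the kind of bottleneck identification the two-stage representation enables.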
Speaker Info: Dominic Flocco, Research Associate, Institute for Defense Analyses
Dominic C. Flocco is a Research Associate in the Strategy, Forces, and Resources Division at the Institute for Defense Analyses (IDA) and a Ph.D. Candidate in Applied Mathematics & Statistics, and Scientific Computing at the University of Maryland, College Park. He specializes in mathematical optimization and equilibrium modeling, with applications in operations research, energy economics, game theory, supply chain management, and defense logistics. At IDA, his analytic work supports the Defense Logistics Agency's effort to assess the risks and vulnerabilities in strategic and critical material supply chains for the National Defense Stockpile.

| 2025

Application of DOD's VAULTIS Data Management Framework to Testing
Session Recording | Materials
Abstract: The Department of Defense has realigned its approach to data management. Prior to 2020, data was viewed as a strategic risk for the Department. Now it is seen as a strategic asset that will position the Department for joint all-domain operations and artificial intelligence applications. Commensurate with Department-level data policy, the Director, Operational Test and Evaluation has published new policy that test programs shall create data management plans to make test data VAULTIS (visible, accessible, understandable, linked, trustworthy, interoperable, and secure). In this briefing, I will motivate the necessity for testers to take an intentional approach to data management, tour the new policies for test data, and provide an overview of the Data Management Plan Guidebook, an approach to planning for test data management that is in line with DOD's VAULTIS framework.
Speaker Info: John Haman, Research Staff, Institute for Defense Analyses
I have been a member of the research staff at the Institute for Defense Analyses since 2018. I lead the Test Science team, a team of statisticians, mathematicians, psychologists, and neuroscientists focused on methodological and workforce improvements in testing and evaluation. My overall research interest is identifying effective and pragmatic statistical methods that align to DOD assumptions and analytics goals. I earned a PhD in statistics from Bowling Green State University for my work in energy statistics under the direction of Maria Rizzo.
Presentation | 2025

Approximate Bayesian inference for neural networks: a case study in analysis of spectra
Materials
Abstract: Bayesian neural networks (BNNs) combine the remarkable flexibility of deep learning models with principled uncertainty quantification. However, poor scalability of traditional Bayesian inference methods such as MCMC has limited the utility of BNNs for uncertainty quantification (UQ). In this talk, we focus on recent advances in approximate Bayesian inference for BNNs and seek to evaluate, in the context of a real application, how useful these approximate inference methods are in providing UQ for scientific applications. As an example application, we consider prediction of chemical composition from laser-induced breakdown spectroscopy measured by the ChemCam instrument on the Mars rover Curiosity, which was designed to characterize Martian geology. We develop specialized BNNs for this task and apply multiple existing approximate inference methods. We evaluate the quality of the posterior predictive distribution under different inference algorithms and touch on the utility of approximate inference schemes for other tasks, including model selection.
Speaker Info: Natalie Klein, Statistician, Los Alamos National Laboratory
Dr. Natalie Klein is a staff scientist in the Statistical Sciences group at Los Alamos National Laboratory. Natalie's research centers on the development and application of statistical and machine learning approaches in a variety of application areas, including hyperspectral imaging, laser-induced breakdown spectroscopy, and high-dimensional physics simulations. Dr. Klein holds a joint Ph.D. in Statistics and Machine Learning from Carnegie Mellon University.

Presentation | 2025

April 23 - Keynote
Session Recording | Materials
Speaker Info: Joseph Lyons, Senior Scientist for Human-Machine Teaming, Air Force Research Laboratory
Joseph B. Lyons, a member of the scientific and professional cadre of senior executives, is the Senior Scientist for Human-Machine Teaming, 711th Human Performance Wing, Human Effectiveness Directorate, Air Force Research Laboratory, Wright-Patterson AFB, Ohio. He serves as the principal scientific authority and independent researcher in the research, development, adaptation, and application of Human-Machine Teaming. Dr. Lyons began his career with the Air Force in 2005 in the Human Effectiveness Directorate, Wright-Patterson AFB, Ohio. Dr. Lyons has served as a thought leader for the DoD in the areas of trust in autonomy and Human-Machine Teaming. Dr. Lyons has published over 100 technical publications including 64 journal articles in outlets focused on human factors, human-machine interaction, applied psychology, robotics, and organizational behavior. Dr. Lyons also served as Co-Editor for the 2020 book, Trust in Human-Robot Interaction. Dr. Lyons is an AFRL Fellow, a Fellow of the American Psychological Association, and a Fellow of the Society for Military Psychologists. Prior to assuming his current position, Dr. Lyons served as a Program Officer for the Air Force Office of Scientific Research and was a Principal Research Psychologist within the Human Effectiveness Directorate.
Keynote | 2025

April 24 – Keynote
Session Recording | Materials
Speaker Info: David Salvagnini, Chief Data Officer / Chief Artificial Intelligence Officer, NASA
David Salvagnini serves as the Chief Data Officer at NASA. Since joining NASA in June 2023, David's role recently expanded to include his appointment in May 2024 as the Chief Artificial Intelligence Officer. In these roles, David will leverage synergies between these critical positions, especially in assuring data readiness in support of responsible and transparent artificial intelligence (AI). David formerly served as the Director of the Intelligence Community Chief Information Officer (IC CIO) Architecture and Integration Group (AIG) and Chief Architect. In these roles, he worked with Intelligence Community elements and 5-Eye Enterprise (5EEE) international partners on the development and implementation of reference architectures for interoperability, data sharing, and technical advancement of Information Technology (IT) infrastructure, data services, foundational AI services, and other mission capabilities. Before joining the IC CIO, Mr. Salvagnini held a variety of positions at the Defense Intelligence Agency (DIA), to include Chief Information Office (CIO) Technical Director, Chief Data Officer, and Deputy Chief of the Enterprise Cyber and Infrastructure Services Division. In these roles, David supported the deployment of AI capabilities and the development of analytic tradecraft related to AI use as part of intelligence production. He was appointed to the Senior Executive Service as the Senior Technical Officer for Enterprise IT and Cyber Operations at DIA in June 2016. Mr. Salvagnini joined DIA as a civil servant in May 2005. Prior to his selection as a senior executive, he served on a joint duty assignment as the Chief Architect for the IC Desktop Environment (IC DTE). In that position, he was responsible for DIA and National Geospatial-Intelligence Agency (NGA) adoption of DTE services, and the transitioning of over 57,000 personnel to the IC Information Technology Enterprise (IC ITE). Previously, Mr. Salvagnini held numerous key leadership positions, to include Deputy Chief, Infrastructure Integration; Deputy Chief, Applications Operations; and Acting Chief, Infrastructure Innovation Division. Mr. Salvagnini's experience includes all aspects of enterprise IT service delivery, including research, engineering, testing, security, and operations. Mr. Salvagnini retired from the Air Force as a Communications and Computer Systems Officer in May 2005 after having served in a variety of leadership assignments during his 21-year career. He is a native of Setauket, New York and resides with his family in Falls Church, Virginia.

Keynote | 2025

Army Evaluation Center's Progression and Advancement of Design of Experiments
Session Recording | Materials
Abstract: Army Test and Evaluation Command (ATEC) has been an advocate of Design of Experiments (DOE) since the inception of DOT&E DOE policies in the early 2000s. During that time, test designs were primarily D-optimal designs for Operational Tests. As ATEC has endorsed a shift-left mindset, meaning collecting and utilizing Developmental Testing to prove out performance metrics, more statistical analysis models, and therefore more test design options, became applicable.
With the emphasis on these new avenues for test and evaluation, ATEC has expanded its use of DOE outside of what the test community generally considers the standard Operational and even Developmental Test events. This talk will include test designs for deterministic data, sampling from models, modeling and simulation vs. live data, and initial artificial intelligence test cases within ATEC. The designs discussed are not novel in their statistical approach but do shed light on the advances ATEC has made in implementing DOE across multiple stages of test and evaluation.
Speaker Info: Shane Hall, Analytics and Artificial Intelligence Division Chief, Army Evaluation Center
Shane Hall graduated from Penn State University in 2011 with a Bachelors in Statistics and a Masters in Applied Statistics. He has worked as a CIV for the US Army for 15 years. Mr. Hall started his Army career at the US Army Public Health Command, where he was the Command Statistician. He then transitioned to the Army Evaluation Center at Aberdeen Proving Ground, MD, where he first started as a Statistician and has since transitioned into the role of Division Chief for the Analytics and Artificial Intelligence Division.

Presentation | 2025

Automated User Feedback in Software Development
Materials
Abstract: Frequent user engagements are critical to the success of modern software development, but documentation lacks precise and structured guidance on when and how programs should obtain user feedback. The Software Acquisition Pathway (SWP) was introduced for software-intensive systems as part of the new Adaptive Acquisition Framework (AAF). DOD Instruction 5000.87, "Operation of the Software Acquisition Pathway," recognizes the unique characteristics of software development and acquisition, including the rapid pace of change in software technologies and the iterative and incremental nature of modern software development methodologies. Success in developing software-intensive systems will require agile and iterative development processes that incorporate user feedback throughout the development process. This work merges established survey principles with the agile, iterative methods necessary to facilitate rapid delivery of a software capability to the user. Four critical milestones in the software development process were identified to gather user feedback throughout the SWP. Examples are given for building effective surveys to gain insight from the user. This work is presented as a framework for collecting actionable user feedback and generating analysis plans and automated reports at the identified key points. Incorporating user feedback early and throughout software development reduces the risk of developing the wrong product.
Speaker Info: Brittany Fischer, Statistician, STAT COE
Ms. Brittany Fischer is a contractor with Huntington Ingalls Industries (HII) working at the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE). As a STAT expert, she supports DOD programs through direct consultation, applying tailored solutions that deliver insight, and conducts applied research. Before joining the STAT COE, she spent five years as a statistical engineer with Corning, Inc. gathering and analyzing data to solve problems and inform decisions. Her project experience includes collaborating with cross-functional teams for new product development, manufacturing, reliability, and quality systems.
Presentation | 2025

Bayes Factor Approach for Calculating Model Validation Live Test Size
Materials
Abstract: Model validation is critical to using models to evaluate systems, but validation requires resource-limited live testing. If a simulated model shows the system meets requirements, then the model needs to be validated with live data. Bayes Factors can be applied to determine whether live data is more consistent with the model or with an alternative that the system does not meet requirements. If the Bayes Factor is sufficiently large, then the evidence shows that the model aligns better with the data and therefore adequately represents the system. This presentation shows how Bayes Factors can be used to validate a model and how to size live tests when using this method. This approach is demonstrated for a model that predicts successes and failures. Examples illustrating the methods will be shown, and the effect of the factors influencing the required number of tests will be discussed. The simulation results can be represented with a beta distribution that captures the probability of success and compared against an alternative distribution.
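The beta-binomial version of this calculation is short enough to sketch: compare the marginal likelihood of a plausible live-test outcome under a model-derived beta distribution against an alternative, and watch the Bayes factor grow with test size (all shape parameters invented):

```python
# Sketch of the Bayes-factor sizing idea: the simulation yields a Beta
# distribution for success probability; an alternative Beta encodes "does
# not meet requirements." For candidate live-test sizes, compute the Bayes
# factor for a representative outcome via beta-binomial marginal likelihoods.
from scipy.stats import betabinom

model_ab = (45, 5)   # simulation-backed belief, mean ~0.9 (invented)
alt_ab = (8, 2)      # alternative: system falls short, mean ~0.8 (invented)

for n in (5, 10, 20, 40):
    y = round(0.9 * n)   # outcome we would expect if the model is right
    bf = betabinom.pmf(y, n, *model_ab) / betabinom.pmf(y, n, *alt_ab)
    print(f"n={n:3d}, observed {y}/{n}: Bayes factor (model vs alt) = {bf:.2f}")
```

Sizing the live test then amounts to finding the smallest n for which the expected Bayes factor clears a chosen evidence threshold.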
Speaker Info: James Theimer, STAT Expert, HS COBP
Dr. James Theimer is a Scientific Test and Analysis Techniques Expert employed by Huntington Ingalls Industries Technical Solutions and working to support the Homeland Security Center of Best Practices. Dr. Theimer worked for the Air Force Research Laboratory and predecessor organizations for more than 35 years. He carried out research on the simulation of sensors and devices as well as the analysis of data.

| 2025

Bayesian Reliability Assurance Testing Made Easy
Session Recording | Materials
Abstract: A common challenge for the Department of Defense is determining how long a system should be tested to adequately assess system reliability. Unlike traditional reliability demonstration tests, which rely solely on current test data and often demand an impractical amount of testing, Bayesian methods aim to improve test plan efficiency by incorporating previous knowledge of a system's reliability. Although prior information is often available and has the potential to improve test plan efficiency, evaluators face challenges in applying Bayesian methods to test planning and analysis due to the lack of readily available tools. To address this gap, we developed an easy-to-use R Shiny application which automatically generates recommended test plans to assess system reliability using all available information about a system. Researchers can also use the application to estimate a system's reliability using Bayesian methods. Through a case study, we show that leveraging prior information in the analysis of test data can yield a narrower range of uncertainty for reliability estimates compared to traditional methods.
Speaker Info: Emma Mitchell, UNC
Emma is a fifth-year Ph.D. candidate in Statistics at the University of North Carolina, Chapel Hill, advised by Dr. Jan Hannig and Dr. Corbin Jones. Emma concentrates on methodological advances in the analysis of genomic data using Bayesian modeling and multi-view data integration techniques. She also interned this summer with the Institute for Defense Analyses, where she used her knowledge of Bayesian statistics to design an R Shiny application for Bayesian reliability assurance testing.

| 2025

Bayesian Statistical Analysis for Mass Spectrometric Data Processing
Session Recording | Materials
Abstract: High-precision mass spectrometry (MS) is a key technology in advancing nuclear nonproliferation analytical capabilities and enabling mission organizations at Savannah River National Laboratory (SRNL). Despite the centrality of precision MS to SRNL projects and organizations, no software currently exists with either the basic or advanced data analysis tools and data interactivity functions necessitated by modern high-precision MS, including thermal ionization mass spectrometry (TIMS) and multicollector–inductively coupled plasma–mass spectrometry (MC-ICP-MS). The absence of transparent, optimizable, multifaceted data management and analytics tools in commercial MS software is further compounded by the non-user-friendly and cumbersome experience of using manufacturer-supplied software, whose interface is often buggy and prone to copy/paste errors, mislabeling, and misassignment of samples in a serialized autosampler queue (MC-ICP-MS) or filament turret (TIMS). If left unchecked, such seemingly small user "quality of life" considerations can lead to loss of precious instrument time and/or spuriously reported customer sample results. Insofar as open-source alternatives are concerned, the relevance of the few open-source data tools that do exist is inherently limited. Thus, we aim to develop a comprehensive data analytics software package and R Shiny graphical user interface (GUI) that focuses on flexibility, transparency, and reproducibility. Within the GUI, the implementation of a Bayesian framework supports the national security focus of DATAWorks 2025 by allowing for cross validation, more conservative uncertainty classification, and improved model performance. We suggest the implementation of Markov chain Monte Carlo (MCMC) and other sampling algorithms to better standardize and quantify the distribution of traceable isotope ratios in support of high-precision mass spectrometry data processing. Additionally, we demonstrate the implementation of various priors, including hierarchical, uninformative, and informative, to allow for further flexibility in model construction. In doing so, we aim to emphasize how mass spectrometrists at US National Laboratories, in academia, and beyond can seamlessly implement Bayesian, data-driven analysis into their own research.
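As a minimal illustration of the MCMC building block described above, a Metropolis sampler for a single isotope ratio under a normal measurement model and a weakly informative prior (data and prior settings invented; a production tool would add convergence diagnostics and hierarchical structure):

```python
# Minimal Metropolis sampler for an isotope ratio: normal measurement model,
# lognormal (weakly informative) prior on the ratio. All values invented.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(0.0072, 0.0001, size=25)   # stand-in measured ratios
sigma = 0.0001                               # assumed measurement sd

def log_post(r):
    if r <= 0:
        return -np.inf
    log_prior = -0.5 * ((np.log(r) - np.log(0.007)) / 0.5) ** 2
    log_like = -0.5 * np.sum(((data - r) / sigma) ** 2)
    return log_prior + log_like

samples, r = [], 0.007
for _ in range(20000):
    prop = r + rng.normal(0, 5e-5)           # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(r):
        r = prop
    samples.append(r)

post = np.array(samples[5000:])              # drop burn-in
print(f"posterior ratio: {post.mean():.5f} +/- {post.std():.5f}")
print(f"95% credible interval: {np.percentile(post, [2.5, 97.5])}")
```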
Speaker Info: Ellis McLarty, Graduate Student, Clemson University and Savannah River National Laboratory
Statistician Ellis McLarty of Greenville, South Carolina works in statistical computing, applied data analysis, and mathematical statistics. She is a Technical Graduate Intern for Savannah River National Laboratory and a graduate student at Clemson University's School of Mathematical and Statistical Sciences. At Savannah River National Laboratory (SRNL), Ellis researches uncertainty characterization methods in a Laboratory Directed Research and Development project entitled "Rapid, rigorous, reproducible data analysis software for high precision mass spectrometry." Specifically, she is incorporating Bayesian inference into the measurement uncertainty classification of traceable isotope ratios for elements such as uranium and plutonium. This project is a multi-directorate effort between Environment and Legacy Management and Global Security at SRNL. At Clemson University's Statistical and Mathematical Consulting Center, Ellis researches data characterization and classification methods as a consultant for Cotton Incorporated. Additionally, she routinely performs data management and statistical programming for Clemson University Cooperative Extension program evaluation. Ellis is also a classically trained cellist. In recent years, she has enjoyed exploring the intersection of mathematics and music. Ellis holds a Bachelor of Science in Mathematics, a Bachelor of Arts in Music Performance, and a Performance Certificate in Cello Performance from the University of South Carolina – Columbia.

| 2025

Calculation of Continuous Surrogate Load Distributions from Discrete Resampled Data
Session Recording | Materials
Abstract: Aerodynamic design optimization and database generation have seen a growing need for surrogate aerodynamic models based on computational data. Traditionally, the accuracy and quality of surrogate models are analyzed by examining performance compared to test data. However, in some situations this is not possible, such as when cost makes the acquisition of additional data impractical. In such cases, alternative methods are needed to evaluate the quality of a surrogate model. Resampling of computational data (e.g., cross validation and bootstrapping) is a technique that can be used to inform the quality of surrogate models and downstream analysis (e.g., structural loading, control robustness). Recent work has shown that modal decomposition-based methods (Principal Component Analysis/Proper Orthogonal Decomposition) enable this type of analysis. By taking subsamples of the snapshot matrix used to generate the model, a distribution of predictions can be generated that reflects the sensitivity of the model to the quality and quantity of input data. A wider spread gives evidence of the need for more data collection. One of the underlying problems is that resampling the snapshot matrix is discrete by nature, which complicates the identification of a representative, continuous output distribution. The presented work employs quadrature weighting to make discrete matrix resampling continuous, with an arbitrarily small amount of input data neglected instead of an arbitrary number of discrete snapshots. In this way, the size of the variations can be controlled by the user to give a better understanding of how the system reacts to both small and large changes in the system. This results in more coherent output distributions, which then yield more meaningful uncertainties on the surrogate models of interest. To explore the validity and usefulness of the approach, data from a simplified three-body launch vehicle were used. Three parameters are explored: aerodynamic angle of attack varying from 0 deg. to 90 deg., vehicle roll from 0 deg. to 90 deg., and freestream Reynolds number from 0.6 to 10 million at low Mach numbers. The flow characteristics in this domain are both variable, making for a challenging test problem, and relevant to contemporary applications. The separated wakes in a majority of the incidence range mandate costly, high-fidelity simulations; in such computationally expensive regimes, the use of surrogate models is essential to be able to characterize the entire parameter space. The presented method thus enables surrogate model uncertainty quantification of both input data scope and significance, where no other data sources are available.
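The discrete-resampling baseline that the quadrature-weighting method improves upon can be sketched quickly: rebuild a POD basis from random subsets of a noisy, synthetic snapshot matrix and examine the spread of reconstructions (everything here is illustrative, not the talk's data or method):

```python
# Discrete snapshot-resampling baseline: rebuild a truncated POD basis from
# random subsets of the snapshot matrix and look at the spread of a
# reconstructed quantity. The talk replaces this discrete leave-out with
# continuous quadrature weighting; this only shows the baseline behavior.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 200)
snapshots = np.array([np.sin(2 * np.pi * (x - 0.05 * k)) for k in range(30)]).T
snapshots += rng.normal(0, 0.05, snapshots.shape)   # simulated data noise

target = snapshots[:, 0]
recons = []
for _ in range(200):
    keep = rng.choice(30, size=24, replace=False)       # leave out 6 snapshots
    U, _, _ = np.linalg.svd(snapshots[:, keep], full_matrices=False)
    basis = U[:, :5]                                    # truncated POD basis
    recons.append(basis @ (basis.T @ target))           # project and reconstruct

recons = np.array(recons)
print(f"max pointwise std of reconstruction: {recons.std(axis=0).max():.4f}")
```

The spread across resamples is the uncertainty signal; the discreteness of "leave out 6 snapshots" is exactly what the quadrature-weighted variant smooths.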
Speaker Info: T.J. Wignall Aerospace Engineer NASA LaRC T.J. Wignall is an aerospace engineer with the configuration aerodynamics branch at NASA Langley. He primarily works on the SLS program supporting low-speed aerodynamics. Recently his interests have focused on surrogate modeling and resampling methods. He received his master's degree from ODU and his PhD from NCSU. | 2025 Calculation of Continuous Surrogate Load Distributions from T.J. Wignall NASA LaRC X Case Study on Bayesian Test Planning for Binary Reliability Evaluation Session Recording Materials Abstract: Follow-on operational test and evaluation (FOT&E) and agile approaches to system development need rigorous ways to determine the amount of testing required. These systems change in small increments over time, and traditional frequentist methods for sizing tests, which do not account for prior data, result in unrealistically large sample sizes for testing the latest update. Bayesian methods mathematically account for previous system data, enabling reduced sample sizes while maintaining scientific rigor. This presentation will demonstrate a Bayesian power prior framework for evaluating binary (pass/fail) reliability of systems undergoing agile development. Using this framework, the presentation will discuss how to plan the number of trials required for a future update. Additionally, the results and lessons learned from application to a DOD program are discussed.
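For the binary (pass/fail) case, a power prior has a simple conjugate form: the historical likelihood is raised to a discount power eta before being combined with the new trials. A minimal sketch, with all counts, the Beta prior, and the discount factor chosen purely for illustration (the DOD program's numbers are not public here):

```python
from scipy import stats

# Hypothetical numbers: historical testing saw 92 successes in 100 trials;
# a new system increment is planned for n_new trials.
x0, n0 = 92, 100
eta = 0.5            # power-prior discount on historical data (0 = ignore, 1 = pool)
a0, b0 = 1, 1        # vague initial Beta prior

def posterior(x_new, n_new):
    # Beta-binomial power prior: historical likelihood raised to eta, then
    # updated with the new trials in the usual conjugate way.
    return stats.beta(a0 + eta * x0 + x_new, b0 + eta * (n0 - x0) + (n_new - x_new))

# Planning question: if the update goes 19 for 20, what do we conclude?
post = posterior(19, 20)
print("posterior mean reliability:", post.mean())
print("P(reliability > 0.85):", 1 - post.cdf(0.85))
```

Test sizes can then be planned by sweeping n_new and checking when the posterior criterion is met, which is the planning use the abstract describes.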
Speaker Info: Corinne Stafford STAT COE Ms. Corinne Stafford is the Applied Research Lead at the Scientific Test and Analysis Techniques Center of Excellence (STAT COE), located at the Air Force Institute of Technology at Wright-Patterson Air Force Base. Ms. Stafford leads research efforts in areas including modeling and simulation, Bayesian statistics, and software testing. Ms. Stafford obtained her M.S. in chemical engineering from Stanford University and her B.S. in applied mathematics from Mary Baldwin University. Presentation | 2025 Case Study on Bayesian Test Planning for Binary Reliability Corinne Stafford STAT COE X Combinatorial Testing for AI-Enabled Systems Session Recording Session Recording part2 Abstract: Combinatorial testing (CT) is a black-box approach for software integration testing that utilizes test suites, such as covering arrays, which are guaranteed to identify the presence of faults caused by up to a fixed number of interacting components while minimizing the number of tests required. CT makes different assumptions from other DOE approaches, such as requiring components to have discrete levels. CT has applications in testing Artificial Intelligence-enabled systems (AIES), such as constructing test sets with measurable coverage, identifying factors that define a model’s operating envelope, augmenting training datasets for fine-tuning in transfer learning, testing for fairness and bias, and explainable AI. This short course, a collaboration between Virginia Tech (VT), the National Institute of Standards and Technology (NIST), and the Institute for Defense Analyses (IDA), is intended to introduce the test practitioner to applying CT to AIES testing through examples recognizable within the DOD and NASA. Topics to be covered at a conceptual level will include: CT theoretical background, including measures as well as empirical results from the software testing community; differences between CT and other design of experiments approaches; CT applications for AIES across the AI development lifecycle; and how to know when to use CT within a broader test program and at what level of test. Participants will be guided through hands-on exercises with CT tools, including NIST’s Automated Combinatorial Testing for Software Tool (ACTS) for test generation and VT’s Coverage of Data Explorer (CODEX) tool for characterizing the AI model input space. The group will also work through how CT may be applied to practical problems within the DOD. To support the hands-on and practical components of this short course, participants should expect to bring a personal laptop on which they have permission to download and run software tools and, if possible, provide real-world (or surrogate) example problems to the organizers prior to the course. Information regarding the software and request for inputs will be sent to registered participants in advance.
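ACTS is the tool the course actually uses for test generation; purely to illustrate what a pairwise (t = 2) covering array buys, here is a toy greedy construction over a hypothetical factor model. All factor names and levels are invented.

```python
from itertools import combinations, product

# Hypothetical factor model: four factors with discrete levels, as CT assumes.
factors = {"sensor": ["EO", "IR"], "weather": ["clear", "rain", "fog"],
           "altitude": ["low", "high"], "target": ["vehicle", "person"]}

names = list(factors)
# All factor-pairs and value combinations a pairwise (t=2) suite must cover.
uncovered = {(f, g, vf, vg)
             for f, g in combinations(names, 2)
             for vf in factors[f] for vg in factors[g]}

tests = []
while uncovered:
    # Greedy: take the candidate test covering the most uncovered pairs.
    best = max(product(*factors.values()),
               key=lambda t: sum((f, g, t[names.index(f)], t[names.index(g)]) in uncovered
                                 for f, g in combinations(names, 2)))
    tests.append(best)
    uncovered -= {(f, g, best[names.index(f)], best[names.index(g)])
                  for f, g in combinations(names, 2)}

print(len(tests), "tests cover all pairs (vs.",
      len(list(product(*factors.values()))), "exhaustive)")
```

Any fault triggered by an interaction of two of these factors is guaranteed to appear in at least one of the generated tests, which is the coverage guarantee the abstract describes.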
Speaker Info: Jaganmohan Chandrasekaran Research Assistant Professor Virginia Tech Jaganmohan Chandrasekaran is a research assistant professor at the Sanghani Center for AI & Data Analytics, Virginia Tech, Arlington, VA 22203, USA. His research interests lie at the intersection of software engineering and artificial intelligence, focusing on the reliability and trustworthiness of artificial intelligence-enabled software systems. Chandrasekaran received a Ph.D. in computer science from the University of Texas at Arlington. Short Course Group | 2025 Combinatorial Testing for AI-Enabled Systems Jaganmohan Chandrasekaran Virginia Tech X Combinatorial Testing for AI-Enabled Systems Session Recording Session Recording part2 Materials Speaker Info: Erin Lanus Research Assistant Professor Virginia Tech Erin Lanus is a research assistant professor at the National Security Institute and affiliate faculty in Computer Science at Virginia Tech, Arlington, VA 22203 USA. Her research interests include the adaptation of combinatorial testing to the input space of artificial intelligence and machine learning (AI/ML), metrics and algorithms for designing test sets with coverage-related properties, and data security concerns in AI/ML systems. Lanus received a Ph.D. in computer science and a B.A. in psychology, both from Arizona State University. Short Course Group | 2025 Combinatorial Testing for AI-Enabled Systems Erin Lanus Virginia Tech X Combinatorial Testing for AI-Enabled Systems Session Recording Session Recording part2
Speaker Info: Brian Lee Research Data Analyst Virginia Tech Brian Lee is a research data analyst at the Virginia Tech National Security Institute's Intelligent Systems Division, Arlington, VA 22203 USA. His interests stem from his research on improving the efficiency and efficacy of machine learning methods, such as generative adversarial networks, and on the use of combinatorics in model training. He received a B.S. in Computational Modeling and Data Analytics from Virginia Tech. Short Course Group | 2025 Combinatorial Testing for AI-Enabled Systems Brian Lee Virginia Tech X Combinatorial Testing for AI-Enabled Systems Session Recording Session Recording part2 Speaker Info: Raghu Kacker Scientist National Institute of Standards and Technology Raghu N. Kacker is a scientist at the National Institute of Standards and Technology (NIST), Gaithersburg, MD 20899, USA. He received his Ph.D. from Iowa State University in 1979. His interests include testing for trust, assurance, and performance of software-based systems. He has worked at AT&T Bell Labs and Virginia Tech. He is a Fellow of the American Statistical Association, a Fellow of the American Society for Quality, and a Fellow of the Washington Academy of Sciences. He has authored or co-authored over 200 papers.
His papers have been cited over 14,750 times per Google Scholar. Short Course Group | 2025 Combinatorial Testing for AI-Enabled Systems Raghu Kacker National Institute of Standards and Technology X Combinatorial Testing for AI-Enabled Systems Session Recording Session Recording part2 Speaker Info: Rick Kuhn Computer Scientist National Institute of Standards and Technology Rick Kuhn is a computer scientist at the National Institute of Standards and Technology, Gaithersburg, MD 20899 USA, and is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the Washington Academy of Sciences. He co-developed the role-based access control (RBAC) model that is the dominant form of access control today. His current research focuses on combinatorial methods for assured autonomy. He has authored three books and more than 200 conference or journal publications on cybersecurity, software failure, and software verification and testing. Previously, he served as Program Manager for the Committee on Applications and Technology of the President's Information Infrastructure Task Force, and as manager of the Software Quality Group at NIST. He received an MS in computer science from the University of Maryland College Park and an MBA from William & Mary.
Short Course Group | 2025 Combinatorial Testing for AI-Enabled Systems Rick Kuhn National Institute of Standards and Technology X Combinatorial Testing for AI-Enabled Systems Session Recording Session Recording part2 Speaker Info: M S Raunak Computer Scientist National Institute of Standards and Technology M S Raunak is a computer scientist at the National Institute of Standards and Technology (NIST), Gaithersburg, MD 20899 USA. His research interests include verification, validation, and assurance of “difficult-to-test” systems such as complex simulation models, cryptographic implementations, and machine learning algorithms. Dr. Raunak received his Ph.D. in computer science from the University of Massachusetts Amherst. He is a Senior Member of IEEE. Short Course Group | 2025 Combinatorial Testing for AI-Enabled Systems M S Raunak National Institute of Standards and Technology X Combining Joint Cost and Schedule Risk Analysis with Earned Value Management Using Bayesia Session Recording Materials Abstract: The joint risk analysis of cost and schedule is typically conducted either via parametric methods or using a detailed bottom-up approach by resource-loading schedule networks. Earned value data are not typically used as part of this process. However, there is significant value in utilizing earned value data to improve the accuracy of joint cost and schedule risk analyses. This paper will explain how to use a joint cost and schedule risk analysis as an input and then, as earned value data are collected, update the joint cost and schedule risk probability distributions with this information. The specific technique used is Bayesian Parameter Learning, which provides a rigorous mathematical framework for updating probability distributions using new information. This presentation is an extension of prior work that applies the same principle of Bayesian Parameter Learning to improve the predictive accuracy of the estimate at completion for cost.
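Assuming a normal-normal model (one common conjugate choice; the presentation does not specify its distributions), Bayesian parameter learning for an earned-value quantity such as the cost performance index reduces to a precision-weighted update. All numbers below are hypothetical.

```python
import numpy as np

# Hypothetical prior on the cost performance index (CPI) from the joint
# cost/schedule risk analysis: CPI ~ Normal(mu0, tau0^2).
mu0, tau0 = 1.00, 0.10
sigma = 0.05                       # assumed period-to-period CPI noise

# Observed CPI each reporting period as earned value data arrive.
cpi_obs = np.array([0.93, 0.95, 0.91, 0.94])

# Conjugate normal-normal Bayesian parameter learning: precision-weighted update.
n = len(cpi_obs)
prec = 1 / tau0**2 + n / sigma**2
mu_post = (mu0 / tau0**2 + cpi_obs.sum() / sigma**2) / prec
tau_post = prec ** -0.5

bac = 10.0  # budget at completion, $M
print(f"posterior CPI: {mu_post:.3f} +/- {tau_post:.3f}")
print(f"estimate at completion: ${bac / mu_post:.2f}M")
```

Each new reporting period tightens the posterior, which is how earned value data progressively sharpen the joint risk distributions described in the abstract.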
Speaker Info: Murray Cantor Consultant Cantor Consulting Murray Cantor is a retired IBM Distinguished Engineer. With his Ph.D. in mathematics from the University of California at Berkeley and extensive experience in managing complex, innovative projects, he has focused on applying predictive reasoning and causal analysis to the execution and economics of project management. In addition to many journal articles, Murray is the author of two books: Object-Oriented Project Management with UML and Software Leadership. He is an inventor on 15 IBM patents. After retiring from IBM, he was a founder and lead scientist of Aptage, which developed and delivered tools for learning and tracking the probability of meeting project goals. Aptage was sold to Planview. Dr. Cantor’s quarter-century career with IBM included two periods: first as an architect and senior project manager for the Workstation Division, and later as an IBM Distinguished Engineer in the Software Group and a member of the IBM Rational CTO team. The second IBM stint began with IBM acquiring Rational Software, where Murray was the Lead Engineer for Rational Services. In that role, he consulted on delivering large projects at Boeing, Raytheon, Lockheed, and various intelligence agencies. He was the IBM representative to the SysML partners who created the Object Management Group’s System Modeling Language standard. While at Rational, he was the lead author of the Rational Unified Process for System Engineering (RUPSE). Before joining Rational, he was a project lead at the defense and intelligence contractor TASC, delivering systems for Space Command. | 2025 Combining Joint Cost and Schedule Risk Analysis with Earned Murray Cantor Cantor Consulting X Competence Measure Enhanced Ensemble Voting Schemes Session Recording Materials Abstract: Ensemble methods comprise multiple individual models, each producing a prediction. Voting schemes are used to weigh the decisions of the individual models, namely classifiers, to predict the class. A well-formed ensemble should be built from classifiers with diverse assumptions, e.g., differing underlying training data, feature space selection, and therefore decision boundaries. Voting scheme methods are often based on consideration of the underlying feature space and each classifier's reported confidence in its predictions. Diversity across the classifiers is an advantage, but it is not fully exploited by existing voting schemes. The purpose of the described concept is to enhance current voting scheme approaches by weighing individual model competence measures, ensuring that input data are appropriate to the prediction space of the individual classifiers. Consideration of the individual classifiers in the voting for the specified input will be based on achieving a threshold model competence measure. This approach augments confidence-based schemes by ensuring that inputs are consistent with the training data of the individual models. An application employing random forest classifiers will be demonstrated.
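A minimal sketch of the gating idea under stated assumptions: each ensemble member gets a crude competence measure (here, distance from the query to that member's own training data, a stand-in for whatever measure the talk develops), and only members above a threshold vote. The data, members, and threshold are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical setup: three random forests trained on different subsets of the
# data, so each has its own training distribution and decision boundary.
X, y = make_classification(n_samples=600, n_features=5, random_state=0)
rng = np.random.default_rng(0)
members, train_sets = [], []
for _ in range(3):
    idx = rng.choice(len(X), size=200, replace=False)   # diverse training data
    members.append(RandomForestClassifier(random_state=0).fit(X[idx], y[idx]))
    train_sets.append(X[idx])

def competence(x, X_train):
    # Competence proxy: closeness of the query to the member's training data.
    return -np.min(np.linalg.norm(X_train - x, axis=1))

def vote(x, threshold=-2.0):
    # Gate on competence: only members whose training data resemble x vote.
    probs = [m.predict_proba(x[None])[0]
             for m, Xt in zip(members, train_sets) if competence(x, Xt) >= threshold]
    if not probs:                  # fall back to all members if none qualify
        probs = [m.predict_proba(x[None])[0] for m in members]
    return int(np.mean(probs, axis=0).argmax())

print("prediction:", vote(X[0]), "truth:", y[0])
```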
Speaker Info: Francesca McFadden Principal Professional Staff The Johns Hopkins University Applied Physics Laboratory Francesca McFadden works on modeling, simulation, and analysis to evaluate system architectures. Francesca has a master's degree in Applied Mathematics. | 2025 Competence Measure Enhanced Ensemble Voting Schemes Francesca McFadden The Johns Hopkins University Applied Physics Laboratory X Computer Experiments for Meta-learning of Machine Learning Models Materials Abstract: Operationally realistic data to inform machine learning models can be costly to gather. An example is collecting aerial images of rare objects to train an image classifier. Before collecting new data, it is helpful to understand where your model is deficient. For example, it may not be good at identifying rare objects in seasons not well represented in the training data. We offer a way of informing subsequent data acquisition to maximize model performance by leveraging both the toolkit of computer experiments and the metadata describing the circumstances under which the training data were collected (e.g., time of day, location, source). We do this by treating the composition of metadata and the performance of the learner, respectively, as the inputs and outputs of a Gaussian process (GP). The resulting GP fit shows which metadata features yield the best learner performance. We take this a step further by using the GP to inform new data acquisitions, recommending the best circumstances under which to collect future data. Our method for active learning offers improvements to learner performance as compared to data with randomly selected metadata, which we illustrate on image classification and detection benchmarks. Speaker Info: Anna Flowers Ph.D. Student Virginia Tech Anna Flowers is a fourth-year Ph.D. student in Statistics at Virginia Tech. She received an M.S. in Statistics from Virginia Tech in 2023 and a B.S. in Mathematical Statistics from Wake Forest University in 2021. She is jointly advised by Bobby Gramacy and Chris Franck, and her research focuses on Gaussian process regression, surrogate modeling, and active learning. She was an intern at the Institute for Defense Analyses in 2024. | 2025 Computer Experiments for Meta-learning of Machine Learning Anna Flowers Virginia Tech X Confidence Based Skip-Lot Sampling Session Recording Materials Abstract: The Lake City Army Ammunition Plant (LCAAP) in Independence, Missouri, produces and tests millions of rounds of small arms ammunition daily, in many cases using decades-old procedures. While these testing methods are effective, the long history of high manufacturing quality for most products suggests they are significantly over-testing the lots in production. To address this issue, we have developed a skip-lot testing procedure that uses a Bayesian approach to estimate the true quality of each lot in production, even as some lots are skipped. By using this updated approach, we can reduce the total number of tests required while controlling the risk of accepting low-quality lots. Simulation results demonstrate that this process both reduces the number of tests required and meets the production facility’s standards for risk exposure.
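A minimal sketch of one way such a Bayesian skip-lot rule could look, assuming a conjugate Beta-Binomial model with hypothetical limits; the actual LCAAP procedure and its risk standards are not spelled out in the abstract.

```python
from scipy import stats

# Hypothetical skip-lot logic: maintain a Beta posterior on a product line's
# defect rate; skip a lot's test only when we are sufficiently confident the
# rate is below the acceptable quality limit.
a, b = 1.0, 1.0          # Beta prior on the defect rate
AQL, CONF = 0.01, 0.95   # acceptable quality limit and required confidence

def update(defects, n_tested):
    global a, b
    a, b = a + defects, b + n_tested - defects   # conjugate Beta update

def skip_next_lot():
    # Skip only if P(defect rate < AQL) exceeds the required confidence.
    return stats.beta(a, b).cdf(AQL) > CONF

# A long history of clean lots drives the posterior toward skipping.
for _ in range(10):
    update(defects=0, n_tested=100)
    print("skip next lot?", skip_next_lot())
```

Early lots are all tested; as clean history accumulates, the posterior concentrates below the AQL and skipping begins, which is the test-reduction mechanism the abstract describes.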
Speaker Info: Alexander Boarnet Cadet United States Military Academy I am currently a senior at the United States Military Academy majoring in Mathematical Science. I have research interests in applied statistics and improving testing procedures. In my free time, I am the captain of the West Point Judo team. | 2025 Confidence Based Skip-Lot Sampling Alexander Boarnet United States Military Academy X Creating a Robust Cyber Workforce Session Recording Materials Abstract: The current talent pool is struggling to keep pace with the demand for cyber professionals. The rapid pace of technological advancement, coupled with the evolving nature of cyber threats, has created a constant need for relevant skills. To make cyber careers more accessible, the Office of the National Cyber Director (ONCD) has outlined several initiatives in the National Cyber Workforce and Education Strategy. ONCD policy initiatives include removing unnecessary degree requirements, transitioning to a skills-based hiring approach, expanding work-based learning opportunities, and supporting efforts to bring together employers, academia, local governments, and non-profit organizations. While the federal government has implemented measures such as the Workforce Innovation and Opportunity Act to tackle skills gaps, the aim of our study is to measure the gap between the workforce and unfilled cyber positions and to determine what parameters are necessary to close the skills gap without overfitting to unfit candidates. By cleaning and filtering a LinkedIn job postings dataset, we isolated more than 124,000 cyber job descriptions. With these records, we used keywords to bin each cyber job description into one of three distinct levels. We then used natural language processing and text analytics to identify the knowledge, skills, and abilities (KSAs) undergraduate students obtain from their education. Finally, we extend existing cosine similarity methods to determine how similar employer job descriptions are to the KSAs of the job applicant population. Using these methods, cyber policy makers and employers have additional tools to ensure that cyber jobs are filled by the correct candidates with applicable experience and credentials.
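The core matching step can be illustrated with TF-IDF vectors and cosine similarity; the toy job descriptions and KSA statements below are invented, and the authors' extended similarity method is more involved than this baseline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpora: employer job descriptions vs. the KSAs a
# curriculum produces; cosine similarity scores the alignment.
jobs = ["incident response and network intrusion detection",
        "penetration testing of web applications and cloud systems"]
ksas = ["students learn network defense, intrusion detection, and forensics",
        "coursework covers secure coding and web application security"]

vec = TfidfVectorizer().fit(jobs + ksas)
sims = cosine_similarity(vec.transform(jobs), vec.transform(ksas))
for job, row in zip(jobs, sims):
    print(f"{job!r} best KSA match: {row.max():.2f}")
```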
Speaker Info: Sheyla Street Cadet United States Military Academy Cadet Sheyla Street studies Applied Statistics and Data Science with a Cybersecurity Engineering track at the United States Military Academy. Sheyla’s published research projects have focused on social bias in artificial intelligence and interoperability within NATO. As an interdisciplinary scholar and future Army Engineer Officer, Sheyla hopes to bridge gaps between technology, policy, and justice. Sheyla’s general research interests include artificial intelligence, policy, computer vision, and social bias. At West Point, Sheyla serves as a Regimental Prevention Officer, Stokes Writing Center Senior Fellow, President of the MELANATE Club, CLD STEM volunteer, and varsity athlete on the Army Women’s Crew Team. She has also served as the Cadet in Charge of the Corbin Women’s Leadership Forum Workouts and as a Cultural Affairs Seminar tutor. As President of the Tau Theta Chapter of Delta Sigma Theta Sorority Inc., Sheyla remains active in public service and social action. | 2025 Creating a Robust Cyber Workforce Sheyla Street United States Military Academy X Developing a Social Engineering Framework and Data Collection Standards for CAT Operations Session Recording Materials Abstract: For more than a decade, IDA has supported DOT&E through detailed analysis of cyber attack pathways. This work has resulted in multiple IDA publications and inputs to DOT&E reports that have informed Congressional, U.S. Cyber Command, and OSD Chief Information Officer decision-making. The IDA attack pathway analysis has used the MITRE on-network ATT&CK framework as a common taxonomy and classification for cyber attack actions, and to enable the development of data standards, analysis methodologies, and reporting. However, the MITRE ATT&CK framework applies only to on-network activity and does not include a framework for assessing physical security. Physical penetrations parallel on-network cyber attacks in many ways; both are multi-step activity chains where steps in the attack chain enable one or more later steps. Therefore, it should be possible to analyze the data from Close Access Team (CAT) physical security assessments similarly to how IDA has historically treated on-network cyber attack data. This presentation provides an overview of the Central Research Project (CRP) work to develop a framework for CAT social engineering and physical intrusion TTPs as an analog to the MITRE ATT&CK framework. After an overview of how IDA currently uses the MITRE ATT&CK framework and program data standards to analyze CAT mission data and produce data trend products in support of the DOT&E Cyber Assessment Program (CAP), we provide an overview of the academic literature used as a foundation for the development of IDA’s own social engineering framework, the Social Spoofing Security Analysis Reference (S3AR). S3AR summarizes CAT operator TTPs into four nodes: planning, ingress, lateral, and objective. These nodes span the life cycle of a CAT operator planning their attack (planning node), entering the target location of interest (ingress node), operating within that location of interest (lateral node), and deploying a device on a network of interest (objective node). IDA leveraged academic literature as well as a tailored survey on CAT operations deployed to CAT operators spanning five CATs. The results of this survey and their direct tie to the continued development of the lateral node of the analysis reference TTPs are presented. In addition to the original charge to develop an analysis framework analogous to MITRE ATT&CK for physical ingress and social engineering techniques, IDA also generated data collection standards for CAT operations. The three data collection sheets presented are designed to be completed during CAT missions, and each aligns to at least one of the analysis reference nodes: the planning and prep sheet (planning node), the daily actions sheet (ingress and lateral nodes), and the systems accessed sheet (objective node). Speaker Info: Wendy-Angela Agata-Moss RSM IDA Dr. Saringi Agata-Moss holds a PhD in Chemical Engineering from UVA and is currently a Research Staff Member in the Operational Evaluation Division. Saringi primarily works on the Cyber Assessment Program task supporting cyber planning and analysis for US Strategic Command annual assessments, Nuclear Command, Control and Communications (NC3) special assessments, Persistent Cyber Operations (PCO), and the CAP's Close Access Team assessments.
In addition, Saringi was recently funded by IDA's Central Research Program to develop a social engineering and physical ingress tactics, techniques, and procedures analysis framework to assist in continued CAP work with CAT assessments for the DOD. Saringi enjoys supporting DE&I recruitment efforts for IDA, particularly in engineering, to show the pathway from academia to government support work. | 2025 Developing a Social Engineering Framework and Data Wendy-Angela Agata-Moss IDA X Developmental T&E of Autonomous Systems – Consolidated Challenges and Guidance Session Recording Materials Abstract: This presentation will give an overview of challenges, methodologies, and best practices for Developmental Test and Evaluation of Autonomous Systems. It addresses the novel challenges of removing human operators from DoD systems and empowering future autonomous systems to act independently in contested environments. These challenges, such as safety, black-box components, data, and human-machine teaming, demand iterative approaches to evaluating the growing capabilities of autonomous systems in order to assure trusted mission capability across complex operational environments. The guidance provided includes lessons learned and best practices for the full continuum of autonomy T&E, such as runtime assurance, LVC testing, continuous testing, and cognitive instrumentation. This guidance leverages emerging best practices in agile and iterative testing to extend success throughout the T&E continuum. By applying these best practices to achieve efficient, effective, and robust DT&E, autonomous DoD systems will be primed for successful operational T&E and operational employment. The information presented is being published as a new “Developmental T&E of Autonomous Systems Guidebook,” which is intended to be a living document contributed to by a broad community and which will adapt to ensure the best information reaches a wide audience. The views expressed are those of the author/presenter and do not necessarily reflect the official policy or position of the Department of the Air Force, the Department of Defense, or the U.S. government. Speaker Info: Charlie Middleton Consultant STAT Center of Excellence Charlie Middleton, a technical support contractor with the Scientific Test & Analysis Techniques (STAT) Center of Excellence, currently leads the Advancements in Test and Evaluation (T&E) of Autonomous Systems team for the OSD STAT Center of Excellence. His responsibilities include researching autonomous system T&E methods and tools; collaborating with Department of Defense program and project offices developing autonomous systems; leading working groups of autonomy testers, staffers, and researchers; and authoring a guidebook, reports, and papers related to T&E of autonomous systems.
| 2025 Developmental T&E of Autonomous Systems – Consolidated Charlie Middleton STAT Center of Excellence X Digital Engineering and Test and Evaluation: How DE impacts T&E Session Recording Materials Abstract: Since the Office of the Undersecretary of Defense, Research and Engineering released its Digital Engineering (DE) strategy in 2018, the services, along with supporting agencies and industry, have been working to transform their practices, improve tooling, and upskill their workforce to realize the vision of a digitally harmonized engineering environment that supports Department of Defense (DoD) weapon system acquisition as well as operations and sustainment of existing weapon systems. In kind, the Test and Evaluation (T&E) community within the DoD and supporting agencies is advancing and institutionalizing the application of DE methods to T&E. The research team at the Acquisition Innovation Research Center (AIRC) has developed a short course to introduce program offices and other T&E professionals to the basics of implementing DE in support of verification, validation, and accreditation activities. This two-hour course will introduce DE concepts and lifecycles, establish the value proposition of DE methods in DoD acquisition, introduce various DE tools, and share best practices as well as potential challenges in applying DE methods to T&E efforts. The course will address basics in Model Based Mission Engineering, Model Based Systems Engineering, Digital Design, Modeling & Simulation, and Model Based T&E Planning, as well as briefly explore the application of Generative Artificial Intelligence to T&E. Participants will leave with a working knowledge of how DE can be applied to their specific tasks and an awareness of some of the tooling employed across the DoD for realizing DE objectives. Speaker Info: Paul Wach Research Assistant Professor Virginia Tech National Security Institute Dr. Paul Wach is research assistant faculty with the Intelligent Systems Division of the Virginia Tech National Security Institute and adjunct faculty with the Grado Department of Industrial & Systems Engineering. His research interests include the intersection of theoretical foundations of systems engineering, digital transformation, and artificial intelligence. Specifically, Dr. Wach’s research is at the cutting edge of conjoining model-based systems engineering (MBSE), modeling & simulation (M&S), and generative AI (e.g., LLMs). He is also associated with The Aerospace Corporation, serving as a subject matter expert on digital transformation. Dr. Wach has prior work experience with the Department of Energy in lead engineering and management roles ranging in scope from $1B to $12B, as well as work experience with two National Laboratories and the medical industry. He received a B.S. in Biomedical Engineering from Georgia Tech, an M.S. in Mechanical Engineering from the University of South Carolina, and a Ph.D. in Industrial & Systems Engineering from Virginia Tech.
Mini-Tutorial | 2025 Digital Engineering and Test and Evaluation: How DE impacts Paul Wach Virginia Tech National Security Institute X Digital Engineering and Test and Evaluation: Operation Safe Passage Session Recording Materials Abstract: Since the release of the Digital Engineering (DE) Strategy in 2018 by the Office of the Undersecretary of Defense for Research and Engineering, the Department of Defense (DoD), supporting agencies, and industry have been working to transform practices, improve tooling, and develop a workforce that will contribute to an enhanced engineering environment. This balanced and harmonized environment supports weapon system acquisition as well as the upkeep and sustainment of existing weapon systems, benefiting the Test and Evaluation (T&E) community. The advancement of, and commitment to, applying DE methods to T&E are highlighted within the DoD and its supporting agencies. The research team at the Acquisition Innovation Research Center (AIRC), a University Affiliated Research Center (UARC), in partnership with the Director, Operational Test and Evaluation (DOT&E), has been developing and maturing a proxy for a DoD system acquisition to enhance the digital transformation of T&E. A testbed and framework have been developed around a mission in which an Unmanned Ground Vehicle navigates a minefield and disarms the mines to allow cargo and troops to be transported safely. The mission is titled Operation Safe Passage (OSP). The proxy for the mission utilizes Lego Mindstorms™ as a physical model, while digital models include Computer Aided Design (CAD) models, SysML-based models, physics-based models for analysis, and decision dashboards. This integrated approach enables rapid decision-making by connecting architecture models, test planning, and physical testing. The testbed and framework for OSP advance the vision of prospective transformations in T&E. Speaker Info: Brandt Sandman Graduate Research Assistant Virginia Tech Brandt Sandman is a first-year Ph.D. student in the Industrial and Systems Engineering program at Virginia Tech. He has been working as a Graduate Research Assistant with a focus on Digital Transformation. | 2025 Digital Engineering and Test and Evaluation: Operation Safe Brandt Sandman Virginia Tech X Digital Twins in Reliability Engineering: Innovations, Challenges and Opportunities Materials Abstract: The digital twin (DT) is a rapidly evolving technology that draws upon a multidisciplinary foundation, integrating principles from computer science, physics, mathematics, statistics, and engineering. Its applications are diverse, spanning industries such as engineering, healthcare, biomedicine, climate change, renewable energy, and national security. This work aims to discuss the characterization, development, and application of DT as well as to identify both the challenges and opportunities that lie ahead. The study identifies research gaps and a path forward to advance the statistical and computational foundations and applications of DT in the field of reliability engineering and preventive maintenance for statistical quality control and assurance. Fostering innovation in total quality management, DT is poised to transform industries. Leveraging advanced data analytics, data science, machine learning (ML), and artificial intelligence (AI), DT enables monitoring, simulation, and optimization of complex systems, ensuring higher quality, greater reliability, and improved decision-making.
Addressing these challenges and opportunities, continued investment in DT technologies will drive the next wave of engineering excellence and operational efficiency. Speaker Info: David Han UT San Antonio David Han, M.S., Ph.D., teaches statistics and data science at the University of Texas at San Antonio. His research interests include statistical modeling and inference, machine learning, and artificial intelligence applied to lifetime analysis and reliability engineering. | 2025 Digital Twins in Reliability Engineering: Innovations, David Han UT San Antonio X Enabling Efficient Research and Testing Through Data Stewardship on the Individual Level Materials Abstract: Data governance determines how an organization makes decisions, while data stewardship determines how these decisions are carried out. Broadly, a data steward is a person with data-related responsibilities. One might execute data-related plans made by data governors, engage with metadata, resolve data issues, or track individual data elements. Benefits of data stewardship include ensuring the quality of data. Taken on an individual or team level, this can lead to savings in time and money and increased accuracy of research and testing. However, the return on investment associated with data stewardship is best seen when clear goals and data management bottlenecks are identified first. This presentation will introduce data stewardship and discuss a case where the author applied data stewardship principles to their research practice. Data management strategies will be discussed. Goals, progress, challenges, and lessons learned will be presented. Speaker Info: Christin Lundgren NASA Langley Research Center Dr. Christin Lundgren joined NASA Langley Research Center in 2021 after 10 years in industry, with a background in RF engineering, optics, and electromagnetics. She is currently a computational electromagnetics researcher in the Revolutionary Aviation Technologies Branch. Previously, Dr. Lundgren was a Lead in Optical Engineering at L3Harris Technologies, working in RF photonics. As a graduate student at the University of Arizona, she focused on electro-optic modulators and optical modeling of nanostructured materials for solar cells. She has authored multiple publications and two U.S. patents. She is a Senior Member of IEEE and the Treasurer of the Hampton Roads section of the Society of Women Engineers. Presentation | 2025 Enabling Efficient Research and Testing Through Data Christin Lundgren NASA Langley Research Center X Estimating Combat Losses: An Application of Multiple System Estimation Session Recording Materials Abstract: Recently, analysts from OED’s live fire group reviewed multiple sources to amass a comprehensive list of aircraft combat damage events that are of interest to vulnerability assessments. Many of these events could be found in more than one source. Analysts believe there were events not known to any of the sources, and thus absent from the dataset. We want to estimate this number of unobserved events, which, when combined with the observed events, would produce a better estimate of the total number of events that actually occurred. Ecologists developed statistical techniques to estimate the sizes of hard-to-count populations, and these techniques can be applied to other domains where observations are recorded in multiple lists. In this presentation, I will demonstrate various techniques of multiple system estimation, building on simple intuition. I will then apply them to the combat loss data using several computational tools readily available in R. The analysis will show that the total number of events is likely to be much larger than the observed count.
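The simplest two-list version of this intuition is the Lincoln-Petersen estimator, shown here in its bias-corrected Chapman form (the talk uses R; the arithmetic is the point, and the counts below are hypothetical).

```python
# Hypothetical two-list multiple system estimation (Lincoln-Petersen).
# n1, n2 = events on each source list; m = events appearing on both.
n1, n2, m = 120, 95, 60

# Chapman's bias-corrected estimator of the total population size.
N_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1

observed = n1 + n2 - m
print(f"observed events: {observed}")
print(f"estimated total events: {N_hat:.0f}")
print(f"estimated unobserved events: {N_hat - observed:.0f}")
```

The intuition: the overlap between lists reveals how thoroughly the sources sample the population; a small overlap relative to the list sizes implies many events were missed by all sources.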
Speaker Info: Gregory Chesterton RSM IDA Greg Chesterton is a research staff member with a master’s degree in operations research from the Naval Postgraduate School. After graduating in 1988 from The Pennsylvania State University with a degree in aerospace engineering, Greg served 20 years in the Marine Corps as a Naval Flight Officer, with tours in the A-6E Intruder and the F/A-18D. Greg graduated from the Naval Postgraduate School in 2005 and spent two years at the Marine Corps Operational Test and Evaluation Activity (MCOTEA). After retiring from active duty as a Lieutenant Colonel in 2008, Greg joined MITRE’s Center for Advanced Aviation System Development (CAASD), where he produced quantitative safety risk analysis products for numerous FAA sponsors during a 15-year tenure. Greg joined OED’s live fire portfolio in 2024. In addition to his LFT&E responsibilities, his technical areas of interest are design of experiments, data analysis, and statistical inference. Presentation | 2025 Estimating Combat Losses: An Application of Multiple System Gregory Chesterton IDA X Eucalyptus – An Analysis Suite for Fault Trees with Uncertainty Quantification Session Recording Materials Abstract: Eucalyptus is a novel code developed by Lawrence Livermore National Laboratory to incorporate uncertainty quantification into fault tree analysis. This tool addresses the challenge of imperfect knowledge in “grey-box” systems by allowing analysts to incorporate and propagate uncertainty from component-level assessments to system-level effects. Eucalyptus facilitates a consistent evaluation of the impact of subject matter expert judgment and knowledge gaps on overall system response by Monte Carlo generation of possible system fault trees, sampling probabilities of the existence of subsystems and components. The code supports the specification of fault trees through text and allows export to various formats, including auto-generated images, easing analysis and reducing errors. It has undergone extensive verification testing, demonstrating its reliability and readiness for deployment, and leverages on-node parallelism for rapid analysis. Example analyses are shown that include the identification of system failure paths and quantification of the value of further information about system components.
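Eucalyptus itself is not shown here, but the core idea — propagating uncertain component-level probabilities through a fault tree by Monte Carlo — can be sketched for a hypothetical two-level tree. The tree structure and the expert-elicited distributions below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level fault tree: TOP = A OR (B AND C). Basic-event
# probabilities are uncertain, so each is given a distribution elicited
# from subject matter experts rather than a point value.
def sample_event_probs(n):
    pA = rng.beta(2, 200, n)     # rare single-point failure
    pB = rng.beta(5, 50, n)      # redundant pair: B AND C
    pC = rng.beta(5, 50, n)
    return pA, pB, pC

# Propagate the epistemic uncertainty to the top event by Monte Carlo.
pA, pB, pC = sample_event_probs(100_000)
p_top = 1 - (1 - pA) * (1 - pB * pC)   # P(A OR (B AND C)), independence assumed

lo, med, hi = np.percentile(p_top, [5, 50, 95])
print(f"top-event probability: median {med:.4f}, 90% interval [{lo:.4f}, {hi:.4f}]")
```

Shrinking one event's elicited distribution and re-running shows how much the top-event interval tightens, which is one way to quantify the value of further information about that component.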
Speaker Info: Adam Taylor Computational Engineering Analyst Lawrence Livermore National Laboratory Adam Taylor is a computational analyst at Lawrence Livermore National Laboratory, specializing in structural and hydrodynamic simulations. | 2025 Eucalyptus – An Analysis Suite for Fault Trees with Adam Taylor Lawrence Livermore National Laboratory X Evaluating Metrics for Multiclass Computer Vision Models Materials Abstract: In support of the Chief Digital and Artificial Intelligence Office's (CDAO) ongoing efforts to provide best practices and methods for test and evaluation of artificial intelligence-enabled systems, I was tasked with examining metrics for computer vision models. In particular, I studied and implemented multiclass metrics for computer vision models. A table was produced that lists the strengths and weaknesses of different popular metrics. I then simulated these strengths and weaknesses by using the CDAO's Joint Artificial Intelligence Infrastructure Capability (JATIC), a Python-based test and evaluation package, to evaluate models trained on the overhead MNIST2 satellite imaging dataset for the automated target recognition (ATR) use case.
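One classic strengths/weaknesses contrast such a table captures — overall accuracy hiding rare-class failures that macro-averaged F1 exposes — can be reproduced with scikit-learn on simulated labels. JATIC is the package the work actually used; this is a generic stand-in with invented class balance and error behavior.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical predictions on an imbalanced three-class problem, to show
# why a single metric can mislead for multiclass computer vision models.
rng = np.random.default_rng(0)
y_true = rng.choice([0, 1, 2], size=1000, p=[0.8, 0.15, 0.05])
y_pred = np.where(rng.random(1000) < 0.7, y_true, 0)  # errors collapse to class 0

print("accuracy:", accuracy_score(y_true, y_pred))         # inflated by majority class
print("micro F1:", f1_score(y_true, y_pred, average="micro"))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # exposes rare-class misses
print(confusion_matrix(y_true, y_pred))
```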
Speaker Info: Jeff Lin AI Assurance Research Associate I Institute for Defense Analyses Jeff Lin is an AI Assurance Research Associate at IDA, where they specialize in computer vision and data science. With a B.S. in Computer Science and an M.S. in Data Science, Jeff’s current work focuses on the development of a computer vision guidebook, which outlines best practices and guidelines for computer vision applications within the DoD setting. This past summer, Jeff served as a summer associate, concentrating on metric evaluation for multiclass computer vision models. This brief has been peer-reviewed and sent to the Chief Digital and Artificial Intelligence Office. | 2025 Evaluating Metrics for Multiclass Computer Vision Models Jeff Lin Institute for Defense Analyses X F-ANOVA: Tutorial for Grouping Data and Identifying Interactions Across Arbitrary Domains Session Recording Materials Abstract: Extending Analysis of Variance (ANOVA) to functional data enables researchers and analysts to better understand how categorical variables influence data that vary continuously over a common domain, such as time or frequency. Functional ANOVA (F-ANOVA) builds upon the strengths of traditional scalar ANOVA, allowing for one-way and two-way analyses to test the equality of mean and covariance functions across groups at a given statistical significance threshold. This capability is particularly valuable for uncovering meaningful insights into functional data at both the group level and the interaction level. In the absence of group and interaction effects, datasets can be confidently pooled, resulting in larger sample sizes and enhanced statistical power. To address the lack of existing tools for performing F-ANOVA, a custom library was developed and validated, offering unique analytical capabilities while maintaining robust performance. These capabilities include two-way analyses, equality of covariance tests, and greater support when heteroscedasticity is present between groups. This library not only simplifies the application of F-ANOVA but also provides tools tailored for handling diverse functional data scenarios. Best practices for F-ANOVA will be demonstrated in the context of mechanical shock analysis, a domain where functional data are particularly beneficial. However, the library's design makes it broadly applicable to other fields, offering a versatile solution for modern functional data challenges. Speaker Info: Adam Watts R&D Engineer Los Alamos National Laboratory Adam Watts specializes in applying statistical methods and uncertainty quantification to complex engineering challenges. His expertise includes the uncertainty quantification of chemical kinetics in thermosetting polymers, as well as the thermomechanical properties of constituent materials used in legacy aeroshells as a function of temperature. His research interests focus on functional data analysis and computational statistics, with an emphasis on leveraging these methods to solve engineering problems. Adam is an R&D Engineer on the Data Analysis Team for Test Engineering at Los Alamos National Laboratory (LANL). He holds a B.S. in Plastics and Composites Engineering from Western Washington University and an M.S. in Textile Engineering from North Carolina State University. Presentation | 2025 F-ANOVA: Tutorial for Grouping Data and Identifying Adam Watts Los Alamos National Laboratory X Fast solvers and UQ for computationally demanding inverse problems Abstract: Satellite-based remote sensing of greenhouse gases and carbon cycle science are examples of operational science data production use cases where thousands to millions of inverse problems have to be solved daily as part of processing pipelines. These inversions are typically computationally costly, and further require rigorous Uncertainty Quantification (UQ) to ensure the reliability of data products for downstream users. Even current state-of-the-art methods face considerable challenges with downlinked data volumes, and these problems are only getting more pressing with the upcoming next generation of Earth-observing satellites and orders-of-magnitude increases in data volume. In this talk, we present recent advances in computationally efficient statistical methods and machine learning to tackle these pressing issues. We present approaches for emulating the costly atmospheric radiative transfer physics models to lower the computational burden of inversions, as well as techniques for emulating a direct solution to the inverse problem together with well-calibrated UQ. Specifically, we focus on efficient Gaussian process regression for forward model emulation, and on Gaussian mixture modeling and diffusion-based approaches for the inversion and UQ.
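A minimal sketch of forward-model emulation with GP regression, using a cheap placeholder function in place of a radiative transfer model; the kernel and design choices are illustrative only, not those of the OCO pipelines.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Hypothetical stand-in for an expensive radiative transfer forward model:
# train a GP emulator on a few evaluations, then predict cheaply with UQ.
def forward_model(x):
    return np.sin(3 * x) + 0.5 * x  # placeholder physics

X_train = np.linspace(0, 2, 12)[:, None]     # a small design of costly runs
y_train = forward_model(X_train).ravel()

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

X_new = np.linspace(0, 2, 5)[:, None]
mean, std = gp.predict(X_new, return_std=True)  # emulator mean + uncertainty
for x, m, s in zip(X_new.ravel(), mean, std):
    print(f"x={x:.2f}: emulated y = {m:.3f} +/- {2*s:.3f}")
```

In an operational pipeline, the emulator replaces most forward-model calls inside each inversion, which is where the computational savings come from.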
Starting with scanned, inaccessible PDFs, these reports are transformed by applying machine learning techniques in combination with LLM-enabled qualitative research tools. Our approach reconstructs, parses, and restructures the dataset to enable efficient querying. In this demonstration, we compare various querying techniques on our dataset and show that a combination of data-centric LLM routing and layered search algorithms outperforms traditional frameworks, such as basic retrieval-augmented generation (RAG) and GRAPH-RAG. Speaker Info: Valerie Bullock AI Researcher Institute for Defense Analyses At the Institute for Defense Analyses (IDA), Valerie is currently developing LLM capabilities to analyze test science data, polls, and focus group interviews, in addition to her ongoing efforts to integrate generative AI (GenAI) into military exercises. Prior to her role at IDA, Valerie worked as a quantitative researcher in the equities markets at a Chicago-based investment bank. Her journey into machine learning began in 2015 during her bachelor's degree, when she first coded a neural network from scratch and began researching bio-inspired neural network architectures. During her graduate studies, Valerie focused on the underlying mathematics and optimization techniques of machine learning. She holds a Master's degree in Operations Research from the Kellogg School of Management and a Master's degree in Applied Mathematics from Northwestern University. | 2025 From PDFs to Insights: A Machine Vision and LLM Approach to Valerie Bullock Institute for Defense Analyses X Functional Data Analysis – "What to do when your data are a curve or spectra" Session Recording Session Recording part2 Materials Abstract: Are you currently NOT USING YOUR ENTIRE DATA STREAM to inform decisions? Sensors that stream data (e.g., temperature, pressure, vibration, flow, force, proximity, humidity, intensity, concentration, etc.), as well as radar, sonar, chromatography, NMR, Raman, NIR, or mass spectroscopy, all measure a signal versus a longitudinal component like wavelength, frequency, energy, distance, or in many cases - time. Are you just using select points, peaks, or thresholds in your curved or spectral data to evaluate performance? This course will show you how to use the complete data stream to improve your process knowledge and make better predictions. Curves and spectra are fundamental to understanding many scientific and engineering processes. They are created by many types of test and manufacturing processes, as well as measurement and detection technologies. Any response varying over a continuum is functional data. Functional Data Analysis (FDA) uses functional principal components analysis (FPCA) to break curve or spectral data into two parts - FPC Scores and Shape Components. The FPC Scores are scalar quantities (or weights) that explain function-to-function variation. The Shape Components explain the longitudinal variation. FPC Scores can then be used with a wide range of traditional modeling and machine learning methods to extract more information from curves or spectra. When these functional data are used as part of a designed experiment, the curves and spectra can be well predicted as functions of the experimental factors. Curves and spectra can also be used to optimize or "reverse engineer" factor settings.
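For readers who want to see the mechanics behind FPC Scores and Shape Components before the course, here is a minimal Python sketch of the direct (SVD-based) FPCA approach; it uses NumPy only, the toy curves are hypothetical, and it is not the JMP Pro Functional Data Explorer implementation demonstrated in the course.

```python
import numpy as np

# Toy functional data: 40 curves sampled on a shared grid of 200 points.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
curves = np.array([np.sin(2 * np.pi * (1 + 0.1 * rng.standard_normal()) * t)
                   + 0.05 * rng.standard_normal(t.size) for _ in range(40)])

# Direct FPCA: center the curves, then decompose the data matrix with an SVD.
mean_curve = curves.mean(axis=0)
U, s, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)

shape_components = Vt[:2]        # functions of t (longitudinal variation)
fpc_scores = U[:, :2] * s[:2]    # one scalar weight per curve, per component

# Each curve ~ mean curve + score-weighted shape components; the residual
# shrinks as more components are retained.
approx = mean_curve + fpc_scores @ shape_components
print("max reconstruction error:", np.abs(approx - curves).max())
```

The scalar FPC scores produced this way are exactly the quantities that can then feed traditional regression or machine learning models.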
In a machine learning application, functional data analysis uses the whole curve or spectrum to better predict outcomes than employing "landmark" or summary statistical analyses of individual peaks, slopes, or thresholds. References and links will be provided for open-source tools to do FDA, but in this course JMP Pro 18 software will be used to demonstrate analyses and to illustrate multiple case studies. See how a functional model is created by fitting a B-spline, P-spline, Fourier, or Wavelets basis model to the data. One can also perform functional principal components analysis directly on the data, without fitting a basis function model first. Direct Models include several Singular Value Decomposition (SVD) approaches as well as Multivariate Curve Resolution (MCR). Curve or spectral data can often be messy. Several data preprocessing techniques will be presented. Methods to clean up (remove, filter, reduce), transform (center, standardize, rescale), and align data (line up peaks, dynamic time warping) will be demonstrated. Correction methods specific to spectral data, including Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay filtering, and Baseline Correction, will be shown. Case studies will be used to demonstrate the methods discussed above. Speaker Info: Ryan Parker Senior Research Statistician Developer JMP Ryan Parker is a Senior Research Statistician Developer at JMP. Parker develops the Functional Data Explorer platform for JMP Pro, and he is responsible for the Gaussian Process platform and the Bootstrap/Simulate features. He has also contributed to the assess variable importance technique in JMP Profiler, as well as variable clustering. He studied statistics at North Carolina State University, earning a PhD in 2015. Short Course Group | 2025 Functional Data Analysis – "What to do when your data are a Ryan Parker JMP X Functional Data Analysis – "What to do when your data are a curve or spectra" Session Recording Session Recording part2 Materials Abstract: Are you currently NOT USING YOUR ENTIRE DATA STREAM to inform decisions? Sensors that stream data (e.g., temperature, pressure, vibration, flow, force, proximity, humidity, intensity, concentration, etc.), as well as radar, sonar, chromatography, NMR, Raman, NIR, or mass spectroscopy, all measure a signal versus a longitudinal component like wavelength, frequency, energy, distance, or in many cases - time. Are you just using select points, peaks, or thresholds in your curved or spectral data to evaluate performance? This course will show you how to use the complete data stream to improve your process knowledge and make better predictions. Curves and spectra are fundamental to understanding many scientific and engineering processes. They are created by many types of test and manufacturing processes, as well as measurement and detection technologies. Any response varying over a continuum is functional data. Functional Data Analysis (FDA) uses functional principal components analysis (FPCA) to break curve or spectral data into two parts - FPC Scores and Shape Components. The FPC Scores are scalar quantities (or weights) that explain function-to-function variation. The Shape Components explain the longitudinal variation. FPC Scores can then be used with a wide range of traditional modeling and machine learning methods to extract more information from curves or spectra.
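The spectral corrections named in the course description can also be prototyped in open-source tools; the following hedged sketch implements Standard Normal Variate by hand and applies Savitzky-Golay smoothing via SciPy's savgol_filter (the toy spectra and parameter choices are illustrative, not recommendations).

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum individually,
    removing per-spectrum baseline offset and multiplicative scatter."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, keepdims=True)
    return (spectra - mu) / sd

# Toy NIR-like spectra: 10 spectra x 500 wavelengths with random offsets.
rng = np.random.default_rng(1)
wavelength = np.linspace(900, 1700, 500)
peak = np.exp(-((wavelength - 1300) / 40) ** 2)
spectra = peak + rng.uniform(0, 0.5, (10, 1)) + 0.02 * rng.standard_normal((10, 500))

corrected = snv(spectra)
# Savitzky-Golay smooths (or differentiates, via the deriv argument) while
# preserving peak shape better than a plain moving average.
smoothed = savgol_filter(corrected, window_length=21, polyorder=3, axis=-1)
```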
When these functional data are used as part of a designed experiment, the curves and spectra can be well predicted as functions of the experimental factors. Curves and spectra can also be used to optimize or "reverse engineer" factor settings. In a machine learning application, functional data analysis uses the whole curve or spectrum to better predict outcomes than employing "landmark" or summary statistical analyses of individual peaks, slopes, or thresholds. References and links will be provided for open-source tools to do FDA, but in this course JMP Pro 18 software will be used to demonstrate analyses and to illustrate multiple case studies. See how a functional model is created by fitting a B-spline, P-spline, Fourier, or Wavelets basis model to the data. One can also perform functional principal components analysis directly on the data, without fitting a basis function model first. Direct Models include several Singular Value Decomposition (SVD) approaches as well as Multivariate Curve Resolution (MCR). Curve or spectral data can often be messy. Several data preprocessing techniques will be presented. Methods to clean up (remove, filter, reduce), transform (center, standardize, rescale), and align data (line up peaks, dynamic time warping) will be demonstrated. Correction methods specific to spectral data, including Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay filtering, and Baseline Correction, will be shown. Case studies will be used to demonstrate the methods discussed above. Speaker Info: Clay Barker Principal Research Statistician Developer JMP Clay Barker is a Principal Research Statistician Developer for JMP Statistical Discovery in Cary, North Carolina. He has developed a wide variety of capabilities in JMP, including tools for variable selection, generalized linear modeling, and nonlinear modeling. Recently, his focus has been on implementing matrix decompositions to be used when analyzing functional data. Clay joined JMP after earning his PhD in Statistics from North Carolina State University. Short Course Group | 2025 Functional Data Analysis – "What to do when your data are a Clay Barker JMP X Functional Data Analysis – "What to do when your data are a curve or spectra" Session Recording Session Recording part2 Materials Materials Abstract: Are you currently NOT USING YOUR ENTIRE DATA STREAM to inform decisions? Sensors that stream data (e.g., temperature, pressure, vibration, flow, force, proximity, humidity, intensity, concentration, etc.), as well as radar, sonar, chromatography, NMR, Raman, NIR, or mass spectroscopy, all measure a signal versus a longitudinal component like wavelength, frequency, energy, distance, or in many cases - time. Are you just using select points, peaks, or thresholds in your curved or spectral data to evaluate performance? This course will show you how to use the complete data stream to improve your process knowledge and make better predictions. Curves and spectra are fundamental to understanding many scientific and engineering processes. They are created by many types of test and manufacturing processes, as well as measurement and detection technologies. Any response varying over a continuum is functional data. Functional Data Analysis (FDA) uses functional principal components analysis (FPCA) to break curve or spectral data into two parts - FPC Scores and Shape Components. The FPC Scores are scalar quantities (or weights) that explain function-to-function variation.
The Shape Components explain the longitudinal variation. FPC Scores can then be used with a wide range of traditional modeling and machine learning methods to extract more information from curves or spectra. When these functional data are used as part of a designed experiment, the curves and spectra can be well predicted as functions of the experimental factors. Curves and spectra can also be used to optimize or "reverse engineer" factor settings. In a machine learning application, functional data analysis uses the whole curve or spectrum to better predict outcomes than employing "landmark" or summary statistical analyses of individual peaks, slopes, or thresholds. References and links will be provided for open-source tools to do FDA, but in this course JMP Pro 18 software will be used to demonstrate analyses and to illustrate multiple case studies. See how a functional model is created by fitting a B-spline, P-spline, Fourier, or Wavelets basis model to the data. One can also perform functional principal components analysis directly on the data, without fitting a basis function model first. Direct Models include several Singular Value Decomposition (SVD) approaches as well as Multivariate Curve Resolution (MCR). Curve or spectral data can often be messy. Several data preprocessing techniques will be presented. Methods to clean up (remove, filter, reduce), transform (center, standardize, rescale), and align data (line up peaks, dynamic time warping) will be demonstrated. Correction methods specific to spectral data, including Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay filtering, and Baseline Correction, will be shown. Case studies will be used to demonstrate the methods discussed above. Speaker Info: Tom Donnelly Systems Engineer JMP Tom Donnelly works as a Systems Engineer for JMP Statistical Discovery supporting users of JMP software in the Defense and Aerospace sector. He has been actively using and teaching Design of Experiments (DOE) methods for the past 40 years to develop and optimize products, processes, and technologies. Donnelly joined JMP in 2008 after working as an analyst for the Modeling, Simulation & Analysis Branch of the US Army's Edgewood Chemical Biological Center – now DEVCOM CBC. There, he used DOE to develop, test, and evaluate technologies for detection, protection, and decontamination of chemical and biological agents. Prior to working for the Army, Tom was a partner in the first DOE software company for 20 years where he taught over 300 industrial short courses to engineers and scientists. Tom received his PhD in Physics from the University of Delaware. Short Course Group | 2025 Functional Data Analysis – "What to do when your data are a Tom Donnelly JMP X Hardened Extension of the Adversarial Robustness Toolbox: Evaluating & Hardening AI Models Session Recording Materials Abstract: Reliable AI systems require secure AI models. The proliferation of AI capabilities in civilian and government workflows creates novel attack vectors for adversaries to exploit. The Adversarial Robustness Toolbox (ART), created in 2018 by IBM Research, is designed to simulate and evaluate threats targeting modern AI systems, identify which AI models are at greatest risk, and provide methods for risk mitigation. ART is designed to support red-blue/attack-defense test and evaluation operations and contains a broad catalog of state-of-the-art methods encompassing evasion, poisoning, extraction, and inference attacks.
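To make the red-blue workflow concrete, the sketch below uses ART's open-source Python API to wrap a toy PyTorch classifier and craft evasion examples with the Fast Gradient Method, one of the attacks in ART's catalog; the model and data are untrained stand-ins, and HEART's standardized protocols are not shown here.

```python
import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Untrained toy CNN standing in for a model under test.
model = nn.Sequential(nn.Conv2d(1, 4, 3), nn.ReLU(), nn.Flatten(),
                      nn.Linear(4 * 26 * 26, 10))

# Wrap the model so ART's attacks and defenses can target it.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Craft adversarial inputs with FGSM and measure how many predictions flip.
x = np.random.rand(8, 1, 28, 28).astype(np.float32)
x_adv = FastGradientMethod(estimator=classifier, eps=0.1).generate(x=x)
clean = classifier.predict(x).argmax(axis=1)
adversarial = classifier.predict(x_adv).argmax(axis=1)
print("fraction of predictions flipped:", (clean != adversarial).mean())
```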
ART is accessible as an open-source software repository via APIs that execute the evaluation, defense, certification, and verification of AI models. ART supports a wide range of AI frameworks (e.g., TensorFlow, PyTorch), tasks (e.g., classification, object detection, speech recognition), and data types (e.g., images, video, audio). This enables end users to bring their own custom datasets and AI models to assess model adversarial robustness using a variety of attack types, explore available avenues to mitigate attacks' impacts, and harden pre-trained models via fine-tuning. Using ART, end users can better understand and ultimately mitigate vulnerabilities. ART continues to receive mission-critical enhancements, such as scalability across GPUs and the addition of new state-of-the-art methods, including automated detection of adversarial inputs. In collaboration with the Department of Defense (DoD)'s Chief Digital and AI Office (CDAO), IBM has created an extension of ART to support a suite of adversarial robustness test & evaluation procedures as part of the Joint AI Test Infrastructure Capability (JATIC) program. This extension, the Hardened Extension of Adversarial Robustness Toolbox (HEART), focuses primarily on computer vision use cases including object detection and image classification to bolster model performance against evasion attacks. HEART is available via PyPI and Conda Forge and enables ART's capabilities to be leveraged through a set of standardized protocols designed to increase ease of use and interoperability across AI model test and evaluation tooling. Specifically, HEART is developed to address key DoD use cases and integrate with DoD workflows. Recently, HEART was deployed on the Navy's Project Harbinger as an added method of protecting the AI models. Speaker Info: Jordan Fischer Solutions Architect IBM I am a Lead AI developer and solutions architect in IBM's public service division, designing and implementing integrated AI and machine learning systems for the US Government. I specialize in Advanced AI (LLMs and other foundation models), AI Governance, Responsible AI, and Adversarial AI Robustness. My background in AI and data systems in the public sector has spanned topics as diverse as urban development, climate resiliency, public administration, public health and human rights. I hold a master's degree in Business Analytics from George Washington University and a bachelor's degree in Economics from the University of Utah. | 2025 Hardened Extension of the Adversarial Robustness Toolbox: Jordan Fischer IBM X High, Hot, and Humid: On the Impacts of Extreme Conditions on Aviation Session Recording Materials Abstract: Rising occurrences of extreme heat and humidity, combined with high-altitude conditions, present significant challenges for aviation by increasing density altitude—a critical factor in aircraft performance. Elevated temperatures, humidity, and high-altitude environments reduce air density, impacting engine efficiency, lift, and takeoff capability while extending required runway lengths. These challenges are particularly pronounced at high-altitude airports, where all three factors converge to affect operational safety and efficiency. This study utilizes high-resolution atmospheric projections and aircraft performance modeling to assess risks for global airports and proposes scalable adaptive strategies.
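For intuition about the density-altitude mechanism the abstract describes, here is a sketch of the common aviation rule of thumb (roughly 118.8 ft of density altitude per degree Celsius above standard temperature); it deliberately ignores humidity, which the study itself accounts for, so treat it as illustrative only.

```python
def density_altitude_ft(pressure_altitude_ft: float, oat_c: float) -> float:
    """Rule-of-thumb density altitude (ft) from pressure altitude and
    outside air temperature; humidity effects are not modeled here."""
    # ISA temperature lapses ~1.98 C per 1,000 ft from 15 C at sea level.
    isa_temp_c = 15.0 - 1.98 * pressure_altitude_ft / 1000.0
    # Each degree C above ISA adds roughly 118.8 ft of density altitude.
    return pressure_altitude_ft + 118.8 * (oat_c - isa_temp_c)

# A 5,000 ft pressure-altitude airport at 35 C behaves like an airport at
# roughly 8,500 ft on a standard day.
print(round(density_altitude_ft(5000.0, 35.0)))
```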
Addressing the growing prevalence of extreme heat and humidity requires resilient infrastructure, comprehensive risk assessments, and forward-thinking policies to ensure aviation operations remain safe and reliable in evolving conditions. Speaker Info: Cameron Liang Research Staff Member IDA Cameron Liang (B.S., Physics, University of California, San Diego, 2012; Ph.D., Astronomy & Astrophysics, University of Chicago, 2018) arrived at the Institute for Defense Analyses (IDA) in 2020 after a postdoctoral research position at the Kavli Institute for Theoretical Physics at the University of California, Santa Barbara. He studies structure formation in the Universe with a focus on galaxy formation using state-of-the-art hydrodynamic simulations. He is an expert on high performance computing, machine learning, statistics, modeling and simulation. At IDA, he works on a range of topics, including climate change, artificial intelligence, and orbital debris. Presentation | 2025 High, Hot, and Humid: On the Impacts of Extreme Conditions Cameron Liang IDA X Improving Readiness Outcomes with Collaboration: Quantifying the Cost of Siloed Resourcing Session Recording Materials Abstract: Resource decisions (e.g., purchasing and positioning spares) across the DoD supply system are optimally made from a multi-echelon viewpoint, allocating resources between retail sites and a centralized wholesale in tandem to maximize readiness outcomes. Due to the size and complexity of the supply system, it is difficult to draw direct connections between resource decisions and mission outcomes. Thus, the common metrics used to grade performance do not strongly correlate to readiness and result in siloed thinking and inefficient resource allocation. Using discrete-event simulations of the end-to-end Naval aviation sustainment system, we quantified the readiness and cost impact of sub-optimal resourcing decisions due to siloed decision-making. These inefficiencies are best avoided by taking a multi-echelon approach. However, recognizing the DoD-wide paradigm shift this would require, we identified new wholesale metrics that more strongly tie to flight line readiness to mitigate the inefficiency. Furthermore, quantifying the cost-to-readiness relationship across the supply system can serve as a powerful basis for DoD-wide resource optimization in lieu of multi-echelon approaches. Speaker Info: Connor McNamara Research Staff Member IDA Presentation | 2025 Improving Readiness Outcomes with Collaboration: Quantifying Connor McNamara IDA X Improving the Long-Term Reusability of Nondestructive Evaluation Data Sets Session Recording Materials Abstract: The field of nondestructive evaluation (NDE) is currently undergoing a digital transformation. NDE 4.0, mirroring the Industry 4.0 concept, seeks to improve inspections by leveraging advancements in computing hardware, machine learning/artificial intelligence, and digital thread/digital twin with large volumes of digital NDE data. These new capabilities cannot be realized without well-curated data sets. However, NDE data poses several major challenges. Data are often complex and unstructured, ranging from large multi-dimensional arrays of raw data to simplified image or one-dimensional representations. Meaningful data representing signals of interest are often limited and embedded in significant noise. Critical metadata describing the circumstances of the measurement is often missing.
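One mitigation discussed later in this abstract is better metadata capture at acquisition time; the sketch below illustrates the idea by storing raw waveforms and their measurement context together in a self-describing HDF5 file via h5py (the field names and values are hypothetical, not the proposed NDE format standard).

```python
import numpy as np
import h5py

# Hypothetical ultrasonic A-scan volume: 64 x 64 scan grid, 1,024 samples.
ascans = np.random.randn(64, 64, 1024).astype(np.float32)

# Keep raw data and the circumstances of the measurement in one
# self-describing file so the context travels with the waveforms.
with h5py.File("inspection_0001.h5", "w") as f:
    dset = f.create_dataset("raw/ascans", data=ascans, compression="gzip")
    dset.attrs["probe_center_frequency_mhz"] = 5.0   # illustrative values
    dset.attrs["sampling_rate_mhz"] = 100.0
    dset.attrs["couplant"] = "water immersion"
    dset.attrs["specimen_id"] = "CFRP-PANEL-017"
    dset.attrs["operator"] = "inspector_12"
    dset.attrs["acquisition_utc"] = "2025-01-15T14:02:00Z"
```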
Data may also be stored in a variety of proprietary formats, potentially restricting access to critical raw data needed to develop new measurement techniques or train AI/ML models. These factors have hindered research and development on these NDE 4.0 concepts. This talk will discuss efforts toward overcoming some of these hurdles and improving the long-term reusability of NDE data through improved data management practices. This includes development of software tools to improve metadata capture, with an emphasis on user experience and minimizing impact on workflow. Also discussed will be efforts toward development of a new data format standard for NDE data and some of the lessons learned to date. Portions of this work are funded by the U.S. Air Force through contract FA8650-19-D-5230. DISTRIBUTION STATEMENT A. Approved for public release: distribution is unlimited. Speaker Info: Tyler Lesthaeghe Research Engineer University of Dayton Research Institute Tyler Lesthaeghe is a research engineer in the NDE Engineering group at the University of Dayton Research Institute. He holds a B.S. in Mechanical Engineering, an M.S. in Engineering Mechanics, and is a Ph.D. candidate at Iowa State University. His area of expertise is nondestructive evaluation, but he also works on problems related to infrastructure for data collection and analysis, data management, and ensuring long-term data reusability. Presentation | 2025 Improving the Long-Term Reusability of Nondestructive Tyler Lesthaeghe University of Dayton Research Institute X Interpretable Machine Learning 101 Session Recording Materials Abstract: This mini tutorial focuses on introducing the methods and concepts used in interpretable machine learning, particularly for applications that incorporate tabular operational data. Much of the current AI focus is on generative AI, which is deeply rooted in uninterpretable neural networks; however, there is still substantial research in interpretable machine learning that regularly outperforms neural networks for classical machine learning tasks such as regression and classification. This course will summarize supervised vs. unsupervised machine learning tasks as well as online vs. offline learning settings. Classical machine learning models are introduced and used to motivate concepts such as bias-variance trade-off, hyperparameter tuning, and optimization algorithms/convergence. Last, a brief overview of recent work in interpretable machine learning will point the audience towards state-of-the-art advances. The only prerequisites for this course are familiarity with mathematical notation and elementary linear algebra: mainly being familiar with matrix/vector operations, matrix inverses, and ill-conditioned matrices. Speaker Info: Nikolai Lipscomb Research Staff Member IDA Nikolai Lipscomb is a research staff member at the Institute for Defense Analyses and works within the Science, Systems, and Sustainment Division. Prior to his work at IDA, Nikolai graduated from the Department of Statistics and Operations Research at The University of North Carolina at Chapel Hill. Nikolai's research experience includes stochastic modeling and numerical optimization.
Mini-Tutorial | 2025 Interpretable Machine Learning 101 Nikolai Lipscomb IDA X Kernel Model Validation: How To Do It, And Why You Should Care Session Recording Materials Abstract: Gaussian Process (GP) models are popular tools in uncertainty quantification (UQ) because they purport to furnish functional uncertainty estimates that can be used to represent model uncertainty. It is often difficult to state with precision what probabilistic interpretation attaches to such an uncertainty, and in what way it is calibrated. Without such a calibration statement, the value of such uncertainty estimates is quite limited and qualitative. I will discuss the interpretation of GP-generated uncertainty intervals in UQ, and how one may learn to trust them, through a formal procedure for covariance kernel validation that exploits the multivariate normal nature of GP predictions, and show examples. Speaker Info: Carlo Graziani Argonne National Laboratory Carlo Graziani received a B.S. in applied physics from Columbia University School of Engineering and Applied Science in 1982 and a Ph.D. in physics from the University of Chicago in 1993. He was a postdoctoral research associate at the University of Chicago for the summer of 1993 and then an NRC/NASA research associate at the Goddard Space Flight Center from 1993 to 1996 and at the Enrico Fermi Institute, the University of Chicago, from 1996 to 1999. He then worked as a science team member of the international High-Energy Transient Explorer (HETE) project for over a decade. In June 2007 he joined the University of Chicago, where he was a research associate professor in the Department of Astronomy & Astrophysics. He joined Argonne in 2017, and currently works on theory and applications of uncertainty quantification and machine learning. Presentation | 2025 Kernel Model Validation: How To Do It, And Why You Should Carlo Graziani Argonne National Laboratory X Lethal Debris Creation Following Untracked Orbital Debris Impacts Session Recording Materials Abstract: In this study, we use smoothed particle hydrodynamics modeling to examine the creation of mission-lethal non-trackable orbital debris from impacts of 10 g, 100 g, and 10 kg spherical and cylindrical objects on small satellite bus structures at 0, 15, 45, and 75 degrees obliquity. Our simulations include impacts at velocities below, approaching, and above the energy-density threshold that typically disables satellite functionality and creates additional lethal debris. We compare the mass distributions resulting from smoothed particle hydrodynamics simulations to the distributions derived from NASA's satellite breakup model; NASA's approximation of debris generation aligns well with our simulation results for large, trackable masses but deviates from our simulation results for small, non-trackable masses. Results also show only minor differences in satellite damage and debris generation between spherical and cylindrical 10 kg impactors. Speaker Info: Peter Mancini Research Staff Member IDA Peter Mancini works at the Institute for Defense Analyses, supporting the Director, Operational Test and Evaluation (DOT&E) as a Cybersecurity OT&E analyst.
Presentation | 2025 Lethal Debris Creation Following Untracked Orbital Debris Peter Mancini IDA X Motivating Effects to Missions from Cyber Threats During Operational Testing Session Recording Materials Abstract: Operational Testing and Evaluation with cyber threats informs decision making by providing examples of adversarial cyber threats and the effects those exploits cause to mission performance. This briefing examines the goals for, current problems with, and opportunities for improvement in operational testing in a cyber-contested environment. Namely, operational testing in the DoD too often does not consider the cyber effects against a unit during active missions. This briefing argues the DoD needs to move from evaluating segregated systems to evaluating integrated systems-of-systems. This also requires operational realism with regard to personnel and system configuration, cyber threat integration with mission performance testing, dedicated and methodical data collection from operational testers, and the inclusion of advanced cyber-attacks such as supply chain compromises. Speaker Info: Jordon Adams Research Staff Member Institute for Defense Analyses Dr. Jordon R. Adams is currently the Project Lead for cyber testing of Land and Expeditionary (LEW) systems and the Deputy Portfolio Lead for cyber testing of all systems on DOT&E oversight at IDA. Jordon received his PhD in High Energy Physics at the Florida State University in 2015, worked for the Center for Army Analysis from 2015-2016 as an Operations Research Systems Analyst, and has been with IDA in support of DOT&E since 2017. | 2025 Motivating Effects to Missions from Cyber Threats During Jordon Adams Institute for Defense Analyses X Multimodal Video Summarization on Multi-Scene Single-Context Data Session Recording Materials Abstract: This project explores transfer-learning approaches for operationalizing multimodal video captioning, focusing on effectively summarizing longer videos. Our methodology employs a Convolutional Neural Network (CNN) encoder with image, motion, and audio modes and a Long Short-Term Memory (LSTM) decoder architecture, incorporating key-frame extraction to reduce computational overhead while integrating audio features surrounding the key frames to improve caption quality. We begin by training a model on the Microsoft Research Video-to-Text (MSR-VTT) benchmark dataset, primarily containing short video clips. We then operationalize the model by evaluating performance on a context-specific dataset, the Dattalion dataset, which features war footage from the conflict in Ukraine that is substantially longer than the videos in MSR-VTT. To address challenges associated with labeling the Dattalion dataset ourselves and to enable the model to understand context-specific themes in the dataset, we apply transfer-learning by combining baseline training on MSR-VTT with fine-tuning on the Dattalion dataset to better handle context-specific videos of potentially long durations. We investigate the comparative performance before and after transfer-learning by evaluating key metrics (BLEU4, METEOR, ROUGE, and CIDEr). This research aims to provide insights into the effectiveness of transfer-learning for video captioning on longer videos in context-specific environments, with implications for improving video summarization in domains such as disaster response and defense.
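The abstract does not detail its key-frame extractor, but a simple frame-differencing heuristic in OpenCV conveys the idea of reducing a long video to representative frames (plus timestamps for pairing the surrounding audio); the threshold and function below are illustrative assumptions, not the authors' method.

```python
import cv2
import numpy as np

def extract_key_frames(video_path: str, diff_threshold: float = 30.0):
    """Keep frames whose mean absolute difference from the last kept frame
    exceeds a threshold -- a cheap scene-change heuristic that shrinks the
    number of frames the captioning encoder must process."""
    cap = cv2.VideoCapture(video_path)
    key_frames, timestamps_ms, prev = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is None or np.mean(cv2.absdiff(gray, prev)) > diff_threshold:
            key_frames.append(frame)
            # Timestamp locates the audio window to pair with this frame.
            timestamps_ms.append(cap.get(cv2.CAP_PROP_POS_MSEC))
            prev = gray
    cap.release()
    return key_frames, timestamps_ms
```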
Speaker Info: Aidan Looney Cadet United States Military Academy Aidan Looney is a senior at the United States Military Academy and is studying Operations Research. He will graduate in May and commission into the United States Army as a Cyber Warfare Officer. His research interests are at the intersection of optimization and machine learning, building products to further defense initiatives. | 2025 Multimodal Video Summarization on Multi-Scene Single-Context Aidan Looney United States Military Academy X Multiview Computer Vision for Detecting Defects in Munitions Session Recording Materials Abstract: Quality control for munitions manufacturing is an arduous process in which trained technicians examine and enhance X-ray scans of each round to determine whether defects are present. In this talk, we will introduce a machine learning model that would allow manufacturers to automate this process, which is particularly suited for supervised learning given its repetitive nature and clearly defined defect characteristics. Four scans are taken for each round at 30-degree deflections from one another. The three distinct zones of the round have different standards for what constitutes a defect. Our preprocessing pipeline involves applying the necessary image enhancement techniques to highlight defects in the scan and then applying an unsupervised masking algorithm to isolate and segment each zone of the round. Isolated zones from all four views are then fed into a zone-specific multiview neural network trained to detect defects. Due to different defect rates in each zone, two zones use a variational autoencoder for anomaly-based detection while one zone uses a convolutional neural network for heuristic-based detection. The implementation of this system stands to save manufacturers significant resources and man-hours devoted to quality control of their rounds. Speaker Info: William Niven Cadet United States Military Academy Cadet William Niven is currently an Applied Statistics & Data Science Major at the United States Military Academy at West Point and will graduate this May. Upon graduation, Cadet Niven will commission as a 2nd Lieutenant in the Army Cyber Corps and plans to pursue a career as a software developer within the service. For his senior thesis within West Point's Department of Math, Cadet Niven is conducting research in conjunction with U.S. Army Combat Capabilities Development Command (DEVCOM). In the coming years, he plans to use his education to contribute to the Army's burgeoning data science and artificial intelligence capabilities. | 2025 Multiview Computer Vision for Detecting Defects in Munitions William Niven United States Military Academy X Navigating Atmospheric Data: An Introduction to the Atmospheric Science Data Center Session Recording Abstract: The Atmospheric Science Data Center (ASDC) is a vital resource for the global scientific community, providing access to a wealth of atmospheric and climate-related data collected through NASA's Earth observing missions. This paper introduces the ASDC, outlining its key functions, data offerings, and user services. By facilitating the discovery, access, and utilization of extensive atmospheric datasets—including those related to radiation budget, aerosols, clouds, and air quality—the ASDC plays a crucial role in advancing Earth system science research.
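ASDC holdings are distributed through NASA Earthdata, so one plausible programmatic entry point is the open-source earthaccess Python client, sketched below; the collection short name is a placeholder to be replaced with a real identifier from Earthdata Search, and this is not an official ASDC-specific API.

```python
import earthaccess

# Authenticate with a free NASA Earthdata login (interactive prompt or
# environment variables).
earthaccess.login()

# short_name is a placeholder: substitute the collection identifier of the
# ASDC product of interest (e.g., a radiation budget or aerosol data set)
# found via NASA Earthdata Search.
granules = earthaccess.search_data(
    short_name="EXAMPLE_ASDC_COLLECTION",
    temporal=("2023-01-01", "2023-01-31"),
    count=10,
)
files = earthaccess.download(granules, local_path="./asdc_data")
```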
We highlight how researchers, educators, and decision-makers can leverage the center's resources for applications ranging from climate modeling to air quality monitoring and public health studies. Additionally, we explore the ASDC's commitment to open science, emphasizing its data management practices, user support, and tools for ensuring data accessibility and usability across diverse scientific disciplines. This introduction aims to guide users in navigating the ASDC's data portal and effectively integrating these datasets into their research workflows for enhanced environmental understanding and decision-making. Speaker Info: Hazem Mahmoud Science Lead NASA LaRC ASDC ADNET Dr. Hazem Mahmoud, the Science Lead at the Atmospheric Science Data Center, brings a wealth of expertise in geophysics and environmental engineering to his role. His primary focus lies in utilizing both orbital and suborbital instruments for remote sensing of the Earth's atmosphere. Dr. Hazem specializes in analyzing radiation budget, cloud formations, aerosol distribution, and tropospheric composition. His ultimate goal is to achieve near real-time air quality monitoring from space and study the impact of the air we breathe on our health. His passion for this field ignited when he confronted the challenge of limited Earth data availability early in his career, compelling him to dedicate his research to remote sensing applications. He firmly advocates for the integration of remote sensing data into scientific endeavors, believing it to be a crucial step in advancing global research efforts. Presentation | 2025 Navigating Atmospheric Data: An Introduction to the Hazem Mahmoud NASA LaRC ASDC ADNET X Non-destructive evaluation uncertainty quantification using optimization-based confidence Session Recording Materials Abstract: Non-destructive evaluation is an integral component of engineering applications that is subject to variability and uncertainty. One such example is the use of ultrasonic waves to infer the bond strength of a specimen of two adhered composite materials by first inferring the specimen's interfacial stiffness using noisy phase angle measurements and then propagating the result through an established linear regression relating interfacial stiffness to bond strength. We apply optimization-based confidence intervals to obtain the interfacial stiffness confidence interval and then propagate the result through the existing linear regression such that the final bond strength interval achieves finite-sample coverage under reasonable assumptions. Since we have access to a parameterized forward model of the ultrasonic wave propagation through the specimen along with known physical constraints on the model parameters, this technique is a particularly effective approach to leverage known information without relying on a subjective prior distribution. Applying this technique requires two methodological innovations: incorporation of unknown noise variance and the propagation of the resulting interval through an existing linear regression output. Additionally, the forward model characterizing the propagation of ultrasonic waves is highly nonlinear in the parameters, necessitating innovation in interval calibration. We compute a variety of intervals and demonstrate their statistical validity via a simulation study. Speaker Info: Michael Stanley Research Scientist AMA/NASA LaRC Michael (Mike) Stanley is a new postdoctoral researcher at NASA Langley Research Center contracted through Analytical Mechanics Associates (AMA).
He obtained his PhD in Statistics from Carnegie Mellon University under Mikael Kuusela with a thesis focused on statistical inference for ill-posed inverse problems in the physical sciences. During his PhD, he collaborated extensively with the Jet Propulsion Laboratory on carbon flux uncertainty quantification and was awarded a Strategic University Research Partnership (SURP) to develop decision theoretic and optimization-based UQ. This work centered around the strategic desire to provide prior-free UQ alternatives to NASA and remains a key focus of his research. More broadly, he is interested in the intersection of statistics, probability, optimization, and computation. Presentation | 2025 Non-destructive evaluation uncertainty quantification using Michael Stanley AMA/NASA LaRC X Novel Batch Weighing Methodology using Machine Learning and Monte Carlo Simulation Session Recording Materials Abstract: Batch weighing is a common practice in the manufacture, research, development, and handling of product. Counting individual parts can be a time-consuming and inefficient process, and the ability to batch weigh can save time and money. The main downside of batch weighing is the potential risk of error in the estimated quantity due to tolerance and noise stacking. The methodology highlighted in this presentation aims to directly address and alleviate this risk by quantifying it using Monte Carlo simulation and Discriminant Analysis, a supervised machine learning modeling approach. The final model can be used to inform the user of the specific risk associated with each batch based on weight, reducing the potential for misclassification. The presentation also discusses guidelines for applying the methodology, and remedial methods for certain issues that may arise during its application, using a case study to help illustrate the method's benefits. Speaker Info: Christopher Drake Lead Mathematical Statistician US Army Futures Command Christopher Drake, PStat® is a Lead Mathematical Statistician in Army Futures Command at Picatinny Arsenal, as part of the Systems Engineering Directorate. Mr. Drake has over 10 years of experience working on highly complex research and development programs as a statistician in the Army. Mr. Drake also has more than 10 years of experience as a Lecturer for Probability & Statistics courses for the Armament Graduate School, a graduate school at Picatinny Arsenal offering advanced degrees in specialized areas of armaments engineering. Mr. Drake earned his Bachelor's degree in Industrial Engineering from Penn State University with a focus in Manufacturing Systems, and his Master's degree in Applied Statistics from the New Jersey Institute of Technology. | 2025 Novel Batch Weighing Methodology using Machine Learning and Christopher Drake US Army Futures Command X Optimal Transport-based Space Filling Designs for AI and Autonomy Session Recording Materials Abstract: Space-filling designs play a critical role in efficiently exploring high-dimensional input spaces, especially in applications involving simulation, autonomous systems, and complex physical experiments. While a variety of methods exist for generating such designs, most rely on rectangular constraints, fixed weighting schemes, or limited support for nominal factors.
In this presentation, we introduce a novel approach based on sliced optimal transport that addresses these limitations by enabling the creation of designs with significantly improved space-filling properties—offering enhanced minimum inter-point distances and better uniformity compared to existing methods. Our approach accommodates arbitrary non-rectangular domains and weighted target distributions, ensuring that practitioners can capture realistic constraints and focus sampling where it matters most. The sliced formulation naturally extends to multi-class domains, enabling a unified design across any number of categorical factors in which each sub-design maintains favorable space-filling properties, and the classes collectively collapse into a well-distributed design when nominal factors are ignored. Furthermore, our method is computationally efficient, readily scaling to hundreds of thousands of design points without sacrificing performance—an essential feature for testing AI and autonomous systems in high-dimensional simulation environments. We will demonstrate the theory behind sliced optimal transport, outline our algorithms, and present empirical comparisons that highlight the benefits of this method over existing space-filling approaches. Speaker Info: Tyler Morgan-Wall Research Staff Member IDA Dr. Tyler Morgan-Wall is a Research Staff Member at the Institute for Defense Analyses, and is the developer of the software library skpr: a package developed at IDA for optimal design generation and power evaluation in R. He is also the author of several other R packages for data visualization, mapping, and cartography. He has a PhD in Physics from Johns Hopkins University and lives in Silver Spring, MD. Presentation | 2025 Optimal Transport-based Space Filling Designs for AI and Tyler Morgan-Wall IDA X Optimizing Surveillance Test Designs for DoD Programs Using a Simulation-Based Approach Abstract: As some DoD programs extend beyond their original decommission dates, ongoing subcomponent testing is crucial to ensure system reliability and study aging effects over the extended service life. However, surveillance efforts often face challenges due to limited asset availability, as predetermined quantities are typically allocated to match the original decommission timeline. Options such as recycling assets or procuring additional units are frequently constrained by destructive testing, high costs, long lead times, or a lack of capable manufacturers. Consequently, programs face depleting supplies, which can halt the surveillance efforts needed to sustain life extension and risk undetected performance degradation. This work presents a simulation-based approach to optimize test quantities and testing intervals while maintaining reliable detection of performance degradation given the surveillance study design, current data characteristics, and model specification. By simulating datasets that mirror the original dataset, using the parameter coefficients and residual standard deviation derived from a fitted model, the approach estimates statistical power. Starting with a fixed test size and interval (e.g., n = 1 and t = 1), an equivalent model is sequentially fitted to each dataset, adding simulations cumulatively. This process is repeated across various test sizes and intervals to determine the optimal set with sufficient power to detect degradation consistent with the original dataset's effect size and variability.
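A minimal version of that simulate-and-refit power loop, assuming the simple linear regression case discussed next, might look like the following Python sketch; the slope, residual standard deviation, and schedule values are illustrative, not program data.

```python
import numpy as np
from scipy import stats

def estimated_power(slope, resid_sd, n_per_interval, intervals,
                    alpha=0.05, n_sims=2000, seed=0):
    """Fraction of simulated surveillance datasets in which a linear aging
    trend of the given effect size is detected at level alpha."""
    rng = np.random.default_rng(seed)
    t = np.repeat(np.arange(1, intervals + 1), n_per_interval).astype(float)
    rejections = 0
    for _ in range(n_sims):
        y = slope * t + rng.normal(0.0, resid_sd, t.size)
        rejections += stats.linregress(t, y).pvalue < alpha
    return rejections / n_sims

# Sweep test size and number of intervals to find the cheapest schedule
# that still reaches, say, 80% power for the fitted effect size.
for n in (1, 2, 3):
    for k in (4, 6, 8):
        print(n, k, estimated_power(slope=-0.05, resid_sd=0.2,
                                    n_per_interval=n, intervals=k))
```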
While a simple linear regression is specified for demonstration, the approach is flexible and can potentially accommodate other models that involve hypothesis testing, such as nonlinear models or generalized additive models (GAMs). A key challenge is the increased risk of inflated false-positive rates as accumulating data are analyzed over time, given the sequential nature of surveillance data collection. To address this, sequential methods commonly used in clinical trials (e.g., Haybittle-Peto and O'Brien-Fleming boundaries) can be employed and integrated into the simulation framework. These methods help balance the need for interim analyses with the risk of false positives in long-term surveillance studies, ensuring statistical rigor. Results demonstrate that variability, effect size, test quantity, and testing intervals collectively affect the ability to detect true aging trends. Visualizations highlight the idea that larger effect sizes with low variability are more likely to reveal true aging trends with less data compared to smaller effect sizes in high-variability settings. This approach can enable programs to tailor test schedules based on achieved power for individual parameters, adjusting the overall test quantity and testing frequency for a subcomponent as necessary while maintaining confidence in detecting performance degradation over time. Alternatively, the test frequency and quantity can be increased to accelerate the identification of aging trends and emerging issues. Extensions and limitations of this approach are also planned for discussion. Speaker Info: Bryant Chen Statistician Lockheed Martin Space Bryant Chen is currently a Statistician at Lockheed Martin for the MMIII RSRV program, supporting the Air Force Nuclear Weapons Center at Hill AFB, UT. Additionally, he tutors students in mathematics at the STEM Learning Center located on the campus of Salt Lake Community College. He earned a B.S. in Industrial Engineering & Mathematics from the University at Buffalo, an M.S. in Statistics from California State University, Fullerton and an M.S. in Finance from the University of Utah. | 2025 Optimizing Surveillance Test Designs for DoD Programs Using Bryant Chen Lockheed Martin Space X Packing for a Road Trip: Provisioning Deployed Units for a Contested Logistics Environment Session Recording Materials Abstract: In the event of a conflict, the Department of Defense (DOD) anticipates significant disruptions to its ability to resupply deployed units with the spare components required to repair their equipment. Simply giving units enough additional spares to last the entirety of the mission without resupply is the most straightforward and risk-averse approach to ensure success. However, this approach is also the most expensive, as a complete duplicate set of spares must be purchased for each unit, reducing the number of systems that can be so augmented on a limited budget. An alternative approach would be to support multiple combatant units with a common set of forward-positioned spares, reducing the duplicative purchasing of critical items with relatively low failure rates and freeing up funding to support additional systems. This approach, while cost-effective, introduces a single point of failure, and presupposes timely local resupply. We have used Readiness Based Sparing (RBS) tools and discrete event simulations to explore and quantify the effectiveness of different strategies for achieving high availability in a contested logistics environment.
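A toy Monte Carlo, not the RBS tools used in the study, can still illustrate why pooling helps: assuming independent Poisson part demands and ignoring resupply timing, the sketch below compares siloed per-unit kits with a shared forward pool.

```python
import numpy as np

rng = np.random.default_rng(42)
N_UNITS, MISSION_DAYS, DAILY_FAILURE_RATE, N_SIMS = 6, 90, 0.02, 10_000

def availability(spares_per_unit=None, pooled_spares=None):
    """Fraction of simulated missions in which no unit runs out of spares."""
    ok = 0
    for _ in range(N_SIMS):
        demand = rng.poisson(DAILY_FAILURE_RATE * MISSION_DAYS, N_UNITS)
        if pooled_spares is not None:
            ok += demand.sum() <= pooled_spares      # shared forward pool
        else:
            ok += np.all(demand <= spares_per_unit)  # siloed per-unit kits
    return ok / N_SIMS

# Pooling absorbs unit-to-unit demand variance, so a shared pool typically
# beats the same (or even a larger) total quantity split into siloed kits.
print("siloed, 4 spares x 6 units (24 total):", availability(spares_per_unit=4))
print("pooled, 18 spares shared:             ", availability(pooled_spares=18))
```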
Assuming that local, periodic resupply of spares is possible, we found that creating a centralized pool of forward-positioned spares dramatically decreases the overall cost for a given readiness target compared to augmenting each individual unit with additional spares. Our work ties dollars spent to readiness outcomes, giving DOD leadership the tools to make quantitative tradeoffs. Speaker Info: Joshua Ostrander Research Staff Member Institute for Defense Analyses Dr. Joshua Ostrander is a Research Staff Member in the Sustainment group at the Institute for Defense Analyses. Trained as a chemist, he received his Ph.D. in Physical Chemistry from the University of Wisconsin-Madison in the lab of Martin Zanni, where he developed and applied nonlinear spectroscopy and microscopy to scientific problems in biology and materials science. Joshua's current research is focused on using readiness modeling and simulation to help DoD leaders make informed, data-driven decisions. | 2025 Packing for a Road Trip: Provisioning Deployed Units for a Joshua Ostrander Institute for Defense Analyses X Predicting Cyber Attack Likelihood using Probabilistic Attack Trees Session Recording Materials Abstract: Understanding how weapon systems and platforms will perform in cyber-contested environments is crucial for making rational programmatic and engineering decisions. Understanding the cyber survivability of systems as part of full-spectrum survivability is particularly difficult. Probabilistic attack trees, which combine attack surface analysis, vulnerability analysis, and mission loss into an overall risk picture, provide a productive approach to this challenge. Attack trees can be used to show all of the individual pathways an attacker could follow to cause a particular mission loss. If the probabilities at the lowest level of the attack tree are determined in some way, the rest of the probabilities across the tree can be easily calculated, providing the likelihood that an adversary will achieve a particular effect and the likelihood of the various pathways an attacker might utilize to get to that effect. Assigning the initial probabilities is the most challenging and contentious part of this approach, but it can be accomplished using a three-tiered process. First, any available historical data should be used to generate probabilities of leaf nodes, including historical, test, and architectural data. Second, simple linear models can be built using a combination of data, human experts, and AI. Finally, direct assessment can be utilized for leaves that do not yet have applicable data or models built, using a combination of SMEs and AI models. The initially completed attack tree can then be analyzed to find opportunities where additional data or models can be applied, with a focus on those leaves that appear to have the greatest impact on the overall mission risk. Mitigations and design changes can be considered and the attack tree recalculated with those changes, providing an easy and compelling way to understand and present return on investment for different options. Probabilistic attack trees have the potential to become a cornerstone of modern cyber risk assessment, with quantitative results that are transparent, repeatable, and easily understood, enabling defense programs and operators to make better decisions.
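The bottom-up probability roll-up is straightforward to sketch: assuming independent events, an AND node multiplies its children's probabilities and an OR node complements the product of their failure probabilities. The leaf values in the sketch below are purely illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Attack tree node: a leaf carries an assessed probability; an AND node
    succeeds only if every child step succeeds, an OR node if any path does."""
    kind: str                       # "leaf", "and", or "or"
    prob: float = 0.0               # used only by leaves
    children: list = field(default_factory=list)

def success_probability(node: Node) -> float:
    if node.kind == "leaf":
        return node.prob
    probs = [success_probability(child) for child in node.children]
    if node.kind == "and":          # all steps must succeed
        result = 1.0
        for p in probs:
            result *= p
        return result
    # "or": at least one independent path succeeds
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss

# Mission effect reachable via a supply-chain implant OR phishing followed
# by lateral movement (an AND of two steps); leaf values are illustrative.
tree = Node("or", children=[
    Node("leaf", prob=0.05),                        # supply-chain compromise
    Node("and", children=[Node("leaf", prob=0.30),  # initial access
                          Node("leaf", prob=0.40)]) # lateral movement
])
print(success_probability(tree))  # 1 - 0.95 * (1 - 0.12) = 0.164
```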
This approach offers a reliable and scalable way to safeguard mission-critical platforms and weapon systems, enabling them to continue to function as intended despite whatever an adversary may throw at them. Speaker Info: William Bryant Technical Fellow Modern Technology Solutions, Inc. (MTSI) Dr. Bill "Data" Bryant is a cyberspace defense and risk leader with a diverse background in operations, engineering, planning, and strategy. As a thought leader in cyber defense and risk assessment of non-traditional cyber-physical systems, Dr. Bryant believes that cyber-physical systems such as aircraft are often an organization's most critical and least defended assets, and he is passionate about improving the defensive posture of these systems. In his current role at Modern Technology Solutions Incorporated, Dr. Bryant created the Unified Risk Assessment and Measurement Process (URAMS). With a focus on assessing the cyber risk to aviation platforms and weapon systems, Dr. Bryant has supported numerous strategic and operational efforts for cyber resiliency, survivability of weapon systems, and cybersecurity risk assessments on various critical cyber-physical systems across multiple agencies. Dr. Bryant also co-developed Aircraft Cyber Combat Survivability (ACCS) with Dr. Bob Ball and has been working to apply kinetic survivability concepts to the new realm of cyber weapons. With over 25 years in the Air Force—including serving as the Deputy Chief Information Security Officer—Dr. Bryant has extensive experience successfully implementing proposals and policies to improve the cyber defense of weapon systems. He holds a wide range of academic degrees, in addition to his PhD, including Aeronautical Engineering, Space Systems, Military Strategy, and Organizational Management. He also holds CISSP, C|EH, and Security+ certifications. | 2025 Predicting Cyber Attack Likelihood using Probabilistic William Bryant Modern Technology Solutions, Inc. (MTSI) X Reducing Test to Purpose Session Recording Materials Abstract: In 2023, the Scientific Test and Analysis Techniques Center of Excellence (STAT COE) assisted in planning a test of the effect of storage conditions on the protective coating of launched devices. It was observed that the protective coating had developed potentially indented striations on support surfaces, which could affect the flight properties of the device in question. However, the extent of these striations under various potential storage conditions was unknown. A physics-based model simulating the extent of striations over time was updated; in addition to requiring validation, the model was slow and expensive to run, and the team wished to approximate or extrapolate results in a larger factor space than was feasible with the simulation. The test team had initially planned a replicated full factorial design to characterize the behavior of the protective coating. With the assistance of the STAT COE, they were able to identify the meaningful depth of striation that would be necessary to detect and estimate the inherent variability of the system. With this information the team was able to substantially reduce testing, saving the program more than $200,000 and avoiding schedule impact while sufficiently characterizing the behavior of the system under test. Speaker Info: Anthony Sgambellone Sr STAT Expert STAT COE Dr. Anthony Sgambellone is a Scientific Test and Analysis Techniques (STAT) Expert at the STAT Center of Excellence (COE).
He has been part of the STAT COE since 2020, where he provides support and instruction in efficient and effective test design and analysis across the DOD. Before joining the STAT COE, Dr. Sgambellone developed machine-learning models in the financial industry and on an Agile software development team in support of customers on Wright Patterson Air Force Base. Anthony holds a BS in Biology from Case Western Reserve University, an MS in Statistics from the University of Akron, and a PhD in Statistics from the Ohio State University. Presentation | 2025 Reducing Test to Purpose Anthony Sgambellone STAT COE X Responsible Artificial Intelligence Materials Abstract: Responsible AI (RAI) is a critical framework that ensures artificial intelligence (AI) systems are designed, developed, and deployed responsibly, with trust and safety as a primary consideration, alongside ethical and legal use. As AI becomes increasingly pervasive across various domains, including business, healthcare, transportation, the military and education, it is essential that we prioritize responsible principles, policies, and practices. This RAI one-day short course ensures practitioners have the critical knowledge, skills, and analytical abilities needed to identify and address opportunities and challenges in the design, development, and deployment of systems that incorporate AI. Speaker Info: Missy Cummings Professor George Mason University Professor Mary (Missy) Cummings received her BS in Mathematics from the US Naval Academy in 1988, MS in Space Systems Engineering from the Naval Postgraduate School in 1994, and Ph.D. in Systems Engineering from the University of Virginia in 2004. A naval pilot from 1988-1999, she was one of the U.S. Navy's first female fighter pilots. She is a Professor in the George Mason University College of Engineering and Computing, and directs the Mason Responsible AI program as well as the Mason Autonomy and Robotics Center. She is an American Institute of Aeronautics and Astronautics and Royal Aeronautical Society Fellow. Her research interests include artificial intelligence in safety-critical systems, assured autonomy, human-systems engineering, and the ethical and social impact of technology. Short Course Group | 2025 Responsible Artificial Intelligence Missy Cummings George Mason University X Responsible Artificial Intelligence Materials Abstract: Responsible AI (RAI) is a critical framework that ensures artificial intelligence (AI) systems are designed, developed, and deployed responsibly, with trust and safety as a primary consideration, alongside ethical and legal use. As AI becomes increasingly pervasive across various domains, including business, healthcare, transportation, the military and education, it is essential that we prioritize responsible principles, policies, and practices. This RAI one-day short course ensures practitioners have the critical knowledge, skills, and analytical abilities needed to identify and address opportunities and challenges in the design, development, and deployment of systems that incorporate AI. Speaker Info: Jesse Kirkpatrick Research Associate Professor George Mason University Jesse Kirkpatrick is a Research Associate Professor and the co-director of the Mason Autonomy and Robotics Center at George Mason University. Jesse is also an International Security Fellow at New America and serves as a consultant for numerous organizations, including some of the world's largest technology companies.
Jesse’s research and teaching focus on responsible innovation, with an emphasis on Responsible AI. He has received various honors and awards and is an official “Mad Scientist” for the U.S. Army. Short Course Group | 2025 Responsible Artificial Intelligence Jesse Kirkpatrick George Mason University X Risk-Informed Decision Making: An Introduction Session Recording Materials Abstract: All projects are inherently uncertain. To make wise decisions, projects should incorporate an analysis of uncertainty when considering potential alternatives. One process that provides an approach to incorporating uncertainty in the decision-making process is known as risk-informed decision making (RIDM). RIDM can be applied whenever there is uncertainty and there are competing alternatives. This tutorial provides a comprehensive overview of everything needed to conduct an RIDM analysis. Examples of technical failures, cost overruns, and schedule delays for historical projects are provided, giving clear evidence of project uncertainty and providing strong motivation for conducting risk analysis. The terminology of risk is presented. Risk and uncertainty from a variety of sources – cost, schedule, technical, and safety – are discussed. The mathematical basis for risk analysis is discussed, along with a simple worked example. The tools and techniques necessary for the conduct of risk analysis are provided as well, along with best practices and potential pitfalls. The RIDM process is outlined and contrasted with a similar approach called risk-based decision making. An overview of risk management is provided, and its relationship with RIDM is described. The tutorial concludes with notional case studies of RIDM. Speaker Info: Christian Smart Cost Engineer Jet Propulsion Laboratory Dr. Smart is a Cost Engineer with NASA’s Jet Propulsion Laboratory. He has experience supporting both NASA and the Department of Defense in the theory and application of risk, cost, and schedule analytics for cutting-edge programs, including nuclear propulsion and hypersonic weapon systems. For several years he served as the Cost Director for the Missile Defense Agency. An internationally recognized expert on risk analysis, he is the author of Solving for Project Risk Management: Understanding the Critical Role of Uncertainty in Project Management (McGraw-Hill, 2020). Dr. Smart received the 2021 Frank Freiman lifetime achievement award from the International Cost Estimating and Analysis Association. In 2010, he received an Exceptional Public Service Medal from NASA for the application of risk analysis. Dr. Smart was the 2009 recipient of the Parametrician of the Year award from the International Society of Parametric Analysts. Dr. Smart has BS degrees in Mathematics and Economics from Jacksonville State University, an MS in Mathematics from the University of Alabama in Huntsville (UAH), and a PhD in Applied Mathematics from UAH. Mini-Tutorial | 2025 Risk-Informed Decision Making: An Introduction Christian Smart Jet Propulsion Laboratory X Semi-parametric Modeling of the Equation of State of Dissociating Materials Materials Abstract: Modeling the equation of state (EOS) of chemically dissociating materials at extreme temperature and density conditions is necessary to predict their thermodynamic behavior in simulations and experiments. However, this task is challenging due to the sparse experimental and theoretical data available to calibrate the parameters of the equation of state model, such as the latent molar mass surface.
In this work, we adopt semi-parametric models for the latent molar mass of the material and its corresponding free energy surface. Our method employs basis representations of the latent surfaces with regularization to address challenges in basis selection and prevent overfitting. We show, with an example involving carbon dioxide, that our method improves model fit over simpler representations of the molar mass surface while preserving low computational overhead. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-872125 Speaker Info: Jolypich Pek Graduate Student George Mason University Jolypich Pek is a PhD student in the statistics department at George Mason University, working with Dr. Ben Seiyon Lee on developing Bayesian calibration and uncertainty quantification methodology for materials science applications. Her current research focuses on calibrating equation of state models, a collaboration with and funded by Lawrence Livermore National Laboratory. She earned her bachelor’s degree in Mathematical Statistics from George Mason, where she conducted research on modeling COVID-19 transmission on campus using an epidemic model. Additionally, she was a data science intern at ReefPoint Group, where she developed a framework to support Veteran healthcare access for the Department of Veterans Affairs. | 2025 Semi-parametric Modeling of the Equation of State of Jolypich Pek George Mason University X Sequential Space-Filling Designs Session Recording Materials Abstract: There are few recommended methods to help testers plan efficient modeling and simulation studies. Space-filling designs are a rigorous choice, but one of their drawbacks is that they require the final sample size to be selected prior to testing. More efficient testing can be achieved by using sequential designs, which choose test points without knowledge of the final sample size. Using sequential designs can prevent oversampling and help to augment poorly designed tests. We provide an overview of sequential space-filling designs, with a focus on the designs most suitable for the test and evaluation community. Speaker Info: Anna Flowers Ph.D. Student Virginia Tech Anna Flowers is a fourth-year Ph.D. student in Statistics at Virginia Tech. She received an M.S. in Statistics from Virginia Tech in 2023 and a B.S. in Mathematical Statistics from Wake Forest University in 2021. She is jointly advised by Bobby Gramacy and Chris Franck, and her research focuses on Gaussian Process regression, surrogate modeling, and active learning. She was an intern at the Institute for Defense Analyses in 2024. | 2025 Sequential Space-Filling Designs Anna Flowers Virginia Tech X SERPENS: Assessing the Operational Impacts of Orbital Debris Session Recording Materials Abstract: Space is becoming increasingly crowded with both satellites and orbital debris as more proliferated constellations come online. While research has shown that self-sustained runaway growth of orbital debris (a.k.a. Kessler Syndrome) is unlikely in the future, crowded space lanes are degrading satellites’ ability to perform their mission today. To quantify these operational impacts, we built the Space Environment Risk Prediction by Evaluating Numerical Simulations (SERPENS) model. SERPENS uses a high-fidelity, physics-based propagator to precisely calculate orbits and evaluate conjunction scenarios.
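For readers unfamiliar with conjunction screening, the toy sketch below conveys the idea in miniature: propagate two objects and search for their closest approach. SERPENS uses a high-fidelity propagator; this two-body, circular-orbit stand-in and all of its orbital parameters are purely illustrative:

```python
# Toy conjunction screen: propagate two circular orbits and find the closest
# approach over one day. This is a stand-in for SERPENS's high-fidelity
# propagator; all orbital parameters below are hypothetical.
import numpy as np

MU = 398600.4418  # Earth's gravitational parameter, km^3/s^2

def circular_orbit(r_km, inc_rad, raan_rad, t_s):
    """Inertial position (km) of a circular orbit, phase measured from the node."""
    u = np.sqrt(MU / r_km**3) * t_s            # argument of latitude = mean motion * t
    x, y = r_km * np.cos(u), r_km * np.sin(u)  # position in the orbital plane
    yi, zi = y * np.cos(inc_rad), y * np.sin(inc_rad)  # tilt plane by inclination
    xr = x * np.cos(raan_rad) - yi * np.sin(raan_rad)  # rotate node to its RAAN
    yr = x * np.sin(raan_rad) + yi * np.cos(raan_rad)
    return np.stack([xr, yr, zi], axis=-1)

t = np.arange(0.0, 86400.0, 1.0)  # screen one day at 1 s resolution
sat = circular_orbit(6778.0, np.radians(51.6), 0.0, t)                  # hypothetical satellite
debris = circular_orbit(6778.0, np.radians(86.0), np.radians(40.0), t)  # hypothetical debris
miss_km = np.linalg.norm(sat - debris, axis=1)
i = miss_km.argmin()
print(f"closest approach: {miss_km[i]:.1f} km at t = {t[i]:.0f} s")
```

A real screen would use perturbed orbits, covariance, and probability-of-collision metrics; the structure, propagate then search for minima in relative distance, is the same.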
Here, we use SERPENS to simulate the space environments after various debris-creating events and predict the operational consequences for particular satellites and constellations. SERPENS provides IDA with a testbed to evaluate the operational impacts of future DoD capabilities, including upgrades to space domain awareness (SDA) infrastructure and satellite collision avoidance methodologies. Speaker Info: Benjamin Skopic Research Staff Member IDA Dr. Benjamin Skopic is a Research Staff Member in the Science, Systems and Sustainment (S3D) Division at the Institute for Defense Analyses (IDA). His work has focused on assessing the operational impacts of various space-based threats to satellites using modeling and simulation. He is a primary developer on several such tools used to answer technical questions for the DoD. Dr. Skopic received his Ph.D. in Materials Science & Engineering and B.S. in Physics from William & Mary. His dissertation focused on the ribbon silk fibers naturally produced by recluse spiders. The unique adhesive properties of the silk inspired the design of ribbon/tape-based metastructures. Presentation | 2025 SERPENS: Assessing the Operational Impacts of Orbital Debris Benjamin Skopic IDA X SMOOTH PARTICLE HYDRODYNAMIC CODE PREDICTIONS FOR METEOROID DAMAGE TO THERMAL PROTECTION SYSTEMS SHIELDED BY COMPOSITE STRUCTURES Session Recording Materials Abstract: Interplanetary spacecraft are exposed to meteoroid fluxes whose characteristics far exceed the physical simulation capabilities of test facilities, complicating predictions of the likelihood that a meteoroid will penetrate a spacecraft’s critical systems. Accurate risk predictions are crucial to ensuring that important interplanetary missions, such as sample returns, can survive years of exposure to the meteoroid environment and safely reenter the Earth’s atmosphere with their scientific cargo. In this paper, we summarize a series of computational simulations of meteoroid impact damage to two types of spacecraft composite protective structures using the Smooth Particle Hydrodynamics Code (SPHC). We consider the effects of both meteoritic and non-meteoritic materials on a shielded forebody thermal protection system (TPS) for an extreme entry environment and on an unshielded aftbody TPS that is similar to the material covering the space shuttle’s external tank. Key to ensuring spacecraft reentry survival is understanding the potential damage from a meteoroid to a spacecraft’s TPSs, which may be housed beneath protective “garage” shielding enclosures. This analysis was presented at the Hypervelocity Impact Symposium 2024 [2] and is a follow-on effort to work published in 2022 by Williamsen et al. [1], who evaluated select SPHC simulations of meteoroid impacts into two types of spacecraft composite protective structures. Both past [1] and present [2] analyses support ongoing National Aeronautics and Space Administration Engineering and Safety Center tasks. Here, we present the SPHC predictions for a different type of multi-shock shield than in [1]. We also report patterns discovered in our analysis that reveal how this multi-shock shield and underlying TPS respond to physical impact. The meteoritic impactor materials we investigated in this study are iron, ice, and dunite (chondrite); the non-meteoritic impactor material is aluminum.
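The role of impactor material is easy to see in back-of-the-envelope terms: at a fixed size and speed, kinetic energy scales with density. The sketch below uses approximate handbook densities and a hypothetical particle size and velocity; it is arithmetic illustration only, not data from the SPHC study:

```python
# Kinetic energy of a 1 mm spherical impactor at 20 km/s for the materials
# named above. Densities are approximate handbook values; the diameter and
# velocity are hypothetical, chosen only to illustrate the energy arithmetic.
import math

densities_g_cm3 = {"iron": 7.87, "dunite": 3.3, "aluminum": 2.70, "ice": 0.92}
diameter_cm = 0.1       # assumed particle diameter (1 mm)
velocity_m_s = 20000.0  # assumed impact speed (20 km/s)

volume_cm3 = math.pi / 6.0 * diameter_cm**3  # sphere volume
for material, rho in densities_g_cm3.items():
    mass_kg = rho * volume_cm3 / 1000.0      # grams -> kilograms
    energy_j = 0.5 * mass_kg * velocity_m_s**2
    print(f"{material:9s} {energy_j:7.0f} J")
```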
We discuss the insights SPHC offers regarding debris cloud characteristics and forebody TPS damage, and then we use these insights to identify recognizable trends in forebody TPS penetration depth based on impact energy and overmatch energy. We leverage a new general multi-shock shield ballistic limit equation [3] to provide ballistic limit data that are missing from our limited set of SPHC predictions. Finally, we evaluate the SPHC predictions for aluminum particles impacting the aftbody TPS at three obliquities. [1] Williamsen, Joel, et al. “Prediction and Enhancement of Thermal Protection Systems from Meteoroid Damage Using a Smooth Particle Hydrodynamic Code.” Proceedings of the 2022 Hypervelocity Impact Symposium. Paper #: HVIS2022-09XWV8VB6X. Alexandria, VA, September 18-22, 2022. [2] Corbett, Brooke, et al. “Smooth Particle Hydrodynamic Code Predictions for Meteoroid Damage to Thermal Protection Systems Shielded by Composite Structures.” Proceedings of the 2024 Hypervelocity Impact Symposium. Paper #: HVIS2024-013. Tsukuba, Japan, September 9-13, 2024. [3] Schonberg, William P., et al. “Toward a More Generalized Ballistic Limit Equation for Multi-Shock Shields.” Acta Astronautica Vol. 213 (2023): pp. 307-319. Speaker Info: Brooke Corbett RSM IDA Brooke Corbett is a Research Staff Member at the Institute for Defense Analyses. She has worked on a broad range of programs within IDA's Operational Evaluation Division, including Live Fire Test & Evaluation survivability and lethality evaluations for U.S. Army, Air Force, USSOCOM, Navy, and Space Force programs. Brooke is developing subject matter expertise to support survivability and lethality evaluations of US Directed Energy Weapon Systems under DOT&E oversight, and she supports survivability and risk analyses for select NASA Engineering and Safety Center programs. Brooke earned a PhD in Materials Science Engineering from the University of Denver in 2008, with research focused on survivability and risk assessments of the hypervelocity impact damage response of the International Space Station's meteoroid/orbital debris shielding at elevated temperatures. She earned an MS in Physics and Astronomy from the University of Denver in 2001, with research focused on middle-atmospheric long-wave infrared measurements and line-by-line radiative transfer model predictions of water vapor and carbon dioxide at polar regions. She earned a BS in Physics from Le Moyne College, with project work focused on optics and the manual creation of a parabolic primary mirror for a Dobsonian telescope. | 2025 SMOOTH PARTICLE HYDRODYNAMIC CODE PREDICTIONS FOR METEOROID Brooke Corbett IDA X Synthetic anchoring under the specific source problem Session Recording Materials Abstract: Source identification is an inferential problem that evaluates the likelihood of opposing propositions regarding the origin of items. The specific source problem refers to a situation where the researcher aims to assess whether a particular source originated the items or whether they originated from an alternative, unknown source. Score-based likelihood ratios offer an alternative method to assess the relative likelihood of both propositions when formulating a probabilistic model is challenging or infeasible, as in the case of pattern evidence in forensic science. However, the lack of available data and the dependence structure created by the current procedure for generating learning instances can lead to reduced performance of score likelihood ratio systems.
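The core of a score-based likelihood ratio system described above can be sketched in a few lines: model the density of comparison scores under each proposition and take their ratio at the observed score. The synthetic Gaussian scores below are placeholders for real same-source and different-source comparisons:

```python
# Minimal score-based likelihood ratio (SLR) sketch. The synthetic scores
# stand in for real comparison scores; estimating these two densities well,
# despite scarce and dependent learning instances, is the hard part in practice.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
same_source = rng.normal(0.8, 0.10, 500)   # scores when the specific source made the item
diff_source = rng.normal(0.4, 0.15, 500)   # scores from alternative, unknown sources

f_same = gaussian_kde(same_source)  # density under the specific-source proposition
f_diff = gaussian_kde(diff_source)  # density under the alternative proposition

observed_score = 0.7
slr = f_same(observed_score)[0] / f_diff(observed_score)[0]
print(f"SLR at score {observed_score}: {slr:.2f}")  # > 1 favors the specific source
```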
To address these issues, we propose a resampling plan that creates synthetic items to generate learning instances under the specific source problem. Simulation results show that our approach achieves a high level of agreement with an ideal scenario in which data are not a limitation and learning instances are independent. We also present two applications in forensic science – handwriting and glass analysis – illustrating our approach with both a score-based and a machine learning-based score likelihood ratio system. These applications show that our method may outperform current alternatives in the literature. Speaker Info: Federico Veneri Iowa State University Dr. Federico Veneri is a consultant at the Inter-American Development Bank (IADB). He received his PhD in statistics from Iowa State University, where he collaborated with the Center for Statistics and Applications in Forensic Evidence (CSAFE) on statistical foundations for machine learning-based score likelihood ratio (SLR) inference for source attribution problems in forensic science. His research focuses on machine learning applications, quantitative criminology, and large-scale policy evaluation. | 2025 Synthetic anchoring under the specific source problem Federico Veneri Iowa State University X Test Incident Report Analytics Tool Session Recording Materials Abstract: At each test event, ATEC test centers collect tens, hundreds, or even thousands of Test Incident Reports (TIRs) capturing issues with systems under test. Currently, AEC evaluators score these TIRs one by one based on their severity, mission impact, and cause, which can be extremely time-consuming and prone to inconsistency. We are developing a TIR Analytics tool (TIRANICS) to accelerate the TIR scoring process. For any one TIR to be evaluated, TIRANICS provides three main capabilities: (1) lists of similar TIRs from current and past test events, (2) a recommended score, and (3) references to the Failure Definition and Scoring Criteria (FDSC) Guide for the system under test. To provide these capabilities, TIRANICS applies natural language processing techniques – namely term frequency and transformer-based embeddings, similarity metrics, hierarchical text splitting, and neural network classifiers – to the TIR narratives and/or the corresponding FDSC for the system under test. Because we have processed and represented the corpus to capture semantic meaning, this work also lays the foundation for using a Large Language Model with the text as a knowledge base to provide an evaluator with a recommended score and the reasoning behind that score. (A minimal sketch of the similarity-retrieval step appears below.) Speaker Info: Dan Owens AI Evaluator Army Evaluation Center Dan Owens has a master's degree in Information Networking from Carnegie Mellon University and a PhD in Electrical Engineering from Georgia Tech. He has worked for the US Army Test and Evaluation Command (ATEC) at Aberdeen Proving Ground, MD for fifteen years, first as a reliability evaluator working on a wide range of commodities including missile defense, battlefield networks, and tracked vehicles. For the last six years he has been the Army Evaluation Center (AEC) AI Team Lead, focusing on preparing the command to perform evaluations of AI-enabled systems and identifying ways AI can be used to improve test and evaluation processes.
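A minimal sketch of the retrieval capability mentioned in the abstract above, using TF-IDF vectors and cosine similarity (the tool also uses transformer embeddings; the TIR narratives here are invented placeholders):

```python
# Stripped-down similar-TIR retrieval: vectorize narratives with TF-IDF and
# rank past TIRs by cosine similarity to a new one. Narratives are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_tirs = [
    "Radio lost network connection during convoy movement",
    "Track thrown on left side while traversing rough terrain",
    "Display froze and required a full system reboot before the mission",
]
new_tir = "Vehicle threw right track on the gravel course"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(past_tirs + [new_tir])  # fit on all narratives
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()  # new TIR vs. each past TIR

for score, narrative in sorted(zip(scores, past_tirs), reverse=True):
    print(f"{score:.2f}  {narrative}")
```

In the full tool, the same ranked-retrieval idea feeds the recommended score and FDSC references, with embeddings capturing semantic rather than purely lexical similarity.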
Presentation | 2025 Test Incident Report Analytics Tool Dan Owens Army Evaluation Center X The Content and Context to Orbital Debris Analytics Session Recording Materials Abstract: Orbital debris collision hazard and population evolution modeling are foundational for space safety, space sustainability, and space security. The entangled dependencies of these three domains require persistent orbital intelligence to maintain situational awareness sufficient to satisfy space operators. LeoLabs provides this service by leveraging a global network of state-of-the-art radars, a cloud-based computational/distribution engine, physics-based utilities, advanced visual analytics, and machine learning tools. These capabilities combine to provide precise alerts in near real time to space operators and contextual insights to space planners and policymakers. We examine the suite of tools available via LeoLabs and how they combine to serve a variety of demands for situational awareness. Emphasis is placed on balancing reaction to tactical events with identification of strategic trends – both requiring immediate attention. Speaker Info: Darren McKnight Senior Technical Fellow LEOLABS Dr. Darren McKnight is currently Senior Technical Fellow for LeoLabs. Darren leads efforts to realize the value proposition of the growing global network of ground-based radars for space security, space safety, and space sustainability. He creates models, designs data depictions, develops risk algorithms, and leads space incident investigations. He is focused on creating new statistical collision risk assessment approaches to provide valuable context to the global space safety community. Presentation | 2025 The Content and Context to Orbital Debris Analytics Darren McKnight LEOLABS X The Craft of Scientific Writing Materials Abstract: Communicating technical concepts—clearly, concisely, and with purpose—is a skill that gives you the edge. Whether influencing business decisions, shaping research pathways, or simply growing as a technical professional, an ability to write distinguishes you from your peers. Learn the simple tools to make your technical content stand out. At The Technical Pen, we’ve found that strong technical writers showcase their skills in three primary categories: document layout, written voice, and audience engagement. This short course is built around core principles in each of these categories, offering actionable techniques to improve technical writing skills. Throughout the day, we will work on building a meaningful structure, energizing your written voice, and keeping your audience on track. Speaker Info: Kathryn Kirsch The Technical Pen Dr. Kathryn Kirsch is the founder and principal of The Technical Pen, a firm dedicated to helping scientists and engineers communicate effectively. For over a decade, Kathryn has worked with academics and industry professionals to build their technical writing skillset and to find their written voice. Her extensive experience in writing technical documents—whether proposals, journal papers, conference papers, technical reports, or white papers—provides the foundation for her engaging workshops and short courses. She holds a B.S., M.S., and Ph.D. in Mechanical Engineering.
Short Course | 2025 The Craft of Scientific Writing Kathryn Kirsch The Technical Pen X Toward an Integrated T&E Framework for AI-enabled Systems: A Conceptual Model Materials Abstract: The classic DoD T&E paradigm (Operational Effectiveness-Suitability-Survivability-Safety) benefits from 40 years of formalism and refinement and has produced numerous specialized testing disciplines (e.g., the -ilities), governing regulations, and rigorous analysis procedures. T&E of AI-enabled systems is still new, and the laboratory of ideas is a constant source of new testing options, the majority of which focus on performance. The classic gap between measures of performance and measures of effectiveness is very much present in AI-enabled systems, and the overemphasis on performance testing means we might miss the (effectiveness) forest for the (performance) trees. Borrowing from the classic “integrated survivability onion” conceptual model, we propose a set of integrated and nested evaluation questions for AI-enabled systems that covers the full range of classic T&E considerations, plus a few that are unique to AI technologies within the military operational context and implied by the DoD Responsible AI Guidance. All requirements for rigorous analytical and statistical techniques are preserved, and new opportunities to apply test science are identified. We hope to prompt an exchange of ideas that moves the community toward filling significant T&E capability gaps – especially the gaps between performance and effectiveness – and advancing a whole-program evaluation approach. Speaker Info: Karen O'Brien Sr. Principal Data Scientist Modern Technology Solutions, Inc Karen O’Brien is a senior principal data scientist and AI/ML practice lead at Modern Technology Solutions, Inc. In this capacity, she leverages her 20-year Army civilian career as a scientist, evaluator, ORSA, and analytics leader to aid DoD agencies in implementing AI/ML and advanced analytics solutions. Her Army analytics career ranged ‘from ballistics to logistics’; most of it was spent at Army Test and Evaluation Command or supporting Army T&E from the Army Research Laboratory. She was a physics and chemistry nerd in the early days but now uses her M.S. in Predictive Analytics from Northwestern University to help her DoD clients tackle the toughest analytics challenges in support of the nation’s Warfighters. She is the Co-Lead of the Women in Data Huntsville Chapter, a guest lecturer in data and analytics graduate programs, and an ad hoc study committee member at the National Academy of Sciences. | 2025 Toward an Integrated T&E Framework for AI-enabled Karen O'Brien Modern Technology Solutions, Inc X Towards Flight Uncertainty Prediction of Hypersonic Entry Systems Session Recording Materials Abstract: The development of planetary entry technologies relies heavily on computational models that simulate the extreme environments encountered by spacecraft during atmospheric entry, because ground and flight test data are limited. Besides the inherent uncertainties associated with the hypersonic entry environment, computational models carry significant uncertainties of their own that may affect the prediction accuracy of the simulations. This makes uncertainty quantification a necessary tool for predicting flight uncertainty with computational models and improving the robustness and reliability of entry systems.
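The UQ machinery this talk builds on (non-intrusive polynomial chaos, described below) can be caricatured in a few lines: sample the expensive model, fit an orthogonal polynomial expansion by least squares, and read moments from the coefficients. The "model" here is a cheap stand-in for a real aerothermal code:

```python
# Minimal non-intrusive polynomial chaos sketch: fit a probabilists' Hermite
# expansion to model samples and recover mean/variance from the coefficients.
# The model below is a cheap stand-in for a real aerothermal simulation.
import math
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def model(xi):
    """Placeholder 'expensive' model with a standard-normal uncertain input."""
    return np.exp(0.3 * xi) + 0.1 * xi**2

rng = np.random.default_rng(1)
xi = rng.standard_normal(200)   # non-intrusive: just sample the input...
y = model(xi)                   # ...and evaluate the model; no code changes needed

degree = 4
Psi = hermevander(xi, degree)                      # basis He_0(xi) ... He_4(xi)
coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)   # regression-based PCE fit

# He_k are orthogonal under the standard normal with E[He_k^2] = k!, so the
# mean is coeffs[0] and the variance is a factorial-weighted sum of squares.
variance = sum(coeffs[k] ** 2 * math.factorial(k) for k in range(1, degree + 1))
print(f"PCE mean ~ {coeffs[0]:.4f}, variance ~ {variance:.4f}")
```

Once fitted, the cheap polynomial surrogate also yields Sobol-type sensitivity indices directly from the coefficients, which is what makes the approach attractive for sensitivity analysis.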
In this talk, after outlining the approach for flight uncertainty prediction of hypersonic entry systems, I will present an overview of our research on efficient uncertainty quantification (UQ) and sensitivity analysis (SA) based on non-intrusive polynomial chaos theory, applied to aerothermal prediction of entry systems. The application examples will include demonstrations of the UQ and SA methods on stagnation-point heat flux prediction for a hypersonic vehicle using a reduced-order correlation, and on the thermal protection system response of a hypersonic inflatable aerodynamic decelerator. Speaker Info: Serhat Hosder James A. Drallmeier Centennial Professor of Aerospace Engineering Missouri University of Science and Technology Dr. Serhat Hosder is currently the James A. Drallmeier Centennial Professor of Aerospace Engineering in the Mechanical and Aerospace Engineering Department at Missouri S&T, where he serves as the director of the Aerospace Simulations Laboratory. He received his PhD in Aerospace Engineering from Virginia Tech in 2004. Prof. Hosder’s research activities focus on computational aerothermodynamics, multi-fidelity modeling and uncertainty quantification of hypersonic flows and technologies, directed energy for hypersonic applications, planetary entry/descent/landing of spacecraft, and aerodynamic shape optimization. His recent research projects have been funded by the DoD Joint Hypersonics Transition Office, NASA, the Missile Defense Agency, NSF, and industry. Prof. Hosder is a Fellow of the Royal Aeronautical Society and an Associate Fellow of AIAA. He chaired the AIAA Hypersonic Technologies and Space Planes (HyTASP) Technical Committee (TC) from 2019 to 2021 and currently serves on the TC’s steering committee. Presentation | 2025 Towards Flight Uncertainty Prediction of Hypersonic Entry Serhat Hosder Missouri University of Science and Technology X Transforming the Testing and Evaluation of Collaborative AI-enabled Multi-Agent Systems Session Recording Materials Abstract: Our presentation explores the application of Consensus-based Distributed Ledger Technology (C-DLT) to the testing and evaluation of collaborative adaptive AI-enabled systems (CA2IS). It highlights the potential of C-DLT to enhance real-time data collection, data validation, synchronization, and security while providing a trusted framework for model and parameter sharing, access control, multi-agent data fusion, and (as emphasized for this topic area) novel methods for continuous monitoring and Test & Evaluation of CA2IS. As autonomous multi-agent systems (AMAS) evolve, the probabilistic foundation of increasingly AI/ML-enabled systems necessitates new approaches to system Test and Evaluation (T&E). Traditional range-based, fixed-scenario, repetitive testing methods evolved primarily for historically deterministic systems; they must be extended into capabilities that assess the performance expectations of probabilistic systems in highly complex and dynamic environments where both operating conditions and system performance may be constantly changing and continuously adapting. Ideally, T&E methods would extend seamlessly from developmental testing, through acceptance and validation testing, and into mission operations (of critical importance as autonomous weapons-capable systems emerge).
Additionally, these provisions may enhance system resilience in contested environments through inherently distributed, time-ordered, and redundant records of on-board status, diagnostic and behavioral indicators, and inter-agent communications and data transactions. Collectively, these records support continuous assessment of system performance, enabling real-time characterization of system confidence as an input to well-informed decision making – whether such decisions are made in an agentic-AI or human-in-the-loop context. The proposed C-DLT framework addresses these considerations by enabling both onboard and in-situ monitoring, combined with system-wide record synchronization, to capture the real-world context and dynamics of inter-agent behaviors within a global frame-of-reference model (and common operating picture). Additionally, the proposed method provides for recurring periodic testing of operational AI/ML-based systems, emphasizing and addressing the dynamic nature of collaborative adaptive Continuous Learning Systems (CLS) as they incorporate new training data and adapt to new operational environments and changing environmental conditions. The performance trade-offs and T&E challenges that arise within this vision of CA2IS underscore the necessity of the proposed in-situ testing method. Speaker Info: Stuart Harshbarger Chief Technology Officer Assured Intelligence, LLC Stuart Harshbarger is the Co-founder and Chief Technology Officer of Assured Intelligence.ai. Stuart previously served as Chief Technology Officer of two prior start-up ventures and has held various leadership roles in government and defense R&D programs. Most recently, he served the US Government in Technical Director and Innovation Leadership roles, where he led a number of Artificial Intelligence program initiatives and was responsible for promoting best practices for transitioning emerging research outcomes into enterprise operations. Assured Intelligence was envisioned to help facilitate and accelerate well-informed AI adoption through broad community partnerships, with a primary focus on AI/ML Assurance methods. Presentation | 2025 Transforming the Testing and Evaluation of Collaborative Stuart Harshbarger Assured Intelligence, LLC X UQ for spacecraft thermal modeling during design and ground operations