  • 2024 Army Test and Evaluation Command AI Challenge - Xray Image Defect Detector

    Abstract:

    Developing AI solutions requires the alignment of business needs, data availability, digital tools, and subject matter expertise. The Army Test and Evaluation Command (ATEC) is an organization with extensive data and operational requirements but is still developing its AI capabilities. To address this gap, ATEC launched an annual AI Challenge—an enterprise-wide, education-focused initiative aimed at solving real-world business problems using the new ATEC Data Mesh digital ecosystem. The 2024 challenge focused on automating defect detection in X-ray scans of body armor. Over three months, 153 participants across 29 teams—including internal and external partners—developed computer vision solutions to autonomously identify manufacturing defects such as cracks, foreign debris, and voids. The winning team achieved a remarkable 92% accuracy in defect detection. This effort not only resulted in a valuable tool that enhances operational capacity and efficiency but also significantly advanced AI expertise across the organization. Participants gained hands-on experience with cloud infrastructure, while ATEC refined its methodologies for testing and evaluating AI-enabled systems. The AI Challenge exemplifies how combining educational resources and competition can foster innovation towards real capabilities.

    Speaker Info:

    David Niblick

    AI Evaluator

    Army Evaluation Center

    MAJ David Niblick graduated from the United States Military Academy at West Point in 2010 with a BS in Electrical Engineering. He served in the Engineer Branch as a lieutenant and captain at Ft. Campbell, KY with the 101st Airborne Division (Air Assault) and at Schofield Barracks, HI with the 130th Engineer Brigade. He deployed twice to Afghanistan ('11-'12 and '13-'14) and to the Republic of Korea ('15-'16). After company command, he attended Purdue University and received an MS in Electrical and Computer Engineering with a thesis in computer vision and deep learning. He instructed in the Department of Electrical Engineering and Computer Science at USMA, after which he transferred from the Engineer Branch to Functional Area 49 (Operations Research and Systems Analysis). He currently serves as an Artificial Intelligence Evaluator with Army Test and Evaluation Command at Aberdeen Proving Ground, MD.

  • A Case Study-based Assessment of a Model-driven Testing Methodology for Applicability and

    Abstract:

    The Department of Defense (DoD) Test and Evaluation (T&E) community has fully embraced digital engineering, as defined in the 2018 Digital Engineering Strategy, motivating the ongoing development and adoption of model-based testing methodologies. This article expands upon existing grey-box model-driven test design (MDTD) approaches by leveraging model-based systems engineering (MBSE) artifacts to generate flight test planning models and documents. A baseline model of a system under test (SUT) and two additional system case studies are used to assess the MDTD process. The paper illustrates the method's applicability to these case studies, assesses the benefits of MDTD by applying novel metrics of model element reuse, and discusses the relevance to operational flight testing. This approach is novel within the flight-testing community, as it is the first implementation of MDTD in USAF operational testing applications. Whereas previous studies have explored SysML model reuse in small-scale problems or product families, MBSE model management for operational tests at flight-system scale and assessment of reuse in the T&E phase of the SE lifecycle are unresearched to date. This methodology and the case studies will be of particular interest to those involved in developing, executing, and reporting on flight test plans in the context of the DoD Digital Engineering transformation.

    Speaker Info:

    Jose Alvarado

    Technical Advisor

    AFOTEC Detachment 5

    JOSE ALVARADO serves as a technical advisor for AFOTEC Detachment 5 at Edwards AFB, California, with over 33 years of developmental and operational test and evaluation experience. He is interested in applying MBSE concepts to the flight test engineering domain and implementing test process improvements through MBT. Jose holds a B.S. in Electrical Engineering from California State University, Fresno (1991), an M.S. in Electrical Engineering from California State University, Northridge (2002), and a Ph.D. in Systems Engineering from Colorado State University (2024). He also serves as an adjunct faculty member for the mathematics, science and engineering (MSE) and career technical education (CTE) departments at Antelope Valley College. He is a member of the International Test and Evaluation Association, Antelope Valley Chapter.

  • A Comparison of Methods for Integrated Evaluation of Complex Systems

    Abstract:

    A strategic goal of the DoD test and evaluation community is to combine information from across the acquisition lifecycle, enabling better understanding of systems earlier and design of tests to maximize information later. This talk will provide a systematic comparison of methods for integrating such information, ranging from hierarchical methods to informative priors to normalized power priors, leveraging as a motivating example a notional model of a defense system for which behavior and test factors evolve as the system develops. Through large-scale simulation experiments testing a variety of situations and assumptions, we will illustrate how the techniques work and their promise for improving understanding of systems, while highlighting best practices as well as potential implementation pitfalls. The comparison illuminates best practices for integrated system evaluation and illustrates how modeling assumptions affect estimates of system parameters.
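
    A minimal sketch of one of the techniques compared, the power prior, for a binomial success probability; the Beta-Binomial model, the historical and current counts, and the discounting weight a0 below are illustrative assumptions, not the notional defense-system model from the talk.

        from scipy import stats

        # Hypothetical developmental-test (historical) and operational-test (current) results.
        x0, n0 = 42, 50      # historical successes / trials
        x, n   = 17, 20      # current successes / trials
        a0     = 0.5         # power-prior weight discounting the historical data

        # Beta(1, 1) initial prior; for binomial data, raising the historical likelihood
        # to the power a0 simply scales the historical counts.
        post = stats.beta(1 + x + a0 * x0, 1 + (n - x) + a0 * (n0 - x0))

        print("posterior mean:", post.mean())
        print("80% credible interval:", post.ppf([0.10, 0.90]))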

    Speaker Info:

    Justin Krometis

    Research Assistant Professor

    Virginia Tech

    Justin Krometis is a Research Assistant Professor with the Virginia Tech National Security Institute and holds an affiliate position in the Math Department at Virginia Tech. His research is mostly in the development of theoretical and computational frameworks for Bayesian data analysis. These include approaches to incorporating and balancing data and expert opinion in decision-making, estimating model parameters (including high- or even infinite-dimensional quantities) from noisy data, and designing experiments to maximize the information gained. His research interests include: Parameter Estimation, Uncertainty Quantification, Experimental Design, High-Performance Computing, Artificial Intelligence/Machine Learning (AI/ML), and Reinforcement Learning.

    Prior to joining VTNSI, Dr. Krometis worked as a computational scientist supporting high-performance computing and as a transportation modeler to enhance evacuation planning and hurricane, pandemic, and other emergency preparedness. He holds Ph.D., M.S., and B.S. degrees in Math and a B.S. degree in Physics, all from Virginia Tech.

  • A Framework for VV&A of Preproduction Software Environments

    Abstract:

    The evolution from a traditional waterfall software release model to a continuous and iterative release model allows operational testers to conduct operational testing earlier in the development lifecycle. To enable operationally realistic software testing before deploying to users, programs create preproduction environments designed to replicate the hardware and software infrastructure of production environments. However, it can be challenging for testers to assess the similarity between the preproduction and production software environments and determine what data can be used from the preproduction environment to support operational evaluations. We present a general framework for the Verification, Validation, and Accreditation (VV&A) of preproduction environments that aims to be rigorous yet flexible enough to meet the needs of acquisition programs conducting operational testing in preproduction environments. This framework includes a three-stage VV&A process, composed of an initial VV&A, followed by a set of automated and continuous verification and validation (V&V) checks, and a VV&A renewal if major differences between the environments appear. We describe the data needed to verify and validate the environment and how to customize the VV&A process to fit the needs of each program.
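
    As an illustration of the kind of automated, continuous V&V check the framework calls for (not the framework itself), a sketch that diffs hypothetical software inventories pulled from a preproduction and a production environment:

        # Hypothetical package inventories collected from each environment.
        production    = {"app-core": "3.2.1", "db-driver": "1.9.0", "tls-lib": "2.4.7"}
        preproduction = {"app-core": "3.2.1", "db-driver": "1.8.5", "msg-bus": "0.9.2"}

        def diff_environments(prod, preprod):
            """Return components that are missing or version-mismatched between environments."""
            issues = []
            for name in sorted(set(prod) | set(preprod)):
                p, q = prod.get(name), preprod.get(name)
                if p != q:
                    issues.append((name, p, q))
            return issues

        for name, prod_ver, preprod_ver in diff_environments(production, preproduction):
            print(f"{name}: production={prod_ver} preproduction={preprod_ver}")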

    Speaker Info:

    Luis Aguirre

    Research Staff Member

    IDA

    Luis Aguirre is a Research Staff Member at the Institute for Defense Analyses. His work at IDA has focused on operational test and evaluation of Joint C3 systems and major automated information systems. Luis earned a BS in life sciences from the University of Illinois-Chicago and a PhD in organismic and evolutionary biology from the University of Massachusetts-Amherst.

  • A Quantified Approach to Synthetic Dataset Evaluation

    Abstract:

    The advent of advanced machine learning (ML) capabilities has dramatically increased the need for data to train, test, and validate models. At the same time, systems and models are being asked to operate in increasingly diverse environments. With the high cost of data collection and labeling, or even the complete lack of available data, synthetic data has become an attractive alternative. Synthetic data promises many potential benefits over traditional datasets including reduced cost, improved coverage of edge-cases, more balanced datasets, reduced data collection time, and in many cases, it may be the only data that is available for a target environment. At the same time, it introduces potential risks including lack of realism, bias amplification, overfitting to potentially synthetic features, missing real-world variability, and lack of confidence in the results. The degree to which these benefits and risks manifest depends greatly on the type of problem and the way in which synthetic data is generated and utilized.

    In this paper, we propose a principled, systematic approach to testing the effectiveness of synthetic data for specific classes of problems. We illustrate our approach on an image classifier using a flower type database. We first establish a model baseline by training and testing the classifier model with real data, then measure its performance. We then establish a synthetic dataset baseline by attempting to train binary classifiers to distinguish each synthetic dataset from the real dataset. Poor performance of the binary classifier indicates that the corresponding synthetic dataset is a better representation of the real data.

    We then conduct two core sets of experiments evaluating the effectiveness of the synthetic data in training (replacement and augmentation), and another set evaluating the effectiveness of synthetic data for testing. In the replacement experiments we gradually replace real data with synthetic data and measure the degradation in performance for each synthetic dataset. In the augmentation experiments we augment the available real data with additional synthetic data and measure the improvement in performance. Finally, we conduct a set of experiments to evaluate the usefulness of synthetic data for testing. We do this by comparing performance metrics calculated with different subsets of real data against different subsets of synthetic data. In addition, we perturb the model by deliberately degrading the training data (e.g., by deliberately mislabeling subsets) and verifying that the resulting degradation in performance as calculated with synthetic data tracks the degradation as calculated with real data. For each of the synthetic datasets, we compare the results with the original synthetic data quality evaluation we calculated in our baseline.
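
    A minimal sketch of the synthetic-dataset baseline described above: train a binary classifier to separate real from synthetic samples, where an AUC near 0.5 suggests the synthetic data is hard to distinguish from the real data. The random features below stand in for image features; the flower dataset and the authors' models are not reproduced here.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        real      = rng.normal(0.0, 1.0, size=(500, 16))   # placeholder for real image features
        synthetic = rng.normal(0.1, 1.1, size=(500, 16))   # placeholder for synthetic features

        X = np.vstack([real, synthetic])
        y = np.r_[np.zeros(len(real)), np.ones(len(synthetic))]   # 0 = real, 1 = synthetic
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        print(f"real-vs-synthetic AUC: {auc:.3f} (closer to 0.5 = more realistic synthetic data)")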

    Speaker Info:

    Jeffery Hansen

    ML Research Scientist

    Software Engineering Institute

  • A Quest Called TRIBE: Clustering Malware Families for Enhanced Triage and Analysis

    Abstract:

    According to AV-Test, roughly 450,000 new malicious programs are detected each day, adding to a total number of malware signatures that stands at almost 1.5 billion. These known signatures are analyzed and given labels by antivirus companies to classify the malware. These classifications allow security operation centers or antivirus programs to more easily take action to prevent or stop costly damage. In recent years, polymorphic malware, malware that intentionally obfuscates its behavior and signature, has seen a rise in prevalence. We aim to show that current antivirus classifications inefficiently group malware, especially polymorphic malware, that shares enough intrinsic similarities with other malware to justify consolidation into broader groupings we are calling tribes. We hypothesize that the consolidation of these labels will reduce the time it takes for analysts to classify malware, thus lowering incident response time. This generalized labeling will be implemented through the use of a transformer-based sequence-to-sequence variational autoencoder that takes in a malware binary and produces a clustering based on its distinct characteristics. We are naming this method Tribal Relational Inferential Encoder (TRIBE). The use of autoencoders in malware classification has shown promise in accurately labeling malware. TRIBE will perform unsupervised learning to independently create these tribes, or generalized labels, and compare the results to existing labeling schemes. We estimate three outcomes from this research: a dataset of existing antivirus malware families with associated malware, a trainable autoencoder tool that will produce robust malware tribes, and a classifier that will make use of tribes to label malware.
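
    A hedged sketch of the final comparison step: cluster (hypothetical) latent embeddings into tribes and score their agreement with existing antivirus family labels. The embeddings and labels below are random placeholders; the TRIBE encoder itself is not reproduced.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(1)
        embeddings = rng.normal(size=(300, 32))       # placeholder for TRIBE autoencoder output
        av_labels = rng.integers(0, 10, size=300)     # placeholder antivirus family labels

        tribes = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(embeddings)
        print("agreement with AV labels (adjusted Rand index):",
              adjusted_rand_score(av_labels, tribes))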

    Speaker Info:

    Justin Liaw

    Student

    United States Naval Academy

    Our names are MIDN 1/C Justin Liaw, MIDN 1/C Michael Chen, and MIDN 1/C John Jenness. We are seniors at the United States Naval Academy working with Professor Dane Brown and Commander Edgar Jatho on our final capstone project. We are Computer Science and Cyber Operations dual majors, all commissioning into different occupational fields in the Navy and Marine Corps.

  • A Quest for Meaningful Performance Metrics in Multi-Label Classification

    Abstract:

    Multi-label classification is a crucial tool in various applications such as image classification and document categorization. Unlike single-label classification, the evaluation of multi-label classification performance is complex because a model's prediction can be partially correct—capturing some labels while missing others. This lack of a straightforward binary correct/incorrect outcome introduces challenges in model testing and evaluation. Model developers rely on a collection of metrics to fine-tune these models for optimal performance; however, existing metrics are mostly adapted from single-label contexts. This has led to a large array of metrics that are difficult to interpret and may not adequately reflect model performance in a practical manner.

    To address this issue, we designed an experiment which replaced multi-label classification algorithms with a random process to evaluate how metrics perform relative to prescribed classifier and dataset characteristics. The relationships between metrics, and how those relationships change relative to dataset attributes, were investigated with the goal of down-selecting to a spanning set. Additionally, we explored the potential for developing more interpretable metrics by incorporating dataset characteristics into model evaluation.
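
    A small sketch of the random-process baseline described above: score random multi-label predictions against a random ground truth and compare several commonly used metrics. The label count and prevalence are arbitrary assumptions.

        import numpy as np
        from sklearn.metrics import accuracy_score, f1_score, hamming_loss

        rng = np.random.default_rng(0)
        n_samples, n_labels, prevalence = 1000, 8, 0.3

        y_true = rng.binomial(1, prevalence, size=(n_samples, n_labels))
        y_pred = rng.binomial(1, prevalence, size=(n_samples, n_labels))   # "random classifier"

        print("subset accuracy (exact match):", accuracy_score(y_true, y_pred))
        print("Hamming loss:", hamming_loss(y_true, y_pred))
        print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
        print("macro-F1:", f1_score(y_true, y_pred, average="macro"))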

    Speaker Info:

    Marie Tuft

    Senior R&D Statistician

    Sandia National Laboratories

    Marie Tuft is a Senior Statistician at Sandia National Laboratories. Her work involves AI/ML evaluation with emphasis on human interaction with algorithms, statistical methods development for faster industrial product realization, and communication of data for risk-informed decision making. She earned an Honors BS in Mathematics from the University of Utah and a PhD in Biostatistics from the University of Pittsburgh.

  • Achieving Predictable Delivery with Credible Modeling, Simulation, and Analysis

    Abstract:

    Over the past three decades, an evolution has occurred in popular terminology, beginning with “simulation-based acquisition” (SBA), transitioning to “digital engineering,” and then to “model-based systems engineering.” Although these three terms emphasize different aspects, the consistent, unspoken, element that underlies all of them is credible modeling, simulation, and analysis (CMSA). It is CMSA that enables predictable delivery. We use the word “delivery” to encompass three aspects that are relevant to SBA: (a) satisfying design intent, (b) meeting the cost/schedule goal, and (c) reaching the production throughput target. The word “credible” implies using the language of probability to quantify uncertainty to support decision-making within estimated risk/reward bounds to achieve predictable delivery.

    We introduce a Predictability Bayes Net (PBN) that represents, from a contractor’s perspective, top-level dependencies between modeling and simulation (M&S) activities and standardized workflows across a population of programs. The PBN describes how to meet SBA objectives, or to diagnose and learn why objectives were not met. For example, given failure to deliver at-cost and on-time, the PBN computes marginal probabilities that suggest the most likely sequence of events leading to this failure. The PBN does this by linking verified and validated M&S to standardized workflows, thereby transitioning from CMSA to assured physical delivery.

    Our previous publications focused on CMSA for design engineering, using Bayes Nets to assess compliance with system performance requirements. We have now expanded CMSA to include production cost estimation and factory throughput modeling. The PBN includes top-level elements such as mission stability, technology stretch, workforce experience level, standardized workflow adherence, and supplier responsiveness.

    A Bayes Net includes a set of event nodes and node state definitions. These definitions become metrics to hold ourselves accountable for product delivery. The PBN is a joint probability distribution; it includes conditional probability estimates, which are based on a combination of opinions from subject matter experts and data from developmental and operational test events. Sufficient, relevant data from these test events is crucial. It supports M&S verification and validation and, when standardized workflows are followed, embodies CMSA. Without this data, M&S is non-credible, and predictable delivery becomes impossible.

    The PBN is of mutual interest to both the DoD and contractors. Early phases in a program cast the largest shadow over a system’s ultimate cost, performance, and production throughput. Given imperfect, limited, or partially shared information available in early program phases, probabilistic inference becomes critical for optimal decision-making using this information. The PBN facilitates SBA through CMSA by (a) first bringing clarity to fine-grained communication using the language of probability, and (b) then quantifying the uncertainty that exists at a specific moment of decision. The PBN also serves as a starting point for building lower-level Bayes Nets to answer targeted queries regarding a program’s execution, including aspects of design, supply chain, and production. The PBN is the mechanism for achieving predictable delivery.
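
    A toy sketch of the kind of diagnostic query the PBN supports, using a three-node discrete Bayes net evaluated by brute-force enumeration. The node names and conditional probabilities are invented for illustration and are not the PBN described above.

        from itertools import product

        # Hypothetical priors and conditional probability table.
        p_adh = {True: 0.7, False: 0.3}     # standardized workflow adherence
        p_exp = {True: 0.6, False: 0.4}     # experienced workforce
        p_ontime = {(True, True): 0.90, (True, False): 0.70,
                    (False, True): 0.55, (False, False): 0.25}

        def joint(adh, exp, ontime):
            p = p_adh[adh] * p_exp[exp]
            return p * (p_ontime[(adh, exp)] if ontime else 1 - p_ontime[(adh, exp)])

        # Diagnostic query: given a late delivery, how likely is poor workflow adherence?
        evidence_late = sum(joint(a, e, False) for a, e in product([True, False], repeat=2))
        poor_adh_and_late = sum(joint(False, e, False) for e in [True, False])
        print("P(poor adherence | late delivery) =", poor_adh_and_late / evidence_late)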

    Speaker Info:

    Terril Hurst

    Senior Engineering Fellow

    Raytheon

    Terril Hurst is a Senior Engineering Fellow at Raytheon in Tucson, Arizona. Before coming to Raytheon in 2005, Terril worked for 27 years at Hewlett-Packard Laboratories on computer data storage physics, devices, and distributed file systems. He received his Bachelor's, Master's, and PhD degrees in Applied Mechanics at Brigham Young University and completed a post-doctoral appointment at Stanford University in Artificial Intelligence.

    At Raytheon, Dr. Hurst is responsible for teaching credible modeling, simulation, and analysis, and working with programs to assure quantitative rigor in the verification, validation, and usage of modeling and simulation. He has presented his work regularly at DATAWorks for over 15 years.

  • Active Learning with Deep Gaussian Processes for Trimmed Aero Database Construction

    Abstract:

    Well characterized aerodynamic databases are necessary for accurate simulation of flight dynamics. High-fidelity CFD is computationally expensive, and thus we use a surrogate model to represent the database. By utilizing active learning, we can efficiently generate samples for the database and target areas of the design space that are most useful for flight simulation. Here we focus on trimmed aero databases, where our goal is to find regions where multiple moments are simultaneously zero, and use a novel contour active learning approach to achieve this goal. Entropy-based methods are well explored in computer experiment research on reliability; however, sequential design for estimating multiple contours is less studied. We compare multiple entropy-based approaches for estimating contours for multiple responses simultaneously. This task requires the development of new metrics to evaluate the performance of the active learning strategies. We apply these active learning methods and metrics to both Gaussian Process and Deep Gaussian Process surrogate models. The performance of this approach is evaluated with multiple examples and applied to a reference vehicle as a simulation study.
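
    A minimal sketch of entropy-based contour (zero-level) active learning with an ordinary GP surrogate, for a single response on a 1-D toy function; the deep GP, the multiple simultaneous moments, and the aero database are not reproduced here.

        import numpy as np
        from scipy.stats import norm
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        f = lambda x: np.sin(3 * x) - 0.3          # toy "moment" whose zero contour we want
        X = np.array([[0.1], [0.9], [1.8]])        # initial design
        y = f(X).ravel()
        candidates = np.linspace(0.0, 2.0, 201).reshape(-1, 1)

        for _ in range(10):
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6).fit(X, y)
            mu, sd = gp.predict(candidates, return_std=True)
            p = norm.cdf((0.0 - mu) / np.maximum(sd, 1e-12))          # P(response < 0)
            entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))
            x_new = candidates[np.argmax(entropy)]                    # most ambiguous point for the contour
            X = np.vstack([X, x_new])
            y = np.append(y, f(x_new))

        print("design points concentrate near the zero crossing:", np.round(X.ravel(), 2))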

    Speaker Info:

    Kevin Quinlan

    Lawrence Livermore National Laboratory

    Kevin Quinlan is a staff member in the Applied Statistics Group at Lawrence Livermore National Laboratory. He completed his PhD in statistics at Penn State. His main research interests are design of computer experiments, Gaussian process modeling, and active learning.

  • Addressing Ambiguity in Detection & Classification Tests

    Abstract:

    Ambiguity in how to associate system detections with ground truth measurements poses difficulty in interpreting system evaluation tests. As an example, we discuss the tests performed for the Strategic Environmental Research and Development Program and the Environmental Security Technology Certification Program (SERDP/ESTCP) meant to assess novel systems which detect & classify underwater unexploded ordnance (UXO). Due to the larger uncertainties associated with underwater environments, these tests frequently have ambiguities. The Institute for Defense Analyses (IDA), tasked with scoring these SERDP/ESTCP tests, developed and implemented a scoring methodology which interprets tests with ambiguities.

    This talk will introduce the basics of non-ambiguous detection & classification scoring, discuss methods IDA has used to address ambiguity, and discuss how this approach is applied to produce graphs and tables which can be interpreted by relevant stakeholders.
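
    For context, a sketch of the non-ambiguous case mentioned above: associate detections with ground-truth objects by optimal one-to-one assignment inside a distance halo. The halo radius and positions are made up; IDA's actual SERDP/ESTCP scoring methodology is not reproduced here.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        ground_truth = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])   # notional true UXO locations
        detections   = np.array([[0.4, -0.2], [5.3, 5.1], [2.0, 8.0]])  # notional system detections
        halo = 1.0                                                      # max association distance

        dist = np.linalg.norm(ground_truth[:, None, :] - detections[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(dist)
        matches = [(r, c) for r, c in zip(rows, cols) if dist[r, c] <= halo]

        true_positives = len(matches)
        print("matched pairs:", matches)
        print("P_detect =", true_positives / len(ground_truth),
              " false alarms =", len(detections) - true_positives)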

    Speaker Info:

    Tyler Pleasant

    Research Associate

    IDA

    Tyler Pleasant is a Research Associate at the Institute for Defense Analyses (IDA) within the Science, Systems, and Sustainment division. He holds an M.S. in Chemistry from the University of Chicago and a B.S. in Physics and Mathematics from the Massachusetts Institute of Technology. In addition to his detection and classification scoring work at IDA, he works on model verification & validation, data analysis, statistical testing, and technology assessments.

  • Advancing the Test Science of LLM-enabled Systems: A Survey of Factors and Conditions that

    Abstract:

    Regardless of test design method (combinatoric, Design of Experiments, a narrow robustness study, etc.), a scientifically rigorous experiment must understand, manage, and control the variables that impact test outcomes. For most scientific fields, this is settled science with decades – even centuries – of formalism and honed methodology. For the emerging field of Large Language Models (LLM) in military weapon systems, it is the wild west. This presentation will survey the factors and conditions that impact LLM test outcomes, along with supporting literature and practical methods, models, and measures for your use in tests. The presentation will also highlight: 1) The statistical assumptions that underlie the common LLM performance metrics and how to test those assumptions; 2) How to evaluate a benchmark for its utility in addressing measures of performance, as well as checking the benchmark’s statistical validity; 3) Practical models, and supporting literature, for binning factors into levels of severity (conditions); 4) Resources for ensuring a User-centered test design; and 5) Incorporating selected adversarial techniques. These resources and techniques are immediately actionable (you can even try them out on your device and favorite LLM during the session) and will equip you to navigate the complexity of scientific test design for LLM-enabled systems.
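
    One statistical check in the spirit of item 1 above, sketched: treat a benchmark pass rate as a binomial proportion and attach a Wilson score interval, which makes explicit the independence and fixed-difficulty assumptions behind the headline number. The score and item count are invented.

        from math import sqrt
        from scipy.stats import norm

        def wilson_interval(passes, n, conf=0.95):
            """Wilson score interval for a binomial pass rate."""
            z = norm.ppf(0.5 + conf / 2)
            phat = passes / n
            center = (phat + z**2 / (2 * n)) / (1 + z**2 / n)
            half = z * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
            return center - half, center + half

        # Hypothetical benchmark result: 830 of 1000 items passed.
        lo, hi = wilson_interval(830, 1000)
        print(f"pass rate 0.830, 95% CI ({lo:.3f}, {hi:.3f})")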

    Speaker Info:

    Karen O'Brien

    Sr. Principal Data Scientist

    Modern Technology Solutions, Inc

    Karen O’Brien is a senior principal data scientist and AI/ML practice lead at Modern Technology Solutions, Inc. In this capacity, she leverages her 20-year Army civilian career as a scientist, evaluator, ORSA, and analytics leader to aid DoD agencies in implementing AI/ML and advanced analytics solutions. Her Army analytics career ranged ‘from ballistics to logistics,’ and most of her career was at Army Test and Evaluation Command or supporting Army T&E from the Army Research Laboratory. She was a physics and chemistry nerd in the early days but now uses her M.S. in Predictive Analytics from Northwestern University to help her DoD clients tackle the toughest analytics challenges in support of the nation’s Warfighters. She is the Co-Lead of the Women in Data Huntsville Chapter, a guest lecturer in data and analytics graduate programs, and an ad hoc study committee member at the National Academy of Sciences.

  • AI/ML and UQ in Systems Engineering

    Abstract:

    The integration of Artificial Intelligence (AI), Machine Learning (ML), and Uncertainty Quantification (UQ) is transforming aerospace systems engineering by improving how systems are designed, tested, and operated. This presentation explores the transformative role of large language models (LLMs) and UQ in tackling the high stakes and inherent uncertainty of space systems. LLMs streamline requirements analysis, enable intelligent creation and querying of design documents, and accelerate development timelines, while UQ provides robust risk assessments, predictive modeling, and cost-saving opportunities throughout the mission lifecycle.

    Effectively managing uncertainty is critical at every stage of the project lifecycle, from early design formulation to on-orbit operations. This presentation highlights practical applications of UQ in space mission formulation and science data pipelines, as well as its role in assessing risk and enhancing system reliability. It also examines how LLMs improve the development and analysis of system documentation, enabling more agile and informed decision-making in complex projects.

    By integrating LLMs and UQ into systems engineering, aerospace teams can better manage complexity, enhance system resilience, and achieve cost-effective solutions. This presentation offers key insights, lessons learned, and future opportunities for advancing systems engineering with AI/ML and UQ.

    Speaker Info:

    Kelli McCoy

    Senior Systems Engineer

    NASA JPL + USC

    Kelli McCoy is currently a Senior Systems Engineer at NASA Jet Propulsion Laboratory, working to promote the infusion of Uncertainty Quantification, Machine Learning, and risk-informed decision-making practices across the Systems Engineering organization. Before joining JPL, Kelli gained valuable experience at NASA Headquarters and Kennedy Space Center. Her research interests include Statistical Learning Theory, Digital Twin analytics, and Probabilistic Risk Analysis.

  • An Optimization Approach for Improved Strategic Material Shortfall Estimates

    Abstract:

    Material supply chains are complex, global systems that drive the production of goods and services, from raw material extraction to manufacturing processes. As the U.S. becomes increasingly dependent on foreign production of strategic and critical materials (S&CMs), our nation’s security will demand effective analyses of material supply chains to identify potential shortages and suggest ways to alleviate them. The Institute for Defense Analyses (IDA) developed the Risk Assessment and Mitigation Framework for Strategic Materials (RAMF-SM) to help the Department of Defense identify and resolve potential shortfalls of S&CMs in the National Defense Stockpile Program. The Stockpile Sizing Module (SSM) is the main computational vehicle in the RAMF-SM suite of models and is used to estimate shortfalls of S&CMs during national emergencies. This talk presents a multicommodity flow model that extends the SSM’s network linear programming formulation by explicitly representing two stages of supply—commonly categorized as mining and refining—and tracking material throughout these stages while incorporating decrements to supply. This more accurate representation offers a robust framework for analyzing material production dynamics, enabling precise shortfall calculations and the identification of bottlenecks. While focused on two production stages, this work lays critical groundwork for future extensions to a comprehensive multi-stage production model.
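
    A toy two-stage (mining, refining) shortfall calculation as a linear program, in the spirit of the multicommodity flow extension described above; the capacities and demand are invented, and the actual RAMF-SM/SSM formulation is far richer.

        from scipy.optimize import linprog

        # Decision variables: x1, x2 = material refined from mine 1 and mine 2; s = shortfall.
        # Objective: minimize the shortfall s.
        c = [0.0, 0.0, 1.0]

        # Notional capacities and demand (units of material).
        mine1_cap, mine2_cap, refine_cap, demand = 80.0, 50.0, 100.0, 120.0

        A_ub = [[1, 0, 0],        # x1 <= mine 1 capacity
                [0, 1, 0],        # x2 <= mine 2 capacity
                [1, 1, 0],        # x1 + x2 <= refining capacity
                [-1, -1, -1]]     # x1 + x2 + s >= demand
        b_ub = [mine1_cap, mine2_cap, refine_cap, -demand]

        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
        print("estimated shortfall:", res.x[2])   # refining is the bottleneck in this toy case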

    Speaker Info:

    Dominic Flocco

    Research Associate

    Institute for Defense Analyses

    Dominic C. Flocco is a Research Associate in the Strategy, Forces, and Resources Division at the Institute for Defense Analyses (IDA) and a Ph.D. Candidate in Applied Mathematics & Statistics, and Scientific Computing at the University of Maryland, College Park. He specializes in mathematical optimization and equilibrium modeling, with applications in operations research, energy economics, game theory, supply chain management, and defense logistics. At IDA, his analytic work supports the Defense Logistics Agency’s effort to assess the risks and vulnerabilities in strategic and critical material supply chains for the National Defense Stockpile.

  • Application of DOD's VAULTIS Data Management Framework to Testing

    Abstract:

    The Department of Defense has realigned its approach to data management. Prior to 2020, data was viewed as a strategic risk for the Department. Now it is seen as a strategic asset that will position the Department for joint all-domain operations and artificial intelligence applications. Commensurate with Department-level data policy, the Director, Operational Test and Evaluation has published new policy stating that test programs shall create data management plans to make test data VAULTIS (visible, accessible, understandable, linked, trustworthy, interoperable, and secure). In this briefing, I will motivate the necessity for testers to take an intentional approach to data management, tour the new policies for test data, and provide an overview of the Data Management Plan Guidebook – an approach to planning for test data management that is in line with DOD’s VAULTIS framework.

    Speaker Info:

    John Haman

    Research Staff

    Institute for Defense Analyses

    I have been a member of the research staff at the Institute for Defense Analyses since 2018. I lead the Test Science team, a team of statisticians, mathematicians, psychologists, and neuroscientists focused on methodological and workforce improvements in testing and evaluation. My overall research interest is identifying effective and pragmatic statistical methods that align with DOD assumptions and analytic goals. I earned a PhD in statistics from Bowling Green State University for my work in energy statistics under the direction of Maria Rizzo.

  • Approximate Bayesian inference for neural networks: a case study in analysis of spectra

    Abstract:

    Bayesian neural networks (BNNs) combine the remarkable flexibility of deep learning models with principled uncertainty quantification. However, poor scalability of traditional Bayesian inference methods such as MCMC has limited the utility of BNNs for uncertainty quantification (UQ). In this talk, we focus on recent advances in approximate Bayesian inference for BNNs and seek to evaluate, in the context of a real application, how useful these approximate inference methods are in providing UQ for scientific applications. As an example application, we consider prediction of chemical composition from laser-induced breakdown spectroscopy measured by the ChemCam instrument on the Mars rover Curiosity, which was designed to characterize Martian geology. We develop specialized BNNs for this task and apply multiple existing approximate inference methods. We evaluate the quality of the posterior predictive distribution under different inference algorithms and touch on the utility of approximate inference schemes for other tasks, including model selection.
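
    A hedged stand-in for the approximate inference idea: a deep ensemble, one common and cheap approximation to a Bayesian posterior over network weights, giving a predictive mean and spread on toy spectra-like data. The ChemCam models and the specific inference schemes evaluated in the talk are not reproduced here.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(200, 1))                   # toy "spectral feature"
        y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)      # toy "composition"

        # Train several networks from different initializations; treat them as posterior samples.
        ensemble = [MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                                 random_state=seed).fit(X, y) for seed in range(5)]

        X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
        preds = np.stack([m.predict(X_new) for m in ensemble])
        print("predictive mean:", np.round(preds.mean(axis=0), 2))
        print("predictive std: ", np.round(preds.std(axis=0), 2))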

    Speaker Info:

    Natalie Klein

    Statistician

    Los Alamos National Laboratory

    Dr. Natalie Klein is a staff scientist in the Statistical Sciences group at Los Alamos National Laboratory. Natalie’s research centers on the development and application of statistical and machine learning approaches in a variety of application areas, including hyperspectral imaging, laser-induced breakdown spectroscopy, and high-dimensional physics simulations. Dr. Klein holds a joint Ph.D. in Statistics and Machine Learning from Carnegie Mellon University.

  • April 23 - Keynote

    Speaker Info:

    Joseph Lyons

    Senior Scientist for Human-Machine Teaming

    Air Force Research Laboratory

    Joseph B. Lyons, a member of the scientific and professional cadre of senior executives, is the Senior Scientist for Human-Machine Teaming, 711th Human Performance Wing, Human Effectiveness Directorate, Air Force Research Laboratory, Wright-Patterson AFB, Ohio.  He serves as the principal scientific authority and independent researcher in the research, development, adaptation, and application of Human-Machine Teaming.

     Dr. Lyons began his career with the Air Force in 2005 in the Human Effectiveness Directorate, Wright-Patterson AFB, Ohio. Dr. Lyons has served as a thought leader for the DoD in the areas of trust in autonomy and Human-Machine Teaming. Dr. Lyons has published over 100 technical publications including 64 journal articles in outlets focused on human factors, human-machine interaction, applied psychology, robotics, and organizational behavior. Dr. Lyons also served as Co-Editor for the 2020 book, Trust in Human-Robot Interaction. Dr. Lyons is an AFRL Fellow, a Fellow of the American Psychological Association, and a Fellow of the Society for Military Psychologists. Prior to assuming his current position, Dr. Lyons served as a Program Officer for the Air Force Office of Scientific Research and was a Principal Research Psychologist within the Human Effectiveness Directorate.

  • April 24 – Keynote

    Speaker Info:

    David Salvagnini

    Chief Data Officer / Chief Artificial Intelligence Officer

    NASA

    David Salvagnini serves as the Chief Data Officer at NASA. Since joining NASA in June 2023, his role has expanded to include his appointment in May 2024 as the Chief Artificial Intelligence Officer. In these roles, David draws on the synergies between the two functions, especially in assuring data readiness in support of responsible and transparent artificial intelligence (AI).

    David formerly served as the Director of the Intelligence Community Chief Information Officer (IC CIO) Architecture and Integration Group (AIG) and Chief Architect. In these roles, he worked with Intelligence Community elements and 5-Eye Enterprise (5EEE) international partners on the development and implementation of reference architectures for interoperability, data sharing and technical advancement of Information Technology (IT) infrastructure, data services, foundational AI services, and other mission capabilities.

    Before joining the IC CIO, Mr. Salvagnini held a variety of positions at the Defense Intelligence Agency (DIA), to include Chief Information Office (CIO) Technical Director, Chief Data Officer, and Deputy Chief of the Enterprise Cyber and Infrastructure Services Division. In these roles, David supported the deployment of AI capabilities and the development of analytic tradecraft related to AI use as part of intelligence production. He was appointed to the Senior Executive Service as the Senior Technical Officer for Enterprise IT and Cyber Operations at DIA in June 2016.

    Mr. Salvagnini joined DIA as a civil servant in May 2005. Prior to his selection as a senior executive, he served on a joint duty assignment as the Chief Architect for the IC Desktop Environment (IC DTE). In that position, he was responsible for DIA and National Geospatial-Intelligence Agency (NGA) adoption of DTE services, and the transitioning of over 57,000 personnel to the IC Information Technology Enterprise (IC ITE). Previously, Mr. Salvagnini held numerous key leadership positions, to include Deputy Chief, Infrastructure Integration; Deputy Chief, Applications Operations; and Acting Chief, Infrastructure Innovation Division. Mr. Salvagnini's experience includes all aspects of enterprise IT service delivery, including research, engineering, testing, security, and operations.

    Mr. Salvagnini retired from the Air Force as a Communications and Computer Systems Officer in May 2005 after having served in a variety of leadership assignments during his 21-year career.

    He is a native of Setauket, New York and resides with his family in Falls Church, Virginia.

  • Army Evaluation Center's Progression and Advancement of Design of Experiments

    Abstract:

    Army Test and Evaluation Command (ATEC) has been an advocate of Design of Experiments (DOE) since the inception of DOT&E DOE policies in the early 2000s. During that time, test designs were primarily D-optimal designs for Operational Tests. As ATEC has endorsed a shift-left mindset, meaning collecting and utilizing Developmental Testing data to prove out performance metrics, more statistical analysis models, and therefore more test design options, have become applicable. With the emphasis on these new avenues for test and evaluation, ATEC has expanded its use of DOE outside of what the test community generally considers the standard Operational and even Developmental Test events. This talk will include test designs for deterministic data, sampling from models, modeling and simulation versus live data, and initial artificial intelligence test cases within ATEC. The designs discussed are not novel in their statistical approach but do shed light on the advances ATEC has made in implementing DOE across multiple stages of test and evaluation.
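
    For readers less familiar with the D-optimal designs mentioned above, a toy random-search sketch that scores candidate six-run designs for three two-level factors by the determinant of the information matrix X'X; real test designs would come from dedicated optimal-design software and the actual factor space.

        import numpy as np
        from itertools import product

        rng = np.random.default_rng(0)
        # Full-factorial candidate set for three two-level factors, coded -1/+1.
        candidates = np.array(list(product([-1, 1], repeat=3)), dtype=float)
        n_runs = 6

        def d_criterion(design):
            X = np.column_stack([np.ones(len(design)), design])   # intercept + main effects
            return np.linalg.det(X.T @ X)

        best, best_score = None, -np.inf
        for _ in range(5000):
            idx = rng.choice(len(candidates), size=n_runs, replace=True)
            score = d_criterion(candidates[idx])
            if score > best_score:
                best, best_score = candidates[idx], score

        print("best determinant found:", best_score)
        print(best)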

    Speaker Info:

    Shane Hall

    Analytics and Artificial Intelligence Division Chief

    Army Evaluation Center

    Shane Hall graduated from Penn State University in 2011 with a Bachelor's in Statistics and a Master's in Applied Statistics. He has worked as a civilian for the US Army for 15 years. Mr. Hall started his Army career at the US Army Public Health Command, where he was the Command Statistician. He then transitioned to the Army Evaluation Center at Aberdeen Proving Ground, MD, where he first started as a Statistician and has since transitioned into the role of Division Chief for the Analytics and Artificial Intelligence Division.

  • Automated User Feedback in Software Development

    Abstract:

    Frequent user engagements are critical to the success of modern software development, but documentation lacks precise and structured guidance on when and how programs should obtain user feedback. The Software Acquisition Pathway (SWP) was introduced for software-intensive systems as part of the new Adaptive Acquisition Framework (AAF). DOD Instruction 5000.87, “Operation of the Software Acquisition Pathway,” recognizes the unique characteristics of software development and acquisition, including the rapid pace of change in software technologies and the iterative and incremental nature of modern software development methodologies. Success in developing software-intensive systems will require agile and iterative development processes that incorporate user feedback throughout the development process.
    This work merges established survey principles with the agile, iterative methods necessary to facilitate rapid delivery of a software capability to the user. Four critical milestones in the software development process were identified to gather user feedback throughout the SWP. Examples are given for building effective surveys to gain insight from the user. This work is presented as a framework for collecting actionable user feedback and generating analysis plans and automated reports at the identified key points. Incorporating user feedback early and throughout software development reduces the risk of developing the wrong product.

    Speaker Info:

    Brittany Fischer

    Statistician

    STAT COE

    Ms. Brittany Fischer is a contractor with Huntington Ingalls Industries (HII) working at the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE). As a STAT expert, she supports DOD programs through direct consultation, applies tailored solutions that deliver insight, and conducts applied research. Before joining the STAT COE, she spent five years as a statistical engineer with Corning, Inc., gathering and analyzing data to solve problems and inform decisions. Her project experience includes collaborating with cross-functional teams for new product development, manufacturing, reliability, and quality systems.

  • Bayes Factor Approach for Calculating Model Validation Live Test Size

    Abstract:

    Model validation is critical to using models to evaluate systems, but it requires resource-limited live testing. If a simulated model shows the system meets requirements, then the model needs to be validated with live data. Bayes Factors can be applied to determine whether live data is more consistent with the model or with an alternative that the system does not meet requirements. If the Bayes Factor is sufficiently large, then the evidence shows that the model aligns better with the data and therefore adequately represents the system.

    This presentation shows how Bayes Factors can be used to validate a model and how to size live tests when using this method. This approach is demonstrated for a model that predicts successes and failures. Examples illustrating the methods will be shown, and the effect of the factors influencing the required number of tests will be discussed. The simulation results can be represented with a beta distribution that captures the probability of success and compared against an alternative distribution.
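
    A minimal sketch of the Bayes factor calculation for pass/fail data, comparing a beta distribution standing in for simulation output against an alternative that the system falls short; the counts and beta parameters below are invented for illustration.

        import math
        from scipy.special import betaln, gammaln

        def log_marginal(x, n, a, b):
            """Log Beta-Binomial marginal likelihood of x successes in n trials under Beta(a, b)."""
            log_comb = gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
            return log_comb + betaln(a + x, b + n - x) - betaln(a, b)

        x, n = 18, 20                           # hypothetical live-test results
        log_bf = (log_marginal(x, n, 45, 5)     # model-based prior: simulation says p ~ Beta(45, 5)
                  - log_marginal(x, n, 5, 5))   # alternative: system near a 0.5 requirement, Beta(5, 5)

        print("Bayes factor (model vs alternative):", math.exp(log_bf))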

    Speaker Info:

    James Theimer

    STAT Expert

    HS COBP

    Dr. James Theimer is a Scientific Test and Analysis Techniques Expert employed by Huntington Ingalls Industries Technical Solutions and working to support the Homeland Security Center of Best Practices.

    Dr. Theimer worked for Air Force Research Laboratory and predecessor organizations for more than 35 years. He carried out research on the simulations of sensors and devices as well as the analysis of data.

  • Bayesian Reliability Assurance Testing Made Easy

    Abstract:

    A common challenge for the Department of Defense is determining how long a system should be tested to adequately assess its reliability. Unlike traditional reliability demonstration tests, which rely solely on current test data and often demand an impractical amount of testing, Bayesian methods aim to improve test plan efficiency by incorporating previous knowledge of a system's reliability. Although prior information is often available and has the potential to improve test plan efficiency, evaluators face challenges in applying Bayesian methods to test planning and analysis due to the lack of readily available tools. To address this gap, we developed an easy-to-use R Shiny application which automatically generates recommended test plans to assess system reliability using all available information about a system. Researchers can also use the application to estimate a system’s reliability using Bayesian methods. Through a case study, we show that incorporating prior information into the analysis of test data can yield a narrower range of uncertainty for reliability estimates compared to traditional methods.
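
    A small illustration of the narrowing effect described above (not the R Shiny application itself): a conjugate Beta posterior using hypothetical prior information versus a classical Clopper-Pearson interval from the new test alone. All counts and prior parameters are invented.

        from scipy.stats import beta

        x, n = 27, 30                 # hypothetical new test: 27 successes in 30 trials
        a0, b0 = 40, 4                # hypothetical prior from earlier testing / engineering knowledge

        # Bayesian 80% credible interval with the informative prior.
        post = beta(a0 + x, b0 + n - x)
        print("Bayesian 80% interval:      ", post.ppf([0.10, 0.90]))

        # Classical 80% Clopper-Pearson interval using only the new data.
        lo = beta.ppf(0.10, x, n - x + 1)
        hi = beta.ppf(0.90, x + 1, n - x)
        print("Clopper-Pearson 80% interval:", (lo, hi))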

    Speaker Info:

    Emma Mitchell

    UNC

    Emma is a fifth-year Ph.D. candidate in Statistics at the University of North Carolina, Chapel Hill, advised by Dr. Jan Hannig and Dr. Corbin Jones. Emma concentrates on methodological advances in the analysis of genomic data using Bayesian modeling and multi-view data integration techniques. She also interned this summer with the Institute for Defense Analyses, where she used her knowledge of Bayesian statistics to design an R Shiny application for Bayesian reliability assurance testing.

  • Bayesian Statistical Analysis for Mass Spectrometric Data Processing

    Abstract:

    High-precision mass spectrometry (MS) is a key technology in advancing nuclear nonproliferation analytical capabilities and enabling mission organizations at Savannah River National Laboratory (SRNL). Despite the centrality of precision MS to SRNL projects and organizations, no software currently exists with either basic or advanced data analysis tools and data interactivity functions necessitated by modern high-precision MS, including thermal ionization mass spectrometry (TIMS) and multicollector–inductively coupled plasma–mass spectrometry (MC-ICP-MS). The absence of transparent, optimizable, multifaceted data management and analytics tools in commercial MS software is further compounded by the non-user-friendly and cumbersome experience of using manufacturer-supplied software, whose interface is often buggy and prone to copy/paste errors, mislabeling, and misassignment of samples in a serialized autosampler queue (MC-ICP-MS) or filament turret (TIMS). If left unchecked, such seemingly small user “quality of life” considerations can lead to loss of precious instrument time and/or spuriously reported customer sample results. Insofar as open-source alternatives are concerned, the relevance of the few open-source data tools that do exist is inherently limited. Thus, we aim to develop a comprehensive data analytics software package and R Shiny graphical user interface (GUI) that focuses on flexibility, transparency, and reproducibility.
    Within the GUI, the implementation of a Bayesian framework supports the national security focus of DATAWorks 2025 by allowing for cross validation, more conservative uncertainty classification, and improved model performance. We suggest the implementation of Markov chain Monte Carlo (MCMC) and other sampling algorithms to better standardize and quantify the distribution of traceable isotope ratios in support of high precision mass spectrometry data processing. Additionally, we demonstrate the implementation of various priors including hierarchical, uninformative, and informative to allow for further flexibility in model construction. In doing so, we aim to emphasize how mass spectrometrists at US National Laboratories, academia, and beyond can seamlessly implement Bayesian, data-driven analysis into their own research.
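
    To make the MCMC suggestion concrete, a bare-bones random-walk Metropolis sampler for the mean of replicate (synthetic) isotope-ratio measurements with a weakly informative prior; the actual software package, priors, and instrument data are not represented here.

        import numpy as np

        rng = np.random.default_rng(0)
        measurements = rng.normal(0.00725, 0.00002, size=30)   # synthetic replicate ratio measurements
        sigma = 0.00002                                        # assumed known measurement noise

        def log_post(mu):
            # Weakly informative normal prior on the ratio plus normal likelihood.
            log_prior = -0.5 * ((mu - 0.007) / 0.001) ** 2
            log_like = -0.5 * np.sum(((measurements - mu) / sigma) ** 2)
            return log_prior + log_like

        mu, samples = measurements.mean(), []
        for _ in range(20000):
            prop = mu + rng.normal(0, 5e-6)                    # random-walk proposal
            if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
                mu = prop
            samples.append(mu)

        samples = np.array(samples[5000:])                     # drop burn-in
        print("posterior mean:", samples.mean(), " 95% CI:", np.percentile(samples, [2.5, 97.5]))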

    Speaker Info:

    Ellis McLarty

    Graduate Student

    Clemson University and Savannah River National Laboratory

    Statistician Ellis McLarty of Greenville, South Carolina, works in statistical computing, applied data analysis, and mathematical statistics. She is a Technical Graduate Intern for Savannah River National Laboratory and a graduate student at Clemson University’s School of Mathematical and Statistical Sciences.

    At Savannah River National Laboratory (SRNL), Ellis researches uncertainty characterization methods in a Laboratory Directed Research and Development project entitled "Rapid, rigorous, reproducible data analysis software for high precision mass spectrometry." Specifically, she is incorporating Bayesian inference into the measurement uncertainty classification of traceable isotope ratios for elements such as uranium and plutonium. This project is a multi-directorate effort between Environment and Legacy Management and Global Security at SRNL.

    At Clemson University’s Statistical and Mathematical Consulting Center, Ellis researches data characterization and classification methods as a consultant for Cotton Incorporated. Additionally, she routinely performs data management and statistical programming for Clemson University Cooperative Extension program evaluation.

    Ellis is also a classically trained cellist. In recent years, she has enjoyed looking at the intersection of mathematics and music. Ellis holds a Bachelor of Science in Mathematics, Bachelor of Arts in Music Performance, and Performance Certificate in Cello Performance from the University of South Carolina – Columbia.

  • Calculation of Continuous Surrogate Load Distributions from Discrete Resampled Data

    Abstract:

    Aerodynamic design optimization and database generation have seen a growing need for surrogate aerodynamic models based on computational data. Traditionally, the accuracy and quality of surrogate models are analyzed by examining performance compared to test data. However, in some situations this is not possible, such as when cost makes the acquisition of additional data impractical. In that case, alternative methods are needed to evaluate the quality of a surrogate model.

    Resampling of computational data (e.g., cross validation and bootstrapping) is a technique that can be used to inform the quality of surrogate models and downstream analysis (e.g., structural loading, control robustness). Recent work has shown that modal decomposition-based methods (Principal Component Analysis/Proper Orthogonal Decomposition) enable this type of analysis. By taking subsamples of the snapshot matrix used to generate the model, a distribution of predictions can be generated that reflects the sensitivity of the model to the quality and quantity of input data. A wider spread gives evidence of the need for more data collection.

    One of the underlying problems is that resampling the snapshot matrix is discrete by nature, which complicates the identification of a representative, continuous output distribution. The presented work employs quadrature weighting to make discrete matrix resampling continuous, with an arbitrarily small amount of input data neglected instead of an arbitrary number of discrete snapshots. In this way, the size of the variations can be controlled by the user to give a better understanding of how the system reacts to both small and large changes. This results in more coherent output distributions, which then yield more meaningful uncertainties on the surrogate models of interest.

    To explore the validity and usefulness of the approach, data from a simplified three-body launch vehicle were used. Three parameters are explored: aerodynamic angle of attack varying from 0 deg. to 90 deg., vehicle roll from 0 deg. to 90 deg., and freestream Reynolds number from 0.6 to 10 million at low Mach numbers. The flow characteristics in this domain are both variable, making for a challenging test problem, and relevant to contemporary applications. The separated wakes in a majority of the incidence range mandate costly, high-fidelity simulations; in such computationally expensive regimes, the use of surrogate models is essential to be able to characterize the entire parameter space. The presented method thus enables surrogate model uncertainty quantification of both input data scope and significance, where no other data sources are available.
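
    A plain-bootstrap sketch of the resampling idea this work builds on (uniform snapshot resampling rather than the quadrature-weighted scheme presented): resample the snapshot matrix, rebuild a POD basis each time, and look at the spread of a reconstructed quantity. The snapshot data are synthetic.

        import numpy as np

        rng = np.random.default_rng(0)
        n_snapshots, n_points, rank = 40, 200, 3
        snapshots = rng.normal(size=(n_points, n_snapshots))      # columns = synthetic CFD snapshots
        query = snapshots[:, 0]                                   # quantity to reconstruct

        recon_errors = []
        for _ in range(200):
            idx = rng.choice(n_snapshots, size=n_snapshots, replace=True)   # bootstrap resample
            U, _, _ = np.linalg.svd(snapshots[:, idx], full_matrices=False)
            basis = U[:, :rank]                                   # POD modes from the resampled matrix
            recon = basis @ (basis.T @ query)                     # project and reconstruct the query
            recon_errors.append(np.linalg.norm(recon - query))

        print("reconstruction error: mean", np.mean(recon_errors), " spread", np.std(recon_errors))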

    Speaker Info:

    T.J. Wignall

    Aerospace Engineer

    NASA LaRC

    T.J. Wignall is an aerospace engineer with the configuration aerodynamics branch at NASA Langley. He primarily works on the SLS program supporting low speed aerodynamics. Recently his interests have focused on surrogate modeling and resampling methods. He received his master's degree from ODU and his PhD from NCSU.

  • Case Study on Bayesian Test Planning for Binary Reliability Evaluation

    Abstract:

    Follow-on operational test and evaluation (FOT&E) and agile approaches to system development need rigorous ways to determine the amount of testing required. These systems change in small increments over time, and traditional frequentist methods for sizing tests that do not account for prior data result in unrealistically large sample sizes for testing the latest update. Bayesian methods mathematically account for previous system data, enabling reduced sample sizes while maintaining scientific rigor. This presentation will demonstrate a Bayesian power prior framework for evaluating binary (pass/fail) reliability of systems undergoing agile development. Using this framework, the presentation will discuss how to plan the number of trials required for a future update. Additionally, the results and lessons learned from application to a DOD program are discussed.
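
    A simulation sketch of the planning question posed above: under a power prior that discounts earlier pass/fail data, find the smallest follow-on sample size that passes a posterior reliability check with high probability. All counts, thresholds, and the discount weight are illustrative, not the DOD program's values.

        import numpy as np
        from scipy.stats import beta

        rng = np.random.default_rng(0)
        x0, n0, a0 = 92, 100, 0.5       # prior-increment data and power-prior weight (hypothetical)
        p_true, requirement = 0.92, 0.85
        post_conf, target_power = 0.80, 0.80

        for n in range(5, 101, 5):
            passes = 0
            for _ in range(2000):
                x = rng.binomial(n, p_true)
                post = beta(1 + x + a0 * x0, 1 + (n - x) + a0 * (n0 - x0))
                if post.sf(requirement) >= post_conf:      # P(reliability > requirement | data)
                    passes += 1
            if passes / 2000 >= target_power:
                # With strong discounted prior data, even a small follow-on test can suffice.
                print("smallest n meeting the planning criterion:", n)
                break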

    Speaker Info:

    Corinne Stafford

    STAT COE

    Ms. Corinne Stafford is the Applied Research Lead at the Scientific Test and Analysis Techniques Center of Excellence (STAT COE) located at the Air Force Institute of Technology at Wright-Patterson Air Force Base. Ms. Stafford leads research efforts in areas including modeling and simulation, Bayesian statistics, and software testing. Ms. Stafford obtained her M.S. in chemical engineering from Stanford University and her B.S. in applied mathematics from Mary Baldwin University.

  • Combinatorial Testing for AI-Enabled Systems

    Abstract:

    Combinatorial testing (CT) is a black-box approach for software integration testing utilizing test suites like covering arrays that guarantee to identify the presence of faults caused by up to a fixed number of interacting components while minimizing the number of tests required. CT makes different assumptions from other DOE approaches, such as requiring components to have discrete levels. CT has applications in testing Artificial Intelligence-enabled systems (AIES) such as constructing test sets with measurable coverage, identifying factors that define a model’s operating envelope, augmenting training datasets for fine-tuning in transfer learning, testing for fairness and bias, and explainable AI. This short course is intended to introduce how to apply CT to testing AIES to the test practitioner through examples that are recognizable within the DOD and NASA as a collaboration between Virginia Tech (VT), National Institute of Standards and Technology (NIST), and Institute for Defense Analyses (IDA). Topics to be covered at a conceptual level will include: CT theoretical background including measures as well as empirical results from the software testing community; differences between CT and other design of experiments approaches; CT applications for AIES across the AI development lifecycle; and how to know when to use CT within a broader test program and at what level of test. Participants will be guided through hands-on exercises with CT tools including NIST’s Automated Combinatorial Testing for Software Tool (ACTS) for test generation and VT’s Coverage of Data Explorer (CODEX) tool for characterizing the AI model input space. The group will also work through how CT may be applied to practical problems within the DOD. To support the hands-on and practical components of this short course, participants should expect to bring a personal laptop on which they have permission to download and run software tools and, if possible, provide real-world (or surrogate) example problems to the organizers prior to the course. Information regarding the software and request for inputs will be sent to registered participants in advance.
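
    A small sketch of the coverage-measurement idea behind CT: check whether a candidate test suite covers every pairwise (t = 2) combination of factor levels. The factors and suite below are invented; tools such as NIST ACTS generate covering arrays rather than merely checking them.

        from itertools import combinations, product

        # Hypothetical discrete factors for an AI-enabled system test.
        factors = {"sensor": ["EO", "IR"], "weather": ["clear", "rain", "fog"], "target": ["vehicle", "person"]}
        names = list(factors)

        # Candidate test suite: each row assigns one level to every factor.
        suite = [("EO", "clear", "vehicle"), ("EO", "rain", "person"), ("IR", "fog", "vehicle"),
                 ("IR", "clear", "person"), ("EO", "fog", "person"), ("IR", "rain", "vehicle")]

        missing = []
        for (i, f1), (j, f2) in combinations(enumerate(names), 2):
            needed = set(product(factors[f1], factors[f2]))
            covered = {(row[i], row[j]) for row in suite}
            missing.extend((f1, f2, pair) for pair in needed - covered)

        print("pairwise coverage:", "complete" if not missing else f"missing {missing}")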

    Speaker Info:

    Jaganmohan Chandrasekaran

    Research Assistant Professor

    Virginia Tech

    Jaganmohan Chandrasekaran is a research assistant professor at the Sanghani Center for AI & Data Analytics, Virginia Tech, Arlington, VA 22203, USA. His research interests are at the intersection of software engineering and artificial intelligence, focusing on the reliability and trustworthiness of artificial intelligence-enabled software systems. Chandrasekaran received a Ph.D. in computer science from the University of Texas at Arlington.

  • Combinatorial Testing for AI-Enabled Systems

    Abstract:

    Combinatorial testing (CT) is a black-box approach for software integration testing utilizing test suites, such as covering arrays, that are guaranteed to reveal faults caused by up to a fixed number of interacting components while minimizing the number of tests required. CT makes different assumptions from other DOE approaches, such as requiring components to have discrete levels. CT has applications in testing Artificial Intelligence-enabled systems (AIES), such as constructing test sets with measurable coverage, identifying factors that define a model’s operating envelope, augmenting training datasets for fine-tuning in transfer learning, testing for fairness and bias, and explainable AI. This short course, a collaboration between Virginia Tech (VT), the National Institute of Standards and Technology (NIST), and the Institute for Defense Analyses (IDA), introduces test practitioners to applying CT to AIES through examples recognizable within the DOD and NASA. Topics to be covered at a conceptual level include: CT theoretical background, including measures as well as empirical results from the software testing community; differences between CT and other design of experiments approaches; CT applications for AIES across the AI development lifecycle; and how to know when to use CT within a broader test program and at what level of test. Participants will be guided through hands-on exercises with CT tools, including NIST’s Automated Combinatorial Testing for Software (ACTS) tool for test generation and VT’s Coverage of Data Explorer (CODEX) tool for characterizing the AI model input space. The group will also work through how CT may be applied to practical problems within the DOD. To support the hands-on and practical components of this short course, participants should expect to bring a personal laptop on which they have permission to download and run software tools and, if possible, provide real-world (or surrogate) example problems to the organizers prior to the course. Information regarding the software and a request for inputs will be sent to registered participants in advance.

    Speaker Info:

    Erin Lanus

    Research Assistant Professor

    Virginia Tech

    Erin Lanus is a research assistant professor at the National Security Institute and affiliate faculty in Computer Science at Virginia Tech, Arlington, VA 22203 USA. Her research interests include the adaptation of combinatorial testing to the input space of artificial intelligence and machine learning (AI/ML), metrics and algorithms for designing test sets with coverage-related properties, and data security concerns in AI/ML systems. Lanus received a Ph.D. in computer science and a B.A. in psychology, both from Arizona State University.

  • Combinatorial Testing for AI-Enabled Systems

    Abstract:

    Combinatorial testing (CT) is a black-box approach for software integration testing utilizing test suites, such as covering arrays, that are guaranteed to reveal faults caused by up to a fixed number of interacting components while minimizing the number of tests required. CT makes different assumptions from other DOE approaches, such as requiring components to have discrete levels. CT has applications in testing Artificial Intelligence-enabled systems (AIES), such as constructing test sets with measurable coverage, identifying factors that define a model’s operating envelope, augmenting training datasets for fine-tuning in transfer learning, testing for fairness and bias, and explainable AI. This short course, a collaboration between Virginia Tech (VT), the National Institute of Standards and Technology (NIST), and the Institute for Defense Analyses (IDA), introduces test practitioners to applying CT to AIES through examples recognizable within the DOD and NASA. Topics to be covered at a conceptual level include: CT theoretical background, including measures as well as empirical results from the software testing community; differences between CT and other design of experiments approaches; CT applications for AIES across the AI development lifecycle; and how to know when to use CT within a broader test program and at what level of test. Participants will be guided through hands-on exercises with CT tools, including NIST’s Automated Combinatorial Testing for Software (ACTS) tool for test generation and VT’s Coverage of Data Explorer (CODEX) tool for characterizing the AI model input space. The group will also work through how CT may be applied to practical problems within the DOD. To support the hands-on and practical components of this short course, participants should expect to bring a personal laptop on which they have permission to download and run software tools and, if possible, provide real-world (or surrogate) example problems to the organizers prior to the course. Information regarding the software and a request for inputs will be sent to registered participants in advance.

    Speaker Info:

    Brian Lee

    Research Data Analyst

    Virginia Tech

    Brian Lee is a research data analyst at the Virginia Tech National Security Institute's Intelligent Systems Division, Arlington, VA 22203 USA. His interests stem from his research on improving the efficiency and efficacy of machine learning methods, such as generative adversarial networks, and on the use of combinatorics in model training. He received a B.S. in Computational Modeling and Data Analytics from Virginia Tech.

  • Combinatorial Testing for AI-Enabled Systems

    Abstract:

    Combinatorial testing (CT) is a black-box approach for software integration testing utilizing test suites, such as covering arrays, that are guaranteed to reveal faults caused by up to a fixed number of interacting components while minimizing the number of tests required. CT makes different assumptions from other DOE approaches, such as requiring components to have discrete levels. CT has applications in testing Artificial Intelligence-enabled systems (AIES), such as constructing test sets with measurable coverage, identifying factors that define a model’s operating envelope, augmenting training datasets for fine-tuning in transfer learning, testing for fairness and bias, and explainable AI. This short course, a collaboration between Virginia Tech (VT), the National Institute of Standards and Technology (NIST), and the Institute for Defense Analyses (IDA), introduces test practitioners to applying CT to AIES through examples recognizable within the DOD and NASA. Topics to be covered at a conceptual level include: CT theoretical background, including measures as well as empirical results from the software testing community; differences between CT and other design of experiments approaches; CT applications for AIES across the AI development lifecycle; and how to know when to use CT within a broader test program and at what level of test. Participants will be guided through hands-on exercises with CT tools, including NIST’s Automated Combinatorial Testing for Software (ACTS) tool for test generation and VT’s Coverage of Data Explorer (CODEX) tool for characterizing the AI model input space. The group will also work through how CT may be applied to practical problems within the DOD. To support the hands-on and practical components of this short course, participants should expect to bring a personal laptop on which they have permission to download and run software tools and, if possible, provide real-world (or surrogate) example problems to the organizers prior to the course. Information regarding the software and a request for inputs will be sent to registered participants in advance.

    Speaker Info:

    Raghu Kacker

    Scientist

    National Institute of Standards and Technology

    Raghu N. Kacker is a scientist at the National Institute of Standards and Technology (NIST), Gaithersburg, MD 20899, USA. He received his Ph.D. from Iowa State University in 1979. His interests include testing for trust, assurance, and performance of software-based systems. He has worked at AT&T Bell Labs and Virginia Tech. He is a Fellow of the American Statistical Association, a Fellow of the American Society for Quality, and a Fellow of the Washington Academy of Sciences. He has authored or co-authored over 200 papers. His papers have been cited over 14,750 times per Google Scholar.

  • Combinatorial Testing for AI-Enabled Systems

    Abstract:

    Combinatorial testing (CT) is a black-box approach for software integration testing utilizing test suites, such as covering arrays, that are guaranteed to reveal faults caused by up to a fixed number of interacting components while minimizing the number of tests required. CT makes different assumptions from other DOE approaches, such as requiring components to have discrete levels. CT has applications in testing Artificial Intelligence-enabled systems (AIES), such as constructing test sets with measurable coverage, identifying factors that define a model’s operating envelope, augmenting training datasets for fine-tuning in transfer learning, testing for fairness and bias, and explainable AI. This short course, a collaboration between Virginia Tech (VT), the National Institute of Standards and Technology (NIST), and the Institute for Defense Analyses (IDA), introduces test practitioners to applying CT to AIES through examples recognizable within the DOD and NASA. Topics to be covered at a conceptual level include: CT theoretical background, including measures as well as empirical results from the software testing community; differences between CT and other design of experiments approaches; CT applications for AIES across the AI development lifecycle; and how to know when to use CT within a broader test program and at what level of test. Participants will be guided through hands-on exercises with CT tools, including NIST’s Automated Combinatorial Testing for Software (ACTS) tool for test generation and VT’s Coverage of Data Explorer (CODEX) tool for characterizing the AI model input space. The group will also work through how CT may be applied to practical problems within the DOD. To support the hands-on and practical components of this short course, participants should expect to bring a personal laptop on which they have permission to download and run software tools and, if possible, provide real-world (or surrogate) example problems to the organizers prior to the course. Information regarding the software and a request for inputs will be sent to registered participants in advance.

    Speaker Info:

    Rick Kuhn

    Computer Scientist

    National Institute of Standards and Technology

    Rick Kuhn is a computer scientist at the National Institute of Standards and Technology, Gaithersburg, MD 20899 USA, and is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the Washington Academy of Sciences. He co-developed the role based access control (RBAC) model that is the dominant form of access control today. His current research focuses on combinatorial methods for assured autonomy. He has authored three books and more than 200 conference or journal publications on cybersecurity, software failure, and software verification and testing. Previously he served as Program Manager for the Committee on Applications and Technology of the President's Information Infrastructure Task Force, and as manager of the Software Quality Group at NIST. He received an MS in computer science from the University of Maryland College Park and an MBA from William & Mary.

  • Combinatorial Testing for AI-Enabled Systems

    Abstract:

    Combinatorial testing (CT) is a black-box approach for software integration testing utilizing test suites, such as covering arrays, that are guaranteed to reveal faults caused by up to a fixed number of interacting components while minimizing the number of tests required. CT makes different assumptions from other DOE approaches, such as requiring components to have discrete levels. CT has applications in testing Artificial Intelligence-enabled systems (AIES), such as constructing test sets with measurable coverage, identifying factors that define a model’s operating envelope, augmenting training datasets for fine-tuning in transfer learning, testing for fairness and bias, and explainable AI. This short course, a collaboration between Virginia Tech (VT), the National Institute of Standards and Technology (NIST), and the Institute for Defense Analyses (IDA), introduces test practitioners to applying CT to AIES through examples recognizable within the DOD and NASA. Topics to be covered at a conceptual level include: CT theoretical background, including measures as well as empirical results from the software testing community; differences between CT and other design of experiments approaches; CT applications for AIES across the AI development lifecycle; and how to know when to use CT within a broader test program and at what level of test. Participants will be guided through hands-on exercises with CT tools, including NIST’s Automated Combinatorial Testing for Software (ACTS) tool for test generation and VT’s Coverage of Data Explorer (CODEX) tool for characterizing the AI model input space. The group will also work through how CT may be applied to practical problems within the DOD. To support the hands-on and practical components of this short course, participants should expect to bring a personal laptop on which they have permission to download and run software tools and, if possible, provide real-world (or surrogate) example problems to the organizers prior to the course. Information regarding the software and a request for inputs will be sent to registered participants in advance.

    Speaker Info:

    M S Raunak

    Computer Scientist

    National Institute of Standards and Technology

    M S Raunak is a computer scientist at the National Institute of Standards and Technology (NIST), Gaithersburg, MD 20899 USA. His research interests include verification, validation, and assurance of “difficult-to-test” systems such as complex simulation models, cryptographic implementations, and machine learning algorithms. Dr. Raunak received his Ph.D. in computer science from the University of Massachusetts Amherst. He is a Senior Member of IEEE.

  • Combining Joint Cost and Schedule Risk Analysis with Earned Value Management Using Bayesia

    Abstract:

    The joint risk analysis of cost and schedule is typically conducted either via parametric methods or using a detailed bottom-up approach that resource-loads schedule networks. Earned value data are not typically used as part of this process. However, there is significant value in using earned value data to improve the accuracy of joint cost and schedule risk analyses. This paper explains how to use a joint cost and schedule risk analysis as an input and then, as earned value data are collected, update the joint cost and schedule risk probability distributions with this information. The specific technique used is Bayesian Parameter Learning, which provides a rigorous mathematical framework for updating probability distributions using new information. This presentation is an extension of prior work that applies the same principle of Bayesian Parameter Learning to improve the predictive accuracy of the estimate at completion for cost.
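
    As a toy illustration of the Bayesian updating idea (the presentation's Bayesian Parameter Learning is more general than this), the sketch below performs a conjugate Normal update of a cost estimate-at-completion distribution using earned-value-derived observations; all figures are hypothetical.

        import numpy as np

        def update_normal(prior_mean, prior_sd, obs, obs_sd):
            # Posterior for an unknown mean with known observation noise (conjugate Normal).
            obs = np.asarray(obs, dtype=float)
            prior_prec = 1.0 / prior_sd**2
            data_prec = len(obs) / obs_sd**2
            post_prec = prior_prec + data_prec
            post_mean = (prior_prec * prior_mean + data_prec * obs.mean()) / post_prec
            return post_mean, 1.0 / np.sqrt(post_prec)

        # Prior EAC from the joint cost/schedule risk analysis: $120M +/- $15M (1-sigma).
        # Monthly EAC projections implied by earned value (e.g., BAC / CPI), in $M:
        evm_eacs = [128.0, 131.5, 126.0, 133.0]
        mean, sd = update_normal(120.0, 15.0, evm_eacs, obs_sd=10.0)
        print(f"updated EAC: {mean:.1f} +/- {sd:.1f} ($M, 1-sigma)")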

    Speaker Info:

    Murray Cantor

    Consultant

    Cantor Consulting

    Murray Cantor is a retired IBM Distinguished Engineer. With his Ph.D. in mathematics from the University of California at Berkeley and extensive experience in managing complex, innovative projects, he has focused on applying predictive reasoning and causal analysis to the execution and economics of project management.
    In addition to many journal articles, Murray is the author of two books: Object-Oriented Project Management with UML and Software Leadership. He is an inventor of 15 IBM patents.

    After retiring from IBM, he was a founder and lead scientist of Aptage, which developed and delivered tools for learning and tracking the probability of meeting project goals. Aptage was sold to Planview.

    Dr. Cantor’s quarter-century career with IBM included two periods:
    first as an architect and senior project manager for the Workstation Division, and later as an IBM Distinguished Engineer in the Software Group and a member of the IBM Rational CTO team.

    The second IBM stint began with IBM acquiring Rational Software, where Murray was the Lead Engineer for Rational Services. In that role, he consulted on delivering large projects at Boeing, Raytheon, Lockheed, and various intelligence agencies. He was the IBM representative to the SysML partners who created the Object Management Group’s Systems Modeling Language standard. While at Rational, he was the lead author of the Rational Unified Process for System Engineering (RUPSE).

    Before joining Rational, he was project lead at the defense and intelligence contractor TASC, delivering systems for Space Command.

  • Competence Measure Enhanced Ensemble Voting Schemes

    Abstract:

    Ensemble methods comprise multiple individual models, each producing a prediction. Voting schemes are used to weigh the decisions of the individual models, namely classifiers, to predict class. A well-formed ensemble should be built from classifiers with diverse assumptions, e.g., differing underlying training data, feature space selection, and therefore decision boundaries. Voting scheme methods are often based on consideration of the underlying feature space and the confidence each classifier reports in its predictions. Diversity across the classifiers is an advantage, but it is not fully exploited by existing voting schemes. The purpose of the described concept is to enhance current voting schemes by weighting individual model competence measures, ensuring that input data are appropriate to the prediction space of the individual classifiers. A classifier's vote for a specified input will be counted only if it achieves a threshold model competence measure. This approach augments confidence-based schemes by ensuring that inputs are consistent with the training data of the individual models. An application employing random forest classifiers will be demonstrated.
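
    As an illustrative sketch of the concept (not the presenter's implementation), the code below gates each random forest's vote on a distance-based competence measure relative to its own training data; the threshold value and the choice of distance-based competence measure are assumptions made for illustration.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.neighbors import NearestNeighbors
        from sklearn.datasets import make_classification

        X, y = make_classification(n_samples=600, n_features=10, random_state=0)
        rng = np.random.default_rng(0)

        members = []
        for seed in range(5):
            idx = rng.choice(len(X), size=300, replace=True)   # diverse training subsets
            clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X[idx], y[idx])
            nn = NearestNeighbors(n_neighbors=5).fit(X[idx])   # used for the competence measure
            members.append((clf, nn))

        def competent_vote(x, threshold=2.0):
            """Average predict_proba over members whose training data lie near x."""
            votes = []
            for clf, nn in members:
                dist, _ = nn.kneighbors(x.reshape(1, -1))
                if dist.mean() <= threshold:                   # competence gate
                    votes.append(clf.predict_proba(x.reshape(1, -1))[0])
            if not votes:                                      # fall back to all members
                votes = [clf.predict_proba(x.reshape(1, -1))[0] for clf, _ in members]
            return int(np.mean(votes, axis=0).argmax())

        print("prediction:", competent_vote(X[0]))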

    Speaker Info:

    Francesca McFadden

    Principal Professional Staff

    The Johns Hopkins University Applied Physics Laboratory

    Francesca McFadden works on modeling, simulation, and analysis to evaluate system architectures. She has a master's degree in Applied Mathematics.

  • Computer Experiments for Meta-learning of Machine Learning Models

    Abstract:

    Operationally realistic data to inform machine learning models can be costly to gather. An example is collecting aerial images of rare objects to train an image classifier. Before collecting new data, it is helpful to understand where your model is deficient. For example, it may not be good at identifying rare objects in seasons not well represented in the training data. We offer a way of informing subsequent data acquisition to maximize model performance by leveraging both the toolkit of computer experiments and the metadata describing the circumstances under which the training data was collected (e.g. time of day, location, source). We do this by treating the composition of metadata and the performance of the learner, respectively, as the inputs and outputs of a Gaussian process (GP). The resulting GP fit shows which metadata features yield the best learner performance. We take this a step further by using the GP to inform new data acquisitions, recommending the best circumstances to collect future data. Our method for active learning offers improvements to learner performance as compared to data with randomly selected metadata, which we illustrate on image classification and detection benchmarks.
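
    As a minimal sketch of the approach described (not the authors' code), the example below fits a scikit-learn Gaussian process to hypothetical metadata features versus learner accuracy and scores candidate collection conditions with a simple upper-confidence-bound rule; all feature names and numbers are invented for illustration.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        # Each row: [fraction winter imagery, fraction nighttime imagery]; y: held-out accuracy.
        X = np.array([[0.0, 0.1], [0.2, 0.1], [0.4, 0.3], [0.1, 0.5], [0.3, 0.0]])
        y = np.array([0.62, 0.71, 0.78, 0.69, 0.74])

        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(1e-3),
                                      normalize_y=True).fit(X, y)

        # Candidate metadata compositions for the next data acquisition.
        grid = np.array([[w, n] for w in np.linspace(0, 0.5, 6) for n in np.linspace(0, 0.5, 6)])
        mean, std = gp.predict(grid, return_std=True)
        best = grid[np.argmax(mean + std)]        # simple upper-confidence-bound rule
        print("collect next under: winter fraction %.2f, night fraction %.2f" % tuple(best))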

    Speaker Info:

    Anna Flowers

    Ph.D. Student

    Virginia Tech

    Anna Flowers is a fourth-year Ph.D. student in Statistics at Virginia Tech. She received an M.S. in Statistics from Virginia Tech in 2023 and a B.S. in Mathematical Statistics from Wake Forest University in 2021. She is jointly advised by Bobby Gramacy and Chris Franck, and her research focuses on Gaussian Process regression, surrogate modeling, and active learning. She was an intern at the Institute for Defense Analyses in 2024.

  • Confidence Based Skip-Lot Sampling

    Abstract:

    The Lake City Army Ammunition Plant (LCAAP) in Independence, Missouri produces and tests millions of rounds of small arms ammunition daily, in many cases using decades-old procedures. While these testing methods are effective, the long history of high manufacturing quality for most products suggests they are significantly over-testing the lots in production. To address this issue, we have developed a skip-lot testing procedure that uses a Bayesian approach to estimate the true quality of each lot in production, even as some lots are skipped. By using this updated approach, we can reduce the total number of tests required while controlling the risk of accepting low-quality lots. Simulation results demonstrate that this process both reduces the number of tests required and meets the production facility’s standards for risk exposure.
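
    As a rough sketch of the Bayesian idea behind skip-lot sampling (not LCAAP's actual procedure), the code below maintains a Beta posterior on the lot defect rate and skips the next lot only when that posterior is confidently below a limit. The limit, confidence level, and counts are hypothetical.

        from scipy import stats

        class SkipLotPlan:
            def __init__(self, limit=0.01, confidence=0.95):
                self.alpha, self.beta = 1.0, 1.0      # Beta(1, 1) prior on the defect rate
                self.limit, self.confidence = limit, confidence

            def record_lot(self, defects, sample_size):
                # Update the posterior with results from a tested lot.
                self.alpha += defects
                self.beta += sample_size - defects

            def skip_next_lot(self):
                # Skip when the posterior puts >= `confidence` mass below the defect limit.
                return stats.beta(self.alpha, self.beta).cdf(self.limit) >= self.confidence

        plan = SkipLotPlan()
        for defects in [0, 1, 0, 0, 0]:               # five tested lots of 500 rounds each
            plan.record_lot(defects, 500)
            print("skip next lot?", plan.skip_next_lot())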

    Speaker Info:

    Alexander Boarnet

    Cadet

    United States Military Academy

    I am currently a senior at the United States Military Academy majoring in Mathematical Science. I have research interests in applied statistics and improving testing procedures. In my free time, I am the captain of the West Point Judo team.

  • Creating a Robust Cyber Workforce

    Abstract:

    The current talent pool is struggling to keep pace with the demand for cyber professionals. The rapid pace of technological advancement coupled with the evolving nature of cyber threats has created a constant need for relevant skills. To make cyber careers more accessible, the Office of the National Cyber Director (ONCD) has outlined several initiatives in the National Cyber Workforce and Education Strategy. ONCD policy initiatives include removing unnecessary degree requirements, transitioning to a skills-based hiring approach, expanding work-based learning opportunities, and supporting efforts to bring together employers, academia, local governments, and non-profit organizations. While the federal government has implemented measures such as the Workforce Innovation and Opportunity Act to tackle skills gaps, the aim of our study is to measure the gap between the workforce and unfilled cyber positions and to determine what parameters are necessary to close the skills gap without overfitting for unfit candidates. By cleaning and filtering a LinkedIn job postings dataset, we identified more than 124,000 cyber job descriptions. With these records, we used keywords to bin the cyber job descriptions into three distinct levels. We then used natural language processing and text analytics to identify the knowledge, skills, and abilities (KSAs) undergraduate students obtain from their education. Finally, we extend existing cosine similarity methods to determine how similar employer job descriptions are to the KSAs of the job applicant population. Using these methods, cyber policy makers and employers have additional tools to ensure that cyber jobs are filled by the correct candidates with applicable experience and credentials.
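
    As a toy illustration of the cosine-similarity step (not the study's actual pipeline or data), the sketch below scores a hypothetical candidate's KSAs against a few made-up job descriptions using TF-IDF vectors.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        jobs = [
            "entry level SOC analyst monitoring SIEM alerts and triaging incidents",
            "penetration tester performing red team assessments and exploit development",
            "cloud security engineer building IAM policies and hardening AWS workloads",
        ]
        candidate_ksas = "coursework in incident response, SIEM tools, and network monitoring"

        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform(jobs + [candidate_ksas])
        scores = cosine_similarity(matrix[len(jobs)], matrix[:len(jobs)]).ravel()
        for job, score in sorted(zip(jobs, scores), key=lambda pair: -pair[1]):
            print(f"{score:.2f}  {job[:50]}...")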

    Speaker Info:

    Sheyla Street

    Cadet

    United States Military Academy

    Cadet Sheyla Street studies Applied Statistics and Data Science with a Cybersecurity Engineering track at the United States Military Academy. Sheyla’s published research projects have focused on social bias in artificial intelligence and interoperability within NATO. As an interdisciplinary scholar and future Army Engineer Officer, Sheyla hopes to bridge gaps between technology, policy, and justice. Sheyla’s general research interests include artificial intelligence, policy, computer vision, and social bias. At West Point Sheyla serves as a Regimental Prevention Officer, Stokes Writing Center Senior Fellow, President of the MELANATE Club, CLD STEM volunteer, and varsity athlete on the Army Women’s Crew Team. She has also served as the Cadet in Charge of the Corbin Women’s Leadership Forum Workouts and Cultural Affairs Seminar tutor. As President of the Tau Theta Chapter of Delta Sigma Theta Sorority Inc., Sheyla remains active in public service and social action.

  • Developing a Social Engineering Framework and Data Collection Standards for CAT Operations

    Abstract:

    For more than a decade, IDA has supported DOT&E through detailed analysis of cyber attack pathways. This work has resulted in multiple IDA publications and inputs to DOT&E reports that have informed Congressional, U.S. Cyber Command, and OSD Chief Information Officer decision-making. The IDA attack pathway analysis has used the MITRE on-network ATT&CK framework as a common taxonomy and classification for cyber attack actions, and to enable the development of data standards, analysis methodologies, and reporting. However, the MITRE ATT&CK framework only applies to on-network activity, and does not include a framework for assessing physical security.
    Physical penetrations parallel on-network cyber attacks in many ways; both are multi-step activity chains where steps in the attack chain enable one or more later steps. Therefore, it should be possible to analyze the data from Close Access Team (CAT) physical security assessments similarly to how IDA has historically treated on-network cyber attack data.
    This presentation provides an overview of the Central Research Project (CRP) work to develop a framework for CAT social engineering and physical intrusion TTPs as an analog to the MITRE ATT&CK framework. After an overview of how IDA currently uses the MITRE ATT&CK framework and program data standards to analyze CAT mission data and produce data trend products in support of the DOT&E Cyber Assessment Program (CAP), we provide an overview of the academic literature used as a foundation for the development of IDA’s own social engineering framework, the Social Spoofing Security Analysis Reference (S3AR). S3AR summarizes CAT operator TTPs into four nodes: planning, ingress, lateral, and objective. These nodes span the life cycle of a CAT operator planning their attack (planning node), entering the target location of interest (ingress node), operating within that location of interest (lateral node), and deploying a device on a network of interest (objective node). IDA leveraged academic literature as well as a tailored survey of CAT operations deployed to operators spanning five CATs. The results of this survey, and their direct tie to the continued development of the lateral node of the analysis reference TTPs, are presented.
    In addition to the original charge to develop an analysis framework analogous to MITRE ATT&CK for physical ingress and social engineering techniques, IDA also generated data collection standards for CAT operations. The three data collection sheets presented are designed to be completed during CAT missions, and each aligns with at least one of the analysis reference nodes: the planning and prep sheet (planning node), the daily actions sheet (ingress and lateral nodes), and the systems accessed sheet (objective node).

    Speaker Info:

    Wendy-Angela Agata-Moss

    RSM

    IDA

    Dr. Saringi Agata-Moss holds a PhD in Chemical Engineering from UVA and is currently a Research Staff Member in the Operational Evaluation Division. Saringi primarily works on the Cyber Assessment Program task, supporting cyber planning and analysis for US Strategic Command annual assessments, Nuclear Command, Control and Communications (NC3) special assessments, Persistent Cyber Operations (PCO), and the CAP's Close Access Team assessments. In addition, Saringi was recently funded by IDA's Central Research Program to develop a social engineering and physical ingress tactics, techniques, and procedures analysis framework to assist in continued CAP work with CAT assessments for the DOD. Saringi enjoys supporting DE&I recruitment efforts for IDA, particularly in engineering, to show the pathway from academia to government support work.

  • Developmental T&E of Autonomous Systems – Consolidated Challenges and Guidance

    Abstract:

    Developmental T&E of Autonomous Systems – Consolidated Challenges and Guidance

    This presentation will give an overview of challenges, methodologies, and best practices for Developmental Test and Evaluation of Autonomous Systems. This addresses the novel challenges of removing human operators from DoD systems, and empowering future autonomous systems to independently act in contested environments. These challenges, such as safety, black-box components, data, and human-machine teaming, demand iterative approaches to evaluating the growing capabilities of autonomous systems, to assure trusted mission capability across complex operational environments.

    The guidance provided includes lessons learned and best practices for the full continuum of autonomy T&E, such as runtime assurance, LVC testing, continuous testing, and cognitive instrumentation. This guidance leverages emerging best practices in agile and iterative testing to extend success throughout the T&E continuum. By applying these best practices to achieve efficient, effective, and robust DT&E, autonomous DoD systems will be primed for successful operational T&E and operational employment.
    The information presented is being published as a new “Developmental T&E of Autonomous Systems Guidebook,” which is intended to be a living document contributed to by a broad community and which will adapt to ensure the best information reaches a wide audience.

    The views expressed are those of the author/presenter and do not necessarily reflect the official policy or position of the Department of the Air Force, the Department of Defense, or the U.S. government.

    Speaker Info:

    Charlie Middleton

    Consultant

    STAT Center of Excellence

    Charlie Middleton, Scientific Test & Analysis Techniques (STAT) Center of Excellence, Technical Support Contractor

    Charlie Middleton currently leads the Advancements in Test and Evaluation (T&E) of Autonomous Systems team for the OSD STAT Center of Excellence. His responsibilities include researching autonomous system T&E methods and tools; collaborating with Department of Defense program and project offices developing autonomous systems; leading working groups of autonomy testers, staffers, and researchers; and authoring a guidebook, reports, and papers related to T&E of autonomous systems.

  • Digital Engineering and Test and Evaluation: How DE impacts T&E

    Abstract:

    Since the Office of the Undersecretary of Defense, Research and Engineering released its Digital Engineering (DE) strategy in 2018, the services along with supporting agencies and industry have been working to transform their practices, improve tooling, and upskill their workforce to realize the vision of a digitally harmonized engineering environment that supports Department of Defense (DoD) weapon system acquisition as well as operations and sustainment of existing weapon systems. In kind, the Test and Evaluation (T&E) community within the DoD and supporting agencies is advancing and institutionalizing the application of DE methods to T&E.

    The research team at the Acquisition Innovation Research Center (AIRC) has developed a short course to introduce program offices and other T&E professionals to the basics of implementing DE in support of verification, validation, and accreditation activities. This two-hour course will introduce DE concepts and lifecycles, establish the value proposition of DE methods in DoD acquisition, introduce various DE tools, and share best practices as well as potential challenges in applying DE methods to T&E efforts. The course will address basics in Model Based Mission Engineering, Model Based Systems Engineering, Digital Design, Modeling & Simulation, and Model Based T&E Planning, as well as briefly explore the application of Generative Artificial Intelligence to T&E. Participants will leave with a working knowledge of how DE can be applied to their specific tasks and an awareness of some of the tooling employed across the DoD for realizing DE objectives.

    Speaker Info:

    Paul Wach

    Research Assistant Professor

    Virginia Tech National Security Institute

    Dr. Paul Wach is a research assistant faculty member with the Intelligent Systems Division of the Virginia Tech National Security Institute and an adjunct faculty member with the Grado Department of Industrial & Systems Engineering. His research interests include the intersection of theoretical foundations of systems engineering, digital transformation, and artificial intelligence. Specifically, Dr. Wach’s research is at the cutting edge of conjoining model-based systems engineering (MBSE), modeling & simulation (M&S), and generative AI (e.g., LLMs). He is also associated with The Aerospace Corporation, serving as a subject matter expert on digital transformation. Dr. Wach has prior work experience with the Department of Energy in lead engineering and management roles ranging in scope from $1B to $12B, as well as work experience with two National Laboratories and the medical industry. He received a B.S. in Biomedical Engineering from Georgia Tech, an M.S. in Mechanical Engineering from the University of South Carolina, and a Ph.D. in Industrial & Systems Engineering from Virginia Tech.

  • Digital Engineering and Test and Evaluation: Operation Safe Passage

    Abstract:

    Since the Office of the Undersecretary of Defense for Research and Engineering released the Digital Engineering (DE) Strategy in 2018, the Department of Defense (DoD), its supporting agencies, and industry have been working to transform practices, improve tooling, and develop a workforce that can contribute to an enhanced engineering environment. This harmonized environment supports weapon system acquisition as well as the upkeep and sustainment of existing weapon systems, benefiting the Test and Evaluation (T&E) community. The DoD and its supporting agencies continue to advance and commit to applying DE methods to T&E.

    The research team at the Acquisition Innovation Research Center (AIRC), a University Affiliated Research Center (UARC), in partnership with the Director, Operational Test and Evaluation (DOT&E), has been developing and maturing a proxy for a DoD system acquisition to enhance the digital transformation of T&E. A testbed and framework have been developed around a mission in which an Unmanned Ground Vehicle navigates a minefield and disarms mines to allow cargo and troops to be transported safely. The mission is titled Operation Safe Passage (OSP). The proxy for the mission uses LEGO Mindstorms™ as a physical model, while digital models include Computer Aided Design (CAD) models, SysML-based models, physics-based models for analysis, and decision dashboards. This integrated approach enables rapid decision-making by connecting architecture models, test planning, and physical testing. The testbed and framework for OSP advance the vision of prospective transformations in T&E.

    Speaker Info:

    Brandt Sandman

    Graduate Research Assistant

    Virginia Tech

    Brandt Sandman is a first-year Ph.D. student in the Industrial and Systems Engineering program at Virginia Tech. He has been working as a Graduate Research Assistant with a focus on digital transformation.

  • Digital Twins in Reliability Engineering: Innovations, Challenges and Opportunities

    Abstract:

    The digital twin (DT) is a rapidly evolving technology that draws upon a multidisciplinary foundation, integrating principles from computer science, physics, mathematics, statistics, and engineering. Its applications are diverse, spanning industries such as engineering, healthcare, biomedicine, climate change, renewable energy, and national security. This work aims to discuss the characterization, development, and application of DT and to identify both the challenges and opportunities that lie ahead. The study identifies research gaps and a path forward to advance the statistical and computational foundations and applications of DT in the field of reliability engineering and preventive maintenance for statistical quality control and assurance. Fostering innovation in total quality management, DT is poised to transform industries. Leveraging advanced data analytics, data science, machine learning (ML), and artificial intelligence (AI), DT enables monitoring, simulation, and optimization of complex systems, ensuring higher quality, greater reliability, and improved decision-making. By addressing these challenges and opportunities, continued investment in DT technologies will drive the next wave of engineering excellence and operational efficiency.

    Speaker Info:

    David Han

    UT San Antonio

    David Han, M.S., Ph.D., teaches statistics and data science at the University of Texas at San Antonio. His research interests include statistical modeling and inference, machine learning, and artificial intelligence applied to lifetime analysis and reliability engineering.

  • Enabling Efficient Research and Testing Through Data Stewardship on the Individual Level

    Abstract:

    Data governance determines how an organization makes decisions, while data stewardship determines how these decisions are carried out. Broadly, a data steward is a person with data-related responsibilities. One might execute data-related plans made by data governors, engage with metadata, resolve data issues, or track individual data elements. Benefits of data stewardship include ensuring the quality of data. Taken on an individual or team level, this can lead to savings in time and money and increased accuracy of research and testing. However, the return on investment associated with data stewardship can be best seen if clear goals and data management bottlenecks are identified first.

    This presentation will introduce data stewardship and discuss a case where the author applied data stewardship principles to their research practice. Data management strategies will be discussed. Goals, progress, challenges, and lessons learned will be presented.

    Speaker Info:

    Christin Lundgren

    NASA Langley Research Center

    Dr. Christin Lundgren joined NASA Langley Research Center in 2021 after 10 years in industry, with a background in RF engineering, optics, and electromagnetics. She is currently a computational electromagnetics researcher in the Revolutionary Aviation Technologies Branch. Previously, Dr. Lundgren was a Lead in Optical Engineering at L3Harris Technologies, working in RF photonics. As a graduate student at the University of Arizona, she focused on electro-optic modulators and optical modeling of nanostructured materials for solar cells. She has authored multiple publications and two U.S. patents. She is a Senior Member of IEEE and the Treasurer of the Hampton Roads section of the Society of Women Engineers.

  • Estimating Combat Losses: An Application of Multiple System Estimation

    Abstract:

    Recently, analysts from OED’s live fire group reviewed multiple sources to amass a comprehensive list of aircraft combat damage events that are of interest to vulnerability assessments. Many of these events could be found in more than one source. Analysts believe there were events not known to any of the sources, and these are absent from the dataset. We want to estimate the number of unobserved events, which, when combined with the observed events, would produce a better estimate of the total number of events that actually occurred.
    Ecologists developed statistical techniques to estimate the sizes of hard-to-count populations, and these techniques can be applied to other domains where observations are recorded in multiple lists.
    In this presentation, I will demonstrate various techniques of multiple system estimation, building on simple intuition. I will then apply it to the combat loss data using several computational tools readily available in R. The analysis will show that the total number of events is likely to be much larger than the observed count.
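
    As a concrete illustration of the two-list case of multiple system estimation (not the actual combat-loss data or analysis), the sketch below applies Chapman's variant of the Lincoln-Petersen estimator to hypothetical list counts.

        def chapman_estimate(n1, n2, m):
            """n1, n2: events appearing on each list; m: events appearing on both lists."""
            return (n1 + 1) * (n2 + 1) / (m + 1) - 1

        n1, n2, m = 180, 150, 90          # illustrative list sizes and overlap
        observed = n1 + n2 - m            # distinct events actually recorded
        total_hat = chapman_estimate(n1, n2, m)
        print(f"observed events: {observed}, estimated total: {total_hat:.0f}, "
              f"estimated unobserved: {total_hat - observed:.0f}")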

    Speaker Info:

    Gregory Chesterton

    RSM

    IDA

    Greg Chesterton is a research staff member with a master’s degree in operations research from the Naval Postgraduate School. After graduating in 1988 from The Pennsylvania State University with a degree in aerospace engineering, Greg served 20 years in the Marine Corps as a Naval Flight Officer, with tours in the A-6E Intruder and the F/A-18D. Greg graduated from the Naval Postgraduate School in 2005, and spent two years at the Marine Corps Operational Test and Evaluation Activity (MCOTEA). After retiring from active duty as a Lieutenant Colonel in 2008, Greg joined MITRE’s Center for Advanced Aviation System Development (CAASD) where he produced quantitative safety risk analysis products for numerous FAA sponsors during a 15-year tenure. Greg joined OED’s live fire portfolio in 2024. In addition to his LFT&E responsibilities, his technical areas of interest are in design of experiments, data analysis, and statistical inference.

  • Eucalyptus – An Analysis Suite for Fault Trees with Uncertainty Quantification

    Abstract:

    Eucalyptus – An Analysis Suite for Fault Trees with Uncertainty Quantification

    Eucalyptus is a novel code developed by Lawrence Livermore National Laboratory to incorporate uncertainty quantification into Fault Tree Analysis. This tool addresses the challenge of imperfect knowledge in “grey-box” systems by allowing analysts to incorporate and propagate uncertainty from component-level assessments to system-level effects. Eucalyptus facilitates a consistent evaluation of the impact of subject matter expert judgment and knowledge gaps on overall system response by Monte Carlo generation of possible system fault trees, sampling probabilities of the existence of subsystems and components. The code supports the specification of fault trees through text and allows export to various formats, including auto-generated images, easing analysis and reducing errors. It has undergone extensive verification testing, demonstrating its reliability and readiness for deployment, and leverages on-node parallelism for rapid analysis. Example analyses are shown that include the identification of system failure paths and quantification of the value of further information about system components.
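
    Eucalyptus itself is not reproduced here; as a rough illustration of the general idea of Monte Carlo propagation of uncertain component probabilities through a small AND/OR fault tree, consider the sketch below. The component names and Beta distributions are hypothetical.

        import numpy as np

        rng = np.random.default_rng(0)
        N = 100_000

        # Expert-elicited uncertainty on component failure probabilities (Beta distributions).
        pump = rng.beta(2, 50, N)
        valve = rng.beta(1, 30, N)
        sensor = rng.beta(3, 40, N)

        # Top event: (pump fails AND valve fails) OR sensor fails, assuming independence.
        and_gate = pump * valve
        top = 1.0 - (1.0 - and_gate) * (1.0 - sensor)

        print(f"top event probability: median {np.median(top):.4f}, "
              f"90% interval [{np.quantile(top, 0.05):.4f}, {np.quantile(top, 0.95):.4f}]")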

    Speaker Info:

    Adam Taylor

    Computational Engineering Analyst

    Lawrence Livermore National Laboratory

    Adam Taylor is a computational analyst at Lawrence Livermore National Laboratory, specializing in structural and hydrodynamic simulations.

  • Evaluating Metrics for Multiclass Computer Vision Models

    Abstract:

    In support of the Chief Digital and Artificial Intelligence Office's (CDAO) ongoing efforts to provide best practices and methods for test and evaluation of artificial intelligence-enabled systems, I was tasked to examine metrics for computer vision models. In particular, I studied and implemented multiclass metrics for computer vision models. A table was produced that lists the strengths and weaknesses of popular metrics. I then simulated these strengths and weaknesses by using the CDAO's Joint AI Test Infrastructure Capability (JATIC), a Python-based test and evaluation package, to evaluate models trained on the overhead MNIST2 satellite imaging dataset for the automated target recognition (ATR) use case.
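
    As an illustration of the kinds of multiclass metrics compared (using scikit-learn here rather than JATIC), the sketch below shows how overall accuracy can mask poor minority-class performance that balanced accuracy, macro-averaged F1, and the confusion matrix reveal; the labels are hypothetical.

        from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                                     confusion_matrix, f1_score)

        y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]      # imbalanced three-class ground truth
        y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2]      # model rarely predicts minority classes

        print("accuracy:          ", accuracy_score(y_true, y_pred))
        print("balanced accuracy: ", balanced_accuracy_score(y_true, y_pred))
        print("macro F1:          ", f1_score(y_true, y_pred, average="macro"))
        print("micro F1:          ", f1_score(y_true, y_pred, average="micro"))
        print("confusion matrix:\n", confusion_matrix(y_true, y_pred))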

    Speaker Info:

    Jeff Lin

    AI Assurance Research Associate I

    Institute for Defense Analyses

    Jeff Lin is an AI Assurance Research Associate at IDA, where they specialize in computer vision and data science. With a B.S. in Computer Science and an M.S. in Data Science, Jeff’s current work focuses on the development of a computer vision guidebook, which outlines best practices and guidelines for computer vision applications within the DoD setting. This past summer, Jeff served as a summer associate, concentrating on metric evaluation for multi-class computer vision models. This brief has been peer-reviewed and sent to the Chief Digital and Artificial Intelligence Office.

  • F-ANOVA: Tutorial for Grouping Data and Identifying Interactions Across Arbitrary Domains

    Abstract:

    Extending Analysis of Variance (ANOVA) to functional data enables researchers and analysts to better understand how categorical variables influence data that vary continuously over a common domain, such as time or frequency. Functional ANOVA (F-ANOVA) builds upon the strengths of traditional scalar ANOVA, allowing for one-way and two-way analyses to test the equality of mean and covariance functions across groups at a given statistical significance threshold. This capability is particularly valuable for uncovering meaningful insights into functional data at both the group level and the interaction level. In the absence of group and interaction effects, datasets can be confidently pooled, resulting in larger sample sizes and enhanced statistical power.

    To address the lack of existing tools for performing F-ANOVA, a custom library was developed and validated, offering unique analytical capabilities while maintaining robust performance. These capabilities include two-way analyses, equality of covariance tests, and greater support when heteroscedasticity is present between groups. This library not only simplifies the application of F-ANOVA but also provides tools tailored for handling diverse functional data scenarios. Best practices for F-ANOVA will be demonstrated in the context of mechanical shock analysis, a domain where functional data is particularly beneficial. However, the library's design makes it broadly applicable to other fields, offering a versatile solution for modern functional data challenges.
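
    The presenter's library is not shown here; as a rough stand-in, the sketch below runs a permutation version of one-way functional ANOVA on simulated curves, using the mean pointwise F statistic as the global test statistic. It is illustrative only and not the library's implementation.

        import numpy as np

        rng = np.random.default_rng(1)
        t = np.linspace(0, 1, 200)
        # Three groups of curves observed on a common grid (rows = curves).
        groups = [np.sin(2 * np.pi * t) + rng.normal(0, 0.3, (20, t.size)),
                  np.sin(2 * np.pi * t) + rng.normal(0, 0.3, (20, t.size)),
                  np.sin(2 * np.pi * t) + 0.2 + rng.normal(0, 0.3, (20, t.size))]

        def mean_pointwise_F(groups):
            k = len(groups)
            n = sum(g.shape[0] for g in groups)
            grand = np.vstack(groups).mean(axis=0)
            between = sum(g.shape[0] * (g.mean(axis=0) - grand) ** 2 for g in groups) / (k - 1)
            within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0) for g in groups) / (n - k)
            return np.mean(between / within)       # aggregate pointwise F into one statistic

        stat = mean_pointwise_F(groups)
        pooled = np.vstack(groups)
        sizes = np.cumsum([g.shape[0] for g in groups])[:-1]
        perm_stats = []
        for _ in range(500):                       # permute group labels to get a p-value
            perm = rng.permutation(pooled)
            perm_stats.append(mean_pointwise_F(np.split(perm, sizes)))
        print("p-value:", np.mean(np.array(perm_stats) >= stat))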

    Speaker Info:

    Adam Watts

    R&D Engineer

    Los Alamos National Laboratory

    Adam Watts specializes in applying statistical methods and uncertainty quantification to complex engineering challenges. His expertise includes the uncertainty quantification of chemical kinetics in thermosetting polymers, as well as the thermomechanical properties of constituent materials used in legacy aeroshells as a function of temperature. His research interests focus on functional data analysis and computational statistics, with an emphasis on leveraging these methods to solve engineering problems.

    Adam is an R&D Engineer on the Data Analysis Team for Test Engineering at Los Alamos National Laboratory (LANL). He holds a B.S. in Plastics and Composites Engineering from Western Washington University and an M.S. in Textile Engineering from North Carolina State University.

  • Fast solvers and UQ for computationally demanding inverse problems

    Abstract:

    Satellite-based remote sensing of greenhouse gases and carbon cycle science are examples of operational science data production use cases where thousands to millions of inverse problems have to be solved daily as part of processing pipelines. These inversions are typically computationally costly and further require rigorous Uncertainty Quantification (UQ) to ensure the reliability of data products for downstream users. Even current state-of-the-art methods face considerable challenges with downlinked data volumes, and these problems are only getting more pressing with the upcoming next generation of Earth-observing satellites and orders-of-magnitude increases in data volume. In this talk, we present recent advances in computationally efficient statistical methods and machine learning to tackle these pressing issues. We present approaches for emulating the costly atmospheric radiative transfer physics models to lower the computational burden of inversions, as well as techniques for emulating a direct solution to the inverse problem together with well-calibrated UQ. Specifically, we focus on efficient Gaussian process regression for forward model emulation, and on Gaussian mixture modeling and diffusion-based approaches for the inversion and UQ.
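
    As a toy sketch of the emulation idea (not the operational retrieval pipeline), the code below trains a scikit-learn Gaussian process on a handful of runs of a stand-in forward model and uses it inside a simple grid-based Bayesian inversion; the forward model, prior, and noise level are all hypothetical.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        def forward_model(x):                     # stand-in for a costly radiative transfer model
            return np.exp(-0.5 * x) + 0.1 * np.sin(5 * x)

        # Train the emulator on a handful of "expensive" forward-model runs.
        X_train = np.linspace(0, 3, 12).reshape(-1, 1)
        y_train = forward_model(X_train).ravel()
        emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X_train, y_train)

        # Bayesian inversion on a grid: Gaussian prior on x, Gaussian measurement noise.
        y_obs, noise_sd = 0.45, 0.02
        grid = np.linspace(0, 3, 600).reshape(-1, 1)
        pred = emulator.predict(grid)
        log_post = (-0.5 * ((y_obs - pred) / noise_sd) ** 2        # likelihood via the emulator
                    - 0.5 * ((grid.ravel() - 1.5) / 1.0) ** 2)     # N(1.5, 1) prior on the state
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        print(f"posterior mean of x: {float((grid.ravel() * post).sum()):.3f}")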

    Speaker Info:

    Otto Lamminpaeae

    Data Scientist

    NASA JPL

    Dr. Otto Lamminpaeae is a data scientist and applied mathematician working on greenhouse gas retrieval Uncertainty Quantification, Gaussian process emulation of computationally expensive radiative transfer models, fast UQ-aware direct retrieval techniques using machine learning and mixture modeling, and Markov Chain Monte Carlo (MCMC) methods at NASA Jet Propulsion Laboratory. His research is mainly conducted as a member of OCO-2 and OCO-3 UQ teams, with additional interest and work on NASA's EMIT and GeoCarb missions, planetary boundary layer investigation of water vapor and clouds, and terrestrial carbon cycle modeling using the CARDAMOM framework.

    Dr. Lamminpaeae received his PhD in Applied Mathematics from the University of Helsinki, Finland, while working in the Greenhouse Gases and Satellite Methods research group at the Finnish Meteorological Institute (FMI). In his dissertation work, he applied Dimension Reduction and MCMC to two remote sensing retrieval problems: the XCO2 retrieval of NASA's Orbiting Carbon Observatory 2 and the CH4 profile retrieval of the Sodankylä, Finland, TCCON station. He maintains active academic research collaboration with FMI and several US-based universities, and is passionate about bringing experts on cutting-edge physics modeling and advanced computational mathematics together to solve current challenges in Earth science.

  • From PDFs to Insights: A Machine Vision and LLM Approach to Test Science with TEMP Copilot

    Abstract:

    At the Institute for Defense Analyses (IDA), we are advancing test science through the use of machine vision and generative AI to process a database of 50,000 Test and Evaluation Master Plans (TEMPs) in PDF format. This effort is encapsulated in an air-gapped, retrieval-augmented LLM framework known as TEMP Copilot. We demonstrate our methodology using a repository of DOT&E annual reports. Starting with scanned, inaccessible PDFs, these reports are transformed by applying machine learning techniques in combination with LLM-enabled qualitative research tools. Our approach reconstructs, parses, and restructures the dataset to enable efficient querying. In this demonstration, we compare various querying techniques on our dataset and show that a combination of data-centric LLM routing and layered search algorithms outperforms traditional frameworks, such as basic retrieval-augmented generation (RAG) and GRAPH-RAG.
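
    TEMP Copilot's internals are not shown here; as a loose illustration of the retrieval step in a RAG-style pipeline, the sketch below ranks text chunks with TF-IDF and assembles a prompt. The chunk size, placeholder texts, and query are hypothetical, and no particular LLM API is assumed.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def chunk(text, size=400):
            return [text[i:i + size] for i in range(0, len(text), size)]

        reports = ["... reconstructed text of one annual report ...",      # placeholders for
                   "... reconstructed text of another report ..."]        # parsed PDF content
        chunks = [c for doc in reports for c in chunk(doc)]

        def retrieve(query, k=3):
            vec = TfidfVectorizer().fit(chunks + [query])
            scores = cosine_similarity(vec.transform([query]), vec.transform(chunks)).ravel()
            return [chunks[i] for i in scores.argsort()[::-1][:k]]

        query = "What reliability shortfalls were reported for system X?"  # hypothetical query
        prompt = ("Answer using only the context below.\n\n"
                  + "\n---\n".join(retrieve(query))
                  + f"\n\nQuestion: {query}")
        print(prompt[:300])   # `prompt` would then be sent to the air-gapped LLM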

    Speaker Info:

    Valerie Bullock

    AI Researcher

    Institute for Defense Analyses

    At the Institute for Defense Analyses (IDA), Valerie is currently developing LLM capabilities to analyze test science data, polls, and focus group interviews, in addition to her ongoing efforts to integrate generative AI (GenAI) into military exercises. Prior to her role at IDA, Valerie worked as a quantitative researcher in the equities markets at a Chicago-based investment bank. Her journey into machine learning began in 2015 during her bachelor's degree, when she first coded a neural network from scratch and began researching bio-inspired neural network architectures. During her graduate studies, Valerie focused on the underlying mathematics and optimization techniques of machine learning. She holds a Master’s degree in Operations Research from the Kellogg School of Management and a Master’s degree in Applied Mathematics from Northwestern University.

  • Functional Data Analysis – “What to do when your data are a curve or spectra”

    Abstract:

    Are you currently NOT USING YOUR ENTIRE DATA STREAM to inform decisions?
    Sensors that stream data (e.g., temperature, pressure, vibration, flow, force, proximity, humidity, intensity, concentration, etc.), as well as radar, sonar, chromatography, NMR, Raman, NIR, or mass spectroscopy, all measure a signal versus a longitudinal component like wavelength, frequency, energy, distance, or in many cases - time. Are you just using select points, peaks, or thresholds in your curved or spectral data to evaluate performance? This course will show you how to use the complete data stream to improve your process knowledge and make better predictions.

    Curves and spectra are fundamental to understanding many scientific and engineering processes. They are created by many types of test and manufacturing processes, as well as measurement and detection technologies. Any response varying over a continuum is functional data.

    Functional Data Analysis (FDA) uses functional principal components analysis (FPCA) to break curve or spectral data into two parts - FPC Scores and Shape Components. The FPC Scores are scalar quantities (or weights) that explain function-to-function variation. The Shape Components explain the longitudinal variation. FPC Scores can then be used with a wide range of traditional modeling and machine learning methods to extract more information from curves or spectra.
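
    The course uses JMP Pro; as an open-source illustration of the FPCA decomposition just described, the NumPy sketch below splits simulated curves into shape components and per-curve FPC scores via an SVD of the centered data matrix.

        import numpy as np

        rng = np.random.default_rng(0)
        t = np.linspace(0, 1, 300)
        curves = np.array([np.sin(2 * np.pi * t) * a + b * t + rng.normal(0, 0.05, t.size)
                           for a, b in rng.normal(1.0, 0.2, size=(40, 2))])

        mean_curve = curves.mean(axis=0)
        centered = curves - mean_curve
        U, S, Vt = np.linalg.svd(centered, full_matrices=False)

        shape_components = Vt[:2]            # dominant shape (eigen)functions over t
        fpc_scores = U[:, :2] * S[:2]        # scalar weights, one pair per curve
        explained = (S[:2] ** 2).sum() / (S ** 2).sum()
        print(f"first two components explain {explained:.1%} of curve-to-curve variation")
        # fpc_scores can now feed regression or machine-learning models as ordinary scalars.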

    When these functional data are used as part of a designed experiment, the curves and spectra can be well predicted as functions of the experimental factors. Curves and spectra can also be used to optimize or “reverse engineer” factor settings. In a machine learning application, functional data analysis uses the whole curve or spectrum to predict outcomes better than “landmark” or summary statistical analyses of individual peaks, slopes, or thresholds.

    References and links will be provided for open-source tools to do FDA, but in this course JMP Pro 18 software will be used to demonstrate analyses and to illustrate multiple case studies. See how a functional model is created by fitting a B-spline, P-spline, Fourier, or Wavelets basis model to the data. One can also perform functional principal components analysis directly on the data, without fitting a basis function model first. Direct Models include several Singular Value Decomposition (SVD) approaches as well as Multivariate Curve Resolution (MCR).

    Curve or spectral data can often be messy. Several data preprocessing techniques will be presented. Methods to clean up (remove, filter, reduce), transform (center, standardize, rescale), and align data (line up peaks, dynamic time warping) will be demonstrated. Correction methods specific to spectral data, including Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay filtering, and Baseline Correction, will be shown.

    Case studies will be used to demonstrate the methods discussed above.

    Speaker Info:

    Ryan Parker

    Senior Research Statistician Developer

    JMP

    Ryan Parker is a Senior Research Statistician Developer at JMP. Parker develops the Functional Data Explorer platform for JMP Pro, and he is responsible for the Gaussian Process platform and the Bootstrap/Simulate features. He has also contributed to the assess variable importance technique in JMP Profiler, as well as variable clustering. He studied statistics at North Carolina State University, earning a PhD in 2015.

  • Functional Data Analysis – “What to do when your data are a curve or spectra”

    Abstract:

    Are you currently NOT USING YOUR ENTIRE DATA STREAM to inform decisions?
    Sensors that stream data (e.g., temperature, pressure, vibration, flow, force, proximity, humidity, intensity, concentration, etc.), as well as radar, sonar, chromatography, NMR, Raman, NIR, or mass spectroscopy, all measure a signal versus a longitudinal component like wavelength, frequency, energy, distance, or in many cases - time. Are you just using select points, peaks, or thresholds in your curved or spectral data to evaluate performance? This course will show you how to use the complete data stream to improve your process knowledge and make better predictions.

    Curves and spectra are fundamental to understanding many scientific and engineering processes. They are created by many types of test and manufacturing processes, as well as measurement and detection technologies. Any response varying over a continuum is functional data.

    Functional Data Analysis (FDA) uses functional principal components analysis (FPCA) to break curve or spectral data into two parts - FPC Scores and Shape Components. The FPC Scores are scalar quantities (or weights) that explain function-to-function variation. The Shape Components explain the longitudinal variation. FPC Scores can then be used with a wide range of traditional modeling and machine learning methods to extract more information from curves or spectra.

    When these functional data are used as part of a designed experiment, the curves and spectra can be well predicted as functions of the experimental factors. Curves and spectra can also be used to optimize or “reverse engineer” factor settings. In a machine learning application, functional data analysis uses the whole curve or spectrum to predict outcomes better than “landmark” or summary statistical analyses of individual peaks, slopes, or thresholds.

    References and links will be provided for open-source tools to do FDA, but in this course JMP Pro 18 software will be used to demonstrate analyses and to illustrate multiple case studies. See how a functional model is created by fitting a B-spline, P-spline, Fourier, or Wavelets basis model to the data. One can also perform functional principal components analysis directly on the data, without fitting a basis function model first. Direct Models include several Singular Value Decomposition (SVD) approaches as well as Multivariate Curve Resolution (MCR).

    Curve or spectral data can often be messy. Several data preprocessing techniques will be presented. Methods to clean up (remove, filter, reduce), transform (center, standardize, rescale), and align data (line up peaks, dynamic time warping) will be demonstrated. Correction methods specific to spectral data, including Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay filtering, and Baseline Correction, will be shown.

    Case studies will be used to demonstrate the methods discussed above.

    Speaker Info:

    Clay Barker

    Principal Research Statistician Developer

    JMP

    Clay Barker is a Principal Research Statistician Developer for JMP Statistical Discovery in Cary, North Carolina. He has developed a wide variety of capabilities in JMP, including tools for variable selection, generalized linear modeling, and nonlinear modeling. Recently, his focus has been on implementing matrix decompositions to be used when analyzing functional data. Clay joined JMP after earning his PhD in Statistics from North Carolina State University.

  • Functional Data Analysis – “What to do when your data are a curve or spectra”

    Abstract:

    Are you currently NOT USING YOUR ENTIRE DATA STREAM to inform decisions?
    Sensors that stream data (e.g., temperature, pressure, vibration, flow, force, proximity, humidity, intensity, concentration, etc.), as well as radar, sonar, chromatography, NMR, Raman, NIR, or mass spectrometry, all measure a signal versus a longitudinal component such as wavelength, frequency, energy, distance, or, in many cases, time. Are you just using select points, peaks, or thresholds in your curve or spectral data to evaluate performance? This course will show you how to use the complete data stream to improve your process knowledge and make better predictions.

    Curves and spectra are fundamental to understanding many scientific and engineering processes. They are created by many types of test and manufacturing processes, as well as measurement and detection technologies. Any response varying over a continuum is functional data.

    Functional Data Analysis (FDA) uses functional principal components analysis (FPCA) to break curve or spectral data into two parts - FPC Scores and Shape Components. The FPC Scores are scalar quantities (or weights) that explain function-to-function variation. The Shape Components explain the longitudinal variation. FPC Scores can then be used with a wide range of traditional modeling and machine learning methods to extract more information from curves or spectra.

    When these functional data are used as part of a designed experiment, the curves and spectra can be well predicted as functions of the experimental factors. Curves and spectra can also be used to optimize or “reverse engineer” factor settings. In a machine learning application, functional data analysis uses the whole curve or spectrum to predict outcomes better than “landmark” or summary statistical analyses of individual peaks, slopes, or thresholds.

    References and links will be provided for open-source tools to do FDA, but in this course JMP Pro 18 software will be used to demonstrate analyses and to illustrate multiple case studies. See how a functional model is created by fitting a B-spline, P-spline, Fourier, or Wavelets basis model to the data. One can also perform functional principal components analysis directly on the data, without fitting a basis function model first. Direct Models include several Singular Value Decomposition (SVD) approaches as well as Multivariate Curve Resolution (MCR).

    Curve or spectral data can often be messy. Several data preprocessing techniques will be presented. Methods to clean up (remove, filter, reduce), transform (center, standardize, rescale), and align data (line up peaks, dynamic time warping) will be demonstrated. Correction methods specific to spectral data, including Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay filtering, and Baseline Correction, will be shown.

    Case studies will be used to demonstrate the methods discussed above.

    Speaker Info:

    Tom Donnelly

    Systems Engineer

    JMP

    Tom Donnelly works as a Systems Engineer for JMP Statistical Discovery supporting users of JMP software in the Defense and Aerospace sector. He has been actively using and teaching Design of Experiments (DOE) methods for the past 40 years to develop and optimize products, processes, and technologies. Donnelly joined JMP in 2008 after working as an analyst for the Modeling, Simulation & Analysis Branch of the US Army’s Edgewood Chemical Biological Center – now DEVCOM CBC. There, he used DOE to develop, test, and evaluate technologies for detection, protection, and decontamination of chemical and biological agents. Prior to working for the Army, Tom was a partner in the first DOE software company for 20 years where he taught over 300 industrial short courses to engineers and scientists. Tom received his PhD in Physics from the University of Delaware.

  • Hardened Extension of the Adversarial Robustness Toolbox: Evaluating & Hardening AI Models

    Abstract:

    Reliable AI systems require secure AI models. The proliferation of AI capabilities in civilian and government workflows creates novel attack vectors for adversaries to exploit. The Adversarial Robustness Toolbox (ART), created in 2018 by IBM Research, is designed to simulate and evaluate threats targeting modern AI systems, identify which AI models are at greatest risk, and provide methods for risk mitigation. ART is designed to support red-blue/attack-defense test and evaluation operations and contains a broad catalog of state-of-the-art methods encompassing evasion, poisoning, extraction, and inference attacks. ART is accessible as an open-source software repository via APIs that execute the evaluation, defense, certification, and verification of AI models. ART supports a wide range of AI frameworks (e.g., TensorFlow, PyTorch), tasks (e.g., classification, object detection, speech recognition), and data types (e.g., images, video, audio). This enables end users to bring their own custom datasets and AI models to assess model adversarial robustness using a variety of attack types, explore available avenues to mitigate attacks’ impacts, and harden pre-trained models via fine-tuning. Using ART, end users can better understand and ultimately mitigate vulnerabilities. ART continues to receive mission-critical enhancements, such as scalability across GPUs and the addition of new state-of-the-art methods, including automated detection of adversarial inputs.
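    A minimal sketch of the kind of workflow ART supports is shown below, using its scikit-learn wrapper and the Fast Gradient Method evasion attack. The dataset and model are placeholders, and class names and behavior should be checked against the installed ART version.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

# Placeholder data and model: handwritten digits classified with logistic regression.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]
model = LogisticRegression(max_iter=1000).fit(X, y)

# Wrap the model so ART can attack it, then craft evasion examples with FGM.
classifier = SklearnClassifier(model=model, clip_values=(0.0, 1.0))
attack = FastGradientMethod(estimator=classifier, eps=0.1)
X_adv = attack.generate(x=X)

print(f"clean accuracy {model.score(X, y):.3f} vs adversarial accuracy {model.score(X_adv, y):.3f}")
```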

    In collaboration with the Department of Defense (DoD)’s Chief Digital and AI Office (CDAO), IBM has created an extension of ART to support a suite of adversarial robustness test & evaluation procedures as part of the Joint AI Test Infrastructure Capability (JATIC) program. This extension, the Hardened Extension of Adversarial Robustness Toolbox (HEART), focuses primarily on computer vision use cases including object detection and image classification to bolster model performance against evasion attacks. HEART is available via PyPi and Conda Forge and enables ART’s capabilities to be leveraged through a set of standardized protocols designed to increase ease of use and interoperability across AI model test and evaluation tooling. Specifically, HEART is developed to address key DoD use cases and integrate with DoD workflows. Recently, HEART was deployed on the Navy’s Project Harbinger as an added method of protecting the AI models.

    Speaker Info:

    Jordan Fischer

    Solutions Architect

    IBM

    I am a Lead AI developer and solutions architect in IBM’s public service division, designing and implementing integrated AI and machine learning systems for the US Government. I specialize in Advanced AI (LLMs and other foundation models), AI Governance, Responsible AI, and Adversarial AI Robustness. My background in AI and data systems in the public sector has spanned topics as diverse as urban development, climate resiliency, public administration, public health and human rights. I hold a master’s degree in Business Analytics from George Washington University and a bachelor’s degree in Economics from the University of Utah.

  • High, Hot, and Humid: On the Impacts of Extreme Conditions on Aviation

    Abstract:

    Rising occurrences of extreme heat and humidity, combined with high-altitude conditions, present significant challenges for aviation by increasing density altitude—a critical factor in aircraft performance. Elevated temperatures, humidity, and high-altitude environments reduce air density, impacting engine efficiency, lift, and takeoff capability while extending required runway lengths. These challenges are particularly pronounced at high-altitude airports, where all three factors converge to affect operational safety and efficiency. This study utilizes high-resolution atmospheric projections and aircraft performance modeling to assess risks for global airports and proposes scalable adaptive strategies. Addressing the growing prevalence of extreme heat and humidity requires resilient infrastructure, comprehensive risk assessments, and forward-thinking policies to ensure aviation operations remain safe and reliable in evolving conditions.
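    For orientation only, a commonly cited rule-of-thumb relates density altitude to pressure altitude and temperature; it ignores the humidity effect the study also considers and is a textbook approximation rather than the authors' model:

```latex
\[
\mathrm{DA} \;\approx\; \mathrm{PA} \;+\; 118.8\,\tfrac{\mathrm{ft}}{^{\circ}\mathrm{C}}
\left( T_{\mathrm{OAT}} - T_{\mathrm{ISA}} \right),
\qquad
T_{\mathrm{ISA}} \;=\; 15\,^{\circ}\mathrm{C} \;-\; 1.98\,^{\circ}\mathrm{C}\,\frac{\mathrm{PA}}{1000\,\mathrm{ft}},
\]
```

    where DA is density altitude, PA is pressure altitude, and T_OAT is the outside air temperature; hotter and higher conditions drive DA up and aircraft performance down.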

    Speaker Info:

    Cameron Liang

    Research Staff Member

    IDA

    Cameron Liang (B.S., Physics, University of California, San Diego, 2012; Ph.D., Astronomy & Astrophysics, University of Chicago, 2018) arrived at the Institute for Defense Analyses (IDA) in 2020 after a postdoctoral research position in the Kavli Institute for Theoretical Physics and University of California, Santa Barbara. He studies structure formation in the Universe with a focus on galaxy formation using state-of-the-art hydrodynamic simulations. He is an expert on high performance computing, machine learning, statistics, modeling and simulation. At IDA, he works on a range of topics, including climate change, artificial intelligence, and orbital debris.

  • Improving Readiness Outcomes with Collaboration: Quantifying the Cost of Siloed Resourcing

    Abstract:

    Resource decisions (e.g. purchasing and positioning spares) across the DoD supply system are optimally made from a multi-echelon viewpoint, allocating resources between retail sites and a centralized wholesale in tandem to maximize readiness outcomes. Due to the size and complexity of the supply system, it is difficult to draw direct connections between resource decisions and mission outcomes. Thus, the common metrics used to grade performance do not strongly correlate to readiness and result in siloed thinking and inefficient resource allocation.

    Using discrete-event simulations of the end-to-end Naval aviation sustainment system, we quantified the readiness and cost impact of sub-optimal resourcing decisions due to siloed decision-making. These inefficiencies are best avoided by taking a multi-echelon approach. However, recognizing the DoD-wide paradigm shift this would require, we identified new wholesale metrics that more strongly tie to flight line readiness to mitigate the inefficiency. Furthermore, quantifying the cost-to-readiness relationship across the supply system can serve as a powerful basis for DoD-wide resource optimization in lieu of multi-echelon approaches.

    Speaker Info:

    Connor McNamara

    Research Staff Member

    IDA

  • Improving the Long-Term Reusability of Nondestructive Evaluation Data Sets

    Abstract:

    The field of nondestructive evaluation (NDE) is currently undergoing a digital transformation. NDE 4.0, mirroring the Industry 4.0 concept, seeks to improve inspections by leveraging advancements in computing hardware, machine learning/artificial intelligence, and digital thread/digital twin with large volumes of digital NDE data. These new capabilities cannot be realized without well-curated data sets. However, NDE data poses several major challenges. Data are often complex and unstructured, ranging from large multi-dimensional arrays of raw data to simplified image or one-dimensional representations. Meaningful data representing signals of interest are often limited and embedded in significant noise. Critical metadata describing the circumstances of the measurement may often be missing. Data may also be stored in a variety of proprietary formats, potentially restricting access to critical raw data needed to develop new measurement techniques or train AI/ML models. These factors have hindered research and development on these NDE 4.0 concepts. This talk will discuss efforts toward overcoming some of these hurdles and improving the long-term reusability of NDE data through improved data management practices. This includes development of software tools to improve metadata capture, with an emphasis on user experience and minimizing impact on workflow. Also discussed will be efforts toward development of a new data format standard for NDE data and some of the lessons learned to date.
    Portions of this work are funded by the U.S. Air Force through contract FA8650-19-D-5230.

    DISTRIBUTION STATEMENT A. Approved for public release: distribution is unlimited.

    Speaker Info:

    Tyler Lesthaeghe

    Research Engineer

    University of Dayton Research Institute

    Tyler Lesthaeghe is a research engineer in the NDE Engineering group at the University of Dayton Research Institute. He holds a B.S. in Mechanical Engineering and an M.S. in Engineering Mechanics, and he is a Ph.D. candidate at Iowa State University. His area of expertise is nondestructive evaluation, but he also works on problems related to infrastructure for data collection and analysis, data management, and ensuring long-term data reusability.

  • Interpretable Machine Learning 101

    Abstract:

    This mini tutorial focuses on introducing the methods and concepts used in interpretable machine learning, particularly for applications that incorporate tabular operational data.
    Much of the current AI focus is on generative AI, which is deeply rooted in uninterpretable neural networks; however, there is still substantial research in interpretable machine learning that regularly outperforms neural networks for classical machine learning tasks such as regression and classification.
    This course will summarize supervised vs. unsupervised machine learning tasks as well as online vs. offline learning settings. Classical machine learning models are introduced and used to motivate concepts such as the bias-variance trade-off, hyperparameter tuning, and optimization algorithms/convergence. Finally, a brief overview of recent work in interpretable machine learning will point the audience toward state-of-the-art advances.
    The only prerequisites for this course are familiarity with mathematical notation and elementary linear algebra: mainly being familiar with matrix/vector operations, matrix inverses, and ill-conditioned matrices.
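    A small illustration of the hyperparameter-tuning and bias-variance ideas mentioned above (my own example, not course material): cross-validated selection of the ridge penalty for an interpretable linear model on synthetic tabular data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical tabular regression data.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# The ridge penalty alpha is a hyperparameter: larger values increase bias but
# reduce variance. Cross-validation searches for the best trade-off.
search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": np.logspace(-3, 3, 13)},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("selected alpha:", search.best_params_["alpha"])
```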

    Speaker Info:

    Nikolai Lipscomb

    Research Staff Member

    IDA

    Nikolai Lipscomb is a research staff member at the Institute for Defense Analyses and works within the Science, Systems, and Sustainment Division.

    Prior to his work at IDA, Nikolai graduated from the Department of Statistics and Operations Research at The University of North Carolina at Chapel Hill. Nikolai's research experience includes stochastic modeling and numerical optimization.

  • Kernel Model Validation: How To Do It, And Why You Should Care

    Abstract:

    Gaussian Process (GP) models are popular tools in uncertainty quantification (UQ) because they purport to furnish functional uncertainty estimates that can be used to represent model uncertainty. It is often difficult to state with precision what probabilistic interpretation attaches to such an uncertainty, and in what way it is calibrated. Without such a calibration statement, the value of such uncertainty estimates is quite limited and qualitative. I will discuss the interpretation of GP-generated uncertainty intervals in UQ, and how one may learn to trust them, through a formal procedure for covariance kernel validation that exploits the multivariate normal nature of GP predictions, and show examples.
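    One simple check in this spirit (a sketch under my own assumptions, not the speaker's procedure) exploits that multivariate normality directly: if the predictive covariance is calibrated, the squared Mahalanobis distance of held-out residuals should follow a chi-square distribution.

```python
import numpy as np
from scipy import stats
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical 1-D test function observed with noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(60, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.1, size=60)
X_train, y_train, X_hold, y_hold = X[:40], y[:40], X[40:], y[40:]

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

# Joint predictive distribution on the held-out points.
mean, cov = gp.predict(X_hold, return_cov=True)
resid = y_hold - mean

# If the kernel is well calibrated, this statistic is approximately chi-square(n).
d2 = resid @ np.linalg.solve(cov, resid)
p_value = 1 - stats.chi2.cdf(d2, df=len(y_hold))
print(f"Mahalanobis statistic {d2:.1f}, p-value {p_value:.3f}")
```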

    Speaker Info:

    Carlo Graziani

    Argonne National Laboratory

    Carlo Graziani received a B.S. in applied physics from Columbia University School of Engineering and Applied Science in 1982 and a Ph.D. in physics from the University of Chicago in 1993. He was a postdoctoral research associate at the University of Chicago for the summer of 1993 and then an NRC/NASA research associate at the Goddard Space Flight Center from 1993 to 1996 and at the Enrico Fermi Institute, the University of Chicago, from 1996 to 1999. He then worked as a science team member of the international High-Energy Transient Explorer (HETE) project for over a decade. In June 2007 he joined the University of Chicago, where he was a research associate professor in the Department of Astronomy & Astrophysics. He joined Argonne in 2017, and currently works on theory and applications of uncertainty quantification and machine learning.

  • Lethal Debris Creation Following Untracked Orbital Debris Impacts

    Abstract:

    In this study, we use smooth particle hydrodynamics modeling to examine the creation of mission-lethal non-trackable orbital debris from impacts of 10 g, 100 g, and 10 kg spherical and cylindrical objects on small satellite bus structures at 0, 15, 45, and 75 degrees obliquity. Our simulations include impacts at velocities below, approaching, and above the energy-density threshold that typically disables satellite functionality and creates additional lethal debris. We compare the mass distributions resulting from smooth particle hydrodynamics simulations to the distributions derived from NASA’s satellite breakup model; NASA’s approximation of debris generation aligns well with our simulation results for large, trackable masses but deviates from our simulation results for small, non-trackable masses. Results also show only minor differences in satellite damage and debris generation between spherical and cylindrical 10 kg impactors.
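    For context, the NASA standard breakup model mentioned above characterizes collision fragments with a power law in characteristic length; the commonly published form is quoted here for orientation only and is not taken from the authors' work:

```latex
\[
N(L_c) \;=\; 0.1\,\hat{M}^{\,0.75}\,L_c^{-1.71},
\]
```

    where N(L_c) is the cumulative number of fragments with characteristic length greater than L_c (in meters) and \hat{M} is the reference mass in kilograms for the collision.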

    Speaker Info:

    Peter Mancini

    Research Staff Member

    IDA

    Peter Mancini works at the Institute for Defense Analyses, supporting the Director, Operational Test and Evaluation (DOT&E) as a Cybersecurity OT&E analyst.

  • Motivating Effects to Missions from Cyber Threats During Operational Testing

    Abstract:

    Operational Testing and Evaluation with cyber threats informs decision making by providing examples of adversarial cyber threats and the effects those exploits cause to mission performance. This briefing examines the goals for, current problems with, and opportunities for improvement in operational testing in a cyber-contested environment. Namely, operational testing in the DoD too often does not consider the cyber effects against a unit during active missions. This briefing argues the DoD needs to move from evaluating segregated systems, to evaluating integrated systems-of-systems. This also requires operational realism with regards to personnel and system configuration, cyber threat integration with mission performance testing, dedicated and methodical data collection from operational testers, and including advanced cyber-attacks such as supply chain compromises.

    Speaker Info:

    Jordon Adams

    Research Staff Member

    Institute for Defense Analyses

    Dr. Jordon R. Adams is currently the Project Lead for cyber testing of Land and Expeditionary (LEW) systems and the Deputy Portfolio Lead for cyber testing of all systems on DOT&E oversight at IDA. Jordon received his PhD in High Energy Physics at the Florida State University in 2015, worked for the Center for Army Analysis from 2015-2016 as an Operations Research Systems Analyst, and has been with IDA in support of DOT&E since 2017.

  • Multimodal Video Summarization on Multi-Scene Single-Context Data

    Abstract:

    This project explores transfer-learning approaches for operationalizing multimodal video captioning, focusing on effectively summarizing longer videos. Our methodology employs a Convolutional Neural Network (CNN) encoder with image, motion, and audio modes and a Long Short-Term Memory (LSTM) decoder architecture, incorporating key-frame extraction to reduce computational overhead while integrating audio features surrounding the key frames to improve caption quality. We begin by training a model on the Microsoft Research Video-to-Text (MSR-VTT) benchmark dataset, which primarily contains short video clips. We then operationalize the model by evaluating performance on a context-specific dataset, the Dattalion dataset, which features war footage from the conflict in Ukraine that is substantially longer than the videos in MSR-VTT. To address the challenges of labeling the Dattalion dataset ourselves and to enable the model to understand context-specific themes in the dataset, we apply transfer learning by combining baseline training on MSR-VTT with fine-tuning on the Dattalion dataset to better handle context-specific videos of potentially long durations. We investigate the comparative performance before and after transfer learning by evaluating key metrics: BLEU-4, METEOR, ROUGE, and CIDEr. This research aims to provide insights into the effectiveness of transfer learning for video captioning on longer videos in context-specific environments, with implications for improving video summarization in domains such as disaster response and defense.
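    A stripped-down PyTorch sketch of the encoder-decoder idea is shown below; it assumes the CNN/audio feature extraction happens upstream and stands in for, rather than reproduces, the authors' architecture.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Minimal LSTM decoder over pooled multimodal key-frame features.

    The feature extractors (CNN image/motion/audio encoders) are assumed to run
    upstream and produce one feature vector per key frame.
    """

    def __init__(self, feat_dim=2048, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.project = nn.Linear(feat_dim, embed_dim)   # map frame features to embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, n_keyframes, feat_dim); captions: (batch, seq_len) token ids
        visual = self.project(frame_feats).mean(dim=1, keepdim=True)  # pooled video token
        tokens = self.embed(captions)
        inputs = torch.cat([visual, tokens], dim=1)      # prepend the video token to the caption
        hidden, _ = self.lstm(inputs)
        return self.out(hidden[:, 1:, :])                # logits for each caption position

# Shape check with random tensors standing in for real features and captions.
decoder = CaptionDecoder()
logits = decoder(torch.randn(4, 12, 2048), torch.randint(0, 10000, (4, 20)))
print(logits.shape)  # torch.Size([4, 20, 10000])
```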

    Speaker Info:

    Aidan Looney

    Cadet

    United States Military Academy

    Aidan Looney is a senior at the United States Military Academy and is studying Operations Research. He will graduate and commission in May into the United States Army as a Cyber Warfare Officer. His research interests are at the intersection optimization and machine learning, building products to further defense initiatives.

  • Multiview Computer Vision for Detecting Defects in Munitions

    Abstract:

    Quality control for munitions manufacturing is an arduous process in which trained technicians examine and enhance X-ray scans of each round to determine whether defects are present. In this talk, we will introduce a machine learning model that would allow manufacturers to automate this process - which is particularly suited for supervised learning given its repetitive nature and clearly defined defect characteristics. Four scans are taken for each round at 30-degree deflections from one another. The three distinct zones of the round have different standards for what constitutes a defect. Our preprocessing pipeline involves applying the necessary image enhancement techniques to highlight defects in the scan and then applying an unsupervised masking algorithm to isolate and segment each zone of the round. Isolated zones from all four views are then fed into a zone-specific multiview neural network trained to detect defects. Due to different defect rates in each zone, two zones use a variational autoencoder for anomaly-based detection while one zone uses a convolutional neural network for heuristic-based detection. The implementation of this system stands to save manufacturers significant resources and man-hours devoted to quality control of their rounds.
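    Reconstruction-based anomaly scoring is the core idea behind the variational-autoencoder zones described above. As a much-simplified, hypothetical stand-in (PCA reconstruction error rather than a VAE), the sketch below learns a threshold from defect-free examples and flags candidates that exceed it.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical flattened zone images: train only on defect-free examples.
normal_train = rng.normal(size=(500, 64 * 64))
candidates = rng.normal(size=(20, 64 * 64))

pca = PCA(n_components=32).fit(normal_train)

def reconstruction_error(x):
    recon = pca.inverse_transform(pca.transform(x))
    return np.mean((x - recon) ** 2, axis=1)

# Threshold at a high quantile of the error observed on defect-free data.
threshold = np.quantile(reconstruction_error(normal_train), 0.995)
flags = reconstruction_error(candidates) > threshold  # True -> flag the scan for review
print(flags)
```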

    Speaker Info:

    William Niven

    Cadet

    United States Military Academy

    Cadet William Niven is currently an Applied Statistics & Data Science Major at the United States Military Academy at West Point and will graduate this May. Upon graduation, Cadet Niven will commission as a 2nd Lieutenant in the Army Cyber Corps and plans to pursue a career as a software developer within the service. For his senior thesis within West Point's Department of Math, Cadet Niven is conducting research in conjunction with U.S. Army Combat Capabilities Development Command (DEVCOM). In the coming years, he plans to use his education to contribute to the Army's burgeoning data science and artificial intelligence capabilities.

  • Navigating Atmospheric Data: An Introduction to the Atmospheric Science Data Center

    Abstract:

    The Atmospheric Science Data Center (ASDC) is a vital resource for the global scientific community, providing access to a wealth of atmospheric and climate-related data collected through NASA's Earth observing missions. This paper introduces the ASDC, outlining its key functions, data offerings, and user services. By facilitating the discovery, access, and utilization of extensive atmospheric datasets—including those related to radiation budget, aerosols, clouds, and air quality—the ASDC plays a crucial role in advancing Earth system science research. We highlight how researchers, educators, and decision-makers can leverage the center’s resources for applications ranging from climate modeling to air quality monitoring and public health studies. Additionally, we explore the ASDC's commitment to open science, emphasizing its data management practices, user support, and tools for ensuring data accessibility and usability across diverse scientific disciplines. This introduction aims to guide users in navigating the ASDC’s data portal and effectively integrating these datasets into their research workflows for enhanced environmental understanding and decision-making.

    Speaker Info:

    Hazem Mahmoud

    Science Lead

    NASA LaRC ASDC ADNET

    Dr. Hazem Mahmoud, the Science Lead at the Atmospheric Science Data Center, brings a wealth of expertise in geophysics and environmental engineering to his role. His primary focus lies in utilizing both orbital and suborbital instruments for remote sensing of the Earth's atmosphere. Dr. Hazem specializes in analyzing radiation budget, cloud formations, aerosol distribution, and tropospheric composition. His ultimate goal is to achieve near real-time air quality monitoring from space and study the impact of the air we breathe on our health. His passion for this field ignited when he confronted the challenge of limited Earth data availability early in his career, compelling him to dedicate his research to remote sensing applications. He firmly advocates for the integration of remote sensing data into scientific endeavors, believing it to be a crucial step in advancing global research efforts.

  • Non-destructive evaluation uncertainty quantification using optimization-based confidence

    Abstract:

    Non-destructive evaluation is an integral component of engineering applications that is subject to variability and uncertainty. One such example is the use of ultrasonic waves to infer the bond strength of a specimen of two adhered composite materials by first inferring the specimen’s interfacial stiffness from noisy phase angle measurements and then propagating the result through an established linear regression relating interfacial stiffness to bond strength. We apply optimization-based confidence intervals to obtain the interfacial stiffness confidence interval and then propagate the result through the existing linear regression such that the final bond strength interval achieves finite-sample coverage under reasonable assumptions. Since we have access to a parameterized forward model of the ultrasonic wave propagation through the specimen along with known physical constraints on the model parameters, this technique is a particularly effective approach to leverage known information without relying on a subjective prior distribution. Applying this technique requires two methodological innovations: incorporation of unknown noise variance and propagation of the resulting interval through an existing linear regression output. Additionally, the forward model characterizing the propagation of ultrasonic waves is highly nonlinear in the parameters, necessitating innovation in interval calibration. We compute a variety of intervals and demonstrate their statistical validity via a simulation study.
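    In generic form (not necessarily the authors' exact construction), an optimization-based interval for a scalar quantity of interest φ(θ) inverts a constraint on the data misfit of the forward model F:

```latex
\[
\Big[\;\min_{\theta \in \Theta} \varphi(\theta)\;,\;\;\max_{\theta \in \Theta} \varphi(\theta)\;\Big]
\quad\text{subject to}\quad
\big(y - F(\theta)\big)^{\!\top} \Sigma^{-1} \big(y - F(\theta)\big) \;\le\; s_{\alpha}^{2},
\]
```

    where Θ encodes the known physical constraints and s_α² is a calibration constant chosen so that the interval achieves the desired coverage.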

    Speaker Info:

    Michael Stanley

    Research Scientist

    AMA/NASA LaRC

    Michael (Mike) Stanley is a new postdoctoral researcher at NASA Langley Research Center contracted through Analytical Mechanics Associates (AMA). He obtained his PhD in Statistics from Carnegie Mellon University under Mikael Kuusela with a thesis focused on statistical inference for ill-posed inverse problems in the physical sciences. During his PhD, he collaborated extensively with the Jet Propulsion Laboratory on carbon flux uncertainty quantification and was awarded a Strategic University Research Partnership (SURP) to develop decision theoretic and optimization-based UQ. This work centered around the strategic desire to provide prior-free UQ alternatives to NASA and remains a key focus of his research. More broadly, he is interested in the intersection of statistics, probability, optimization, and computation.

  • Novel Batch Weighing Methodology using Machine Learning and Monte Carlo Simulation

    Abstract:

    Batch weighing is a common practice in the manufacture, research, development, and handling of product. Counting individual parts can be a time consuming and inefficient process, and the ability to batch weigh can save time and money. The main downside of batch weighing is the potential risk of error in the estimated quantity due to tolerance and noise stacking. The methodology highlighted in this presentation aims to directly address and alleviate this risk by quantifying it using Monte Carlo simulation and Discriminant Analysis, a supervised machine learning modeling approach. The final model can be used to inform the user of the specific risk associated with each batch based on weight and allow for less potential for misclassification. The presentation also discusses guidelines for applying the methodology, and remedial methods for certain issues that may arise during its application, using a case study to help illustrate the method’s benefits.
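    The sketch below illustrates the two ingredients named above, Monte Carlo simulation of batch weights and discriminant analysis, with invented part-weight and scale-noise values; it is a simplified illustration, not the presenter's model.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical part: nominal 5.00 g with 0.05 g part-to-part standard deviation,
# plus 0.10 g of scale noise per weighing. Target batch sizes near 100 parts.
part_mean, part_sd, scale_sd = 5.00, 0.05, 0.10
counts = np.arange(97, 104)

# Monte Carlo: simulate total batch weights for randomly chosen candidate counts.
n_sim = 5000
sim_counts = rng.choice(counts, size=n_sim)
weights = np.array([
    rng.normal(part_mean, part_sd, size=c).sum() + rng.normal(0, scale_sd)
    for c in sim_counts
])

# Discriminant analysis maps an observed batch weight to the most likely count
# and, via predict_proba, quantifies the misclassification risk for that batch.
lda = LinearDiscriminantAnalysis().fit(weights.reshape(-1, 1), sim_counts)
observed = np.array([[500.3]])
print("predicted count:", lda.predict(observed)[0])
print("risk profile:", dict(zip(lda.classes_, lda.predict_proba(observed).round(3)[0])))
```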

    Speaker Info:

    Christopher Drake

    Lead Mathematical Statistician

    US Army Futures Command

    Christopher Drake, PStat® is a Lead Mathematical Statistician in Army Futures Command at Picatinny Arsenal, as part of the Systems Engineering Directorate. Mr. Drake has over 10 years of experience working on highly complex research and development programs as a statistician in the Army. Mr. Drake also has more than 10 years of experience as a Lecturer for Probability & Statistics courses for the Armament Graduate School, a graduate school at Picatinny Arsenal offering advanced degrees in specialized areas of armaments engineering. Mr. Drake gained his Bachelor’s degree in Industrial Engineering from Penn State University with a focus in Manufacturing Systems, and his Master’s degree in Applied Statistics from the New Jersey Institute of Technology.

  • Optimal Transport-based Space Filling Designs for AI and Autonomy

    Abstract:

    Space-filling designs play a critical role in efficiently exploring high-dimensional input spaces, especially in applications involving simulation, autonomous systems, and complex physical experiments. While a variety of methods exist for generating such designs, most rely on rectangular constraints, fixed weighting schemes, or limited support for nominal factors. In this presentation, we introduce a novel approach based on sliced optimal transport that addresses these limitations by enabling the creation of designs with significantly improved space-filling properties—offering enhanced minimum inter-point distances and better uniformity compared to existing methods. Our approach accommodates arbitrary non-rectangular domains and weighted target distributions, ensuring that practitioners can capture realistic constraints and focus sampling where it matters most. The sliced formulation naturally extends to multi-class domains, enabling a unified design across any number of categorical factors in which each sub-design maintains favorable space-filling properties, and the classes collectively collapse into a well-distributed design when nominal factors are ignored. Furthermore, our method is computationally efficient, readily scaling to hundreds of thousands of design points without sacrificing performance—an essential feature for testing AI and autonomous systems in high-dimensional simulation environments. We will demonstrate the theory behind sliced optimal transport, outline our algorithms, and present empirical comparisons that highlight the benefits of this method over existing space-filling approaches.
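    A rough NumPy sketch of the sliced-optimal-transport idea appears below: random one-dimensional projections with quantile matching against samples from a non-rectangular target region. It is a generic illustration of the concept, not the authors' algorithm, and the domain is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_domain(pts):
    # Hypothetical non-rectangular region: unit square with a circular cutout.
    inside_square = np.all((pts >= 0) & (pts <= 1), axis=1)
    outside_hole = np.sum((pts - 0.5) ** 2, axis=1) > 0.15 ** 2
    return inside_square & outside_hole

def sample_domain(n):
    pts = []
    while len(pts) < n:
        cand = rng.uniform(0, 1, size=(4 * n, 2))
        pts.extend(cand[in_domain(cand)])
    return np.array(pts[:n])

n_design, n_target, n_iters, step = 200, 5000, 500, 0.1
design = sample_domain(n_design)          # start from random feasible points
target = sample_domain(n_target)          # samples representing the target distribution

for _ in range(n_iters):
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)                # random projection direction
    d_proj, t_proj = design @ u, target @ u
    order = np.argsort(d_proj)
    # Match sorted design projections to target quantiles and move along u.
    matched = np.quantile(t_proj, (np.arange(n_design) + 0.5) / n_design)
    shift = np.zeros(n_design)
    shift[order] = matched - d_proj[order]
    design = design + step * shift[:, None] * u
    design = np.clip(design, 0, 1)        # crude projection back toward the domain

min_dist = np.min([np.min(np.delete(np.linalg.norm(design - p, axis=1), i))
                   for i, p in enumerate(design)])
print("minimum inter-point distance:", round(min_dist, 3))
```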

    Speaker Info:

    Tyler Morgan-Wall

    Research Staff Member

    IDA

    Dr. Tyler Morgan-Wall is a Research Staff Member at the Institute for Defense Analyses, and is the developer of the software library skpr: a package developed at IDA for optimal design generation and power evaluation in R. He is also the author of several other R packages for data visualization, mapping, and cartography. He has a PhD in Physics from Johns Hopkins University and lives in Silver Spring, MD.

  • Optimizing Surveillance Test Designs for DoD Programs Using a Simulation-Based Approach

    Abstract:

    As some DoD programs extend beyond their original decommission dates, ongoing subcomponent testing is crucial to ensure system reliability and study aging effects over the extended service life. However, surveillance efforts often face challenges due to limited asset availability, as predetermined quantities are typically allocated to match the original decommission timeline. Options such as recycling assets or procuring additional units are frequently constrained by destructive testing, high costs, long lead times, or a lack of capable manufacturers. Consequently, programs face depleting supplies, which can halt the surveillance efforts needed to sustain life extension and risk undetected performance degradation.

    This work presents a simulation-based approach to optimize test quantities and testing intervals while maintaining reliable detection of performance degradation given the surveillance study design, current data characteristics and model specification. By simulating datasets that mirror the original dataset, using the parameter coefficients and residual standard deviation derived from a fitted model, the approach estimates statistical power. Starting with a fixed test size and interval (e.g., n = 1 and t = 1), an equivalent model is sequentially fitted to each dataset, adding simulations cumulatively. This process is repeated across various test sizes and intervals to determine the optimal set with sufficient power to detect degradation consistent with the original dataset’s effect size and variability. While a simple linear regression is specified for demonstration, the approach is flexible and can potentially accommodate other models that involve hypothesis testing such as nonlinear models or generalized additive models (GAMs).
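    A minimal sketch of this simulation loop for one candidate test size and interval is shown below; the intercept, aging slope, and residual standard deviation are assumed values standing in for coefficients estimated from a program's fitted model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed values standing in for a fitted model: intercept, aging slope per year,
# and residual standard deviation.
beta0, beta1, sigma = 100.0, -0.8, 2.5

def power(n_per_interval, interval_years, horizon_years, n_sims=2000, alpha=0.05):
    """Estimate power to detect the aging slope for a candidate test plan."""
    times = np.repeat(np.arange(0, horizon_years + 1e-9, interval_years), n_per_interval)
    rejections = 0
    for _ in range(n_sims):
        y = beta0 + beta1 * times + rng.normal(0, sigma, size=times.size)
        fit = stats.linregress(times, y)
        rejections += fit.pvalue < alpha
    return rejections / n_sims

for n in (1, 2, 3):
    print(f"n={n} per interval, every 2 years over 10 years -> power {power(n, 2, 10):.2f}")
```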

    A key challenge is the increased risk of inflated false-positive rates as accumulating data are analyzed over time given the sequential nature of surveillance data collection. To address this, sequential methods, commonly used in clinical trials (e.g., Haybittle-Peto and O’Brien-Fleming boundaries) can be employed and integrated into the simulation framework. These methods help balance the need for interim analyses with the risk of false positives in long-term surveillance studies, ensuring statistical rigor.

    Results demonstrate that variability, effect size, test quantity, and testing intervals collectively affect the ability to detect true aging trends. Visualizations highlight the idea that larger effect sizes with low variability are more likely to reveal true aging trends with less data compared to smaller effect sizes in high-variability settings. This approach can enable programs to tailor test schedules based on achieved power for individual parameters, adjusting the overall test quantity and testing frequency for a subcomponent as necessary while maintaining confidence in detecting performance degradation over time. Alternatively, the test frequency and quantity can be increased to accelerate the identification of aging trends and emerging issues. Extensions and limitations of this approach will also be discussed.

    Speaker Info:

    Bryant Chen

    Statistician

    Lockheed Martin Space

    Bryant Chen is currently a Statistician at Lockheed Martin for the MMIII RSRV program, supporting the Air Force Nuclear Weapons Center at Hill AFB, UT. Additionally, he tutors students in mathematics at the STEM Learning Center located on the campus of Salt Lake Community College. He earned a B.S. in Industrial Engineering & Mathematics from the University at Buffalo, an M.S. in Statistics from California State University, Fullerton, and an M.S. in Finance from the University of Utah.

  • Packing for a Road Trip: Provisioning Deployed Units for a Contested Logistics Environment

    Abstract:

    In the event of a conflict, the Department of Defense (DOD) anticipates significant disruptions to their ability to resupply deployed units with the spare components required to repair their equipment. Simply giving units enough additional spares to last the entirety of the mission without resupply is the most straightforward and risk-averse approach to ensure success. However, this approach is also the most expensive, as a complete duplicate set of spares must be purchased for each unit, reducing the number of systems that can be so augmented on a limited budget. An alternative approach would be to support multiple combatant units with a common set of forward-positioned spares, reducing the duplicative purchasing of critical items with relatively low failure rates and freeing up funding to support additional systems. This approach, while cost-effective, introduces a single point of failure, and presupposes timely local resupply.
    We have used Readiness Based Sparing (RBS) tools and discrete event simulations to explore and quantify effectiveness of different strategies for achieving high availability in a contested logistics environment. Assuming that local, periodic resupply of spares is possible, we found that creating a centralized pool of forward-positioned spares dramatically decreases the overall cost for a given readiness target compared to augmenting each individual unit with additional spares. Our work ties dollars spent to readiness outcomes, giving DOD leadership the tools to make quantitative tradeoffs.
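    A back-of-the-envelope illustration of why pooling helps (not the study's RBS or discrete-event models, and with invented demand rates): with Poisson demand during a resupply period, a shared stock yields a higher probability of avoiding stockouts than the same total stock split across units.

```python
from scipy.stats import poisson

# Assumed demand: each of 4 units expects 1.5 failures of a part per resupply period.
units, demand_per_unit, spares_per_unit = 4, 1.5, 3

# Dedicated spares: each unit must cover its own demand from its own stock.
p_unit_ok = poisson.cdf(spares_per_unit, demand_per_unit)
p_all_ok_dedicated = p_unit_ok ** units

# Pooled spares: the same total stock covers the combined demand.
total_spares = units * spares_per_unit
p_all_ok_pooled = poisson.cdf(total_spares, units * demand_per_unit)

print(f"P(no stockout), dedicated: {p_all_ok_dedicated:.3f}")
print(f"P(no stockout), pooled:    {p_all_ok_pooled:.3f}")
```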

    Speaker Info:

    Joshua Ostrander

    Research Staff Member

    Institute for Defense Analyses

    Dr. Joshua Ostrander is a Research Staff Member in the Sustainment group at the Institute for Defense Analyses. Trained as a chemist, he received his Ph.D. in Physical Chemistry from the University of Wisconsin-Madison in the lab of Martin Zanni where he developed and applied nonlinear spectroscopy and microscopy to scientific problems in biology and materials science. Joshua's current research is focused on using readiness modeling and simulation to help DoD leaders make informed, data-driven decisions.

  • Predicting Cyber Attack Likelihood using Probabilistic Attack Trees

    Abstract:

    Understanding how weapon systems and platforms will perform in cyber-contested environments is crucial for making rational programmatic and engineering decisions. Understanding the cyber survivability of systems as part of full-spectrum survivability is particularly difficult.

    Probabilistic attack trees, which combine attack surface analysis, vulnerability analysis, and mission loss into an overall risk picture, provide a productive approach to this challenge. Attack trees can be used to show all of the individual pathways an attacker could follow to lead to a particular mission loss, and if the probabilities at the lowest level of the attack tree are determined in some way, the rest of the probabilities across the tree can be easily calculated, providing the likelihood that an adversary will achieve a particular effect and the likelihood of the various pathways an attacker might use to get to that effect.
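    A toy sketch of that propagation step appears below: once leaf probabilities are assigned, OR nodes combine pathway probabilities (treating children as independent for simplicity) and AND nodes multiply them. The tree and its numbers are hypothetical.

```python
def node_probability(node):
    """Recursively compute the probability of an attack-tree node.

    Leaves carry an assigned probability. An AND node succeeds only if all
    children succeed; an OR node succeeds if at least one child does
    (children are treated as independent in this toy example).
    """
    if "p" in node:
        return node["p"]
    child_ps = [node_probability(c) for c in node["children"]]
    if node["gate"] == "AND":
        prob = 1.0
        for p in child_ps:
            prob *= p
        return prob
    # OR gate: complement of "no child succeeds"
    prob_none = 1.0
    for p in child_ps:
        prob_none *= (1.0 - p)
    return 1.0 - prob_none

# Hypothetical tree: the mission effect requires gaining access AND delivering a payload;
# access can come from a supply-chain implant OR a phished maintenance laptop.
tree = {
    "gate": "AND",
    "children": [
        {"gate": "OR", "children": [{"p": 0.05}, {"p": 0.20}]},  # access pathways
        {"p": 0.60},                                             # payload succeeds given access
    ],
}
print(f"P(mission effect) = {node_probability(tree):.3f}")  # 0.144
```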

    Assigning the initial probabilities is the most challenging and contentious part of this approach, but it can be accomplished using a three-tiered process. First, any available data, including historical, test, and architectural data, should be used to generate probabilities for leaf nodes. Second, simple linear models can be built using a combination of data, human experts, and AI. Finally, for leaves that do not yet have applicable data or models, direct assessment can be performed using a combination of SMEs and AI models.

    The initially completed attack tree can then be analyzed to find opportunities where additional data or models can be applied, with a focus on those leaves that appear to have the greatest impact on the overall mission risk. Mitigations and design changes can be considered and the attack tree recalculated with those changes, providing an easy and compelling way to understand and present return on investment for different options.

    Probabilistic attack trees have the potential to become a cornerstone of modern cyber risk assessment, with quantitative results that are transparent, repeatable, and easily understood, enabling defense programs and operators to make better decisions. This approach offers a reliable and scalable way to safeguard mission-critical platforms and weapon systems, enabling them to continue to function as intended despite whatever an adversary may throw at them.

    Speaker Info:

    William Bryant

    Technical Fellow

    Modern Technology Solutions, Inc. (MTSI)

    Dr. Bill “Data” Bryant is a cyberspace defense and risk leader with a diverse background in operations, engineering, planning, and strategy. As a thought leader in cyber defense and risk assessment of non-traditional cyber-physical systems, Dr. Bryant believes that cyber-physical systems such as aircraft are often an organization’s most critical and least defended assets, and he is passionate about improving the defensive posture of these systems.

    In his current role at Modern Technology Solutions Incorporated, Dr. Bryant created the Unified Risk Assessment and Measurement Process (URAMS). With a focus on assessing the cyber risk to aviation platforms and weapon systems, Dr. Bryant has supported numerous strategic and operational efforts for cyber resiliency, survivability of weapon systems, and cybersecurity risk assessments on various critical cyber-physical systems across multiple agencies. Dr. Bryant also co-developed Aircraft Cyber Combat Survivability (ACCS) with Dr. Bob Ball and has been working to apply kinetic survivability concepts to the new realm of cyber weapons.

    With over 25 years in the Air Force—including serving as the Deputy Chief Information Security Officer—Dr. Bryant has extensive experience successfully implementing proposals and policies to improve the cyber defense of weapon systems. He holds a wide range of academic degrees, in addition to his PhD, including Aeronautical Engineering, Space Systems, Military Strategy, and Organizational Management. He also holds CISSP, C|EH, and Security+ certifications.

  • Reducing Test to Purpose

    Abstract:

    In 2023, the Scientific Test and Analysis Techniques Center of Excellence (STAT COE) assisted in planning a test of the effect of storage conditions on the protective coating of launched devices. It was observed that the protective coating had developed potentially indented striations on support surfaces, which could affect the flight properties of the device in question. However, the extent of these striations under various potential storage conditions was unknown. A physics-based model to simulate the extent of striations over time was updated. In addition to requiring validation, the physics-based model was slow and expensive to run, and the team wished to approximate or extrapolate results over a larger factor space than was feasible with the simulation alone. The test team had initially planned a replicated full factorial design to characterize the behavior of the protective coating. With the assistance of the STAT COE, they were able to identify the meaningful depth of striation that would be necessary to detect and to estimate the inherent variability of the system. With this information the team was able to substantially reduce testing, saving the program more than $200,000 and reducing schedule impact while sufficiently characterizing the behavior of the system under test.

    Speaker Info:

    Anthony Sgambellone

    Sr STAT Expert

    STAT COE

    Dr. Anthony Sgambellone is a Scientific Test and Analysis Techniques (STAT) Expert at the STAT Center of Excellence (COE). He has been part of the STAT COE since 2020 where he provides support and instruction in efficient and effective test design and analysis across the DOD. Before joining the STAT COE Dr. Sgambellone developed machine-learning models in the financial industry and on an Agile software development team in support of customers on Wright Patterson Air Force Base. Anthony holds a BS in Biology from Case Western Reserve University, a MS in Statistics from the University of Akron and a PhD in Statistics from the Ohio State University.

  • Responsible Artificial Intelligence

    Abstract:

    Responsible AI (RAI) is a critical framework that ensures artificial intelligence (AI) systems are designed, developed, and deployed responsibly, with trust and safety as a primary consideration, alongside ethical and legal use. As AI becomes increasingly pervasive across various domains, including business, healthcare, transportation, the military and education, it is essential that we prioritize responsible principles, policies, and practices. This RAI one-day short course ensures practitioners have the critical knowledge, skills, and analytical abilities needed to identify and address opportunities and challenges in the design, development, and deployment of systems that incorporate AI.

    Speaker Info:

    Missy Cummings

    Professor

    George Mason University

    Professor Mary (Missy) Cummings received her BS in Mathematics from the US Naval Academy in 1988, MS in Space Systems Engineering from the Naval Postgraduate School in 1994, and Ph.D. in Systems Engineering from the University of Virginia in 2004. A naval pilot from 1988-1999, she was one of the U.S. Navy's first female fighter pilots. She is a Professor in the George Mason University College of Engineering and Computing, and directs the Mason Responsible AI program as well as the Mason Autonomy and Robotics Center. She is an American Institute of Aeronautics and Astronautics and Royal Aeronautical Society Fellow. Her research interests include artificial intelligence in safety-critical systems, assured autonomy, human-systems engineering, and the ethical and social impact of technology.

  • Responsible Artificial Intelligence

    Abstract:

    Responsible AI (RAI) is a critical framework that ensures artificial intelligence (AI) systems are designed, developed, and deployed responsibly, with trust and safety as a primary consideration, alongside ethical and legal use. As AI becomes increasingly pervasive across various domains, including business, healthcare, transportation, the military and education, it is essential that we prioritize responsible principles, policies, and practices. This RAI one-day short course ensures practitioners have the critical knowledge, skills, and analytical abilities needed to identify and address opportunities and challenges in the design, development, and deployment of systems that incorporate AI.

    Speaker Info:

    Jesse Kirkpatrick

    Research Associate Professor

    George Mason University

    Jesse Kirkpatrick is a Research Associate Professor and the co-director of the Mason Autonomy and Robotics Center at George Mason University. Jesse is also an International Security Fellow at New America and serves as a consultant for numerous organizations, including some of the world’s largest technology companies. Jesse’s research and teaching focuses on responsible innovation, with an emphasis on Responsible AI. He has received various honors and awards and is an official “Mad Scientist” for the U.S. Army.

  • Risk-Informed Decision Making: An Introduction

    Abstract:

    All projects are inherently uncertain. To make wise decisions, projects should incorporate an analysis of uncertainty in considering potential alternatives. One process that provides an approach to incorporating uncertainty in the decision-making process is known as risk-informed decision making (RIDM). RIDM can be applied whenever there is uncertainty and there are competing alternatives. This tutorial provides a comprehensive overview of everything needed to conduct an RIDM analysis. Examples of technical failures, cost overruns, and schedule delays for historical projects are provided, giving clear evidence of project uncertainty and providing strong motivation for conducting risk analysis. The terminology of risk is presented. Risk and uncertainty for a variety of sources – cost, schedule, technical, and safety – are discussed. The mathematical basis for risk analysis is discussed, along with a simple worked example. The tools and techniques necessary for the conduct of risk analysis are provided as well. Best practices and potential pitfalls are provided. The RIDM process is outlined and contrasted with a similar approach called risk-based decision making. An overview of risk management is provided and its relationship with RIDM is described. The tutorial concludes with notional case studies of RIDM.
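    A simple worked example of the kind mentioned above (my own illustration, not the tutorial's): Monte Carlo propagation of triangular cost uncertainties to a total-cost S-curve, from which percentile-based reserves can be read off. The element costs are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical work-breakdown elements with (low, most likely, high) cost in $M.
elements = [(8, 10, 15), (18, 20, 30), (4, 5, 9)]
total = sum(rng.triangular(lo, mode, hi, size=n) for lo, mode, hi in elements)

point_estimate = sum(mode for _, mode, _ in elements)
p50, p80 = np.percentile(total, [50, 80])
print(f"point estimate ${point_estimate}M, median ${p50:.1f}M, 80th percentile ${p80:.1f}M")
print(f"reserve to reach 80% confidence: ${p80 - point_estimate:.1f}M")
```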

    Speaker Info:

    Christian Smart

    Cost Engineer

    Jet Propulsion Laboratory

    Dr. Smart is a Cost Engineer with NASA’s Jet Propulsion Laboratory. He has experience supporting both NASA and the Department of Defense in the theory and application of risk, cost, and schedule analytics for cutting-edge programs, including nuclear propulsion and hypersonic weapon systems. For several years he served as the Cost Director for the Missile Defense Agency. An internationally recognized expert on risk analysis, he is the author of Solving for Project Risk Management: Understanding the Critical Role of Uncertainty in Project Management (McGraw-Hill, 2020).

    Dr. Smart received the 2021 Frank Freiman lifetime achievement award from the International Cost Estimating and Analysis Association. In 2010, he received an Exceptional Public Service Medal from NASA for the application of risk analysis. Dr. Smart was the 2009 recipient of the Parametrician of the Year award from the International Society of Parametrics Analysts. Dr. Smart has BS degrees in Mathematics and Economics from Jacksonville State University, an MS in Mathematics from the University of Alabama in Huntsville (UAH), and a PhD in Applied Mathematics from UAH.

  • Semi-parametric Modeling of the Equation of State of Dissociating Materials

    Abstract:

    Modeling the equation of state (EOS) of chemically dissociating materials at extreme temperature and density conditions is necessary to predict their thermodynamic behavior in simulations and experiments. However, this task is challenging due to sparse experimental and theoretical data needed to calibrate the parameters of the equation of state model, such as the latent molar mass surface. In this work, we adopt semi-parametric models for the latent molar mass of the material and its corresponding free energy surface. Our method employs basis representations of the latent surfaces with regularization to address challenges in basis selection and prevent overfitting. We show with an example involving carbon dioxide that our method improves model fit over simpler representations of the molar mass surface while preserving low computational overhead. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-872125
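    In generic form (not the authors' exact objective), a regularized basis representation of a latent surface such as the molar mass m(T, ρ) fits coefficients c to data y by solving

```latex
\[
\hat{c} \;=\; \arg\min_{c}\; \big\lVert y - B\,c \big\rVert_2^2 \;+\; \lambda\, c^{\top} P\, c,
\qquad
\hat{m}(T,\rho) \;=\; \sum_{j} \hat{c}_j\, b_j(T,\rho),
\]
```

    where B is the basis design matrix, P is a roughness penalty, and λ controls the bias-variance trade-off that guards against overfitting when calibration data are sparse.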

    Speaker Info:

    Jolypich Pek

    Graduate Student

    George Mason University

    Jolypich Pek is a PhD student in the statistics department at George Mason University, working with Dr. Ben Seiyon Lee on developing Bayesian calibration and uncertainty quantification methodology for materials science applications. Her current research focuses on calibrating equation of state models, a collaboration with and funded by Lawrence Livermore National Laboratory. She earned her bachelor’s degree in Mathematical Statistics from George Mason, where she conducted research on modeling COVID-19 transmission on campus using an epidemic model. Additionally, she was a data science intern at ReefPoint Group, where she developed a framework to support Veteran healthcare access for the Department of Veteran Affairs.

  • Sequential Space-Filling Designs

    Abstract:

    There are few recommended methods to help testers plan efficient modeling and simulation studies. Space-filling designs are a rigorous choice, but one of their drawbacks is that they require the final sample size to be selected prior to testing. More efficient testing can be completed by using sequential designs, which choose test points without knowledge of the final sample size. Using sequential designs can prevent oversampling and help to augment poorly designed tests. We provide an overview of sequential space-filling designs, with the focus on designs that are most suitable for the test and evaluation community.
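    One simple sequential scheme, offered only as an illustration of the idea rather than as one of the designs covered in the talk, is greedy maximin augmentation: each new run is the candidate point farthest from every point already tested, so the design can grow without fixing the final sample size in advance.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_maximin(existing, candidates, n_new):
    """Greedily add points that maximize the minimum distance to the current design."""
    design = list(existing)
    for _ in range(n_new):
        dists = np.min(
            np.linalg.norm(candidates[:, None, :] - np.array(design)[None, :, :], axis=2),
            axis=1,
        )
        design.append(candidates[np.argmax(dists)])
    return np.array(design)

# Start with a small initial design in [0, 1]^3 and add runs one at a time,
# without committing to a final sample size up front.
initial = rng.uniform(size=(5, 3))
candidates = rng.uniform(size=(5000, 3))
design = augment_maximin(initial, candidates, n_new=10)
print(design.shape)  # (15, 3)
```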

    Speaker Info:

    Anna Flowers

    Ph.D. Student

    Virginia Tech

    Anna Flowers is a fourth-year Ph.D. student in Statistics at Virginia Tech. She received an M.S. in Statistics from Virginia Tech in 2023 and a B.S. in Mathematical Statistics from Wake Forest University in 2021. She is jointly advised by Bobby Gramacy and Chris Franck, and her research focuses on Gaussian Process regression, surrogate modeling, and active learning. She was an intern at the Institute for Defense Analyses in 2024.

  • SERPENS: Assessing the Operational Impacts of Orbital Debris

    Abstract:

    Space is becoming increasingly crowded with both satellites and orbital debris as more proliferated constellations come online. While research has shown that self-sustained run-away growth of orbital debris (a.k.a. Kessler Syndrome) is unlikely in the future, crowded space lanes are degrading satellites’ ability to perform their mission today. To quantify these operational impacts, we built the Space Environment Risk Prediction by Evaluating Numerical Simulations (SERPENS) model. SERPENS uses a high-fidelity, physics-based propagator to precisely calculate orbits and evaluate conjunction scenarios. Here, we use SERPENS to simulate the space environments after various debris-creating events and predict the operational consequences on particular satellites and constellations. SERPENS provides IDA with a test-bed to evaluate the operational impacts of future DoD capabilities including upgrades to space domain awareness (SDA) infrastructure and satellite collision avoidance methodologies.

    Speaker Info:

    Benjamin Skopic

    Research Staff Member

    IDA

    Dr. Benjamin Skopic is a Research Staff Member in the Science, Systems and Sustainment (S3D) Division at the Institute for Defense Analyses (IDA). His work has focused on assessing the operational impacts of various space-based threats to satellites using modeling and simulation. He is a primary developer on several such tools used to answer technical questions for the DoD. Dr. Skopic received his Ph.D. in Materials Science & Engineering and B.S. in Physics from William & Mary. His dissertation focused on the ribbon silk fibers naturally produced by recluse spiders. The unique adhesive properties of the silk inspired the design of ribbon/tape-based metastructures.

  • Smooth Particle Hydrodynamic Code Predictions for Meteoroid Damage to Thermal Protection Systems Shielded by Composite Structures

    Abstract:

    Interplanetary spacecraft are exposed to meteoroid fluxes with characteristics far exceeding the physical simulation capabilities of test facilities for predicting the likelihood that a meteoroid will penetrate a spacecraft’s critical systems. Accurate risk predictions are crucial to ensuring that important interplanetary missions, such as sample returns, can survive years of exposure to the meteoroid environment and safely reenter the Earth’s atmosphere with their scientific cargo.

    In this paper, we summarize a series of meteoroid impact damage computational simulations into two types of spacecraft composite protective structures using the Smooth Particle Hydrodynamics Code. We consider the effects of both meteoric materials and non-meteoric materials on a shielded forebody thermal protection system (TPS) for an extreme entry environment and on an unshielded aftbody TPS that is similar to the material covering the space shuttle’s external tank. Key to ensuring spacecraft reentry survival is understanding the potential damage from a meteoroid to a spacecraft’s TPSs, which may be housed beneath protective “garage” shielding enclosures.

    This analysis was presented at the Hypervelocity Impact Symposium 2024 [2] and is a follow-on effort to work published in 2022 by Williamsen et al. [1], who evaluated select meteoroid impact simulations into two types of spacecraft composite protective structures using the Smooth Particle Hydrodynamics Code (SPHC). Both past [1] and present [2] analyses support ongoing National Aeronautics and Space Administration Engineering Safety Center tasks. Here, we present the SPHC predictions for a different type of multi-shock shield than in [1]. We also report patterns discovered in our analysis that reveal how this multi-shock shield and underlying TPS respond to physical impact.

    The meteoritic impactor materials we investigated in this study are iron, ice, and dunite (chondrite); the non-meteoritic impactor material is aluminum. We discuss the insights SPHC offers regarding debris cloud characteristics and forebody TPS damage, and then we use these insights to identify recognizable trends in forebody TPS penetration depth based on impact energy and overmatch energy. We leverage a new general multi-shock shield ballistic limit equation [3] to provide ballistic limit data that are missing from our limited set of SPHC predictions. Finally, we evaluate the SPHC predictions for aluminum particles impacting the aftbody TPS at three obliquities.

    [1] Williamsen, Joel, et al. “Prediction and Enhancement of Thermal Protection Systems from Meteoroid Damage Using a Smooth Particle Hydrodynamic Code.” Proceedings of 2022 Hypervelocity Impact Symposium. Paper #: HVIS2022-09XWV8VB6X. Alexandria, VA, September 18-22, 2022.

    [2] Corbett, Brooke, et al. "Smooth Particle Hydrodynamic Code Predictions for Meteoroid Damage to Thermal Protection Systems Shielded by Composite Structures." Proceedings of 2024 Hypervelocity Impact Symposium. Paper #: HVIS2024-013. Tsukuba, Japan, September 9-13, 2024.

    [3] Schonberg, William P., et al. “Toward a More Generalized Ballistic Limit Equation for Multi-Shock Shields.” Acta Astronautica Vol. 213 (2023): pp. 307-319.

    Speaker Info:

    Brooke Corbett

    Research Staff Member

    IDA

    Brooke Corbett is a Research Staff Member at the Institute for Defense Analyses. She has worked on a broad range of programs within IDA's Operational Evaluation Division, including Live Fire Test & Evaluation survivability and lethality evaluations for U.S. Army, Air Force, USSOCOM, Navy, and Space Force programs. Brooke is developing subject matter expertise to support survivability and lethality evaluations of US Directed Energy Weapon Systems under DOT&E oversight, and she supports survivability and risk analyses for select NASA Engineering and Safety Center programs.

    Brooke earned a PhD in Materials Science Engineering from the University of Denver in 2008, with research focused on survivability and risk assessments of hypervelocity impact damage response to the International Space Station's meteoroid/orbital debris shielding at elevated temperatures. She earned an MS in Physics and Astronomy from the University of Denver in 2001, with research focused on middle atmospheric long-wave infrared measurements and line-by-line radiative transfer model predictions of water vapor and carbon dioxide at polar regions. She earned a BS in Physics from Le Moyne College, with project work focused on optics and the manual creation of a parabolic primary mirror for a Dobsonian telescope.

  • Synthetic anchoring under the specific source problem

    Abstract:

    Source identification is an inferential problem that evaluates the likelihood of opposing propositions regarding the origin of items. The specific source problem refers to a situation where the researcher aims to assess if a particular source originated the items or if they originated from an alternative, unknown source. Score-based likelihood ratios offer an alternative method to assess the relative likelihood of both propositions when formulating a probabilistic model is challenging or infeasible, as in the case of pattern evidence in forensic science. However, the lack of available data and the dependence structure created by the current procedure for generating learning instances can lead to reduced performance of score likelihood ratio systems. To address these issues, we propose a resampling plan that creates synthetic items to generate learning instances under the specific source problem.
    Simulation results show that our approach achieves a high level of agreement with an ideal scenario where data is not a limitation and learning instances are independent. We also present two applications in forensic sciences (handwriting and glass analysis), illustrating our approach with both a score-based and a machine learning-based score likelihood ratio system. These applications show that our method may outperform current alternatives in the literature.
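    As a minimal sketch of the score-based likelihood ratio itself (the paper's resampling plan for generating synthetic learning instances is not reproduced here), one can compare a questioned item's score against kernel density estimates of scores generated under the same-source and different-source propositions; the scores below are simulated placeholders.

```python
# Toy score-based likelihood ratio under the specific source problem.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

# Placeholder comparison scores (e.g., similarity between item pairs)
same_source_scores = rng.normal(0.8, 0.1, size=200)   # pairs known to share the specific source
diff_source_scores = rng.normal(0.4, 0.15, size=200)  # pairs from alternative sources

f_same = gaussian_kde(same_source_scores)
f_diff = gaussian_kde(diff_source_scores)

questioned_score = 0.75
slr = f_same(questioned_score)[0] / f_diff(questioned_score)[0]
print(f"score-based LR: {slr:.2f}")   # values > 1 support the specific-source proposition
```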

    Speaker Info:

    Federico Veneri

    Iowa State University

    Dr. Federico Veneri is a consultant at the Inter-American Development Bank (IADB). He received his PhD in statistics from Iowa State University, where he collaborated with the Center for Statistics and Applications in Forensic Evidence (CSAFE) on statistical foundation for machine learning-based score likelihood ratio (SLR) inference for source attribution problems in forensic sciences. His research focuses on machine learning applications, quantitative criminology, and large-scale policy evaluation.

  • Test Incident Report Analytics Tool

    Abstract:

    At each test event, ATEC test centers collect tens, hundreds, or even thousands of Test Incident Reports (TIRs) capturing issues with systems. Currently, AEC evaluators score these TIRs one by one based on their severity, mission impact, and cause, which can be extremely time-consuming and prone to inconsistency. We are developing a TIR Analytics tool (TIRANICS) to accelerate the TIR scoring process. For any one TIR to be evaluated, TIRANICS provides three main capabilities: (1) lists of similar TIRs from current and past test events, (2) a recommended score, and (3) references to the Failure Definition and Scoring Criteria (FDSC) Guide for the system under test. To provide these capabilities, TIRANICS applies natural language processing techniques, namely, term frequency and transformer-based embeddings, similarity metrics, hierarchical text splitting, and neural network classifiers, to the TIR narratives and/or the corresponding FDSC for the system under test. Because we have processed and represented the corpus in a way that captures semantic meaning, this work also lays the foundation for using a Large Language Model with the text as a knowledge base to provide an evaluator with a recommended score and the reasoning behind that score.
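    The retrieval-and-recommendation step can be sketched with off-the-shelf tools (TIRANICS itself also uses transformer-based embeddings, hierarchical text splitting, and neural network classifiers; the narratives and scores below are invented placeholders):

```python
# Toy similar-TIR retrieval with TF-IDF vectors and cosine similarity.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_tirs = [
    "engine failed to start in cold weather",
    "radio dropped link during convoy movement",
    "track thrown during cross-country maneuver",
]
past_scores = ["mission abort", "essential function failure", "mission abort"]

new_tir = ["vehicle engine would not start during cold soak test"]

vec = TfidfVectorizer(stop_words="english")
X_past = vec.fit_transform(past_tirs)
X_new = vec.transform(new_tir)

sims = cosine_similarity(X_new, X_past).ravel()
top = sims.argsort()[::-1][:2]                      # two most similar past TIRs
print([(past_tirs[i], round(float(sims[i]), 2)) for i in top])
print("recommended score:", Counter(past_scores[i] for i in top).most_common(1)[0][0])
```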

    Speaker Info:

    Dan Owens

    AI Evaluator

    Army Evaluation Center

    Dan Owens has a master's degree in Information Networking from Carnegie Mellon University and a PhD in Electrical Engineering from Georgia Tech. He has worked for the US Army Test and Evaluation Command (ATEC) at Aberdeen Proving Ground, MD for fifteen years, first as a reliability evaluator working on a wide range of commodities including missile defense, battlefield networks, and tracked vehicles. For the last six years he has been the Army Evaluation Center (AEC) AI Team Lead, focusing on preparing the command to perform evaluations of AI-enabled systems and identifying ways AI can be used to improve test and evaluation processes.

  • The Content and Context to Orbital Debris Analytics

    Abstract:

    Orbital debris collision hazard and population evolution modeling are foundational for space safety, space sustainability, and space security. The entangling dependencies of these three domains require persistent orbital intelligence to maintain sufficient situational awareness to satisfy space operators. LeoLabs provides this service by leveraging a global network of state-of-the-art radars, a cloud-based computational/distribution engine, physics-based utilities, advanced visual analytics, and machine learning tools. These capabilities combine to provide precise alerts in near real-time to space operators and contextual insights to space planners and policymakers. We examine the suite of tools available via LeoLabs and how they combine to serve a variety of demands for situational awareness. An emphasis is placed on balancing reaction to tactical events with identification of strategic trends, both of which require immediate attention.

    Speaker Info:

    Darren McKnight

    Senior Technical Fellow

    LEOLABS

    Dr. Darren McKnight is currently Senior Technical Fellow for LeoLabs. Darren leads efforts to realize the value proposition for the growing global network of ground-based radars for space security, space safety, and space sustainability. He creates models, designs data depictions, develops risk algorithms, and leads space incident investigations. He is focusing on creating new statistical collision risk assessment approaches to provide valuable context to the global space safety community.

  • The Craft of Scientific Writing

    Abstract:

    Communicating technical concepts—clearly, concisely, and with purpose—is a skill that gives you the edge. Whether influencing business decisions, shaping research pathways, or simply growing as a technical professional, an ability to write distinguishes you from your peers. Learn the simple tools to make your technical content stand out.

    At The Technical Pen, we’ve found that strong technical writers showcase their skills in three primary categories: document layout, written voice, and audience engagement. This short course is built around core principles in each of these categories, offering actionable techniques to improve technical writing skills. Throughout the day, we will work on building a meaningful structure, energizing your written voice, and keeping your audience on track.

    Speaker Info:

    Kathryn Kirsch

    The Technical Pen

    Dr. Kathryn Kirsch is the founder and principal of The Technical Pen, a firm dedicated to helping scientists and engineers communicate effectively. For over a decade, Kathryn has worked with academics and industry professionals to build their technical writing skillset and to find their written voice. Her extensive experience in writing technical documents, whether proposals, journal papers, conference papers, technical reports, or white papers, provides the foundation for her engaging workshops and short courses. She holds a B.S., M.S., and Ph.D. in Mechanical Engineering.

  • Toward an Integrated T&E Framework for AI-enabled Systems: A Conceptual Model

    Abstract:

    The classic DoD T&E paradigm (Operational Effectiveness-Suitability-Survivability-Safety) benefits from 40 years of formalism and refinement and has produced numerous specialized testing disciplines (e.g., the -ilities), governing regulations, and rigorous analysis procedures. T&E of AI-enabled systems is still new, and the laboratory of ideas is a constant source of new testing options, the majority of which focus on performance. The classic gap between measures of performance and measures of effectiveness is very much present in AI-enabled systems, and the overemphasis on performance testing means we might miss the (effectiveness) forest for the (performance) trees.

    Borrowing from the classic “integrated survivability onion” conceptual model, we propose a set of integrated and nested evaluation questions for AI-enabled systems that covers the full range of classic T&E considerations, plus a few that are unique to AI technologies within the military operational context and implied by the DoD Responsible AI Guidance. All requirements for rigorous analytical and statistical techniques are preserved and new opportunities to apply test science are identified. We hope to prompt an exchange of ideas that moves the community toward filling significant T&E capability gaps – especially the gaps between performance and effectiveness – and advancing a whole-program evaluation approach.

    Speaker Info:

    Karen O'Brien

    Sr. Principal Data Scientist

    Modern Technology Solutions, Inc

    Karen O’Brien is a senior principal data scientist and AI/ML practice lead at Modern Technology Solutions, Inc. In this capacity, she leverages her 20-year Army civilian career as a scientist, evaluator, ORSA, and analytics leader to aid DoD agencies in implementing AI/ML and advanced analytics solutions. Her Army analytics career ranged ‘from ballistics to logistics’ and most of her career was at Army Test and Evaluation Command or supporting Army T&E from the Army Research Laboratory. She was a physics and chemistry nerd in the early days but now uses her M.S. in Predictive Analytics from Northwestern University to help her DoD clients tackle the toughest analytics challenges in support of the nation’s Warfighters. She is the Co-Lead of the Women in Data Huntsville Chapter, a guest lecturer in data and analytics graduate programs, and an ad hoc study committee member at the National Academy of Sciences.

  • Towards Flight Uncertainty Prediction of Hypersonic Entry Systems

    Abstract:

    The development of planetary entry technologies relies heavily on computational models that simulate the extreme environments encountered by spacecraft during atmospheric entry, because ground and flight test data are limited. Besides the inherent uncertainties associated with the hypersonic entry environment, computational models include significant uncertainties that may affect the prediction accuracy of the simulations. This makes uncertainty quantification a necessary tool for predicting flight uncertainty with computational models and improving the robustness and reliability of entry systems. In this talk, after outlining the approach for flight uncertainty prediction of hypersonic entry systems, I will focus on presenting an overview of our research on efficient uncertainty quantification (UQ) and sensitivity analysis (SA) based on non-intrusive polynomial chaos theory applied to aerothermal prediction of entry systems. The application examples will include the demonstration of the UQ and SA methods on the stagnation-point heat flux prediction of a hypersonic vehicle with a reduced-order correlation and the thermal protection system response of a hypersonic inflatable aerodynamic decelerator.
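    As a one-dimensional sketch of the non-intrusive polynomial chaos idea (the actual work treats multi-dimensional aerothermal models; the model function and polynomial degree below are placeholders), chaos coefficients can be fit by regression and the output mean and variance read directly from them:

```python
# Toy 1-D non-intrusive polynomial chaos expansion with a standard normal input.
import numpy as np
from numpy.polynomial.hermite_e import hermevander
from math import factorial

def model(xi):                      # stand-in for an expensive aerothermal code
    return np.exp(0.3 * xi) + 0.1 * xi**2

rng = np.random.default_rng(3)
xi = rng.standard_normal(200)       # samples of the standard normal germ
y = model(xi)

deg = 4
Phi = hermevander(xi, deg)          # probabilists' Hermite polynomials He_0..He_4
c, *_ = np.linalg.lstsq(Phi, y, rcond=None)

mean = c[0]
variance = sum(c[k]**2 * factorial(k) for k in range(1, deg + 1))  # E[He_k^2] = k!
print(f"PCE mean ~ {mean:.3f}, variance ~ {variance:.3f}")
```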

    Speaker Info:

    Serhat Hosder

    James A. Drallmeier Centennial Professor of Aerospace Engineering

    Missouri University of Science and Technology

    Dr. Serhat Hosder is currently the James A. Drallmeier Centennial Professor of Aerospace Engineering in the Mechanical and Aerospace Engineering Department at Missouri S&T and serves as the director of Aerospace Simulations Laboratory. He received his PhD degree in Aerospace Engineering from Virginia Tech in 2004. Prof. Hosder’s research activities focus on the fields of computational aerothermodynamics, multi-fidelity modeling and uncertainty quantification of hypersonic flows and technologies, directed energy for hypersonic applications, planetary entry/descent/landing of spacecraft, and aerodynamic shape optimization. His recent research projects have been funded by DoD Joint Hypersonics Transition Office, NASA, Missile Defense Agency, NSF and industry. Prof. Hosder is a Fellow of the Royal Aeronautical Society and an Associate Fellow of AIAA. He was the past chair of AIAA Hypersonic Technologies and Space Planes (HyTASP) Technical Committee (TC) between 2019 and 2021 and currently serves in the steering committee of HyTASP TC.

  • Transforming the Testing and Evaluation of Collaborative AI-enabled Multi-Agent Systems

    Abstract:

    Our presentation explores the application of Consensus-based Distributed Ledger Technology (C-DLT) in the testing and evaluation of collaborative adaptive AI-enabled systems (CA2IS). It highlights the potential of C-DLT to enhance real-time data collection, data validation, synchronization, and security while providing a trusted framework for model & parameter sharing, access control, multi-agent data fusion, and (as emphasized for this topic area) novel methods for continuous monitoring and Test & Evaluation of CA2IS systems.
    As autonomous multi-agent systems (AMAS) evolve, the probabilistic foundation of increasingly AI/ML-enabled systems necessitates new approaches to system Test and Evaluation (T&E). Traditional range-based, fixed-scenario, repetitive testing methods evolved primarily for deterministic systems; they must be extended into capabilities that assess the performance expectations of probabilistic systems in highly complex and dynamic environments, where both operating conditions and system performance may be constantly changing and continuously adapting. Ideally, T&E methods would extend seamlessly from developmental testing, through acceptance and validation testing, and into mission operations, which is of critical importance as autonomous weapons-capable systems emerge. Additionally, these provisions may enhance system resilience in contested environments through inherently distributed, time-ordered, and redundant records of on-board status, diagnostic and behavioral indicators, and inter-agent communications and data transactions. Collectively, these records provide for continuous assessment of system performance, enabling real-time characterization of system confidence as an input to well-informed decision making, whether such decisions are made by an agentic AI or with a human in the loop.
    The proposed C-DLT framework addresses these considerations by enabling both onboard and in-situ monitoring combined with system-wide record synchronizations to capture the real-world context and dynamics of inter-agent behaviors within a global frame of reference model (and common operating picture). Additionally, the proposed method provides for recurring periodic testing of operational AI/ML-based systems, emphasizing and addressing the dynamic nature of collaborative adaptive Continuous Learning Systems (CLS) as they incorporate new training data, and adapt to new operational environments and changing environmental conditions. The performance trade-offs and T&E challenges that arise within this vision of CA2IS underscore the necessity for the proposed in-situ testing method.
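    A highly simplified sketch of the record-keeping idea only (not the proposed consensus protocol or any production C-DLT implementation): each agent appends hash-chained, time-ordered records of status and inter-agent transactions, so later tampering is detectable. Agent identifiers and payloads are notional.

```python
# Toy hash-chained ledger of multi-agent status and transaction records.
import hashlib, json, time

class Ledger:
    def __init__(self):
        self.blocks = []

    def append(self, agent_id, payload):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {"agent": agent_id, "time": time.time(),
                "payload": payload, "prev": prev_hash}
        body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.blocks.append(body)

    def verify(self):
        for i, blk in enumerate(self.blocks):
            expected_prev = self.blocks[i - 1]["hash"] if i else "0" * 64
            body = {k: v for k, v in blk.items() if k != "hash"}
            recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if blk["prev"] != expected_prev or blk["hash"] != recomputed:
                return False
        return True

ledger = Ledger()
ledger.append("uav-1", {"status": "nominal", "battery": 0.82})
ledger.append("uav-2", {"msg": "track handoff to uav-1"})
print(ledger.verify())   # True until any record is altered after the fact
```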

    Speaker Info:

    Stuart Harshbarger

    Chief Technology Officer

    Assured Intelligence, LLC

    Stuart Harshbarger is the Co-founder and Chief Technology Officer of Assured Intelligence.ai

    Stuart previously served as the Chief Technology Officer of two prior start-up ventures and has held various leadership roles in government and defense R&D programs. Most recently, he served the US Government in Technical Director and Innovation Leadership roles, where he led a number of Artificial Intelligence program initiatives and was responsible for promoting best practices for transitioning emerging research outcomes into enterprise operations.

    Assured Intelligence was envisioned to help facilitate and accelerate well-informed AI adoption through broad community partnerships with a primary focus on AI/ML Assurance methods.

  • UQ for spacecraft thermal modeling during design and ground operations

    Abstract:

    We explore an uncertainty quantification (UQ) and design optimization pipeline for thermal analysis of spacecraft by incorporating an industry-standard commercial tool, Thermal Desktop (TD), with the Sandia Dakota framework. Optimizing design elements manually can be tedious, and engineers often rely on their own strategies, leading to non-standardized and often unscalable practices. Similarly, timely and efficient uncertainty quantification is limited by the lack of standard interfaces to solvers and a lack of access to robust and reliable uncertainty routines. Sandia's Dakota framework addresses many of these challenges, is free of charge, and has detailed documentation. We leverage Dakota's UQ and optimization routines by connecting it to TD's API, OpenTD. We explore the challenges and viability of this pipeline for performing thermal design optimization and thermal uncertainty quantification on a simplified satellite model.
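    The kind of UQ loop the pipeline automates can be sketched generically (in the actual workflow, Dakota drives Thermal Desktop through OpenTD; the steady-state radiator balance below is a stand-in model with notional parameter distributions):

```python
# Toy Monte Carlo UQ loop over a stand-in thermal model.
import numpy as np

rng = np.random.default_rng(4)
n = 2000

# Uncertain inputs (notional): absorbed power (W), emissivity, radiator area (m^2)
power = rng.normal(150.0, 10.0, n)
emissivity = rng.uniform(0.80, 0.90, n)
area = rng.normal(1.2, 0.05, n)
sigma = 5.670e-8                                   # Stefan-Boltzmann constant, W/m^2/K^4

# Stand-in model: steady-state radiative balance P = eps * sigma * A * T^4
T = (power / (emissivity * sigma * area)) ** 0.25  # kelvin

print(f"mean T = {T.mean():.1f} K, 95th percentile = {np.percentile(T, 95):.1f} K")
print(f"P(T > 230 K) = {(T > 230).mean():.3f}")
```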

    Speaker Info:

    Zaki Hasnain

    NASA JPL

    Dr. Zaki Hasnain is a data scientist in NASA JPL’s Systems Engineering Division where he participates in and leads research and development tasks for autonomous space exploration. His research interests include physics informed machine learning and system health management for autonomous systems. He has experience developing data-driven, game-theoretic, probabilistic, physics-based, and machine learning models and algorithms for space, cancer, and autonomous systems applications.

  • Using a Bayesian Approach to Quantify Pilot's Physiological State for Adaptive Automation

    Abstract:

    As technology becomes increasingly capable in modern defense and aerospace systems, the human operator is often forced into a state of information overload. Advancements in technology, such as AI-enabled automation, have offloaded some of the cognitive burden to an automated agent; however, previous research has shown that as automation increases, situational awareness may decrease as the human moves out of the loop. The interplay between fully autonomous and fully manual systems introduces a dichotomy of placing the operator either in cognitive overload (fully manual) or cognitive underload (fully autonomous), and it is important to design systems such that human agents and automated agents can collaborate optimally. Cognitive workload is difficult to measure and is often captured subjectively and intrusively (e.g., NASA-TLX); therefore, our work focuses on a non-intrusive means of capturing operator state through physiological measurement. Physiological measures, such as heart rate (HR) and heart rate variability (HRV), have been shown to be associated with cognitive workload; therefore, the objective of our work is to develop a Bayesian statistical model to quantify physiological state and determine a quantifiable trigger for turning automation on and off in the flight deck.

    An experimental study with 45 participants was performed using the OpenMATB environment, an open-source version of NASA's Multi-Attribute Task Battery (MATB). The sub-tasks in the OpenMATB environment were either in automated or manual mode, and participants interacted with the interface under five levels of automation ranging from no automation (level 0) through partially automated (levels 1-3) to fully automated (level 4). Automation reliability was treated as a between-subjects factor with three levels: 50%, 70%, and 99.9% reliable.

    A hierarchical Bayesian model was built to quantify HR data collected from an ECG strap worn by participants. The Normal-Normal conjugacy was utilized to create posterior estimates for each participant’s HR under different experimental conditions. The hierarchical model produces posterior estimates for group level (e.g. reliability condition) and participant level (e.g. automation level) factors. As the model is fed increasingly specific physiological datasets, it quantifies more of the uncertainty associated with continuous HR data. Due to its hierarchical nature, the final output of the model shows the individual differences in HR as it depends on automation level and reliability condition. Therefore, our model is able to quantify current physiological state as well as predict how the operator may change physiologically in the future as the automation and reliability change.
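    A minimal sketch of the Normal-Normal conjugate update for a single participant-condition cell (the full model in the talk is hierarchical, with group- and participant-level terms; the prior values and HR samples below are notional):

```python
# Toy Normal-Normal conjugate posterior for mean heart rate in one condition.
import numpy as np

# Prior belief about mean HR (bpm) for this condition
mu0, tau0 = 75.0, 8.0          # prior mean and prior standard deviation
sigma = 6.0                    # assumed known within-participant HR noise

hr = np.array([82.1, 79.4, 84.0, 80.7, 83.2])   # observed HR samples
n, ybar = len(hr), hr.mean()

post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + n * ybar / sigma**2)

print(f"posterior mean HR = {post_mean:.1f} bpm, sd = {post_var**0.5:.2f} bpm")
```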

    The research team is currently exploring how to utilize this model to trigger adaptive automation for defense and aerospace applications using pilot and soldier data. The current work shows that we can predict future physiological state; therefore, future work will investigate how to then adjust the automation level if the operator’s physiological state trends towards cognitive overload or cognitive underload. The long-term goal of this work is to design and develop an adaptive automation system that utilizes a non-invasive physiological and neurological data collection approach to quantify cognitive workload and adjust automation to fit the individual operator.

    Speaker Info:

    Katie Jurewicz

    Assistant Professor, School of Industrial Engineering and Management

    Oklahoma State University

    Dr. Katie Jurewicz is an assistant professor in the School of Industrial Engineering and Management at Oklahoma State University. Dr. Jurewicz received her BS, MS, and PhD at Clemson University in Industrial Engineering. She is the director of the Human-Systems Engineering and Applied Statistics (HSEAS) Lab at OSU. Dr. Jurewicz’s research focus is human factors engineering, and her work seeks to understand the complex interactions between humans and technology in various industries. Her research lab specifically uses quantitative and analytical approaches to study cognitive engineering, psychophysiology, human-automation interaction, and AI/ML-enabled automation in healthcare, aerospace, and defense applications.

    The authors on this abstract are: Ainsley Kyle (PhD student, Industrial Engineering and Management), Brock Rouser (PhD student, Mechanical and Aerospace Engineering), Dr. Ryan Paul (Assistant Professor, Mechanical and Aerospace Engineering), and Dr. Katie Jurewicz (Assistant Professor, Industrial Engineering and Management).

  • Using the Logistics Composite Model (LCOM) and Bayesian Statistics to Evaluate the B-21/B-52J Availability in Operational Test

    Abstract:

    This presentation addresses the integration of the Logistics Composite Model (LCOM) and Bayesian statistics within the B-21/B-52J Operational Test and Evaluation (OT&E). With the increasing complexity of modern defense systems, ensuring reliability and readiness through comprehensive testing and evaluation is crucial. The Logistics Composite Model provides a framework for assessing availability, logistics support and system sustainment, while Bayesian statistics offer a method for incorporating prior knowledge and managing uncertainties in the evaluation process. By combining these methodologies, the presentation will showcase how to enhance analysis, leading to more accurate predictions of system performance and potential shortfalls. Attendees will be introduced to techniques and examples demonstrating the effective application of LCOM and Bayesian approaches in operational test and evaluation. The goal is to equip defense analysts and testers with tools and techniques to improve evaluation of systems with small sample sizes and limited test time to ensure operational suitability.
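    One simple way to convey the flavor of the approach (a sketch only, not the presenters' LCOM workflow) is a Beta-Binomial model in which LCOM outputs inform the prior on per-sortie availability and a small operational test updates it; all numbers below are notional.

```python
# Toy Beta-Binomial update of availability with a simulation-informed prior.
from scipy import stats

# Prior roughly equivalent to ~20 simulated sorties at ~85% availability (from LCOM runs)
a0, b0 = 17.0, 3.0

# Operational test data: sorties attempted and sorties available/mission capable
n_sorties, n_available = 12, 9

posterior = stats.beta(a0 + n_available, b0 + (n_sorties - n_available))
print(f"posterior mean availability = {posterior.mean():.3f}")
print(f"80% credible interval = {posterior.interval(0.8)}")
```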

    Speaker Info:

    Jerome Perkins

    Operations Manager

    AFOTEC Detachment 5

    Mr. Jerome F. Perkins is the Operations Manager, Air Force Operational Test and Evaluation Center Detachment 5, Edwards Air Force Base, California. Mr. Perkins provides technical support regarding the test and evaluation of the B-21, B-52, and other major acquisition programs within the Detachment 5 portfolio. With a passion for developing tools that enable realistic, objective operational test and evaluation, Mr. Perkins has made significant contributions to the operational safety, suitability, and effectiveness of Air Force and joint warfighting capabilities.
    Mr. Perkins entered the Air Force in 1977 and was assigned to Barksdale Air Force Base, Louisiana. He holds a Bachelor of Science in Electronic Engineering Technology. Mr. Perkins has served in staff positions at Strategic Air Command and Air Combat Command. He has been an operational test director for the Common Low Observable Verification System at Detachment 5. As an operational tester, he has provided test support for the U-2, RQ-4, MQ-9, C-5, C-135, C-17, F-16, F-35, F-22, B-2, CV-22, B-52, C-130J, B-1, SR-71, and B-21. Mr. Perkins is a seasoned operational tester with over 48 years of operational experience and 30 years of operational test experience.


  • A Bayesian Approach for Credible Modeling, Simulation, and Analysis

    Abstract:

    During the 2016 Conference on Applied Statistics in Defense (CASD), we presented a paper describing “The DASE Axioms.” The paper included several “divide-and-conquer” strategies for addressing the Curse of Dimensionality that is typical in simulated systems.

    Since then, a new, integrate-and-conquer approach has emerged, which applies decision-theoretic concepts from Bayesian Analysis (BA). This paper and presentation revisit the DASE axioms from the perspective of BA.

    Over the past fifteen years, we have tailored and expanded conventional design-of-experiments (DOE) principles to take advantage of the flexibility offered by modeling, simulation, and analysis (MSA). The result is embodied within three high-level checklists: (a) the Model Description and Report (MDR) protocol enables iteratively developing credible models and simulations (M&S) for an evolving intended use; (b) the 7-step Design & Analysis of Simulation Experiments (DASE) protocol guides credible M&S usage; and (c) the Bayesian Analysis (BA) protocol enables fully quantifying the uncertainty that accumulates, both when building and when using M&S.

    When followed iteratively by all MSA stakeholders throughout the product lifecycle, the MSA protocols result in effective and efficient risk-informed decision making.

    The paper and presentation include several quantitative examples to show how the three MSA protocols interact. For example, we show how to use BA to combine simulation and field data for calibrating M&S. Thereafter, given a well-specified query, adaptive sampling is illustrated for optimizing usage of high-performance computing (HPC), either to minimize the resources required to answer a specific query or to maximize HPC utilization within a fixed time period.
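    A minimal sketch of one such combination, assuming a precision-weighted Normal-Normal update (the paper's protocols cover far more, including adaptive sampling): a simulation campaign supplies the prior for a performance parameter and limited field data update it. All values are notional.

```python
# Toy precision-weighted combination of simulation-based and field-based estimates.
import numpy as np

# Simulation-based prior for, e.g., a mean performance parameter
mu_sim, sd_sim = 4.0, 1.0

# Field test observations of the same quantity
field = np.array([5.2, 4.6, 5.0])
sd_field = 0.8                       # assumed known field measurement noise
n, ybar = len(field), field.mean()

post_var = 1.0 / (1.0 / sd_sim**2 + n / sd_field**2)
post_mean = post_var * (mu_sim / sd_sim**2 + n * ybar / sd_field**2)
print(f"calibrated estimate = {post_mean:.2f} +/- {post_var**0.5:.2f}")
```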

    The Bayesian approach to M&S development and usage reflects a shift in perspective, from viewing MSA as mainly a design tool, to being a digital test and evaluation venue. This change renders fully relevant all of the attendant operational constraints and associated risks regarding M&S scheduling, availability, cost, accuracy, and delay in analyzing inappropriately large HPC data sets. The MSA protocols employ statistical models and other aspects of Scientific Test and Analysis Techniques (STAT) that are being taught and practiced within the operational test and evaluation community.

    Speaker Info:

    Terril Hurst

    Senior Engineering Fellow

    Raytheon

    Terril Hurst has worked at Raytheon since 2005, with focus on rigorous methods for developing and using modeling and simulation resources. He teaches several courses, including a Johns Hopkins/Raytheon course in credible modeling and simulation, and internal courses for DASE, Bayesian Networks, and Bayesian Analysis.  Terril has attended and contributed regularly for fifteen years to DATAWorks, CASD, and ACAS.   

    Prior to joining Raytheon, Dr. Hurst worked for 27 years at Hewlett-Packard Laboratories on developing and testing computer data storage devices and distributed systems. He obtained all of his degrees at Brigham Young University and completed a post-doctoral appointment in artificial intelligence at Stanford University.  

    Terril and his wife Mary have six children and seventeen grandchildren, whom he includes in his amateur astronomy and model rocket hobbies.

     

  • A Comparative Analysis of AI Topic Coverage Across Degree Programs

    Abstract:

    This study employs cosine similarity topic modeling to analyze the curriculum content of AI (Artificial Intelligence) bachelor's and master's degrees, comparing them with Data Science bachelor's and master's degrees, as well as Computer Science (CS) bachelor's degrees with concentrations in AI. A total of 97 programs were compared, and 52 topics of interest were identified at the course level. The analysis creates a representation for each of the 52 identified topics by compiling the descriptions of courses whose titles match that topic into a bag-of-words. Cosine similarity is then used to compare each topic's bag-of-words against the combined descriptions of the required courses within each program.
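    The coverage measurement can be sketched as follows (a simplified illustration, not the study's code; the topic bag and course descriptions are invented):

```python
# Toy topic-coverage score: cosine similarity between a topic bag-of-words
# and a program's combined required-course descriptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

topic_bag = "neural networks deep learning backpropagation convolutional recurrent"
program_courses = " ".join([
    "Introduction to machine learning: supervised learning, neural networks.",
    "Deep learning: convolutional and recurrent architectures, backpropagation.",
    "Databases and data management systems.",
])

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform([topic_bag, program_courses])
print(f"topic coverage score: {cosine_similarity(X[0], X[1])[0, 0]:.2f}")
```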

    Subsequently, K-means and hierarchical clustering methods are applied to the results to investigate potential patterns and similarities among the programs. The primary objective was to discern whether there are distinguishable differences in the topic coverage of AI degrees in comparison to CS bachelor's degrees with AI concentrations and Data Science degrees.
    The findings reveal a notable similarity between AI bachelor's degrees and CS bachelor's degrees with AI concentrations, suggesting a shared thematic focus. In contrast, both AI and CS bachelor's programs exhibit distinct dissimilarities in topic coverage when compared to Data Science bachelor's and master's degrees. A notable difference is that the Data Science degrees exhibit much higher coverage of math and statistics than the AI and CS bachelor's degrees. This research contributes to our understanding of the academic landscape and helps scope the field as public and private interest in AI is at an all-time high.

    Speaker Info:

    Jacob Langley

    Data Science Fellow II

    IDA

    Jacob Langley is a Data Science Fellow at the Institute for Defense Analyses (IDA). He holds a master's degree in economics and a graduate certificate in statistics. At IDA, Jacob has been serving as an AI researcher for the Science and Technology Policy Institute and assists the Chief Digital and Artificial Intelligence Office (CDAO) of the DoD.

  • A Framework for OT&E of Rapidly Changing Software Systems: C3I and Business Systems

    Abstract:

    Operational test and evaluation (OT&E) of a system provides the opportunity to examine how representative individuals and units use the system to accomplish their missions, and complements functionality-focused automated testing conducted throughout development. Operational evaluations of software acquisitions need to consider more than just the software itself; they must account for the complex interactions between the software, the end users, and supporting personnel (such as maintainers, help desk staff, and cyber defenders) to support the decision-maker who uses information processed through the software system. We present a framework for meeting OT&E objectives while enabling the delivery schedule for software acquisitions by identifying potential areas for OT&E efficiencies. The framework includes continuous involvement beginning in the early stages of the acquisition program to prepare a test strategy and infrastructure for the envisioned pace of activity during the develop and deploy cycles of the acquisition program. Key early OT&E activities are to acquire, develop, and accredit test infrastructure and tools for OT&E, and embed the OT&E workforce in software acquisition program activities. Early OT&E community involvement in requirements development and program planning supports procedural efficiencies. It further allows the OT&E community to determine whether the requirements address the collective use of the system and include all potential user roles. OT&E during capability development and deployment concentrates on operational testing efficiencies via appropriately scoped, dedicated tests while integrating information from all sources to provide usable data that meets stakeholder needs and informs decisions. The testing aligns with deliveries starting with the initial capability release and continuing with risk-informed approaches for subsequent software deployments.

    Speaker Info:

    Logan Ausman

    Research Staff Member

    IDA

    Logan Ausman is a Research Staff Member at the Institute for Defense Analyses. He has worked on the IDA Operational Evaluation Division's project on operational test and evaluation of Joint C3 systems since 2013. His current work also includes supporting projects on operational test and evaluation of major automated information systems, and on test and assurance of artificial intelligence capabilities. Logan earned a PhD in Chemistry from Northwestern University in 2010, with his research focusing on the theory and computational modeling of enhancements of electromagnetic scattering caused by small particles. He earned a BS in Chemistry from the University of Wisconsin-Eau Claire in 2004.

  • A Mathematical Programming Approach to Wholesale Planning

    Abstract:

    The DOD's materiel commands generally rely on working capital funds (WCFs) to fund their purchases of spares. A WCF insulates the materiel commands against the disruptions of the yearly appropriations cycle and allows for long-term planning and contracting. A WCF is expected to cover its own costs by allocating its funds judiciously and adjusting the prices it charges to the end customer, but the multi-year lead times associated with most items mean that items must be ordered years in advance of anticipated need. Being financially conservative (ordering less) leads to backorders, while minimizing backorders (ordering more) often introduces financial risk by buying items that may not be sold in a timely manner. In this work, we develop an optimization framework that produces a "Buy List" of repairs and procurements for each fiscal year. The optimizer seeks to maximize a financial- and readiness-minded objective function subject to constraints such as budget limitations, contract priorities, and the historical variability of demand signals. Buy Lists for each fiscal year provide a concrete baseline for examining the repair/procurement decisions of real wholesale planners and comparing performance via simulation of different histories.
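    A toy linear-programming sketch conveys the flavor of the Buy List problem (the actual framework also handles repairs, lead times, and demand variability; the items, values, and budget below are notional):

```python
# Toy Buy List: maximize readiness-weighted value subject to a budget (LP relaxation).
import numpy as np
from scipy.optimize import linprog

value = np.array([5.0, 3.0, 8.0, 2.0])        # readiness value per unit of each item
cost = np.array([10.0, 4.0, 15.0, 3.0])       # unit cost ($K)
max_qty = np.array([20, 50, 10, 80])          # forecast demand caps
budget = 300.0                                # $K available this fiscal year

res = linprog(c=-value,                       # maximize value = minimize -value
              A_ub=cost.reshape(1, -1), b_ub=[budget],
              bounds=list(zip(np.zeros(4), max_qty)),
              method="highs")
print("buy quantities:", np.round(res.x, 1))
print("readiness value:", -res.fun)
```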

    Speaker Info:

    Nikolai Lipscomb

    Research Staff Member

    IDA

    Nikolai is a Research Staff Member at the Institute for Defense Analyses and is part of the Sustainment Group within the Operational Evaluation Division, focusing on tasks for the Naval Supply Systems Command (NAVSUP).

    Prior to IDA, Nikolai was a PhD student at the University of North Carolina at Chapel Hill where he received his doctorate from the Department of Statistics & Operations Research.

    Nikolai's areas of research interest are optimization, supply networks, stochastic systems, and decision problems.

  • A practitioner's framework for federated model V&V resource allocation

    Abstract:

    Recent advances in computation and statistics have led to an increasing use of federated models for system evaluation. A federated model is a collection of interconnected sub-models in which the outputs of one sub-model act as inputs to subsequent sub-models. However, the process of verifying and validating federated models is poorly understood, and testers often struggle to determine how best to allocate limited test resources for model validation. We propose a graph-based representation of federated models, where the graph encodes the connections between sub-models. The vertices of the graph are the sub-models, and a directed edge is drawn from vertex a to vertex b if a feeds into b. We characterize sub-models through vertex attributes and quantify their uncertainties through edge weights. The graph-based framework allows us to quantify the uncertainty propagated through the model and to optimize resource allocation based on the uncertainties.
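    The graph representation can be sketched with networkx (a simplified illustration; the talk's uncertainty-propagation and resource-allocation machinery is not shown, and the sub-model names and weights are notional):

```python
# Toy graph encoding of a federated model: sub-models as vertices,
# directed edges carrying the uncertainty passed downstream.
import networkx as nx

G = nx.DiGraph()
G.add_node("radar_model", validated=True)
G.add_node("tracker_model", validated=False)
G.add_node("engagement_model", validated=False)

# edge weight = notional variance contributed to the downstream input
G.add_edge("radar_model", "tracker_model", weight=0.04)
G.add_edge("tracker_model", "engagement_model", weight=0.10)

# crude roll-up: accumulate edge weights flowing into the terminal model
total = sum(d["weight"] for _, _, d in G.edges(data=True))
print("evaluation order:", list(nx.topological_sort(G)))
print("accumulated uncertainty into engagement_model:", total)
```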

    Speaker Info:

    Dhruv Patel

    Research Staff Member

    IDA

    Dhruv is a research staff member at the Institute for Defense Analyses and obtained his Ph.D. in Statistics and Operations Research from the University of North Carolina at Chapel Hill.

  • A preview of functional data analysis for modeling and simulation validation

    Abstract:

    Modeling and simulation (M&S) validation for operational testing often involves comparing live data with simulation outputs. Statistical methods known as functional data analysis (FDA) provide techniques for analyzing large data sets ("large" meaning that a single trial has a lot of information associated with it), such as radar tracks. We preview how FDA methods could assist M&S validation by providing statistical tools for handling these large data sets. This may facilitate analyses that make use of more of the available data and thus allow for better detection of differences between M&S predictions and live test results. We demonstrate some fundamental FDA approaches with a notional example of live and simulated radar tracks of a bomber's flight.
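    One fundamental FDA-style comparison can be sketched as follows (a notional illustration, not the full methodology): treat each track as a function of time on a common grid and bootstrap a band on the difference between the live and simulated mean functions.

```python
# Toy functional comparison of live vs. simulated track profiles.
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 100)                         # common time grid

def tracks(n, bias=0.0):
    return np.array([np.sin(2 * np.pi * t) + bias
                     + 0.1 * rng.standard_normal(t.size) for _ in range(n)])

live, sim = tracks(15), tracks(15, bias=0.05)      # notional track profiles
diff = live.mean(axis=0) - sim.mean(axis=0)

# bootstrap the difference of mean functions
boot = np.array([
    live[rng.integers(0, 15, 15)].mean(axis=0) - sim[rng.integers(0, 15, 15)].mean(axis=0)
    for _ in range(1000)])
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
print("grid points where the 95% band excludes zero:", int(((lo > 0) | (hi < 0)).sum()))
```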

    Speaker Info:

    Curtis Miller

    Research Staff Member

    IDA

    Dr. Curtis Miller is a research staff member of the Operational Evaluation Division at the Institute for Defense Analyses. In that role, he advises analysts on effective use of statistical techniques, especially pertaining to modeling and simulation activities and U.S. Navy operational test and evaluation efforts, for the division's primary sponsor, the Director of Operational Test and Evaluation. He obtained a PhD in mathematics from the University of Utah and has several publications on statistical methods and computational data analysis, including an R package. In the past, he has done research on topics in economics including estimating difference in pay between male and female workers in the state of Utah on behalf of Voices for Utah Children, an advocacy group.

  • A Statistical Framework for Benchmarking Foundation Models with Uncertainty

    Abstract:

    Modern artificial intelligence relies upon foundation models (FMs), which are prodigious, multi-purpose machine learning models, typically deep neural networks, trained on a massive data corpus. Many benchmarks assess FMs by evaluating their performance on a battery of tasks that the FMs are adapted to solve, but uncertainty is usually not accounted for in such benchmarking practices. This talk will present statistical approaches for performing uncertainty quantification with benchmarks meant to compare FMs. We demonstrate bootstrapping of task evaluation data, Bayesian hierarchical models for task evaluation data, rank aggregation techniques, and visualization of model performance under uncertainty with different task weightings. The utility of these statistical approaches is illustrated with real machine learning benchmark data, and a crucial finding is that incorporating uncertainty leads to less clear-cut distinctions in FM performance than would otherwise be apparent.
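    The bootstrap piece can be sketched directly (the talk also covers hierarchical models, rank aggregation, and task weighting; the per-item results below are simulated placeholders):

```python
# Toy bootstrap uncertainty intervals for two foundation models' benchmark scores.
import numpy as np

rng = np.random.default_rng(6)

# 0/1 correctness of two notional FMs on the same 200 benchmark items
fm_a = rng.binomial(1, 0.78, 200)
fm_b = rng.binomial(1, 0.74, 200)

def boot_ci(x, B=2000):
    means = np.array([x[rng.integers(0, x.size, x.size)].mean() for _ in range(B)])
    return np.percentile(means, [2.5, 97.5])

print("FM A:", fm_a.mean(), boot_ci(fm_a))
print("FM B:", fm_b.mean(), boot_ci(fm_b))
# overlapping intervals suggest the apparent gap may not be decisive
```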

    Speaker Info:

    Giri Gopalan

    Scientist

    Los Alamos

    Giri Gopalan is a staff scientist in the statistical sciences group at Los Alamos National Laboratory. His current research interests include statistics in the physical sciences, spatial and spatiotemporal statistics, and statistical uncertainty quantification. Prior to his present appointment, Giri was an Assistant Professor at California Polytechnic State University and a Visiting Assistant Professor at the University of California, Santa Barbara, both in statistics. He has taught courses such as time series, mathematical statistics, and probability for engineering students.

  • Adaptive Sequential Experimental Design for Strategic Reentry Simulated Environment

    Abstract:

    To enable the rapid design and evaluation of survivable reentry systems, the Johns Hopkins University Applied Physics Laboratory (JHU/APL) developed a simulation environment to quickly explore the reentry system tradespace. As part of that effort, a repeatable process for designing and assessing the tradespace was implemented utilizing experimental design and statistical modeling techniques. This talk will discuss the utilization of the fast flexible filling experimental design and maximum value-weighted squared error (MaxVSE) adaptive sequential experimental design methods and Gaussian Process modeling techniques for assessing features that impact reentry system trajectories and enabling continuous model refinements. The repeatable scripts used to implement these methods allow for integration into other software tools for a complete end-to-end simulation of reentry systems.
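    A minimal sketch of variance-driven sequential design with a Gaussian Process surrogate is shown below; note that it uses a plain maximum-predictive-variance rule rather than the MaxVSE criterion, and the simulator is a stand-in function.

```python
# Toy adaptive sequential design: add the candidate with the largest GP predictive variance.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):                           # stand-in for the reentry simulation
    return np.sin(6 * x[:, 0]) + 0.5 * x[:, 1]

rng = np.random.default_rng(7)
X = rng.uniform(size=(8, 2))                # initial space-filling runs
y = simulator(X)
candidates = rng.uniform(size=(1000, 2))

for _ in range(10):                         # sequentially add 10 runs
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), normalize_y=True).fit(X, y)
    _, sd = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(sd)][None, :]
    X, y = np.vstack([X, x_next]), np.append(y, simulator(x_next))
print(X.shape)                              # (18, 2)
```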

    Speaker Info:

    Kelly Koser

    Senior Project Manager & Statistician

    Johns Hopkins University Applied Physics Laboratory

    Kelly Koser currently serves as a senior project manager and statistician at the Johns Hopkins University Applied Physics Laboratory (JHU/APL). She has 15 years of experience conducting system test & evaluation (T&E), experimental design, statistical analysis and modeling, program evaluation, and strategic program development activities. Ms. Koser currently supports a variety of U.S. Navy and U.S. Air Force weapon system evaluation activities and reliability assessments. She also leads the investigation and T&E of promising screening technologies to ensure the protection of public spaces for the Transportation Security Administration (TSA). Ms. Koser holds a Bachelor of Science in Mathematical Sciences from Carnegie Mellon University, a Master of Science in Applied and Computational Mathematics from Johns Hopkins University, and a graduate certificate in Engineering Management from Drexel University.

  • ADICT - A Power BI visualization tool used to transform budget forecasting.

    Abstract:

    The AVATAR Dynamic Interactive Charting Tool (ADICT) is an advanced Power BI tool developed by the National Aeronautics and Space Administration (NASA) to transform budget forecasting and assist in critical budget decision-making. This innovative tool leverages the power of the M language within Power BI to provide organizations with a comprehensive 15-year budget projection system that ensures real-time accuracy and efficiency. It is housed in Power BI so that the model can be updated directly and simultaneously from our Excel file, AVATAR.

    One of the standout features of ADICT is its capability to allow users to define and apply rate changes. This feature empowers organizations, including NASA, to customize their budget projections by specifying rate variations, resulting in precise and adaptable financial forecasting. NASA integrates ADICT with SharePoint to host the model, avoiding local drives and allowing seamless updates for any scenario. The tool also serves as a scenario-based planner, supporting workforce planning and budget decisions.
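    The rate-change mechanic behind the 15-year projection amounts to simple compounding (the production logic lives in Power BI's M language against the AVATAR workbook; the base budget and rates below are notional):

```python
# Toy 15-year budget projection with user-defined escalation rates per year.
base_budget = 120.0                       # $M in the current fiscal year (notional)
rates = [0.03] * 5 + [0.025] * 10         # user-defined rate changes by year (notional)

projection, value = [], base_budget
for rate in rates:
    value *= 1 + rate
    projection.append(round(value, 1))
print(projection[:5], "...", projection[-1])
```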

    ADICT seamlessly integrates with source Excel sheets, offering dynamic updates as data evolves. This integration eliminates the need for manual data manipulation, enhancing the overall decision-making process. It ensures that financial projections remain current and reliable, enabling organizations like NASA to respond swiftly to changing economic conditions and emerging challenges.
    At its core, ADICT enhances budgeting by transforming complex financial data into interactive visualizations, enabling NASA to gain deeper insights into its financial data and make agile decisions.

    Speaker Info:

    Mohammad Ahmed

    NASA Program Data Analyst

    OCFO

    Mohammad is a data analyst with over 10 years of experience in the field. He started his career in the insurance industry as a data analyst in the actuarial field, working on various risk-based projects and models, including catastrophe modeling, pricing, program reviews, and reporting. He has also worked in support of Homeland Security in a data role that included data architecture and system administration.

    In his current role, he supports the Strategic Insights and Budget division within the Office of the Chief Financial Officer with scenario forecasting, creating new tools and dashboards, and addressing all technical needs.

  • Advancing Edge AI: Benchmarking ResNet50 for Image Classification on Diverse Hardware Platforms

    Abstract:

    The ability to run AI at the edge can be transformative for applications that need to process data and make decisions at the location where sensing and data acquisition take place. Deep neural networks (DNNs) have a huge number of parameters and consist of many layers, including nodes and edges that contain mathematical relationships that must be computed when the DNN is run during deployment. This is why it is important to benchmark DNNs on edge computers, which are constrained in hardware resources and usually run on a limited supply of battery power. The objective of our NASA-funded project, which is aligned with the mission of robotic space exploration, is to enable AI through fine-tuning of convolutional neural networks (CNNs) for extraterrestrial terrain analysis. This research currently focuses on the optimization of the ResNet50 model, which consists of 4.09 GFLOPs and 25.557 million parameters, to set performance baselines on various edge devices using the Mars Science Laboratory (MSL) v2.1 dataset. Although our initial focus is on Martian terrain classification, the research is potentially impactful for other sectors where efficient edge computing is critical.

    We addressed a critical imbalance in the dataset by augmenting the underrepresented class with an additional 167 images, improving the model's classification accuracy substantially. Pre-augmentation, these images were frequently misclassified as another class, as indicated by our confusion matrix analysis. Post-augmentation, the fine-tuned ResNet50 model achieved an exceptional test accuracy of 99.31% with a test loss of 0.0227, setting a new benchmark for similar tasks.

    The core objective of this project extends beyond classification accuracy; it aims to establish a robust development environment for testing efficient edge AI models suitable for deployment in resource-constrained scenarios. The fine-tuned ResNet50-MSL-v2.1 model serves as a baseline for this development. The model was converted into a TorchScript format to facilitate cross-platform deployment and inference consistency.
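    The conversion and timing steps can be sketched as follows (dataset-specific fine-tuning and per-platform setup are omitted, and an untrained ResNet50 stands in for the fine-tuned ResNet50-MSL-v2.1 model):

```python
# Toy TorchScript conversion and average-latency measurement for ResNet50.
import time
import torch
from torchvision.models import resnet50

model = resnet50(weights=None).eval()              # untrained stand-in for the fine-tuned model
example = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, example)         # portable TorchScript module
scripted.save("resnet50_traced.pt")

with torch.no_grad():
    for _ in range(5):                             # warm-up iterations
        scripted(example)
    start = time.perf_counter()
    for _ in range(50):
        scripted(example)
    avg_ms = (time.perf_counter() - start) / 50 * 1000
print(f"average inference time: {avg_ms:.2f} ms")
```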

    Our comprehensive cross-platform evaluation included four distinct hardware configurations, chosen to mirror a variety of deployment scenarios. The NVIDIA Jetson Nano achieved an average inference time of 62.04 milliseconds with 85.83% CPU usage, highlighting its utility in mobile contexts. An Intel NUC with a Celeron processor, adapted for drone-based deployment, registered an inference time of 579.87 milliseconds at near-maximal CPU usage of 99.77%. A standard PC equipped with an RTX 3060 GPU completed inference in just 6.52 milliseconds, showcasing its capability for high-performance, stationary tasks. Lastly, an AMD FX 8350 CPU-only system demonstrated a reasonable inference time of 215.17 milliseconds, suggesting its appropriateness for less demanding edge computing applications.

    These results not only showcase the adaptability of ResNet50 across diverse computational environments but also emphasize the importance of considering both model complexity and hardware capabilities when deploying AI at the edge. Our findings indicate that with careful optimization and platform-specific tuning, it is possible to deploy advanced AI models like ResNet50 effectively on a range of hardware, from low-power edge devices to high-performance ground stations. Our ongoing research will use these established baselines to further explore efficient AI model deployment in resource-constrained settings.

    Speaker Info:

    Matthew Wilkerson

    Undergraduate Researcher

    Intelligent Systems Laboratory, Fayetteville State University

    Matthew Wilkerson is an undergraduate student at Fayetteville State University, majoring in computer science. He is an undergraduate researcher at the university’s Intelligent Systems Laboratory, where he is assigned to two NASA-funded research projects involving deep neural networks.  

    Prior to attending Fayetteville State University, Matthew served in the U.S. Army for over 22 years. He had various assignments, to include:  Paralegal NCO, 1st Battalion, 75th Ranger Regiment, Hunter Army Airfield, Georgia (three deployments in support of Operation Iraqi Freedom); Paralegal NCO, 1st Battalion, 5th Special Forces Group (Airborne), Fort Campbell, Kentucky (three deployments in support of Operation Iraq Freedom); Instructor/Writer, 27D Advanced Individual Training, Fort Jackson, South Carolina; Senior Instructor/Writer and later Course Director, 27D Advanced Individual Training, Fort Lee, Virginia; Senior Paralegal NCO, 2nd Infantry Brigade Combat Team, 3rd Infantry Division, Fort Stewart, Georgia; Senior 27D Talent Management NCO, Personnel, Plans, and Training Office, Office of the Judge Advocate General, Pentagon, Washington, DC; Student, US Army Sergeants Major Academy, Fort Bliss, Texas; and Command Paralegal NCO, 82nd Airborne Division, Fort Liberty, North Carolina.  

    Matthew’s most notable awards and decorations include the Bronze Star Medal, Legion of Merit Medal, Meritorious Service Medal (w/ 2 oak leaf clusters), Valorous Unit Award, Iraqi Campaign Medal (w/ 6 campaign stars), Global War on Terrorism Expeditionary medal, Ranger Tab, Basic Parachutist Badge, and Pathfinder Badge. 

  • Advancing Reproducible Research: Concepts, Compliance, and Practical Applications

    Abstract:

    Reproducible research principles ensure that analyses can be verified and defended by meeting the criterion that conducting the same analysis on the same data should yield identical results. Not only are reproducible analyses more defensible and less susceptible to errors, but they also enable faster iteration and yield cleaner results. In this seminar, we will delve into how to conceptualize reproducible research and explore how reproducible research practices align with government policies. Additionally, we will provide hands-on examples, using Python and MS Excel, illustrating various approaches for conducting reproducible research.
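    A minimal Python sketch of the habits discussed in the seminar (fixed seeds, scripted analysis, and machine-written outputs; in practice the data would come from a versioned input file rather than being generated in the script):

```python
# Toy reproducible analysis: same script + same data => identical results file.
import hashlib
import json
import numpy as np

SEED = 20240417
rng = np.random.default_rng(SEED)          # fixed seed: rerunning reproduces every number below

# Stand-in for loading a versioned input file that is never edited by hand
data = rng.normal(loc=10.0, scale=2.0, size=500)

boot = np.array([data[rng.integers(0, data.size, data.size)].mean() for _ in range(1000)])
result = {
    "seed": SEED,
    "n": int(data.size),
    "mean": float(data.mean()),
    "ci_95": [float(q) for q in np.percentile(boot, [2.5, 97.5])],
    "input_sha256": hashlib.sha256(data.tobytes()).hexdigest(),  # fingerprint of the exact inputs
}

with open("results.json", "w") as f:       # outputs are written by code, not pasted by hand
    json.dump(result, f, indent=2)
```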

    Speaker Info:

    Boris Chernis

    Research Associate

    IDA

    I have an MS in Computer Science and have been working at the Institute for Defense Analyses since 2020. I have experience in data analytics, machine learning, cyber security, and miscellaneous software engineering.

  • AI for Homeland Security: A Comprehensive Approach for Detecting Sex Trafficking

    Abstract:

    Sex trafficking remains a global problem, requiring new innovations to detect and disrupt such criminal enterprises. Our research project applies artificial intelligence (AI) methods, together with knowledge from social science and homeland security, to the detection and understanding of the operational models of sex trafficking networks (STNs). Our purpose is to enhance the AI capabilities of software-based detection technologies and support the homeland defense community in detecting and countering human sex trafficking, including the trafficking of underage victims.
    To accomplish this, we propose a novel architecture capable of jointly representing and learning from multiple modalities, including images and text. The interdisciplinary nature of this work involves the fusion of computer vision, natural language processing, and deep neural networks (DNNs) to address the complexities of sex trafficking detection from online advertisements. This research proposes the creation of a software prototype as an extension of the Image Surveillance Assistant (ISA) built by our research team, to focus on cross-modal information retrieval and context understanding critical for identifying potential sex trafficking cases. Our initiative aligns with the objectives outlined in the DHS Strategic Plan, aiming to counter both terrorism and security threats, specifically focusing on the victim-centered approach to align with security threat segments.
    We leverage current AI and machine learning techniques integrated by the project to create a working software prototype. DeepFace, a DNN for biometric analysis of facial image features such as age, race, and gender, is utilized. Few-shot text classification, utilizing the scikit-learn Python library and Large Language Models (LLMs), enables the detection of written trafficking advertisements. The prime funding agency, the Department of Homeland Security (DHS), mandates the use of synthetic data for this unclassified project, so we have developed code that leverages Application Programming Interfaces (APIs) connecting to LLMs and generative AI for images to create synthetic training and test data for the DNN models. Test and evaluation with synthetic data are the core capabilities of our approach to build prototype software that can potentially be used for real applications with real data.
    Ongoing work includes creating a program to fuse the outputs from AI models on a single advertisement input. The fusion program will provide the numeric value of the likelihood of the class of advertisement, ranging from classes such as legal advertisement to different categories of trafficking. This research project is a potential contribution to the development of deployment-ready software for intelligence agencies, law enforcement, and border security. We currently show high accuracy for detecting advertisements related to victims of specific demographic categories. We have identified areas where increased accuracy is needed, and we are collecting more training data to address those gaps. The AI-based capabilities emerging from our research hold promise for enhancing the understanding of STN operational models, addressing the technical challenges of sex trafficking detection, and emphasizing the broader societal impact and alignment with national security goals.
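
    For readers unfamiliar with the few-shot text classification component, the sketch below shows one common scikit-learn pattern (TF-IDF features feeding a logistic regression fit on a handful of labeled examples); the tiny training snippets, labels, and probe text are invented for illustration and are not the project's data or model.

        from sklearn.pipeline import make_pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression

        # Invented, innocuous example snippets standing in for advertisement text
        texts = [
            "licensed spa, book appointments online, gift cards available",
            "furniture sale this weekend, free delivery in the metro area",
            "new arrivals weekly, cash only, ask about availability",
            "available any time, no questions asked, discreet location",
        ]
        labels = [0, 0, 1, 1]  # 0 = ordinary advertisement, 1 = flagged for review

        # TF-IDF features feeding a regularized logistic regression classifier
        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
        clf.fit(texts, labels)

        probe = ["appointments available this weekend"]
        print(clf.predict_proba(probe))  # class probabilities for the new text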

    Speaker Info:

    Sambit Bhattacharya

    Professor

    Fayetteville State University

    Sambit Bhattacharya is a Computer Scientist with more than 15 years of experience in teaching and research. He has a PhD in Computer Science and Engineering from the State University of New York at Buffalo. He is a tenured Full Professor of Computer Science at Fayetteville State University, North Carolina, USA. In 2023 he was honored by the University of North Carolina (UNC) Board of Governor’s Award for Teaching Excellence. Dr. Bhattacharya is experienced in developing and executing innovative and use-inspired research in Artificial Intelligence and Machine Learning (AIML) with a broad range of techniques and applications, and with multidisciplinary teams. He has more than 60 peer reviewed publications and has delivered 50+ oral presentations, including keynote lectures at conferences. Dr. Bhattacharya works on research in the applications of AIML to geospatial intelligence, computer vision with synthetic data for target recognition, efficiency and latency reducing inferencing for edge computing, automation and manufacturing. Dr. Bhattacharya leads projects funded by the National Science Foundation, the US Department of Defense (DoD), including support for research aligned with interests of the US Intelligence Community (IC), NASA, the North Carolina Department of Transportation, and the University of North Carolina Research Opportunities Initiative. He directs the Intelligent Systems Lab (ISL) at Fayetteville State University which hosts research and houses resources like robotics equipment, and high-performance computing for AIML research. The ISL supports faculty advisors and students, and collaborations with external partners. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE). Beginning in 2014 and while on teaching leave from the university, Dr. Bhattacharya served as faculty research fellow in the following research labs of the DoD and the IC: Naval Research Lab (DC), Army Research Lab (Adelphi, MD), and the National Geospatial Intelligence Agency (NGA). He has been appointed as Visiting Scientist (part-time) at NGA starting 2023. He was Faculty in Residence at Google’s Global HQ in Mountain View, CA in 2017 and he collaborates with industry through grant funded partnerships and consulting opportunities.

  • Analyzing Factors for Starting Pitcher Pull Decision with Survival Analysis

    Abstract:

    Background: Current practices for deciding when to pull a starting pitcher vary, are inconsistent, and are not transparent. With the new Major League Baseball (MLB) data available through Statcast, such decisions can be more consistently supported by measured data.
    Methods: To address this gap, we scraped pitch-level data from Statcast, a technology system that collects real-time MLB game measurements. We used Statcast data within a Cox regression for survival analysis to identify measurable factors that are associated with pitcher longevity. Measurements from 696,743 pitches were extracted for analysis from the 2021 MLB season. The pitcher was considered to have "survived" a pitch if they remained in the game; mortality was defined as the pitcher's last pitch. Analysis began at the second inning to account for high variation during the first inning.
    Results: The statistically significant factors, including the hits-to-strikes ratio (HSR), runs per batter faced, and total bases per batter faced, yielded the highest hazard coefficients (ranging from 10 to 23), indicating a higher risk of being relieved.
    Conclusions: Our findings indicate that HSR, runs per batter faced, and total bases per batter faced provide decision-making information for relieving the starting pitcher.
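
    A minimal sketch of the modeling step, assuming the Python lifelines package and invented columns rather than the authors' actual Statcast extract: here each row is a start, "pitch_count" is the number of pitches thrown, and "pulled" indicates whether the start ended with the pitcher being relieved.

        import pandas as pd
        from lifelines import CoxPHFitter

        # Hypothetical starts; the real analysis used hundreds of thousands of pitch-level rows
        df = pd.DataFrame({
            "pitch_count": [34, 78, 61, 95, 52, 88, 70, 45],                     # pitches thrown in the start
            "pulled":      [0, 1, 1, 1, 0, 1, 1, 0],                             # 1 = pitcher was relieved
            "hsr":         [0.30, 0.45, 0.50, 0.40, 0.42, 0.55, 0.35, 0.20],     # hits-to-strikes ratio
            "runs_per_bf": [0.10, 0.22, 0.18, 0.25, 0.20, 0.30, 0.12, 0.05],     # runs per batter faced
        })

        # Cox proportional hazards model; positive coefficients mean higher risk of removal.
        # A small penalizer keeps this toy fit stable with so few rows.
        cph = CoxPHFitter(penalizer=0.1)
        cph.fit(df, duration_col="pitch_count", event_col="pulled")
        cph.print_summary()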

    Speaker Info:

    Lucas Villanti

    Cadet

    US Military Academy

    Cadet Lucas Villanti is an undergraduate at the United States Military Academy, West Point, majoring in Applied Statistics and Data Science. With a focus on R, Python, and machine learning, Lucas has specialized in sabermetrics over the last three years, analyzing baseball statistics with a particular focus on starting pitchers.

    As a leader in both the Finance Club and Ski Club, he brings analytical skills and enthusiasm to these groups. Additionally, Lucas has broadened his global perspective through a project in Morocco, where he aided in restoring a soccer field while immersing himself in the local culture. His academic prowess, leadership, and international experiences mark him as a well-rounded and impactful individual.

  • Applications for inspection simulation at NASA

    Abstract:

    The state of the art in numerical simulation of nondestructive evaluation (NDE) has begun to transition from research to application. The simulation software, both commercial and custom, has reached a level of maturity where it can be readily deployed to solve real-world problems. The next area of research beginning to emerge is determining when and how NDE simulation should be applied. At NASA Langley Research Center, NDE simulations have already been utilized for several aerospace projects to facilitate or enhance inspection optimization and interpretation of results. Researchers at NASA have identified several different scenarios where it is appropriate to utilize NDE simulations. In this presentation, we will describe these scenarios, give examples of each, and demonstrate how NDE simulations were applied to solve problems, with an emphasis on the mechanics of integrating with other workgroups. These examples will include inspection planning for multi-layer pressure vessels as well as on-orbit inspections.

    Speaker Info:

    Peter Juarez

    Research Engineer

    NASA

    Peter Juarez is a research engineer who specializes in multiple fields of NDE including Design for Inspection (DFI), NDE modeling ultrasound, thermography, guided wave, artificial flaw manufacturing and automated data processing. Peter implements these skills in the fulfillment of both commercial aviation and NASA space programs, such as the Advanced Composites Project (ACP), High Rate Composites Advanced Manufacturing (HiCAM), Orion capsule heat shield inspection, and the Advanced Composite Solar Sail System (ACS3).

  • Assessing the Calibration and Performance of Attention-based Spatiotemporal Neural Network

    Abstract:

    In the last decade, deep learning models have proven capable of learning complex spatiotemporal relations and producing highly accurate short-term forecasts, known as nowcasts. Various models have been proposed to forecast precipitation associated with storm events hours before they happen. More recently, neural networks have been developed to produce accurate lightning nowcasts, using various types of satellite imagery, past lightning data, and other weather parameters as inputs to their model. Furthermore, the inclusion of attention mechanisms into these spatiotemporal weather prediction models has shown increases in the model’s predictive capabilities.

    However, the calibration of these models and other spatiotemporal neural networks is rarely discussed. In general, model calibration addresses how reliable model predictions are, and models are typically calibrated after the model training process using scaling and regression techniques. Recent research suggests that neural networks are poorly calibrated despite being highly accurate, which calls into question how much their predicted probabilities can be trusted.

    This research develops attention-based and non-attention-based deep-learning neural networks that uniquely incorporate reliability measures into the model tuning and training process to investigate the performance and calibration of spatiotemporal deep-learning models. All of the models developed in this research prove capable of producing lightning occurrence nowcasts using common remotely sensed weather modalities, such as radar and satellite imagery. Initial results suggest that the inclusion of attention mechanisms into the model architecture improves the model’s accuracy and predictive capabilities while improving the model’s calibration and reliability.
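
    As a concrete example of the kind of calibration check being discussed (not the authors' code), the sketch below computes a simple expected calibration error (ECE) by binning predicted lightning-occurrence probabilities and comparing them with observed frequencies; the probabilities and labels are synthetic.

        import numpy as np

        def expected_calibration_error(probs, labels, n_bins=10):
            """Bin predictions by confidence and average |observed - predicted| per bin."""
            bins = np.linspace(0.0, 1.0, n_bins + 1)
            ece = 0.0
            for lo, hi in zip(bins[:-1], bins[1:]):
                mask = (probs > lo) & (probs <= hi)
                if mask.any():
                    observed = labels[mask].mean()   # empirical event frequency in the bin
                    predicted = probs[mask].mean()   # mean predicted probability in the bin
                    ece += mask.mean() * abs(observed - predicted)
            return ece

        rng = np.random.default_rng(0)
        probs = rng.uniform(size=5000)                                  # synthetic nowcast probabilities
        labels = (rng.uniform(size=5000) < probs**1.5).astype(float)    # mildly miscalibrated "truth"
        print(f"ECE = {expected_calibration_error(probs, labels):.3f}")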

    Speaker Info:

    Nathan Gaw

    Assistant Professor of Data Science

    Air Force Institute of Technology

    Dr. Nathan Gaw is an Assistant Professor of Data Science in the Department of Operational Sciences at Air Force Institute of Technology, Wright-Patterson AFB, Ohio, USA. His research develops new statistical machine learning algorithms to optimally fuse high-dimensional, heterogeneous, multi-modality data sources to support decision making in military, healthcare, and remote sensing applications. He received his B.S.E. and M.S. in biomedical engineering and a Ph.D. in industrial engineering from Arizona State University (ASU), Tempe, AZ, USA, in 2013, 2014, and 2019, respectively. Dr. Gaw was a Postdoctoral Research Fellow at the ASU-Mayo Clinic Center for Innovative Imaging (AMCII), Tempe, AZ, USA, from 2019-2020, and a Postdoctoral Research Fellow in the School of Industrial and Systems Engineering (ISyE) at Georgia Institute of Technology, Atlanta, GA, USA, from 2020-2021. He is also chair of the INFORMS Data Mining Society, and a member of IISE and IEEE. For additional information, please visit www.nathanbgaw.com.

  • Automated Tools for Improved Accessibility of Bayesian Analysis Methods

    Abstract:

    Statistical analysis is integral to the evaluation of defense systems throughout the acquisition process. Unlike traditional frequentist statistical methods, newer Bayesian statistical analysis incorporates prior information, such as historical data and expert knowledge, into the analysis for a more integrated approach to test data analysis. With Bayesian techniques, practitioners can more easily decide what data to leverage in their analysis, and how much that data should impact their analysis results. This provides a more flexible, informed framework for decision making in the testing and evaluation of DoD systems.

    However, the application of Bayesian statistical analyses is often challenging due to the advanced statistical knowledge and technical coding experience necessary to use current Bayesian programming tools. The development of automated analysis tools can help address these barriers and make modern Bayesian analysis techniques available to a wide range of stakeholders, regardless of technical background. By making new methods more readily available, collaboration and decision-making become easier and more effective within the T&E community.

    To facilitate this, we have developed a web application using the Shiny package in R. This application uses an intuitive user interface that enables non-technical users to implement our Bayesian reliability analysis approach without any coding knowledge or advanced statistical background. Users can upload reliability data from the developmental testing and operational testing stages of a system of interest and tweak parameters of their choosing to automatically generate plots and estimates of system reliability performance based on their uploaded data and prior knowledge of system behavior.
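
    As a toy sketch of the kind of prior-to-posterior reliability update such a tool automates (not the application's actual model), the snippet below combines a Beta prior summarizing developmental-test knowledge with binomial operational-test data; all of the counts are invented.

        from scipy import stats

        # Prior from developmental testing (hypothetical): roughly 18 successes in 20 trials
        prior = stats.beta(a=18, b=2)

        # Operational test data (hypothetical): 27 successes in 30 trials
        successes, trials = 27, 30

        # Conjugate Beta-binomial update gives the posterior on system reliability
        posterior = stats.beta(a=18 + successes, b=2 + (trials - successes))

        lo, hi = posterior.ppf([0.05, 0.95])
        print(f"Posterior mean reliability: {posterior.mean():.3f}")
        print(f"90% credible interval: ({lo:.3f}, {hi:.3f})")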

    Speaker Info:

    Kyle Risher

    Undergraduate Research Intern

    Virginia Tech National Security Institute

    Kyle Risher is an undergraduate research intern at the Virginia Tech National Security Institute. His research has been focused on the creation of tools for automating defense system reliability analysis. He is currently a Senior pursuing a B.S. in Statistics from Virginia Tech. After graduation, he will be taking on a Naval Warfighting Analyst role at the Naval Surface Warfare Center in Carderock, Maryland.

  • Bayesian Design of Experiments and Parameter Recovery

    Abstract:

    With recent advances in computing power, many Bayesian methods that were once impracticably expensive are becoming increasingly viable. Parameter recovery problems present an exciting opportunity to explore some of these Bayesian techniques. In this talk we briefly introduce Bayesian design of experiments and look at a simple case study comparing its performance to classical approaches. We then discuss a PDE inverse problem and present ongoing efforts to optimize parameter recovery in this more complicated setting. This is joint work with Justin Krometis, Nathan Glatt-Holtz, Victoria Sieck, and Laura Freeman.

    Speaker Info:

    Christian Frederiksen

    PhD Student

    Tulane University

    Christian is a 5th-year mathematics PhD student at Tulane University expecting to graduate in 2024. His research involves a combination of Bayesian statistics, partial differential equations, and Markov chain Monte Carlo, and includes both abstract and applied components. After working at the Virginia Tech National Security Institute during the summer of 2023, where his work largely focused on Bayesian design of experiments, he is interested in pursuing a career in testing and evaluation.

  • Bayesian Projection Pursuit Regression

    Abstract:

    In projection pursuit regression (PPR), a univariate response variable is approximated by the sum of M "ridge functions," which are flexible functions of one-dimensional projections of a multivariate input variable. Traditionally, optimization routines are used to choose the projection directions and ridge functions via a sequential algorithm, and M is typically chosen via cross-validation. We introduce a novel Bayesian version of PPR, which has the benefit of accurate uncertainty quantification. To infer appropriate projection directions and ridge functions, we apply novel adaptations of methods used for the single ridge function case (M=1), called the Bayesian Single Index Model, and use a reversible jump Markov chain Monte Carlo algorithm to infer the number of ridge functions M. We evaluate the predictive ability of our model in 20 simulated scenarios and for 23 real datasets, in a bake-off against an array of state-of-the-art regression methods. Finally, we generalize this methodology and demonstrate the ability to accurately model multivariate response variables. Its effective performance indicates that Bayesian Projection Pursuit Regression is a valuable addition to the existing regression toolbox.
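
    For readers less familiar with the PPR form, the model described above can be written in standard notation (y the response, x the input vector, epsilon the error) as

        y = \sum_{m=1}^{M} g_m(\omega_m^\top x) + \varepsilon,

    where each ridge function g_m acts on the one-dimensional projection \omega_m^\top x; the Bayesian version places priors on the directions \omega_m and the functions g_m, while reversible jump MCMC infers the number of terms M.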

    Speaker Info:

    Gavin Collins

    R&D Statistician

    Sandia National Laboratories

    Gavin received a joint BS/MS degree in statistics from Brigham Young University in 2018, then went on to complete a PhD in statistics at Ohio State University in 2023. He recently started as a full-time R&D statistician at Sandia National Laboratories in Albuquerque, New Mexico. His research interests include Bayesian statistics, nonparametric regression, functional data analysis, and emulation and calibration of computational models.

  • Bayesian Reliability Growth Planning for Discrete Systems

    Abstract:

    Developmental programs for complex systems with limited resources often face the daunting task of predicting the time needed to achieve system reliability goals.

    Traditional reliability growth plans rely heavily on operational testing. They use confidence estimates to determine the required sample size, and then work backward to calculate the amount of testing required during the developmental test program to meet the operational test goal and satisfy a variety of risk metrics. However, these strategies are resource-intensive and do not take advantage of the information present in the developmental test period.

    This presentation introduces a new method for projecting the reliability growth of a discrete, one-shot system. This model allows for various corrective actions to be considered, while accounting for both the uncertainty in the corrective action effectiveness and the management strategy used to parameterize those actions. Solutions for the posterior distribution on the system reliability are found numerically, while allowing for a variety of prior distributions on the corrective action effectiveness and the management strategy. Additionally, the model can be extended to account for system degradation across testing environments. A case study demonstrates how this model can use historical data with limited failure observations to inform its parameters, making it even more valuable for real-world applications.
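
    As a loose illustration of the style of calculation involved (a simplified sketch, not the presenter's model), the snippet below projects the reliability of a one-shot system after a corrective action by drawing the fix effectiveness from a Beta prior and propagating it through an uncertain failure-mode probability; all distributions and values are invented.

        import numpy as np

        rng = np.random.default_rng(1)
        n_draws = 100_000

        # Hypothetical uncertainty in a single observed failure mode's probability
        p_mode = rng.beta(3, 40, size=n_draws)

        # Prior on corrective action effectiveness (fraction of that mode's failures removed),
        # standing in for the management strategy assumption
        effectiveness = rng.beta(8, 4, size=n_draws)

        # Projected system reliability after the fix (other failure causes ignored here)
        reliability = 1.0 - p_mode * (1.0 - effectiveness)

        print(f"Projected mean reliability: {reliability.mean():.3f}")
        print(f"5th-95th percentile: {np.percentile(reliability, [5, 95]).round(3)}")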

    This work builds upon previous research in Reliability Growth planning from Drs. Brian Hall and Martin Wayne.

    Speaker Info:

    Harris Bernstein

    Data Scientist

    Johns Hopkins University Applied Physics Lab

    Harris Bernstein is a data scientist with experience in implementing complex machine learning models and data visualization techniques while collaborating with scientists and engineers across varying domains.

     

    He currently serves as a Senior Data Scientist at the Johns Hopkins University Applied Physics Lab in the System Performance Analysis Group. His current research includes optimal experimental designs for machine learning methods and incorporating statistical models inside of digital engineering environments.

     

    Previously he worked on the Large Hadron Collider beauty (LHCb) experiment at Syracuse University as part of the Experimental Particle Physics Group. There he was the principal analyst on a branching fraction measurement that incorporated complex statistical modeling accounting for several different kinds of physical processes.

     

    Education:

    Doctor of Philosophy, Physics, Syracuse University, New York.

    Bachelor of Science, Physics, Pennsylvania State University, Pennsylvania.

  • Best of Both Worlds: Combining Parametric Cost Risk Analysis with Earned Value Management

    Abstract:

    Murray Cantor, Ph.D., Cantor Consulting
    Christian Smart, Ph.D., Jet Propulsion Laboratory, California Institute of Technology

    Cost risk analysis and earned value data are typically used separately and independently to estimate Estimates at Completion (EAC). However, there is significant value to combining the two in order to improve the accuracy of EAC forecasting. In this paper, we provide a rigorous method for doing this using Bayesian methods.
    In earned value management (EVM), the Estimate at Completion (EAC) is perhaps the critical metric. It is used to forecast the effort’s total work cost as it progresses. In particular, it is used to see if the work is running over or under its planned budget, specified as the budget at completion (BAC).
    Separate probability distribution functions (PDF) of the EAC at the onset of the effort and after some activities have been completed show the probability that EAC will fall within the BAC, and, conversely, the probability it won’t. At the onset of an effort, the budget is fixed, and the EAC is uncertain. As the work progresses, some of the actual costs (AC) are reported. The EAC uncertainty should then decrease and the likelihood of meeting the budget should increase. If the area under the curve to the left of the BAC decreases as the work progresses, the budget is in jeopardy, and some management action is warranted.
    This paper will explain how to specify the initial PDF and learn the later PDFs from the data tracked in EVM. We describe the technique called Bayesian parameter learning (BPL). We chose this technique because it is the most robust for exploiting small sets of progress data and is most easily used by practitioners. This point will be elaborated further in the paper.
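
    As a toy illustration of the prior-to-posterior EAC update described above (not the authors' BPL implementation), the sketch below treats the log of the final cost ratio as normal, starts from a prior, and updates it with a few reported cost-performance observations; every number is invented.

        import numpy as np
        from scipy import stats

        # Prior on log(EAC / BAC): slight expected overrun with wide uncertainty (hypothetical)
        mu0, sd0 = 0.10, 0.30

        # Reported observations as work progresses (hypothetical), each treated as a
        # noisy measurement of log(EAC / BAC) with known noise level
        y = np.array([0.05, 0.12, 0.08])
        sd_obs = 0.15

        # Conjugate normal-normal update of the mean
        prec_post = 1 / sd0**2 + len(y) / sd_obs**2
        mu_post = (mu0 / sd0**2 + y.sum() / sd_obs**2) / prec_post
        sd_post = np.sqrt(1 / prec_post)

        # Probability the effort finishes within budget, i.e. log(EAC / BAC) <= 0
        p_prior = stats.norm(mu0, sd0).cdf(0.0)
        p_post = stats.norm(mu_post, sd_post).cdf(0.0)
        print(f"P(EAC <= BAC): prior = {p_prior:.2f}, after updates = {p_post:.2f}")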

    Speaker Info:

    Christian Smart

    Dr. Christian Smart is a Senior Programmatic Risk Analyst with NASA’s Jet Propulsion Laboratory. He has experience supporting both NASA and the Department of Defense in the theory and application of risk, cost, and schedule analytics for cutting-edge programs, including nuclear propulsion and hypersonic weapon systems. For several years he served as the Cost Director for the Missile Defense Agency. An internationally recognized expert on risk analysis, he is the author of Solving for Project Risk Management: Understanding the Critical Role of Uncertainty in Project Management (McGraw-Hill, 2020).

    Dr. Smart received the 2021 Frank Freiman lifetime achievement award from the International Cost Estimating and Analysis Association. In 2010, he received an Exceptional Public Service Medal from NASA for the application of risk analysis. Dr. Smart was the 2009 recipient of the Parametrician of the Year award from the International Society of Parametrics Analysts. Dr. Smart has BS degrees in Mathematics and Economics from Jacksonville State University, an MS in Mathematics from the University of Alabama in Huntsville (UAH), and a PhD in Applied Mathematics from UAH. 


  • Best of Both Worlds: Combining Parametric Cost Risk Analysis with Earned Value Management

    Abstract:

    Murray Cantor, Ph.D., Cantor Consulting
    Christian Smart, Ph.D., Jet Propulsion Laboratory, California Institute of Technology

    Cost risk analysis and earned value data are typically used separately and independently to estimate Estimates at Completion (EAC). However, there is significant value to combining the two in order to improve the accuracy of EAC forecasting. In this paper, we provide a rigorous method for doing this using Bayesian methods.
    In earned value management (EVM), the Estimate at Completion (EAC) is perhaps the critical metric. It is used to forecast the effort’s total work cost as it progresses. In particular, it is used to see if the work is running over or under its planned budget, specified as the budget at completion (BAC).
    Separate probability distribution functions (PDF) of the EAC at the onset of the effort and after some activities have been completed show the probability that EAC will fall within the BAC, and, conversely, the probability it won’t. At the onset of an effort, the budget is fixed, and the EAC is uncertain. As the work progresses, some of the actual costs (AC) are reported. The EAC uncertainty should then decrease and the likelihood of meeting the budget should increase. If the area under the curve to the left of the BAC decreases as the work progresses, the budget is in jeopardy, and some management action is warranted.
    This paper will explain how to specify the initial PDF and learn the later PDFs from the data tracked in EVM. We describe the technique called Bayesian parameter learning (BPL). We chose this technique because it is the most robust for exploiting small sets of progress data and is most easily used by practitioners. This point will be elaborated further in the paper.

    Speaker Info:

    Murray Cantor

    Murray Cantor is a retired IBM Distinguished Engineer. With his Ph.D. in mathematics from the University of California at Berkeley and extensive experience in managing complex, innovative projects, he has focused on applying predictive reasoning and causal analysis to the execution and economics of project management.
    In addition to many journal articles, Murray is the author of two books: Object-Oriented Project Management with UML and Software Leadership. He is an inventor of 15 IBM patents. After retiring from IBM, he was a founder and lead scientist of Aptage, which developed and delivered tools for learning and tracking the probability of meeting project goals. Aptage was sold to Planview.
    Dr. Cantor’s quarter-century career with IBM included two periods:
    • An architect and senior project manager for the Workstation Division and
    • An IBM Distinguished Engineer in the Software Group and an IBM Rational CTO team member.

    The second IBM stint began with IBM acquiring Rational Software, where Murray was the Lead Engineer for Rational Services. In that role, he consulted on delivering large projects at Boeing, Raytheon, Lockheed, and various intelligence agencies. He was the IBM representative to the SysML partners who created the Object Management Group’s Systems Modeling Language standard. While at Rational, he was the lead author of the Rational Unified Process for Systems Engineering (RUPSE).
    Before joining Rational, he was project lead at the defense and intelligence contractor TASC, delivering systems for Space Command.

  • Beyond the Matrix: The Quantitative Cost and Schedule Risk Management Imperative

    Abstract:

    In the modern world, we underappreciate the role of uncertainty and tend to be blind to risk. As the Nobel Prize–winning economist Kenneth Arrow once wrote, "Most individuals underestimate the uncertainty of the world . . . our knowledge of the way things work, in society or in nature, comes trailing clouds of vagueness." The resistance to recognizing risk and uncertainty extends to project management. As a colleague of mine aptly put it, "Project management types, especially, have a tendency to treat plans as reality." However, projects of all types – weapon systems, robotic and human space efforts, dams, tunnels, bridges, the Olympics, etc. – experience regular and often extreme cost growth and schedule delays. Cost overruns occur in 80% or more of project development efforts, and schedule delays happen in 70% or more. For many types of projects, average cost growth is 50% or more, and costs double in more than one of every six projects. These widespread and enduring increases reflect the high degree of risk inherent in projects. If there were no recurring history of cost and schedule growth, there would be no need for resource risk analysis and management. The planning of such projects would be as easy as planning a trip to a local dry cleaner. Instead, the tremendous risk inherent in such projects necessitates the consideration of risk and uncertainty throughout a program’s life cycle.
    In practice, the underestimation of project risk manifests itself in one of four ways. First, projects often completely ignore variation and rely exclusively on point estimates. As we will demonstrate, overlooking risk in the planning stages guarantees cost growth and schedule delays. Second, even when variation is considered, there is often an exclusive reliance on averages. We will show that there is much more to risk than simple averages. Third, even when the potential consequences are considered, risk matrices are often used in place of a rigorous quantitative analysis. We will provide proof that qualitative risk matrices underestimate the true degree of uncertainty. Fourth, an often-overlooked weakness in quantitative applications is the human element. There is an innate bias early in a project’s lifecycle to perceive less risk than is present, which leads to a significant underestimation of uncertainty until late in a project’s development, at which point many of the risks a project faces have either been avoided, mitigated, or confronted. Even the application of best practices in risk analysis often leads to uncertainty ranges that are significantly tighter than indicated by history. Reasons for this phenomenon are discussed, and the calibration of risk analysis outputs to historical cost growth and schedule delays is presented as a remedy.

    Speaker Info:

    Christian Smart

    Senior Programmatic Risk Analyst

    JPL

    Dr. Christian Smart is a Senior Programmatic Risk Analyst with NASA’s Jet Propulsion Laboratory. He has experience supporting both NASA and the Department of Defense in the theory and application of risk, cost, and schedule analytics for cutting-edge programs, including nuclear propulsion and hypersonic weapon systems. For several years he served as the Cost Director for the Missile Defense Agency. An internationally recognized expert on risk analysis, he is the author of Solving for Project Risk Management: Understanding the Critical Role of Uncertainty in Project Management (McGraw-Hill, 2020).

    Dr. Smart received the 2021 Frank Freiman lifetime achievement award from the International Cost Estimating and Analysis Association. In 2010, he received an Exceptional Public Service Medal from NASA for the application of risk analysis. Dr. Smart was the 2009 recipient of the Parametrician of the Year award from the International Society of Parametrics Analysts. Dr. Smart has BS degrees in Mathematics and Economics from Jacksonville State University, an MS in Mathematics from the University of Alabama in Huntsville (UAH), and a PhD in Applied Mathematics from UAH. 

  • Bringing No-Code Machine Learning to the average user

    Abstract:

    In the rapidly evolving landscape of technology, Artificial Intelligence (AI) and Machine Learning (ML) have emerged as powerful tools with transformative potential. However, the adoption of these advanced technologies has often been limited to individuals with coding expertise, leaving a significant portion of the population, particularly those without programming skills, on the sidelines. Shifting towards user-friendly AI/ML interfaces not only enhances inclusivity but also opens new avenues for innovation. A broader spectrum of individuals can combine the benefits of these cutting-edge technologies with their own domain knowledge to solve complex problems rapidly and effectively. Bringing no-code AI/ML to subject matter experts is necessary to ensure that the massive amount of data being produced by the DoD is properly analyzed and valuable insights are captured. This presentation delves into the importance of making AI and ML accessible to individuals with no coding experience. By doing so, it opens a world of possibilities for diverse participants to engage with and reap the benefits of the AI revolution.
    While the prospect of making AI and ML accessible to individuals without coding experience is promising, it comes with its own set of challenges, particularly in addressing the barriers for individuals lacking a background in data analysis. One significant hurdle lies in the complexity of AI and ML algorithms, which often require a nuanced understanding of statistical concepts, data preprocessing, and model evaluation. Individuals without a foundation in analysis may find it challenging to interpret results accurately, hindering their ability to derive meaningful insights from AI-driven applications.
    Another challenge is the availability of data, especially in the defense domain. Many models require large amounts of data to be effective. Ensuring the quality and consistency of the chosen dataset is a challenge, as individuals may encounter missing values, outliers, or inaccuracies that can adversely impact the performance of their ML models. Data preprocessing steps such as categorical variable encoding, interpolation, and normalization can be performed automatically, but it is important to understand when to use these techniques and why. Applying transformations such as logarithmic or polynomial transformations can enhance model performance. However, individuals with limited experience may struggle to determine when and how to apply these techniques effectively.
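
    To make the preprocessing steps just mentioned concrete, the generic scikit-learn sketch below (not the tool discussed in this talk) chains imputation, categorical encoding, and scaling ahead of a model; the column names and tiny dataset are hypothetical.

        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.pipeline import Pipeline
        from sklearn.impute import SimpleImputer
        from sklearn.preprocessing import OneHotEncoder, StandardScaler
        from sklearn.linear_model import LogisticRegression

        # Hypothetical columns: two numeric features with missing values, one categorical
        numeric = ["sensor_reading", "duration_hours"]
        categorical = ["platform_type"]

        preprocess = ColumnTransformer([
            ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                              ("scale", StandardScaler())]), numeric),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ])

        model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

        # Tiny invented dataset purely to show the pipeline running end to end
        df = pd.DataFrame({
            "sensor_reading": [1.2, None, 3.4, 2.2, 5.1, 0.7],
            "duration_hours": [10, 12, None, 9, 15, 8],
            "platform_type": ["air", "ground", "air", "sea", "ground", "air"],
            "label": [0, 1, 0, 1, 1, 0],
        })
        model.fit(df[numeric + categorical], df["label"])
        print(model.predict(df[numeric + categorical]))
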
    The lack of familiarity with key concepts such as feature engineering, model selection, and hyperparameter tuning can impede users from effectively utilizing AI tools. The black-box nature of some advanced models further complicates matters, as users may struggle to comprehend the inner workings of these algorithms, raising concerns about transparency and trust in AI-generated outcomes. Ethical considerations and biases inherent in AI models also pose substantial challenges. Users without an analysis background may inadvertently perpetuate biases or misinterpret results, underscoring the need for education and awareness to navigate these ethical complexities.
    In this talk, we delve into the multifaceted challenges of bringing AI and ML to individuals without a background in analysis, emphasizing the importance of developing solutions that empower individuals to harness the potential of these technologies while mitigating potential pitfalls.

    Speaker Info:

    Alex Margolis

    Subject Matter Expert

    Edaptive Computing, Inc

    Alex Margolis is a Subject Matter Expert at Edaptive Computing, Inc. in Dayton, OH. He has 11 years of experience in software development with a focus on Machine Learning and AI. He led development of ECI AWESOME, a tool designed to bring machine learning into the hands of non-analysts, and ECI SEAMLESS, a tool designed to scan ML/AI algorithms for potential vulnerabilities and limitations.

  • Case Study: State Transition Maps for Mission Model Development and Test Objective Identifiers

    Abstract:

    Coming soon

    Speaker Info:

    Nicholas Jones

    STAT Expert

    STAT COE

    Mr. Nicholas Jones received his Master’s Degree in Materials Science and Engineering from the University of Dayton, Ohio. After working in mission performance analysis at the Missile Defense Agency, he currently works at the Scientific Test and Analysis Techniques Center of Excellence (STAT COE) in direct consultation with DoD programs. Mr. Jones assists programs with test planning and analysis, and also supports STAT COE initiatives for Model Validation Levels (MVLs) and the development of STAT to support Cyber T&E.

  • CIVA NDT Simulation: Improving Inspections Today for a Better Tomorrow

    Abstract:

    CIVA NDT simulation software is a powerful tool for non-destructive testing (NDT) applications. It allows users to design, optimize, and validate inspection procedures for various NDT methods, such as ultrasonic, eddy current, radiographic, and guided wave testing. Come learn about the benefits of using CIVA NDT simulation software to improve the reliability, efficiency, and cost-effectiveness of NDT inspections.

    Speaker Info:

    Starr D'Auria

    NDE Engineer

    Extende

    Starr D’Auria is an NDE Engineer at Extende Inc, where she specializes in CIVA simulation software and TRAINDE NDT training simulators. She offers sales, technical support, training, and consulting services for these products. She holds a Bachelor of Science in Mechanical Engineering from LeTourneau University and has Level II VT and UT training, as well as ET, RT, PT and MT training. She leads the Hampton Roads ASNT chapter as the chairperson and serves on the steering committee for the DWGNDT.

  • Classifying violent anti-government conflicts in Mexico: A machine learning framework

    Abstract:

    Domestic crime, conflict, and instability pose a significant threat to many contemporary governments. These challenges have proven to be particularly acute within modern-day Mexico. While there have been significant developments in predicting intrastate armed and electoral conflict in various contemporary settings, such efforts have thus far been limited in their use of spatial and temporal correlations, as well as in the features they have considered. Machine learning, especially deep learning, has proven highly effective in predicting future conflicts using word embeddings in Convolutional Neural Networks (CNNs), but it lacks spatial structure and, due to its black-box nature, cannot explain the importance of predictors. We develop a novel methodology using machine learning that can accurately classify future anti-government violence in Mexico. We further demonstrate that our approach can identify important leading predictors of such violence. This can help policymakers make informed decisions and can also help governments and NGOs better allocate security and humanitarian resources, which could prove beneficial in tackling this problem.
    Using a variety of political event aggregations from the ICEWS database alongside other textual and demographic features, we trained various classical machine learning algorithms, including but not limited to Logistic Regression, Random Forest, XGBoost, and a Voting classifier. The development of this research was a stepwise process in which each phase built upon the shortcomings of the previous one. In the first phase, we considered a mix of CNN and Long Short-Term Memory (LSTM) networks to decode the spatial and temporal relationships in the data; the performance of these black-box deep learning models was not on par with the classical machine learning models. The second phase analyzed the temporal relationships in the data to identify the dependency of conflicts over time and their lagged relationships; this also served to reduce the feature dimension space by removing variables not covered within the cutoff lag. The third phase applied general variable selection methodologies to further reduce the feature space and to identify the important predictors that fuel anti-government violence, along with their directional effects, using Shapley additive values. The Voting classifier, utilizing a subset of features derived from LASSO across 100 simulations, consistently surpasses alternative models in performance and demonstrates efficacy in accurately classifying future anti-government conflicts. Notably, Random Forest feature importance indicates that features including but not limited to homicides, accidents, material conflicts, and positively worded citizen information sentiments emerge as pivotal predictors in the classification of anti-government conflicts. Finally, in the fourth phase, we conclude the research by analyzing the spatial structure of the data using an extended version of Moran’s I index for spatiotemporal data to identify global spatial dependency and local clusters, followed by modeling the data spatially and evaluating it using Gaussian Process Boosting (GPBoost). The global spatial autocorrelation is minimal, characterized by localized conflict clusters within the region. Furthermore, the Voting Classifier demonstrates superior performance over GPBoost, leading to the inference that no substantial spatial dependency exists among the various locations.
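
    A compressed sketch of the modeling pattern described above (LASSO-style feature selection feeding a soft-voting ensemble), using synthetic data in place of the ICEWS-derived features; the feature counts, base learners, and settings are illustrative assumptions.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier, VotingClassifier
        from sklearn.feature_selection import SelectFromModel
        from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import Pipeline

        # Synthetic stand-in for location-month feature vectors and conflict labels
        X, y = make_classification(n_samples=1000, n_features=40, n_informative=8, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

        # L1-penalized logistic regression as the LASSO-style feature selector
        selector = SelectFromModel(
            LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5, random_state=0)
        )

        # Soft-voting ensemble over two complementary base learners
        vote = VotingClassifier(
            estimators=[("lr", LogisticRegression(max_iter=1000)),
                        ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
            voting="soft",
        )

        model = Pipeline([("select", selector), ("vote", vote)])
        model.fit(X_tr, y_tr)
        print(f"Held-out accuracy: {model.score(X_te, y_te):.3f}")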

    Speaker Info:

    Vishal Subedi

    PhD Student

    University of Maryland Baltimore County

    I am a first-year PhD student at UMBC. My interests lie in applied statistics, machine learning, and deep learning.

  • Clustering Singular and Non-Singular Covariance Matrices for Classification

    Abstract:

    In classification problems when working in high dimensions with a large number of classes and few observations per class, linear discriminant analysis (LDA) requires the strong assumptions of a shared covariance matrix between all classes and quadratic discriminant analysis leads to singular or unstable covariance matrix estimates. Both of these can lead to lower than desired classification performance. We introduce a novel, model-based clustering method which can relax the shared covariance assumptions of LDA by clustering sample covariance matrices, either singular or non-singular. This will lead to covariance matrix estimates which are pooled within each cluster. We show using simulated and real data that our method for classification tends to yield better discrimination compared to other methods.
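
    A rough sketch of the underlying idea of pooling covariance estimates within clusters of classes (not the authors' model-based clustering algorithm): here classes are grouped by a simple Frobenius distance between their sample covariance matrices, and a pooled estimate is formed per cluster; the simulated data and two-cluster choice are assumptions.

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import pdist

        rng = np.random.default_rng(0)
        p, n_per_class, n_classes = 5, 8, 12   # few observations per class relative to dimension

        # Simulate classes whose covariances come from two underlying structures
        base = [np.eye(p), 0.5 * np.eye(p) + 0.5]
        data = {c: rng.multivariate_normal(np.zeros(p), base[c % 2], size=n_per_class)
                for c in range(n_classes)}
        sample_covs = {c: np.cov(x, rowvar=False) for c, x in data.items()}

        # Cluster classes by distance between vectorized sample covariance matrices
        vecs = np.array([sample_covs[c].ravel() for c in range(n_classes)])
        clusters = fcluster(linkage(pdist(vecs), method="average"), t=2, criterion="maxclust")

        # Pool the covariance estimate within each cluster
        for k in np.unique(clusters):
            members = [c for c in range(n_classes) if clusters[c] == k]
            pooled = np.mean([sample_covs[c] for c in members], axis=0)
            print(f"Cluster {k}: classes {members}, pooled trace = {pooled.trace():.2f}")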

    Speaker Info:

    Andrew Simpson

    Ph.D. Student

    South Dakota State University

    Ph.D. Student in the Computational Science and Statistics program at South Dakota State University. Research focuses on novel methods for modeling data generated from a hierarchical sampling process where subpopulation structures exist. The main application of this research is to forensic statistics and source identification.

  • Cobalt Strike: A Cyber Tooling T&E Challenge

    Abstract:

    Cyber Test and Evaluation serves a critical role in the procurement process for Red Team tools; however, once a tool is vetted and approved for use at the Red Team level, it is generally incorporated into steady-state operations without additional concern for testing or maintenance of the tool. As a result, approved tools may not undergo routine in-depth T&E as new versions are released. This presents a major concern for the Red Team community, as new versions can change the operational security of those tools. Similarly, cyber defenders - either through lack of training or limited resources - have been known to upload Red Team tools to commercial malware analysis platforms, which inadvertently releases potentially sensitive information about Red Team operations. The DOT&E Advanced Cyber Operations team, as part of the Cyber Assessment Program, performed in-depth analysis of Cobalt Strike (versions 4.8 and newer), adversary simulation software widely used across the Department of Defense and the United States Government. Advanced Cyber Operations identified several operational security concerns that could disclose sensitive information to an adversary with access to payloads generated by Cobalt Strike. This highlights the need to improve the test and evaluation of cyber tooling, at a minimum for major releases of tools utilized by Red Teams. Advanced Cyber Operations recommends in-depth, continuous test and evaluation of offensive operations tools and continued evaluation to mitigate potential operational security concerns.

    Speaker Info:

    Nathan Wray

    Senior Operator

    DOT&E ACO

    Dr. Nathan Wray is a technical lead and senior operator on the Advanced Cyber Operations team under the Office of the Director, Operational Test and Evaluation. Within his role, over the past seven years, Dr. Wray has performed red teaming, developed offensive cyber operations capabilities, and assisted cyber teams across the Department of Defense. Before his current role, Dr. Wray had over a decade of experience in operational and research-related positions in the private and public sectors. Dr. Wray's prior research and focus areas include leveraging machine learning to detect crypto-ransomware and researching offensive cyber capabilities, techniques, and related detection methods. Dr. Wray has Computer Engineering, Network Protection, and Information Assurance degrees and received his Doctorate of Science in Cybersecurity from Capitol Technology University in 2018.

  • Command Slating Sensitivity Analysis

    Abstract:

    The Army's Human Resources Command (HRC) annually takes on the critical task of the Centralized Selection List (CSL) process, where approximately 400 officers are assigned to key battalion command roles. This slating process is a cornerstone of the Army's broader talent management strategy, involving collaborative input from branch proponent officers and culminating in the approval of the Army Chief of Staff. The study addresses crucial shortcomings in the existing process for officer assignments, focusing on the biases and inconsistent weighting that affect slate selection outcomes. It examines the effects of incorporating specific criteria like Skill Experience Match (SEM), Knowledge, Skills, Behaviors (KSB), Order of Merit List (OML), and Officer Preferences (PREF) into the selection process of a pilot program. Our research specifically addresses the terms efficiency, strength, and weakness within the context of the pairing process. Our objective is to illuminate the potential advantages of a more comprehensive approach to decision-making in officer-job assignments, ultimately enhancing the effectiveness of placing the most suitable officer in the most fitting role.

    Speaker Info:

    Ryan Krolikowski

    CDT

    United States Military Academy

    I am currently a cadet at the United States Military Academy at West Point, pursuing an undergraduate degree in Operations Research.

  • Cost Considerations for Estimating Small Satellite Integration & Test

    Abstract:

    In the early phases of project formulation, mission integration and test (I&T) costs are typically estimated via a wrap factor approach, analogies to similar missions adjusted for mission specifics, or a Bottom Up Estimate (BUE). The wrap factor approach estimates mission I&T costs as a percentage of payload and spacecraft hardware costs. This percentage is based on data from historical missions, with the assumption that the project being estimated shares similar characteristics with the underlying data set used to develop the wrap factor. This technique has worked well for traditional spacecraft builds since typically as hardware costs grow, I&T test costs do as well. However, with the emergence of CubeSats and nanosatellites, the cost basis of hardware is just not large enough to use the same approach. This suggests that there is a cost “floor” that covers basic I&T tasks, such as a baseline of labor and testing.

    This paper begins the process of developing a cost estimating relationship (CER) for estimating Small Satellite (SmallSat) Integration & Test (I&T) costs. CERs are a result of a cost estimating methodology using statistical relationships between historical costs and other program variables. The objective in generating a CER equation is to show a relationship between the dependent variable, cost, to one or more independent variables. The results of this analysis can be used to better predict SmallSat I&T costs.
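
    As a generic illustration of fitting a CER of the kind described (invented missions and a common power-law form, not the authors' data or final model), the sketch below regresses log I&T cost on log spacecraft mass by ordinary least squares.

        import numpy as np
        import statsmodels.api as sm

        # Invented historical missions: spacecraft dry mass (kg) and I&T cost ($M)
        mass = np.array([12, 25, 60, 150, 300, 700], dtype=float)
        it_cost = np.array([1.8, 2.4, 3.9, 6.5, 9.8, 17.0])

        # Log-log (power-law) CER: cost = a * mass^b, fit by ordinary least squares
        X = sm.add_constant(np.log(mass))
        fit = sm.OLS(np.log(it_cost), X).fit()
        a, b = np.exp(fit.params[0]), fit.params[1]
        print(f"Estimated CER: I&T cost = {a:.2f} * mass^{b:.2f} ($M)")

        # Prediction for a hypothetical 20 kg SmallSat under the fitted relationship
        print(f"Predicted I&T cost for a 20 kg SmallSat: {a * 20**b:.1f} $M")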

    Speaker Info:

    Rachel Sholder

    Rachel Sholder is a parametric cost analyst within the Systems Engineering Group of the APL Space Exploration Sector. She joined APL in 2017 after graduating with M.S. in Statistics and B.S. in Mathematics from Lehigh University. Rachel has since become a valuable team member responsible for life-cycle space mission cost estimates at various stages of programmatic development (pre-proposal, proposal, mission milestone, trade studies, etc.). She is an active participant in the NASA Cost Estimating Community. Rachel was named the “NASA Cost and Schedule Rising Star” in 2023. Rachel is currently working towards a Doctor of Engineering degree with a focus in applied mathematics and statistics.


  • Cost Considerations for Estimating Small Satellite Integration & Test

    Abstract:

    In the early phases of project formulation, mission integration and test (I&T) costs are typically estimated via a wrap factor approach, analogies to similar missions adjusted for mission specifics, or a Bottom Up Estimate (BUE). The wrap factor approach estimates mission I&T costs as a percentage of payload and spacecraft hardware costs. This percentage is based on data from historical missions, with the assumption that the project being estimated shares similar characteristics with the underlying data set used to develop the wrap factor. This technique has worked well for traditional spacecraft builds since typically as hardware costs grow, I&T test costs do as well. However, with the emergence of CubeSats and nanosatellites, the cost basis of hardware is just not large enough to use the same approach. This suggests that there is a cost “floor” that covers basic I&T tasks, such as a baseline of labor and testing.

    This paper begins the process of developing a cost estimating relationship (CER) for estimating Small Satellite (SmallSat) Integration & Test (I&T) costs. CERs are a result of a cost estimating methodology using statistical relationships between historical costs and other program variables. The objective in generating a CER equation is to show a relationship between the dependent variable, cost, to one or more independent variables. The results of this analysis can be used to better predict SmallSat I&T costs.

    Speaker Info:

    Kathy Kha

    Kathy Kha is a parametric cost analyst within the Systems Engineering Group in the Space Exploration Sector at The Johns Hopkins University Applied Physics Laboratory (APL). She has been working at APL since 2018 and is APL’s subject-matter expert in parametric cost analysis. At APL, she is responsible for life-cycle space mission cost estimates at various stages of programmatic development (pre-proposal, proposal, mission milestone, trade studies, etc.). Prior to joining APL, her work included consulting engagements providing cost estimates and proposal evaluation support for NASA source selection panels for space science and Earth science missions and cost estimating support at NASA Ames Research Center. She has a bachelor’s degree in Applied Mathematics from the University of California – San Diego, a master’s degree in Systems Engineering from the University of Southern California and a doctorate in engineering from The Johns Hopkins University.

  • Creating Workflows for Synthetic Data Generation and Advanced Military Image Classification

    Abstract:

    The US Government has a specific need for tools that intelligence analysts can use to search and filter data effectively. Artificial Intelligence (AI), through the application of Deep Neural Networks (DNNs), can assist in a multitude of military applications, requiring a constant supply of relevant data sets to keep up with the always-evolving battlefield. Existing imagery does not adequately represent the evolving nature of modern warfare; therefore, finding a way to simulate images of future conflicts could give us a strategic advantage against our adversaries. Additionally, using physical cameras to capture a sufficiently varied set of lighting and environmental conditions is nearly impossible. The technical challenge in this area is to create software tools for edge computing devices integrated with cameras to process the video feed locally without having to send the video data through bandwidth-constrained networks to servers in data centers. The ability to collect and process data locally, often in austere environments, can accelerate decision making and action taken in response to emergency situations. An important part of this challenge is to create labeled datasets that are relevant to the problem and are needed for training the edge-efficient AI. Teams from Fayetteville State University (FSU) and The United States Military Academy (USMA) will present their proposed workflows that will enable accurate detection of various threats using Unreal Engine (UE) to generate synthetic training data. In principle, production of synthetic data is unlimited and can be customized to location, various environmental variables, and human and crowd characteristics. Together, both teams address the challenges of realism and fidelity; diversity and variability; and integration with real data.
    The focus of the FSU team is on creating semi-automated workflows to create simulated human-crowd behaviors and the ability to detect anomalous behaviors. It will provide methods for specifying collective behaviors to create crowd simulations of many human agents, and for selecting a few of those agents to exhibit behaviors that are outside of the defined range of normality. The analysis is needed for rapid detection of anomalous activities that can pose security threats and cost human lives.
    The focus of the USMA team will be on creating semi-autonomous workflows that evaluate the ability of DNNs to identify key military assets, specifically armored vehicles and personnel, under various environmental conditions. We aim to vary environmental parameters to simulate varying light conditions and to introduce obscuration experiments using artificial means like smoke and natural phenomena like fog to add complexity to the scenarios. Additionally, the USMA team will explore a variety of camouflage patterns and various levels of defilade.
    The goal for both teams is to provide workflow solutions that maximize the use of UE to provide realistic datasets that simulate future battlefields and emergency scenarios for evaluating and training existing models. These studies pave the way for creating advanced models trained specifically for military applications. Creating adaptive models that can keep up with today’s evolving battlefield will give the military a great advantage in the race for artificial intelligence applications.

    Speaker Info:

    James Starling

    Associate Professor

    United States Military Academy

    (please note that we would like to present work from both FSU and USMA during this presentation. We plan on having 5-7 presenters provide their insights on a handful of items related to the creation and use of synthetic data generation in Unreal Engine)

    Dr. James K. Starling is an Associate Professor and Director for the Center for Data Analysis and Statistics at the United States Military Academy, West Point. He has served in the United States Army as an Artilleryman and an Operations Research and Systems Analysis (ORSA) analyst for over 23 years. His research interests include military simulations, optimization, remote sensing, and object detection and recognition.

  • Data VV&A for AI Enabled Capabilities

    Abstract:

    Data – collection, preparation, and curation – is a crucial need in the AI lifecycle. Ensuring that the data are consistent, correct, and representative of the intended use is critical to ensuring the efficacy of an AI-enabled system. Data verification, validation, and accreditation (VV&A) is meant to address this need. The dramatic increase in the prevalence of AI-enabled capabilities and analytic tools across the DoD has emphasized the need for a unified understanding of data VV&A, as quality data form the foundation of AI models. In practice, data VV&A and associated activities are often applied in an ad hoc manner that may limit their ability to support development and testing of AI-enabled capabilities. However, existing DoD frameworks for data VV&A are applicable to the AI lifecycle and embody important supporting activities for T&E of AI-enabled systems. We highlight the importance of data VV&A, relying on established definitions, and outline some concerns and best practices.

    Speaker Info:

    John Dennis

    Research Staff Member (Economist)

    IDA

    John W. Dennis (Jay) earned his PhD in Economics from UNC Chapel Hill. He is a research staff member at the Institute for Defense Analyses, where he is a member of the Human Capital and Test Science groups. He specializes in Econometrics, Statistics, and Data Science.

  • Data-Driven Robust Design of an Aeroelastic Wing

    Abstract:

    This paper applies a Bayesian Optimization approach to the design of a wing subject to stress and aeroelastic constraints. The parameters of these constraints, which correspond to various flight conditions and uncertain parameters, are prescribed by a finite number of scenarios. Chance-constrained optimization is used to seek a wing design that is robust to the parameter variation prescribed by such scenarios. This framework enables computing designs with varying degrees of robustness. For instance, we can deliberately eliminate a given number of scenarios in order to obtain a lighter wing that is more likely to violate a requirement, or we might seek a conservative wing design that satisfies the constraints for as many scenarios as possible.
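
    The scenario-discarding idea can be illustrated with a deliberately tiny example. The sketch below is not the paper's Bayesian Optimization framework; it assumes a single made-up thickness variable, a weight proportional to that thickness, and a per-scenario stress constraint, and simply shows how discarding the most demanding scenarios yields a lighter but less robust design.

```python
# A deliberately tiny scenario-discarding sketch (not the paper's framework):
# one made-up thickness variable t, weight proportional to t, and a
# per-scenario stress constraint load_i / t <= sigma_max.
import numpy as np

rng = np.random.default_rng(0)
n_scenarios = 200
loads = rng.lognormal(mean=1.0, sigma=0.3, size=n_scenarios)  # uncertain flight loads
sigma_max = 2.5                                               # allowable stress (assumed)
weight_per_thickness = 10.0                                   # notional weight coefficient

def lightest_design(discard_k: int) -> float:
    """Smallest thickness satisfying all but `discard_k` scenarios."""
    # Scenario i requires t >= loads[i] / sigma_max; dropping the k most
    # demanding scenarios lets the (n-k)-th largest requirement govern.
    required_t = np.sort(loads / sigma_max)
    return required_t[-(discard_k + 1)]

for k in (0, 5, 20):
    t = lightest_design(k)
    print(f"discard {k:2d} scenarios -> thickness {t:.3f}, weight {weight_per_thickness * t:.2f}")
```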

    Speaker Info:

    Andrew Cooper

    Graduate Student

    Virginia Tech Department of Statistics

    Andrew Cooper is a 4th-year PhD candidate in Virginia Tech's Department of Statistics. He received his bachelors and masters degrees in Statistical Science from Duke University. His research areas include computer experiments and surrogate modeling, as well as Bayesian methodology.

  • Demystifying Deep Learning - Aircraft Identification from Satellite Images

    Abstract:

    In the field of Artificial Intelligence and Machine Learning (AI/ML), the literature can be filled with technical language and/or buzzwords, making it challenging for readers to understand the content. This will be a pedagogical talk focused on demystifying "Artificial Intelligence" by providing a mathematical, but most importantly, an intuitive understanding of how deep learning really works. I will provide some existing tools and practical steps for how one can train their own neural networks using an example of automatically identifying aircraft and their attributes (e.g., civil vs. military, engine types, and size) from satellite images. Audience members with some knowledge of linear regression and coding will be armed with an increased understanding, confidence, and practical tools to develop their own AI applications.
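
    In that spirit, the toy sketch below (not the speaker's material, and far simpler than the satellite-imagery example) shows the core mechanics of deep learning in plain NumPy: a one-hidden-layer network is just a nonlinear feature layer stacked on logistic regression, trained by gradient descent.

```python
# Pedagogical toy (not the speaker's material): a one-hidden-layer network in
# plain NumPy, trained by gradient descent on noisy XOR data that a purely
# linear model cannot separate.
import numpy as np

rng = np.random.default_rng(42)

X = rng.integers(0, 2, size=(400, 2)).astype(float)
y = (X[:, :1] != X[:, 1:2]).astype(float)        # label 1 when exactly one input is "on"
X += 0.1 * rng.normal(size=X.shape)              # measurement noise

H = 8                                            # hidden units
W1 = 0.5 * rng.normal(size=(2, H)); b1 = np.zeros(H)
W2 = 0.5 * rng.normal(size=(H, 1)); b2 = np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(3000):
    h = np.tanh(X @ W1 + b1)                     # forward pass: hidden features
    p = sigmoid(h @ W2 + b2)                     # forward pass: predicted probability
    n = len(X)
    dz2 = (p - y) / n                            # gradient of mean cross-entropy
    dW2 = h.T @ dz2; db2 = dz2.sum(0)
    dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)          # backprop through tanh
    dW1 = X.T @ dz1; db1 = dz1.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1               # gradient-descent update
    W2 -= lr * dW2; b2 -= lr * db2

p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
print(f"training accuracy: {((p > 0.5) == (y > 0.5)).mean():.2%}")
```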

    Speaker Info:

    Cameron Liang

    Research Staff Member

    IDA

    Dr. Cameron Liang is a research staff member. He received his Ph.D. in Astronomy & Astrophysics in 2018 from the University of Chicago. Prior to joining IDA, he was a Postdoctoral Researcher at the University of California, Santa Barbara. Dr. Liang worked on theoretical and observational aspects of galaxy formation using magneto-hydrodynamic simulations. At IDA, he works on a variety of space-related topics, such as orbital debris and dynamic space operations.

  • Design and Analysis of Experiments – Next-Level Methods with Case Studies

    Abstract:

    This is the short course for you if you are familiar with the fundamental techniques in the science of test and want to learn useful, real-world, and advanced methods applicable in the DoD/NASA test community. The focus will be on use cases not typically covered in most short courses. JMP software will primarily be used, and datasets will be provided for you to follow along with many of the hands-on demonstrations of practical case studies. Design topics will include custom design of experiments tips, choosing optimality criteria, creating designs from existing runs, augmenting adaptively in high-gradient regions, creating designs with constraints, repairing broken designs, mixture design intricacies, modern screening designs, designs for computer simulation, accelerated life testing, and measurement system testing. Analysis topics will include ordinary least squares, stepwise, and logistic regression, generalized regression (LASSO, ridge, elastic net), model averaging (to include Self-Validated Ensemble Models), random effects (split-plot, repeated measures), comparability/equivalence, functional data analysis (think of your data as a curve), nonlinear approaches, multiple-response optimization, and trade-space analysis. The day will finish with an hour-long Q&A session to help solve your specific T&E challenges.

    Speaker Info:

    Tom Donnelly

    JMP Statistical Discovery LLC

    Tom Donnelly works as a Systems Engineer for JMP Statistical Discovery supporting users of JMP software in the Defense and Aerospace sector. He has been actively using and teaching Design of Experiments (DOE) methods for the past 40 years to develop and optimize products, processes, and technologies. Donnelly joined JMP in 2008 after working as an analyst for the Modeling, Simulation & Analysis Branch of the US Army’s Edgewood Chemical Biological Center – now DEVCOM CBC. There, he used DOE to develop, test, and evaluate technologies for detection, protection, and decontamination of chemical and biological agents. Prior to working for the Army, Tom was a partner in the first DOE software company for 20 years where he taught over 300 industrial short courses to engineers and scientists. Tom received his PhD in Physics from the University of Delaware.

  • Design and Analysis of Experiments – Next-Level Methods with Case Studies

    Abstract:

    This is the short course for you if you are familiar with the fundamental techniques in the science of test and want to learn useful, real-world, and advanced methods applicable in the DoD/NASA test community. The focus will be on use cases not typically covered in most short courses. JMP software will primarily be used, and datasets will be provided for you to follow along with many of the hands-on demonstrations of practical case studies. Design topics will include custom design of experiments tips, choosing optimality criteria, creating designs from existing runs, augmenting adaptively in high-gradient regions, creating designs with constraints, repairing broken designs, mixture design intricacies, modern screening designs, designs for computer simulation, accelerated life testing, and measurement system testing. Analysis topics will include ordinary least squares, stepwise, and logistic regression, generalized regression (LASSO, ridge, elastic net), model averaging (to include Self-Validated Ensemble Models), random effects (split-plot, repeated measures), comparability/equivalence, functional data analysis (think of your data as a curve), nonlinear approaches, multiple-response optimization, and trade-space analysis. The day will finish with an hour-long Q&A session to help solve your specific T&E challenges.

    Speaker Info:

    James Wisnowski

    Adsurgo

    Dr. James Wisnowski, co-founder and principal consultant at Adsurgo, leads the enterprise consulting and government divisions of Adsurgo. Dr. Wisnowski has consulting experience and expertise in applied statistics, program management, strategic planning, military operations, design of experiments, reliability engineering, quality engineering, data mining, text analytics, simulation modelling, along with operations research analysis. He has published refereed journal articles and texts in addition to presenting consulting results, new research, and short courses at conferences worldwide. He retired from the US Air Force as an officer with 20 years of service as an acquisition, test, personnel, and force structure analyst in addition to having significant leadership responsibilities as a squadron commander, joint staff officer, and Air Force Academy professor.

  • Design of In-Flight Cyber Experimentation for Spacecraft

    Abstract:

    Cyber resilience technologies are critical to ensuring the survival of mission critical assets for space systems. Such emerging cyber resilience technologies ultimately need to be proven out through in-flight experimentation. However, there are significant technical challenges for proving that new technologies actually enhance the resilience of spacecraft. In particular, in-flight experimentation suffers from a “low data” problem due to many factors, including 1) the lack of physical access limits what types of data can be collected, 2) even if data can be collected, size, weight, and power (SWaP) constraints of the spacecraft make it difficult to store large amounts of data, 3) even if data can be stored, bandwidth constraints limit the transfer of data to the ground in a timely manner, and 4) only a limited number of trials can be performed due to spacecraft scheduling and politics. This talk will discuss a framework developed for design and execution of in-flight cyber experimentation as well as statistical techniques appropriate for analyzing the data. More specifically, we will discuss how data from ground-based test beds can be used to augment the results of in-flight experiments. The discussed framework and statistical techniques will be demonstrated on a use case.
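
    One plausible way to combine ground-based and in-flight data, sketched below, is a Bayesian model in which test-bed results form a discounted prior that a handful of flight trials then update. This is only an illustration of the general idea, not the framework presented in this talk; the counts and discount factor are invented.

```python
# One plausible (assumed, not this talk's) way to let ground-based test-bed
# results augment a handful of flight trials: a Beta-Binomial model with a
# power prior that discounts the ground data by a relevance factor.
from scipy.stats import beta

ground_s, ground_n = 45, 50   # ground test bed: successes / trials (invented)
flight_s, flight_n = 4, 5     # in-flight trials (invented)
a0, b0 = 1.0, 1.0             # vague baseline prior
discount = 0.3                # weight of one ground trial relative to a flight trial

# Power prior: ground data enter the posterior with weight `discount`.
a = a0 + discount * ground_s + flight_s
b = b0 + discount * (ground_n - ground_s) + (flight_n - flight_s)

post = beta(a, b)
print(f"posterior mean resilience: {post.mean():.3f}")
print(f"80% credible interval: {post.ppf(0.1):.3f} to {post.ppf(0.9):.3f}")
```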

    Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2023-14847A.

    Speaker Info:

    Meghan Sahakian

    Principal Member of Technical Staff

    Sandia National Laboratories

    Meghan Galiardi Sahakian is a Principal Member of the Technical Staff at Sandia National Laboratories. She earned her PhD in mathematics at University of Illinois Urbana-Champaign in 2016. She has since been at Sandia National Laboratories and her work focuses on cyber experimentation including topics such as design of experiments, modeling/simulation/virtualization, and quantitative cyber resilience metrics.

  • Developing AI Trust: From Theory to Testing and the Myths In Between

    Abstract:

    The Director, Operational Test and Evaluation (DOT&E) and the Institute for Defense Analyses (IDA) are developing recommendations for how to account for trust and trustworthiness in AI-enabled systems during Department of Defense (DoD) Operational Testing (OT). Trust and trustworthiness have critical roles in system adoption, system use and misuse, and performance of human-machine teams. The goal, however, is not to maximize trust, but to calibrate the human’s trust to the system’s trustworthiness. Trusting more than a system warrants can result in shattered expectations, disillusionment, and remorse. Conversely, under-trusting implies that humans are not making the most of available resources.
    Trusted and trustworthy systems are commonly referenced as essential for the deployment of AI by political and defense leaders and thinkers. Executive Order 14110 requires “safe, secure, and trustworthy development and use” of AI. Furthermore, the desired end state of the Department of Defense Responsible AI Strategy is trust. These terms are not well characterized and there is no standard, accepted model for understanding, or method for quantifying, trust or trustworthiness for test and evaluation (T&E). This has resulted in trust and trust calibration rarely being assessed in T&E. This is, in part, due to the contextual and relational nature of trustworthiness. For instance, the developmental tester requires a different level of algorithmic transparency than the operational tester or the operator; whereas the operator may need more understandability than transparency. This means that to successfully operationally test AI-enabled systems, such testing must be done at the right level, with the actual operators and commanders and up-to-date CONOPS as well as sufficient time for training and experience for trust to evolve. The need for testing over time is further amplified by particular features of AI, wherein machine behaviors are no longer as predictable or static as those of traditional systems but may continue to be updated and to adapt. Thus, testing for trust and trustworthiness cannot be a one-and-done event.
    It is critical to ensure that those who work within AI – in its design, development, and testing – understand what trust actually means, why it is important, and how to operationalize and measure it. This session will empower testers by:
    • Establishing a common foundation for understanding what trust and trustworthiness are.
    • Defining key terms related to trust, enabling testers to think about trust more effectively.
    • Demonstrating the importance of trust calibration for system acceptance and use and the risks of poor calibration.
    • Decomposing the factors within trust to better elucidate how trust functions and what factors and antecedents have been shown to affect trust in human-machine interaction.
    • Introducing concepts on how to design AI-enabled systems for better trust calibration, assurance, and safety.
    • Proposing validated and reliable survey measures for trust.
    • Discussing common cognitive biases implicated in trust and AI and both the positive and negative roles biases play.
    • Addressing common myths around trust in AI, including that trust or its measurement doesn’t matter, or that trust in AI can be “solved” with ever more transparency, understandability, and fairness.

    Speaker Info:

    Yosef Razin

    Research Associate

    IDA

    Yosef S. Razin is a Research Associate at IDA and doctoral candidate in Robotics at the Georgia Institute of Technology, specializing in human-machine trust and the particular challenges to trust that AI poses.  His research has spanned the psychology of trust, ethical and legal implications, game theory, and trust measure development and validation.  His applied research has focused on human-machine teaming, telerobotics, autonomous cars, and AI-assistants and decision support. At IDA, he is in the Operational Evaluation Division and involved with the Human-System Integration group and the Test Science group.

  • Developing Model-Based Flight Test Scenarios

    Abstract:

    The Department of Defense (DoD) is undergoing a digital engineering transformation in every process of the systems engineering lifecycle. This transformation requires that DoD Test and Evaluation (T&E) processes implement and execute model-based testing methodologies. This paper describes and assesses a grey box model-driven test design (MDTD) approach to create flight test scenarios based on model-based systems engineering artifacts. To illustrate the methodology and evaluate the expected outcomes of the process in practice, a case study is presented using a model representation of a training system utilized to train new Air Force Operational Test and Evaluation Center (AFOTEC) members in conducting operational test and evaluation (OT&E). The results of the grey box MDTD process are a set of activity diagrams that are validated to generate the same test scenario cases as the traditional document-centric approach. Using artifacts represented in the Systems Modeling Language (SysML), this paper will discuss key comparisons between the traditional and MDTD processes. This paper demonstrates the costs and benefits of model-based testing and their relevance in the context of operational flight testing.

    Speaker Info:

    Jose Alvarado

    Technical Advisor

    AFOTEC Det 5/CTO

    JOSE ALVARADO is a senior test engineer and system analyst for AFOTEC at Edwards AFB, California with over 33 years of developmental and operational test and evaluation experience. He is a Ph.D. candidate in the Systems Engineering doctorate program at Colorado State University with research interests in applying MBSE concepts to the flight test engineering domain and implementing test process improvements through MBT. Jose holds a B.S. in Electrical Engineering from California State University, Fresno (1991), and an M.S. in Electrical Engineering from California State University, Northridge (2002). He serves as an adjunct faculty member for the electrical engineering department at the Antelope Valley Engineering Program (AVEP) overseen by California State University, Long Beach. He is a member of the International Test and Evaluation Association, Antelope Valley Chapter.

  • Development, Test, and Evaluation of Small-Scale Artificial Intelligence Models

    Abstract:

    As data becomes more commoditized across all echelons of the DoD, developing Artificial Intelligence (AI) solutions, even at small scales, offers incredible opportunities for advanced data analysis and processing. However, these solutions require intimate knowledge of the data in question, as well as robust Test and Evaluation (T&E) procedures to ensure performance and trustworthiness. This paper presents a case study and recommendations for developing and evaluating small-scale AI solutions. The model automates an acoustic trilateration system. First, the system accurately identifies the precise times of acoustic events across a variable number of sensors using a neural network. It then associates the events across the sensors through a heuristic matching process. Finally, using the correspondences and the differences in arrival times, the system computes the physical location of the event. We find that even a relatively simple dataset requires extensive understanding at all phases of the process. Techniques like data augmentation and data synthesis, which must capture the unique attributes of the real data, were necessary both for improved performance and for robust T&E. The T&E metrics and pipeline required unique approaches to account for the AI solution, which lacked traceability and explainability. As leaders leverage the growing availability of AI tools to solve problems within their organizations, strong data analysis skills must remain at the core of the process.
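
    As a concrete illustration of the final localization step, the sketch below solves a generic time-difference-of-arrival (TDOA) problem with nonlinear least squares. It is not the system described in the paper; the sensor layout, propagation speed, and noise level are invented for the example.

```python
# Generic time-difference-of-arrival (TDOA) localization sketch (not the
# system in the paper): sensor layout, propagation speed, and noise are
# invented; the source position is recovered by nonlinear least squares.
import numpy as np
from scipy.optimize import least_squares

c = 343.0                                   # assumed speed of sound (m/s)
sensors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
true_source = np.array([62.0, 35.0])

rng = np.random.default_rng(1)
t_arrival = np.linalg.norm(sensors - true_source, axis=1) / c
t_arrival += rng.normal(scale=1e-4, size=len(sensors))   # timing noise

def tdoa_residuals(xy):
    """Measured minus predicted arrival-time differences relative to sensor 0."""
    pred = np.linalg.norm(sensors - xy, axis=1) / c
    return (t_arrival - t_arrival[0]) - (pred - pred[0])

fit = least_squares(tdoa_residuals, x0=np.array([50.0, 50.0]))
print("estimated source position:", np.round(fit.x, 2))
```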

    Speaker Info:

    David Niblick

    AI Evaluator

    Army Test and Evaluation Command

    MAJ David Niblick graduated from the United States Military Academy at West Point in 2010 with a BS in Electrical Engineering. He served in the Engineer Branch as a lieutenant and captain at Ft. Campbell, KY with the 101st Airborne Division (Air Assault) and at Schofield Barracks, HI with the 130th Engineer Brigade. He deployed twice to Afghanistan ('11-'12 and '13-'14) and to the Republic of Korea ('15-'16). After company command, he attended Purdue University and received an MS in Electrical and Computer Engineering with a thesis in computer vision and deep learning. He instructed in the Department of Electrical Engineering and Computer Science at USMA, after which he transferred from the Engineer Branch to Functional Area 49 (Operations Research and Systems Analysis). He currently serves as an Artificial Intelligence Evaluator with Army Test and Evaluation Command at Aberdeen Proving Ground, MD.

  • Dynamo: Adaptive T&E via Bayesian Decision Theory

    Abstract:

    The Dynamo paradigm for T&E compares a set of test options for a system by computing which of them provides the greatest expected operational benefit relative to the cost of testing. This paradigm will be described and demonstrated for simple, realistic cases. Dynamo stands for DYNAmic Knowledge + MOneyball. These two halves of Dynamo are its modeling framework and its chief evaluation criterion, respectively.

    A modeling framework for T&E is what allows test results (and domain knowledge) to be leveraged to predict operational system performance. Without a model, one can only predict, qualitatively, that operational performance will be similar to test performance in similar environments. For quantitative predictions one can formulate a model that inputs a representation of an operational environment and outputs the probabilities of the various possible outcomes of using the system there. Such models are typically parametric: they have a set of unknown parameters to be calibrated during test. The more knowledge one has about a suitable model’s parameters, the better predictions one can make about the modeled system’s operational performance. The Bayesian approach to T&E encodes this knowledge as a probability distribution over the model parameters. This knowledge is initialized with data from previous testing and with subject matter expertise, and it is “dynamic” because it is updated whenever new test results arrive.

    An evaluation criterion is a metric for the operational predictions provided by the modeling framework. One type of metric is about whether test results indicate a system meets requirements: this question can be addressed with increasing nuance as one employs more sophisticated modeling frameworks. Another type of metric is how well a test design will tighten knowledge about model parameters, regardless of what the test results themselves are. The Dynamo paradigm can leverage either, but it uses a “Moneyball” metric for recommending test decisions. A Moneyball metric quantifies the expected value of the knowledge one would gain from testing (whether from an entire test event, or from just a handful of trials) in terms of the operational value this knowledge would provide. It requires a Bayesian modeling framework so that incremental gains in knowledge can be represented and measured. A Moneyball metric quantifies stakeholder preferences in the same units as testing costs, which enables a principled cost/benefit analysis not only of which tests to perform, but of whether to conduct further testing at all.
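
    To make the Moneyball idea concrete, the sketch below computes a generic expected value of sample information in a Beta-Binomial model: the expected improvement in a fielding decision from running n more trials, which can then be weighed against the cost of those trials. This is only a textbook-style illustration with invented stakes, costs, and prior, not the Dynamo implementation.

```python
# Textbook-style expected value of sample information (EVSI) in a Beta-Binomial
# model, in the spirit of the Moneyball metric; stakes, costs, and the prior
# are invented, and this is not the Dynamo implementation.
import numpy as np
from scipy.stats import betabinom

a, b = 8.0, 4.0            # prior on the new system's success probability
p_baseline = 0.60          # known success rate of the incumbent option
value_per_mission = 1.0e6  # assumed operational value of a successful mission ($)
n_missions = 100           # missions covered by the fielding decision
cost_per_trial = 2.0e5     # assumed cost of one additional test trial ($)

def decision_value(p_new_mean):
    """Value of choosing the better option given the current belief."""
    return n_missions * value_per_mission * max(p_new_mean, p_baseline)

prior_value = decision_value(a / (a + b))

def expected_value_of_testing(n_trials):
    """Preposterior expected decision value after observing n_trials results."""
    k = np.arange(n_trials + 1)
    pred = betabinom.pmf(k, n_trials, a, b)           # predictive dist. of successes
    post_mean = (a + k) / (a + b + n_trials)          # posterior mean for each outcome
    return float(np.sum(pred * np.array([decision_value(m) for m in post_mean])))

for n in (5, 10, 20):
    gain = expected_value_of_testing(n) - prior_value
    print(f"n={n:2d} trials: expected value of knowledge ${gain:11,.0f}"
          f" vs test cost ${n * cost_per_trial:11,.0f}")
```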

    The essence of Dynamo is that it applies Bayesian Decision Theory to T&E to maintain and visualize the state of knowledge about a system under test at all times, and that it can make recommendations at any time about which test options to conduct to provide the greatest expected benefit to stakeholders relative to the cost of testing. This talk will discuss the progress to date developing Dynamo and some of the future work remaining to make it more easily adaptable to testing specific systems.

    Speaker Info:

    James Ferry

    Principal Research Scientist

    Metron, Inc.

    James Ferry’s research focuses on the problems of the Defense and Intelligence communities. His interests encompass the application of Bayesian methods to a variety of domains: Test & Evaluation, tracking and data association for kinematic and non-kinematic data, and the synthesis of classical detection and tracking theory with the modern theory of networks. Prior to Metron, he worked in computational fluid dynamics at UIUC, specializing in multiphase flow and thermal convection. Dr. Ferry holds a Ph.D. and M.S. in applied mathematics from Brown University and an S.B. in mathematics from MIT.

  • Enhancing Battlefield Intelligence with ADS-B Change Detection

    Abstract:

    The ability to detect changes in flight patterns using air traffic control (ATC) communication can better inform battlefield intelligence. ADS-B (Automatic Dependent Surveillance–Broadcast) technology can capture the movement of both military and civilian aircraft over conflict zones. Leveraging the broad coverage of ADS-B flight tracking and its widespread global availability, we focus on its application in understanding changes leading up to conflicts, with a specific case study on Ukraine.
    In this presentation, we analyze the days leading up to Russia’s February 24, 2022 invasion of Ukraine to understand how ADS-B technology can indicate changes in Russo-Ukrainian military movements. The proposed detection algorithm encourages the use of ADS-B technology in future intelligence efforts. The potential for fusion with GICB (Ground-initiated Comm-B) ATC communication and other modes of data is also explored.
    This is a submission for the Student Poster Competition
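
    The poster's detection algorithm is not specified in this abstract, so the sketch below shows only a generic approach to the same kind of problem: a one-sided CUSUM change detector applied to a synthetic daily flight-count series with an injected drop in activity.

```python
# Generic one-sided CUSUM change detector on a synthetic daily flight-count
# series with an injected drop in activity; the poster's actual algorithm is
# not specified in the abstract.
import numpy as np

rng = np.random.default_rng(7)
baseline, changed = 120, 80                       # mean daily tracked flights (invented)
counts = np.concatenate([rng.poisson(baseline, 60), rng.poisson(changed, 20)])

mu = counts[:30].mean()                           # reference level from early data
sigma = counts[:30].std(ddof=1)
k, h = 0.5 * sigma, 5.0 * sigma                   # CUSUM slack and decision threshold

s = 0.0
for day, x in enumerate(counts):
    s = max(0.0, s + (mu - x) - k)                # accumulate downward deviations
    if s > h:
        print(f"drop in flight activity flagged on day {day}")
        break
```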

    Speaker Info:

    Cooper Klein

    Cadet

    United States Military Academy, West Point

    Cooper Klein is an applied statistics and data science major from Seattle, Washington. A senior at West Point, he is researching fusion of air traffic control data to inform battlefield intelligence. Cooper will commission as a military intelligence officer in the United States Army this May. He represents West Point on the Triathlon Team where he pursues his passion of endurance sports.

  • Enhancing Multiple Regression-based Resilience Model Prediction with Transfer Function

    Abstract:

    Resilience engineering involves creating and maintaining systems capable of efficiently managing disruptive incidents. Past research in this field has employed various statistical techniques to track and forecast the system's recovery process within the resilience curve. However, many of these techniques fall short in terms of flexibility, struggling to accurately capture the details of shocks. Moreover, most of them are not able to predict long-term dependencies. To address these limitations, this paper introduces an advanced statistical method, the transfer function, which effectively tracks and predicts changes in system performance when subjected to multiple shocks and stresses of varying intensity and duration. This approach offers a structured methodology for planning resilience assessment tests tailored to specific shocks and stresses and guides the necessary data collection to ensure efficient test execution. Although resilience engineering is domain-specific, the transfer function is a versatile approach, making it suitable for various domains. To assess the effectiveness of the transfer function model, we conduct a comparative analysis with the interaction regression model, using historical data on job losses during the 1980 recessions in the United States. This comparison not only underscores the strengths of the transfer function in handling complex temporal data but also reaffirms its competitiveness compared to existing methods. Our numerical results using goodness of fit measures provide compelling evidence of the transfer function model's enhanced predictive power, offering an alternative for advancing resilience prediction in time series analysis.
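
    A minimal sketch of a transfer-function style fit is shown below, using statsmodels' SARIMAX with the shock series as an exogenous regressor. The shock profile and response dynamics are synthetic illustrations, not the recession data or the model fitted in the paper.

```python
# Minimal transfer-function style fit: statsmodels' SARIMAX with the shock
# series as an exogenous regressor.  The shock profile and response dynamics
# are synthetic illustrations, not the paper's recession data or fitted model.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
n = 200
shock = np.zeros(n)
shock[80:90] = 1.0                       # a ten-period disruptive shock
shock[150:155] = 0.6                     # a second, milder shock

# Synthetic "system performance": AR(1) recovery plus an immediate shock effect.
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] - 4.0 * shock[t] + rng.normal(scale=0.5)

# ARMAX form of a transfer-function model; richer lag structures can be added
# by including shifted copies of the shock as extra exogenous columns.
result = SARIMAX(y, exog=shock, order=(1, 0, 1)).fit(disp=False)
print(result.summary().tables[1])        # estimated AR, MA, and shock coefficients
```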

    Speaker Info:

    Fatemeh Salboukh

    PhD Student

    University of Massachusetts Dartmouth

    Fatemeh Salboukh is a PhD student in the Department of Engineering and Applied Science at the University of Massachusetts Dartmouth. She received her Master’s from the University of Allame Tabataba’i in Mathematical Statistics (September, 2020) and Bachelor’s degree from Yazd University (July, 2018) in Applied Statistics.

  • Evaluating Military Vehicle Object Detectors with Synthetic Data

    Abstract:

    The prevalence of unmanned aerial systems (UAS) and remote sensing technology on the modern battlefield has laid the foundation for an automated targeting system. However, no computer vision model has been trained to support such a system. Difficulties arise in creating these models due to a lack of available battlefield training data. This work aims to investigate the use of synthetic images generated in Unreal Engine as supplementary training data for a battlefield image classifier. We test state-of-the-art computer vision models to determine their performance on drone images of modern battlefields and the suitability of synthetic images as training data. Our results suggest that synthetic training images can improve the performance of state-of-the-art models in battlefield computer vision tasks.

    This is an abstract for a student poster.

    Speaker Info:

    Charles Wheaton

    Student

    United States Military Academy

    I am a fourth-year cadet at the United States Military Academy. After my graduation this spring, I will join the US Army Cyber branch as a capabilities developer. I am interested in data science and artificial intelligence, specifically graph neural networks.

  • Examining the Effects of Implementing Data-Driven Uncertainty in Cost Estimating Models

    Abstract:

    When conducting probabilistic cost analysis, correlation assumptions are key assumptions and often a driver for the total output or point estimate of a cost model. Although the National Aeronautics and Space Administration (NASA) has an entire community dedicated to the development of statistical cost estimating tools and techniques to manage program and project performance, the application of accurate and data-driven correlation coefficients within these models is often overlooked. Due to the uncertain nature of correlation between random variables, NASA has had difficulty quantifying the relationships between spacecraft subsystems with specific, data-driven correlation matrices. Previously, the NASA cost analysis community has addressed this challenge by either selecting a blanket correlation value to address uncertainty within the model or opting out of using any correlation value altogether.

    One hypothesized method of improving NASA cost estimates involves deriving subsystem correlation coefficients from the residuals of the regression equations for the cost estimating relationships (CERs) of various spacecraft subsystems and support functions. This study investigates the feasibility of this methodology using the CERs from NASA's Project Cost Estimating Capability (PCEC) model. The correlation coefficients for each subsystem of the NASA Work Breakdown Structure were determined by correlating the residuals of PCEC's subsystem CERs. These correlation coefficients were then compiled into a 20x20 correlation matrix and were implemented into PCEC as an uncertainty factor influencing the model's pre-existing cost distributions. Once this correlation matrix was implemented into the cost distributions of PCEC, the Latin Hypercube Sampling function of the Microsoft Excel add-in Argo was used to simulate PCEC results for 40 missions within the PCEC database. These steps were repeated three additional times using the following correlation matrices: (1) a correlation matrix assuming the correlation between each subsystem is zero, (2) a correlation matrix assuming the correlation between each subsystem is 1, and (3) a correlation matrix using a blanket value of 0.3.

    The results of these simulations showed that the correlation matrix derived from the residuals of the subsystem CERs significantly reduced bias and error within PCEC's estimating capability. The results also indicated that the probability density function and cumulative distribution function of each mission in the PCEC database were altered significantly by the correlation matrices that were implemented into the model. This research produced (1) a standard subsystem correlation matrix that has been proven to improve estimating accuracy within PCEC and (2) a replicable methodology for creating this correlation matrix that can be used in future cost estimating models. This information can help the NASA cost analysis community understand the effects of applying uncertainty within cost models and perform sensitivity analyses on project cost estimates. This is significant because NASA has been frequently critiqued for underestimating project costs and this methodology has shown promise in improving NASA's future cost estimates and painting a more realistic picture of the total possible range of spacecraft development costs.
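
    The generic mechanics of the two key steps can be sketched on synthetic data, as below. This is not the PCEC model or the Argo add-in; the residuals, point estimates, and dispersion are invented, and simple correlated lognormal multipliers stand in for the model's actual cost distributions.

```python
# Generic mechanics of the two key steps, on synthetic data (not PCEC or Argo):
# (1) estimate a subsystem correlation matrix from CER residuals, and
# (2) draw correlated cost-uncertainty multipliers for a simple cost roll-up.
import numpy as np

rng = np.random.default_rng(11)

# Step 1: residuals of three notional subsystem CERs across historical missions.
n_missions, n_subsystems = 40, 3
shared = rng.normal(size=(n_missions, 1))                 # common driver of error
residuals = 0.6 * shared + 0.8 * rng.normal(size=(n_missions, n_subsystems))
corr = np.corrcoef(residuals, rowvar=False)               # estimated correlation matrix
print("residual correlation matrix:\n", np.round(corr, 2))

# Step 2: propagate the correlation by sampling correlated standard normals
# (via the Cholesky factor) and mapping them to lognormal cost multipliers.
point_estimates = np.array([120.0, 80.0, 45.0])           # $M, illustrative
cv = 0.25                                                 # coefficient of variation
L = np.linalg.cholesky(corr)
z = rng.normal(size=(10_000, n_subsystems)) @ L.T
costs = point_estimates * np.exp(cv * z - 0.5 * cv ** 2)  # mean-one lognormal multipliers
total = costs.sum(axis=1)
print(f"total cost: mean {total.mean():.1f}, 70th percentile {np.percentile(total, 70):.1f}")
```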

    Speaker Info:

    Victoria Nilsen

    Operations Research Analyst

    NASA HQ

    Vicky Nilsen is an Operations Research Analyst within the Office of the Chief Financial Officer (OCFO) at NASA Headquarters in Washington, DC. In this role, Vicky serves as OCFO's point of contact for cost and schedule analysis, research, and model development. Vicky began her tenure as a civil servant at NASA in March 2022. Prior to this, she has been affiliated with NASA in various ways. In 2017, she was a systems engineer on the George Washington University CubeSat project, sponsored by NASA's CubeSat Launch Initiative. From 2018-2020, she performed academic research for the Mission Design Lab at NASA's Goddard Space Flight Center and Team X at the Jet Propulsion Lab. In 2019, she was an intern in OCFO's Portfolio Investment Analysis branch working on cost and schedule analysis for the Human Landing System of the Artemis mission. Most recently, she worked as a contractor supporting the development of NASA's 2022 Strategic Plan and Evidence Act. She is extremely passionate about the work that she has done and continues to do at NASA and aims to drive change and innovation within NASA's Project Planning & Control and Cost Estimating communities.

  • Experimental Design for Usability Testing of LLMs

    Abstract:

    Large language models (LLMs) are poised to dramatically impact the process of composing, analyzing, and editing documents, including within DoD and IC communities. However, there have been few studies that focus on understanding human interactions and perceptions of LLM outputs, and even fewer still when one considers only those relevant to a government context. Furthermore, there is a paucity of benchmark datasets and standardized data collection schemes necessary for assessing the usability of LLMs in complex tasks, such as summarization, across different organizations and mission use cases. Such usability studies require an understanding beyond the literal content of the document; the needs and interests of the reader must be considered, necessitating an intimate understanding of the operational context. Thus, adequately measuring the effectiveness and suitability of LLMs requires usability testing to be incorporated into the testing and evaluation process.

    However, measures of usability are stymied by three challenges. First, there is an unsatisfied need for mission-relevant data that can be used for assessment, a critical first step. Agencies must provide data for assessment of LLM usage, such as report summarization, to best evaluate the effectiveness of LLMs. Current widely available datasets for assessing LLMs consist primarily of ad hoc exams ranging from the LSAT to sommelier exams. High performance on these exams offers little insight into LLM performance on mission tasks, which possess a unique lexicon, set of high-stakes mission applications, and DoD and IC userbase. Notably, our prior work indicates that currently available curated datasets are unsuitable proxies for government reporting. Our search for proxy data for intelligence reports led us on a path to create our own dataset in order to evaluate LLMs within mission contexts.

    Second, a range of experimental design techniques exists for collecting human-centric measures of LLM usability, each with its own benefits and disadvantages. Navigating the tradeoffs between these different techniques is challenging, and the lack of standardization across different groups inhibits comparison between groups. A discussion is provided on the potential usage of commonly conducted usability studies, including heuristic evaluations, observational and user experience studies, and tool instrumentation focusing on LLMs used in summarization. We will describe the pros and cons of each study type, offering guidance on the approximate resources each requires in terms of time (planning, participant recruitment, study execution, and analysis), compute, and data. We will demonstrate how our data collection prototype for summarization tasks can be used to streamline the above.

    The final challenge involves associating human-centric measures, such as ratings of fluency, to other more quantitative and mission-level metrics. We will provide an overview of measures for summarization quality, including ratings for accuracy, concision, fluency, and completeness, and discuss current efforts and existing challenges in associating those measures to quantitative and qualitative metrics. We will also discuss the value of such efforts in building a more comprehensive assessment of LLMs, as well as the relevance of these efforts to document summarization.

    Speaker Info:

    Jasmine Ratchford

    Research Scientist

    CMU Software Engineering Institute

    Jasmine Ratchford is a Machine Learning Research Scientist at the CMU Software Engineering Institute (SEI). Dr. Ratchford received her Ph.D. in Physics from the University of Texas at Austin and has spent 15 years working across the federal government, supporting efforts at DARPA, DHS, and DOT&E. At SEI, Dr. Ratchford focuses on AI research & engineering practices in areas such as large language model development and scientific machine learning.

  • Failure Distributions for Parallel Dependent Identical Weibull Components

    Abstract:

    For a parallel system, when one component fails, the failure distribution of the remaining components will have an increased failure rate. This research takes a novel approach to finding the associated failure distribution of the full system using ordinal statistic distributions for correlated Weibull components, allowing for unknown correlations between the dependent components. A Taylor series approximation is presented for two components; system failure time distributions are also derived for two failures in a two-component system, two failures in an n-component system, three failures in a three-component system, and k failures in an n-component system. Additionally, a case study is presented on aircraft turnbuckles. Simulated data are used to illustrate how the derived formulas can be used to create a maintenance plan for the second turnbuckle in the two-component system.
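
    The setup can be checked by simulation, as in the sketch below: two identical Weibull components whose dependence is induced here by an assumed Gaussian copula, with the parallel system failing at the second failure. This is only a Monte Carlo illustration with invented parameters, not the paper's closed-form derivation.

```python
# Monte Carlo check of the setup (not the paper's closed-form derivation): two
# identical Weibull components in parallel with an assumed positive correlation
# induced by a Gaussian copula; the system fails at the second failure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
shape, scale = 2.0, 1000.0            # illustrative Weibull parameters (hours)
rho = 0.5                             # assumed correlation between components
n = 100_000

cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
u = stats.norm.cdf(z)                                  # correlated uniforms
t = scale * (-np.log(1.0 - u)) ** (1.0 / shape)        # inverse Weibull CDF

system_life = t.max(axis=1)           # parallel system fails at the 2nd failure
indep = scale * (-np.log(1.0 - rng.random((n, 2)))) ** (1.0 / shape)
print(f"mean system life, correlated components:  {system_life.mean():8.1f} h")
print(f"mean system life, independent components: {indep.max(axis=1).mean():8.1f} h")
```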

    Speaker Info:

    Gina Sigler

    STAT COE contractor

    HII/STAT COE

    Dr. Gina Sigler, contractor with Huntington Ingalls Industries (HII), is the DoD Program Lead at the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE). She has been working at the STAT COE since 2018, where she provides rigorous test designs and best practices to programs across the DOD. Before joining the STAT COE, she worked as a faculty associate in the Statistics Department at the University of Wisconsin-Madison for three years. She earned a B.S. degree in statistics from Michigan State University, an M.S. in statistics from the University of Wisconsin-Madison, and a PhD in Applied Mathematics-Statistics from the Air Force Institute of Technology.

  • From Text to Metadata: Automated Product Tagging with Python and NLP

    Abstract:

    As a research organization, the Institute for Defense Analyses (IDA) produces a variety of deliverables like reports, memoranda, slides, and other formats for our sponsors. Due to their length and volume, summarizing these products quickly for efficient retrieval of information on specific research topics poses a challenge. IDA has led numerous initiatives for historical tagging of documents, but this is a manual and time-consuming process that must be repeated periodically to tag newer products. To address this challenge, we have developed a Python-based automated product tagging pipeline using natural language processing (NLP) techniques.

    This pipeline utilizes NLP keyword extraction techniques to identify descriptive keywords within the content. Filtering these keywords with IDA's research taxonomy terms produces a set of product tags, serving as metadata. This process also enables standardized tagging of products, compared to the manual tagging process, which introduces variability in tagging quality across project leaders, authors, and divisions. Instead, the tags produced through this pipeline are consistent and descriptive of the contents. This product-tagging pipeline facilitates an automated and standardized process for streamlined topic summarization of IDA's research products, and has many applications for quantifying and analyzing IDA's research in terms of these product tags.
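
    The abstract does not name the specific keyword-extraction technique, so the sketch below is only a plausible stand-in for the pipeline's two core steps: score candidate n-grams with TF-IDF, then keep those that appear in a taxonomy (here invented).

```python
# A plausible stand-in for the pipeline's two core steps (the abstract does not
# name the extraction technique, and the taxonomy here is invented): score
# candidate n-grams with TF-IDF, then keep those found in the taxonomy.
from sklearn.feature_extraction.text import TfidfVectorizer

taxonomy = {"test and evaluation", "machine learning", "cybersecurity",
            "cost analysis", "reliability"}

documents = {
    "doc-001": "This memorandum summarizes machine learning methods for test "
               "and evaluation of autonomous systems.",
    "doc-002": "A cost analysis of sustainment options, with reliability "
               "projections for the candidate designs.",
}

# Unigrams through trigrams so multi-word taxonomy terms can match.
vectorizer = TfidfVectorizer(ngram_range=(1, 3))
tfidf = vectorizer.fit_transform(documents.values())
terms = vectorizer.get_feature_names_out()

for doc_id, row in zip(documents, tfidf.toarray()):
    # Rank candidate keywords by TF-IDF score, then filter against the taxonomy.
    candidates = [terms[i] for i in row.argsort()[::-1] if row[i] > 0]
    tags = [term for term in candidates if term in taxonomy]
    print(doc_id, "->", tags)
```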

    Speaker Info:

    Aayushi Verma

    Data Science Fellow II

    IDA

    Aayushi Verma is a Data Science Fellow at the Institute for Defense Analyses (IDA), where she collaborates with the Chief Data Officer to drive IDA's Data Strategy. She has developed numerous data pipelines and visualization dashboards to bring data-driven insights to staff. Her data science interests include machine learning/deep learning, image processing, and extracting stories from data. Aayushi holds an M.S. in Data Science from Pace University, and a B.Sc. (Hons.) in Astrophysics from the University of Canterbury.

  • Functional Data Analysis of Radar Tracking Data

    Abstract:

    Functional data are an ordered series of data collected over a continuous scale, such as time or distance. The data are collected in ordered x,y pairs and can be viewed as a smoothed line with an underlying function. The Army Evaluation Center (AEC) has identified multiple instances where functional data analysis could have been applied, but instead evaluators used more traditional and/or less statistically rigorous methods to evaluate the data. One of these instances is radar tracking data.

    This poster highlights historical shortcomings in how AEC currently analyzes functional data, such as radar tracking data, and our vision for future applications. Using notional data from a real radar example, the response of 3D track error is plotted against distance, where each function represents a unique run, with additional factors held constant throughout a given run. The example includes the selected model, functional principal components, the resulting significant factors, and summary graphics used for reporting. Additionally, the poster will highlight historical analysis methods and the improvements the functional data analysis method brings. The analysis and output from this poster will utilize JMP's functional data analysis platform.
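
    The poster itself uses JMP's functional data analysis platform; purely to illustrate the underlying idea, the NumPy sketch below builds synthetic "track error versus distance" curves, centers them on a common grid, and extracts functional principal components with an SVD. All curve shapes and noise levels are invented.

```python
# The poster uses JMP's functional data analysis platform; purely to illustrate
# the idea, this NumPy sketch builds synthetic "track error vs. distance"
# curves on a common grid and extracts functional principal components via SVD.
import numpy as np

rng = np.random.default_rng(2)
distance = np.linspace(0, 50, 200)                  # km, common grid (assumed)
n_runs = 30

trend = 0.02 * distance                             # shared error growth with range
curves = np.array([
    trend
    + rng.normal(0, 0.3) * np.sin(distance / 8.0)                  # run-to-run oscillation
    + rng.normal(0, 0.2) * np.exp(-((distance - 30) / 5) ** 2)     # localized error bump
    + rng.normal(0, 0.05, size=distance.size)                      # measurement noise
    for _ in range(n_runs)
])

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
# Rows of Vt are the functional principal components; the scores say how
# strongly each run expresses each component.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
scores = centered @ Vt[:2].T
print("variance explained by first two FPCs:", np.round(explained[:2], 3))
print("first run's FPC scores:", np.round(scores[0], 3))
```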

    Speaker Info:

    Shane Hall

    Division Chief - Analytics and Artificial Intelligence

    Army Evaluation Center

    Shane Hall is the Division Chief of the Analytics and Artificial Intelligence Division within the Army Evaluation Center.  Shane graduated from Penn State University in 2011 with a Bachelor's degree in Statistics and a Masters of Applied Statistics.  Shane began his Army civilian career immediately after college, working for the Army Public Health Command as their statistician.  In 2016, he moved to the Army Evaluation Center as a statistician.  In 2022, he became the Division Chief of the Analytics team and newly formed Artificial Intelligence Team.

  • Generative AI - Large Language Model Introduction

    Abstract:

    Generative artificial intelligence (AI) is a rapidly advancing field and transformative technology that involves the creation of new content. Generative AI encompasses AI models that produce novel data, information, or documents in response to prompts. This technology has gained significant attention due to the emergence of models like DALL-E, Imagen, and ChatGPT. Generative AI excels in generating content across various domains. The versatility of Generative AI extends to generating text, software code, images, videos, and music by statistically analyzing patterns in training data.
    One of the most prominent applications of Generative AI is ChatGPT, developed by OpenAI. ChatGPT is a sophisticated language model trained on vast amounts of text data from diverse sources. It can engage in conversations, answer questions, write essays, generate code snippets, and more. Generative AI's strengths lie in its ability to produce diverse and seemingly original outputs quickly.

    Large Language Models (LLMs) are advanced deep learning algorithms that can understand, summarize, translate, predict, and generate content using extensive datasets. These models work by being trained on massive amounts of data, deriving relationships between words and concepts, and then using transformer neural network processes to understand and generate responses.

    LLMs are widely used for tasks like text generation, translation, content summarization, rewriting content, classification, and categorization. They are trained on huge datasets to understand language better and provide accurate responses when given prompts or queries. The key algorithms used in LLMs include:
    • Word Embedding: This algorithm represents the meaning of words in a numerical format, enabling the AI model to process and analyze text data efficiently.
    • Attention Mechanisms: These algorithms allow the AI to focus on specific parts of input text, such as sentiment-related words, when generating an output, leading to more accurate responses.
    • Transformers: Transformers are a type of neural network architecture designed to solve sequence-to-sequence tasks efficiently by using self-attention mechanisms. They excel at handling long-range dependencies in data sequences. They learn context and meaning by tracking relationships between elements in a sequence.
    This presentation will focus on the basics of large language models, their algorithms, and applications to nuclear decommissioning knowledge management.
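
    To make the attention mechanism described above concrete, the NumPy sketch below implements single-head scaled dot-product self-attention with random, untrained weights; it is a mechanics-only illustration, not a trained model.

```python
# Mechanics-only sketch of single-head scaled dot-product self-attention with
# random (untrained) weights: each token forms queries, keys, and values and
# attends to every token in the sequence.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16                         # 5 tokens, 16-dim embeddings

X = rng.normal(size=(seq_len, d_model))          # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
              for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d_model)              # similarity of every token pair
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights
output = weights @ V                             # each token mixes the others' values

print("attention weights for token 0:", np.round(weights[0], 3))
```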

    Speaker Info:

    Himanshu Upadhyay

    Associate Professor -ECE

    Florida International University

    Dr. Himanshu Upadhyay has served Florida International University for the past 23 years, leading the AI & Cyber Center of Excellence at the Applied Research Center. He is an Associate Professor in Electrical & Computer Engineering, teaching Artificial Intelligence and Cybersecurity courses. His research focuses on Artificial Intelligence, Machine Learning, Deep Learning, Generative AI, Cyber Security, Big Data, Cyber Analytics/Forensics, Malware Analysis, and Blockchain. He has published multiple papers in reputed journals and conferences and mentors AI & Cyber Fellows and undergraduate and graduate students supporting multiple AI & Cybersecurity research projects from various federal agencies.

  • Global Sensitivity Analyses for Test Planning under Constraints with Black-box Models

    Abstract:

    This work describes sensitivity analyses performed on complex black-box models used to support experimental test planning under limited resources in the context of the Mars Sample Return program, which aims to bring rock and atmospheric samples from Mars to Earth. We develop a systematic workflow that allows analysts to simultaneously obtain quantitative insights on the key drivers of uncertainty, the direction of their impact, and the presence of interactions. We apply novel optimal transport-based global sensitivity measures to tackle the multivariate nature of the output. On the modeling side, we apply multi-fidelity techniques that leverage low-fidelity models to speed up the calculations and make up for the limited number of high-fidelity samples, while keeping the latter in the loop for accuracy guarantees. The sensitivity analysis reveals insights that help analysts understand the model's behavior and identify the factors to focus on during testing in order to maximize the value of the information extracted from them and ensure mission success when limited resources are available.
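
    The paper's optimal transport-based measures are beyond a short snippet, so the sketch below illustrates global sensitivity analysis with a simpler, standard alternative: first-order Sobol indices for a toy black-box model, estimated with a pick-freeze (Saltelli-style) Monte Carlo scheme.

```python
# Simpler, standard stand-in for global sensitivity analysis (not the paper's
# optimal transport-based measures): first-order Sobol indices for a toy
# black-box model via a pick-freeze (Saltelli-style) Monte Carlo estimator.
import numpy as np

rng = np.random.default_rng(9)

def black_box(x):
    """Toy model with one dominant input and an interaction term."""
    return np.sin(x[:, 0]) + 0.3 * x[:, 1] ** 2 + 0.5 * x[:, 0] * x[:, 2]

d, n = 3, 20_000
A = rng.uniform(-np.pi, np.pi, size=(n, d))
B = rng.uniform(-np.pi, np.pi, size=(n, d))
fA, fB = black_box(A), black_box(B)
var_y = np.var(np.concatenate([fA, fB]))

for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                      # vary only input i between the two runs
    S_i = np.mean(fB * (black_box(ABi) - fA)) / var_y   # Saltelli (2010) estimator
    print(f"first-order Sobol index for x{i + 1}: {S_i:.3f}")
```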

    Speaker Info:

    Giuseppe Cataldo

    Planetary Protection Lead

    NASA

    Giuseppe Cataldo leads the NASA planetary protection efforts for the last mission of the Mars Sample Return program. In this role, he oversees the efforts aimed at safely returning rock and atmosphere samples from Mars without contaminating the Earth's biosphere with potentially hazardous biological particles. Previously, he was the chief engineer of NASA's EXCLAIM mission and the near-infrared camera NASA contributed to the PRIME telescope. He worked on the James Webb Space Telescope from 2014 to 2020 and on a variety of other NASA missions and technology development projects. His expertise is in the design, testing and management of space systems, gained over 10+ years at NASA and by earning his doctorate at the Massachusetts Institute of Technology. Giuseppe is the recipient of numerous awards including NASA's Early Career Public Achievement Medal and Mentoring Award. He speaks six languages, plays the violin, loves swimming and skiing as well as helping the homeless with his friends and wife.

  • Hypersonic Glide Vehicle Trajectories: A conversation about synthetic data in T&E

    Abstract:

    The topic of synthetic data in test and evaluation is steeped in controversy – and rightfully so. Generative AI techniques can be erratic, producing non-credible results that should give evaluators pause. At the same time, there are mission domains that are difficult to test, and these rely on modeling and simulation to generate insights for evaluation. High fidelity modeling and simulation can be slow, computationally intensive, and burdened by large volumes of data – challenges which become prohibitive as test complexity grows.

    To mitigate these challenges, we posit a defensible, physically valid generative AI approach to creating fast-running synthetic data for M&S studies of hard-to-test scenarios. We create an exemplar Generative AI model of high-fidelity Hypersonic Glide Vehicle trajectories, characterized as a “Narrow Digital Twin.” The model produces a set of trajectories that meets user-specified criteria (particularly as directed by a Design of Experiments) and that can be validated against the equations of motion that govern these trajectories. This presentation will identify the characteristics of the model that make it suitable for generating synthetic data and propose easy-to-measure acceptability criteria. We hope to advance a conversation about appropriate and rigorous uses of synthetic data within T&E.

    Speaker Info:

    Karen O'Brien

    Senior Principal Data Scientist

    Modern Technology Solutions, Inc

    Karen O’Brien is a senior principal data scientist and AI/ML practice lead at Modern Technology Solutions, Inc.  In this capacity, she leverages her 20-year Army civilian career as a scientist, evaluator, ORSA, and analytics leader to aid DoD agencies in implementing AI/ML and advanced analytics solutions.  Her analytics career ranged ‘from ballistics to logistics’ and most of her career was in Army Test and Evaluation Command or supporting Army T&E from the Army Research Laboratory.  She was a physics and chemistry nerd in her early career, but now uses her M.S. in Predictive Analytics from Northwestern University to help her DoD clients tackle the toughest analytics challenges in support of the nation’s Warfighters.

  • Improving Data Visualizations Through R Shiny

    Abstract:

    Poster Abstract

    Shiny is an R package and framework for building applications that run in a web browser without requiring any web development experience. This capability makes the power and functionality of R accessible to audiences that would typically not utilize R because of the roadblocks of learning a programming language.

    The Army Evaluation Center (AEC) has rapidly increased its utilization of R Shiny to develop apps for use in Test and Evaluation. These apps have allowed evaluators to upload data, perform calculations, and then create data visualizations that can be customized for their reporting needs. The apps have streamlined and standardized the analysis process.

    The poster outlines the before and after of three data visualizations built with the Shiny apps. The first displays survey data using the likert package. The second graphs a cyberattack cycle timeline. The third displays individual qualification course results and calculates qualification scores. The apps are hosted in a cloud environment, and their usage is tracked with an additional Shiny app.

    Speaker Info:

    Allison Holston

    Mathematical Statistician

    Army Evaluation Center

    Allison Holston is the Lead Statistician for the Army Evaluation Center in Aberdeen Proving Ground, Maryland. She has an M.S. in Statistics from the University of Georgia and a B.S. in Mathematics & Statistics from Virginia Tech.

  • Introduction to Uncertainty Quantification

    Abstract:

    Uncertainty quantification (UQ) sits at the confluence of data, computers, basic science, and operation. It has emerged with the need to inform risk assessment with rapidly evolving science and to bring the full power of sensing and computing to bear on its management.
    With this role, UQ must provide analytical insight into several disparate disciplines, a task that may seem daunting and highly technical. But not necessarily so.
    In this mini-tutorial, I will present foundational concepts of UQ, showing how it is the simplicity of the underlying ideas that allows them to straddle multiple disciplines. I will also describe how operational imperatives have helped shape the evolution of UQ and discuss how current research at the forefront of UQ can in turn affect these operations.

    Speaker Info:

    Roger Ghanem

    Professor

    University of Southern California

    Roger Ghanem is Professor of Civil and Environmental Engineering at the U of Southern California where he also holds the Tryon Chair in Stochastic Methods and Simulation. Ghanem's research is in the general areas of uncertainty quantification and computational science with a focus on coupled phenomena. He received his PhD from Rice University and had served on the faculty of SUNY-Buffalo and Johns Hopkins University before joining USC in 2005.

  • Leading Change: Applying Human Centered Design Facilitation Techniques

    Abstract:

    First introduced in 1987, modern design thinking was popularized by the Stanford Design School and the global design and innovation company, IDEO. Design thinking is now recognized as a “way of thinking which leads to transformation, evolution and innovation” and has been so widely accepted across industry and within the DoD, that universities offer graduate degrees in the discipline. Building on this design thinking foundation, the human centered design facilitation technique of the Decision Science Division (DSD) of the Virginia Tech Applied Research Corporation (VT-ARC) integrates related methodologies, including liberating structures and open thinking. Liberating structures are “simple and concrete tools that can enhance group performance in diverse organizational settings.” Open thinking, popularized by Dan Pontefract, provides a comprehensive approach to decision-making that incorporates critical and creative thinking techniques. The combination of these methodologies enables tailored problem framing, innovative solution discovery, and creative adaptability to harness collaborative analytic potential, overcome the limitations of cognitive biases, and lead change. DSD VT-ARC applies this approach to complex and wicked challenges to deliver solutions that address implementation challenges and diverse stakeholder requirements. Operating under the guiding principle that collaboration is key to success, DSD regularly partners with other research organizations, such as Virginia Tech National Security Institute (VT NSI), in human centered design activities to help further the understanding, use, and benefits of the approach. This experiential session will provide attendees with some basic human centered design facilitation tools and an understanding of how these techniques might be applied across a multitude of technical and non-technical projects.

    Speaker Info:

    Christina Houfek

    Christina Houfek joined Virginia Tech’s Applied Research Corporation in 2022 following her time as an Irregular Warfare Analyst at the Naval Special Warfare Command and as Senior Professional Staff at the Johns Hopkins University Applied Physics Laboratory. She holds a B.S. in behavioral sciences, an M.A. in Leadership in Education from Notre Dame of Maryland University and a Graduate Certificate in Terrorism Analysis awarded by the University of Maryland.

     

  • Leading Change: Applying Human Centered Design Facilitation Techniques

    Abstract:

    First introduced in 1987, modern design thinking was popularized by the Stanford Design School and the global design and innovation company, IDEO. Design thinking is now recognized as a “way of thinking which leads to transformation, evolution and innovation” and has been so widely accepted across industry and within the DoD, that universities offer graduate degrees in the discipline. Building on this design thinking foundation, the human centered design facilitation technique of the Decision Science Division (DSD) of the Virginia Tech Applied Research Corporation (VT-ARC) integrates related methodologies, including liberating structures and open thinking. Liberating structures are “simple and concrete tools that can enhance group performance in diverse organizational settings.” Open thinking, popularized by Dan Pontefract, provides a comprehensive approach to decision-making that incorporates critical and creative thinking techniques. The combination of these methodologies enables tailored problem framing, innovative solution discovery, and creative adaptability to harness collaborative analytic potential, overcome the limitations of cognitive biases, and lead change. DSD VT-ARC applies this approach to complex and wicked challenges to deliver solutions that address implementation challenges and diverse stakeholder requirements. Operating under the guiding principle that collaboration is key to success, DSD regularly partners with other research organizations, such as Virginia Tech National Security Institute (VT NSI), in human centered design activities to help further the understanding, use, and benefits of the approach. This experiential session will provide attendees with some basic human centered design facilitation tools and an understanding of how these techniques might be applied across a multitude of technical and non-technical projects.

    Speaker Info:

    Kelli Esser

  • Lessons Learned for Study of Uncertainty Quantification in Cyber-Physical System Emulation

    Abstract:

    Over the past decade, the number and severity of cyber-attacks to critical infrastructure has continued to increase, necessitating a deeper understanding of these systems and potential threats. Recent advancements for high-fidelity system modeling, also called emulation, have enabled quantitative cyber experimentation to support analyses of system design, planning decisions, and threat characterization. However, much remains to be done to establish scientific methodologies for performing these cyber analyses more rigorously.
    Without a rigorous approach to cyber experimentation, it is difficult for analysts to fully characterize their confidence in the results of an experiment, degrading the ability to make decisions based upon analysis results, and often defeating the purpose of performing the analysis. This issue is particularly salient when analyzing critical infrastructures or similarly impactful systems, where confident, well-informed decision making is imperative. Thus, the integration of tools for rigorous scientific analysis with platforms for emulation-driven experimentation is crucial.
    This work discusses one such effort to integrate the tools necessary to perform uncertainty quantification (UQ) on an emulated model, motivated by a study on a notional critical infrastructure use case. The goal of the study was to determine how variations in the aggressiveness of the given threat affected how resilient the system was to the attacker. Resilience was measured using a series of metrics which were designed to capture the system’s ability to perform its mission in the presence of the attack. One reason for the selection of this use case was that the threat and system models were believed to be fairly deterministic and well-understood. The expectation was that results would show a linear correlation between the aggressiveness of the attacker and the resilience of the system. Surprisingly, this hypothesis was not supported by the data.
    The initial results showed no correlation, and they were deemed inconclusive. These findings spurred a series of mini analyses, leading to extensive evaluation of the data, methodology, and model to identify the cause of these results. Significant quantities of data collected as part of the initial UQ study enabled closer inspection of data sources and metrics calculation. In addition, tools developed during this work facilitated supplemental statistical analyses, including a noise study. These studies all supported the conclusion that the system model and threat model chosen were far less deterministic than initially assumed, highlighting key lessons learned for approaching similar analyses in the future.
    Although this work is discussed in the context of a specific use case, the authors believe that the lessons learned are generally applicable to similar studies applying statistical testing to complex, high-fidelity system models. Insights include the importance of deeply understanding potential sources of stochasticity in a model, planning how to handle or otherwise account for such stochasticity, and performing multiple experiments and looking at multiple metrics to gain a more holistic understanding of a modeled scenario. These results highlight the criticality of approaching system experimentation with a rigorous scientific mindset.

    Speaker Info:

    Jamie Thorpe

    Cybersecurity R&D

    Sandia National Laboratories

    Jamie Thorpe is a cybersecurity researcher at Sandia National Laboratories in Albuquerque, NM, where she develops the tools and methodologies needed to help build and analyze models of critical infrastructure systems. Her research interests include cyber resilience metrics, efficient system model development, data analysis for emulated environments, and rigorous cyber experimentation.

  • Leveraging Bayesian Methods to support Integrated Testing

    Abstract:

    This mini-tutorial will outline approaches to apply Bayesian methods to the test and evaluation process, from development of tests to interpretation of test results to translating that understanding into decision-making. We will begin by outlining the basic concepts that underlie the Bayesian approach to statistics and the potential benefits of applying that approach to test and evaluation. We will then walk through application to an example (notional) program, setting up data models and priors on the associated parameters, and interpreting the results. From there, techniques for integrating results from multiple stages of tests will be discussed, building understanding of system behavior as evidence accumulates. Finally, we will conclude by describing how Bayesian thinking can be used to translate information from test outcomes into requirements and decision-making. The mini-tutorial will assume some background in statistics but the audience need not have prior exposure to Bayesian methods.
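
    As a minimal illustration of the workflow described above (not the tutorial's actual materials), the sketch below uses a conjugate Beta-Binomial model to combine hit/miss results from two notional test phases; the prior parameters, phase names, and counts are all hypothetical.

    # Minimal Beta-Binomial sketch: combining notional test phases under a Bayesian model.
    # All numbers (prior parameters, successes, trials) are hypothetical.
    from scipy import stats

    # Weakly informative prior on the system's success probability.
    alpha, beta_ = 1.0, 1.0

    # Notional integrated-test evidence: (successes, trials) per phase.
    phases = {"contractor_test": (18, 20), "developmental_test": (25, 30)}

    for name, (successes, trials) in phases.items():
        # Conjugate update: the posterior after each phase becomes the next prior.
        alpha += successes
        beta_ += trials - successes
        posterior = stats.beta(alpha, beta_)
        lo, hi = posterior.ppf([0.05, 0.95])
        print(f"after {name}: mean={posterior.mean():.3f}, 90% credible interval=({lo:.3f}, {hi:.3f})")

    # Decision-relevant summary: probability the success rate meets a notional 0.80 requirement.
    print("P(success rate >= 0.80) =", round(1 - stats.beta(alpha, beta_).cdf(0.80), 3))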

    Speaker Info:

    Justin Krometis

    Justin Krometis is a Research Assistant Professor in the Intelligent Systems Division of the Virginia Tech National Security Institute and an Affiliate Research Assistant Professor in the Virginia Tech Department of Mathematics. His research is in the development of theoretical and computational frameworks for Bayesian inference, particularly in high-dimensional regimes, and in the application of those methods to domain sciences ranging from fluids to geophysics to testing and evaluation. His areas of interest include statistical inverse problems, parameter estimation, machine learning, data science, and experimental design. Dr. Krometis holds a Ph.D. in mathematics, a M.S. in mathematics, a B.S. in mathematics, and a B.S. in physics, all from Virginia Tech.

  • Leveraging Bayesian Methods to support Integrated Testing

    Abstract:

    This mini-tutorial will outline approaches to apply Bayesian methods to the test and evaluation process, from development of tests to interpretation of test results to translating that understanding into decision-making. We will begin by outlining the basic concepts that underlie the Bayesian approach to statistics and the potential benefits of applying that approach to test and evaluation. We will then walk through application to an example (notional) program, setting up data models and priors on the associated parameters, and interpreting the results. From there, techniques for integrating results from multiple stages of tests will be discussed, building understanding of system behavior as evidence accumulates. Finally, we will conclude by describing how Bayesian thinking can be used to translate information from test outcomes into requirements and decision-making. The mini-tutorial will assume some background in statistics but the audience need not have prior exposure to Bayesian methods.

    Speaker Info:

    Adam Ahmed

    Adam S. Ahmed is a Research Scientist at Metron, Inc. and is the technical lead for the Metron DOT&E effort. His research interests include applying novel Bayesian approaches to testing and evaluation, machine learning methods for small datasets as applied to undersea mine classification, and time series classification for continuous active sonar systems. Prior to Metron, he worked on the synthesis and measurement of skyrmion-hosting materials for next generation magnetic memory storage devices at The Ohio State University. Dr. Ahmed holds a Ph.D. and M.S. in physics from The Ohio State University, and a B.S. in physics from University of Illinois Urbana-Champaign.

  • Live demo: Monte Carlo power evaluation with “skpr” and “skprJMP”

    Abstract:

    Speaker Info:

    Tyler Morgan-Wall

    RSM

    IDA

  • Maritime Automatic Target Recognition

    Abstract:

    The goal of this project was to develop an algorithm that automatically detects boats in a maritime environment and predicts their future positions. We integrated the algorithm into an intelligent combat system developed by the Naval Surface Warfare Center Dahlgren Division (NSWCDD). The algorithm used YOLOv8 for computer vision detection and a linear Kalman filter for prediction. The training data underwent extensive augmentation and integration of third-party data. The algorithm was tested at a Live, Virtual, and Constructive (LVC) event held at NSWCDD in October 2023.

    The initial models suffered from overfitting. However, through data augmentation, incorporation of third-party data, and layer-freezing techniques, we were able to develop a more robust model. Various datasets were processed to improve data robustness. Additional labeling of the data provided ground truth for evaluating the Kalman filter, which was chosen for its versatility and predictive tracking capabilities. Qualitative and quantitative analyses were performed for both the YOLO and Kalman filter models.

    Much of the project's contribution lay in its ability to adapt to a variety of data. YOLO displayed effectiveness across various maritime scenarios, and the Kalman Filter excelled in predicting boat movements across difficult situations such as abrupt camera movements.

    In preparation for the live fire test event, our algorithm was integrated into the NSWCDD system and code was written to produce the expected output files.

    In summary, this project successfully developed an algorithm for detecting boats and predicting their motion in a maritime environment, demonstrating the potential at the intersection of machine learning, rapid technology integration, and maritime security.
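
    For readers unfamiliar with the prediction component, the sketch below shows a generic constant-velocity Kalman filter applied to 2-D detection centroids of the kind a YOLO detector produces. It is not the team's implementation; the frame rate, noise settings, and sample measurements are assumptions.

    # Generic constant-velocity Kalman filter over 2-D detection centroids (hypothetical values).
    import numpy as np

    dt = 1.0 / 30.0                      # assumed frame interval (30 fps)
    F = np.array([[1, 0, dt, 0],         # state transition for state [x, y, vx, vy]
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],          # we only observe position (e.g., a YOLO box centroid)
                  [0, 1, 0, 0]], dtype=float)
    Q = np.eye(4) * 1e-2                 # process noise (tuning assumption)
    R = np.eye(2) * 4.0                  # measurement noise in pixels^2 (tuning assumption)

    x = np.zeros(4)                      # initial state
    P = np.eye(4) * 100.0                # initial uncertainty

    def kf_step(x, P, z):
        """One predict/update cycle given measurement z = [px, py]."""
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        return x, P

    # Feed a few notional detections and predict the next position.
    for z in [np.array([100.0, 200.0]), np.array([103.0, 198.0]), np.array([106.5, 196.0])]:
        x, P = kf_step(x, P, z)
    print("predicted next position:", (F @ x)[:2])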

    Speaker Info:

    Mason Zoellner

    Undergraduate Research Assistant

    Hume Center for National Security and Technology

    I am currently serving as an Undergraduate Research Assistant at the Hume Center for National Security and Technology at Virginia Tech.  As a computer science major, I excel in analytical skills, software development, and critical thinking. My current research involves contributing to a Digital Transformation Artificial Intelligence/Machine Learning project, where I apply machine learning and computer vision to automatic target recognition.

  • Meta-analysis of the SALIANT procedure for assessing team situation awareness

    Abstract:

    Many Department of Defense (DoD) systems aim to increase or maintain Situational Awareness (SA) at the individual or group level. In some cases, maintenance or enhancement of SA is listed as a primary function or requirement of the system. However, during test and evaluation SA is examined inconsistently or is not measured at all. Situational Awareness Linked Indicators Adapted to Novel Tasks (SALIANT) is an empirically-based methodology meant to measure SA at the team, or group, level. While research using the SALIANT model suggests that it effectively quantifies team SA, no study has examined the effectiveness of SALIANT across the entirety of the existing empirical research. The aim of the current work is to conduct a meta-analysis of previous research to examine the overall reliability of SALIANT as an SA measurement tool. This meta-analysis will assess when and how SALIANT can serve as a reliable indicator of performance at testing. Additional applications of SALIANT in non-traditional operational testing domains will also be discussed.

    Speaker Info:

    Sarah Shaffer

    Research Staff Member

    IDA

    Dr. Sarah Shaffer received her Ph.D. in Experimental Psychology from Florida International University. Her research focuses on the application of cognitive science principles in decision-making and problem-solving, strategy, and attention and memory. Her research spans a variety of topics including information management in decision-making and intelligence collection, deception detection, and elicitation. Upon completing her doctorate, she pursued a fellowship working with federal law enforcement to conduct research in areas including stochastic terrorism, deception, and cybercrime.

    Dr. Shaffer is currently a Research Staff Member at the Institute for Defense Analyses (IDA), where she focuses on operational test and evaluation in the areas of Human Systems Integration, Human-Computer Interaction, and test design.

  • Mission Engineering

    Abstract:

    The US Department of Defense (DoD) has expanded its emphasis on the application of systems engineering approaches to ‘missions’. As originally defined in the Defense Acquisition Guidebook, Mission Engineering (ME) is “the deliberate planning, analyzing, organizing, and integrating of current and emerging operational and system capabilities to achieve desired operational mission effects”. Based on experience to date, the new definition reflects ME as “an interdisciplinary approach and process encompassing the entire technical effort to analyze, design, and integrate current and emerging operational needs and capabilities to achieve desired mission outcomes”. This presentation describes the current mission engineering methodology, how it is currently being applied, and the role of T&E in the ME process. Mission engineering is applying systems engineering to missions – that is, engineering a system of systems (including organizations, people, and technical systems) to provide the desired impact on mission or capability outcomes. Traditionally, systems of systems engineering focused on designing systems or systems of systems to achieve specified technical performance. Mission engineering goes one step further to assess whether the system of systems, when deployed in a realistic user environment, achieves the user mission or capability objectives. Mission engineering applies digital model-based engineering approaches to describe the sets of activities needed to execute the mission in the form of ‘mission threads’ (or activity models) and then adds information on the players and systems used to implement these activities in the form of ‘mission engineering threads.’ These digital ‘mission models’ are then implemented in operational simulations to assess how well they achieve user capability objectives. Gaps are identified and the models are updated to reflect proposed changes, including reorientation of systems and insertion of new candidate solutions, which are then assessed relative to changes in overall mission effectiveness.

    Speaker Info:

    Gabriela Parasidis

    Lead Systems Engineer

    MITRE

    Gabriela Parasidis is a Lead Systems Engineer in the MITRE Systems Engineering Innovation Center. She applies Digital Engineering to Mission Engineering and Systems of Systems (SoS) Engineering to support Department of Defense acquisition decisions. She has led research in hypersonics, including analyses related to flight dynamics, aerodynamics, aerothermodynamics, and structural loading. She holds a B.S. in Mechanical Engineering from Cornell University and a M.S. in Systems Engineering from Johns Hopkins University.

  • MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

    Abstract:

    Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we built MLTE (Machine Learning Test and Evaluation, colloquially referred to as “melt”), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling, a Python package, supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.

    In this presentation, we will discuss current MLTE details as well as future plans to support developmental testing (DT) and operational testing (OT) organizations and teams. A problem in the Department of Defense (DoD) is that test and evaluation (T&E) organizations are segregated: OT organizations work independently from DT organizations, which leads to inefficiencies. Model developers doing contractor testing (CT) may not have access to mission and system requirements and therefore fail to adequately address the real-world operational environment. Motivation to solve these two problems has generated a push for Integrated T&E — or T&E as a Continuum — in which testing is iteratively updated and refined based on previous test outcomes, and is informed by mission and system requirements. MLTE helps teams to better negotiate, evaluate, and document ML model and system qualities, and will aid in the facilitation of this iterative testing approach. As MLTE matures, it can be extended to further support Integrated T&E by (1) providing test data and artifacts that OT can use as evidence to make risk-based assessments regarding the appropriate level of OT and (2) ensuring that CT and DT testing of ML models accurately reflects the challenges and constraints of real-world operational environments.

    Speaker Info:

    Kate Maffey

    CPT Kate Maffey is a Data Scientist at the U.S. Army's Artificial Intelligence Integration Center (AI2C), where she specializes in applied machine learning evaluation research.

  • MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

    Abstract:

    Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we built MLTE (Machine Learning Test and Evaluation, colloquially referred to as “melt”), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling, a Python package, supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.

    In this presentation, we will discuss current MLTE details as well as future plans to support developmental testing (DT) and operational testing (OT) organizations and teams. A problem in the Department of Defense (DoD) is that test and evaluation (T&E) organizations are segregated: OT organizations work independently from DT organizations, which leads to inefficiencies. Model developers doing contractor testing (CT) may not have access to mission and system requirements and therefore fail to adequately address the real-world operational environment. Motivation to solve these two problems has generated a push for Integrated T&E — or T&E as a Continuum — in which testing is iteratively updated and refined based on previous test outcomes, and is informed by mission and system requirements. MLTE helps teams to better negotiate, evaluate, and document ML model and system qualities, and will aid in the facilitation of this iterative testing approach. As MLTE matures, it can be extended to further support Integrated T&E by (1) providing test data and artifacts that OT can use as evidence to make risk-based assessments regarding the appropriate level of OT and (2) ensuring that CT and DT testing of ML models accurately reflects the challenges and constraints of real-world operational environments.

    Speaker Info:

    Robert Edman

  • Moving Target Defense for Space Systems

    Abstract:

    Space systems provide many critical functions to the military, federal agencies, and infrastructure networks. In particular, MIL-STD-1553 serves as a common command and control network for space systems, nuclear weapons, and DoD weapon systems. Nation-state adversaries have shown the ability to disrupt critical infrastructure through cyber-attacks targeting systems of networked, embedded computers. Moving target defenses (MTDs) have been proposed as a means for defending various networks and systems against potential cyber-attacks. In addition, MTDs could be employed as an ‘operate through’ mitigation for improving cyber resilience.

    We devised an MTD algorithm and tested its application to a MIL-STD-1553 network. We demonstrated and analyzed four aspects of the MTD algorithm's usage: 1) characterized the performance, unpredictability, and randomness of the core algorithm, 2) demonstrated feasibility by conducting experiments on actual commercial hardware, 3) conducted an exfiltration experiment in which the reduction in adversarial knowledge was 97%, and 4) employed an LSTM machine learning model to determine whether it could defeat the algorithm and to gauge the algorithm’s resistance to machine learning attacks. Given this analysis, we show that the algorithm can be used in real-time bus networks as well as in other (non-address) applications.
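
    The abstract does not disclose the algorithm itself, so the sketch below is a purely notional illustration of the address-randomization idea for a MIL-STD-1553-style bus: a keyed, per-epoch permutation of remote-terminal addresses that nodes sharing the key can reproduce but an observer cannot predict. The key handling and epoch scheme are assumptions, not the authors' design.

    # Notional address-hopping sketch (not the authors' algorithm): derive a keyed,
    # per-epoch permutation of MIL-STD-1553 remote-terminal addresses so that
    # communicating nodes sharing the key agree on the mapping each epoch.
    import hashlib
    import hmac
    import random

    RT_ADDRESSES = list(range(31))  # 0-30; address 31 is commonly reserved for broadcast

    def epoch_mapping(key: bytes, epoch: int) -> dict:
        """Return {real_address: hopped_address} for a given epoch."""
        # Seed a PRNG from an HMAC of the epoch counter; both sides can reproduce it.
        seed = hmac.new(key, epoch.to_bytes(8, "big"), hashlib.sha256).digest()
        rng = random.Random(seed)
        shuffled = RT_ADDRESSES[:]
        rng.shuffle(shuffled)
        return dict(zip(RT_ADDRESSES, shuffled))

    key = b"shared-secret-provisioned-offline"   # hypothetical key distribution
    m1 = epoch_mapping(key, epoch=1)
    m2 = epoch_mapping(key, epoch=2)
    print("RT 5 appears as", m1[5], "in epoch 1 and", m2[5], "in epoch 2")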

    Speaker Info:

    Chris Jenkins

    R&D S&E, Cybersecurity

    Sandia National Labs

    Chris is a Principal Cybersecurity Research & Development staff member in the Systems Security Research Department as part of Sandia National Laboratories’ Information Operations Center.  Chris supports Sandia’s mission in three key areas: cyber-physical cybersecurity research, space cybersecurity, and cybersecurity expertise outside the lab. Chris regularly publishes in the open literature, is responsible for multiple technical advances, has been granted patents, and actively seeks opportunities to transition technology outside of Sandia.

    Chris leads a team researching innovative ways to protect critical infrastructure and other high-consequence operational technology. His work uses a technology called moving target defense (MTD) to protect these systems from adversary attack. He has partnered with Purdue University to determine the strength of the innovative, patent pending MTD algorithm he created. His work has explored integrating MTD into real-time communication systems employed in space systems and other national security relevant communications architectures. His current research represents Sandia’s national commitment to space systems and Sandia’s strategic investment in the Science and Technology Advancing Resilience for Contested Space Mission Campaign.

  • Object Identification and Classification in Threat Scenarios

    Abstract:

    In rapidly evolving threat scenarios, the accurate and timely identification of hostile enemies armed with weapons is crucial to strategic advantage and personnel safety. This study aims to develop a timely and accurate model utilizing YOLOv5 for the detection of weapons and persons in real-time drone footage and to generate an alert containing the count of weapons and persons detected. Existing methods in this field often focus on either minimizing type I/type II errors or the speed at which the model runs. In our current work, we focused on two main points of emphasis throughout training: minimizing type II error (instances of weapons or persons present but not detected) and keeping accuracy and precision consistent while increasing the speed of our model to keep up with real-time footage. Various parameters were adjusted within our model, including but not limited to speed, freezing layers, and image size. Going from our first to the final adjusted model, overall precision and recall went from 71.9% to 89.2% and from 63.7% to 77.5%, respectively. The occurrences of misidentification produced by our model decreased dramatically: the share of persons misidentified as either weapon or background noise fell from 27% to 14%, and the misidentification of weapons fell from 50% to 34%. An important consideration for future work is mitigating overfitting to a particular dataset during training. In real-world implementation, our model needs to perform well across a variety of conditions and angles, not all of which were introduced in the training data set.
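
    For context on the reported metrics, the snippet below shows how precision and recall relate to the type I/type II trade-off emphasized above; the detection counts are hypothetical, not the study's data.

    # Precision/recall from notional detection counts (type II errors = missed objects).
    def detection_metrics(true_positives: int, false_positives: int, false_negatives: int):
        precision = true_positives / (true_positives + false_positives)   # penalizes type I errors
        recall = true_positives / (true_positives + false_negatives)      # penalizes type II errors
        return precision, recall

    p, r = detection_metrics(true_positives=155, false_positives=19, false_negatives=45)
    print(f"precision={p:.3f}, recall={r:.3f}, miss rate={1 - r:.3f}")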

    Speaker Info:

    Karly Parcell

    Cadet

    United States Military Academy

    My name is Karly Parcell, and I am a Class of 2024 cadet majoring in Applied Statistics and Data Science at the United States Military Academy. My collaborators for this paper are COL James Starling and Dr. Brian Choi.

  • Onboard spacecraft thermal modeling using physics informed machine learning

    Abstract:

    Modeling thermal states for complex space missions, such as the surface exploration of airless bodies, requires high computation, whether used in ground-based analysis for spacecraft design or during onboard reasoning for autonomous operations. For example, a finite-element-method (FEM) thermal model with hundreds of elements can take significant time to simulate on a typical workstation, which makes it unsuitable for onboard reasoning during time-sensitive scenarios such as descent and landing, proximity operations, or in-space assembly. Further, the lack of fast and accurate thermal modeling drives thermal designs to be more conservative and leads to spacecraft with larger mass and higher power budgets.
    The emerging paradigm of physics-informed machine learning (PIML) presents a class of hybrid modeling architectures that address this challenge by combining simplified physics models (e.g., analytical, reduced-order, and coarse mesh models) with sample-based machine learning (ML) models (e.g., deep neural networks and Gaussian processes), resulting in models which maintain both interpretability and robustness. Such techniques enable designs with reduced mass and power through onboard thermal-state estimation and control and may lead to improved onboard handling of off-nominal states, including unplanned down-time (e.g., GOES-7 and M2020).
    The PIML model or hybrid model presented here consists of a neural network which predicts reduced nodalizations (coarse mesh size) given on-orbit thermal load conditions, and subsequently a (relatively coarse) finite-difference model operates on this mesh to predict thermal states. We compare the computational performance and accuracy of the hybrid model to a purely data-driven model, and a high-fidelity finite-difference model (on a fine mesh) of a prototype Earth-orbiting small spacecraft. This hybrid thermal model promises to achieve 1) faster design iterations, 2) reduction in mission costs by circumventing worst-case-based conservative planning, and 3) safer thermal-aware navigation and exploration.
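
    As a toy analogue of the hybrid architecture described above (not the authors' model), the sketch below pairs a coarse 1-D finite-difference conduction step with a simple fitted correction term standing in for the learned component; the geometry, material properties, and coefficients are all hypothetical.

    # Toy 1-D analogue of a hybrid (physics + learned-correction) thermal model.
    # The real work couples a neural network that predicts coarse nodalizations with a
    # finite-difference solver; here a simple fitted "surrogate" stands in for the
    # learned component, and all geometry/material numbers are hypothetical.
    import numpy as np

    n, alpha, dx, dt = 20, 1e-4, 0.05, 1.0   # nodes, diffusivity, spacing, timestep

    def physics_step(T, q_ext):
        """Coarse explicit finite-difference conduction step with external heat load q_ext."""
        T_new = T.copy()
        T_new[1:-1] += alpha * dt / dx**2 * (T[2:] - 2 * T[1:-1] + T[:-2]) + dt * q_ext[1:-1]
        return T_new

    def learned_correction(T, q_ext, coeffs):
        """Stand-in for the ML component: a fitted correction to the coarse physics."""
        features = np.stack([T, q_ext, np.gradient(T, dx)], axis=0)
        return coeffs @ features            # per-node correction term

    # Hypothetical state, load, and (already-fitted) correction coefficients.
    T = np.full(n, 290.0)                   # kelvin
    q_ext = np.linspace(0.0, 0.02, n)       # notional orbital heat load
    coeffs = np.array([1e-4, 5e-2, -1e-3])

    T = physics_step(T, q_ext) + dt * learned_correction(T, q_ext, coeffs)
    print("hybrid-model temperatures (K):", np.round(T[:5], 3))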

    Speaker Info:

    Zaki Hasnain

    Data Scientist

    NASA Jet Propulsion Laboratory

    Dr. Zaki Hasnain is a data scientist in NASA JPL’s Systems Engineering Division, where he participates in and leads research and development tasks for space exploration. His research interests include physics informed machine learning and system health management for autonomous systems. He has experience developing data-driven, game-theoretic, probabilistic, physics-based, and machine learning models and algorithms for space, cancer, and autonomous systems applications. He received a B.S. in engineering science and mechanics from Virginia Polytechnic Institute and State University, and M.S. and Ph.D. degrees in mechanical engineering and an M.S. in computer science from the University of Southern California.

  • Operational T&E of AI-supported Data Integration, Fusion, and Analysis Systems

    Abstract:

    AI will play an important role in future military systems. However, large questions remain about how to test AI systems, especially in operational settings. Here, we discuss an approach for the operational test and evaluation (OT&E) of AI-supported data integration, fusion, and analysis systems. We highlight new challenges posed by AI-supported systems and we discuss new and existing OT&E methods for overcoming them. We demonstrate how to apply these OT&E methods via a notional test concept that focuses on evaluating an AI-supported data integration system in terms of its technical performance (how accurate is the AI output?) and human systems interaction (how does the AI affect users?).

    Speaker Info:

    Adam Miller

    Research Staff Member

    IDA

    Adam Miller has been a Research Staff Member at IDA since 2022. He is a member of the IDA Test Science team supporting HSI evaluations of Land and Expeditionary Warfare systems. Previously, Adam worked as a behavioral neuroscientist at The Hospital for Sick Children in Toronto, ON, where he studied how memories are encoded in the brain. He has a PhD in Psychology from Cornell University, and a B.A. in Psychology from Providence College.

  • Operationally Representative Data and Cybersecurity for Avionics

    Abstract:

    This talk discusses the ARINC 429 standard and its inherent lack of security, demonstrates proven mission effects in a hardware-in-the-loop (HITL) simulator, and presents a data set collected from real avionics.
    ARINC 429 is a ubiquitous data bus for civil avionics, enabling safe and reliable communication between devices from disparate manufacturers. However, ARINC 429 lacks any form of encryption or authentication, making it an inherently insecure communication protocol and rendering any connected avionics vulnerable to a range of attacks.
    We constructed a HITL simulator with ARINC 429 buses to explore these vulnerabilities, and to identify potential mission effects. The HITL simulator includes commercial off-the-shelf avionics hardware including a multi-function display, an Enhanced Ground Proximity Warning System, as well as a realistic flight simulator.
    We performed a denial-of-service attack against the multi-function display via a compromised transmit node on an ARINC 429 bus, using commercially available tools, which succeeded in disabling important navigational aids. This simple replay attack demonstrates how effectively a “leave-behind” device can cause serious mission effects.
    This proven adversarial effect on physical avionics illustrates the risk inherent in ARINC 429 and the need for the ability to detect, mitigate, and recover from these attacks. One potential solution is an intrusion detection system (IDS) trained using data collected from the electrical properties of the physical bus. Although previous research has demonstrated the feasibility of an IDS on an ARINC 429 bus, none have been trained on data generated by actual avionics hardware.

    Speaker Info:

    Steven Movit

    Research Staff Member

    IDA

    Dr. Steven Movit earned bachelor's degrees in Astrophysics and Statistics from Rice University in Houston, Texas, followed by a doctorate from Penn State University in Astronomy and Astrophysics, where he worked on the IceCube telescope.

    Dr. Movit joined the Institute for Defense Analyses in 2011, where he has concentrated on aircraft survivability, including cyber survivability and electronic warfare, mainly supporting analyses for the Director, Operational Test and Evaluation. Dr. Movit started a "non-IP" cyber lab at IDA in 2021 which has developed hardware-in-the-loop simulators for research and training.

  • Operationally Representative Data and Cybersecurity for Avionics Demonstration

    Abstract:

    This poster session considers the ARINC 429 standard and its inherent lack of security by using a hardware-in-the-loop (HITL) simulator to demonstrate possible mission effects from a cyber compromise. ARINC 429 is a ubiquitous data bus for civil avionics, enabling safe and reliable communication between devices from disparate manufacturers. However, ARINC 429 lacks any form of encryption or authentication, making it an inherently insecure communication protocol and rendering any connected avionics vulnerable to a range of attacks.

    This poster session includes a hands-on demonstration of possible mission effects due to a cyber compromise of the ARINC 429 data bus by putting the audience at the control of the HITL flight simulator with ARINC 429 buses. The HITL simulator uses commercial off-the-shelf avionics hardware including a multi-function display, and an Enhanced Ground Proximity Warning System to generate operationally realistic ARINC 429 messages. Realistic flight controls and flight simulation software are used to further increase the simulator’s fidelity. The cyberattack is based on a system with a malicious device physically-connected to the ARINC 429 bus network. The cyberattack degrades the multi-function display through a denial-of-service attack which disables important navigational aids. The poster also describes how testers can plan to test similar buses found on vehicles and can observe and document data from this type of testing event.

    Speaker Info:

    Jason Schlup

    Research Staff Member

    IDA

    Dr. Jason Schlup received his Ph.D. in Aeronautics from the California Institute of Technology in 2018.  He is now a Research Staff Member at the Institute for Defense Analyses and provides analytical support to the Director, Operational Test and Evaluation’s Cyber Assessment Program.  His research and interest areas include enterprise network attack analysis, improving data collection methodologies, and cloud security.  Jason also contributes to IDA’s Cyber Lab capability, focusing on Internet Protocol-based training modules and outreach opportunities.

  • Optimizing Mission Concept Development: A Bayesian Approach Utilizing DAGs

    Abstract:

    This study delves into the application of influence diagrams in mission concept development at the Jet Propulsion Laboratory, emphasizing the importance of how technical variables influence mission costs. Concept development is an early stage in the design process that requires extensive decision-making, in which the tradeoffs between time, money, and scientific goals are explored to generate a wide range of project ideas from which new missions can be selected. Utilizing influence diagrams is one strategy for optimizing decision making. An influence diagram represents decision scenarios in a graphical and mathematical manner, providing an intuitive interpretation of the relationships between input variables (which are functions of the decisions made in a trade space) and outcomes. These input-to-outcome relationships may be mediated by intermediate variables, and the influence diagram provides a convenient way to encode the hypothesized “trickle-down” structure of the system. In the context of mission design and concept development, influence diagrams can inform analysis and decision-making under uncertainty to encourage the design of realistic projects within imposed cost limitations, and better understand the impacts of trade space decisions on outcomes like cost.

    This project supports that goal by focusing on the analysis of an influence diagram framed as a Directed Acyclic Graph (DAG), a graphical structure where vertices are connected by directed edges that do not form loops. Edge weights in the DAG represent the strength and direction of relationships between variables. The DAG aims to model the trickle-down effects of mission technical parameters, such as payload mass, payload power, delta V, and data volume, on mission cost elements. A Bayesian multilevel regression model with random effects is used for estimating the edge weights in a specific DAG (constructed according to expert opinion) that is meant to represent a hypothesized trickle-down structure from technical parameters to cost. This Bayesian approach provides a flexible and robust framework, allowing us to incorporate prior knowledge, handle small datasets effectively, and leverage its capacity to capture the inherent uncertainty in our data.
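
    As a structural illustration only, the sketch below shows how edge weights in a DAG can be composed to propagate a change in a technical parameter down to a cost outcome; the variables and weights are hypothetical and are not the fitted JPL model.

    # Notional DAG trickle-down: propagate a change in a technical parameter to cost
    # by summing the products of edge weights along every directed path.
    # Variables and weights are hypothetical, not the fitted JPL model.
    edges = {
        ("payload_mass", "spacecraft_dry_mass"): 1.4,
        ("payload_power", "spacecraft_dry_mass"): 0.3,
        ("spacecraft_dry_mass", "launch_cost"): 0.8,
        ("spacecraft_dry_mass", "bus_cost"): 0.6,
        ("launch_cost", "total_cost"): 1.0,
        ("bus_cost", "total_cost"): 1.0,
    }

    def total_effect(source: str, target: str) -> float:
        """Sum of path products from source to target (terminates because the graph is acyclic)."""
        if source == target:
            return 1.0
        return sum(w * total_effect(child, target)
                   for (parent, child), w in edges.items() if parent == source)

    # Expected change in total cost per unit increase in payload mass, under these weights.
    print("d(total_cost)/d(payload_mass) =", total_effect("payload_mass", "total_cost"))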

    Speaker Info:

    Patricia Gallagher

    Data Scientist

    Jet Propulsion Laboratory

    Patricia Gallagher is a Data Scientist at Jet Propulsion Laboratory (JPL) in the Systems Modeling, Analysis & Architectures group. In her current role, Patricia focuses on supporting mission formulation through the development of tools and models, with a particular emphasis on cost modeling. Prior to this, she spent four years as a Project Resource Analyst at JPL, gaining a strong foundation in the business operations side of missions. Patricia holds a Bachelor's degree in Economics with a minor in Mathematics from California State University, Los Angeles, and a Master of Science in Data Science from University of California, Berkeley.

  • Overview of a survey methods test for the NASA Quesst community survey campaign

    Abstract:

    In its mission to expand knowledge and improve aviation, NASA conducts research to address sonic boom noise, the prime barrier to overland supersonic flight. NASA is currently preparing for a community survey campaign to assess response to noise from the new X-59 aircraft. During each community survey, a substantial number of observations must be collected over a limited timeframe to generate a dose-response relationship. A sample of residents will be recruited in advance to fill out a brief survey each time X-59 flies over, approximately 80 times throughout a month. In preparation, NASA conducted a month-long test of survey methods in 2023. A sample of 800 residents was recruited from a simulated fly-over area. Because there were no actual X-59 fly-overs, respondents were asked about their reactions to noise from normal aircraft operations. The respondents chose whether to fill out the survey on the web or via a smartphone application. Evaluating response rates and how they evolved over time was a specific focus of the test. Also, a graduated incentive structure was implemented to keep respondents engaged. Finally, location data was collected from respondents since it will be needed to estimate individual noise exposure from X-59. The results of this survey test will help determine the design of the community survey campaign. This is an overview presentation that will cover the key goals, results, and lessons learned from the survey test.

    Speaker Info:

    Jonathan Rathsam

    Senior Research Engineer

    NASA Langley Research Center

    Dr. Jonathan Rathsam is a Senior Research Engineer at NASA’s Langley Research Center in Hampton, Virginia.  He conducts laboratory and field research on human perceptions of low noise supersonic overflights.  He currently serves as technical lead of survey design and analysis for the X-59 community overflight phase of NASA’s Quesst mission.  He also serves as co-chair for Team 6 – Community response to noise and annoyance for the International Commission on Biological Effects of Noise, and previously served as NASA co-chair for DATAWorks.  He holds a Ph.D. in Engineering from the University of Nebraska, a B.A. in Physics from Grinnell College in Iowa, and completed postdoctoral research in acoustics at Ben-Gurion University in Israel.

  • Practical Experimental Design Strategies for Binary Responses under Operational Constraints

    Abstract:

    Defense and aerospace testing commonly involves binary responses to changing levels of a system configuration or an explanatory variable. Examples of binary responses are hit or miss, detect or not detect, and success or fail, and they are a special case of categorical responses with multiple discrete levels. The test objective is typically to estimate a statistical model that predicts the probability of occurrence of the binary response as a function of the explanatory variable(s). Statistical approaches are readily available for modeling binary responses; however, they often assume that the design features large sample sizes that provide responses distributed across the range of the explanatory variable. In practice, these assumptions are often challenged by small sample sizes and response levels focused over a limited range of the explanatory variable(s). These practical restrictions are due to experimentation cost, operational constraints, and a primary interest in one response level, e.g., testing may be more focused on hits than on misses. This presentation provides strategies to address these challenges with an emphasis on collaboration techniques to develop experimental design approaches under practical constraints. Case studies illustrate these strategies, ranging from estimating human annoyance in response to low noise supersonic overflights in NASA’s Quesst mission to evaluating the detection capability of nondestructive evaluation methods for fracture-critical human-spaceflight components. This presentation offers practical guidance on experimental design strategies for binary responses under operational constraints.
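
    As a generic illustration of the modeling step described above (not the NASA analyses), the sketch below fits a logistic regression to simulated hit/miss data and predicts the probability of a positive response across the range of the explanatory variable; note that it does not address the small-sample and limited-range issues the presentation focuses on.

    # Logistic regression on simulated hit/miss data: probability of response vs. stimulus level.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 10.0, size=60).reshape(-1, 1)        # explanatory variable (e.g., noise level)
    p_true = 1.0 / (1.0 + np.exp(-(1.2 * x[:, 0] - 6.0)))     # notional true dose-response curve
    y = rng.binomial(1, p_true)                               # binary outcome: 1 = detect/annoyed, 0 = not

    model = LogisticRegression().fit(x, y)

    # Predicted probability of a positive response across the tested range.
    grid = np.linspace(0.0, 10.0, 5).reshape(-1, 1)
    for level, prob in zip(grid[:, 0], model.predict_proba(grid)[:, 1]):
        print(f"level {level:4.1f}: P(response) = {prob:.2f}")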

    Speaker Info:

    Peter Parker

    Statistician

    NASA

    Peter A. Parker and Nathan Cruze

  • Quantitative Reliability and Resilience Assessment of a Machine Learning Algorithm

    Abstract:

    Advances in machine learning (ML) have led to applications in safety-critical domains, including security, defense, and healthcare. These ML models are confronted with dynamically changing and actively hostile conditions characteristic of real-world applications, requiring systems incorporating ML to be reliable and resilient. Many studies propose techniques to improve the robustness of ML algorithms. However, fewer consider quantitative methods to assess the reliability and resilience of these systems. To address this gap, this study demonstrates how to collect, during the training and testing of ML, data suitable for applying software reliability models (with and without covariates) and resilience models, and how to interpret the resulting analyses. The proposed approach promotes quantitative risk assessment of machine learning technologies, providing the ability to track and predict degradation and improvement in ML model performance and assisting ML and system engineers with an objective approach to compare the relative effectiveness of alternative training and testing methods. The approach is illustrated in the context of an image recognition model subjected to two generative adversarial attacks and then iteratively retrained to improve the system's performance. Our results indicate that software reliability models incorporating covariates characterized the misclassification discovery process more accurately than models without covariates. Moreover, the resilience model based on multiple linear regression incorporating interactions between covariates tracked and predicted degradation and recovery of performance best. Thus, software reliability and resilience models offer rigorous quantitative assurance methods for ML-enabled systems and processes.
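
    The specific reliability and resilience models are not given in the abstract; as a generic stand-in, the sketch below fits a simple Goel-Okumoto-style mean value function, mu(t) = a(1 - exp(-b t)), to notional cumulative misclassification-discovery counts, illustrating the kind of tracking and prediction described above. The data and starting values are made up.

    # Fit a simple software-reliability-growth curve (Goel-Okumoto mean value function)
    # to notional cumulative misclassification-discovery counts; a stand-in for the
    # covariate models discussed in the talk, with made-up data.
    import numpy as np
    from scipy.optimize import curve_fit

    def mean_value(t, a, b):
        """Expected cumulative discoveries by time t: a * (1 - exp(-b t))."""
        return a * (1.0 - np.exp(-b * t))

    t = np.arange(1, 11, dtype=float)                       # testing intervals
    cumulative = np.array([12, 21, 29, 35, 40, 44, 47, 49, 51, 52], dtype=float)

    (a_hat, b_hat), _ = curve_fit(mean_value, t, cumulative, p0=[60.0, 0.2])
    print(f"estimated total defects a = {a_hat:.1f}, discovery rate b = {b_hat:.3f}")
    print(f"predicted discoveries by t=15: {mean_value(15.0, a_hat, b_hat):.1f}")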

    Speaker Info:

    Karen Alves da Mata

    Graduate Research Assistant

    University of Massachusetts Dartmouth

    Karen da Mata is a Ph.D. student in the Electrical and Computer Engineering Department at the University of Massachusetts Dartmouth. She received her MS in Computer Engineering from UMassD in 2023 and her BS in Electrical Engineering from the Federal University of Ouro Preto, Brazil, in 2018.

  • R for Reproducible Scientific Analysis

    Abstract:

    This workshop is based on The Carpentries' more introductory R lesson. In addition to the standard content, it covers data analysis and visualization in R, focusing on working with tabular data and other core data structures, using conditionals and loops, writing custom functions, and creating publication-quality graphics. As The Carpentries' more introductory R offering, this workshop also introduces learners to RStudio and strategies for getting help. It is appropriate for learners with no previous programming experience.

    Speaker Info:

    SherAaron Hurt

    The Carpentries

    SherAaron (Sher!) Hurt is the Director of Workshops and Instruction for The Carpentries, an organisation that teaches foundational coding and data science skills to researchers worldwide. In this role she provides strategy, oversight, and overall management, planning, vision, and leadership for The Carpentries Workshops and Instruction Team. She oversees and supports all administration, communications, and data entry aspects for workshops. Sher! oversees and supports all Instructor and Trainer training aspects, including curriculum development and maintenance, certification, and community management. Sher! supports the certified Instructor community by developing appropriate programming, communications processes, and workflows. She develops resources and workflows to ensure the overall health of workshops and the Instructor Training and Trainer Training programs. She earned her B.S. in Business Management at Michigan Technological University and her M.A. in Hospitality Management at Florida International University. Sher! resides in Detroit, MI, where she enjoys travel and fitness.

  • R for Reproducible Scientific Analysis

    Abstract:

    This workshop is based on The Carpentries' more introductory R lesson. In addition to the standard content, it covers data analysis and visualization in R, focusing on working with tabular data and other core data structures, using conditionals and loops, writing custom functions, and creating publication-quality graphics. As The Carpentries' more introductory R offering, this workshop also introduces learners to RStudio and strategies for getting help. It is appropriate for learners with no previous programming experience.

    Speaker Info:

    Barney Ricca

    University of Colorado, Colorado Springs

    Dr. Bernard (Barney) Ricca is a Research Associate Professor at the Lyda Hill Institute for Human Resilience at the University of Colorado Colorado Springs, and the President of the Society for Chaos Theory, Psychology, and Life Sciences. His research focuses on the development of nonlinear dynamical systems analysis approaches and the applications of dynamical systems to the social and behavioral sciences, particularly to dynamics of trauma survivors. Recent projects include analyses of hurricane and wildfire survivors and a study of small-group dynamics in schools. He is currently the Co-Principal Investigator on an NSF-funded project to investigate the post-trauma dynamics of motor vehicle accident survivors. He received a Ph.D. in Physics from the University of Michigan.

  • Rancor-HUNTER: A Virtual Plant and Operator Environment for Predicting Human Performance

    Abstract:

    Advances in simulation capabilities to model physical systems have outpaced the development of simulations for the humans using those physical systems. There is an argument that the infinite span of potential human behaviors inherently renders human modeling more challenging than modeling physical systems. Despite this challenge, the need for modeling humans interacting with these complex systems is paramount. As technologies have improved, many of the failure modes originating from the physical systems have been solved. This means the overall proportion of human errors has increased, such that it is not uncommon for human error to be the primary driver of system failure in modern complex systems. Moreover, technologies such as automated systems may introduce emerging contexts that can cause new, unanticipated modes of human error. Therefore, it is now more important than ever to develop models of human behavior to realize overall system error reductions and achieve established safety margins. To support new and novel concepts of operations for the anticipated wave of advanced nuclear reactor deployments, human factors and human reliability analysis researchers need to develop advanced simulation-based approaches. This talk presents a simulation environment suitable both for collecting data and for performing Monte Carlo simulations to evaluate human performance and develop better models of human behavior. Specifically, the Rancor Microworld Simulator models a complex energy production system in a simplified manner. Rancor includes computer-based procedures, which serve as a framework to automatically classify human behaviors without manual, subjective experimenter coding during scenarios. This method supports detailed analysis at the task level and makes it feasible to collect the large sample sizes required to develop quantitative modeling elements, which have historically challenged traditional full-scope simulator study approaches. Additionally, the other portion of this experimental platform, the Human Unimodel for Nuclear Technology to Enhance Reliability (HUNTER), is presented to show how the collected data can be used to evaluate novel scenarios based on the contextual factors, or performance shaping factors, derived from Rancor simulations. Rancor-HUNTER is being used to predict operator performance with new procedures, such as those resulting from control room modernization or new-build situations. Rancor-HUNTER is also proving to be a useful surrogate platform for modeling human performance in other complex systems.

    Speaker Info:

    Thomas Ulrich

    Human Factors and Reliability Research Scientist

    Idaho National Laboratory

    Dr. Thomas Ulrich is a human factors and reliability research scientist at the Idaho National Laboratory. He has led and participated in several full-scope, full-scale simulator studies using the Human Systems Simulation Laboratory (HSSL) to investigate a range of nuclear control room topics. Dr. Ulrich possesses expertise in human performance assessment methodology. He is an expert in nuclear process control simulation and interface prototyping development. Dr. Ulrich’s active research includes dynamic human reliability analysis methodology and digital and automated HMI software development for existing and advanced reactor nuclear power plant operations. He is the codeveloper of the Rancor microworld simulator and holds a copyright for “RANCOR Microworld Simulation Environment for Nuclear Process Control,” assertion extension granted on 9/27/18, for a period of ten (10) years, under BEA Attorney Docket No. CW-18-08. He actively develops the HUNTER INL software, which supports dynamic human reliability analysis via simulating virtual operators for nuclear and electric grid operations. Dr. Ulrich currently leads a research project using the HSSL to evaluate commercial flexible power operation and generation concepts of operations for coupling offsite hydrogen production to existing light water reactors. Most recently Dr. Ulrich started supporting a first of a kind research project to develop an advanced reactor remote concept of operations leveraging multiple digital twins located at both the reactor site and as a support tool for operators at a remote operations center.

  • Recommendations for Cyber Test & Evaluation of Space Systems

    Abstract:

    This presentation marks the conclusion of a study aimed at understanding the current state of cyber test and evaluation (T&E) activities within the space domain. This includes topics such as cyber T&E challenges unique to the space domain (e.g., culture and motivations, space system architectures and threats, cyber T&E resources), cyber T&E policy and guidance, and results from a space cyber T&E survey and set of interviews. Recommendations include establishing a cyber T&E helpdesk and rapid response team, establishing contracting templates, incentivizing space cyber T&E innovation, growing and maturing the space cyber T&E workforce, and learning from cyber ranges.

    Speaker Info:

    David "Fuzzy" Wells

    Principal Cyber Simulationist

    The MITRE Corporation

    Dr. David “Fuzzy” Wells is Principal Cyber Simulationist for The MITRE Corporation supporting the National Cyber Range Complex.  Dr. Wells is the former Director of U.S. Indo-Pacific Command’s Cyber War Innovation Center where he built the first combatant command venue for cyber testing, training, and experimentation; managed the Command's joint cyber innovation & experimentation portfolio; and executed cyber range testing and training events for service, joint, and coalition partners. Dr. Wells was the first Air Force officer to obtain a Ph.D. in Modeling, Virtual Environments, and Simulation from the Naval Postgraduate School and M.S. in Modeling & Simulation from the Air Force Institute of Technology. He is a Certified Modeling & Simulation Professional Charter Member.

  • Regression and Time Series Mixture Approaches to Predict Resilience

    Abstract:

    Resilience engineering is concerned with building and sustaining systems that can deal effectively with disruptive events. Previous resilience engineering research focuses on metrics to quantify resilience and models to characterize system performance. However, resilience metrics are normally computed after disruptions have occurred, and existing models lack the ability to predict one or more shocks and subsequent recoveries. To address these limitations, this talk presents three alternative approaches to model system resilience with statistical techniques based on (i) regression, (ii) time series, and (iii) a combination of regression and time series. These models track and predict how system performance will change when exposed to multiple shocks and stresses of different intensity and duration, provide structure for planning tests to assess system resilience against particular shocks and stresses, and guide the data collection necessary to conduct tests effectively. These modeling approaches are general and can be applied to systems and processes in multiple domains. A historical data set on job losses during the 1980 recessions in the United States is used to assess the predictive accuracy of these approaches. Goodness-of-fit measures and confidence intervals are computed, and interval-based and point-based resilience metrics are predicted to assess how well the models perform on the data set considered. The results suggest that resilience models based on statistical methods such as multiple linear regression and multivariate time series models are capable of modeling and predicting resilience curves exhibiting multiple shocks and subsequent recoveries. However, models that combine regression and time series account for changes in performance due to current and time-delayed effects from disruptions most effectively, demonstrating superior performance in long-term predictions and higher goodness-of-fit despite increased parametric complexity.
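
    As a minimal sketch of the regression ingredient described above (under assumed data, not the historical job-loss data set), the code below regresses performance on current and time-lagged disruption indicators so that a shock's effect persists over several periods.

    # Minimal sketch of a regression with current and time-lagged disruption covariates,
    # the flavor of model the talk combines with time-series terms; all data simulated.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 120
    shock = np.zeros(T)
    shock[[30, 70]] = 1.0                              # two notional disruptive events

    # Current and lagged shock indicators (lags 0, 1, 2).
    lag0 = shock
    lag1 = np.concatenate([[0.0], shock[:-1]])
    lag2 = np.concatenate([[0.0, 0.0], shock[:-2]])
    lags = np.column_stack([lag0, lag1, lag2])

    # Simulated performance: a baseline minus a shock effect that decays over 3 periods.
    performance = 100.0 - lags @ np.array([8.0, 5.0, 2.0]) + rng.normal(0, 0.5, T)

    # Fit by ordinary least squares: performance ~ intercept + shock_t + shock_{t-1} + shock_{t-2}
    X = np.column_stack([np.ones(T), lags])
    coef, *_ = np.linalg.lstsq(X, performance, rcond=None)
    print("estimated performance change at lags 0, 1, 2 (negative = loss):", np.round(coef[1:], 2))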

    Speaker Info:

    Priscila Silva

    Graduate Research Assistant

    University of Massachusetts Dartmouth, Department of Electrical and Computer Engineering

    Priscila Silva is a Ph.D. candidate in Electrical and Computer Engineering at University of Massachusetts Dartmouth (UMassD). She received her MS degree in Computer Engineering from UMassD in 2022, and her BS degree in Electrical Engineering from Federal University of Ouro Preto (UFOP) in 2017. Her research interests include system reliability and resilience engineering for performance predictions, including computer, cyber-physical, infrastructure, finance, and environment domains.

  • Regularization Approach to Learning Bioburden Density for Planetary Protection

    Abstract:

    Over the last 2 years, the scientific community and the general public both saw a surge of practical application of artificial intelligence (AI) and machine learning (ML) to numerous technological and everyday problems. The emergence of AI/ML data-driven tools was enabled by decades of research in statistics, neurobiology, optimization, neural networks, statistical learning theory, and other fields—research that synergized into an overarching discipline of learning from data.
    Learning from data is one of the most fundamental problems facing empirical science. In the most general setting, it may be formulated as finding the true data-generating function or dependency given a set of noisy empirical observations. In statistics, the most prominent example is estimation of the cumulative distribution function or probability density function from a limited number of observations. The principal difficulty in learning functional dependencies from a limited set of noisy data is the ill-posed nature of this problem. Here, “ill-posed” is used in the sense suggested by Hadamard—namely, that the problem’s solution lacks existence, uniqueness, or stability with respect to minor variations in the data. In other words, ill-posed problems are underdetermined as the data do not contain all the information necessary to arrive at a unique, stable solution.
    Finding functional dependencies from noisy data may in fact be hindered by all three conditions of ill-posedness: the data may not contain information about the solution, numerous solutions can be found to fit the data, and the solution may be unstable with respect to minor variations in the data. To deal with ill-posed problems, a regularization method was proposed for augmenting the information contained in the data with some additional information about the solution (e.g., its smoothness). In this presentation, we demonstrate how regularization techniques, as applied to learning functional dependencies with neural networks, can be successfully applied to the planetary protection problem of estimating microbial bioburden density (i.e., spores per square meter) on spacecraft.
    We shall demonstrate that the problem of bioburden density estimation can be formulated as a solution to a least squares problem, and that this problem is indeed ill-posed. This presentation will elucidate the relationship between maximum likelihood estimates and the least squares solution by demonstrating their mathematical equivalence. It will be shown that maximum likelihood estimation is identical to differentiation of the cumulative count of colony-forming units, which can be represented as a least squares problem. Since the problem of differentiating noisy data is ill-posed, the method of regularization will be applied to obtain a stable solution.
    The presentation will also demonstrate that the problem of bioburden density estimation can be cast as a problem of regularized differentiation of the cumulative count of colony-forming units found on the spacecraft. The regularized differentiation will be shown to be a shrinkage estimator, and its performance will be compared with other shrinkage estimators commonly used in statistics for simultaneously estimating the parameters of a set of independent Poisson distributions. The strengths and weaknesses of the regularized differentiation will then be highlighted in comparison to the other shrinkage estimators.
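    As a rough illustration of the core idea (not the authors' formulation), the sketch below applies Tikhonov-style regularized differentiation to a synthetic noisy cumulative colony count; the operators, noise level, and regularization parameter are all assumptions.

    ```python
    # Minimal sketch: Tikhonov-regularized differentiation of a noisy cumulative count.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    true_density = 0.5 + 0.3 * np.sin(np.linspace(0, 3, n))       # hypothetical density profile
    cumulative = np.cumsum(true_density) + rng.normal(0, 1.0, n)  # noisy cumulative count

    A = np.tril(np.ones((n, n)))       # integration (cumulative-sum) operator
    D = np.diff(np.eye(n), axis=0)     # first-difference operator penalizing roughness
    lam = 50.0                         # regularization parameter (would be tuned in practice)

    # Solve min ||A u - y||^2 + lam * ||D u||^2 via the normal equations
    u_hat = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ cumulative)
    ```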

    Speaker Info:

    Andrei Gribok

    Andrei Gribok is a Distinguished Research Scientist in the Instrumentation, Controls, and Data Science Department. He received his Ph.D. in Mathematical Physics from the Moscow Institute of Biological Physics in 1996 and his B.S. and M.S. degrees in systems science/nuclear engineering from the Moscow Institute of Physics and Engineering in 1987. Dr. Gribok worked as an instrumentation and control researcher at the Institute of Physics and Power Engineering, Russia, where he conducted research on advanced data-driven algorithms for fault detection and prognostics for fast breeder reactors. He also worked as an invited research scientist at the Cadarache Nuclear Research Center, France, where his research focused on ultrasonic visualization systems for liquid metal reactors. Dr. Gribok holds the position of Research Associate Professor with the Department of Nuclear Engineering, University of Tennessee, Knoxville. From 2005 until 2015, Dr. Gribok was employed as a Research Scientist with the Telemedicine and Advanced Technology Research Center of the U.S. Army Medical Research and Materiel Command, USDA, and USARIEM. His research interests included military operational medicine and telemedicine for combat casualty care missions. Dr. Gribok has been a member of a number of international programs, including the IAEA coordinated research program on acoustical signal processing for the detection of sodium boiling or sodium-water reaction in LMFRs and large-scale experiments on acoustical water-in-sodium leak detection in LMFBRs. Dr. Gribok is an author or co-author of three book chapters, over 40 peer-reviewed journal papers, and numerous peer-reviewed conference papers. He is also a co-author of the book "Optimization Techniques in Computer Vision: Ill-Posed Problems and Regularization" (Springer, 2016).


  • Regularization Approach to Learning Bioburden Density for Planetary Protection

    Abstract:

    Over the last 2 years, the scientific community and the general public both saw a surge of practical application of artificial intelligence (AI) and machine learning (ML) to numerous technological and everyday problems. The emergence of AI/ML data-driven tools was enabled by decades of research in statistics, neurobiology, optimization, neural networks, statistical learning theory, and other fields—research that synergized into an overarching discipline of learning from data.
    Learning from data is one of the most fundamental problems facing empirical science. In the most general setting, it may be formulated as finding the true data-generating function or dependency given a set of noisy empirical observations. In statistics, the most prominent example is estimation of the cumulative distribution function or probability density function from a limited number of observations. The principal difficulty in learning functional dependencies from a limited set of noisy data is the ill-posed nature of this problem. Here, “ill-posed” is used in the sense suggested by Hadamard—namely, that the problem’s solution lacks existence, uniqueness, or stability with respect to minor variations in the data. In other words, ill-posed problems are underdetermined as the data do not contain all the information necessary to arrive at a unique, stable solution.
    Finding functional dependencies from noisy data may in fact be hindered by all three conditions of ill-posedness: the data may not contain information about the solution, numerous solutions can be found to fit the data, and the solution may be unstable with respect to minor variations in the data. To deal with ill-posed problems, a regularization method was proposed for augmenting the information contained in the data with some additional information about the solution (e.g., its smoothness). In this presentation, we demonstrate how regularization techniques, as applied to learning functional dependencies with neural networks, can be successfully applied to the planetary protection problem of estimating microbial bioburden density (i.e., spores per square meter) on spacecraft.
    We shall demonstrate that the problem of bioburden density estimation can be formulated as a solution to a least squares problem, and that this problem is indeed ill-posed. This presentation will elucidate the relationship between maximum likelihood estimates and the least squares solution by demonstrating their mathematical equivalence. It will be shown that maximum likelihood estimation is identical to differentiation of the cumulative count of colony-forming units, which can be represented as a least squares problem. Since the problem of differentiating noisy data is ill-posed, the method of regularization will be applied to obtain a stable solution.
    The presentation will also demonstrate that the problem of bioburden density estimation can be cast as a problem of regularized differentiation of the cumulative count of colony-forming units found on the spacecraft. The regularized differentiation will be shown to be a shrinkage estimator, and its performance will be compared with other shrinkage estimators commonly used in statistics for simultaneously estimating the parameters of a set of independent Poisson distributions. The strengths and weaknesses of the regularized differentiation will then be highlighted in comparison to the other shrinkage estimators.

    Speaker Info:

    Mike DiNicola

    Michael DiNicola is a senior systems engineer in the Systems Modeling, Analysis & Architectures Group at the Jet Propulsion Laboratory (JPL). At JPL, Michael has worked on several mission concept developments and flight projects, including Europa Clipper, Europa Lander and Mars Sample Return, developing probabilistic models to evaluate key mission requirements, including those related to planetary protection. He works closely with microbiologists in the Planetary Protection group to model assay and sterilization methods, and applies mathematical and statistical methods to improve Planetary Protection engineering practices at JPL and across NASA. At the same time, he also works with planetary scientists to characterize the plumes of Enceladus in support of future mission concepts. Michael earned his B.S. in Mathematics from the University of California, Los Angeles and M.A. in Mathematics from the University of California, San Diego.

  • Rethinking Defense Planning: Are We Buying Weapons and Forces? Or Security?

    Abstract:

    We propose a framework to enable an updated approach for modeling national security planning decisions. The basis of our approach is to treat national security as the multi-stage production of a service provided by the state to foster a nation’s welfare. The challenge in analyzing this activity stems from the fact that it is a complex process conducted by a vast number of actors across four discrete stages of production: budgeting, planning, coercion, and warfighting. We argue that decisions made at any given stage of the process that fail to consider actor incentives at all stages may create serious problems. In this presentation, we introduce our general Feasible Production Framework (a formal framework based on principal-agent analysis), paying particular attention to the planning stage of production for this audience. The presentation will highlight the trade-offs in modeling within a narrow “single-stage aperture” versus a holistic “multi-stage aperture.”

    Speaker Info:

    Leo Blanken

    Coauthored:

    Leo Blanken, Associate Professor, Defense Analysis Department at the Naval Postgraduate School and Irregular Warfare Initiative Fellow (West Point Modern War Institute).

  • Rethinking Defense Planning: Are We Buying Weapons and Forces? Or Security?

    Abstract:

    We propose a framework to enable an updated approach for modeling national security planning decisions. The basis of our approach is to treat national security as the multi-stage production of a service provided by the state to foster a nation’s welfare. The challenge in analyzing this activity stems from the fact that it is a complex process conducted by a vast number of actors across four discrete stages of production: budgeting, planning, coercion, and warfighting. We argue that decisions made at any given stage of the process that fail to consider actor incentives at all stages may create serious problems. In this presentation, we introduce our general Feasible Production Framework (a formal framework based on principal-agent analysis), paying particular attention to the planning stage of production for this audience. The presentation will highlight the trade-offs in modeling within a narrow “single-stage aperture” versus a holistic “multi-stage aperture.”

    Speaker Info:

    Jason Lepore

    Jason Lepore, Professor and Chair of the Economics Department at the Orfalea College of Business, CalPoly, San Luis Obispo and a Visiting Professor at the Defense Analysis Department, Naval Postgraduate School.

  • Rocket Motor Design Qualification Through Enhanced Reliability Assurance Testing

    Abstract:

    Composite pressure vessel designs for rocket motors must be qualified for use in both military and space applications. By intent, demonstration testing methods ignore a priori information about a system, which inflates typically constrained test budgets and often results in a low probability of test success. Reliability assurance tests, on the other hand, encourage the use of previous test data and other relevant information about a system. Thus, an assurance testing approach can dramatically reduce the cost of a qualification test. This work extends reliability assurance testing to accommodate scenarios with right-censored and exact failure possibilities. This enhancement increases the probability of test success and provides a post-test re-evaluation of test results. The method is demonstrated by developing a rocket motor design qualification assurance test.
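    For readers unfamiliar with assurance testing, the sketch below shows the general beta-binomial flavor of such a calculation (prior evidence, probability of passing a small new test, and post-test assurance). It is a simplified stand-in, not the authors' method, which additionally handles right-censored and exact failure information; all numbers are hypothetical.

    ```python
    # Generic Bayesian assurance-test sketch with a beta prior summarizing prior test data.
    import numpy as np
    from scipy import stats

    a_prior, b_prior = 48.0, 2.0   # hypothetical prior evidence: ~48 successes, 2 failures
    n_new, c_allow = 5, 0          # new qualification test: 5 motors, 0 failures allowed
    req = 0.90                     # reliability requirement

    # Probability of test success = E_prior[ P(at most c_allow failures in n_new | R) ]
    draws = stats.beta.rvs(a_prior, b_prior, size=200_000, random_state=0)
    p_pass = np.mean(stats.binom.pmf(c_allow, n_new, 1 - draws))

    # Post-test assurance after observing a passing test (0 failures in 5), by conjugacy
    post = stats.beta(a_prior + n_new, b_prior)
    assurance = post.sf(req)       # P(R > req | prior data and passing test)
    print(f"P(test passes) = {p_pass:.3f}, post-test P(R > {req}) = {assurance:.3f}")
    ```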

    Speaker Info:

    Todd Remund

    Staff Data Scientist

    Northrop Grumman

    Todd received his BS and MS degrees in statistics from Brigham Young University in Provo, Utah. After graduation he worked for ATK on the Shuttle program, Minuteman, Peacekeeper, and other DOD- and NASA-related programs. He left ATK for Edwards AFB to work as a civil servant, where he had the opportunity to do statistics on nearly every fighter, bomber, cargo plane, refueler, and UAV that you can think of. After six years at Edwards AFB he returned to Utah to work at Orbital ATK, now Northrop Grumman Space Systems, where he currently works as a staff data scientist/statistician and LMDS technical fellow.

  • Sensor Fusion for Automated Gathering of Labeled Data in Edge Settings

    Abstract:

    Data labeling has been identified as the most significant bottleneck and expense in the development of ML-enabled systems. High-quality labeled data also plays a critical role in the testing and deployment of AI/ML-enabled systems by providing a measurement of model performance in a realistic environment. Moreover, a lack of agreement between test and production data is a commonly cited failure mode for ML systems.

    This work focuses on methods for automatic label acquisition using sensor fusion, specifically in edge settings where multiple sensors, including multi-modal sensors, provide multiple views of an object. When several sensors each report a probable detection of an object, the detection capability of the overall system (as opposed to that of each individual component) can be raised to highly probable or nearly certain. This is accomplished via a network of belief propagation that fuses the observations of an object from multiple sensors. These nearly certain detections can, in turn, be used as labels in a semi-supervised fashion. Once the detection likelihood exceeds a specified threshold, the data and the associated label can be used in retraining to produce higher-performing models in near real time and improve overall detection capabilities.
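    A minimal sketch of this kind of fusion, assuming conditionally independent sensors, a common prior, and a hypothetical auto-labeling threshold (none of which are claimed to match the reference architecture):

    ```python
    # Fuse per-sensor detection probabilities in log-odds space and auto-label
    # observations whose fused probability clears a confidence threshold.
    import numpy as np

    def fuse(p_sensors, prior=0.5):
        """Combine per-sensor detection probabilities for one object."""
        logit = lambda p: np.log(p / (1 - p))
        fused_logit = logit(prior) + np.sum([logit(p) - logit(prior) for p in p_sensors])
        return 1 / (1 + np.exp(-fused_logit))

    THRESHOLD = 0.99                     # hypothetical auto-labeling threshold

    detections = [0.80, 0.75, 0.90]      # e.g., overhead image, ground image, acoustic
    p_fused = fuse(detections)
    label = 1 if p_fused > THRESHOLD else None   # near-certain detections join the retraining set
    print(f"fused detection probability: {p_fused:.4f}, label: {label}")
    ```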

    Automated edge retraining scenarios pose a particular challenge for test and evaluation because they also require high-confidence tests that generalize to potentially unseen environments. The rapid and automated collection of labels enables edge retraining, federated training, dataset construction, and improved model performance. Improved model performance, in turn, enables downstream system tasks, including more rapid model deployment, faster time to detect, fewer false positives, simplified data pipelines, and decreased network bandwidth requirements.

    To demonstrate these benefits, we have developed a scalable reference architecture and dataset that allows repeatable experimentation for edge retraining scenarios. This architecture allows exploration of the complex design space for sensor fusion systems, with variation points including: methods for belief automation, automated labeling methods, automatic retraining triggers, and drift detection mechanisms. Our reference architecture exercises all of these variation points using multi-modal data (overhead imaging, ground-based imaging, and acoustic data).

    Speaker Info:

    Robert Edman

    Machine Learning Research Scientist

    Software Engineering Institute

    This work is a collaboration between the Army AI Integration Center and the Software Engineering Institute. Collaborators include Dr. Robert Edman, CPT Bryce Wilkins, and Dr. Jose Morales, who focus on the rapid maturation of basic research into Army-relevant capabilities, particularly through testing using Army-relevant data and metrics.

  • Silence of the Logs: A Cyber Red Team Data Collection Framework

    Abstract:

    Capturing the activities of Cyber Red Team operators as they conduct their mission, in a way that is both reproducible and granular enough for detailed analysis, poses a challenge for organizations that conduct cyber testing. Cyber Red Team members act as both operators and data collectors, all while keeping a busy testing schedule and working within a limited testing window. Data collection often suffers at the expense of meeting testing objectives. Data collection assistance may therefore be beneficial, supporting Cyber Red Team members so they can conduct cyber operations while still delivering the needed data.

    To assist in data collection, DOT&E, IDA, Johns Hopkins University Applied Physics Lab, and MITRE are developing a framework, including a data standard that supports data collection requirements, for Cyber Red Teams called Silence of the Logs (SotL). The goal of delivering SotL is to have Red Teams continue operations as normal while automatically logging activity in the SotL data standard and generating data needed for analyses. In addition to the data standard and application framework, the SotL development team has created example capabilities that record logs from a commonly used commercial Red Team tool in the data standard format. As Cyber Red Teams adopt other Red Team tools, they can use the SotL data standard and framework to create their own logging mechanisms to meet data collection requirements. Analysts also benefit from the SotL data standard as it enables reproducible data analysis. This talk demonstrates current SotL capabilities and presents possible data analysis techniques enabled by SotL.
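    Purely for illustration, a structured red-team activity record might look like the following; the field names are hypothetical placeholders and are not the SotL data standard, which is not reproduced here.

    ```python
    # Illustrative record of a red-team action in a structured, analyzable form.
    import json
    from datetime import datetime, timezone

    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": "redteam-03",            # hypothetical operator identifier
        "source_host": "10.0.5.21",
        "target_host": "10.0.8.14",
        "technique": "T1021.001",            # e.g., a MITRE ATT&CK technique ID
        "tool": "commercial-c2-framework",   # placeholder for the tool that emitted the log
        "command": "<redacted>",
        "outcome": "success",
    }
    print(json.dumps(event, indent=2))
    ```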

    Speaker Info:

    Jason Schlup

    Dr. Jason Schlup received his Ph.D. in Aeronautics from the California Institute of Technology in 2018.  He is now a Research Staff Member at the Institute for Defense Analyses and provides analytical support to the Director, Operational Test and Evaluation’s Cyber Assessment Program.  His research and interest areas include enterprise network attack analysis, improving data collection methodologies, and cloud security.  Jason also contributes to IDA’s Cyber Lab capability, focusing on Internet Protocol-based training modules and outreach opportunities.


  • Silence of the Logs: A Cyber Red Team Data Collection Framework

    Abstract:

    Capturing the activities of Cyber Red Team operators as they conduct their mission, in a way that is both reproducible and granular enough for detailed analysis, poses a challenge for organizations that conduct cyber testing. Cyber Red Team members act as both operators and data collectors, all while keeping a busy testing schedule and working within a limited testing window. Data collection often suffers at the expense of meeting testing objectives. Data collection assistance may therefore be beneficial, supporting Cyber Red Team members so they can conduct cyber operations while still delivering the needed data.

    To assist in data collection, DOT&E, IDA, Johns Hopkins University Applied Physics Lab, and MITRE are developing a framework, including a data standard that supports data collection requirements, for Cyber Red Teams called Silence of the Logs (SotL). The goal of delivering SotL is to have Red Teams continue operations as normal while automatically logging activity in the SotL data standard and generating data needed for analyses. In addition to the data standard and application framework, the SotL development team has created example capabilities that record logs from a commonly used commercial Red Team tool in the data standard format. As Cyber Red Teams adopt other Red Team tools, they can use the SotL data standard and framework to create their own logging mechanisms to meet data collection requirements. Analysts also benefit from the SotL data standard as it enables reproducible data analysis. This talk demonstrates current SotL capabilities and presents possible data analysis techniques enabled by SotL.

    Speaker Info:

    Jared Aguayo

    Jared Aguayo, a professional in computer science and software engineering at Johns Hopkins APL, holds a Bachelor's in Computer Science and a Master's in Software Engineering from the University of Texas at El Paso, where he was an SFS scholar. Specializing in 5G and SDN research during his master's, he honed his development and teamwork skills through capstone projects with the Army and Pacific Northwest National Laboratory. At APL, Jared applies these skills to significant projects, leveraging his experience and willingness to learn.

  • Silence of the Logs: A Cyber Red Team Data Collection Framework

    Abstract:

    Capturing the activities of Cyber Red Team operators as they conduct their mission, in a way that is both reproducible and granular enough for detailed analysis, poses a challenge for organizations that conduct cyber testing. Cyber Red Team members act as both operators and data collectors, all while keeping a busy testing schedule and working within a limited testing window. Data collection often suffers at the expense of meeting testing objectives. Data collection assistance may therefore be beneficial, supporting Cyber Red Team members so they can conduct cyber operations while still delivering the needed data.

    To assist in data collection, DOT&E, IDA, Johns Hopkins University Applied Physics Lab, and MITRE are developing a framework, including a data standard that supports data collection requirements, for Cyber Red Teams called Silence of the Logs (SotL). The goal of delivering SotL is to have Red Teams continue operations as normal while automatically logging activity in the SotL data standard and generating data needed for analyses. In addition to the data standard and application framework, the SotL development team has created example capabilities that record logs from a commonly used commercial Red Team tool in the data standard format. As Cyber Red Teams adopt other Red Team tools, they can use the SotL data standard and framework to create their own logging mechanisms to meet data collection requirements. Analysts also benefit from the SotL data standard as it enables reproducible data analysis. This talk demonstrates current SotL capabilities and presents possible data analysis techniques enabled by SotL.

    Speaker Info:

    Misael Valentin

    Misael Valentin is a software engineer with the Resilient Military Systems group at Johns Hopkins University Applied Physics Laboratory. He graduated with a BS in Computer Engineering with a focus on embedded systems from the University of Puerto Rico, and later obtained an MS in Computer Science with a focus on machine learning from Johns Hopkins University. As part of his work at APL, Misael develops software that helps to enable the creation of cyber-resilient systems in support of multiple sponsors spanning multiple domains, from Virginia Class submarines, to national security space systems, and nuclear command, control, and communications. He is also the APL representative to the DOT&E AI Working Group.

  • Simulated Multipath Using Software Generated GPS Signals

    Abstract:

    Depending on the environment, multipath can be one of the largest error sources contributing to degradation in Global Navigation Satellite System (GNSS) (e.g., GPS) performance. Multipath is a phenomenon that occurs as radio signals reflect off of surfaces, such as buildings, producing multiple copies of the original signal. When this occurs with GPS signals, it results in one or more delayed signals arriving at the receiver with or without the on-time/direct GPS signal. The receiver measures the composite of these signals which, depending on the severity of the multipath, can substantially degrade the accuracy of the receiver's calculated position. Multipath is commonly experienced in cities due to tall buildings and its mitigation is an ongoing area of study. This research demonstrates a novel approach for simulating GPS multipath through the modification of an open-source tool, GPS-SDR-SIM. The resulting additional testing capability could allow for improved development of multipath mitigating technologies.

    Currently, open-source tools for simulating GPS signals are available and can be used in the testing and evaluation of GPS receiver equipment. These tools can generate GPS signals that, when used by a GPS receiver, result in computation of a position solution that was pre-determined at the time of signal generation. That is, the signals produced are properly formed for the pre-determined location and result in the receiver reporting that position. This allows for a GPS receiver under test to be exposed to various simulated locations and conditions without having to be physically subjected to them. Additionally, while these signals are generated by a software simulation, they can be processed by real or software-defined GPS receivers. This work utilizes the GPS-SDR-SIM software tool for GPS signal generation; while this tool does implement some sources of error that are inherent to GPS, it cannot inject multipath. GPS-SDR-SIM was modified in this effort to produce additional copies of signals with pre-determined delays. These additional delayed signals mimic multipath and represent what happens to GPS signals in the real world as they reflect off surfaces and arrive at a receiver in place of or alongside the direct GPS signal.
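    Conceptually, the modification amounts to adding attenuated, delayed replicas of the direct signal before the receiver sees it. The sketch below illustrates that idea on a synthetic complex baseband signal; the sample rate, delay, and reflection coefficient are arbitrary assumptions, and this is not the GPS-SDR-SIM code itself.

    ```python
    # Form a composite baseband signal from a direct path plus a delayed, attenuated replica.
    import numpy as np

    fs = 2.6e6                                  # sample rate (Hz), hypothetical
    n = 10_000
    t = np.arange(n) / fs
    direct = np.exp(2j * np.pi * 1_000 * t)     # stand-in for the direct GPS baseband signal

    delay_s = 0.5e-6                            # 0.5 microsecond excess path delay (~150 m)
    alpha = 0.5 * np.exp(1j * np.pi / 3)        # reflection attenuation and phase shift
    delay_samples = int(round(delay_s * fs))

    reflected = np.zeros_like(direct)
    reflected[delay_samples:] = alpha * direct[:n - delay_samples]

    composite = direct + reflected              # what the receiver actually measures
    ```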

    A successful proof of concept was prototyped and demonstrated using this modified version of GPS-SDR-SIM to produce simulated GPS signals as well as additional simulated multipath signals. The generated data was processed using a software defined GPS receiver and it was found that the introduction of simulated multipath signals successfully produced the expected characteristics of a composite multipath signal. Further maturation of this work could allow for the development of a GPS receiver testing and evaluation framework and aid in the development of multipath mitigating technologies.

    Speaker Info:

    Russell Gilabert

    Researcher/Engineer

    NASA Langley Research Center

    Russell Gilabert is a computer research engineer in the Safety Critical Avionics Systems Branch at NASA Langley Research Center. Russell received his MSc in electrical engineering from Ohio University in 2018. His research is currently focused on GNSS augmentation techniques and dependable navigation for autonomous aerial vehicles.

  • Simulation Insights on Power Analysis with Binary Responses: From SNR Methods to 'skprJMP'

    Abstract:

    Logistic regression is a commonly-used method for analyzing tests with probabilistic responses in the test community, yet calculating power for these tests has historically been challenging. This difficulty prompted the development of methods based on signal-to-noise ratio (SNR) approximations over the last decade, tailored to address the intricacies of logistic regression's binary outcomes and complex probability distributions. Originally conceived as a solution to the limitations of then-available statistical software, these approximations provided a necessary, albeit imperfect, means of power analysis. However, advancements and improvements in statistical software and computational power have reduced the need for such approximate methods. Our research presents a detailed simulation study that compares SNR-based power estimates with those derived from exact Monte Carlo simulations, highlighting the inadequacies of SNR approximations. To address these shortcomings, we will discuss improvements in the open-source R package "skpr" as well as present "skprJMP," a new plug-in that offers more accurate and reliable power calculations for logistic regression analyses for organizations that prefer to work in JMP. Our presentation will outline the challenges initially encountered in calculating power for logistic regression, discuss the findings from our simulation study, and demonstrate the capabilities and benefits "skpr" and "skprJMP" provide to an analyst.
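    For context, a generic Monte Carlo power calculation for a two-level logistic-regression design can be sketched as follows; this is an illustrative Python stand-in (effect sizes, run counts, and design are assumptions), not the skpr or skprJMP implementation.

    ```python
    # Monte Carlo power for a logistic-regression test on a two-level factor.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    def simulate_power(n_per_level=64, beta0=0.0, beta1=0.8, alpha=0.05, n_sim=1000):
        x = np.repeat([-1.0, 1.0], n_per_level)          # two-level factor, coded +/-1
        X = sm.add_constant(x)
        rejections = 0
        for _ in range(n_sim):
            p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
            y = rng.binomial(1, p)
            try:
                fit = sm.Logit(y, X).fit(disp=0)
                rejections += fit.pvalues[1] < alpha      # Wald test on the factor effect
            except Exception:                             # e.g., separation: count as no rejection
                pass
        return rejections / n_sim

    print(f"estimated power: {simulate_power():.3f}")
    ```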

    Speaker Info:

    Tyler Morgan-Wall

    Research Staff Member

    IDA

    Dr. Tyler Morgan-Wall is a Research Staff Member at the Institute for Defense Analyses, and is the developer of the software library skpr: a package developed at IDA for optimal design generation and power evaluation in R. He is also the author of several other R packages for data visualization, mapping, and cartography. He has a PhD in Physics from Johns Hopkins University and lives in Silver Spring, MD.

  • Statistical Advantages of Validated Surveys over Custom Surveys

    Abstract:

    Surveys play an important role in quantifying user opinion during test and evaluation (T&E). Current best practice is to use surveys that have been tested, or “validated,” to ensure that they produce reliable and accurate results. However, unvalidated (“custom”) surveys are still widely used in T&E, raising questions about how to determine sample sizes for—and interpret data from— T&E events that rely on custom surveys. In this presentation, I characterize the statistical properties of validated and custom survey responses using data from recent T&E events, and then I demonstrate how these properties affect test design, analysis, and interpretation. I show that validated surveys reduce the number of subjects required to estimate statistical parameters or to detect a mean difference between two populations. Additionally, I simulate the survey process to demonstrate how poorly designed custom surveys introduce unintended changes to the data, increasing the risk of drawing false conclusions.
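    To illustrate the sample-size effect, the back-of-the-envelope calculation below shows how the required number of subjects per group grows as survey reliability drops, using the standard two-sample normal approximation; the effect size and reliability values are hypothetical, not the presentation's data.

    ```python
    # Required n per group to detect a mean difference, as reliability drops.
    # Classical test theory: observed variance = true variance / reliability.
    from scipy import stats

    def n_per_group(delta, sigma_true, reliability, alpha=0.05, power=0.80):
        sigma_obs2 = sigma_true**2 / reliability
        z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
        return 2 * z**2 * sigma_obs2 / delta**2

    for rel in (1.0, 0.9, 0.7, 0.5):   # hypothetical reliabilities (validated vs. custom surveys)
        print(f"reliability {rel:.1f}: n per group = {n_per_group(0.5, 1.0, rel):.0f}")
    ```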

    Speaker Info:

    Adam Miller

    Research Staff Member

    IDA

    Adam Miller has been a Research Staff Member at IDA since 2022. He is a member of the IDA Test Science team supporting HSI evaluations of Land and Expeditionary Warfare systems. Previously, Adam worked as a behavioral neuroscientist at The Hospital for Sick Children in Toronto, ON, where he studied how the brain stores memories. He has a PhD in Psychology from Cornell University, and a B.A. in Psychology from Providence College.

  • Statistical Modeling of Machine Learning Operating Envelopes

    Abstract:

    Characterizing a model’s operating envelope, or the range of values for which the model performs well, is often of interest to a researcher. Of particular interest is estimating the operating envelope of a model at each phase of a testing process. Bayesian methods have been developed to complete this task for relatively simple models, but at present there is no method for more complicated models, in particular Machine Learning models. Preliminary research has shown that metadata influences model performance, although this work has primarily focused on categorical metadata. We are currently conducting a more rigorous investigation of the effect of metadata on a Machine Learning model’s operating envelope using the MNIST handwritten digit data set.
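    A simple way to picture an operating envelope is to stratify accuracy over a metadata variable and flag the regions that meet a requirement; the sketch below does this on synthetic data (the metadata variable, degradation curve, and requirement are assumptions, not results from the MNIST study).

    ```python
    # Stratify accuracy over a continuous metadata variable and flag bins whose
    # lower confidence bound stays above a performance requirement.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    meta = rng.uniform(0, 1, 5000)                 # e.g., image blur or rotation amount
    p_correct = 0.97 - 0.4 * meta                  # hypothetical degradation with metadata
    correct = rng.binomial(1, p_correct)

    bins = np.linspace(0, 1, 11)
    requirement = 0.85
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (meta >= lo) & (meta < hi)
        k, n = correct[mask].sum(), mask.sum()
        lcb = stats.beta.ppf(0.05, k + 1, n - k + 1)   # crude Bayesian lower bound on accuracy
        status = "in envelope" if lcb >= requirement else "outside"
        print(f"metadata [{lo:.1f}, {hi:.1f}): acc={k/n:.3f}, LCB={lcb:.3f} -> {status}")
    ```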

    Speaker Info:

    Anna Flowers

    Graduate Student

    Virginia Tech

    Anna Flowers is a third-year Ph.D. student in the statistics department at Virginia Tech, where she is a recipient of the Jean Gibbons Fellowship. She received a B.S. in Mathematical Statistics from Wake Forest University in 2021 and an M.S. in Statistics from Virginia Tech in 2023. Her research focuses on mixture modeling and large-scale Gaussian Process approximation, particularly as it applies to estimating model performance. She is co-advised by Bobby Gramacy and Chris Franck.

  • Statistical Validation of Fuel Savings from In-Flight Data Recordings

    Abstract:

    The efficient use of energy is a critical challenge for any organization, but especially in aviation, where entities such as the United States Air Force operate on a global scale, using many millions of gallons of fuel per year and requiring a massive logistical network to maintain operational readiness. Even very small modifications to aircraft, whether it be physical, digital, or operational, can accumulate substantial changes in a fleet’s fuel consumption. We have developed a prototype system to quantify changes in fuel use due to the application of an intervention, with the purpose of informing decision-makers and promoting fuel-efficient practices. Given a set of in-flight sensor data from a certain type of aircraft and a list of sorties for which an intervention is present, we use statistical models of fuel consumption to provide confidence intervals for the true fuel efficiency improvements of the intervention. Our analysis shows that, for some aircraft, we can reliably detect the presence of interventions with as little as a 1% fuel rate improvement and only a few hundred sorties, enabling rapid mitigation of even relatively minor issues.
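    The general shape of such an analysis can be sketched as a regression of log fuel-flow rate on sortie covariates plus an intervention indicator, whose confidence interval then bounds the percent change in fuel rate; the covariates, effect size, and noise level below are invented for illustration and do not represent the prototype system.

    ```python
    # Regress log fuel rate on sortie covariates plus an intervention indicator.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 800                                                  # hypothetical number of sorties
    df = pd.DataFrame({
        "gross_weight": rng.normal(60_000, 5_000, n),
        "altitude": rng.normal(28_000, 3_000, n),
        "intervention": rng.integers(0, 2, n),
    })
    true_effect = -0.01                                      # a 1% fuel-rate improvement
    df["log_fuel_rate"] = (8.0 + 1e-5 * df.gross_weight - 5e-6 * df.altitude
                           + true_effect * df.intervention + rng.normal(0, 0.03, n))

    fit = smf.ols("log_fuel_rate ~ gross_weight + altitude + intervention", data=df).fit()
    ci = fit.conf_int().loc["intervention"]
    print(f"estimated fuel-rate change: {100 * fit.params['intervention']:.2f}% "
          f"(95% CI {100 * ci[0]:.2f}% to {100 * ci[1]:.2f}%)")
    ```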

    Speaker Info:

    Keltin Grimes

    Assistant Machine Learning Research Scientist

    Software Engineering Institute

    Keltin joined the Software Engineering Institute's AI Division in June of 2023 as an Assistant Machine Learning Research Scientist after graduating from Carnegie Mellon University with a B.S. in Statistics and Machine Learning and an additional major in Computer Science. His previous research projects have included work on Machine Unlearning, adversarial attacks on ML systems, and ML for materials discovery. 

  • Synthetic Data for Target Acquisition

    Abstract:

    As the battlefield undergoes constant evolution, and we anticipate future conflicts, there is a growing need for apt computer vision models tailored toward military applications. The heightened use of drones and other technology on the modern battlefield has led to a demand for effective models specifically trained on military equipment. However, there has not been a proper effort to assemble or utilize data from recent wars for training future-oriented models. Creating new quality data poses costs and challenges that make it unrealistic for the sole purpose of training these models. This project explores a way around these barriers with the use of synthetic data generation using the Unreal Engine, a prominent computer graphics gaming engine. The ability to create computer-generated videos representative of the battlefield can impact model training and performance. I will be limiting the scope to focus on armored vehicles and the point of view of a consumer drone. Simulating a drone’s point of view in the Unreal Engine, I will create a collection of videos with ample variation. Using this data, I will experiment with various training methods to provide commentary on the best use of synthetic imagery for this task. If shown to be promising, this method can provide a feasible solution to prepare our models and military for what comes next.

    Speaker Info:

    Max Felter

    CDT

    USCC West Point

    My name is Max Felter and I am currently a sophomore at the United States Military Academy at West Point. I am an applied statistics and data science major in the honors track. I plan to stay involved with research as a cadet and hope to pursue a graduate degree sometime thereafter. In my first year conducting research, I have focused on the topic of computer vision utilizing current off-the-shelf models. I am passionate about the intersection of innovation and service and hope to contribute throughout my career.

  • Tactical Route Optimization: A Data Driven Method for Military Route Planning

    Abstract:

    Military planners frequently face the challenging task of devising a route plan based solely on a map and a grid coordinate of their objective. This traditional approach is not only time-consuming but also mentally taxing. Moreover, it often compels planners to make broad assumptions, resulting in a route that is based more on educated guesses than on data-driven analysis. To address these limitations, this research explores the potential of utilizing a path-finding algorithm, such as A*, to assist planners. Specifically, our algorithm aims to identify the route that minimizes the likelihood of enemy detection, thereby providing a more optimized and data-driven path for mission success. We have developed a model that takes satellite imagery data and produces a feasible route that minimizes detection given the location of an enemy. Future work includes improving the graphical interface and the development of k-distinct paths to provide planners with multiple options.
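    For illustration, a minimal A* search over a grid whose step cost encodes detection risk near a known enemy location might look like the sketch below; the risk model, grid, and positions are hypothetical and are not the capstone implementation.

    ```python
    # A* over a grid where the step cost is a detection-risk value near an enemy position.
    import heapq
    import numpy as np

    def detection_risk(rows, cols, enemy):
        yy, xx = np.mgrid[0:rows, 0:cols]
        dist = np.hypot(yy - enemy[0], xx - enemy[1])
        return 1.0 + 10.0 * np.exp(-dist / 5.0)        # base movement cost + risk near the enemy

    def a_star(risk, start, goal):
        rows, cols = risk.shape
        heuristic = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        frontier = [(heuristic(start), 0.0, start, None)]
        came_from, best = {}, {start: 0.0}
        while frontier:
            _, g, node, parent = heapq.heappop(frontier)
            if node in came_from:
                continue
            came_from[node] = parent
            if node == goal:
                break
            r, c = node
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    ng = g + risk[nr, nc]
                    if ng < best.get((nr, nc), np.inf):
                        best[(nr, nc)] = ng
                        heapq.heappush(frontier, (ng + heuristic((nr, nc)), ng, (nr, nc), node))
        path, node = [], goal                           # reconstruct the minimum-risk route
        while node is not None:
            path.append(node)
            node = came_from[node]
        return path[::-1]

    risk = detection_risk(40, 40, enemy=(20, 20))
    route = a_star(risk, start=(0, 0), goal=(39, 39))
    ```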

    Speaker Info:

    Jason Ingersoll

    Cadet

    West Point

    Jason Ingersoll is an operations research major from the United States Military Academy, with a focus on the innovative applications of mathematics and computer science. They began their research journey by utilizing Markov Chains to predict NBA game scores, later incorporating Monte Carlo simulations. Their internships at Lockheed Martin and MIT Lincoln Labs further allowed them to work on radar optimization and design a short-range communication system using infrared LEDs. Additionally, Jason Ingersoll has presented research on the ethical use of brain-computer interfaces to address PTSD in soldiers at an Oxford conference and is published in the American Intelligence Journal. During their senior year, they have worked on a military route planning program, integrating A-Star algorithms, GPS, and LIDAR data, as part of their capstone project. Their graduate research plan includes pursuing a degree in Artificial Intelligence and Machine Learning at an institution like MIT, aiming to advance AI's ethical integration within the military. With a passion for bridging technological innovation and ethical responsibility, Jason Ingersoll is dedicated to enhancing the military's decision-making capabilities and software through the responsible application of AI.

  • Text Analysis: Introduction to Advanced Language Modeling

    Abstract:

    This course will provide a broad overview of text analysis and natural language processing (NLP), including a significant amount of introductory material with extensions to state-of-the-art methods. All aspects of the text analysis pipeline will be covered including data preprocessing, converting text to numeric representations (from simple aggregation methods to more complex embeddings), and training supervised and unsupervised learning methods for standard text-based tasks such as named entity recognition (NER), sentiment analysis, topic modeling, and text generation using Large Language Models (LLMs). The course will alternate between presentations and hands-on exercises in Python. Translations from Python to R will be provided for students more comfortable with that language. Attendees should be familiar with Python (preferably), R, or both and have a basic understanding of statistics and/or machine learning. Attendees will gain the practical skills necessary to begin using text analysis tools for their tasks, an understanding of the strengths and weaknesses of these tools, and an appreciation for the ethical considerations of using these tools in practice.
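    As a taste of the supervised portion of the pipeline, here is a minimal example (toy sentences and labels, not course material) that converts text to TF-IDF features and trains a classifier.

    ```python
    # Convert text to a numeric representation with TF-IDF and train a classifier on it.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["the system performed flawlessly during the test",
             "repeated faults made the test a frustrating failure",
             "excellent reliability and smooth operation",
             "the sensor failed and the data were unusable"]
    labels = [1, 0, 1, 0]                     # 1 = positive, 0 = negative (hypothetical)

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(texts, labels)
    print(model.predict(["operation was smooth and reliable"]))
    ```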

    Speaker Info:

    Karl Pazdernik

    Pacific Northwest National Laboratory

    Dr. Karl Pazdernik is a senior data scientist at Pacific Northwest National Laboratory. He is also a research assistant professor at North Carolina State University (NCSU) and the former chair of the American Statistical Association Section on Statistics in Defense and National Security. His research has focused on the dynamic modeling of multi-modal data with a particular interest in text analytics, spatial statistics, pattern recognition, anomaly detection, Bayesian statistics, and computer vision. Recent projects include natural language processing of multilingual unstructured financial data, anomaly detection in combined open-source data streams, automated biosurveillance and disease forecasting, and deep learning for defect detection and element mass quantification in nuclear materials. He received a Ph.D. in Statistics from Iowa State University and was a postdoctoral scholar at NCSU under the Consortium for Nonproliferation Enabling Capabilities.

  • The Joint Test Concept: Reimagining T&E for the Modern Joint Environment

    Abstract:

    The Joint force will likely be contested in all domains during the execution of distributed and potentially non-contiguous combat operations. This challenge inspires the question, “How do we effectively reimagine efficient T&E within the context of expected contributions to complex Joint kill/effects webs?” The DOT&E-sponsored Joint Test Concept applies an end-to-end, capability-lifecycle campaign-of-learning approach, anchored in mission engineering and supported by a distributed live, virtual, constructive environment, to assess materiel and non-materiel solutions’ performance, interoperability, and impact on service and Joint mission execution. Relying on input from the expanding JTC community of interest and human-centered design facilitation, the final concept is intended to ensure data quality, accessibility, utility, and analytic value across existing and emergent Joint mission (kill/effects) webs for all systems under test throughout the entire capability lifecycle. Using modeling and simulation principles, the JTC team is developing an evaluation model to assess the impact of the JTC within the current T&E construct and to identify the value proposition across a diverse stakeholder population.

    Speaker Info:

    Christina Houfek

    Lead PM

    VT-ARC

    Christina Houfek joined Virginia Tech’s Applied Research Corporation in 2022 following her time as an Irregular Warfare Analyst at the Naval Special Warfare Command and as Senior Professional Staff at the Johns Hopkins University Applied Physics Laboratory. She holds an M.A. in Leadership in Education from Notre Dame of Maryland University and a Graduate Certificate in Terrorism Analysis awarded by the University of Maryland.

  • The Joint Test Concept: Reimagining T&E for the Modern Joint Environment

    Abstract:

    The Joint force will likely be contested in all domains during the execution of distributed and potentially non-contiguous combat operations. This challenge inspires the question, “How do we effectively reimagine efficient T&E within the context of expected contributions to complex Joint kill/effects webs?” The DOT&E-sponsored Joint Test Concept applies an end-to-end, capability-lifecycle campaign-of-learning approach, anchored in mission engineering and supported by a distributed live, virtual, constructive environment, to assess materiel and non-materiel solutions’ performance, interoperability, and impact on service and Joint mission execution. Relying on input from the expanding JTC community of interest and human-centered design facilitation, the final concept is intended to ensure data quality, accessibility, utility, and analytic value across existing and emergent Joint mission (kill/effects) webs for all systems under test throughout the entire capability lifecycle. Using modeling and simulation principles, the JTC team is developing an evaluation model to assess the impact of the JTC within the current T&E construct and to identify the value proposition across a diverse stakeholder population.

    Speaker Info:

    Maegan Nix

    VT-ARC

    Dr. Maegen Nix is a veteran and former intelligence officer with 25 years of experience in the national security community and academia. She currently serves as the Director of the Decision Science Division at Virginia Tech’s Applied Research Corporation. Her civilian career has focused on the development of portfolios related to irregular warfare and insurgencies, cybersecurity, critical infrastructure security, national communications, autonomous systems, and intelligence. Dr. Nix earned a Ph.D. in government and politics from the University of Maryland, an M.A. in political science from Virginia Tech, and a B.S. in political science from the U.S. Naval Academy.

  • The Role of Bayesian Multilevel Models in Performance Measurement and Prediction

    Abstract:

    T&E relies on a series of observations under varying conditions in order to assess overall performance. Traditional evaluation methods can oversimplify complex structures in the data, where variance within groups of observations made under identical experimental conditions differs significantly from the variance between such groups, introducing biases and potentially misrepresenting true performance capabilities. To address these challenges, MORSE is implementing Bayesian multilevel models. These models capture the nuanced group-wise structure inherent in T&E data, simultaneously estimating intragroup and intergroup parameters while efficiently pooling information across different model levels. This methodology is particularly adept at regressing against experimental parameters, a feature that conventional models often overlook. A distinct advantage of Bayesian approaches lies in their ability to generate comprehensive uncertainty distributions for all model parameters, providing a more robust and holistic understanding of how performance varies. Our application of these Bayesian multilevel models has been instrumental in generating credible intervals for performance metrics in applications with varying levels of risk tolerance. Looking forward, our focus will shift from measuring performance toward modeling performance.
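    A minimal partial-pooling sketch (synthetic data; not the MORSE models) shows the basic mechanics: group-level parameters share a common hyperprior, so sparse groups borrow strength from the rest, and the posterior yields credible intervals for every group.

    ```python
    # Hierarchical (multilevel) model of group-level success probabilities in PyMC.
    import numpy as np
    import pymc as pm

    rng = np.random.default_rng(0)
    n_groups = 8                                   # e.g., distinct experimental conditions
    trials = rng.integers(10, 40, n_groups)
    true_p = rng.beta(8, 2, n_groups)
    successes = rng.binomial(trials, true_p)

    with pm.Model() as model:
        mu = pm.Normal("mu", 0.0, 1.5)             # population-level mean (logit scale)
        sigma = pm.HalfNormal("sigma", 1.0)        # between-group spread
        theta = pm.Normal("theta", mu, sigma, shape=n_groups)
        p = pm.Deterministic("p", pm.math.sigmoid(theta))
        pm.Binomial("obs", n=trials, p=p, observed=successes)
        idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

    # 90% credible intervals for each group's performance
    print(idata.posterior["p"].quantile([0.05, 0.95], dim=("chain", "draw")))
    ```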

    Speaker Info:

    Austin Amaya

    Dr. Austin Amaya is the lead for Algorithm Testing and Evaluation at MORSE. He has more than 10 years of experience developing and testing AI/ML-driven systems within the DoD.

  • The Role of Bayesian Multilevel Models in Performance Measurement and Prediction

    Abstract:

    T&E relies on a series of observations under varying conditions in order to assess overall performance. Traditional evaluation methods can oversimplify complex structures in the data, where variance within groups of observations made under identical experimental conditions differs significantly from the variance between such groups, introducing biases and potentially misrepresenting true performance capabilities. To address these challenges, MORSE is implementing Bayesian multilevel models. These models capture the nuanced group-wise structure inherent in T&E data, simultaneously estimating intragroup and intergroup parameters while efficiently pooling information across different model levels. This methodology is particularly adept at regressing against experimental parameters, a feature that conventional models often overlook. A distinct advantage of Bayesian approaches lies in their ability to generate comprehensive uncertainty distributions for all model parameters, providing a more robust and holistic understanding of how performance varies. Our application of these Bayesian multilevel models has been instrumental in generating credible intervals for performance metrics in applications with varying levels of risk tolerance. Looking forward, our focus will shift from measuring performance toward modeling performance.

    Speaker Info:

    Sean Dougherty

  • Threat Integration for Full Spectrum Survivability Assessments

    Abstract:

    The expansion of DOT&E’s oversight role to cover full spectrum survivability and lethality assessments includes a need to reexamine how threats are evaluated in a Live Fire Test and Evaluation (LFT&E) rubric. Traditionally, threats for LFT&E assessment have been considered in isolation, with focus on only conventional weapon hits that have the potential to directly damage the system under test. The inclusion of full spectrum threats - including electronic warfare, directed energy, CBRNE, and cyber - requires a new approach to how LFT&E assessments are conducted. Optimally, assessment of full spectrum threats will include integrated survivability vignettes appropriate to how our systems will actually be used in combat and how combinations of adversary threats are likely to be used against them. This approach will require new assessment methods with an increased reliance on data from testing at design sites, component/surrogate tests, and digital twins.

    Speaker Info:

    Russell Kupferer

    Naval Warfare Action Officer

    DOT&E

    Russell Kupferer is an Action Officer in the Naval Warfare directorate of the office of Director, Operational Test and Evaluation (DOT&E). In this role, he provides oversight of the Live Fire Test and Evaluation (LFT&E) programs for all USN ships and submarines. Mr. Kupferer received his bachelor’s degree in Naval Architecture and Marine Engineering at Webb Institute.

  • Understanding and Applying the Human Readiness Level Scale During User-Centered Design

    Abstract:

    The purpose of this short course is to support knowledge and application of the Human Readiness Level (HRL) scale described in ANSI/HFES 400-2021 Human Readiness Level Scale in the System Development Process. The HRL scale is a simple nine-level scale designed to supplement the Technology Readiness Level (TRL) scale to evaluate, track, and communicate the readiness of a technology or system for safe and effective human use. Application of the HRL scale ensures proper attention to human systems design throughout system development, which minimizes or prevents human error and enhances the user experience.
    Learning objectives for the short course include:
    (1) Understand the relationship between a user-centered design (UCD) process and the HRL Scale. Instructors will discuss a “typical” UCD process describing the design activities and data collected that support HRL Scale evaluation and tracking.
    (2) Learn effective application of usability testing in a DOD environment. Instructors will describe iterative, formative usability testing with a hands-on opportunity to perform usability tasks. Human-centered evaluation of system design is a critical activity when evaluating the extent to which a system is ready for human use.
    (3) Understand HFES 400-2021 development and contents. Instructors will describe the evolution of the HRL concept to convey its significance and the rigor behind the development of the technical standard. Instructors will walk through major sections of the standard and describe how to apply them.
    (4) Learn how the HRL scale is applied in current and historical acquisition programs. Instructors will describe real-world Army applications of the HRL scale, including a case study of a software modernization program.
    (5) Apply the HRL scale to practical real-world problems. Attendees will gain hands-on experience applying the HRL scale during group exercises that simulate teamwork during the system development process. Group exercises incorporate three different scenarios, covering both hardware and software solutions at various stages of technological development. The hands-on exercises specifically address common questions about the practical use of the HRL scale.
    Course attendees do not need prior human factors/ergonomics knowledge or ability. The HRL scale is intended to be applied by human systems professionals with proper ability and experience; however, recipients of HRL scale ratings include many other types of personnel in design, engineering, and acquisition as well as high-level decision-makers, all of whom benefit from understanding the HRL scale. Before attending the course, students should download a free copy of the ANSI/HFES technical standard at https://my.hfes.org/online-store/publications and bring it to the course in electronic or hard copy format. Laptops are not necessary for the course but may facilitate notetaking and completion of the group exercises.

    Speaker Info:

    Judi See

    Sandia National Laboratories

    Dr. Judi See is a systems analyst and human factors engineer at Sandia National Laboratories in Albuquerque, New Mexico. Her work involves leading research and analysis focused on the human component of the nuclear deterrence system. Dr. See has a doctorate degree in human factors, master’s degrees in human factors and systems engineering, and professional certification in human factors and ergonomics through the Board of Certification in Professional Ergonomics. She became a Distinguished Member of the Technical Staff at Sandia National Laboratories in 2021. Her research interests include vigilance, signal detection theory, visual inspection, and human readiness levels.

  • Understanding and Applying the Human Readiness Level Scale During User-Centered Design

    Abstract:

    The purpose of this short course is to support knowledge and application of the Human Readiness Level (HRL) scale described in ANSI/HFES 400-2021 Human Readiness Level Scale in the System Development Process. The HRL scale is a simple nine-level scale designed to supplement the Technology Readiness Level (TRL) scale to evaluate, track, and communicate the readiness of a technology or system for safe and effective human use. Application of the HRL scale ensures proper attention to human systems design throughout system development, which minimizes or prevents human error and enhances the user experience.
    Learning objectives for the short course include:
    (1) Understand the relationship between a user-centered design (UCD) process and the HRL Scale. Instructors will discuss a “typical” UCD process describing the design activities and data collected that support HRL Scale evaluation and tracking.
    (2) Learn effective application of usability testing in a DOD environment. Instructors will describe iterative, formative usability testing with a hands-on opportunity to perform usability tasks. Human-centered evaluation of system design is a critical activity when evaluating the extent to which a system is ready for human use.
    (3) Understand HFES 400-2021 development and contents. Instructors will describe the evolution of the HRL concept to convey its significance and the rigor behind the development of the technical standard. Instructors will walk through major sections of the standard and describe how to apply them.
    (4) Learn how the HRL scale is applied in current and historical acquisition programs. Instructors will describe real-world Army applications of the HRL scale, including a case study of a software modernization program.
    (5) Apply the HRL scale to practical real-world problems. Attendees will gain hands-on experience applying the HRL scale during group exercises that simulate teamwork during the system development process. Group exercises incorporate three different scenarios, covering both hardware and software solutions at various stages of technological development. The hands-on exercises specifically address common questions about the practical use of the HRL scale.
    Course attendees do not need prior human factors/ergonomics knowledge or ability. The HRL scale is intended to be applied by human systems professionals with proper ability and experience; however, recipients of HRL scale ratings include many other types of personnel in design, engineering, and acquisition as well as high-level decision-makers, all of whom benefit from understanding the HRL scale. Before attending the course, students should download a free copy of the ANSI/HFES technical standard at https://my.hfes.org/online-store/publications and bring it to the course in electronic or hard copy format. Laptops are not necessary for the course but may facilitate notetaking and completion of the group exercises.

    Speaker Info:

    Pam Savage-Knepshield

    CACI, International

    Pam Savage-Knepshield is employed by CACI, International as the user-centered design (UCD) lead for Army Field Artillery Command and Control Systems supporting the U.S. Army Project Manager Mission Command in Aberdeen Proving Ground, Maryland. She has a doctorate degree in cognitive psychology and is a Fellow of the Human Factors and Ergonomics Society. With over 35 years of human factors experience working in industry, academia, and the US Army, her interests focus on user-centered design from front-end development identifying user needs and translating them into user stories, through usability testing and post-fielding user satisfaction assessment.

  • Unlocking our Collective Knowledge: LLMs for Data Extraction from Long-Form Documents

    Abstract:

    As the primary mode of communication between humans, natural language (oftentimes found in the form of text) is one of the most prevalent sources of information across all domains. From scholarly articles to industry reports, textual documentation pervades every facet of knowledge dissemination. This is especially true in the world of aerospace. While other structured data formats may struggle to capture complex relationships, natural language excels by allowing for detailed explanations that a human can understand. However, the flexible, human-centered nature of text has made it traditionally difficult to incorporate into quantitative analyses, leaving potentially valuable insights and features hidden within the troves of documents collecting dust in various repositories.

    Large Language Models (LLMs) are an emerging technology that can bridge the gap between the expressiveness of unstructured text and the practicality of structured data. Trained to predict the next most likely word following a sequence of text, LLMs built on large and diverse datasets must implicitly learn knowledge related to a variety of fields in order to perform prediction effectively. As a result, modern LLMs have the capability to interpret the underlying semantics of language in many different contexts, allowing them to digest long-form, domain-specific textual information in a fraction of the time that a human could. Among other things, this opens up the possibility of knowledge extraction: the transformation of unstructured textual knowledge to a structured format that is consistent, queryable, and amenable to being incorporated in future statistical or machine learning analyses.

    Specifically, this work begins by highlighting the use of GPT-4 for categorizing NASA work contracts based on JPL’s organizational structure using textual descriptions of the contract’s work, allowing the lab to better understand how different divisions will be impacted by the increasingly outsourced work environment. Despite its simplicity, the task demonstrates the capability of LLMs to ingest unstructured text and produce structured results (categorical features for each contract indicating the JPL organization that the work would involve) useful for statistical analysis. Potential extensions to this proof of concept are then highlighted, such as the generation of knowledge-graphs/ontologies to encode domain and mission-specific information. Access to a consistent, structured graphical knowledge base would not only improve data-driven decision making in engineering contexts by exposing previously out-of-reach data artifacts to traditional analyses (e.g., numerical data extracted from text, or even graph embeddings which encode entities/nodes as vectors in a way that captures the entity’s relation to the overall structure of the graph), but could also accelerate the development of specialized capabilities like the mission Digital Twin (DT) by enabling access to a reliable, machine-readable database of mission and domain expertise.
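
    The sketch below illustrates the general pattern described above, not JPL's actual pipeline: prompt an LLM to turn a free-text contract description into a structured, queryable record. The category labels and the call_llm helper are hypothetical placeholders for whatever chat-completion API is available.

        # Minimal sketch of LLM-based knowledge extraction: free text in, structured record out.
        # `call_llm` is a hypothetical helper standing in for any chat-completion API.
        import json

        CATEGORIES = ["Propulsion", "Avionics", "Mission Operations", "Other"]  # placeholder labels

        def extract_record(contract_description: str, call_llm) -> dict:
            prompt = (
                "Classify the following contract description into exactly one of these "
                f"categories: {', '.join(CATEGORIES)}. "
                'Respond with JSON of the form {"category": "...", "rationale": "..."}.\n\n'
                f"Description: {contract_description}"
            )
            raw = call_llm(prompt)          # returns the model's text response
            record = json.loads(raw)        # structured, queryable result
            assert record["category"] in CATEGORIES, "model returned an unknown label"
            return record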

    Speaker Info:

    Patrick Bjornstad

    Systems Engineer I

    Jet Propulsion Laboratory

    Patrick Bjornstad is a Systems Engineer / Data Scientist at Jet Propulsion Laboratory in the Systems Modeling, Analysis & Architectures group. With expertise in a range of topics including statistical modeling, machine/deep learning, software development, and data engineering, Patrick has been involved with a variety of projects, primarily supporting formulation work at JPL. Patrick earned a B.S. in Applied & Computational Mathematics and an M.S. in Applied Data Science at the University of Southern California (USC).

  • Using AI to Classify Combat Vehicles in Degraded Environments

    Abstract:

    In the last decade, warfare has come to be characterized by rapid technological advances and the increased integration of artificial intelligence platforms. From China’s growing emphasis on advanced technological development programs to Ukraine’s use of facial recognition technologies in the war with Russia, the prevalence of artificial intelligence (AI) is undeniable. Currently, the United States is advancing the use of machine learning (ML) and AI through a variety of projects. Various systems use cutting-edge sensing technologies and emerging ML algorithms to automate the target acquisition process. As the United States attempts to increase its use of automatic target recognition (ATR) and aided target recognition (AiTR) systems, it is important to consider the inaccuracy that may occur as a result of environmental degradations such as smoke, fog, or rain. Therefore, this project aims to mimic various battlefield degradations through the implementation of different types of noise, namely uniform, Gaussian, and impulse noise, to determine the effect of these degradations on a commercial-off-the-shelf image classification system’s ability to correctly identify combat vehicles. This is an undergraduate research project which we wish to present via a Poster Presentation.
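
    As an illustration of the kind of degradations described above, the following sketch adds uniform, Gaussian, and impulse (salt-and-pepper) noise to an image array with NumPy; the noise levels are arbitrary, and this is not the project's actual degradation pipeline.

        import numpy as np

        def degrade(img, kind="gaussian", level=0.1, rng=np.random.default_rng(0)):
            """Return a noisy copy of `img` (float array scaled to [0, 1])."""
            out = img.astype(float).copy()
            if kind == "gaussian":                      # additive zero-mean Gaussian noise
                out += rng.normal(0.0, level, img.shape)
            elif kind == "uniform":                     # additive uniform noise
                out += rng.uniform(-level, level, img.shape)
            elif kind == "impulse":                     # salt-and-pepper: corrupt a fraction of pixels
                mask = rng.random(img.shape) < level
                out[mask] = rng.integers(0, 2, mask.sum())   # random black or white pixels
            return np.clip(out, 0.0, 1.0)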

    Speaker Info:

    Morgan Brown

    Undergrad Researcher

    USMA

    Morgan Brown is originally from Phoenix, Arizona, but is currently attending the United States Military Academy pursuing an undergraduate degree in Mathematical Sciences. Throughout her academic career, she has worked on projects relating to modeling ideal fluid flow, data visualization, and developing unconventional communication platforms. She is now working on her senior thesis in order to graduate with honors in May of 2024. 

  • Using Bayesian Network of subsystem statistical models to assess system behavior

    Abstract:

    Situations exist in which a system-level test is rarely accomplished or simply not feasible. When subsystem testing is available, including the creation of subsystem statistical models, an approach is required to combine those models. A Bayesian Network (BN) addresses this problem by modeling system behavior using the subsystem statistical models. The system is decomposed into a network of subsystems, and the interactions between the subsystems are described. Each subsystem is in turn described by a statistical model that determines the subjective probability distribution of its outputs given a set of inputs. Previous methods have been developed for validating the performance of the subsystem models and, subsequently, what can be known about system performance. This work defined a notional system, created the subsystem statistical models, generated synthetic data, and developed the Bayesian Network. Subsystem models are then validated, followed by a discussion of how system-level information is derived from the Bayesian Network.
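
    The following toy Monte Carlo sketch conveys the general idea of composing subsystem statistical models into a system-level distribution; the two notional subsystem models and all their parameters are invented for illustration and are not the notional system studied in this work.

        import numpy as np

        rng = np.random.default_rng(1)
        n = 100_000

        # Notional subsystem models (placeholders): subsystem A maps an input condition to an
        # intermediate output with noise; subsystem B maps A's output to the system-level response.
        x = rng.uniform(0.0, 1.0, n)                      # operating condition
        a_out = 2.0 * x + rng.normal(0.0, 0.1, n)         # subsystem A statistical model
        sys_out = np.where(a_out > 1.0,                   # subsystem B depends on A's output
                           rng.normal(5.0, 0.5, n),
                           rng.normal(3.0, 0.5, n))

        # System-level summaries derived from the composed network
        print(sys_out.mean(), np.quantile(sys_out, [0.05, 0.95]))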

    Speaker Info:

    James Theimer

    Operations Research Analyst

    HS COBP

    Dr. James Theimer is a Scientific Test and Analysis Techniques Expert employed by Huntington Ingalls Industries Technical Solutions, supporting the Homeland Security Center of Best Practices.
    Dr. Theimer worked for the Air Force Research Laboratory and its predecessor organizations for more than 35 years. He worked on modeling and simulation of sensor systems and supporting devices. His doctoral research was on modeling pulse formation in fiber lasers. He worked with a semiconductor reliability team as a reliability statistician and led a team that studied statistical validation of models of automatic sensor exploitation systems. This team also worked with programs to evaluate these systems.
    Dr. Theimer has a PhD in Electrical Engineering from Rensselaer Polytechnic Institute, an MS in Applied Statistics from Wright State University, an MS in Atmospheric Science from SUNY Albany, and a BS in Physics from the University of Rochester.

  • What drove the Carrington event? An analysis of currents and geospace regions.

    Abstract:

    The 1859 Carrington Event is the most intense geomagnetic storm in recorded history. This storm produced large changes to the geomagnetic field observed on the Earth’s surface, damaged telegraph systems, and created aurora visible over large portions of the Earth. The literature provides numerous explanations for which phenomena drove the observed effects. Previous analyses typically relied upon the historic magnetic field data from the event, newspaper reports, and empirical models. These analyses generally focus on whether one current system (e.g., magnetospheric currents) is more important than another (e.g., ionospheric currents). We expand the analysis by using results from the Space Weather Modeling Framework (SWMF), a complex magnetohydrodynamics code, to compute the contributions that various currents and geospace regions make to the northward magnetic field on the Earth’s surface. The analysis considers contributions from magnetospheric currents, ionospheric currents, and gap region field-aligned currents (FACs). In addition, we evaluate contributions from specific regions: the magnetosheath (between the Earth and the Sun), near Earth (within 6.6 Earth radii), and the neutral sheet (behind the Earth). Our analysis indicates that magnetic field changes observed during the Carrington Event involved a combination of current systems and regions rather than being driven by one specific current or region.

    Speaker Info:

    Dean Thomas

    Researcher

    George Mason University

    Dr. Dean Thomas works with the George Mason University Space Weather Lab, supporting a collaborative effort led by NASA Goddard.  His research focuses on space weather phenomena related to solar storms.  During these storms, the sun can eject billions of tons of plasma into space over just a few hours.  Most of these storms miss the Earth, but they can create large geomagnetically-induced currents (GIC), cause electrical blackouts, force airliners to change course, and damage satellites. Dr. Thomas’ research examines some of the largest storms observed, and the major factors that drive effects observed on the earth’s surface.  Earlier in his career, he was Deputy Director for the Operational Evaluation Division at the Institute for Defense Analyses, helping to manage a team of 150 researchers.  The division supports the Director, Operational Test and Evaluation (DOT&E) within the Pentagon.  DOT&E is responsible for operational testing of new military systems including aircraft, ships, ground vehicles, sensors, weapons, and information technology systems.  Dean Thomas received his PhD in Physics from Stony Brook University in 1987, and in 1982, his Bachelor of Science in Engineering Physics from the Colorado School of Mines.

  • Wildfire Burned Area Mapping Using Sentinel-1 SAR and Sentinel-2 MSI with Convolutional Neural Networks

    Abstract:

    The escalating environmental and societal repercussions of wildfires, underscored by the occurrence of four of the five largest wildfires in Colorado within the past five years, necessitate efficient mapping of burned areas to enhance emergency response and fire control strategies. This study investigates the potential of the Synthetic Aperture Radar (SAR) capabilities of the Sentinel-1 satellite, in conjunction with optical imagery from Sentinel-2, to expedite the assessment of wildfire conditions and progression. Our research is structured into four distinct cases, each applied to our dataset comprising seven Colorado wildfires. In each case, we iteratively refined our methods to mitigate the inherent challenges associated with SAR data. Our results demonstrate that while SAR imagery may not match the precision of traditional methodologies, it offers a valuable trade-off by providing a sufficiently accurate estimate of burned areas in significantly less time.
    Furthermore, we developed a deep learning framework for predicting burn severity using both Sentinel-1 SAR and Sentinel-2 MSI data acquired during wildfire events. Our findings underscore the potential of spaceborne imagery for real-time burn severity prediction, providing valuable insights for the effective management of wildfires. This research contributes to the advancement of wildfire monitoring and response, particularly in regions prone to such events, like Colorado, and underscores the significance of remote sensing technologies in addressing contemporary environmental challenges.

    Speaker Info:

    Garrett Chrisman

    Cadet

    United States Military Academy

    Garrett Chrisman is currently an undergraduate cadet at the United States Military Academy, West Point, majoring in Applied Statistics and Data Science. His academic focus includes Python, R, and machine learning applications. Garrett has engaged in research on wildfire severity assessment using Convolutional Neural Networks and satellite imagery. Additionally, he has held leadership roles, including being the Treasurer for his class, Captain of the Cycling team, and the President of the Finance Club. His work demonstrates a blend of technical skill and leadership.

  • Speaker Info:

    Missy Cummings

    Professor

    George Mason University

    Professor Mary (Missy) Cummings received her B.S. in Mathematics from the US Naval Academy in 1988, her M.S. in Space Systems Engineering from the Naval Postgraduate School in 1994, and her Ph.D. in Systems Engineering from the University of Virginia in 2004. A naval officer and military pilot from 1988-1999, she was one of the U.S. Navy's first female fighter pilots. She is a Professor in the George Mason University College of Engineering and Computing and is the director of the Mason Autonomy and Robotics Center (MARC). She is an American Institute of Aeronautics and Astronautics (AIAA) Fellow, and recently served as the senior safety advisor to the National Highway Traffic Safety Administration. Her research interests include the application of artificial intelligence in safety-critical systems, assured autonomy, human-systems engineering, and the ethical and social impact of technology.

  • Speaker Info:

    Elisabeth Paté-Cornell

    NASA Advisory Council/Professor

    Stanford University

    M. Elisabeth Paté-Cornell is the Burt and Deedee MacMurtry Professor in the Department of Management Science and Engineering at Stanford University, a department that she founded and then chaired from January 2000 to June 2011. She was elected to the National Academy of Engineering in 1995, to its Council (2001-2007), and to the French Académie des Technologies (2003). She was a member of the President’s Intelligence Advisory Board (2001-2004; 2006-2008), of the Boards of Trustees of the Aerospace Corporation (2004-2016), of InQtel (2006-2017), and of the Draper Corporation (2009-2016). She is a member of the NASA Advisory Council and currently co-chairs the National Academies Committee on methods of analysis of the risks of nuclear war and nuclear terrorism. She is a world leader in engineering risk analysis and risk management. Her research and that of her Engineering Risk Research Group at Stanford have focused on the inclusion of technical and management factors in probabilistic risk analysis models, with applications to the NASA shuttle tiles, offshore oil platforms, and medical systems. Since 2001, she has combined risk analysis and game analysis to assess intelligence information and the risks of terrorist attacks. More recently her research has centered on the failure risks of cyber systems and artificial intelligence in risk management decisions. She is past president (1995) and fellow of the Society for Risk Analysis, and a fellow of the Institute for Operations Research and Management Science. She received the 2021 IEEE Ramo Medal for Systems Science and Engineering and the 2022 PICMET Award in Engineering Risk Management. She has been a consultant to many industrial firms and government organizations. She has authored or co-authored more than a hundred papers in refereed journals and conference proceedings and has received several best-paper awards from professional organizations and peer-reviewed journals.

  • Speaker Info:

    Douglas Schmidt

    Director

    Operational Test & Evaluation

    Dr. Douglas C. Schmidt was sworn in as Director, Operational Test and Evaluation (DOT&E) on April 8, 2024. A Presidential appointee confirmed by the United States Senate, he serves as the senior advisor to the Secretary of Defense on operational and live fire test and evaluation of Department of Defense weapon systems.

    Prior to DOT&E, Dr. Schmidt served as the Cornelius Vanderbilt Professor of Engineering in Computer Science, the Associate Chair of Computer Science, and a Senior Researcher at the Institute for Software Integrated Systems at Vanderbilt University.

    From 2010 to 2014, Dr. Schmidt was a member of the Air Force Scientific Advisory Board (AF SAB), where he served as Vice Chair of studies on cyber situational awareness for Air Force mission operations and on sustaining hardware and software for U.S. aircraft. He also served on the advisory board for the joint Army/Navy Future Airborne Capability Environment initiative. From 2000 to 2003, Dr. Schmidt served as a program manager in the Defense Advanced Research Projects Agency (DARPA) Information Exploitation Office and Information Technology Office.

    Dr. Schmidt is an internationally renowned and widely cited researcher. He received Bachelor and Master of Arts degrees in Sociology from the College of William and Mary, and Master of Science and Doctorate degrees in Computer Science from the University of California, Irvine.

  • Abstract:

    Speaker Info:

    Pete Parker

    Statistician and Team Lead

    NASA

  • Abstract:

    Speaker Info:

    Norty Schwartz

    President

    IDA

    Norty Schwartz serves as President of IDA where he directs the activities of more than 1,000 scientists and technologists.

    Norty has a long and prestigious career of service and leadership that spans over five decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his six-year tenure at BENS, he was also a member of IDA’s Board of Trustees.

    Prior to retiring from the U.S. Air Force, he served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975.

    Norty is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College.

    He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. He has been married to Suzie since 1981.

  • Abstract:

    Speaker Info:

    Bram Lillard

    Director

    IDA

  • *Virtual Speaker* Epistemic and Aleatoric Uncertainty Quantification for Gaussian Processes

    Abstract:

    One of the major advantages of using Gaussian Processes for regression and surrogate modeling is the availability of uncertainty bounds for the predictions. However, the validity of these bounds depends on using an appropriate covariance function, which is usually learned from the data out of a parametric family through maximum likelihood estimation or cross-validation methods. In practice, the data might not contain enough information to select the best covariance function, generating uncertainty in the hyperparameters, which translates into an additional layer of uncertainty in the predictions (epistemic uncertainty) on top of the usual posterior covariance (aleatoric uncertainty). In this talk, we discuss how to account for both sources of uncertainty in UQ, and we quantify them by extending the MLE paradigm using a game-theoretic framework that identifies the worst-case prior under a likelihood constraint specified by the practitioner.
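
    As a rough illustration of the two layers of uncertainty, the sketch below fits a scikit-learn Gaussian process and reports its posterior predictive standard deviation (the aleatoric layer given a covariance function), then compares posterior means across a few fixed length scales as a crude stand-in for hyperparameter (epistemic) uncertainty. It is not the game-theoretic worst-case-prior procedure described in the talk; all settings are illustrative.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 5, 12).reshape(-1, 1)
        y = np.sin(X).ravel() + rng.normal(0, 0.1, 12)
        Xt = np.linspace(0, 5, 200).reshape(-1, 1)

        # Aleatoric view: posterior std under the MLE-fitted covariance function
        gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(0.01)).fit(X, y)
        mean, std = gp.predict(Xt, return_std=True)

        # Crude epistemic view: spread of posterior means across alternative length scales
        means = []
        for ls in (0.3, 1.0, 3.0):
            g = GaussianProcessRegressor(RBF(ls, length_scale_bounds="fixed")
                                         + WhiteKernel(0.01), optimizer=None).fit(X, y)
            means.append(g.predict(Xt))
        epistemic_spread = np.ptp(means, axis=0)   # disagreement between plausible models
        print(std.mean(), epistemic_spread.mean())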

    Speaker Info:

    Pau Batlle

    PhD Student

    California Institute of Technology

    Pau is a PhD student in Computing and Mathematical Sciences at Caltech, advised by Houman Owhadi. His main research area is Game-theoretical Uncertainty Quantification (UQ) and Gaussian Process Regression (GPR), both from a theoretical point of view and with applications to the Physical Sciences, including collaboration with scientists from the Machine Learning and Uncertainty Quantification groups at NASA Jet Propulsion Laboratory. Before joining Caltech, he graduated from Universitat Politècnica de Catalunya with a double degree in Mathematics and Engineering Physics as part of the CFIS program and he was a research intern at the Center for Data Science at NYU for 9 months.

  • *Virtual Speaker* Using Changepoint Detection and Artificial Intelligence to Classify Fuel Pressure Behaviors in Aerial Refueling

    Abstract:

    An open question in aerial refueling system test and evaluation is how to classify fuel pressure states and behaviors reproducibly and defensibly when visually inspecting the data stream. This question exists because fuel pressure data streams are highly stochastic, may exhibit multiple types of troublesome behavior simultaneously in a single stream, and may exhibit unique platform-dependent discernable behaviors. These data complexities result in differences in fuel pressure behavior classification determinations between engineers based on experience level and individual judgement. In addition to consuming valuable time, discordant judgements between engineers reduce confidence in metrics and other derived analytic products that are used to evaluate the system's performance. The Pruned Exact Linear Time (PELT) changepoint detection algorithm is an unsupervised machine learning method that, when coupled with an expert system AI, has provided a consistent and reproducible solution in classifying various fuel pressure states and behaviors with adjustable sensitivity.
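
    A minimal sketch of PELT on a synthetic pressure-like trace, using the open-source ruptures package, is shown below; the cost model and penalty are illustrative knobs (the adjustable sensitivity mentioned above), and the expert-system classification layer is not reproduced here.

        import numpy as np
        import ruptures as rpt   # open-source changepoint library with a PELT implementation

        rng = np.random.default_rng(0)
        # synthetic "fuel pressure" trace: three regimes with different means and variances
        signal = np.concatenate([rng.normal(50, 1, 300),
                                 rng.normal(58, 3, 200),
                                 rng.normal(52, 1, 300)])

        algo = rpt.Pelt(model="rbf", min_size=25).fit(signal)
        breakpoints = algo.predict(pen=10)   # indices where segments end; tune `pen` for sensitivity
        print(breakpoints)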

    Speaker Info:

    Nelson Walker

    Mathematical Statistician

    United States Air Force

    Dr. Walker is a statistician for the United States Air Force at the 412th Test Wing at Edwards AFB, California. He graduated with a PhD in statistics from Kansas State University in 2021.

  • *Virtual Speaker* Using Changepoint Detection and Artificial Intelligence to Classify Fuel Pressure Behaviors in Aerial Refueling

    Abstract:

    An open question in aerial refueling system test and evaluation is how to classify fuel pressure states and behaviors reproducibly and defensibly when visually inspecting the data stream. This question exists because fuel pressure data streams are highly stochastic, may exhibit multiple types of troublesome behavior simultaneously in a single stream, and may exhibit unique platform-dependent discernable behaviors. These data complexities result in differences in fuel pressure behavior classification determinations between engineers based on experience level and individual judgement. In addition to consuming valuable time, discordant judgements between engineers reduce confidence in metrics and other derived analytic products that are used to evaluate the system's performance. The Pruned Exact Linear Time (PELT) changepoint detection algorithm is an unsupervised machine learning method that, when coupled with an expert system AI, has provided a consistent and reproducible solution in classifying various fuel pressure states and behaviors with adjustable sensitivity.

    Speaker Info:

    Michelle Ouellette

    Mathematical Statistician

    United States Air Force

    Ms. Ouellette is the Data Science and Statistics Lead at the 418th Global Reach Flight Test Squadron for the United States Air Force at the 412th Test Wing at Edwards AFB, California. She graduated with a M.S. in Statistics from California State University, Fullerton in 2018.

  • A Bayesian Approach for Nonparametric Multivariate Process Monitoring using Universal Residuals

    Abstract:

    In Quality Control, monitoring sequential-functional observations for characteristic changes through change-point detection is a common practice to ensure that a system or process produces high-quality outputs. Existing methods in this field often focus only on identifying when a process is out-of-control without quantifying the uncertainty of the underlying decision-making processes. To address this issue, we propose using universal residuals under a Bayesian paradigm to determine if the process is out-of-control and to assess the uncertainty surrounding that decision. The universal residuals are computed by combining two non-parametric techniques: regression trees and kernel density estimation. These residuals have the key feature of being uniformly distributed when the process is in control. To test if the residuals are uniformly distributed across time (i.e., that the process is in control), we use a Bayesian approach for hypothesis testing, which outputs posterior probabilities for events such as the process being out-of-control at the current time, in the past, or in the future. We perform a simulation study and demonstrate that the proposed methodology has a high detection rate and a low false alarm rate.
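
    The sketch below illustrates only the residual construction, under stated assumptions: a regression tree supplies the in-control mean, a kernel density estimate supplies the residual distribution, and new observations are mapped through the estimated CDF so that in-control values should look approximately Uniform(0,1). The Bayesian uniformity test from the talk is not reproduced.

        import numpy as np
        from scipy.stats import gaussian_kde
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(0)
        X_ic = rng.uniform(0, 1, (500, 1))                          # in-control training data
        y_ic = np.sin(2 * np.pi * X_ic.ravel()) + rng.normal(0, 0.1, 500)

        tree = DecisionTreeRegressor(max_depth=4).fit(X_ic, y_ic)   # nonparametric mean model
        kde = gaussian_kde(y_ic - tree.predict(X_ic))               # residual density estimate

        def universal_residual(x_new, y_new):
            """Probability-integral-transform residual: ~Uniform(0,1) while in control."""
            r = y_new - tree.predict(np.atleast_2d(x_new))[0]
            return kde.integrate_box_1d(-np.inf, r)                 # estimated CDF evaluated at r

        print(universal_residual([0.3], 0.9))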

    Speaker Info:

    Daniel Timme

    PhD Candidate

    Florida State University

    Daniel A. Timme is currently a PhD candidate in Statistics at Florida State University. Mr. Timme graduated with a BS in Mathematics from the University of Houston and a BS in Business Management from the University of Houston-Clear Lake. He earned an MS in Systems Engineering with a focus in Reliability and a second MS in Space Systems with focuses in Space Vehicle Design and Astrodynamics, both from the Air Force Institute of Technology. Mr. Timme’s research interest is primarily focused in the areas of reliability engineering, applied mathematics and statistics, optimization, and regression.

  • A Bayesian Decision Theory Framework for Test & Evaluation

    Abstract:

    Decisions form the core of T&E: decisions about which tests to conduct and, especially, decisions on whether to accept or reject a system at its milestones. The traditional approach to acceptance is based on conducting tests under various conditions to ensure that key performance parameters meet certain thresholds with the required degree of confidence. In this approach, data is collected during testing, then analyzed with techniques from classical statistics in a post-action report.
    This work explores a new Bayesian paradigm for T&E based on one simple principle: maintaining a model of the probability distribution over system parameters at every point during testing. In particular, the Bayesian approach posits a distribution over parameters prior to any testing. This prior distribution provides (a) the opportunity to incorporate expert scientific knowledge into the inference procedure, and (b) transparency regarding all assumptions being made. Once a prior distribution is specified, it can be updated as tests are conducted to maintain a probability distribution over the system parameters at all times. One can leverage this probability distribution in a variety of ways to produce analytics with no analog in the traditional T&E framework.
    In particular, having a probability distribution over system parameters at any time during testing enables one to implement an optimal decision-making procedure using Bayesian Decision Theory (BDT). BDT accounts for the cost of various testing options relative to the potential value of the system being tested. When testing is expensive, it provides guidance on whether to conserve resources by ending testing early. It evaluates the potential benefits of testing for both its ability to inform acceptance decisions and for its intrinsic value to the commander of an accepted system.
    This talk describes the BDT paradigm for T&E and provides examples of how it performs in simple scenarios. In future work we plan to extend the paradigm to include the features, the phenomena, and the SME elicitation protocols necessary to address realistic T&E cases.
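
    As a toy illustration of the core idea, the sketch below maintains a Beta posterior over a system's success probability as sequential test results arrive and applies a crude expected-loss comparison between accepting now and buying one more test. The prior, requirement, and costs are invented; the elicitation protocols and richer decision models described in the talk are not shown.

        from scipy.stats import beta

        # Maintain a Beta posterior over a system's success probability during testing
        a, b = 2.0, 2.0                       # prior pseudo-counts (assumed, not elicited)
        for outcome in [1, 1, 0, 1, 1, 1]:    # sequential test results
            a += outcome
            b += 1 - outcome

        post = beta(a, b)
        p_meets_req = 1 - post.cdf(0.80)      # P(reliability exceeds a 0.80 requirement)

        # Toy decision rule: accept if the expected loss of accepting beats the cost of more testing
        loss_accept_bad = 100.0               # cost of fielding a system below requirement
        cost_one_more_test = 5.0
        expected_loss_accept = loss_accept_bad * (1 - p_meets_req)
        decision = "accept" if expected_loss_accept < cost_one_more_test else "keep testing"
        print(round(p_meets_req, 3), decision)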

    Speaker Info:

    James Ferry

    Senior Research Scientist

    Metron, Inc.

    Dr. James Ferry has been developing Bayesian analytics at Metron for 18 years. He has been the Principal Investigator for a variety of R&D projects that apply Bayesian methods to data fusion, network science, and machine learning. These projects include associating disparate data types from multiple sensors for missile defense, developing methods to track hidden structures on dynamically changing networks, computing incisive analytics efficiently from information in large databases, and countering adversarial attacks on neural-network-based image classifiers. Dr. Ferry was active in the network science community in the 2010s. He organized a full-day special session on network science at FUSION 2015, co-organized WIND 2016 (Workshop on Incomplete Network Data), and organized a multi-day session on the Frontiers of Networks at the MORS METSM meeting in December 2016. Since then, his focus has been Bayesian analytics and network science algorithms for the Intelligence Community.

    Prior to Metron, Dr. Ferry was a computational fluid dynamicist. He developed models and supercomputer simulations of the multiphase fluid dynamics of rocket engines at the Center for Simulation of Advanced Rockets at UIUC. He has 30+ technical publications in fluid dynamics, network science, and Bayesian analytics.  He holds a B.S. in Mathematics from M.I.T. and a Ph.D. in Applied Mathematics from Brown University.

  • A Bayesian Optimal Experimental Design for High-dimensional Physics-based Models

    Abstract:

    Many scientific and engineering experiments are developed to study specific questions of interest. Unfortunately, time and budget constraints make operating these controlled experiments over wide ranges of conditions intractable, thus limiting the amount of data collected. In this presentation, we discuss a Bayesian approach to identify the most informative conditions based on the expected information gain. We will present a framework for finding optimal experimental designs that can be applied to physics-based models with high-dimensional inputs and outputs. We will study a real-world example where we aim to infer the parameters of a chemically reacting system, but there are uncertainties in both the model and the parameters. A physics-based model was developed to simulate the gas-phase chemical reactions occurring between highly reactive intermediate species in a high-pressure photolysis reactor coupled to a vacuum-ultraviolet (VUV) photoionization mass spectrometer. This time-of-flight mass spectrum evolves in both kinetic time and VUV energy, producing a high-dimensional output at each design condition. The high-dimensional nature of the model output poses a significant challenge for optimal experimental design, as a surrogate model is built for each output. We discuss how accurate low-dimensional representations of the high-dimensional mass spectrum are necessary for computing the expected information gain. Bayesian optimization is employed to maximize the expected information gain by efficiently exploring a constrained design space, taking into account any constraints on the operating range of the experiment. Our results highlight the trade-offs involved in the optimization and the advantages of using optimal designs, and they provide a workflow for computing optimal experimental designs for high-dimensional physics-based models.
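
    To make the quantity being optimized concrete, the sketch below estimates the expected information gain for a toy one-parameter Gaussian model with a nested Monte Carlo average; the forward model, prior, and noise level are placeholders, and the surrogate modeling, dimension reduction, and Bayesian optimization discussed in the talk are omitted.

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(0)

        def simulate(theta, d):
            """Toy forward model: observation mean depends on the parameter and the design."""
            return theta * d

        def expected_information_gain(d, n_outer=2000, n_inner=2000, sigma=0.1):
            theta_out = rng.normal(0, 1, n_outer)                 # prior draws
            y = simulate(theta_out, d) + rng.normal(0, sigma, n_outer)
            log_lik = norm.logpdf(y, simulate(theta_out, d), sigma)
            theta_in = rng.normal(0, 1, n_inner)                  # fresh prior draws for the evidence
            # log p(y|d) via an inner Monte Carlo average of the likelihood
            inner = norm.pdf(y[:, None], simulate(theta_in, d)[None, :], sigma).mean(axis=1)
            return np.mean(log_lik - np.log(inner))

        print(max((0.5, 1.0, 2.0), key=expected_information_gain))   # pick the most informative design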

    Speaker Info:

    James Oreluk

    Postdoctoral Researcher

    Sandia National Laboratories

    James Oreluk is a postdoctoral researcher at Sandia National Laboratories in Livermore, CA. He earned his Ph.D. in Mechanical Engineering from UC Berkeley with research on developing optimization methods for validating physics-based models. His current research focuses on advancing uncertainty quantification and machine learning methods to efficiently solve complex problems, with recent work on utilizing low-dimensional representation for optimal decision making.

  • A Data-Driven Approach of Uncertainty Quantification on Reynolds Stress Based on DNS Turbulence Data

    Abstract:

    High-fidelity simulation capabilities have progressed rapidly over the past decades in computational fluid dynamics (CFD), resulting in plenty of high-resolution flow field data. Uncertainty quantification remains an unsolved problem due to the high-dimensional input space and the intrinsic complexity of turbulence. Here we developed an uncertainty quantification method to model the Reynolds stress based on the Karhunen-Loeve Expansion (KLE) and Projection Pursuit basis Adaptation polynomial chaos expansion (PPA). First, different representative volume elements (RVEs) were randomly drawn from the flow field, and KLE was used to reduce them to a moderate dimension. Then, we built polynomial chaos expansions of the Reynolds stress using PPA. Results show that this method can yield a surrogate model with a test accuracy of up to 90%. The PCE coefficients also show that the Reynolds stress depends strongly on second-order KLE random variables rather than first-order terms. Regarding data efficiency, we built another surrogate model using a neural network (NN) and found that our method outperforms the NN in limited-data cases.
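
    The sketch below mirrors the two-step pipeline in spirit only: synthetic snapshots are reduced with a truncated Karhunen-Loeve (PCA) expansion, and a quadratic polynomial surrogate of a stand-in scalar quantity is fit by least squares in the reduced coordinates. The data are random placeholders, and the projection pursuit basis adaptation step is not reproduced.

        import numpy as np

        rng = np.random.default_rng(0)
        snapshots = rng.normal(size=(200, 1000))          # 200 synthetic RVEs x 1000 field values
        y = snapshots[:, :5].sum(axis=1) ** 2             # stand-in scalar QoI (e.g., a stress component)

        # Karhunen-Loeve / PCA reduction to a handful of modes
        mean = snapshots.mean(axis=0)
        U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
        k = 5
        xi = U[:, :k] * s[:k]                             # reduced random coordinates per snapshot

        # Quadratic polynomial surrogate in the reduced coordinates (least squares)
        def features(z):
            cross = np.einsum("ni,nj->nij", z, z).reshape(len(z), -1)
            return np.hstack([np.ones((len(z), 1)), z, cross])

        coef, *_ = np.linalg.lstsq(features(xi), y, rcond=None)
        print("training R^2:", 1 - np.var(y - features(xi) @ coef) / np.var(y))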

    Speaker Info:

    Zheming Gou

    Graduate Research assistant

    University of Southern California

    Zheming Gou, a Ph.D. student in the Department of Mechanical Engineering at the University of Southern California, is a highly motivated individual with a passion for high-fidelity simulations, uncertainty quantification, and machine learning, especially in high-dimensional and rare-data scenarios. Currently, Zheming Gou is engaged in research to build probabilistic models for multiscale simulations using tools including polynomial chaos expansion (PCE) and state-of-the-art machine learning methods.

  • A generalized influence maximization problem

    Abstract:

    The influence maximization problem is a popular topic in social networks, with applications in viral marketing and epidemiology. One way to understand the problem is from the perspective of a marketer who wants to achieve the maximum influence on a social network by choosing an optimum set of nodes of a given size as seeds. The marketer actively influences these seeds, followed by a passive viral process based on a certain influence diffusion model, in which influenced nodes influence other nodes without external intervention. Kempe et al. showed that a greedy algorithm-based approach provides a (1-1/e)-approximation guarantee compared to the optimal solution if the influence spreads according to the Triggering model. In our current work, we consider a much more general problem in which the goal is to maximize the total expected reward obtained from the nodes that are influenced by a given time (which may be finite or infinite). In this generalization, the reward obtained by influencing a set of nodes can depend on the set (and is not necessarily a sum of rewards from the individual nodes) as well as on the times at which each node gets influenced; the seeds may be restricted to a subset of the network; multiple units of budget may be assigned to a single node (where the maximum number of budget units that may be assigned to a node can depend on the node); and a seeded node actually gets influenced with a probability that is a non-decreasing function of the number of budget units assigned to that node. We have formulated a greedy algorithm that provides a (1-1/e)-approximation guarantee compared to the optimal solution of this generalized influence maximization problem when the influence spreads according to the Triggering model.
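
    For context, the sketch below implements the baseline Kempe-style greedy seed selection under an independent cascade spread model estimated by Monte Carlo; the graph, activation probability, and budget are illustrative, and the generalized rewards, budgets, and seeding probabilities of this work are not included.

        import random
        import networkx as nx

        def simulate_cascade(G, seeds, p=0.1, rng=random.Random(0)):
            """One independent-cascade run; returns the set of influenced nodes."""
            active, frontier = set(seeds), list(seeds)
            while frontier:
                nxt = []
                for u in frontier:
                    for v in (G.successors(u) if G.is_directed() else G.neighbors(u)):
                        if v not in active and rng.random() < p:
                            active.add(v)
                            nxt.append(v)
                frontier = nxt
            return active

        def greedy_seeds(G, k, n_sims=200):
            seeds = set()
            for _ in range(k):
                def gain(v):
                    return sum(len(simulate_cascade(G, seeds | {v})) for _ in range(n_sims)) / n_sims
                seeds.add(max(set(G) - seeds, key=gain))   # add the node with the largest marginal spread
            return seeds

        G = nx.erdos_renyi_graph(100, 0.05, seed=1)
        print(greedy_seeds(G, k=3))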

    Speaker Info:

    Sumit Kumar Kar

    Ph.D. candidate

    University of North Carolina at Chapel Hill

    I am currently a Ph.D. candidate and a Graduate Teaching Assistant in the department of Statistics & Operations Research at the University of North Carolina at Chapel Hill  (UNC-CH). My Ph.D. advisors are Prof. Nilay Tanik Argon, Prof. Shankar Bhamidi, and Prof. Serhan Ziya. I broadly work in probability, statistics, and operations research. My current research interests include (but are not confined to) working on problems in random networks which find interesting practical applications such as in viral marketing or Word-Of-Mouth (WOM) marketing, efficient immunization during epidemic outbreaks, finding the root of an epidemic, and so on. Apart from research, I have worked on two statistical consulting projects with the UNC Dept. of Medicine and Quantum Governance, L3C, CUES. Also, I am extremely passionate about teaching. I have been the primary instructor as well as a teaching assistant for several courses at UNC-CH STOR. You can find more about me on my website here: https://sites.google.com/view/sumits-space/home.

  • A Stochastic Petri Net Model of Continuous Integration and Continuous Delivery

    Abstract:

    Modern software development organizations rely on continuous integration and continuous delivery (CI/CD), since it allows developers to continuously integrate their code in a single shared repository and automates the delivery process of the product to the user. While modern software practices improve the performance of the software life cycle, they also increase the complexity of this process. Past studies have improved the performance of the CI/CD pipeline, but there are few formal models to quantitatively guide process and product quality improvement or to characterize how automated and human activities compose and interact asynchronously. Therefore, this talk develops a stochastic Petri net model of a CI/CD pipeline to improve process performance in terms of the probability of successfully delivering new or updated functionality by a specified deadline. The utility of the model is demonstrated through a sensitivity analysis to identify stages of the pipeline where improvements would most significantly improve the probability of timely product delivery. In addition, this research provides an enhanced version of the conventional CI/CD pipeline to examine how it can improve process performance in general. The results indicate that the augmented model outperforms the conventional model, and sensitivity analysis suggests that failures in later stages are more important and can impact the delivery of the final product.
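
    A toy Monte Carlo sketch of the quantity such a model targets, the probability that a change clears build, test, and deploy (with rework after failures) before a deadline, is given below; the stage durations and failure probabilities are invented, and this is not the stochastic Petri net formulation itself.

        import numpy as np

        rng = np.random.default_rng(0)

        # Invented stage parameters: (name, mean duration in minutes, probability of failure)
        STAGES = [("build", 10, 0.05), ("integration test", 30, 0.15), ("deploy", 15, 0.05)]

        def one_delivery(deadline=120.0, rework=20.0):
            t = 0.0
            for _, mean_dur, p_fail in STAGES:
                t += rng.exponential(mean_dur)
                while rng.random() < p_fail:          # failed stage: rework, then rerun the stage
                    t += rework + rng.exponential(mean_dur)
                if t > deadline:
                    return False
            return True

        runs = 50_000
        print("P(delivered by deadline) ~", sum(one_delivery() for _ in range(runs)) / runs)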

    Speaker Info:

    Sushovan Bhadra

    Master's student

    University of Massachusetts Dartmouth

    Sushovan Bhadra is an MS student in the Department of Electrical & Computer Engineering at the University of Massachusetts Dartmouth (UMassD). He received his BS (2016) in Electrical Engineering from Ahsanullah University of Science and Technology (AUST), Bangladesh.

  • A Tour of JMP Reliability Platforms and Bayesian Methods for Reliability Data

    Abstract:

    JMP is comprehensive, visual, and interactive statistical discovery software with a carefully curated graphical user interface. The software is packed with traditional and modern statistical analysis capabilities and many unique, innovative features. It hosts several suites of tools that are especially valuable to the DATAWorks audience, including suites for Design of Experiments, Quality Control, Process Analysis, and Reliability Analysis.

    JMP has been building its reliability suite for the past fifteen years. The reliability suite in JMP is a comprehensive and mature collection of JMP platforms. The suite empowers reliability engineers with tools for analyzing time-to-event data, accelerated life test data, observational reliability data, competing cause data, warranty data, cumulative damage data, repeated measures degradation data, destructive degradation data, and recurrence data. For each type of data, there are numerous models and one or more methodologies that are applicable based on the nature of the data. In addition to reliability data analysis platforms, the suite also provides reliability engineering capabilities for system reliability from two distinct perspectives, one for non-repairable systems and the other for repairable systems. The capability of the JMP reliability suite is also at the frontier of advanced research on reliability data analysis. Inspired by the research of Prof. William Meeker at Iowa State University, we have implemented Bayesian inference methodologies for analyzing the three most important types of reliability data. The tutorial will start with an overall introduction to JMP’s reliability platforms. It will then focus on analyzing time-to-event data, accelerated life test data, and repeated measures degradation data. The tutorial will present how to analyze these types of reliability data using traditional methods and highlight when, why, and how to analyze them in JMP using Bayesian methods.

    Speaker Info:

    Peng Liu

    Principal Research Statistician Developer

    JMP Statistical Discovery

    Peng Liu is a Principal Research Statistician Developer at JMP Statistical Discovery LLC. He holds a Ph.D. in statistics from NCSU. He has been working at JMP since 2007. He specializes in computational statistics, software engineering, reliability data analysis, reliability engineering, time series analysis, and time series forecasting. He is responsible for developing and maintaining all JMP platforms in the above areas. He has broad interest in statistical analysis research and software product development.

  • Advanced Automated Machine Learning System for Cybersecurity

    Abstract:

    Florida International University (FIU) has developed an Advanced Automated Machine Learning System (AAMLS) under sponsored research from the Department of Defense Test Resource Management Center (DOD-TRMC) to provide artificial intelligence based advanced analytics solutions in areas such as cyber, IoT, network, energy, and environment. AAMLS is a Rapid Modeling & Testing Tool (RMTT) that lets subject matter experts from various domains, with minimal machine learning knowledge, develop machine learning and deep learning models in a few steps using automated and optimization workflows.
    AAMLS allows analysis of data collected from different test technology domains by using machine learning, deep learning, and ensemble learning approaches to generate models, make predictions, and then apply advanced analytics and visualization to perform analysis. The system enables automated machine learning through its AI-based Advanced Analytics and Analytics Control Center platforms, which connect to multiple data sources.

    Artificial Intelligence based Advanced Analytics Platform:
    This platform is the analytics engine of AAMLS, providing pre-processing, feature engineering, model building, and predictions. Primary components of this platform include:

    • Machine Learning Server: This module is deployed to build ML/DL models using the training data from the data sources and perform predictions/analysis of associated test data based on the AAML-generated ML/DL models.
    • Machine Learning Algorithms: ML algorithms such as Logistic Regression, Linear Regression, Decision Tree, Random Forest, One-Class Support Vector Machine, and Jaccard Similarity are available for model building.
    • Deep learning Algorithms: Deep learning algorithms such as Deep Neural Networks and Recurrent Neural Networks are available to perform classification & anomaly detection using the TensorFlow framework and the Keras API.

    Analytics Control Center:
    This platform is a centralized application to manage the AAML system. It consists of the following main modules.

    • Data Source: This module allows the user to connect existing data sources to the AAML system to perform analytics. These data sources may reside in a network file share, database, or big data cluster.
    • Model Development: This module provides the functionality to build ML/DL models with various AI algorithms. This is performed by engaging specific ML algorithms for five types of analysis: Classification, Regression, Time-Series, Anomaly Detection, and Clustering; a generic sketch of this kind of automated model selection follows this list.
    • Predictions: This module provides the functionality to predict the outcome of an analysis of an associated data set based on a model built during Model Development.
    • Manage Models and Predictions: These modules allow the user to manage the ML models that have been generated and resulting predictions of associated data sets.
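
    The sketch below is a generic illustration, not the AAMLS interface: several of the algorithms named above are trained and compared by cross-validation on placeholder data, and the best-scoring model is retained for later predictions.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        # Placeholder data standing in for a dataset pulled from a connected data source
        X, y = make_classification(n_samples=500, n_features=20, random_state=0)

        # Candidate algorithms similar to those listed above; the best one is kept as the model
        candidates = {
            "logistic regression": LogisticRegression(max_iter=1000),
            "decision tree": DecisionTreeClassifier(random_state=0),
            "random forest": RandomForestClassifier(random_state=0),
        }
        scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in candidates.items()}
        best_name = max(scores, key=scores.get)
        best_model = candidates[best_name].fit(X, y)     # retained for later predictions
        print(scores, "->", best_name)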

    Speaker Info:

    Himanshu Dayaram Upadhyay

    Associate Professor (ECE)

    Florida International University

    Dr. Himanshu Upadhyay has served Florida International University’s Applied Research Center for the past 21 years, leading the Artificial Intelligence / Cybersecurity / Big Data research group. He is currently working as an Associate Professor in Electrical & Computer Engineering, teaching artificial intelligence and cybersecurity courses. He is mentoring DOE Fellows, AI Fellows, Cyber Fellows, and undergraduate and graduate students supporting multiple cybersecurity and AI research projects from various federal agencies. He brings more than 30 years of experience in artificial intelligence/machine learning, big data, cybersecurity, information technology, management, and engineering to his role, serving as co-principal investigator for multimillion-dollar cybersecurity and artificial intelligence projects for the Department of Defense and Defense Intelligence Agency. He is also serving as co-principal investigator for Department of Energy Office of Environmental Management research projects focused on knowledge/waste management, cybersecurity, artificial intelligence, and big data technologies. He has published multiple papers in the areas of cybersecurity, machine learning, deep learning, big data, knowledge/nuclear waste management, and service-oriented architecture. His current research focuses on artificial intelligence, machine learning, deep learning, cybersecurity, big data, cyber analytics/visualization, cyber forensics, malware analysis, and blockchain. He has architected a range of tiered and distributed application systems to address strategic business needs, managing a team of researchers and scientists building secured enterprise information systems.

  • An Evaluation Of Periodic Developmental Reviews Using Natural Language Processing

    Abstract:

    As an institution committed to developing leaders of character, the United States Military Academy (USMA) holds a vested interest in measuring character growth. One such tool, the Periodic Developmental Review (PDR), has been used by the Academy’s Institutional Effectiveness Office for over a decade. PDRs are written counseling statements evaluating how a cadet is developing with respect to his/her peers. The objective of this research was to provide an alternate perspective on the PDR system by using statistical and natural language processing (NLP) based approaches to determine whether certain dimensions of PDR data were predictive of a cadet’s overall rating. This research implemented multiple NLP tasks and techniques, including sentiment analysis, named entity recognition, tokenization, part-of-speech tagging, and word2vec, as well as statistical models such as linear regression and ordinal logistic regression. The ordinal logistic regression model concluded that PDRs with optional written summary statements had more predictable overall scores than those without summary statements. Additionally, the writer’s relationship to the cadet (Self, Instructor, Peer, Subordinate) held strong predictive value toward the overall rating. Compared to a self-reflecting PDR, an instructor-written PDR had a 62.40% probability of a higher overall score, while a subordinate-written PDR had a 61.65% probability of improvement. These values were amplified to 70.85% and 73.12%, respectively, when considering only those PDRs with summary statements. These findings indicate that different writer demographics have different understandings of the meaning of each rating level. Recommendations for the Academy include implementing a forced distribution or providing a deeper explanation of the overall rating in the instructions. Additionally, none of the written-language facets analyzed demonstrated predictive strength, meaning written statements do not introduce unwanted bias and could be made a required field to provide more meaningful feedback to cadets.

    Speaker Info:

    Dominic Rudakevych

    Cadet

    United States Military Academy

    Cadet Dominic Rudakevych is from Hatboro, Pennsylvania, and is a senior studying mathematics at the United States Military Academy (USMA). Currently, CDT Rudakevych serves as the Society for Industrial and Applied Mathematics president at USMA and the captain of the chess team. He is involved in research within the Mathematics department, using artificial intelligence methods to analyze how written statements about cadets impact overall cadet ratings. He will earn a Bachelor of Science Degree in mathematics upon graduation. CDT Rudakevych will commission as a Military Intelligence officer in May and is excited to serve his country as a human-intelligence platoon leader.

  • An Introduction to Ranking Data and a Case Study of a National Survey of First Responders

    Abstract:

    Ranking data are collected by presenting a respondent with a list of choices and then asking which is the respondent's favorite, second favorite, and so on. The rankings may be complete (the respondent rank-orders the entire list) or partial (only the respondent's favorite two or three, for example). Given a sample of rankings from a population, one goal may be to estimate the most favored choice in the population. Another may be to compare the preferences of one subpopulation to another.

    In this presentation I will introduce ranking data and the probability models that form the foundation for statistical inference with them. The Plackett-Luce model will be the main focus. After that I will introduce a real data set containing ranking data assembled by the National Institute of Standards and Technology (NIST) based on the results of a national survey of first responders. The survey asked how first responders use communication technology. With this data set, questions such as whether rural and urban/suburban first responders prefer the same types of communication devices can be explored. I will conclude with some ideas for incorporating ranking data into test and evaluation settings.
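
    The self-contained sketch below fits a Plackett-Luce model by maximum likelihood to a handful of toy partial rankings, just to make the model concrete; the rankings are invented and this is not the NIST first-responder survey analysis.

        import numpy as np
        from scipy.optimize import minimize

        # Toy partial rankings over 4 devices: each tuple lists items from most to least preferred
        rankings = [(0, 1, 2), (0, 2), (1, 0, 3), (0, 1), (2, 0, 1), (0, 3)]
        n_items = 4

        def neg_log_lik(free):
            theta = np.concatenate([[0.0], free])     # fix one parameter for identifiability
            ll = 0.0
            for r in rankings:
                remaining = list(r)
                for i in r[:-1]:                      # Plackett-Luce: sequential choice likelihood
                    ll += theta[i] - np.log(np.sum(np.exp(theta[remaining])))
                    remaining.remove(i)
            return -ll

        res = minimize(neg_log_lik, np.zeros(n_items - 1), method="BFGS")
        worth = np.exp(np.concatenate([[0.0], res.x]))
        print(worth / worth.sum())                    # estimated relative preference for each item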

    Speaker Info:

    Adam Pintar

    Mathematical Statistician

    National Institute of Standards and Technology

    Adam earned a Ph.D. in Statistics from Iowa State University in 2010, and has
    been a Mathematical Statistician with NIST's Statistical Engineering Division
    since. His primary focus is providing statistical and machine learning
    expertise and insight on multidisciplinary research teams. He has collaborated
    with researchers from very diverse backgrounds such as social science,
    engineering, chemistry, and physics. He is a Past Chair of the Statistics
    Division of the American Society for Quality (ASQ), he currently serves on the
    editorial board of the journal Transactions on Mathematical Software, and he is
    a member of the American Statistical Association and a senior member of ASQ.

  • An Introduction to Uncertainty Quantification for Modeling & Simulation

    Abstract:

    Predictions from modeling and simulation (M&S) are increasingly relied upon to inform critical decision making in a variety of industries including defense and aerospace. As such, it is imperative to understand and quantify the uncertainties associated with the computational models used, the inputs to the models, and the data used for calibration and validation of the models. The rapidly evolving field of uncertainty quantification (UQ) combines elements of statistics, applied mathematics, and discipline engineering to provide this utility for M&S.

    This mini tutorial provides an introduction to UQ for M&S geared towards engineers and analysts with little-to-no experience in the field but with some knowledge of probability and statistics. A brief review of basic probability will be provided before discussing some core UQ concepts in more detail, including uncertainty propagation and the use of Monte Carlo simulation for making probabilistic predictions with computational models, model calibration to estimate uncertainty in model input parameters using experimental data, and sensitivity analysis for identifying the most important and influential model input parameters. Examples from relevant NASA applications are included and references are provided throughout to point viewers to resources for further study.
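
    A minimal sketch of the first concept, forward uncertainty propagation by Monte Carlo, is shown below; the projectile-range model and input distributions are placeholders standing in for a computational model and its uncertain inputs.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000

        # Placeholder "computational model": projectile range as a function of uncertain inputs
        def model(v0, angle_deg, g):
            angle = np.radians(angle_deg)
            return v0**2 * np.sin(2 * angle) / g

        v0 = rng.normal(100.0, 5.0, n)          # launch speed [m/s], uncertain
        angle = rng.uniform(42.0, 48.0, n)      # launch angle [deg], uncertain
        g = rng.normal(9.81, 0.01, n)           # gravity [m/s^2], nearly known

        out = model(v0, angle, g)               # push the input samples through the model
        print("mean:", out.mean(), "95% interval:", np.quantile(out, [0.025, 0.975]))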

    Speaker Info:

    James Warner

    Computational Scientist

    NASA Langley Research Center

    Dr. James (Jim) Warner joined NASA Langley Research Center (LaRC) in 2014 as a Research Computer Engineer after receiving his PhD in Computational Solid Mechanics from Cornell University. Previously, he received his B.S. in Mechanical Engineering from SUNY Binghamton University and held temporary research positions at the National Institute of Standards and Technology and Duke University. Dr. Warner is a member of the Durability, Damage Tolerance, and Reliability Branch (DDTRB) at LaRC, where he focuses on developing computationally efficient approaches to uncertainty quantification for a range of applications including structural health management, additive manufacturing, and trajectory simulation. Additionally, he works to bridge the gap between UQ research and NASA mission impact, helping to transition state-of-the-art methods to solve practical engineering problems. To that end, he has recently been involved in efforts to certify the xEMU spacesuit and to develop guidance systems for entry, descent, and landing on Mars. His other research interests include machine learning, high performance computing, and topology optimization.

  • An Overview of Methods, Tools, and Test Capabilities for T&E of Autonomous Systems

    Abstract:

    This tutorial will give an overview of selected methodologies, tools, and test capabilities discussed in the draft “Test and Evaluation Companion Guide for Autonomous Military Systems.” This test and evaluation (T&E) companion guide is being developed to provide T&E practitioners, including program managers, test planners, test engineers, and analysts, with test strategies, applicable methodologies, and tools that will help improve rigor in addressing the challenges unique to the T&E of autonomy. It will also cover selected capabilities of test laboratories and ranges that support autonomous systems. The companion guide is intended to be a living document, contributed to by the entire community, and will adapt to ensure the right information reaches the right audience.

    Speaker Info:

    Leonard Truett

    Senior STAT Expert

    STAT COE

    Dr. Truett has been a member of the Scientific Test and Analysis Techniques Center of Excellence (STAT COE), located at WPAFB, OH, since 2012 and is currently the Senior STAT Expert. He began his career as a civilian for the 46th Test Wing supporting Live Fire Test and Evaluation (LFT&E), specializing in fire and explosion suppression for aircraft.  He has also worked for the Institute for Defense Analyses (IDA) supporting the Director, Operational Test and Evaluation (DOT&E) in LFT&E and Operational Test and Evaluation (OT&E) for air systems.  He holds a Bachelor of Science and a Master of Science in Aerospace Engineering from the Georgia Institute of Technology and a Doctorate in Aerospace Engineering from the University of California, San Diego.

  • An Overview of Methods, Tools, and Test Capabilities for T&E of Autonomous Systems

    Abstract:

    This tutorial will give an overview of selected methodologies, tools, and test capabilities discussed in the draft “Test and Evaluation Companion Guide for Autonomous Military Systems.” This test and evaluation (T&E) companion guide is being developed to provide T&E practitioners, including program managers, test planners, test engineers, and analysts, with test strategies, applicable methodologies, and tools that will help improve rigor in addressing the challenges unique to the T&E of autonomy. It will also cover selected capabilities of test laboratories and ranges that support autonomous systems. The companion guide is intended to be a living document, contributed to by the entire community, and will adapt to ensure the right information reaches the right audience.

    Speaker Info:

    Charlie Middleton

    Senior STAT Expert

    STAT COE

    Charlie Middleton currently leads the Advancements in Test and Evaluation (T&E) of Autonomous Systems team for the OSD STAT Center of Excellence. His responsibilities include researching autonomous system T&E methods and tools; collaborating with Department of Defense program and project offices developing autonomous systems; leading working groups of autonomy testers, staffers, and researchers; and authoring a handbook, reports, and papers related to T&E of autonomous systems. Previously, Mr. Middleton led development of a live-fire T&E risk-based framework for survivability and lethality evaluation for the office of the Director, Operational T&E; led a multi-domain modeling and simulation team supporting Air Force Research Labs future space efforts; and developed a Bayesian reliability analysis toolset for the National Air and Space Intelligence Center. While an active-duty Air Force officer, he was a developmental and operational test pilot leading several aircraft and weapons T&E programs and projects, and piloted 291 combat hours in the F-16 aircraft, employing precision munitions in Close Air Support, Time-Sensitive Targeting, and Suppression of Enemy Air Defense combat missions. Mr. Middleton is a distinguished graduate of the U.S. Naval Test Pilot School, and holds undergraduate and graduate degrees in operations research and operations analysis from Princeton University and the Air Force Institute of Technology.

  • An Overview of the NASA Quesst Community Test Campaign with the X-59 Aircraft

    Abstract:

    In its mission to expand knowledge and improve aviation, NASA conducts research to address sonic boom noise, the prime barrier to overland supersonic flight. For half a century civilian aircraft have been required to fly slower than the speed of sound when over land to prevent sonic boom disturbances to communities under the flight path. However, lower noise levels may be achieved via new aircraft shaping techniques that reduce the merging of shockwaves generated during supersonic flight. As part of its Quesst mission, NASA is building a piloted, experimental aircraft called the X-59 to demonstrate low noise supersonic flight. After initial flight testing to ensure the aircraft performs as designed, NASA will begin a national campaign of community overflight tests to collect data on how people perceive the sounds from this new design. The data collected will support national and international noise regulators’ efforts as they consider new standards that would allow supersonic flight over land at low noise levels. This presentation provides an overview of the community test campaign, including the scope, key objectives, stakeholders, and challenges.

    Speaker Info:

    Jonathan Rathsam

    Senior Research Engineer

    NASA Langley Research Center

    Jonathan Rathsam is a Senior Research Engineer at NASA’s Langley Research Center in Hampton, Virginia.  He conducts laboratory and field research on human perceptions of low noise supersonic overflights.  He currently serves as Technical Lead of Survey Design and Analysis for Community Test Planning and Execution within NASA’s Commercial Supersonic Technology Project.  He has previously served as NASA co-chair for DATAWorks and as chair for a NASA Source Evaluation Board.  He holds a Ph.D. in Engineering from the University of Nebraska, a B.A. in Physics from Grinnell College in Iowa, and completed postdoctoral research in acoustics at Ben-Gurion University in Israel.

  • Analysis of Surrogate Strategies and Regularization with Application to High-Speed Flows

    Abstract:

    Surrogate modeling is an important class of techniques used to reduce the burden of resource-intensive computational models by creating fast and accurate approximations. In aerospace engineering, surrogates have been used to great effect in design, optimization, exploration, and uncertainty quantification (UQ) for a range of problems, like combustor design, spacesuit damage assessment, and hypersonic vehicle analysis. Consequently, the development, analysis, and practice of surrogate modeling is of broad interest. In this talk, several widely used surrogate modeling strategies are studied as archetypes in a discussion on parametric/nonparametric surrogate strategies, local/global model forms, complexity regularization, uncertainty quantification, and relative strengths/weaknesses. In particular, we consider several variants of two widely used classes of methods: polynomial chaos and Gaussian process regression. These surrogate models are applied to several synthetic benchmark test problems and examples of real high-speed flow problems, including hypersonic inlet design, thermal protection systems, and shock-wave/boundary-layer interactions. Through analysis of these concrete examples, we analyze the trade-offs that modelers must navigate to create accurate, flexible, and robust surrogates.
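
    As a rough illustration of one of the surrogate classes mentioned above, the sketch below fits a Gaussian process regression surrogate to a synthetic one-dimensional response using scikit-learn; the test function, sample size, and kernel settings are illustrative assumptions and are not drawn from the talk's high-speed flow applications.

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, ConstantKernel

      # Synthetic "expensive" model: a smooth 1-D response with mild noise.
      rng = np.random.default_rng(0)
      x_train = rng.uniform(0.0, 10.0, size=(20, 1))
      y_train = np.sin(x_train).ravel() + 0.05 * rng.standard_normal(20)

      # Gaussian process surrogate with an RBF kernel; the observation noise
      # level is absorbed by the alpha (nugget) term.
      kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
      gp = GaussianProcessRegressor(kernel=kernel, alpha=0.05**2, normalize_y=True)
      gp.fit(x_train, y_train)

      # Fast predictions with pointwise uncertainty at new inputs.
      x_new = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
      y_mean, y_std = gp.predict(x_new, return_std=True)
      print("max predictive std:", y_std.max())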

    Speaker Info:

    Gregory Hunt

    Assistant Professor

    William & Mary

    Greg is an interdisciplinary researcher who helps advance science with statistical and data-analytic tools. He is trained as a statistician, mathematician, and computer scientist, and currently works on a diverse set of problems in engineering, physics, and microbiology.

  • Analysis Opportunities for Missile Trajectories

    Abstract:

    Contractor analysis teams are given 725 Monte Carlo-generated trajectories for thermal heating analysis. The analysis team currently uses a reduced-order model to evaluate all 725 options for the worst cases. The worst cases are run with the full model for handoff of predicted temperatures to the design team. Two months later, the customer arrives with yet another set of trajectories updated for the newest mission options.
    This presentation will question each step in this analysis process for opportunities to improve the cost, schedule, and fidelity of the effort. Using uncertainty quantification and functional data analysis processes, the team should be able to improve the analysis coverage and power with a reduced (or at least similar) number of model runs.

    Speaker Info:

    Kelsey Cannon

    Research Scientist

    Lockheed Martin Space

    Kelsey Cannon is a senior research scientist at Lockheed Martin Space in Denver, CO. She earned a bachelor's degree in Metallurgical and Materials Engineering from the Colorado School of Mines and a master's degree in Computer Science and Data Science. In her current role, Kelsey works across aerospace and DoD programs to advise teams on effective use of statistical and uncertainty quantification techniques to save time and budget.

  • Application of Recurrent Neural Network for Software Defect Prediction

    Abstract:

    Traditional software reliability growth models (SRGM) characterize software defect detection as a function of testing time. Many of those SRGM are modeled by the non-homogeneous Poisson process (NHPP). However, those models are parametric in nature and do not explicitly encode factors driving defect or vulnerability discovery. Moreover, NHPP models are characterized by a mean value function that predicts the average number of defects discovered by a certain point in time during the testing interval, but may not capture all changes and details present in the data. More recent studies proposed SRGM incorporating covariates, where defect discovery is a function of one or more test activities documented and recorded during the testing process. These covariate models introduce an additional parameter per testing activity, which adds a high degree of non-linearity to traditional NHPP models, and parameter estimation becomes complex since it is limited to maximum likelihood estimation or expectation maximization. Therefore, this talk assesses the potential use of neural networks to predict software defects due to their ability to remember trends. Three different neural networks are considered for predicting software defects: (i) recurrent neural networks (RNNs), (ii) long short-term memory (LSTM), and (iii) gated recurrent units (GRUs). The neural network approaches are compared with the covariate model to evaluate their predictive ability. Results suggest that GRU and LSTM present better goodness-of-fit measures such as SSE, PSSE, and MAPE compared to RNN and covariate models, indicating more accurate predictions.
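
    For readers unfamiliar with NHPP-based SRGM, the sketch below fits the classic Goel-Okumoto mean value function m(t) = a(1 - exp(-b t)) to hypothetical cumulative defect counts by nonlinear least squares; the data, the choice of model, and the use of least squares rather than maximum likelihood are simplifying assumptions for illustration only.

      import numpy as np
      from scipy.optimize import curve_fit

      # Hypothetical cumulative defect counts at the end of each test week.
      t = np.arange(1, 13, dtype=float)
      cum_defects = np.array([4, 9, 15, 19, 24, 27, 30, 32, 34, 35, 36, 37], dtype=float)

      # Goel-Okumoto NHPP mean value function: m(t) = a * (1 - exp(-b * t)),
      # where a is the expected total number of defects and b the detection rate.
      def mean_value(t, a, b):
          return a * (1.0 - np.exp(-b * t))

      (a_hat, b_hat), _ = curve_fit(mean_value, t, cum_defects, p0=[40.0, 0.2])
      fitted = mean_value(t, a_hat, b_hat)
      sse = np.sum((cum_defects - fitted) ** 2)
      print(f"a = {a_hat:.1f}, b = {b_hat:.3f}, SSE = {sse:.2f}")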

    Speaker Info:

    Fatemeh Salboukh

    PhD Student

    University of Massachusetts Dartmouth

    I am a Ph.D. student in the Department of Engineering and Applied Science at the University of Massachusetts Dartmouth. I received my Master’s degree in Mathematical Statistics from the University of Allame Tabataba’i (September 2020) and my Bachelor’s degree in Applied Statistics from Yazd University (July 2018).

  • Application of Software Reliability and Resilience Models to Machine Learning

    Abstract:

    Machine Learning (ML) systems such as Convolutional Neural Networks (CNNs) are susceptible to adversarial scenarios. In these scenarios, an attacker attempts to manipulate or deceive a machine learning model by providing it with malicious input, necessitating quantitative reliability and resilience evaluation of ML algorithms. This can result in the model making incorrect predictions or decisions, which can have severe consequences in applications such as security, healthcare, and finance. Failure in the ML algorithm can lead not just to failures in the application domain but also in the system to which it provides functionality, which may have a performance requirement, hence the need for the application of software reliability and resilience. This talk demonstrates the applicability of software reliability and resilience tools to ML algorithms, providing an objective approach to assess recovery after a degradation from known adversarial attacks. The results indicate that software reliability growth models and tools can be used to monitor the performance and quantify the reliability and resilience of ML models in the many domains in which machine learning algorithms are applied.

    Speaker Info:

    Zakaria Faddi

    Master Student

    University of Massachusetts Dartmouth

    Zakaria Faddi is a master's student in the Electrical and Computer Engineering department at the University of Massachusetts Dartmouth. He completed his undergraduate degree in Electrical and Computer Engineering, with a concentration in Cybersecurity, at the same institution in the spring of 2022.

  • Applications of Network Methods for Supply Chain Review

    Abstract:

    The DoD maintains a broad array of systems, each one sustained by an often complex supply chain of components and suppliers. The ways that these supply chains are interlinked can have major implications for the resilience of the defense industrial base as a whole, and the readiness of multiple weapon systems. Finding opportunities to improve overall resilience requires gaining visibility of potential weak links in the chain, which requires integrating data across multiple disparate sources. By using open-source data pipeline software to enhance reproducibility, and flexible network analysis methods, multiple stovepiped data sources can be brought together to develop a more complete picture of the supply chain across systems.
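
    A minimal sketch of the kind of network view described above, assuming the open-source networkx package and entirely hypothetical supplier, component, and system names: components with a single supplier and suppliers that feed multiple systems are flagged as potential weak links.

      import networkx as nx

      # Hypothetical directed graph: suppliers feed components, components feed systems.
      G = nx.DiGraph()
      G.add_edges_from([
          ("Supplier A", "Actuator"), ("Supplier A", "Pump"),
          ("Supplier B", "Pump"), ("Supplier C", "Radar Module"),
          ("Actuator", "System X"), ("Pump", "System X"),
          ("Pump", "System Y"), ("Radar Module", "System Y"),
      ])

      # Components with only one supplier are potential single points of failure.
      components = {"Actuator", "Pump", "Radar Module"}
      single_sourced = [c for c in components if G.in_degree(c) == 1]
      print("single-sourced components:", single_sourced)

      # Suppliers whose loss would touch multiple systems.
      for supplier in ("Supplier A", "Supplier B", "Supplier C"):
          reachable_systems = {n for n in nx.descendants(G, supplier) if n.startswith("System")}
          print(supplier, "supports", sorted(reachable_systems))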

    Speaker Info:

    Zed Fashena

    Research Associate

    IDA

    Zed Fashena is currently a Research Associate in the Information Technology and Systems Division at the Institute for Defense Analyses. He holds a Master of Science in Statistics from the University of Wisconsin - Madison and a Bachelor of Arts in Economics from Carleton College (MN).

  • Applied Bayesian Methods for Test Planning and Evaluation

    Abstract:

    Bayesian methods have been promoted as a promising way for test and evaluation analysts to leverage previous information across a continuum-of-testing approach to system evaluation. This short course will cover how to identify when Bayesian methods might be useful within a test and evaluation context, components required to accomplish a Bayesian analysis, and provide an understanding of how to interpret the results of that analysis. The course will apply these concepts to two hands-on examples (code and applications provided): one example focusing on system reliability and one focusing on system effectiveness. Furthermore, individuals will gain an understanding of the sequential nature of a Bayesian approach to test and evaluation, the limitations thereof, and gain a broad understanding of questions to ask to ensure a Bayesian analysis is appropriately accomplished.

    Additional Information:

    1. Recommended background: basic understanding of common distributions (e.g., Normal distribution, Binomial distribution)
    2. To follow along with examples, it is recommended you bring a computer that…
      o Can access the internet
      o Already has R Studio loaded (code will be provided; however, not having R Studio will not be a detriment to those who are interested in understanding the overall process but not executing an analysis themselves)

    Speaker Info:

    Victoria Sieck

    Deputy Director

    STAT COE/AFIT

    Dr. Victoria R. C. Sieck is the Deputy Director of the Scientific Test and Analysis Techniques Center of Excellence (STAT COE), where she works with major acquisition programs within the Department of Defense (DoD) to apply rigor and efficiency to current and emerging test and evaluation methodologies through the application of the STAT process. Additionally, she is an Assistant Professor of Statistics at the Air Force Institute of Technology (AFIT), where her research interests include design of experiments and developing innovative Bayesian approaches to DoD testing. As an Operations Research Analyst in the US Air Force (USAF), her experiences in the USAF testing community include being a weapons and tactics analyst and an operational test analyst. Dr. Sieck has an M.S. in Statistics from Texas A&M University and a Ph.D. in Statistics from the University of New Mexico.

  • Applied Bayesian Methods for Test Planning and Evaluation

    Abstract:

    Bayesian methods have been promoted as a promising way for test and evaluation analysts to leverage previous information across a continuum-of-testing approach to system evaluation. This short course will cover how to identify when Bayesian methods might be useful within a test and evaluation context, components required to accomplish a Bayesian analysis, and provide an understanding of how to interpret the results of that analysis. The course will apply these concepts to two hands-on examples (code and applications provided): one example focusing on system reliability and one focusing on system effectiveness. Furthermore, individuals will gain an understanding of the sequential nature of a Bayesian approach to test and evaluation, the limitations thereof, and gain a broad understanding of questions to ask to ensure a Bayesian analysis is appropriately accomplished.

    Additional Information:

    1. Recommended background: basic understanding of common distributions (e.g., Normal distribution, Binomial distribution)
    2. To follow along with examples, it is recommended you bring a computer that…
      o Can access the internet
      o Already has R Studio loaded (code will be provided; however, not having R Studio will not be a detriment to those who are interested in understanding the overall process but not executing an analysis themselves)

    Speaker Info:

    Cory Natoli

    Applied Statistician

    Huntington Ingalls Industries/STAT COE

    Dr. Cory Natoli works as an applied statistician at Huntington Ingalls Industries as a part of the Scientific Test and Analysis Techniques Center of Excellence (STAT COE). He received his MS in Applied Statistics from The Ohio State University and his Ph.D. in Statistics from The Air Force Institute of Technology. His emphasis lies in design of experiments, regression modeling, statistical analysis, and teaching.

  • Applied Bayesian Methods for Test Planning and Evaluation

    Abstract:

    Bayesian methods have been promoted as a promising way for test and evaluation analysts to leverage previous information across a continuum-of-testing approach to system evaluation. This short course will cover how to identify when Bayesian methods might be useful within a test and evaluation context, components required to accomplish a Bayesian analysis, and provide an understanding of how to interpret the results of that analysis. The course will apply these concepts to two hands-on examples (code and applications provided): one example focusing on system reliability and one focusing on system effectiveness. Furthermore, individuals will gain an understanding of the sequential nature of a Bayesian approach to test and evaluation, the limitations thereof, and gain a broad understanding of questions to ask to ensure a Bayesian analysis is appropriately accomplished.

    Additional Information:

    1. Recommended background: basic understanding of common distributions (e.g., Normal distribution, Binomial distribution)
    2. To follow along with examples, it is recommended you bring a computer that…
      o Can access the internet
      o Already has R Studio loaded (code will be provided; however, not having R Studio will not be a detriment to those who are interested in understanding the overall process but not executing an analysis themselves)

    Speaker Info:

    Corey Thrush

    Statistician

    Huntington Ingalls Industries/STAT COE

    Mr. Corey Thrush is a statistician at Huntington Ingalls Industries within the Scientific Test and Analysis Techniques Center of Excellence (STAT COE). He received a B.S. in Applied Statistics from Ohio Northern University and an M.A. in Statistics from Bowling Green State University. His interests are data exploration, statistical programming, and Bayesian Statistics.

  • April 26th Morning Keynote

    Speaker Info:

    Mr. Peter Coen

    Mission Integration Manager

    NASA

    Peter Coen currently serves as the Mission Integration Manager for NASA’s Quesst Mission.  His primary responsibility in this role is to ensure that the X-59 aircraft development, in-flight acoustic validation and community test elements of the Mission stay on track toward delivering on NASA’s Critical Commitment to provide quiet supersonic overflight response data to the FAA and the International Civil Aviation Organization.

    Previously, Peter was the manager for the Commercial Supersonic Technology Project in NASA’s Aeronautics Research Mission, where he led a team from the four NASA Aero Research Centers in the development of tools and technologies for a new generation of quiet and efficient supersonic civil transport aircraft.

    Peter’s NASA career spans almost 4 decades. During this time, he has studied technology integration in practical designs for many different types of aircraft and has made technical and management contributions to all of NASA’s supersonics related programs over the past 30 years.  As Project Manager, he led these efforts for 12 years.

    Peter is a licensed private pilot who has amassed nearly 30 seconds of supersonic flight time.

  • April 26th Morning Keynote

    Speaker Info:

    Maj. Gen. Shawn N. Bratton

    Commander, Space Training and Readiness Command

    United States Space Force

    Maj Gen Shawn N. Bratton is Commander, Space Training and Readiness Command, temporarily located at Peterson Space Force Base, Colorado. Space Training and Readiness Command was established as a Field Command 23 August 2021, and is responsible for preparing the USSF and more than 6,000 Guardians to prevail in competition and conflict through innovative education, training, doctrine, and test activities. Maj Gen Bratton received his commission from the Academy of Military Science in Knoxville, Tenn. Prior to his commissioning Maj Gen Bratton served as an enlisted member of the 107th Air Control Squadron, Arizona Air National Guard. He has served in numerous operational and staff positions. Maj Gen Bratton was the first Air National Guardsman to attend Space Weapons Instructor Course at Nellis Air Force Base. He deployed to the Air Component Coordination Element, Camp Victory Iraq for Operation IRAQI FREEDOM, he served as the USNORTHCOM Director of Space Forces, and commanded the 175th Cyberspace Operations Group, Maryland Air National Guard. He also served as the Deputy Director of Operations, USSPACECOM.

  • April 26th Morning Keynote

    Speaker Info:

    Col Sacha Tomlinson

    Test Enterprise Division Chief, STARCOM

    United States Space Force

    Colonel Sacha Tomlinson is the Test Enterprise Division Chief for Space Training and Readiness Command (STARCOM), Peterson Space Force Base, Colorado. She is also the STARCOM Commander’s Deputy Operational Test Authority. In these two roles, she is responsible for establishing STARCOM’s test enterprise, managing and executing the continuum of test for space systems and capabilities.

    Colonel Tomlinson’s background is principally in operational test, having served two assignments with the Air Force Operational Test and Evaluation Center (AFOTEC), first as the Test Director for the Space Based Infrared System's GEO 1 test campaign and later as AFOTEC Detachment 4's Deputy Commander. She also had two assignments with the 17th Test Squadron in Air Combat Command, first as the Detachment 4 Commander, responsible for testing the Eastern and Western Range Launch and Test Range Systems, and later commanding the 17th Test Squadron, which tested major sustainment upgrades for Space systems, as well as conducting Tactics Development & Evaluations and the Space Weapon System Evaluation Program.

  • April 27th Morning Keynote

    Speaker Info:

    The Honorable Christine Fox

    Senior Fellow

    Johns Hopkins University Applied Physics Laboratory

    The Honorable Christine Fox currently serves as a member of the President’s National Infrastructure Advisory Council, participates on many governance and advisory boards, and is a Senior Fellow at the Johns Hopkins University Applied Physics Laboratory. Previously, she was the Assistant Director for Policy and Analysis at JHU/APL, a position she held from 2014 to early 2022. Before joining APL, she served as Acting Deputy Secretary of Defense from 2013 to 2014 and as Director of Cost Assessment and Program Evaluation (CAPE) from 2009 to 2013. As Director, CAPE, Ms. Fox served as chief analyst to the Secretary of Defense. She officially retired from the Pentagon in May 2014. Prior to her DoD positions, she served as president of the Center for Naval Analyses from 2005 to 2009, after working there as a research analyst and manager since 1981. Ms. Fox holds bachelor’s and master of science degrees from George Mason University.

  • Army Wilks Award

    Speaker Info:

    Wilks Award Winner to Be Announced

  • ASA SDNS Student Poster Awards

    Speaker Info:

    Student Winners to be Announced

  • Assessing Predictive Capability and Contribution for Binary Classification Models

    Abstract:

    Classification models for binary outcomes are in widespread use across a variety of industries. Results are commonly summarized in a misclassification table, also known as an error or confusion matrix, which indicates correct vs incorrect predictions for different circumstances. Models are developed to minimize both false positive and false negative errors, but the optimization process to train/obtain the model fit necessarily results in cost-benefit trades. However, how to obtain an objective assessment of the performance of a given model in terms of predictive capability or benefit is less well understood, due both to the plethora of options described in the literature and to the largely overlooked influence of noise factors, specifically class imbalance. Many popular measures are susceptible to effects due to underlying differences in how the data are allocated by condition, which cannot be easily corrected.

    This talk considers the wide landscape of possibilities from a statistical robustness perspective. Results are shown from sensitivity analyses across a variety of conditions for several popular metrics, highlighting potential concerns with respect to machine learning or ML-enabled systems. Recommendations are provided to correct for imbalance effects, as well as how to conduct a simple statistical comparison that will disentangle the beneficial effects of the model itself from those of imbalance. Results are generalizable across model type.
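
    The effect of class imbalance on a headline metric can be illustrated with a short simulation; the prevalence, sensitivity, and specificity below are arbitrary assumptions, not results from the talk.

      import numpy as np

      rng = np.random.default_rng(1)

      # Hypothetical imbalanced test set: 5% positives.
      n = 10_000
      y_true = (rng.random(n) < 0.05).astype(int)

      # A notional "model" with 80% sensitivity and 90% specificity.
      p_pred_pos = np.where(y_true == 1, 0.80, 0.10)
      y_pred = (rng.random(n) < p_pred_pos).astype(int)

      tp = np.sum((y_true == 1) & (y_pred == 1))
      tn = np.sum((y_true == 0) & (y_pred == 0))
      fp = np.sum((y_true == 0) & (y_pred == 1))
      fn = np.sum((y_true == 1) & (y_pred == 0))

      accuracy = (tp + tn) / n                              # inflated by the majority class
      sensitivity = tp / (tp + fn)
      specificity = tn / (tn + fp)
      balanced_accuracy = 0.5 * (sensitivity + specificity)  # insensitive to imbalance
      print(f"accuracy={accuracy:.3f}, balanced accuracy={balanced_accuracy:.3f}")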

    Speaker Info:

    Mindy Hotchkiss

    Technical Specialist

    Aerojet Rocketdyne

    Mindy Hotchkiss is a Technical Specialist and Subject Matter Expert in Statistics for Aerojet Rocketdyne. She holds a BS degree in Mathematics and Statistics and an MBA from the University of Florida, and a Masters in Statistics from North Carolina State University. She has over 20 years of experience as a statistical consultant between Pratt & Whitney and Aerojet Rocketdyne, with work supporting technology development across the enterprise, including hypersonics and metals additive manufacturing. She has been a team member and statistics lead on multiple Metals Affordability Initiative projects, working with industry partners and the Air Force Research Laboratory Materials Directorate. Interests include experimentation, risk and reliability, statistical modeling in any form, machine learning, autonomous systems development and systems engineering, digital engineering, and the practical implementation of statistical methods. She is a Past Chair of the ASQ Statistics Division and currently serves on the Board of Directors for RAMS, the Reliability and Maintainability Symposium, and the Governing Board of the Ellis R. Ott Graduate Scholarship program.

  • Assessing Risk with Cadet Candidates and USMA Admissions

    Abstract:

    Though the United States Military Academy (USMA) graduates approximately 1,000 cadets annually, over 100 cadets from the initial cohort fail to graduate and are separated or resign at great expense to the federal government. Graduation risk among incoming cadet candidates is difficult to measure; based on current research, the strongest predictors of college graduation risk are high school GPA and, to a lesser extent, standardized test scores. Other predictors include socioeconomic factors, demographics, culture, and measures of prolonged and active participation in extra-curricular activities. For USMA specifically, a cadet candidate’s Whole Candidate Score (WCS), which includes measures to score leadership and physical fitness, has historically proven to be a promising predictor of a cadet’s performance at USMA. However, predicting graduation rates and identifying risk variables still proves to be difficult. Using data from the USMA Admissions Department, we used logistic regression, k-Nearest Neighbors, random forests, and gradient boosting algorithms to better predict which cadets would be separated or resign, using potential variables that may relate to graduation risk. Using measures such as p-values for statistical significance, correlation coefficients, and Area Under the Curve (AUC) scores to assess true positive rates, we found that supplementing the current admissions criteria with data on participation in certain extra-curricular activities improves prediction rates on whether a cadet will graduate.
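
    A generic sketch of the modeling comparison described above, using scikit-learn on a synthetic, imbalanced data set as a stand-in for the (non-public) admissions records; the feature counts, class weights, and model settings are illustrative assumptions.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split
      from sklearn.neighbors import KNeighborsClassifier

      # Synthetic stand-in for admissions records: features such as WCS components
      # and extra-curricular indicators, with a binary graduate / non-graduate label.
      X, y = make_classification(n_samples=4000, n_features=12, n_informative=6,
                                 weights=[0.9, 0.1], random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                                stratify=y, random_state=0)

      models = {
          "logistic regression": LogisticRegression(max_iter=1000),
          "k-nearest neighbors": KNeighborsClassifier(n_neighbors=25),
          "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
          "gradient boosting": GradientBoostingClassifier(random_state=0),
      }
      for name, model in models.items():
          model.fit(X_tr, y_tr)
          auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
          print(f"{name}: AUC = {auc:.3f}")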

    Speaker Info:

    Daniel Lee

    Cadet

    United States Military Academy

    I was born in Harbor City, California at the turn of the millennium to two Korean immigrant parents. For most of my young life, I grew up in various communities in Southern California before my family ultimately settled in Murrieta in the Inland Empire, roughly equidistant from Los Angeles and San Diego. At the beginning of my 5th grade year, my father accepted a job offer with the US Army Corps of Engineers at Yongsan, South Korea. The seven years that I would spend in Korea would be among the most formative and fond years of my life. In Korea, I grew to better understand the diverse nations that composed the world and grew closer to my Korean heritage that I often forgot living in the US. It was in Korea, however, that I made my most impactful realization: I wanted to serve in the military. The military never crossed my mind as a career growing up. Growing up around the Army in Korea, I knew this was a path I wanted. Though the military was entirely out of my character, I spent the next several years working towards my goal of becoming an Army officer. Just before my senior year of high school, my family moved again to the small, rural town of Vidalia, Louisiana. I transitioned from living in a luxury high-rise in the middle of Seoul to a bungalow in one of the poorest regions of the US. Yet, I once again found myself entranced; not only did I once again grow to love my new home, but I also began to open my mind to the struggles, perspectives, and motivations of many rural Americans. To this day, I proudly proclaim my hometown and state of residence as Vidalia, Louisiana. My acceptance into West Point shortly after my move marked the beginning of my great adventure, fulfilling a life-long dream of serving in the Army and becoming an officer.

  • Assurance of Responsible AI/ML in the DOD Personnel Space

    Abstract:

    Testing and assuring responsible use of AI/ML enabled capabilities is a nascent topic in the DOD with many efforts being spearheaded by CDAO. In general, black box models tend to suffer from consequences related to edge cases, emergent behavior, misplaced or lack of trust, and many other issues, so traditional testing is insufficient to guarantee safety and responsibility in the employment of a given AI enabled capability. Focus of this concern tends to fall on well-publicized and high-risk capabilities, such as AI enabled autonomous weapons systems. However, while AI/ML enabled capabilities supporting personnel processes and systems, such as algorithms used for retention and promotion decision support, tend to carry low safety risk, many concerns, some of them specific to the personnel space, run the risk of undermining the DOD’s 5 ethical principles for RAI. Examples include service member privacy concerns, invalid prospective policy analysis, disparate impact against marginalized service member groups, and unintended emergent service member behavior in response to use of the capability. Eroding barriers to use of AI/ML are facilitating an increasing number of applications while some of these concerns are still not well understood by the analytical community. We consider many of these issues in the context of an IDA ML enabled capability and propose mechanisms to assure stakeholders of the adherence to the DOD’s ethical principles.

    Speaker Info:

    John Dennis

    Research Staff Member (Economist)

    IDA

    Dr. John W. Dennis, PhD, is a research staff member focusing on Econometrics, Statistics, and Data Science in the Institute for Defense Analyses' Human Capital and Test Science groups. He received his PhD in Economics from the University of North Carolina at Chapel Hill in 2019.

  • Avoiding Pitfalls in AI/ML Packages

    Abstract:

    Recent years have seen an explosion in the application of artificial intelligence and machine learning (AI/ML) to practical problems from computer vision to game playing to algorithm design. This growth has been mirrored and, in many ways, been enabled by the development and maturity of publicly-available software packages such as PyTorch and TensorFlow that make model building, training, and testing easier than ever. While these packages provide tremendous power and flexibility to users, and greatly facilitate learning and deploying AI/ML techniques, they and the models they provide are extremely complicated and as a result can present a number of subtle but serious pitfalls. This talk will present three examples from the presenter's recent experience where obscure settings or bugs in these packages dramatically changed model behavior or performance - one from a classic deep learning application, one from training of a classifier, and one from reinforcement learning. These examples illustrate the importance of thinking carefully about the results that a model is producing and carefully checking each step in its development before trusting its output.

    Speaker Info:

    Justin Krometis

    Research Assistant Professor

    Virginia Tech National Security Institute

    Justin Krometis is a Research Assistant Professor with the Virginia Tech National Security Institute and holds an adjunct position in the Math Department at Virginia Tech. His research is mostly in the development of theoretical and computational frameworks for Bayesian data analysis. These include approaches to incorporating and balancing data and expert opinion in decision-making, estimating model parameters, including high- or even infinite-dimensional quantities, from noisy data, and designing experiments to maximize the information gained. He also has extensive expertise in high-performance computing and more recently developed skills in Artificial Intelligence/Machine Learning (AI/ML) techniques. Research interests include Statistical Inverse Problems, High-Performance Computing, Parameter Estimation, Uncertainty Quantification, Artificial Intelligence/Machine Learning (AI/ML), Reinforcement Learning, and Experimental Design.

    Prior to joining VTNSI, Dr. Krometis spent ten years as a Computational Scientist supporting high-performance computing with Advanced Research Computing at Virginia Tech and seven years in the public and private sectors doing transportation modeling for planning and evacuation applications and hurricane, pandemic, and other emergency preparedness. He holds Ph.D., M.S., and B.S. degrees in Math and a B.S. degree in Physics, all from Virginia Tech.

  • Back to the Future: Implementing a Time Machine to Improve and Validate Model Predictions

    Abstract:

    At a time when supply chain problems are challenging even the most efficient and robust supply ecosystems, the DOD faces the additional hurdles of primarily dealing in low volume orders of highly complex components with multi-year procurement and repair lead times. When combined with perennial budget shortfalls, it is imperative that the DOD spend money efficiently by ordering the “right” components at the “right time” to maximize readiness. What constitutes the “right” components at the “right time” depends on model predictions that are based upon historical demand rates and order lead times. Given that the time scales between decisions and results are often years long, even small modeling errors can lead to months-long supply delays or tens of millions of dollars in budget shortfalls. Additionally, we cannot evaluate the accuracy and efficacy of today’s decisions for some years to come. To address this problem, as well as a wide range of similar problems across our Sustainment analysis, we have built “time machines” to pursue retrospective validation – for a given model, we rewind DOD data sources to some point in the past and compare model predictions, using only data available at the time, against known historical outcomes. This capability allows us to explore different decisions and the alternate realities that would manifest in light of those choices. In some cases, this is relatively straightforward, while in others it is made quite difficult by problems familiar to any time-traveler: changing the past can change the future in unexpected ways.
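
    A minimal sketch of the retrospective-validation idea, assuming pandas and a hypothetical monthly demand history for a single part: data are truncated at a chosen cutoff date, a deliberately simple forecast is built from the pre-cutoff history only, and the prediction is scored against the demand actually observed afterward.

      import numpy as np
      import pandas as pd

      # Hypothetical monthly demand history for one part number.
      rng = np.random.default_rng(2)
      dates = pd.date_range("2015-01-01", "2022-12-01", freq="MS")
      demand = pd.Series(rng.poisson(6, len(dates)), index=dates, name="demand")

      cutoff = pd.Timestamp("2020-01-01")        # "travel back" to this decision point
      history = demand[demand.index < cutoff]    # only data available at the time
      future = demand[demand.index >= cutoff]    # the outcomes we later observed

      # A deliberately simple forecast: trailing 24-month average demand rate.
      forecast_rate = history.tail(24).mean()
      predicted_total = forecast_rate * len(future)
      actual_total = future.sum()
      print(f"predicted {predicted_total:.0f} units vs. actual {actual_total} units "
            f"over {len(future)} months")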

    Speaker Info:

    Kyle Remley

    Research Staff Member

    IDA

    Dr. Remley received his Bachelor of Science in Nuclear and Radiological Engineering from Georgia Tech in 2013, his Master of Science in Nuclear Engineering from Georgia Tech in 2015, and his Ph.D. in Nuclear and Radiological Engineering from Georgia Tech in 2016. He was a senior engineer with the Naval Nuclear Laboratory until 2020. Dr. Remley has been a Research Staff Member with IDA since July 2020.

  • Back to the Future: Implementing a Time Machine to Improve and Validate Model Predictions

    Abstract:

    At a time when supply chain problems are challenging even the most efficient and robust supply ecosystems, the DOD faces the additional hurdles of primarily dealing in low volume orders of highly complex components with multi-year procurement and repair lead times. When combined with perennial budget shortfalls, it is imperative that the DOD spend money efficiently by ordering the “right” components at the “right time” to maximize readiness. What constitutes the “right” components at the “right time” depends on model predictions that are based upon historical demand rates and order lead times. Given that the time scales between decisions and results are often years long, even small modeling errors can lead to months-long supply delays or tens of millions of dollars in budget shortfalls. Additionally, we cannot evaluate the accuracy and efficacy of today’s decisions for some years to come. To address this problem, as well as a wide range of similar problems across our Sustainment analysis, we have built “time machines” to pursue retrospective validation – for a given model, we rewind DOD data sources to some point in the past and compare model predictions, using only data available at the time, against known historical outcomes. This capability allows us to explore different decisions and the alternate realities that would manifest in light of those choices. In some cases, this is relatively straightforward, while in others it is made quite difficult by problems familiar to any time-traveler: changing the past can change the future in unexpected ways.

    Speaker Info:

    Olivia Gozdz

    Research Staff Member

    IDA

    Dr. Gozdz received her Bachelor of Science in Physics from Hamilton College in 2016, and she received her Ph.D. in Climate Science from George Mason University in 2022. Dr. Gozdz has been a Research Staff Member with IDA since September 2022.

  • Best Practices for Using Bayesian Reliability Analysis in Developmental Testing

    Abstract:

    Traditional methods for reliability analysis are challenged in developmental testing (DT) as systems become increasingly complex and DT programs become shorter and less predictable. Bayesian statistical methods, which can combine data across DT segments and use additional data to inform reliability estimates, can address some of these challenges. However, Bayesian methods are not widely used. I will present the results of a study aimed at identifying effective practices for the use of Bayesian reliability analysis in DT programs. The study consisted of interviews with reliability subject matter experts, together with a review of relevant literature on Bayesian methods. This analysis resulted in a set of best practices that can guide an analyst in deciding whether to apply Bayesian methods, in selecting the appropriate Bayesian approach, and in applying the Bayesian method and communicating the results.
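
    One simple example of the kind of Bayesian reliability analysis discussed here is a conjugate Beta-Binomial update of a success probability across DT segments; the prior and the segment pass/fail counts below are hypothetical.

      from scipy.stats import beta

      # Hypothetical prior from earlier testing or engineering judgment:
      # Beta(a, b) on the probability that a mission trial succeeds.
      a, b = 8.0, 2.0

      # Successive DT segments: (successes, failures) observed in each.
      dt_segments = [(9, 1), (18, 2), (27, 3)]

      for i, (succ, fail) in enumerate(dt_segments, start=1):
          a += succ      # conjugate Beta-Binomial update
          b += fail
          lower, upper = beta.ppf([0.05, 0.95], a, b)
          print(f"after segment {i}: posterior mean = {a / (a + b):.3f}, "
                f"90% credible interval = ({lower:.3f}, {upper:.3f})")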

    Speaker Info:

    Paul Fanto

    Research Staff Member

    IDA

    Paul Fanto is a research staff member at the Institute for Defense Analyses (IDA).  His work focuses on the modeling and analysis of space and ISR systems and on statistical methods for reliability.  He received a Ph.D. in Physics from Yale University in 2021, where he developed computational models of atomic nuclei.

  • Best Practices for Using Bayesian Reliability Analysis in Developmental Testing

    Abstract:

    Traditional methods for reliability analysis are challenged in developmental testing (DT) as systems become increasingly complex and DT programs become shorter and less predictable. Bayesian statistical methods, which can combine data across DT segments and use additional data to inform reliability estimates, can address some of these challenges. However, Bayesian methods are not widely used. I will present the results of a study aimed at identifying effective practices for the use of Bayesian reliability analysis in DT programs. The study consisted of interviews with reliability subject matter experts, together with a review of relevant literature on Bayesian methods. This analysis resulted in a set of best practices that can guide an analyst in deciding whether to apply Bayesian methods, in selecting the appropriate Bayesian approach, and in applying the Bayesian method and communicating the results.

    Speaker Info:

    Paul Fanto

    Research Staff Member

    Institute for Defense Analyses

    Paul Fanto is a research staff member at the Institute for Defense Analyses (IDA).  His work focuses on the modeling and analysis of space and ISR systems and on statistical methods for reliability.  He received a Ph.D. in Physics from Yale University in 2021, where he developed computational models of atomic nuclei.

  • Case Study on Test Planning and Data Analysis for Comparing Time Series

    Abstract:

    Several years ago, the US Army Research Institute of Environmental Medicine developed an algorithm to estimate core temperature in military working dogs (MWDs). This canine thermal model (CTM) is based on thermophysiological principles and incorporates environmental factors and acceleration. The US Army Medical Materiel Development Activity is implementing this algorithm in a collar-worn device that includes computing hardware, environmental sensors, and an accelerometer. Among other roles, Johns Hopkins University Applied Physics Laboratory (JHU/APL) is coordinating the test and evaluation of this device.

    The device’s validation is ultimately tied to field tests involving MWDs. However, to minimize the burden to MWDs and the interruptions to their training, JHU/APL seeks to leverage non-canine laboratory-based testing to the greatest possible extent.

    For example, JHU/APL is testing the device’s accelerometers with shaker tables that vertically accelerate the device according to specified sinusoidal acceleration profiles. This test yields time series of acceleration and related metrics, which are compared to ground-truth measurements from a reference accelerometer.

    Statistically rigorous comparisons between the CTM and reference measurements must account for the potential lack of independence between measurements that are close in time. Potentially relevant techniques include downsampling, paired difference tests, hypothesis tests of absolute difference, hypothesis tests of distributions, functional data analysis, and bootstrapping.

    These considerations affect both test planning and subsequent data analysis. This talk will describe JHU/APL’s efforts to test and evaluate the CTM accelerometers and will outline a range of possible methods for comparing time series.
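
    As one concrete example of the techniques listed above, the sketch below applies downsampling followed by a paired difference test to hypothetical device and reference accelerometer series; the signal model, sampling rate, and downsampling interval are assumptions for illustration, and a real analysis would also verify that the retained differences are approximately independent.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(3)

      # Hypothetical 100 Hz acceleration magnitude from the device under test and
      # a reference accelerometer during the same shaker-table run.
      t = np.arange(0, 60, 0.01)
      reference = 1.0 + 0.5 * np.sin(2 * np.pi * 0.5 * t)
      device = reference + 0.02 + 0.05 * rng.standard_normal(t.size)

      # Downsample (every 2 s) to reduce serial correlation between retained points,
      # then test whether the mean paired difference is zero.
      step = 200
      diffs = (device - reference)[::step]
      t_stat, p_value = stats.ttest_1samp(diffs, popmean=0.0)
      print(f"n = {diffs.size}, mean diff = {diffs.mean():.4f}, p = {p_value:.3f}")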

    Speaker Info:

    Phillip Koshute

    Johns Hopkins University Applied Physics Laboratory

    Phillip Koshute is a data scientist and statistical modeler at the Johns Hopkins University Applied Physics Laboratory. He has degrees in mathematics and operations research and is currently pursuing his PhD in applied statistics at the University of Maryland.

  • CDAO Joint AI Test Infrastructure Capability

    Abstract:

    The Chief Digital and AI Office (CDAO) Test & Evaluation Directorate is developing the Joint AI Test Infrastructure Capability (JATIC) program of record, which is an interoperable set of state-of-the-art software capabilities for AI Test & Evaluation. It aims to provide a comprehensive suite of integrated testing tools that can be deployed widely across the enterprise to address key T&E gaps. In particular, JATIC capabilities will support the assessment of AI system performance, cybersecurity, adversarial resilience, and explainability, enabling the end user to more effectively execute their mission. It is a key component of the digital testing infrastructure that the CDAO will provide in order to support the development and deployment of data, analytics, and AI across the Department.

    Speaker Info:

    David Jin

    Senior AI Engineer

    MITRE

    David Jin is the AI Test Tools Lead at the Chief Digital and AI Office. Within this role, he leads the Joint AI Test Infrastructure Capability Program which is developing software tools for rigorous AI algorithmic testing. His background is in computer vision and pure mathematics.

  • Circular Error Probable and an Example with Multilevel Effects

    Abstract:

    Circular Error Probable (CEP) is a measure of a weapon system’s precision developed based on the Bivariate Normal Distribution. Failing to understand the theory behind CEP can result in misuse of the equations developed to aid estimation. Estimation of CEP is also much more straightforward in situations such as single samples where factors are not being manipulated. This brief aims to help build a theoretical understanding of CEP, and then presents a non-trivial example in which CEP is estimated via multilevel regression. The goal is to help build an understanding of CEP so it can be properly estimated in trivial (single-sample) and non-trivial cases (e.g., regression and multilevel regression).
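
    For the trivial (single-sample) case, CEP can be estimated either nonparametrically or from the circular bivariate normal result CEP = sigma * sqrt(2 ln 2), roughly 1.1774 * sigma; the sketch below contrasts the two on simulated impact points (the dispersion value is an arbitrary assumption), while the multilevel-regression case from the brief is beyond this example.

      import numpy as np

      rng = np.random.default_rng(4)

      # Hypothetical impact points (meters) relative to the aim point.
      x = rng.normal(0.0, 3.0, 500)
      y = rng.normal(0.0, 3.0, 500)
      r = np.hypot(x, y)                 # radial miss distances

      # Nonparametric estimate: the median radial miss distance.
      cep_empirical = np.median(r)

      # Circular bivariate normal estimate (equal variances, zero mean bias):
      # CEP = sigma * sqrt(2 * ln 2).
      sigma_hat = np.sqrt(0.5 * (x.var(ddof=1) + y.var(ddof=1)))
      cep_normal = sigma_hat * np.sqrt(2.0 * np.log(2.0))

      print(f"empirical CEP = {cep_empirical:.2f} m, circular-normal CEP = {cep_normal:.2f} m")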

    Speaker Info:

    Jacob Warren

    Assistant Scientific Advisor

    Marine Corps Operational Test and Evaluation Activity

    Jacob Warren is the Assistant Scientific Advisor for the Marine Corps Operational Test and Evaluation Activity (MCOTEA). He has worked for MCOTEA since 2011, starting as a statistician before moving into his current role. Mr. Warren has a Master of Science degree in Applied Statistics from the Rochester Institute of Technology.

  • Coming Soon

    Speaker Info:

    Cosmin Safta

    Sandia National Laboratories

  • Comparing Normal and Binary D-optimal Design of Experiments by Statistical Power

    Abstract:

    In many Department of Defense (DoD) Test and Evaluation (T&E) applications, binary response variables are unavoidable. Many have considered D-optimal design of experiments (DOEs) for generalized linear models (GLMs). However, little consideration has been given to assessing how these new designs perform in terms of statistical power for a given hypothesis test. Monte Carlo simulations and exact power calculations suggest that normal D-optimal designs generally yield higher power than binary D-optimal designs, despite using logistic regression in the analysis after data have been collected. Results from using statistical power to compare designs contradict traditional DOE comparisons which employ D-efficiency ratios and fractional design space (FDS) plots. Power calculations suggest that practitioners who are primarily interested in the resulting statistical power of a design should use normal D-optimal designs over binary D-optimal designs when logistic regression is to be used in the data analysis after data collection.
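
    A simplified sketch of the kind of Monte Carlo power calculation described above, assuming statsmodels, a replicated two-factor design in coded units as a stand-in for a D-optimal candidate design, and arbitrary true logit coefficients; a full comparison would repeat this for both normal and binary D-optimal candidate designs.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(5)

      # Candidate design: a replicated 2^2 factorial in coded units (a stand-in for
      # a design produced by DOE software).
      base = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
      design = np.tile(base, (10, 1))                 # 40 runs
      X = sm.add_constant(design)

      beta_true = np.array([0.0, 0.8, 0.4])           # assumed true logit coefficients
      n_sims, alpha = 1000, 0.05
      rejections = 0
      for _ in range(n_sims):
          p = 1.0 / (1.0 + np.exp(-X @ beta_true))
          y = rng.binomial(1, p)
          fit = sm.Logit(y, X).fit(disp=0)
          if fit.pvalues[1] < alpha:                  # Wald test on the first factor
              rejections += 1
      print(f"estimated power for factor 1: {rejections / n_sims:.3f}")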

    Speaker Info:

    Addison Adams

    Summer Associate / Graduate Student

    IDA / Colorado State University

    Addison joined the Institute for Defense Analyses (IDA) during the summer of 2022. Addison is currently a PhD student at Colorado State University where he is studying statistics. Addison's PhD research is focused on the stochastic inverse problem and its applications to random coefficient models. Before attending graduate school, Addison worked as a health actuary for Blue Cross of Idaho. Addison attended Utah Valley University (UVU) where he earned a BS in mathematics. During his time at UVU, Addison completed internships with both the FBI and AON.

  • Comparison of Bayesian and Frequentist Methods for Regression

    Abstract:

    Statistical analysis is typically conducted using either a frequentist or Bayesian approach. But what is the impact of choosing one analysis method over another? This presentation will compare the results of both linear and logistic regression using Bayesian and frequentist methods. The data set combines information on simulated diffusion of material and anticipated background signal to imitate sensor output. The sensor is used to estimate the total concentration of material, and a threshold will be set such that the false alarm rate (FAR) due to the background is a constant. The regression methods are used to relate the probability of detection, for a given FAR, to predictor variables, such as the total amount of material released. The presentation concludes with a comparison of the similarities and differences between the two methods given the results.
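
    A minimal sketch of the linear-regression side of such a comparison, assuming a known noise standard deviation, a weak normal prior, and simulated data; it simply places the OLS estimates next to the conjugate posterior means and standard deviations so the two summaries can be compared side by side.

      import numpy as np

      rng = np.random.default_rng(6)

      # Hypothetical data: y = 1 + 2x + noise.
      n, sigma = 50, 1.0
      x = rng.uniform(-2, 2, n)
      X = np.column_stack([np.ones(n), x])
      y = 1.0 + 2.0 * x + rng.normal(0, sigma, n)

      # Frequentist: ordinary least squares.
      beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

      # Bayesian: conjugate normal prior beta ~ N(0, tau^2 I) with known sigma.
      tau = 10.0
      post_cov = np.linalg.inv(X.T @ X / sigma**2 + np.eye(2) / tau**2)
      post_mean = post_cov @ (X.T @ y) / sigma**2
      post_sd = np.sqrt(np.diag(post_cov))

      print("OLS estimates:      ", np.round(beta_ols, 3))
      print("Posterior means:    ", np.round(post_mean, 3))
      print("Posterior std devs: ", np.round(post_sd, 3))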

    Speaker Info:

    James P Theimer

    Operations Research Analyst/STAT Expert

    Homeland Security Community of Best Practices

    Dr. James Theimer is a Scientific Test and Analysis Techniques Expert employed by Huntington Ingalls Industries Technical Solutions, working to support the Homeland Security Center of Best Practices.

    Dr. Theimer worked for the Air Force Research Laboratory and predecessor organizations for more than 35 years. He worked on modeling and simulation of sensor systems and supporting devices. His doctoral research was on modeling pulse formation in fiber lasers. He worked with a semiconductor reliability team as a reliability statistician and led a team which studied statistical validation of models of automatic sensor exploitation systems. This team also worked with programs to evaluate these systems.

    Dr. Theimer has a PhD in Electrical Engineering from Rensselaer Polytechnic Institute, an MS in Applied Statistics from Wright State University, an MS in Atmospheric Science from SUNY Albany, and a BS in Physics from the University of Rochester.

  • Comparison of Magnetic Field Line Tracing Methods

    Abstract:

    At George Mason University, we are developing swmfio, a Python package for processing results from the Space Weather Modeling Framework (SWMF), which is used to study the sun, heliosphere, and magnetosphere. The SWMF framework centers around a high-performance magnetohydrodynamic (MHD) model, the Block Adaptive Tree Solar-wind Roe Upwind Scheme (BATS-R-US). This analysis uses swmfio and other methods to trace magnetic field lines, compare the results, and identify why the methods differ. While the earth's magnetic field protects the planet from solar radiation, solar storms can distort the earth's magnetic field, allowing solar storms to damage satellites and electrical grids. Being able to trace magnetic field lines helps us understand space weather. In this analysis, the September 1859 Carrington Event is examined. This event is the most intense geomagnetic storm in recorded history. We use three methods to trace magnetic field lines in the Carrington Event and compare the field lines generated by the different methods. We consider two factors in the analysis. First, we directly compare methods by measuring the distances between field lines generated by different methods. Second, we consider how sensitive the methods are to initial conditions. We note that swmfio’s linear interpolation, which is customized for the BATS-R-US adaptive mesh, provides expected results. It is insensitive to small changes in initial conditions and terminates field lines at boundaries. We observe that, for any method, when the mesh size becomes large, results may not be accurate.
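
    The general field-line tracing technique, independent of any particular interpolation scheme, is to integrate dx/ds = B(x)/|B(x)| along the field direction; the sketch below does this with fourth-order Runge-Kutta steps in an idealized dipole field (not BATS-R-US output), with the dipole orientation, step size, and boundary radii chosen arbitrarily for illustration.

      import numpy as np

      def dipole_field(x, m=np.array([0.0, 0.0, 1.0])):
          # Ideal magnetic dipole field (arbitrary units) at position x.
          r = np.linalg.norm(x)
          r_hat = x / r
          return (3.0 * r_hat * np.dot(m, r_hat) - m) / r**3

      def trace_field_line(x0, ds=0.01, n_steps=5000, r_min=1.0, r_max=20.0):
          # Trace a field line with RK4 steps along the unit field direction.
          def direction(x):
              b = dipole_field(x)
              return b / np.linalg.norm(b)

          points = [np.asarray(x0, dtype=float)]
          for _ in range(n_steps):
              x = points[-1]
              k1 = direction(x)
              k2 = direction(x + 0.5 * ds * k1)
              k3 = direction(x + 0.5 * ds * k2)
              k4 = direction(x + ds * k3)
              x_next = x + ds * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
              points.append(x_next)
              r = np.linalg.norm(x_next)
              if r < r_min or r > r_max:   # stop at an inner or outer boundary
                  break
          return np.array(points)

      line = trace_field_line([3.0, 0.0, 1.0])
      print(f"{len(line)} points traced; final radius = {np.linalg.norm(line[-1]):.2f}")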

    Speaker Info:

    Dean Thomas

    Researcher

    George Mason University

    In 2022, Dean Thomas joined a NASA Goddard collaboration examining space weather phenomena. His research examines major solar events that affect the earth. While the earth's magnetic field protects the earth from solar radiation, solar storms can distort the earth's magnetic field, allowing the storms to damage satellites and electrical grids. Previously, he was Deputy Director for the Operational Evaluation Division (OED) at the Institute for Defense Analyses (IDA), managing a team of 150 researchers. OED supports the Director, Operational Test and Evaluation (DOT&E) within the Pentagon, who is responsible for operational testing of new military systems including aircraft, ships, ground vehicles, sensors, weapons, and information technology systems. His analyses fed into DOT&E’s reports and testimony to Congress and the Secretary of Defense on whether these new systems can successfully complete their missions and protect their crews. He received his PhD in Physics in 1987 from the State University of New York (SUNY), Stony Brook.

  • Confidence Intervals for Derringer and Suich Desirability Function Optimal Points

    Abstract:

    A shortfall of the Derringer and Suich (1980) desirability function for multi-objective optimization has been a lack of inferential methods to quantify uncertainty. Most articles addressing uncertainty involve robust methods, providing a point estimate that is less affected by variation. Few articles address confidence intervals or bands, and those do not specifically address the widely used Derringer and Suich method. Eight methods are presented to construct 100(1-alpha)% confidence intervals around Derringer and Suich desirability function optimal values. First-order and second-order models using bivariate and multivariate data sets are used as examples to demonstrate effectiveness. The eight proposed methods include a simple best/worst case method, two generalized methods, four simulated surface methods, and a nonparametric bootstrap method. One of the generalized methods, two of the simulated surface methods, and the nonparametric method account for covariance between the response surfaces. All eight methods perform reasonably well on the second-order models; however, the methods that utilize an underlying multivariate-t distribution, Multivariate Generalized (MG) and Multivariate t Simulated Surface (MVtSSig), are recommended from this research, as they perform well with small samples for both first-order and second-order models, with coverage only becoming unreliable at consistently non-optimal solutions. MG and MVtSSig inference could also be used in conjunction with robust methods such as Pareto Front Optimization to help ascertain which solutions are more likely to be optimal before constructing confidence intervals.
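
    For orientation, the sketch below shows the basic Derringer and Suich construction (individual desirabilities combined through a geometric mean) together with a simple nonparametric bootstrap percentile interval at a fixed candidate setting; the responses, desirability bounds, and resampling scheme are illustrative assumptions and this is not the MG or MVtSSig machinery developed in the talk.

      import numpy as np

      rng = np.random.default_rng(7)

      def d_larger_is_better(y, low, high):
          # Derringer-Suich one-sided desirability with linear weighting.
          return np.clip((y - low) / (high - low), 0.0, 1.0)

      def d_smaller_is_better(y, low, high):
          return np.clip((high - y) / (high - low), 0.0, 1.0)

      # Hypothetical replicated observations of two responses at one candidate setting.
      strength = rng.normal(72.0, 3.0, 12)      # larger is better, bounds 60 to 80
      cost = rng.normal(41.0, 2.0, 12)          # smaller is better, bounds 35 to 50

      def overall_desirability(y1, y2):
          d1 = d_larger_is_better(np.mean(y1), 60.0, 80.0)
          d2 = d_smaller_is_better(np.mean(y2), 35.0, 50.0)
          return np.sqrt(d1 * d2)               # geometric mean of two desirabilities

      # Nonparametric bootstrap percentile interval; resampling paired indices
      # preserves any covariance between the two responses.
      boot = []
      for _ in range(5000):
          idx = rng.integers(0, strength.size, strength.size)
          boot.append(overall_desirability(strength[idx], cost[idx]))
      lower, upper = np.percentile(boot, [2.5, 97.5])
      print(f"D = {overall_desirability(strength, cost):.3f}, "
            f"95% bootstrap CI = ({lower:.3f}, {upper:.3f})")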

    Speaker Info:

    Peter Calhoun

    Operational Test Analyst

    HQ AFOTEC

    Peter Calhoun received a B.S. degree in Applied Mathematics from the University of New Mexico, an M.S. in Operations Research from the Air Force Institute of Technology (AFIT), and a Ph.D. in Applied Mathematics from AFIT. He has been an Operations Research Analyst with the United States Air Force since 2017. He is currently an Operational Test Analyst at HQ AFOTEC. His research interests are analysis of designed experiments, multivariate statistics, and response surface methodology.

  • Covariate Resilience Modeling

    Abstract:

    Resilience is the ability of a system to respond, absorb, adapt, and recover from a disruptive event. Dozens of metrics to quantify resilience have been proposed in the literature. However, fewer studies have proposed models to predict these metrics or the time at which a system will be restored to its nominal performance level after experiencing degradation. This talk presents three alternative approaches to model and predict performance and resilience metrics with techniques from reliability engineering, including (i) bathtub-shaped hazard functions, (ii) mixture distributions, and (iii) a model incorporating covariates related to the intensity of events that degrade performance as well as efforts to restore performance. Historical data sets on job losses during seven different recessions in the United States are used to assess the predictive accuracy of these approaches, including the recession that began in 2020 due to COVID-19. Goodness-of-fit measures and confidence intervals as well as interval-based resilience metrics are computed to assess how well the models perform on the data sets considered. The results suggest that both bathtub-shaped functions and mixture distributions can produce accurate predictions for data sets exhibiting V, U, L, and J shaped curves, but that W and K shaped curves, which experience multiple shocks, deviate from the assumption of a single decrease and subsequent increase, or suffer a sudden drop in performance, cannot be characterized well by either of those classes proposed. In contrast, the model incorporating covariates is capable of tracking all of the types of curves noted above very well, including W and K shaped curves such as the two successive shocks the U.S. economy experienced in 1980 and the sharp degradation in 2020. Moreover, covariate models outperform the simpler models on all of the goodness-of-fit measures and interval-based resilience metrics computed for all seven data sets considered. These results suggest that classical reliability modeling techniques such as bathtub-shaped hazard functions and mixture distributions are suitable for modeling and prediction of some resilience curves possessing a single decrease and subsequent recovery, but that covariate models, which explicitly incorporate explanatory factors and domain-specific information, are much more flexible and achieve higher goodness of fit and greater predictive accuracy. Thus, the covariate modeling approach provides a general framework for data collection and predictive modeling for a variety of resilience curves.
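
    As a small illustration of an interval-based resilience metric of the kind referenced above, the sketch below computes the average fraction of nominal performance retained over a disruption-and-recovery window, along with the time to restoration, for a hypothetical performance series; it does not implement the bathtub-hazard, mixture, or covariate models from the talk.

      import numpy as np

      # Hypothetical performance index (nominal = 100) sampled monthly through a
      # disruption and recovery, e.g. employment relative to its pre-shock level.
      performance = np.array([100, 100, 96, 88, 82, 80, 83, 88, 93, 97, 100, 100], float)
      nominal = 100.0
      months = np.arange(performance.size)

      # Interval-based resilience: average fraction of nominal performance retained
      # over the window (1.0 means no performance was lost).
      resilience = performance.mean() / nominal

      # Time to restoration: first month after the trough at which nominal
      # performance is regained.
      t_min = int(np.argmin(performance))
      after = np.where(performance[t_min:] >= nominal)[0]
      restored = int(months[t_min + after[0]]) if after.size else None
      print(f"resilience ratio = {resilience:.3f}, restored at month {restored}")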

    Speaker Info:

    Priscila Silva

    Graduate Research Assistant

    University of Massachusetts Dartmouth

    Priscila Silva is a Ph.D. student in Electrical and Computer Engineering at University of Massachusetts Dartmouth (UMassD). She received her MS in Computer Engineering from UMassD in 2022, and her BS degree in Electrical Engineering from Federal University of Ouro Preto (UFOP) in 2017.

  • Covariate Resilience Modeling

    Abstract:

    Resilience is the ability of a system to respond, absorb, adapt, and recover from a disruptive event. Dozens of metrics to quantify resilience have been proposed in the literature. However, fewer studies have proposed models to predict these metrics or the time at which a system will be restored to its nominal performance level after experiencing degradation. This talk presents three alternative approaches to model and predict performance and resilience metrics with techniques from reliability engineering, including (i) bathtub-shaped hazard functions, (ii) mixture distributions, and (iii) a model incorporating covariates related to the intensity of events that degrade performance as well as efforts to restore performance. Historical data sets on job losses during seven different recessions in the United States are used to assess the predictive accuracy of these approaches, including the recession that began in 2020 due to COVID-19. Goodness-of-fit measures and confidence intervals as well as interval-based resilience metrics are computed to assess how well the models perform on the data sets considered. The results suggest that both bathtub-shaped functions and mixture distributions can produce accurate predictions for data sets exhibiting V, U, L, and J shaped curves, but that W and K shaped curves, which experience multiple shocks, deviate from the assumption of a single decrease and subsequent increase, or suffer a sudden drop in performance, cannot be characterized well by either of the proposed classes. In contrast, the model incorporating covariates is capable of tracking all of the types of curves noted above very well, including W and K shaped curves such as the two successive shocks the U.S. economy experienced in 1980 and the sharp degradation in 2020. Moreover, covariate models outperform the simpler models on all of the goodness-of-fit measures and interval-based resilience metrics computed for all seven data sets considered. These results suggest that classical reliability modeling techniques such as bathtub-shaped hazard functions and mixture distributions are suitable for modeling and prediction of some resilience curves possessing a single decrease and subsequent recovery, but that covariate models that explicitly incorporate explanatory factors and domain-specific information are much more flexible and achieve higher goodness of fit and greater predictive accuracy. Thus, the covariate modeling approach provides a general framework for data collection and predictive modeling for a variety of resilience curves.

    Speaker Info:

    Andrew Bajumpaa

    Graduate Research Assistant

    University of Massachusetts Dartmouth

    Andrew Bajumpaa is an undergraduate student in Computer Science at University of Massachusetts Dartmouth.

  • Covariate Resilience Modeling

    Abstract:

    Resilience is the ability of a system to respond, absorb, adapt, and recover from a disruptive event. Dozens of metrics to quantify resilience have been proposed in the literature. However, fewer studies have proposed models to predict these metrics or the time at which a system will be restored to its nominal performance level after experiencing degradation. This talk presents three alternative approaches to model and predict performance and resilience metrics with techniques from reliability engineering, including (i) bathtub-shaped hazard functions, (ii) mixture distributions, and (iii) a model incorporating covariates related to the intensity of events that degrade performance as well as efforts to restore performance. Historical data sets on job losses during seven different recessions in the United States are used to assess the predictive accuracy of these approaches, including the recession that began in 2020 due to COVID-19. Goodness-of-fit measures and confidence intervals as well as interval-based resilience metrics are computed to assess how well the models perform on the data sets considered. The results suggest that both bathtub-shaped functions and mixture distributions can produce accurate predictions for data sets exhibiting V, U, L, and J shaped curves, but that W and K shaped curves, which experience multiple shocks, deviate from the assumption of a single decrease and subsequent increase, or suffer a sudden drop in performance, cannot be characterized well by either of the proposed classes. In contrast, the model incorporating covariates is capable of tracking all of the types of curves noted above very well, including W and K shaped curves such as the two successive shocks the U.S. economy experienced in 1980 and the sharp degradation in 2020. Moreover, covariate models outperform the simpler models on all of the goodness-of-fit measures and interval-based resilience metrics computed for all seven data sets considered. These results suggest that classical reliability modeling techniques such as bathtub-shaped hazard functions and mixture distributions are suitable for modeling and prediction of some resilience curves possessing a single decrease and subsequent recovery, but that covariate models that explicitly incorporate explanatory factors and domain-specific information are much more flexible and achieve higher goodness of fit and greater predictive accuracy. Thus, the covariate modeling approach provides a general framework for data collection and predictive modeling for a variety of resilience curves.

    Speaker Info:

    Christian Taylor

    Graduate Research Assistant

    University of Massachusetts Dartmouth

    Christian Taylor is an undergraduate student in Computer Engineering at University of Massachusetts Dartmouth.

  • Covariate Resilience Modeling

    Abstract:

    Resilience is the ability of a system to respond, absorb, adapt, and recover from a disruptive event. Dozens of metrics to quantify resilience have been proposed in the literature. However, fewer studies have proposed models to predict these metrics or the time at which a system will be restored to its nominal performance level after experiencing degradation. This talk presents three alternative approaches to model and predict performance and resilience metrics with techniques from reliability engineering, including (i) bathtub-shaped hazard functions, (ii) mixture distributions, and (iii) a model incorporating covariates related to the intensity of events that degrade performance as well as efforts to restore performance. Historical data sets on job losses during seven different recessions in the United States are used to assess the predictive accuracy of these approaches, including the recession that began in 2020 due to COVID-19. Goodness-of-fit measures and confidence intervals as well as interval-based resilience metrics are computed to assess how well the models perform on the data sets considered. The results suggest that both bathtub-shaped functions and mixture distributions can produce accurate predictions for data sets exhibiting V, U, L, and J shaped curves, but that W and K shaped curves, which experience multiple shocks, deviate from the assumption of a single decrease and subsequent increase, or suffer a sudden drop in performance, cannot be characterized well by either of the proposed classes. In contrast, the model incorporating covariates is capable of tracking all of the types of curves noted above very well, including W and K shaped curves such as the two successive shocks the U.S. economy experienced in 1980 and the sharp degradation in 2020. Moreover, covariate models outperform the simpler models on all of the goodness-of-fit measures and interval-based resilience metrics computed for all seven data sets considered. These results suggest that classical reliability modeling techniques such as bathtub-shaped hazard functions and mixture distributions are suitable for modeling and prediction of some resilience curves possessing a single decrease and subsequent recovery, but that covariate models that explicitly incorporate explanatory factors and domain-specific information are much more flexible and achieve higher goodness of fit and greater predictive accuracy. Thus, the covariate modeling approach provides a general framework for data collection and predictive modeling for a variety of resilience curves.

    Speaker Info:

    Drew Borden

    Graduate Research Assistant

    University of Massachusetts Dartmouth

    Drew Borden is an undergraduate student in Computer Engineering at University of Massachusetts Dartmouth.

  • Covariate Software Vulnerability Discovery Model to Support Cybersecurity T&E

    Abstract:

    Vulnerability discovery models (VDM) have been proposed as an application of software reliability growth models (SRGM) to software security related defects. VDM model the number of vulnerabilities discovered as a function of testing time, enabling quantitative measures of security. Despite their obvious utility, past VDM have been limited to parametric forms that do not consider the multiple activities software testers undertake in order to identify vulnerabilities. In contrast, covariate SRGM characterize the software defect discovery process in terms of one or more test activities. However, data sets documenting multiple security testing activities suitable for application of covariate models are not readily available in the open literature.

    To demonstrate the applicability of covariate SRGM to vulnerability discovery, this research identified a web application to target as well as multiple tools and techniques to test for vulnerabilities. The time dedicated to each test activity and the corresponding number of unique vulnerabilities discovered were documented and prepared in a format suitable for application of covariate SRGM. Analysis and prediction were then performed and compared with a flexible VDM without covariates, namely the Alhazmi-Malaiya Logistic Model (AML). Our results indicate that covariate VDM significantly outperformed the AML model on predictive and information theoretic measures of goodness of fit, suggesting that covariate VDM are a suitable and effective method to predict the impact of applying specific vulnerability discovery tools and techniques.
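
    As a simplified stand-in for the covariate idea, the sketch below fits a Poisson regression (via statsmodels) of vulnerabilities discovered per test interval on the hours devoted to two testing activities. The data and covariate names are invented, and this is not the specific covariate software reliability growth model used in the study.

    # Simplified stand-in for a covariate vulnerability discovery model: a Poisson
    # regression of vulnerabilities found per interval on testing-effort covariates.
    # The data are hypothetical; this is not the covariate SRGM used in the study.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 30
    fuzzing_hours = rng.uniform(0, 8, n)
    static_scan_hours = rng.uniform(0, 4, n)
    rate = np.exp(-1.0 + 0.3 * fuzzing_hours + 0.2 * static_scan_hours)
    vulns = rng.poisson(rate)

    X = sm.add_constant(np.column_stack([fuzzing_hours, static_scan_hours]))
    fit = sm.GLM(vulns, X, family=sm.families.Poisson()).fit()
    print(fit.params)   # estimated effect of each testing activity on the discovery rate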

    Speaker Info:

    Lance Fiondella

    Associate Professor

    University of Massachusetts

    Lance Fiondella is an Associate Professor in the Department of Electrical & Computer Engineering at the University of Massachusetts Dartmouth and the Founding Director of the University of Massachusetts Dartmouth Cybersecurity Center, an NSA/DHS-designated Center of Academic Excellence in Cyber Research (CAE-R). His research has been funded by DHS, Army, Navy, Air Force, NASA, and the National Science Foundation, including a CAREER award and CyberCorps Scholarship for Service.

  • Cyber Testing Embedded Systems with Digital Twins

    Abstract:

    Dynamic cyber testing and analysis require instrumentation to facilitate measurements, e.g., to determine which portions of code have been executed or to detect anomalous conditions that might not manifest at the system interface. However, instrumenting software causes execution to diverge from the execution of the deployed binaries. Instrumentation also requires mechanisms for storing and retrieving testing artifacts on target systems. RESim is a dynamic testing and analysis platform that does not instrument software. Instead, RESim instruments high fidelity models of target hardware upon which software-under-test executes, providing detailed insight into program behavior. Multiple modeled computer platforms run within a single simulation that can be paused, inspected, and run forward or backward to selected events such as the modification of a specific memory address. Integration of Google’s AFL fuzzer with RESim avoids the need to create fuzzing harnesses because programs are fuzzed in their native execution environment, commencing from selected execution states with data injected directly into simulated memory instead of I/O streams. RESim includes plugins for the IDA Pro and NSA’s Ghidra disassembler/debuggers to facilitate interactive analysis of individual processes and threads, providing the ability to skip to selected execution states (e.g., a reference to an input buffer) and “reverse execution” to reach a breakpoint by appearing to run backwards in time. RESim simulates networks of computers through use of Wind River’s Simics platform of high fidelity models of processors, peripheral devices (e.g., network interface cards), and memory. The networked simulated computers load and run firmware and software from images extracted from the physical systems being tested. Instrumenting the simulated hardware allows RESim to observe software behavior from the other side of the hardware, i.e., without affecting its execution.

    Simics includes tools to extend and create high fidelity models of processors and devices, providing a clear path to deploying and managing digital twins for use in developmental test and evaluation. The simulations can include optional real-world network and bus interfaces to facilitate integration into networks and test ranges. Simics is a COTS product that runs on commodity hardware and is able to execute several parallel instances of complex multi-component systems on a typical engineering workstation or server.

    This presentation will describe RESim and strategies for using digital twins for cyber testing of embedded systems. And the presentation will discuss some of the challenges associated with fuzzing non-trivial software systems.

    Speaker Info:

    Michael Thompson

    Research Associate

    Naval Postgraduate School

    Mike Thompson is a Research Associate at the Naval Postgraduate School.  Mike is the lead developer of the RESim reverse engineering platform, which grew out of his work as a member of the competition infrastructure development team for DARPA's Cyber Grand Challenge.  He is the lead developer for the Labtainers cyber lab exercise platform and for the CyberCIEGE educational video game. Mike has decades of experience developing products for software vulnerability analysis, cybersecurity education and high assurance trusted platforms.

  • Data Fusion: Using Data Science to Facilitate the Fusion of Multiple Streams of Data

    Abstract:

    Today there are an increasing number of sensors on the battlefield. These sensors collect data that includes, but is not limited to, images, audio files, videos, and text files. With today’s technology, the data collection process is strong, and there is a growing opportunity to leverage multiple streams of data, each coming in different forms. This project aims to take multiple types of data, specifically images and audio files, and combine them to increase our ability to detect and recognize objects. The end state of this project is the creation of an algorithm that utilizes and merges voice recordings and images to allow for easier recognition.

    Most research tends to focus on one modality or the other, but here we focus on the prospect of simultaneously leveraging both modalities for improved entity resolution. With regard to audio files, the most successful deconstruction and dimension-reduction technique is a deep autoencoder. For images, the most successful technique is the use of a convolutional neural network. To combine the two modalities, we focused on two different techniques. The first was running each data source through a neural network and multiplying the resulting class probability vectors to capture the combined result. The second technique focused on running each data source through a neural network, extracting a layer from each network, concatenating the layers for paired image and audio samples, and then running the concatenated object through a fully connected neural network.
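
    The first fusion technique lends itself to a very small sketch: multiply the per-class probability vectors produced by the two single-modality networks and renormalize. The probability vectors below are made-up placeholders for actual network outputs.

    # Sketch of the first fusion technique described above: multiply the per-class
    # probability vectors produced by the image and audio networks, then renormalize.
    # The probability vectors here are made-up placeholders for network outputs.
    import numpy as np

    p_image = np.array([0.70, 0.20, 0.10])   # image-network class probabilities
    p_audio = np.array([0.50, 0.40, 0.10])   # audio-network class probabilities

    fused = p_image * p_audio
    fused /= fused.sum()                      # renormalize to a probability vector
    print(fused, "-> predicted class:", int(np.argmax(fused)))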

    Speaker Info:

    Madison McGovern

    Cadet

    United States Military Academy

    Madison McGovern is a senior at the United States Military Academy majoring in Applied Statistics and Data Science. Upon graduation, she is headed to Fort Gordon, GA to join the Army’s Cyber branch as a Cyber Electromagnetic Warfare Officer. Her research interests include using machine learning to assist military operations.

  • Data Literacy Within the Department of Defense

    Abstract:

    Data literacy, the ability to read, write, and communicate data in context, is fundamental for military organizations to create a culture where data is appropriately used to inform both operational and non-operational decisions. However, oftentimes organizations outsource data problems to outside entities and rely on a small cadre of data experts to tackle organizational problems. In this talk we will argue that data literacy is not solely the role or responsibility of the data expert. Ultimately, if experts develop tools and analytics that Army decision makers cannot use, or do not effectively understand the way the Army makes decisions, the Army is no more data rich than if it had no data at all.

    While serving on a sabbatical as the Chief Data Scientist for Joint Special Operations Command, COL Nick Clark (Department of Mathematical Sciences, West Point) noticed that a lack of basic data literacy skills was a major limitation to creating a data-centric organization. As a result, he created 10 hours of training focusing on the fundamentals of data literacy. After delivering the course to JSOC, other DoD organizations began requesting the training. In response, a team from West Point joined with the Army Talent Management Task Force to create mobile training teams. The teams have now delivered the training over 30 times to organizations ranging from tactical units up to strategic-level commands. In this talk, we discuss what data literacy skills should be taught to the force and highlight best practices in educating soldiers, civilians, and contractors on the basics of data literacy. We will finally discuss strategies for assessing organizational data literacy and provide a framework for attendees to assess their own organization's data strengths and weaknesses.

    Speaker Info:

    Nicholas Clark

    Associate Professor

    United States Military Academy

    COL Nicholas Clark is an Associate Professor in the Department of Mathematical Sciences at West Point where he is the Program Director for West Point's Applied Statistics and Data Science Program. Nick received a BS in Mathematics from West Point in 2002, a MS in Statistics from George Mason in 2010, and a PhD in Statistics from Iowa State University in 2018. His dissertation was on Self-Exciting Spatio-Temporal Statistical Models and he has published in a variety of disciplines including spatio-temporal statistics, best practices in statistical methodologies, epidemiology, and sports statistics. Nick is the former director of the Center for Data Analysis and Statistics, where he conducted research for a variety of Department of Defense clients. COL Clark served as the Chief Data Scientist for JSOC while on sabbatical from June 2021 - June 2022. While in this role he created the Army's Data Literacy 101 course teaching the fundamentals of Data Literacy to Army soldiers, civilians and contractors. Since inception, he and his team have now delivered the course over 30 times to a wide range of Army organizations.

  • Data Management for Research, Development, Test, and Evaluation

    Abstract:

    It is important to manage data from research, development, test, and evaluation effectively. Well-managed data makes research more efficient and promotes better analysis and decision-making. At present, numerous federal organizations are engaged in large-scale reforms to improve the way they manage their data, and these reforms are already affecting the way research is executed. Data management affects every part of the research process. Thoughtful, early planning sets research projects on the path to success by ensuring that the resources and expertise required to effectively manage data throughout the research process are in place when they are needed.

    This interactive tutorial will discuss the planning and execution of data management for research projects. Participants will build a data management plan, considering data security, organization, metadata, reproducibility, and archiving.

    By the conclusion of the tutorial, participants will be able to define data management and understand its importance, understand how the data lifecycle relates to the research process, and be able to build a data management plan.

    Speaker Info:

    Matthew Avery

    Assistant Director

    IDA

    Matthew Avery is an OED Assistant Director and part of OED’s Sustainment group. He represents OED on IDA’s Data Governance Council and acts as the Deputy to IDA’s Director of Data Strategy and Chief Data Officer, helping craft data-related strategy and policy.

    Matthew spearheads a Sustainment group effort to develop an end-to-end model to identify ways to improve mission-capable rates for the V-22 fleet. Prior to joining Sustainment, Matthew was on the Test Science team. As the Test Science Data Management lead, he helped develop analytical methods and tools for operational test and evaluation. He also led OED’s project on operational test and evaluation of Army and Marine Corps unmanned aircraft systems. In 2018-19 Matthew served as an embedded analyst in the Pentagon’s Office of Cost Assessment and Program Evaluation, where among other projects he built state-space models in support of the Space Control Strategic Portfolio Review.

    Matthew earned his PhD in Statistics from North Carolina State University in 2012, his MS in Statistics from North Carolina State in 2009, and a BA from New College of Florida in 2006. He is a member of the American Statistical Association.

  • Data Management for Research, Development, Test, and Evaluation

    Abstract:

    It is important to manage data from research, development, test, and evaluation effectively. Well-managed data makes research more efficient and promotes better analysis and decision-making. At present, numerous federal organizations are engaged in large-scale reforms to improve the way they manage their data, and these reforms are already affecting the way research is executed. Data management affects every part of the research process. Thoughtful, early planning sets research projects on the path to success by ensuring that the resources and expertise required to effectively manage data throughout the research process are in place when they are needed. This interactive tutorial will discuss the planning and execution of data management for research projects. Participants will build a data management plan, considering data security, organization, metadata, reproducibility, and archiving. By the conclusion of the tutorial, participants will be able to define data management and understand its importance, understand how the data lifecycle relates to the research process, and be able to build a data management plan.

    Speaker Info:

    Heather Wojton

    Chief Data Officer

    IDA

    Heather Wojton is the Director, Research Quality and Chief Data Officer for IDA, a role she assumed in 2021. In this position, Heather provides strategic leadership, project management and direction for the corporation’s data strategy. She is responsible for enhancing IDA’s ability to efficiently and effectively accomplish research and business operations by assessing and evolving data systems, data management infrastructure and data-related practices. She also oversees the quality management processes for research projects, including the research product publication process and the technical review process.

    Heather joined IDA in 2015 as a member of the research staff. She is an expert in quantitative research methods, including test design and program evaluation. She held numerous research and leadership roles before being named an assistant director of a research division.

    As a researcher at IDA, Heather led IDA’s test science research program that facilitates data-driven decision-making within the Department of Defense by advancing statistical, behavioral, and data science methodologies and applying them to the evaluation of defense acquisition programs. Heather’s other accomplishments include advancing methods for test design, modeling and simulation validation, data management and curation, and artificial intelligence testing. In this role, she worked closely with academic and Defense Department partners to adapt existing test design and evaluation methods and develop novel methods where gaps persisted.

    Heather has a doctorate in experimental psychology from the University of Toledo and a bachelor’s degree in research psychology from Marietta College, where she was a member of the McDonough International Leadership Program. She is a graduate of the George Washington University National Security Studies Senior Management Program and the Maxwell School National Security Management Course at Syracuse University.

  • DATAWorks Distinguished Leadership Award

    Speaker Info:

    Winner To Be Announced

  • Developing a Domain-Specific NLP Topic Modeling Process for Army Experimental Data

    Abstract:

    Researchers across the U.S. Army are conducting experiments on the implementation of emerging technologies on the battlefield. Key data points from these experiments include text comments on the technologies’ performance. Researchers use a range of Natural Language Processing (NLP) tasks to analyze such comments, including text summarization, sentiment analysis, and topic modeling. Based on successful results from research in other domains, this research aims to yield greater insights by using military-specific language as opposed to a generalized corpus. This research is dedicated to developing a methodology to analyze text comments from Army experiments and field tests using topic models trained on an Army domain-specific corpus. The methodology is tested on experimental data agglomerated in the Forge database, an Army Futures Command (AFC) initiative to provide researchers with a common operating picture of AFC research. As a result, this research offers an improved framework for analysis with domain-specific topic models for researchers across the U.S. Army.
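
    For orientation, the sketch below fits a basic latent Dirichlet allocation topic model with scikit-learn on a few invented comments. It is illustrative only; it uses a generic English stop-word list rather than the Army domain-specific corpus described above, and the comments are not Forge data.

    # Minimal topic-modeling sketch (LDA with scikit-learn) on a toy corpus.
    # The comments below are invented placeholders, not Forge data.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    comments = [
        "radio lost signal during the movement to the objective",
        "battery life of the sensor was too short for the patrol",
        "the targeting display was easy to read at night",
        "antenna mount broke after repeated vehicle vibration",
    ]
    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(comments)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [terms[i] for i in topic.argsort()[-5:][::-1]]
        print(f"topic {k}:", ", ".join(top))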

    Speaker Info:

    Anders Grau

    Cadet

    United States Military Academy

    Anders Grau is a United States Military Academy cadet currently studying for a Bachelor of Science in Operations Research. In his time as a cadet, he has had the opportunity to work with the Research Facilitation Laboratory to analyse insider threats in the Army and has conducted an independent study on topic modelling with Twitter data. He is currently writing a thesis on domain-specific topic modelling for Army experimental data. Upon the completion of his studies, he will commission as a Second Lieutenant in the Army's Air Defense Artillery branch.

  • Development of a Wald-Type Statistical Test to Compare Live Test Data and M&S Predictions

    Abstract:

    This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&A) efforts for modeling and simulation (M&S) environments. The test decides between a null hypothesis of agreement between the simulation and reality, and an alternative hypothesis stating the simulation and reality do not agree. To do so, it generates a Wald-type statistic that compares the coefficients of two generalized linear models that are estimated on live test data and analogous simulated data, then determines whether any of the coefficient pairs are statistically different.

    The test was applied to two logistic regression models that were estimated from live torpedo test data and simulated data from the Naval Undersea Warfare Center’s (NUWC) Environment Centric Weapons Analysis Facility (ECWAF). The test did not show any significant differences between the live and simulated tests for the scenarios modeled by the ECWAF. While more work is needed to fully validate the ECWAF’s performance, this finding suggests that the facility is adequately modeling the various target characteristics and environmental factors that affect in-water torpedo performance.

    The primary advantage of this test is that it is capable of handling cases where one or more variables are estimable in one model but missing or inestimable from the other. While it is possible to simply create the linear models on the common set of variables, this results in the omission of potentially useful test data. Instead, this approach identifies the mismatched coefficients and combines them with the model’s intercept term, thus allowing the user to consider models that are created on the entire set of available data. Furthermore, the test was developed in a generalized manner without any references to a specific dataset or system. Therefore, other researchers who are conducting VV&A processes on other operational systems may benefit from using this test for their own purposes.
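
    A minimal sketch of the core comparison, under the simplifying assumption that both models share the same estimable coefficients, is shown below: fit a logistic regression to synthetic "live" data and to synthetic "simulated" data, then form a Wald-type statistic from the coefficient differences and the summed covariance matrices. The handling of mismatched or inestimable coefficients described above is not reproduced.

    # Wald-type comparison of coefficients from two logistic regressions, one fit
    # to live data and one to simulated data. Synthetic placeholder data only.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(2)

    def fit_logit(n, b0, b1):
        x = rng.normal(size=n)
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        y = rng.binomial(1, p)
        return sm.Logit(y, sm.add_constant(x)).fit(disp=0)

    live = fit_logit(200, b0=-0.5, b1=1.0)
    simulated = fit_logit(500, b0=-0.5, b1=1.0)

    diff = live.params - simulated.params
    cov = live.cov_params() + simulated.cov_params()     # independent samples
    wald = float(diff @ np.linalg.inv(cov) @ diff)
    p_value = stats.chi2.sf(wald, df=len(diff))          # chi-square reference
    print(round(wald, 3), round(p_value, 3))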

    Speaker Info:

    Carrington Metts

    Data Science Fellow

    IDA

    Carrington Metts is a Data Science Fellow at IDA. She has a Master of Science in Business Analytics from the College of William and Mary. Her work at IDA encompasses a wide range of topics, including wargaming, modeling and simulation, natural language processing, and statistical analyses.

  • Digital Transformation Enabled by Enterprise Automation

    Abstract:

    Digital transformation is a broad term that means a variety of things to people in many different operational domains, but the underlying theme is consistent: using digital technologies to improve business processes, culture, and efficiency. Digital transformation results in streamlining communications, collaboration, and information sharing while reducing errors. Properly implemented digital processes provide oversight and cultivate accountability to ensure compliance with business processes and timelines.
    A core tenet of effective digital transformation is automation. The elimination or reduction of human intervention in processes provides significant gains to operational speed, accuracy, and efficiency.
    DOT&E uses automation to streamline the creation of documents and reports which need to include up-to-date information. By using Smart Documentation capabilities, authors can define and automatically populate sections of documents with the most up-to-date data, ensuring that every published document always has the most current information.
    This session discusses a framework for driving digital transformation to automate nearly any business process.

    Speaker Info:

    Nathan Pond

    Program Manager - Business Enterprise Systems

    Edaptive Computing, Inc.

    Nathan Pond is the Program Manager for Business Enterprise Systems at Edaptive Computing, Inc., where he works to provide integrated technology solutions around a variety of business and engineering processes. He oversees product development teams for core products and services enabling digital transformation, and acts as the principal cloud architect for cloud solutions. Mr. Pond has over 20 years of experience with software engineering and technology, with an emphasis on improving efficiency with digital transformation and process automation.

  • Dose-Response Data Considerations for the NASA Quesst Community Test Campaign

    Abstract:

    Key outcomes for NASA's Quesst mission are noise dose and perceptual response data to inform regulators on their decisions regarding noise certification standards for the future of overland commercial supersonic flight. Dose-response curves are commonly utilized in community noise studies to describe the annoyance of a community to a particular noise source. The X-59 aircraft utilizes shaped-boom technology to demonstrate low noise supersonic flight. For X-59 community studies, the sound level from X-59 overflights constitutes the dose, while the response is an annoyance rating selected from a verbal scale, e.g., “slightly annoyed” and “very annoyed.” Dose-response data will be collected from individual flyovers (single event dose) and an overall response to the accumulation of single events at the end of the day (cumulative dose). There are quantifiable sources of error in the noise dose due to uncertainty in microphone measurements of the sonic thumps and uncertainty in predicted noise levels at survey participant locations. Assessing and accounting for error in the noise dose is essential to obtain an accurate dose-response model. There is also a potential for error in the perceptual response. This error is due to the ability of participants to provide their response in a timely manner and participant fatigue after responding to up to one hundred surveys over the course of a month. This talk outlines various challenges in estimating noise dose and perceptual response and the methods considered in preparation for X-59 community tests.
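
    As background, a single-event dose-response curve of the kind described above is often fit with a logistic regression of the binary annoyance response on the noise dose. The sketch below uses synthetic dose levels and responses, not Quesst community-test data, and ignores the dose-uncertainty and response-error issues that are the focus of the talk.

    # Illustrative dose-response fit: probability of a "highly annoyed" response
    # as a logistic function of single-event noise dose. Synthetic placeholders.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    dose_db = rng.uniform(65, 95, 300)                    # hypothetical levels, dB
    p_true = 1 / (1 + np.exp(-(dose_db - 85) / 4))
    highly_annoyed = rng.binomial(1, p_true)

    fit = sm.Logit(highly_annoyed, sm.add_constant(dose_db)).fit(disp=0)
    print(fit.params)                   # intercept and slope of the curve
    print(fit.predict([[1.0, 80.0]]))   # predicted P(highly annoyed) at 80 dB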

    Speaker Info:

    Aaron Vaughn

    Research Aerospace Engineer

    NASA Langley Research Center

    Aaron Vaughn works in the Structural Acoustics Branch at NASA Langley Research Center and is a member of the Community Test and Planning Execution team under the Commercial Supersonic Technology project. Primarily, Aaron researches statistical methods for modeling the dose-response relationship of boom level to community annoyance in preparation for upcoming X-59 community tests.

  • DOT&E Strategic Initiatives, Policy, and Emerging Technologies (SIPET) Mission Brief

    Abstract:

    SIPET, established in 2021, is a deputate within the office of the Director, Operational Test and Evaluation (DOT&E). DOT&E created SIPET to codify and implement the Director’s strategic vision and keep pace with science and technology to modernize T&E tools, processes, infrastructure, and workforce. That is, the mission of SIPET is to drive continuous innovation to meet the T&E demands of the future; support the development of a workforce prepared to meet the toughest T&E challenges; nurture a culture of information exchange across the enterprise; and update policy and guidance. SIPET proactively identifies current and future operational test and evaluation needs, gaps, and potential solutions in coordination with the Services and agency operational test organizations. Collaborating with numerous stakeholders, SIPET develops and refines operational test policy guidance that supports new test methodologies and technologies in the acquisition and test communities.

    SIPET, in collaboration with the T&E community, is leading the development of the 2022 DOT&E Strategy Update Implementation Plan (I-Plan). I-Plan initiatives include:

    • Test the Way We Fight – Architect T&E around validated mission threads and demonstrate the operational performance of the Joint Force in multi-domain operations.
    • Accelerate the Delivery of Weapons that Work – Embrace digital technologies to deliver high-quality systems at more dynamic rates.
    • Increase the Survivability of DOD in Contested Environments – Identify, assess, and act on cyber, electromagnetic, spectrum, space, and other risks to DOD mission – at scale and at speed.
    • Pioneer T&E of Weapons Systems Built to Change Over Time – Implement fluid and iterative T&E across the entire system lifecycle to help assure continued combat credibility as the system evolves to meet warfighter needs.
    • Foster an Agile and Enduring T&E Enterprise Workforce – Centralize and leverage efforts to assess, curate, and engage T&E talent to quicken the pace of innovation across the T&E enterprise.

    Speaker Info:

    Jeremy Werner

    Chief Scientist

    DOT&E

    Jeremy Werner, PhD, ST was appointed DOT&E’s Chief Scientist in December 2021 after initially starting at DOT&E as an Action Officer for Naval Warfare in August 2021.  Before then, Jeremy was at Johns Hopkins University Applied Physics Laboratory (JHU/APL), where he founded a data science-oriented military operations research team that transformed the analytics of an ongoing military mission.  Jeremy previously served as a Research Staff Member at the Institute for Defense Analyses where he supported DOT&E in the rigorous assessment of a variety of systems/platforms.  Jeremy received a PhD in physics from Princeton University where he was an integral contributor to the Compact Muon Solenoid collaboration in the experimental discovery of the Higgs boson at the Large Hadron Collider at CERN, the European Organization for Nuclear Research in Geneva, Switzerland.  Jeremy is a native Californian and received a bachelor’s degree in physics from the University of California, Los Angeles where he was the recipient of the E. Lee Kinsey Prize (most outstanding graduating senior in physics).

  • Effective Application of Self-Validated Ensemble Models in Challenging Test Scenarios

    Abstract:

    We test the efficacy of SVEM versus alternative variable selection methods in a mixture experiment setting. These designs have built-in dependencies that require modifications of the typical design and analysis methods. The usual design metric of power is not helpful for these tests and analyzing results becomes quite challenging, particularly for factor characterization. We provide some guidance and lessons learned from hypersonic fuel formulation experience. We also show through simulation favorable combinations of design and Generalized Regression analysis options that lead to the best results. Specifically, we quantify the impact of changing run size, including complex design region constraints, using space-filling vs optimal designs, including replicates and/or center runs, and alternative analysis approaches to include full model, backward stepwise, SVEM forward selection, SVEM Lasso, and SVEM neural network.

    Speaker Info:

    James Wisnowski

    Principal Consultant

    Adsurgo

    Dr. James Wisnowski is the co-founder of Adsurgo.  He currently provides training and consulting services to industry and government in Reliability Engineering, Design of Experiments (DOE), and Applied Statistics. He retired as an Air Force officer with over 20 years of service as a commander, joint staff officer, Air Force Academy professor, and operational tester.  He received his PhD in Industrial Engineering from Arizona State University and is currently a faculty member at the Colorado School of Mines Department of Mechanical Engineering teaching DOE. Some conference presentations and journal articles on applied statistics are shown at https://scholar.google.com/scholar?hl=en&as_sdt=0%2C44&q=james+wisnowski&btnG=

  • Empirical Calibration for a Linearly Extrapolated Lower Tolerance Bound

    Abstract:

    In many industries, the reliability of a product is often determined by a quantile of a distribution of a product's characteristics meeting a specified requirement. A typical approach to address this is to assume a distribution model and compute a one-sided confidence bound on the quantile. However, this can become difficult if the sample size is too small to reliably estimate a parametric model. Linear interpolation between order statistics is a viable nonparametric alternative if the sample size is sufficiently large. In most cases, linear extrapolation from the extreme order statistics can be used, but can result in inconsistent coverage. In this talk, we'll present an empirical study from our submitted manuscript used to generate calibrated weights for linear extrapolation that greatly improves the accuracy of the coverage across a feasible range of distribution families with positive support. We'll demonstrate this calibration technique using two examples from industry.
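
    For context, the sketch below computes the standard distribution-free one-sided lower tolerance bound from order statistics, choosing the largest order statistic whose Beta-distributed coverage meets the confidence requirement. It shows that baseline only; the linear interpolation/extrapolation and the calibrated weights presented in the talk are not reproduced, and the sample is synthetic.

    # Standard distribution-free one-sided lower tolerance bound from order
    # statistics: largest k with Beta(k, n-k+1) CDF at 1-p >= 1-alpha.
    # Baseline only; the talk's calibrated extrapolation is not reproduced.
    import numpy as np
    from scipy import stats

    def lower_tolerance_bound(x, p=0.90, alpha=0.05):
        """Order statistic covering proportion p with confidence 1-alpha, if one exists."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        ks = [k for k in range(1, n + 1)
              if stats.beta.cdf(1 - p, k, n - k + 1) >= 1 - alpha]
        if not ks:
            return None           # sample too small for a distribution-free bound
        return x[max(ks) - 1]     # convert 1-based order-statistic index

    rng = np.random.default_rng(4)
    sample = rng.weibull(2.0, size=60) * 100.0   # hypothetical strength data
    print(lower_tolerance_bound(sample, p=0.90, alpha=0.05))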

    Speaker Info:

    Caleb King

    Research Statistician Developer

    JMP Statistical Discovery

    Dr. Caleb King is a Research Statistician Developer for the Design of Experiments platform in the JMP software. He's been responsible for developing the Sample Size Explorers suite of power and sample size explorers as well as the new MSA Design platform. Prior to joining JMP, he was a senior statistician at Sandia National Laboratories for 3 years, helping engineers design and analyze their experiments. He received his M.S. and Ph.D. in statistics from Virginia Tech, specializing in design and analysis for reliability.

  • Energetic Defect Characterizations

    Abstract:

    Energetic defect characterization in munitions is a task requiring further refinement in military manufacturing processes. Convolutional neural networks (CNNs) have shown promise in defect localization and segmentation in recent studies. These studies suggest that a CNN architecture can be used to localize casting defects in X-ray images. The U.S. Armament Center has provided munition images for training to develop a system, measured against MILSPEC requirements, to identify and categorize defective munitions. In our approach, we utilize preprocessed munition images and transfer learning from prior studies' model weights to compare localization accuracy on this dataset for application in the field.
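
    The transfer-learning pattern referenced above can be sketched as follows: start from pretrained CNN weights, freeze the backbone, and train a new head for the defect task. The sketch below is a simplified classification example with random tensors standing in for X-ray images; it is not the authors' architecture, dataset, or localization model.

    # Hedged transfer-learning sketch: fine-tune a new head on a pretrained CNN.
    # Random tensors stand in for X-ray images; class count is hypothetical.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False               # freeze the pretrained backbone

    num_classes = 2                               # e.g., defect / no defect
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # new trainable head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # One illustrative training step on a random batch standing in for X-ray images.
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, num_classes, (8,))
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    print(float(loss))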

    Speaker Info:

    Naomi Edegbe

    Cadet

    United States Military Academy

    Cadet Naomi Edegbe is a senior attending the United States Military Academy.  As an Applied Statistics and Data Science Major, she enjoys proving a mathematical concept, wrangling data, and analyzing problems to answer questions. Her short-term professional goals include competing for a fellowship with the National GEM Consortium to obtain a master's degree in mathematical sciences or data science. Afterward, she plans to serve her six-year active-duty Army commitment in the Quartermaster Corps. Cadet Edegbe’s long-term professional goal is to produce meaningful research in STEM, either in application to relevant Army resourcing needs or as a separate track into the field of epidemiology within social frameworks.

  • Estimating Sparsely and Irregularly Observed Multivariate Functional Data

    Abstract:

    With the rise in availability of larger datasets, there is a growing need of tools and methods to help inform data-driven decisions. Data that vary over a continuum, such as time, exist in a wide array of fields, such as defense, finance, and medicine. One such class of methods that addresses data varying over a continuum is functional data analysis (FDA). FDA methods typically make three assumptions that are often violated in real datasets: all observations exist over the same continuum interval (such as a closed interval [a,b]), all observations are regularly and densely observed, and if the dataset consists of multiple covariates, the covariates are independent of one another. We look to address violation of the latter two assumptions.

    In this talk, we will discuss methods for analyzing functional data that are irregularly and sparsely observed, while also accounting for dependencies between covariates. These methods will be used to estimate the reconstruction of partially observed multivariate functional data that contain measurement errors. We will begin with a high-level introduction of FDA. Next, we will introduce functional principal components analysis (FPCA), which is a representation of functions that our estimation methods are based on. We will discuss a specific approach called principal components analysis through conditional expectation (PACE) (Yao et al, 2005), which computes the FPCA quantities for a sparsely or irregularly sampled function. The PACE method is a key component that allows us to estimate partially observed functions based on the available dataset. Finally, we will introduce multivariate functional principal components analysis (MFPCA) (Happ & Greven, 2018), which utilizes the FPCA representations of each covariate’s functions in order to compute a principal components representation that accounts for dependencies between covariates.

    We will illustrate these methods through implementation on simulated and real datasets. We will discuss our findings in terms of the accuracy of our estimates with regards to the amount and portions of a function that is observed, as well as the diversity of functional observations in the dataset. We will conclude our talk with discussion on future research directions.
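
    As background for the FPCA building block, the sketch below performs FPCA on densely observed curves sharing a common grid via an SVD of the centered data matrix. This is the simple dense-grid case only; it does not implement the PACE approach for sparse or irregular observations or the multivariate extension (MFPCA) discussed in the talk, and the curves are simulated.

    # Dense-grid FPCA via SVD of the centered data matrix (background only;
    # not the PACE or MFPCA methods discussed in the talk). Simulated curves.
    import numpy as np

    rng = np.random.default_rng(5)
    t = np.linspace(0, 1, 101)
    n = 40
    scores_true = rng.normal(size=(n, 2)) * np.array([1.0, 0.4])
    basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    curves = scores_true @ basis + 0.05 * rng.normal(size=(n, t.size))

    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)

    eigvals = s**2 / n                     # variance explained by each component
    components = Vt[:2]                    # first two functional principal components
    fpc_scores = centered @ components.T   # per-curve scores on those components
    print(eigvals[:2].round(3))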

    Speaker Info:

    Maximillian Chen

    Senior Professional Staff

    Johns Hopkins University Applied Physics Laboratory

    Max Chen received his PhD from the Department of Statistical and Data Sciences at Cornell University. He previously worked as a senior member of technical staff at Sandia National Laboratories. Since December 2019, Max has been a Senior Professional Staff member at the Johns Hopkins University Applied Physics Laboratory. He is interested in developing novel statistical methodologies in the areas of high-dimensional data analysis, dimension reduction and hypothesis testing methods for matrix- and tensor-variate data, functional data analysis, dependent data analysis, and data-driven uncertainty quantification.

  • Featured Panel: AI Assurance

    Speaker Info:

    Josh Poore

    Associate Research Scientist

    Applied Research Laboratory for Intelligence and Security, University of Maryland

    Dr. Joshua Poore is an Associate Research Scientist with the Applied Research Laboratory for Intelligence and Security (ARLIS) at the University of Maryland where he supports the Artificial Intelligence, Autonomy, and Augmentation (AAA) mission area. Previously, Dr. Poore was a Principal Senior Scientist and Technology Development Manager at BAE Systems, FAST Labs (6/2018-2/2021), and Principal Scientist at Draper (03/2011-06/2018). Across his industry career, Dr. Poore was Technical Director or Principal Investigator for a wide range of projects focused on human-system integration and human-AI interaction funded by DARPA, IARPA, AFRL, and other agencies. Dr. Poore’s primary research foci are: (1) the use of ubiquitous software and distributed information technology as measurement mediums for human performance; (2) knowledge management or information architectures for reasoning about human system integration within and across systems.  Currently, this work serves ARLIS’ AAA test-bed for Test and Evaluation activities. Dr. Poore also leads the development of open source technology aligned with this research as a committer and Project Management Committee member for the Apache Software Foundation.

  • Featured Panel: AI Assurance

    Speaker Info:

    Alec Banks

    Senior Principal Scientist

    Defence Science and Technology Laboratory

    Dr Alec Banks works as a Senior Principal Scientist at the Defence Science and Technology Laboratory (Dstl).  He has worked in defence engineering for over 30 years. His recent work has focused on the safety of software systems, including working as a software regulator for the UK’s air platforms and developing software safety assurance for the UK’s next-generation submarines. In his regulatory capacity he has provided software certification assurance for platforms such as Scan Eagle, Watchkeeper and F-35 Lightning II.  More recently, Alec has been the MOD lead on research in Test, Evaluation, Verification and Validation of Autonomy and AI-based systems; this has included revisions of the UK’s Defence Standard for software safety to facilitate greater use of models and simulations and the adoption of machine learning in higher-integrity applications.

  • Featured Panel: AI Assurance

    Abstract:

    This panel discussion will bring together an international group of AI Assurance Case experts from Academia and Government labs to discuss the challenges and opportunities of applying assurance cases to AI-enabled systems.  The panel will discuss how assurance cases apply to AI-enabled systems, pitfalls in developing assurance cases, including human-system integration into the assurance case, and communicating the results of an assurance case to non-technical audiences.

    This panel discussion will be of interest to anyone who is involved in the development or use of AI systems.  It will provide insights into the challenges and opportunities of using assurance cases to provide justified confidence to all stakeholders, from the AI users and operators to executives and acquisition decision-makers.

    Speaker Info:

    Laura Freeman

    Deputy Director

    Virginia Tech National Security Institute

    Dr. Laura Freeman is a Research Associate Professor of Statistics and the Deputy Director of the Virginia Tech National Security Institute.  Her research leverages experimental methods for conducting research that brings together cyber-physical systems, data science, artificial intelligence (AI), and machine learning to address critical challenges in national security.  She is also a hub faculty member in the Commonwealth Cyber Initiative and leads research in AI Assurance. She develops new methods for test and evaluation focusing on emerging system technology.  She is also the Assistant Dean for Research for the College of Science; in that capacity she works to shape research directions and collaborations across the College of Science.
    Previously, Dr. Freeman was the Assistant Director of the Operational Evaluation Division at the Institute for Defense Analyses.  In that position, she established and developed an interdisciplinary analytical team of statisticians, psychologists, and engineers to advance scientific approaches to DoD test and evaluation.  During 2018, Dr. Freeman served as the acting Senior Technical Advisor for the Director, Operational Test and Evaluation (DOT&E).  As the Senior Technical Advisor, Dr. Freeman provided leadership, advice, and counsel to all personnel on technical aspects of testing military systems.  She reviewed test strategies, plans, and reports from all systems on DOT&E oversight.
    Dr. Freeman has a B.S. in Aerospace Engineering, a M.S. in Statistics and a Ph.D. in Statistics, all from Virginia Tech.  Her Ph.D. research was on design and analysis of experiments for reliability data.

  • Featured Panel: AI Assurance

    Speaker Info:

    John Stogoski

    Senior Systems Engineer

    Software Engineering Institute, Carnegie Mellon University

    John Stogoski has been at Carnegie Mellon University’s Software Engineering Institute for 10 years including roles in the CERT and AI Divisions.  He is currently a senior systems engineer working with DoD sponsors to research how artificial intelligence can be applied to increase capabilities and build the AI engineering discipline.  In his previous role, he oversaw a prototyping lab focused on evaluating emerging technologies and design patterns for addressing cybersecurity operations at scale.  John spent a significant portion of his career at a major telecommunications company where he served in director roles responsible for the security operations center and then establishing a homeland security office after the 9/11 attack.  He worked with government and industry counterparts to advance policy and enhance our coordinated, operational capabilities to lessen impacts of future attacks or natural disaster events.  Applying lessons from the maturing of the security field, along with considering the unique aspects of artificial intelligence, can help us enhance the system development lifecycle and realize the opportunities increasing our strategic advantage.

  • Framework for Operational Test Design: An Example Application of Design Thinking

    Abstract:

    Design thinking is a problem-solving approach that promotes the principles of human-centeredness, iteration, and diversity. The poster provides a five-step framework for how to incorporate these design principles when building an operational test. In the first step, test designers conduct research on test users and the problems they encounter. In the second step, designers articulate specific user needs to address in the test design. In the third step, designers generate multiple solutions to address user needs. In the fourth step, designers create prototypes of their best solutions. In the fifth step, designers refine prototypes through user testing.

    Speaker Info:

    Miriam Armstrong

    Research Staff Member

    IDA

    Dr. Armstrong is a human factors researcher at IDA where she is involved in operational testing of defense systems. Her expertise includes interactions between humans and autonomous systems and psychometrics. She received her PhD in Human Factors Psychology from Texas Tech University in 2021.

  • Fully Bayesian Data Imputation using Stan Hamiltonian Monte Carlo

    Abstract:

    When doing multivariate data analysis, one common obstacle is the presence of incomplete observations, i.e., observations for which one or more covariates are missing data. Rather than deleting entire observations that contain missing data, which can lead to small sample sizes and biased inferences, data imputation methods can be used to statistically “fill in” missing data. Imputing data can help combat small sample sizes by using the existing information in partially complete observations with the end goal of producing less biased and higher-confidence inferences.

    In aerospace applications, imputation of missing data is particularly relevant because sample sizes are small and quantifying uncertainty in the model is of utmost importance. In this paper, we outline the benefits of a fully Bayesian imputation approach which samples simultaneously from the joint posterior distribution of model parameters and the imputed values for the missing data. This approach is preferred over multiple imputation approaches because it performs the imputation and modeling steps in one step rather than two, making it more compatible with complex model forms. An example of this imputation approach is applied to the NASA Instrument Cost Model (NICM), a model used widely across NASA to estimate the cost of future spaceborne instruments. The example models are implemented in Stan, a statistical-modeling tool enabling Hamiltonian Monte Carlo (HMC).
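
    A minimal Python sketch of the joint-sampling idea is shown below, assuming PyMC's automatic imputation of masked observed values: the missing entries become latent variables sampled alongside the regression parameters in a single posterior. For simplicity it imputes missing responses rather than missing covariates, uses synthetic data, and is not the authors' Stan/NICM implementation.

    # Minimal fully Bayesian imputation sketch (assumption: PyMC's masked-array
    # automatic imputation). Synthetic data; not the authors' Stan/NICM model.
    import numpy as np
    import pymc as pm

    rng = np.random.default_rng(7)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)
    y[rng.choice(50, size=8, replace=False)] = np.nan        # missing observations

    with pm.Model():
        alpha = pm.Normal("alpha", 0, 10)
        beta = pm.Normal("beta", 0, 10)
        sigma = pm.HalfNormal("sigma", 5)
        # Passing a masked array makes the missing y values latent quantities
        # sampled jointly with alpha, beta, and sigma in one posterior.
        pm.Normal("y", mu=alpha + beta * x, sigma=sigma,
                  observed=np.ma.masked_invalid(y))
        idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

    print(float(idata.posterior["beta"].mean()))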

    Speaker Info:

    Melissa Hooke

    Systems Engineer

    NASA Jet Propulsion Laboratory

    Melissa Hooke is a Systems Engineer at the Jet Propulsion Laboratory in the Systems Modeling, Analysis & Architectures group. She is the task manager for NASA's CubeSat or Microsat Probabilistic and Analogies Cost Tool (COMPACT) and the Analogy Software Cost Tool (ASCoT), and is the primary statistical model developer for the NASA Instrument Cost Model (NICM). Her areas of interest include Bayesian modeling, uncertainty quantification, and data visualization. Melissa was the recipient of the "Rising Star" Award at the NASA Cost Symposium in 2021. Melissa earned her B.A. in Mathematics and Statistics at Pomona College where she developed a Bayesian model for spacecraft safe mode events for her undergraduate thesis.

  • Gaps in DoD National Artificial Intelligence Test and Evaluation Infrastructure Capabilities

    Abstract:

    Significant literature has been published in recent years calling for updated and new T&E infrastructure to allow for the credible, verifiable assessment of DoD AI-enabled capabilities (AIECs). However, existing literature falls short in providing the detail necessary to justify investments in specific DoD Enterprise T&E infrastructure. The goal of this study was to collect data about current DoD programs with AIEC and corresponding T&E infrastructure to identify high-priority investments by tracing AIECs to tailored, specific recommendations for enterprise AI T&E infrastructure. The study is divided into six bins of research, itemized below. This presentation provides an interim study update on the current state of programs with AIEC across DoD.

    • Goals : State specific enterprise AI T&E normative goal(s) and timeline, including rationale.
    • Demand: Generate T&E requirements and a demand forecast by querying existing and anticipated AI programs across the Department.
    • Supply Baseline: Catalog existing and planned AI T&E activities and infrastructure at the enterprise, Service, and Program levels, including existing resourcing levels.
    • Gaps: Identify specific gaps based on this supply and demand information.
    • Responsibilities: Clarify roles and responsibilities across OSD T&E stakeholders.
    • Actions: Identify differentiated lines of effort—including objectives, milestones, and cost estimates—that create a cohesive plan to achieve enterprise AI T&E goals.

    Speaker Info:

    Brian Vickers

    CDAO AI Assurance

    IDA

  • I-TREE: A Tool for Characterizing Research Using Taxonomies

    Abstract:

    IDA is developing a Data Strategy to establish solid infrastructure and practices that allow for a rigorous, data-centric approach to answering U.S. security and science policy questions. The Data Strategy implements data governance and data architecture strategies to leverage data for trusted insights and to establish a data-centric culture. One key component of the Data Strategy is a set of research taxonomies that describe and characterize the research done at IDA. These research taxonomies, broadly divided into six categories, are a vital tool to help IDA researchers gain insight into the research expertise of staff and divisions, in terms of the research products produced for our sponsors.

    We have developed an interactive web application which consumes numerous disparate sources of data related to these taxonomies, research products, researchers, and divisions, and unites them to create quantified analytics and visualizations to answer questions about research at IDA. This tool, titled I-TREE (IDA-Taxonomical Research Expertise Explorer), will enable staff to answer questions like ‘Who are the researchers most commonly producing products for a specified research area?’, ‘What is the research profile of a specified author?’, ‘What research topics are most commonly addressed by a specified division?’, ‘Who are the researchers most commonly producing products in a specified division?’, and ‘What divisions are producing products for a specified research topic?’.
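
    The sort of join-and-aggregate query behind these questions can be sketched in a few lines of pandas; the table and column names below (products, authorship, taxonomy_area) are hypothetical stand-ins, not I-TREE's actual data model.

    ```python
    # Illustrative sketch (hypothetical tables and columns): the kind of join-and-aggregate
    # query behind "Who most commonly produces products for a specified research area?"
    import pandas as pd

    products = pd.DataFrame({
        "product_id": [1, 2, 3, 4],
        "taxonomy_area": ["cybersecurity", "test design", "cybersecurity", "sustainment"],
        "division": ["OED", "OED", "ITSD", "CARD"],
    })
    authorship = pd.DataFrame({
        "product_id": [1, 1, 2, 3, 4],
        "author": ["Smith", "Jones", "Smith", "Lee", "Patel"],
    })

    merged = authorship.merge(products, on="product_id")
    area = "cybersecurity"
    top_authors = (merged[merged["taxonomy_area"] == area]
                   .groupby("author")["product_id"].nunique()
                   .sort_values(ascending=False))
    print(f"Most frequent authors in '{area}':")
    print(top_authors.head())
    ```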

    These are essential questions whose answers allow IDA to identify subject-matter expertise areas, methodologies, and key skills in response to sponsor requests, and to identify common areas of expertise to build a research team with a broad range of skills. I-TREE demonstrates the use of data science and data management techniques that enhance the company’s data strategy while actively enabling researchers and management to make informed decisions.

    Speaker Info:

    Aayushi Verma

    Data Science Fellow

    IDA

    Aayushi Verma is a Data Science Fellow at the Institute for Defense Analyses. She supports the Chief Data Officer with the IDA Data Initiative Strategy by leveraging disparate sources of data to create applications and dashboards that help IDA staff. Her data science interests include data analysis, machine learning, artificial intelligence, and extracting stories from data. She has a B.Sc. (Hons.) in Astrophysics from the University of Canterbury, and is currently pursuing her M.S. in Data Science from Pace University.

  • Implementing Fast Flexible Space Filling Designs In R

    Abstract:

    Modeling and simulation (M&S) can be a useful tool for testers and evaluators when they need to augment the data collected during a test event. During the planning phase, testers use experimental design techniques to determine how much and which data to collect. When designing a test that involves M&S, testers can use Space-Filling Designs (SFD) to spread out points across the operational space. Fast Flexible Space-Filling Designs (FFSFD) are a type of SFD that are useful for M&S because they work well in nonrectangular design spaces and allow for the inclusion of categorical factors. Both of these are recurring features in defense testing.
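
    As a language-neutral illustration of the FFSFD construction (the project itself uses an R function), the Python sketch below samples a large candidate set, filters it with a hypothetical nonrectangular constraint, and uses cluster centers as the design points; k-means is used here as an accessible stand-in for the hierarchical clustering in published FFSFD implementations, and all names, levels, and the constraint are illustrative.

    ```python
    # FFSFD-style sketch (illustrative, not the IDA R function): sample many candidate
    # points, keep those satisfying a nonrectangular constraint, cluster them, and use
    # the cluster centers as the space-filling design.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)

    n_candidates, n_runs = 20000, 30
    # Two continuous factors on [0, 1]; hypothetical constraint: x0 + x1 <= 1.5
    cand = rng.uniform(0.0, 1.0, size=(n_candidates, 2))
    cand = cand[cand.sum(axis=1) <= 1.5]

    km = KMeans(n_clusters=n_runs, n_init=10, random_state=1).fit(cand)
    design = km.cluster_centers_

    # A categorical factor can be handled by repeating the design at each level, e.g.:
    levels = ["seeker_A", "seeker_B"]                   # hypothetical categorical factor
    full_design = [(lvl, *pt) for lvl in levels for pt in design]
    print(len(full_design), "runs; first run:", full_design[0])
    ```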

    Guidance from the Deputy Secretary of Defense and the Director of Operational Test and Evaluation encourages the use of open and interoperable software and recommends the use of SFD. This project addresses both recommendations.

    IDA analysts developed a function to create FFSFD using the free statistical software R. To our knowledge, no existing R package creates FFSFD while accommodating a variety of user inputs, such as categorical factors. Moreover, because the function is written in code, users can share it and make their work reproducible.

    This presentation starts with background information about M&S and, more specifically, SFD. The briefing uses a notional missile system example to explain FFSFD in more detail and show the FFSFD R function inputs and outputs. The briefing ends with a summary of the future work for this project.

    Speaker Info:

    Christopher Dimapasok

    Summer Associate / Graduate Student

    IDA / Johns Hopkins University

    I graduated from UCLA in 2020 with a degree in Molecular, Cell, and Developmental Biology. Currently, I am a graduate student at Johns Hopkins University and worked as a Summer Associate at IDA. I hope to leverage my multidisciplinary skills to make a long-lasting impact.

  • Infusing Statistical Thinking into the NASA Quesst Community Test Campaign

    Abstract:

    Statistical thinking permeates many important decisions as NASA plans its Quesst mission, which will culminate in a series of community overflights using the X-59 aircraft to demonstrate low-noise supersonic flight. Month-long longitudinal surveys will be deployed to assess human perception and annoyance to this new acoustic phenomenon. NASA works with a large contractor team to develop systems and methodologies to estimate noise doses, to test and field socio-acoustic surveys, and to study the relationship between the two quantities, dose and response, through appropriate choices of statistical models. This latter dose-response relationship will serve as an important tool as national and international noise regulators debate whether overland supersonic flights could be permitted once again within permissible noise limits. In this presentation we highlight several areas where statistical thinking has come into play, including issues of sampling, classification and data fusion, and analysis of longitudinal survey data that are subject to rare events and the consequences of measurement error. We note several operational constraints that shape the appeal or feasibility of some decisions on statistical approaches, and we identify several important remaining questions to be addressed.

    Speaker Info:

    Nathan Cruze

    Statistician

    NASA Langley Research Center

    Dr. Nathan Cruze joined NASA Langley Research Center in 2021 as a statistician in the Engineering Integration Branch supporting the planning and execution of community testing during the Quesst mission.  Prior to joining NASA, he served as a research mathematical statistician at USDA’s National Agricultural Statistics Service for more than eight years, where his work focused on improving crop and economic estimates programs by combining survey and auxiliary data through statistical modeling.  His Ph.D. in Interdisciplinary Programs was co-directed by faculty from the statistics and chemical engineering departments at Ohio State University.  He holds bachelor’s degrees in economics and mathematics and master’s degrees in economics and statistics, also from Ohio State University.  Dr. Cruze currently co-chairs the Federal Committee on Statistical Methodology interest group on Computational Statistics and the Production of Official Statistics.

  • Introducing Self-Validated Ensemble Models (SVEM) – Bringing Machine Learning to DOEs

    Abstract:

    DOE methods have evolved over the years, as have the needs and expectations of experimenters. Historically, the focus emphasized separating effects to reduce bias in effect estimates and maximizing hypothesis-testing power, priorities that largely reflect the methodological and computational tools of their time. In industry, DOE is often done to predict product or process behavior under possible changes. We introduce Self-Validated Ensemble Models (SVEM), an inherently predictive algorithmic approach to the analysis of DOEs that generalizes the fractionally weighted bootstrap to make machine learning and bagging possible for the small datasets common in DOE. In many DOE applications the number of rows is small, and the factor layout is carefully structured to maximize the information gained from the experiment. Applying machine learning methods to DOE is generally avoided because those methods begin by partitioning the rows into a training set for model fitting and a holdout set for model selection. This alters the structure of the design in undesirable ways, such as randomly introducing effect aliasing. SVEM avoids this problem by using a variation of the fractionally weighted bootstrap to create training and validation versions of the complete data that differ only in how rows are weighted. The weights are reinitialized and models refit multiple times, so the final SVEM model is a model average, much like bagging. We find this allows us to fit models in which the number of estimated effects exceeds the number of rows. We will present simulation results showing that in these supersaturated cases SVEM outperforms existing approaches, such as forward selection, as measured by prediction accuracy.
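
    A compact Python sketch of the fractionally weighted bootstrap idea described above follows: anti-correlated exponential weights define training and validation versions of the same rows, a weighted lasso is tuned against the validation weights, and the refits are averaged. The weight distribution, base learner, and toy data are illustrative assumptions, not the authors' exact algorithm.

    ```python
    # SVEM-style sketch (illustrative): fractionally weighted bootstrap with anti-correlated
    # training/validation weights, a weighted lasso base learner, and model averaging.
    # (Lasso's sample_weight argument requires scikit-learn >= 0.23.)
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(2)

    # Toy supersaturated setting: 12 runs, 20 candidate effects, 3 of them active.
    n, p = 12, 20
    X = rng.choice([-1.0, 1.0], size=(n, p))
    y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.8 * X[:, 7] + 0.3 * rng.standard_normal(n)

    alphas = np.logspace(-3, 1, 30)
    n_boot = 50
    coef_avg = np.zeros(p + 1)

    for _ in range(n_boot):
        u = rng.uniform(size=n)
        w_train = -np.log(u)            # Exponential(1) fractional weights for fitting
        w_valid = -np.log(1.0 - u)      # anti-correlated weights play the validation role
        best_err, best = np.inf, None
        for a in alphas:
            m = Lasso(alpha=a, max_iter=50000).fit(X, y, sample_weight=w_train)
            err = np.average((y - m.predict(X)) ** 2, weights=w_valid)
            if err < best_err:
                best_err, best = err, m
        coef_avg += np.concatenate(([best.intercept_], best.coef_)) / n_boot

    print("averaged intercept and first 8 effects:", np.round(coef_avg[:9], 2))
    ```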

    Speaker Info:

    Chris Gotwalt

    JMP Chief Data Scientist

    JMP Statistical Discovery

    Chris joined JMP in 2001 while obtaining his Ph.D. in Statistics at NCSU. Chris has made many contributions to JMP, mostly in the form of the computational algorithms that fit models or design experiments. He developed JMP’s algorithms for fitting neural networks, mixed models, structural equation models, text analysis, and many more. Chris leads a team of 20 statistical software developers, testers, and technical writers. Chris was the 2020 Chair of the Quality and Productivity Section of the American Statistical Association and has held adjunct professor appointments at the Univ. of Nebraska, Univ. of New Hampshire, and NCSU, guiding dissertation research into generalized linear mixed models, extending machine learning techniques to designed experiments, and machine learning based imputation strategies.

  • Introducing TestScience.org

    Abstract:

    The Test Science Team facilitates data-driven decision-making by disseminating various testing and analysis methodologies. One way they disseminate these methodologies is through the annual workshop, DATAWorks; another is through the website, TestScience.org. The Test Science website includes video training, interactive tools, and a related research library, as well as the DATAWorks Archive.

    "Introducing TestScience.org", a presentation at DATAWorks, could include a poster and an interactive guided session through the site content. The presentation would inform interested DATAWorks attendees of the additional resources throughout the year. It could also be used to inform the audience about ways to participate, such as contributing interactive Shiny tools, training content, or research.

    "Introducing TestScience.org" would highlight the following sections of the website:

    1. The DATAWorks Archives

    2. Learn (Video Training)

    3. Tools (Interactive Tools)

    4. Research (Library)

    5. Team (About and Contact)

    Incorporating an introduction to TestScience.org into DATAWorks would inform attendees of additional valuable resources available to them and could encourage broader participation in TestScience.org, adding value to both DATAWorks attendees and the TestScience.org effort.

    Speaker Info:

    Sean Fiorito

    Contractor

    IDA / V-Strat, LLC

    Mr. Fiorito has been a contractor for the Institute for Defense Analyses (IDA) since 2015. He was a part of the original team to design and develop both the DATAWorks and Test Science team websites. He has expertise in application development, integrated systems, cloud architecture and cloud adoption.

    Mr. Fiorito started his Federal IT career in 2004 with Booz Allen Hamilton. Since then he has worked with other Federal IT contract firms, both large (Deloitte, Accenture) and small (Fila, Dynamo). He has contributed to projects such as the Coast Guard's Rescue 21, the Forest Service's Electronic Management of NEPA (eMNEPA), and Federal Student Aid's Enterprise Cloud Migration.

    He holds a BS in Information Systems with a concentration in programming, as well as an Amazon Web Services Cloud Architect certification.

  • Introduction to Design of Experiments in R: Generating and Evaluating Designs with skpr

    Abstract:

    The Department of Defense requires rigorous testing to support the evaluation of effectiveness and suitability of oversight acquisition programs. These tests are performed in a resource constrained environment and must be carefully designed to efficiently use those resources. The field of Design of Experiments (DOE) provides methods for testers to generate optimal experimental designs taking these constraints into account, and computational tools in DOE can support this process by enabling analysts to create designs tailored specifically for their test program. In this tutorial, I will show how you can run these types of analyses using “skpr”: a free and open source R package developed by researchers at IDA for generating and evaluating optimal experimental designs. This software package allows you to perform DOE analyses entirely in code; rather than using a graphical user interface to generate and evaluate individual designs one-by-one, this tutorial will demonstrate how an analyst can use “skpr” to automate the creation of a variety of different designs using a short and simple R script. Attendees will learn the basics of using the R programming language and how to generate, save, and share their designs. Additionally, “skpr” provides a straightforward interface to calculate statistical power. Attendees will learn how to use built-in parametric and Monte Carlo power evaluation functions to compute power for a variety of models and responses, including linear models, split-plot designs, blocked designs, generalized linear models (including logistic regression), and survival models. Finally, I will demonstrate how you can conduct an end-to-end DOE analysis entirely in R, showing how to generate power versus sample size plots and other design diagnostics to help you design an experiment that meets your program's needs.
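
    skpr itself is an R package; as a language-neutral sketch of the Monte Carlo power idea covered in the tutorial, the Python example below simulates responses from an assumed model on a small replicated factorial, refits the model each time, and estimates power as the rejection rate. The design, effect sizes, and significance level are illustrative assumptions.

    ```python
    # Monte Carlo power sketch (illustrative; the tutorial itself uses the R package "skpr"):
    # simulate responses from an assumed model on a candidate design, refit, and estimate
    # power as the fraction of simulations in which each factor effect is detected.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)

    # Candidate design: two-factor full factorial, replicated three times (12 runs).
    base = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
    X = sm.add_constant(np.tile(base, (3, 1)))

    beta = np.array([0.0, 0.5, 0.25])     # assumed effect sizes, with noise sigma = 1
    alpha, n_sim = 0.05, 2000
    reject = np.zeros(2)

    for _ in range(n_sim):
        y = X @ beta + rng.standard_normal(len(X))
        fit = sm.OLS(y, X).fit()
        reject += fit.pvalues[1:] < alpha  # p-values for factors A and B

    print("estimated power for factors A and B:", np.round(reject / n_sim, 2))
    ```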

    Speaker Info:

    Tyler Morgan-Wall

    Research Staff Member

    IDA

    Dr. Tyler Morgan-Wall is a Research Staff Member at the Institute for Defense Analyses, and is the developer of the software library skpr: a package developed at IDA for optimal design generation and power evaluation in R. He is also the author of several other R packages for data visualization, mapping, and cartography. He has a PhD in Physics from Johns Hopkins University and lives in Silver Spring, MD.

  • Introduction to Machine Learning

    Abstract:

    Machine learning (ML) teaches computer systems through data or experience and can generally be divided into three broad branches:  supervised learning, unsupervised learning, and reinforcement learning.  The objective of this course is to provide attendees with 1) an introduction to ML methods, 2) insights into best practices, and 3) a survey of limitations to existing ML methods that are leading to new areas of research.  This introduction to machine learning course will cover a wide range of topics including regression, classification, clustering, feature selection, exploratory data analysis, reinforcement learning, transfer learning, and active learning.  This course will be taught through a series of lectures followed by demonstrations on open-source data sets using Jupyter Notebooks and Python.

    Speaker Info:

    Stephen Adams

    Associate Research Professor

    Virginia Tech National Security Institute

    Stephen Adams is an Associate Research Professor in the Virginia Tech National Security Institute.  He received a M.S. in Statistics from the University of Virginia (UVA) in 2010 and a Ph.D. from UVA in Systems Engineering in December of 2015.  His research focuses on applications of machine learning and artificial intelligence in real-world systems.  He has experience developing and implementing numerous types of machine learning and artificial intelligence algorithms.  His research interests include feature selection, machine learning with cost, transfer learning, reinforcement learning, and probabilistic modeling of systems.  His research has been applied to several domains including activity recognition, prognostics and health management, psychology, cybersecurity, data trustworthiness, natural language processing, and predictive modeling of destination given user geo-information data.

  • Large-scale cross-validated Gaussian processes for efficient multi-purpose emulators

    Abstract:

    We describe recent advances in Gaussian process emulation that allow us both to save computation time and to apply inference algorithms that were previously too expensive for operational use. Specific examples are given from the Earth-orbiting Orbiting Carbon Observatory, the future Surface Biology and Geology mission, dynamical systems, and other applications. While Gaussian processes are a well-studied field, there are important choices to which the community has paid surprisingly little attention thus far, including dimension reduction, kernel parameterization, and objective function selection. This talk will highlight some of those choices and help the audience understand their practical implications.
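
    As a generic illustration of GP emulation (not the JPL implementation), the sketch below fits a Gaussian process surrogate to a handful of runs of a stand-in "expensive" function and returns predictions with uncertainty; the kernel choice and test function are assumptions.

    ```python
    # Generic GP-emulator sketch (illustrative, not the JPL implementation): fit a Gaussian
    # process surrogate to a few runs of a stand-in "expensive" model, then predict with
    # uncertainty at new inputs.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def simulator(x):                      # stand-in for an expensive forward model
        return np.sin(3.0 * x) + 0.5 * x

    X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
    y_train = simulator(X_train).ravel()

    kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)   # kernel choice is an assumption
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train, y_train)

    X_new = np.linspace(0.0, 2.0, 5).reshape(-1, 1)
    mean, std = gp.predict(X_new, return_std=True)
    print("predictions:", np.round(mean, 3))
    print("std. dev.  :", np.round(std, 3))
    ```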

    Speaker Info:

    Jouni Susiluoto

    Data Scientist

    NASA Jet Propulsion Laboratory, California Institute of Technology

    Dr. Jouni Susiluoto is a Data Scientist at NASA Jet Propulsion Laboratory in Pasadena, California. His main research focus has recently been on inversion algorithms and forward-model improvements for current and next-generation hyperspectral imagers, such as AVIRIS-NG, EMIT, and SBG. This research leans heavily on new developments in high-efficiency cross-validated Gaussian process techniques, a research topic that he has pursued closely with Prof. Houman Owhadi's group at Caltech. Susiluoto's previous work includes a wide range of data science, uncertainty quantification, and modeling applications in the geosciences, such as spatio-temporal data fusion with very large data volumes, Bayesian model selection, chaotic model analysis and parameter estimation, and climate and carbon cycle modeling. He has a doctorate in mathematics from the University of Helsinki, Finland.

  • Model Validation Levels for Model Authority Quantification

    Abstract:

    Due to the increased use of Modeling & Simulation (M&S) in the development of Department of Defense (DOD) weapon systems, it is critical to assign models appropriate levels of trust. Validation is an assessment process that can help mitigate the risks posed by relying on potentially inaccurate, insufficient, or incorrect models. However, validation criteria are often subjective and inconsistently applied across models. Current practice also fails to reassess models as requirements change, mission scope is redefined, new data are collected, or models are adapted to a new use. This brief will present Model Validation Levels (MVLs) as a validation paradigm that enables rigorous, objective validation of a model and yields metrics that quantify the amount of trust that can be placed in the model. The validation framework will be demonstrated through a real-world example detailing the construction and interpretation of MVLs.

    Speaker Info:

    Kyle Provost

    Senior Statistician

    STAT COE

    Kyle is a STAT Expert  (Huntington Ingalls Industries contractor) at the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE) at the Air Force Institute of Technology (AFIT). The STAT COE provides independent STAT consultation to designated acquisition programs and special projects to improve Test & Evaluation (T&E) rigor, effectiveness, and efficiency. He received his M.S. in Applied Statistics from Wright State University.

  • Model Verification in a Digital Engineering Environment: An Operational Test Perspective

    Abstract:

    As the Department of Defense adopts digital engineering strategies for acquisition systems in development, programs are embracing the use of highly federated models to assess the end-to-end performance of weapon systems, to include the threat environment. Often, due to resource limitations or political constraints, there is limited live data with which to validate the end-to-end performance of these models. In these cases, careful verification of the model, including from an operational factor-space perspective, early in model development can assist testers in prioritizing resources for model validation later in system development. This presentation will discuss how using Design of Experiments to assess the operational factor space can shape model verification efforts and provide data for model validation focused on the end-to-end performance of the system.

    Speaker Info:

    Jo Anna Capp

    Research Staff Member

    IDA

    Jo Anna Capp is a Research Staff Member in the Operational Evaluation Division of IDA’s Systems and Analyses Center. She supports the Director, Operational Test and Evaluation (DOT&E) in the test and evaluation oversight of nuclear acquisition programs for the Department of Defense.

    Jo Anna joined IDA in 2017, and has worked on space and missile systems during her tenure. She is an expert in operational test and evaluation of nuclear weapon systems and in the use of statistical and machine learning techniques to derive insight into the performance of these and other acquisition systems.

    Jo Anna holds a doctorate in biochemistry from Duke University and a bachelor’s degree in cell and molecular biology from Florida Gulf Coast University.

  • Multimodal Data Fusion: Enhancing Image Classification with Text

    Abstract:

    Image classification is a critical part of gathering information on high-value targets. To this end, Convolutional Neural Networks (CNNs) have become the standard model for image and facial classification. However, CNNs alone are not entirely effective at image classification, and especially at classifying people, because of their limited robustness and susceptibility to bias. Recent advances in CNNs, however, allow for data fusion to help reduce the uncertainty in their predictions. In this project, we describe a multimodal algorithm designed to increase confidence in image classification through a joint fusion model that combines image and text data. Our work uses CNNs for image classification and bag-of-words for text categorization on Wikipedia images and captions relating to the same classes as the CIFAR-100 dataset. Using data fusion, we combine the vectors of the CNN and bag-of-words models and apply a fully connected network to the joined data. We measure improvements by comparing the softmax outputs of the joint fusion model and the image-only CNN.
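
    The PyTorch sketch below shows the general shape of such a joint fusion model: an image branch, a bag-of-words text branch, concatenation of the two feature vectors, and a fully connected classification head. Layer sizes, the vocabulary size, and other details are illustrative and not the authors' exact architecture.

    ```python
    # Joint-fusion sketch (illustrative, not the authors' exact architecture): a small CNN
    # image branch and a bag-of-words text branch are concatenated and classified jointly.
    import torch
    import torch.nn as nn

    class JointFusionNet(nn.Module):
        def __init__(self, vocab_size=5000, n_classes=100):
            super().__init__()
            self.image_branch = nn.Sequential(             # CNN features for 3x32x32 images
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(), nn.Linear(32 * 8 * 8, 128), nn.ReLU(),
            )
            self.text_branch = nn.Sequential(              # bag-of-words counts -> dense vector
                nn.Linear(vocab_size, 128), nn.ReLU(),
            )
            self.head = nn.Linear(256, n_classes)          # fully connected head on fused vector

        def forward(self, image, bow):
            fused = torch.cat([self.image_branch(image), self.text_branch(bow)], dim=1)
            return self.head(fused)                        # logits; softmax applied at evaluation

    model = JointFusionNet()
    logits = model(torch.randn(4, 3, 32, 32), torch.rand(4, 5000))
    print(torch.softmax(logits, dim=1).shape)              # torch.Size([4, 100])
    ```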

    Speaker Info:

    Jack Perreault

    Cadet

    United States Military Academy

    CDT Jack Perreault is a senior at the United States Military Academy majoring in Applied Statistics and Data Science and will commission as a Signal officer upon graduation. He hopes to pursue a Master of Science in Data Science through a technical scholarship. Within the Army, CDT Perreault plans to work in the 528th Special Operations Sustainment Brigade at Fort Bragg, North Carolina, before transitioning to the Operations Research and Systems Analysis career field, where he can conduct data-driven analysis that affects the operational and strategic decisions of the Army. CDT Perreault hopes to return to the United States Military Academy as an instructor in the Math Department, where he can teach and inspire future cadets before transitioning to the civilian sector. His current research centers on how a multimodal data fusion algorithm can leverage both images and accompanying text to enhance image classification. His prior research involved predictive modeling of the role of public perception of the Vice President and its impact on presidential elections. CDT Perreault is a member of West Point’s track and field team and enjoys going to the beach while at home in Rhode Island.

  • NASEM Range Capabilities Study and T&E of Multi-Domain Operations

    Abstract:

    The future viability of DoD’s range enterprise depends on addressing dramatic changes in technology, rapid advances in adversary military capabilities, and the evolving approach the United States will take to closing kill chains in a Joint All Domain Operations environment. This recognition led DoD’s former Director of Operational Test and Evaluation (OT&E), the Honorable Robert Behler, to request that the National Academies of Sciences, Engineering, and Medicine examine the physical and technical suitability of DoD’s ranges and infrastructure through 2035. The first half of this presentation will cover the highlights and key recommendations of this study, including the need to create the “TestDevOps” digital infrastructure for future operational test and seamless range enterprise interoperability. The second half of this presentation looks at the legacy frameworks for the relationships between physical and virtual test capabilities, and how those frameworks are becoming outdated. This briefing explores proposals for how the interaction of operations, physical test capabilities, and virtual test capabilities must evolve to support new paradigms of rapidly evolving technologies and the changing nature of multi-domain operations.

    Speaker Info:

    Hans Miller

    Department Chief Engineer

    MITRE

    Hans Miller, Col USAF (ret), is a Chief Engineer for Research and Advanced Capabilities department at the MITRE Corporation.  He retired with over 25 years of experience in combat operations, experimental flight test, international partnering, command and control, policy, and strategic planning of defense weapon systems.  His last assignment was as Division Chief of the Policy, Programs and Resources Division, Headquarters Air Force Test and Evaluation Directorate at the Pentagon.  He led a team responsible for Test and Evaluation policy throughout the Air Force, coordination with OSD and Joint Service counterparts, and staff oversight across the spectrum of all Air Force acquisition programs. Prior to that assignment, he was the Commander of the 96th Test Group, Holloman AFB, NM.  The 96th Test Group conducted avionics and weapon systems flight tests, inertial navigation and Global Positioning System tests, high-speed test track operations and radar cross section tests necessary to keep joint weapon systems ready for war.

    Hans Miller was commissioned as a graduate of the USAF Academy.  He has served as an operational and experimental flight test pilot in the B-1B and as an F-16 chase pilot.  He flew combat missions in the B-1B in Operation Allied Force and Operation Enduring Freedom.  He served as an Exercise Planning Officer at the NATO Joint Warfare Center, Stavanger, Norway. Col (ret) Miller was the Squadron Commander of the Global Power Bomber Combined Test Force, coordinating ground and flight test activities on the B-1, B-2 and B-52.  He served as the Director, Comparative Technology Office, within the Office of the Secretary of Defense.  He managed the Department’s Foreign Comparative Testing and Rapid Innovation Fund programs.

    Hans Miller is a Command Pilot with over 2100 hours in 35 different aircraft types. He is a Department of Defense Acquisition Corps member and holds Level 3 certification in Test and Evaluation.  He is a graduate of the USAF Weapons School, USAF Test Pilot School, Air Command and Staff College and Air War College.  He holds a bachelor’s degree in Aeronautical Engineering and a master’s degree in Aeronautical and Astronautical engineering from Stanford University.

  • Neural Networks for Quantitative Resilience Prediction

    Abstract:

    System resilience is the ability of a system to survive and recover from disruptive events, a property with applications in several engineering domains, such as cyber-physical systems and infrastructure. Most studies emphasize resilience metrics to quantify system performance, whereas more recent studies propose resilience models that project system recovery time after degradation using traditional statistical modeling approaches. Moreover, past studies are either performed on data collected after recovery or limited to idealized trends. Therefore, this talk considers alternative machine learning approaches, including (i) Artificial Neural Networks (ANN), (ii) Recurrent Neural Networks (RNN), and (iii) Long Short-Term Memory (LSTM) networks, to model and predict system performance for trends beyond those previously considered. These approaches include negative and positive factors driving resilience in order to understand and precisely quantify the impact of disruptive events and restorative activities. A hybrid feature selection approach is also applied to identify the most relevant covariates. Goodness-of-fit measures are calculated to evaluate the models, including (i) mean squared error, (ii) predictive-ratio risk, and (iii) adjusted R-squared. The results indicate that LSTM models outperform ANN and RNN models while requiring fewer neurons in the hidden layer in most of the data sets considered. In many cases, ANN models performed better than RNNs but required more training time. These results suggest that neural network models for predictive resilience are both feasible and accurate relative to traditional statistical methods and may find practical use in many important domains.
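
    A minimal PyTorch sketch of the LSTM variant is shown below: it predicts next-step system performance from a short window of past performance together with disruption and restoration indicator covariates. The synthetic trajectory, window length, and architecture are illustrative assumptions, not the study's models or data.

    ```python
    # LSTM resilience sketch (illustrative): predict next-step system performance from a
    # window of past performance plus disruption/restoration indicator covariates.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Synthetic trajectory: performance drops at a disruption and recovers during restoration.
    T = 200
    t = torch.arange(T, dtype=torch.float32)
    disrupt = ((t >= 60) & (t < 80)).float()
    restore = ((t >= 80) & (t < 140)).float()
    perf = 1.0 - 0.6 * (t >= 60).float() + 0.6 * torch.clamp((t - 80) / 60.0, 0, 1) * (t >= 80).float()
    features = torch.stack([perf, disrupt, restore], dim=1)                 # (T, 3)

    window = 10
    X = torch.stack([features[i:i + window] for i in range(T - window)])    # (N, window, 3)
    y = perf[window:]                                                       # next-step performance

    class LSTMModel(nn.Module):
        def __init__(self, hidden=16):
            super().__init__()
            self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)

        def forward(self, x):
            h, _ = self.lstm(x)
            return self.out(h[:, -1, :]).squeeze(-1)       # use the last hidden state

    model = LSTMModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    for epoch in range(200):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    print("final training MSE:", float(loss))
    ```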

    Speaker Info:

    Karen Alves da Mata

    Master Student

    University of Massachusetts Dartmouth

    Karen da Mata is a Master's Student in the Electrical and Computer Engineering Department at the University of Massachusetts - Dartmouth. She completed her undergraduate studies in the Electrical Engineering Department at the Federal University of Ouro Preto - Brazil - in 2018.

  • Novelty Detection in Network Traffic: Using Survival Analysis for Feature Identification

    Abstract:

    Over the past decade, Intrusion Detection Systems have become an important component of many organizations’ cyber defense and resiliency strategies. However, one of the greatest downsides of these systems is their reliance on known attack signatures for successful detection of malicious network events. When it comes to unknown attack types and zero-day exploits, modern Intrusion Detection Systems often fall short. Since machine learning algorithms for event classification are widely used in this realm, it is imperative to analyze the characteristics of network traffic that can lead to novelty detection using such classifiers. In this talk, we introduce a novel approach to identifying the network traffic features that influence novelty detection, based on survival analysis techniques. Specifically, we combine several Cox proportional hazards models to predict which features of a network flow are most indicative of a novel network attack and most likely to confuse the classifier as a result. We also use Kaplan-Meier estimates to predict the probability that a classifier identifies novelty after the injection of an unknown network attack at any given time. The proposed model is successful at pinpointing PSH Flag Count, ACK Flag Count, URG Flag Count, and Down/Up Ratio as the features with the greatest impact on novelty detection for Random Forest, Bayesian Ridge, and Linear SVR classifiers.
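
    The two survival-analysis building blocks named above can be sketched with the Python lifelines package as follows; the synthetic flow data and the assumed effect of PSH Flag Count on time-to-detection are illustrative, not the authors' dataset or fitted model.

    ```python
    # Survival-analysis sketch (illustrative data): Cox proportional hazards on network-flow
    # features and a Kaplan-Meier estimate of time until the classifier detects novelty.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter, KaplanMeierFitter

    rng = np.random.default_rng(4)
    n = 300

    df = pd.DataFrame({
        "psh_flag_count": rng.poisson(2, n),
        "ack_flag_count": rng.poisson(5, n),
        "down_up_ratio": rng.gamma(2.0, 1.0, n),
    })
    # Time until the classifier flags the injected novel attack; assume higher PSH counts
    # lead to earlier detection in this synthetic example.
    baseline = rng.exponential(20.0, n)
    df["time_to_detection"] = baseline / (1.0 + 0.3 * df["psh_flag_count"])
    df["detected"] = (rng.uniform(size=n) < 0.9).astype(int)   # a few flows are censored

    cph = CoxPHFitter().fit(df, duration_col="time_to_detection", event_col="detected")
    cph.print_summary()

    kmf = KaplanMeierFitter().fit(df["time_to_detection"], event_observed=df["detected"])
    print(kmf.survival_function_.head())
    ```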

    Speaker Info:

    Elie Alhajjar

    Senior Research Scientist

    USMA

    Dr. Elie Alhajjar is a senior research scientist at the Army Cyber Institute and jointly an Associate Professor in the Department of Mathematical Sciences at the United States Military Academy in West Point, NY, where he teaches and mentors cadets from all academic disciplines. His work is supported by grants from NSF, NIH, NSA, and ARL, and he was recently named the Dean's Fellow for research. His research interests include mathematical modeling, machine learning, and network analysis from a cybersecurity viewpoint. He has presented his research at international meetings in North America, Europe, and Asia. He is a recipient of the Civilian Service Achievement Medal, the NSF Trusted CI Open Science Cybersecurity Fellowship, the Day One Technology Policy Fellowship, and the SIAM Science Policy Fellowship. He holds a Master of Science and a PhD in mathematics from George Mason University, as well as master’s and bachelor’s degrees from Notre Dame University.

  • On the Validation of Statistical Software

    Abstract:

    Validating statistical software involves a variety of challenges. Of these, the most difficult is the selection of an effective set of test cases, sometimes referred to as the “test case selection problem”. To further complicate matters, for many statistical applications, development and validation are done by individuals who often have limited time to validate their application and may not have formal training in software validation techniques. As a result, it is imperative that the adopted validation method is efficient, as well as effective, and it should also be one that can be easily understood by individuals not trained in software validation techniques. As it turns out, the test case selection problem can be thought of as a design of experiments (DOE) problem. This talk discusses how familiar DOE principles can be applied to validating statistical software.
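
    As a deliberately simple illustration of treating test case selection as a DOE problem, the sketch below enumerates a full factorial over four hypothetical two-level inputs to a statistical routine and thins it to a regular half fraction; the factor names and levels are assumptions.

    ```python
    # Illustrative sketch: test case selection for a statistical routine treated as a DOE
    # problem. Four hypothetical two-level input factors give 16 test cases; a regular half
    # fraction (defining relation I = ABCD) covers all main effects with 8 cases.
    from itertools import product

    factors = {                                    # hypothetical inputs to the routine under test
        "sample_size": [10, 10000],
        "missing_data": [False, True],
        "distribution": ["normal", "heavy_tailed"],
        "ties_present": [False, True],
    }

    full_factorial = list(product(*factors.values()))
    print("full factorial:", len(full_factorial), "test cases")

    def coded(value, levels):                      # code each factor level as -1 / +1
        return -1 if value == levels[0] else 1

    half_fraction = []
    for case in full_factorial:
        signs = [coded(v, lv) for v, lv in zip(case, factors.values())]
        if signs[0] * signs[1] * signs[2] * signs[3] == 1:   # keep the I = ABCD fraction
            half_fraction.append(dict(zip(factors, case)))

    print("half fraction :", len(half_fraction), "test cases")
    for case in half_fraction:
        print(case)
    ```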

    Speaker Info:

    Ryan Lekivetz

    Manager, Advanced Analytics R&D

    JMP Statistical Discovery

    Ryan Lekivetz is the manager of the Design of Experiments (DOE) and Reliability team that develops those platforms in JMP. He earned his doctorate in statistics from Simon Fraser University in Burnaby, BC, Canada, and has publications related to topics in DOE in peer-reviewed journals. He looks for ways to apply DOE in other disciplines and even his everyday life.

  • Opening Remarks

    Speaker Info:

    General Norty Schwartz

    President

    U.S. Air Force, retired / Institute for Defense Analyses

    Norton A. Schwartz serves as President of the Institute for Defense Analyses (IDA), a nonprofit corporation operating in the public interest. IDA manages three Federally Funded Research and Development Centers that answer the most challenging U.S. security and science policy questions with objective analysis leveraging extraordinary scientific, technical, and analytic expertise. At IDA, General Schwartz (U.S. Air Force, retired) directs the activities of more than 1,000 scientists and technologists employed by IDA.

    General Schwartz has a long and prestigious career of service and leadership that spans over 5 decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his 6-year tenure at BENS, he was also a member of IDA’s Board of Trustees.

    Prior to retiring from the U.S. Air Force, General Schwartz served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975.

    General Schwartz is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College.

    He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. General Schwartz has been married to Suzie since 1981.

  • Opening Remarks

    Speaker Info:

    The Honorable Nickolas H. Guertin

    Director, Operational Test & Evaluation

    OSD/DOT&E

    Nickolas H. Guertin was sworn in as Director, Operational Test and Evaluation (DOT&E) on December 20, 2021. A Presidential appointee confirmed by the United States Senate, he serves as the senior advisor to the Secretary of Defense on operational and live fire test and evaluation of Department of Defense weapon systems.

    Mr. Guertin has an extensive four-decade combined military and civilian career in submarine operations; ship construction and maintenance; development and testing of weapons, sensors, combat management products including the improvement of systems engineering; and defense acquisition. Most recently, he has performed applied research for government and academia in software-reliant and cyber-physical systems at Carnegie Mellon University’s Software Engineering Institute.

    Over his career, he has led organizational transformation, improved competition, and increased application of modular open-system approaches, prototyping, and experimentation. He has also researched and published extensively on software-reliant system design, testing, and acquisition. He received a Bachelor of Science in Mechanical Engineering from the University of Washington and an MBA from Bryant University. He is a retired Navy Reserve Engineering Duty Officer, was Defense Acquisition Workforce Improvement Act (DAWIA) certified in Program Management and Engineering, and is also a licensed Professional Engineer (Mechanical).

    Mr. Guertin is involved with his community as an Assistant Scoutmaster and Merit Badge Counselor for two local Boy Scouts of America troops, and is an avid amateur musician. He is a native of Connecticut and now resides in Virginia with his wife and twin children.

  • Opening Remarks

    Speaker Info:

    Bram Lillard

    Director, Operational Evaluation Division

    IDA

    V. Bram Lillard assumed the role of Director of the Operational Evaluation Division (OED) in early 2022. In this position, Bram provides strategic leadership, project oversight, and direction for the division’s research program, which primarily supports the Director, Operational Test and Evaluation (DOT&E) within the Office of the Secretary of Defense. He also oversees OED’s contributions to strategic studies, weapon system sustainment analyses, and cybersecurity evaluations for DOD and anti-terrorism technology evaluations for the Department of Homeland Security.

    Bram joined IDA in 2004 as a member of the research staff. In 2013-14, he was the acting science advisor to DOT&E. He then served as OED’s assistant director in 2014-21, ascending to deputy director in late 2021.

    Prior to his current position, Bram was embedded in the Pentagon where he led IDA’s analytical support to the Cost Assessment and Program Evaluation office within the Office of the Secretary of Defense. He previously led OED’s Naval Warfare Group in support of DOT&E. In his early years at IDA, Bram was the submarine warfare project lead for DOT&E programs. He is an expert in quantitative data analysis methods, test design, naval warfare systems and operations and sustainment analyses for Defense Department weapon systems.

    Bram has both a doctorate and a master’s degree in physics from the University of Maryland. He earned his bachelor’s degree in physics and mathematics from State University of New York at Geneseo. Bram is also a graduate of the Harvard Kennedy School’s Senior Executives in National and International Security program, and he was awarded IDA’s prestigious Goodpaster Award for Excellence in Research in 2017.

  • Optimal Release Policy for Covariate Software Reliability Models

    Abstract:

    The optimal time to release software is a problem of broad concern to software engineers, where the goal is to minimize cost by balancing the cost of fixing defects before or after release against the cost of testing. However, the vast majority of existing models are based on defect discovery models that are a function of time alone and can therefore only provide guidance on the amount of additional testing effort required. To overcome this limitation, this paper presents an optimal software release model based on cost criteria that incorporates a covariate software defect detection model based on the discrete Cox proportional hazards model. The proposed model provides more detailed guidance, recommending the amount of each distinct test activity to perform in order to discover defects. Our results indicate that the approach can be used to allocate effort among alternative test activities in order to minimize cost.
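
    A simplified numeric sketch of the cost-based release idea is given below; it uses an assumed time-based defect-discovery curve rather than the paper's covariate model, so the parameters and cost coefficients are purely illustrative.

    ```python
    # Simplified optimal-release sketch (illustrative; the paper's model instead uses a
    # covariate defect detection model based on the discrete Cox proportional hazards model).
    # Cost(t) = c1*m(t) + c2*(m_inf - m(t)) + c3*t, minimized over the release time t.
    import numpy as np

    a, b = 100.0, 0.05            # assumed defect-discovery curve m(t) = a * (1 - exp(-b*t))
    c1, c2, c3 = 1.0, 5.0, 0.2    # cost per pre-release fix, per post-release fix, per unit test effort

    def expected_cost(t):
        m_t = a * (1.0 - np.exp(-b * t))
        return c1 * m_t + c2 * (a - m_t) + c3 * t

    t_grid = np.linspace(0.0, 300.0, 3001)
    costs = expected_cost(t_grid)
    t_star = t_grid[np.argmin(costs)]
    print(f"optimal release time ~ {t_star:.1f}, expected cost ~ {costs.min():.1f}")
    ```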

    Speaker Info:

    Ebenezer Yawlui

    Master's Student

    University of Massachusetts Dartmouth

    Ebenezer Yawlui is a MS student in the Department of Electrical & Computer Engineering at the University of Massachusetts Dartmouth (UMassD). He received his BS (2020) in Electrical Engineering from Regional Maritime University, Ghana.

  • Overarching Tracker of DOT&E Actions

    Abstract:

    OED’s Overarching Tracker of DOT&E Actions distills information from DOT&E’s operational test reports and memoranda on test plan and test strategy approvals to generate informative metrics on the office’s activities. In FY22, DOT&E actions covered 68 test plans, 28 strategies, and 28 reports, relating to 74 distinct programs. This poster presents data from those documents and highlights findings on DOT&E’s effectiveness, suitability, and survivability determinations and other topics related to the state of T&E.

    Speaker Info:

    Buck Thome

    Research Staff Member

    IDA

    Dr. Thome is a member of the research staff at Institute for Defense Analyses, focusing on test and evaluation of net-centric systems and cybersecurity.  He received his PhD in Experimental High Energy Physics from Carnegie Mellon University in 2011.  After working with a small business defense contractor developing radio frequency sensor systems, he came to IDA in 2013.

  • Perspectives on T&E of ML for Assuring Reliability in Safety-Critical Applications

    Speaker Info:

    Pradeep Ramuhalli

    Oak Ridge National Laboratory

  • Perspectives on T&E of ML for Assuring Reliability in Safety-Critical Applications

    Abstract:

    Artificial intelligence (AI) and machine learning (ML) are increasingly being examined for their utility in many domains. AI/ML solutions are being proposed for a broad set of applications, including surrogate modeling, anomaly detection, classification, image segmentation, and control. Considerable effort is being devoted to evaluating these solutions for robustness, especially in the context of safety-critical applications. While traditional methods of verification and validation continue to be necessary, challenges exist in many safety-critical applications given the limited ability to gather data covering all possible conditions and the limited ability to conduct experiments. This presentation will discuss potential approaches for testing and evaluating machine learning algorithms in such applications, as well as metrics for this purpose.

    Speaker Info:

    Pradeep Ramuhalli

    Group Lead

    Oak Ridge National Laboratory

    Dr. Pradeep Ramuhalli is a group lead for the Modern Nuclear Instrumentation and Controls group and a Distinguished R&D Scientist at Oak Ridge National Laboratory (ORNL). He leads a group with experience in measurement and data science applications to a variety of complex engineered systems. His research focus is on the development of sensor technologies for extreme environment and the integration of data from these sensors with data analytics technologies for prognostic health management and operational decision making. He also leads research on ML for enabling robust engineered systems (AIRES - AI for Robust Engineering and Science) as part of an internal research initiative at ORNL.

  • Planning for Public Sector Test and Evaluation in the Commercial Cloud

    Abstract:

    As the public sector shifts IT infrastructure toward commercial cloud solutions, the government test community needs to adjust its test and evaluation (T&E) methods to provide useful insights into a cloud-hosted system’s cyber posture. Government entities must protect what they develop in the cloud by enforcing strict access controls and deploying securely configured virtual assets. However, publicly available research shows that doing so effectively is difficult, with accidental misconfigurations leading to the most commonly observed exploitations of cloud-hosted systems. Unique deployment configurations and identity and access management schemes across different cloud service providers increase the burden of knowledge on testers. More care must be taken during the T&E planning process to ensure that test teams are poised to succeed in understanding the cyber posture of cloud-hosted systems and finding any vulnerabilities present in those systems. The T&E community must adapt to this new paradigm of cloud-hosted systems to ensure that vulnerabilities are discovered and mitigated before an adversary has the opportunity to use them against the system.

    Speaker Info:

    Brian Conway

    Research Staff Member

    IDA

    Lee Allison received his Ph.D. in experimental nuclear physics from Old Dominion University in 2017, studying a specialized particle detector system. Lee is now a Research Staff Member at the Institute for Defense Analyses in the cyber operational testing group, where he focuses mainly on Naval and Land Warfare platform-level cyber survivability testing. Lee has also helped to build one of IDA’s cyber lab environments that IDA staff members, DOT&E staff, and the OT community can use to better understand cyber survivability test and evaluation.

    Brian Conway holds a B.S. from the University of Notre Dame and a Ph.D. from Pennsylvania State University, where he studied solvation dynamics in ionic liquid mixtures with conventional solvents.  He joined the Institute for Defense Analyses in 2019 and has since supported operational testing in both the Departments of Defense and Homeland Security.  There, he focuses on cyber testing of Naval and Net-Centric and Space Systems to evaluate whether adversaries have the ability to exploit those systems and affect the missions executed by warfighters.

  • Plotting and Programming in Python

    Abstract:

    Plotting and Programming in Python is an introductory Python lesson offered by Software Carpentry.  This workshop covers data analysis and visualization in Python, focusing on working with core data structures (including tabular data), using conditionals and loops, writing custom functions, and creating customized plots.  This workshop also introduces learners to JupyterLab and strategies for getting help.  This workshop is appropriate for learners with no previous programming experience.
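
    A small taste of the kind of material covered is sketched below (illustrative only, not the lesson's own dataset): tabular data in pandas, a loop, a custom function, and a customized matplotlib plot.

    ```python
    # Illustrative workshop-style example: tabular data, a loop, a function, and a plot.
    import pandas as pd
    import matplotlib.pyplot as plt

    data = pd.DataFrame({
        "year": [2000, 2005, 2010, 2015, 2020],
        "site_a": [3.1, 3.4, 3.9, 4.6, 5.2],
        "site_b": [2.8, 2.9, 3.3, 3.5, 4.1],
    })

    def percent_change(series):
        """Percent change from the first to the last observation."""
        return 100 * (series.iloc[-1] - series.iloc[0]) / series.iloc[0]

    for column in ["site_a", "site_b"]:
        print(column, f"{percent_change(data[column]):.1f}% change")
        plt.plot(data["year"], data[column], marker="o", label=column)

    plt.xlabel("year")
    plt.ylabel("measurement")
    plt.legend()
    plt.title("Customized plot built in a loop")
    plt.show()
    ```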

    Speaker Info:

    Elif Dede Yildirim

    The Carpentries

    Elif Dede Yildirim is a data scientist within the Office of Data and Analytics at the NIH All of Us Research Program. She leads the data quality workstream and supports demo and driver projects. She holds M.S. degrees in Statistics and Child Development and a PhD in Child Development from Syracuse University. She completed her postdoctoral work at the University of Missouri-Columbia and held a faculty appointment at Auburn University, where she taught graduate-level methods and statistics courses and provided statistical consulting. She is currently pursuing her second undergraduate degree in Computer Science at Auburn and plans to graduate in December 2023.

  • Plotting and Programming in Python

    Abstract:

    Plotting and Programming in Python is an introductory Python lesson offered by Software Carpentry.  This workshop covers data analysis and visualization in Python, focusing on working with core data structures (including tabular data), using conditionals and loops, writing custom functions, and creating customized plots.  This workshop also introduces learners to JupyterLab and strategies for getting help.  This workshop is appropriate for learners with no previous programming experience.

    Speaker Info:

    Chasz Griego

    The Carpentries

    Chasz Griego is an Open Science Postdoctoral Associate at the Carnegie Mellon University (CMU) Libraries. He received a PhD in Chemical Engineering from the University of Pittsburgh, studying computational models to accelerate catalyst material discovery. He leads and supports Open Science teaching and research initiatives, particularly in the area of reproducibility in computational research, and investigates how open tools help promote reproducibility in computational work. He supports students and researchers at CMU with Python programming for data science applications, literate programming with Jupyter Notebooks, and version control with Git/GitHub.

  • Post-hoc UQ of Deep Learning Models Applied to Remote Sensing Image Scene Classification

    Abstract:

    Post-hoc Uncertainty Quantification of Deep Learning Models Applied to Remote Sensing Image Scene Classification

    Steadily growing quantities of high-resolution UAV, aerial, and satellite imagery provide an exciting opportunity for global transparency and geographic profiling of activities of interest. Advances in deep learning, such as deep convolutional neural networks (CNNs) and transformer models, offer more efficient ways to exploit remote sensing imagery. Transformers, in particular, are capable of capturing contextual dependencies in the data. Accounting for context is important because activities of interest are often interdependent and reveal themselves in the co-occurrence of related image objects or related signatures. However, while transformers and CNNs are powerful models, their predictions are often taken as point estimates, also known as pseudo-probabilities, as they are computed by the softmax function. They do not provide information about how confident the model is in its predictions, which is important in many mission-critical applications; this limits their use in this space.

    Model evaluation metrics can provide information about a predictive model’s performance. We present and discuss results of post-hoc uncertainty quantification (UQ) of deep learning models, i.e., UQ applied to trained models. We consider an application of CNN and transformer models to remote sensing image scene classification using satellite imagery, and we compare confidence estimates of the scene classification predictions of these models using evaluation metrics such as expected calibration error, reliability diagrams, and the Brier score, in addition to conventional metrics such as accuracy and F1 score. For validation, we use the publicly available and well-characterized Remote Sensing Image Scene Classification (RESISC45) dataset, which contains 31,500 images covering 45 scene categories, with 700 images in each category and spatial resolution varying from 30 m to 0.2 m per pixel. The dataset was collected over different locations and under different conditions and possesses rich variations in translation, viewpoint, object pose and appearance, spatial resolution, illumination, background, and occlusion.
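
    Two of the post-hoc metrics named above can be computed in a few lines of numpy, as sketched below; the softmax outputs and labels here are random placeholders rather than model predictions on RESISC45.

    ```python
    # Post-hoc calibration metrics sketch (illustrative inputs): multiclass Brier score and
    # expected calibration error (ECE) computed from a model's softmax outputs.
    import numpy as np

    rng = np.random.default_rng(5)
    n, k = 1000, 45                          # e.g., 45 scene classes as in RESISC45

    logits = rng.standard_normal((n, k))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # placeholder softmax outputs
    labels = rng.integers(0, k, size=n)                                  # placeholder true labels

    one_hot = np.eye(k)[labels]
    brier = np.mean(np.sum((probs - one_hot) ** 2, axis=1))              # multiclass Brier score

    conf = probs.max(axis=1)                 # confidence = max softmax probability
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)

    bins = np.linspace(0.0, 1.0, 11)         # 10 equal-width confidence bins
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())

    print(f"Brier score: {brier:.3f}   ECE: {ece:.3f}")
    ```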

    Speaker Info:

    Alexei Skurikhin

    Scientist

    Los Alamos National Laboratory

    Alexei Skurikhin is a scientist with Remote Sensing and Data Science group at Los Alamos National Laboratory (LANL). He holds a Ph.D. in Computer Science and has been working at LANL since 1997 in the areas of signal and image analysis, evolutionary computations, computer vision, machine learning, and remote sensing applications.

  • Predicting Aircraft Load Capacity Using Regional Climate Data

    Abstract:

    While the impact of local weather conditions on aircraft performance is well-documented, climate change has the potential to create long-term shifts in aircraft performance. Using just one metric, internal load capacity, we document operationally relevant performance changes for a UH-60L within the Indo-Pacific region. This presentation uses publicly available climate and aircraft performance data to create a representative analysis. The underlying methodology can be applied at varying geographic resolutions, timescales, airframes, and aircraft performance characteristics across the entire globe.

    Speaker Info:

    Abraham Holland

    Research Staff Member

    IDA

    Dr. Abraham Holland joined the Institute for Defense Analyses (IDA) in 2019 after completing his PhD in Public Policy at Harvard University. At IDA, Dr. Holland is an applied microeconomist who has led a range of analyses across defense manpower, operations, and infrastructure topics. He is also a founding member of IDA’s climate and energy security working group, a team of researchers focused on supporting IDA’s capability to bring the best available climate science to today’s national security challenges. In this area, he has completed analyses on the potential impact of climate change on Department of Defense equipment, personnel, and operations. A U.S. Air Force veteran, he received his undergraduate degree from Dartmouth College, graduating summa cum laude in economics and Chinese literature.

  • Predicting Success and Identifying Key Characteristics in Special Forces Selection

    Abstract:

    The United States Military possesses special forces units that are entrusted to engage in the most challenging and dangerous missions essential to fighting and winning the nation's wars. Entry into special forces is based on a series of assessments called Special Forces Assessment and Selection (SFAS), which consists of numerous challenges that test a soldier's mental toughness, physical fitness, and intelligence. Using logistic regression, random forest classification, and neural network classification, the researchers in this study aim to create a model that accurately predicts whether a candidate passes SFAS and to identify which variables are significant indicators of passing selection. Logistic regression proved to be the most accurate model, while also highlighting physical fitness, military experience, and intellect as the most significant indicators associated with success.
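
    As a minimal sketch of this kind of model comparison, using scikit-learn on synthetic stand-in data (the SFAS data and candidate variables are not public, so the features below are hypothetical):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical stand-in for candidate data (fitness scores, experience, test scores, ...)
    X, y = make_classification(n_samples=2000, n_features=12, n_informative=6, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
        "neural network": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(f"{name}: {acc:.3f}")

    # Coefficient magnitudes from the logistic model hint at which (roughly standardized) variables matter most
    logit = LogisticRegression(max_iter=1000).fit(X, y)
    print(np.argsort(-np.abs(logit.coef_[0]))[:3])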

    Speaker Info:

    Mark Bobinski

    Cadet

    United States Military Academy

    I am currently a senior at the United States Military Academy at West Point. My major is Applied Statistics and Data Science and I come from Cleveland, Ohio. This past summer I had the opportunity to work with the Army's Special Warfare Center and School as an intern where we began the work on this project.  I thoroughly enjoy mathematical modeling and look to begin a career in data science upon retiring from the military.

  • Present Your Science

    Abstract:

    This comprehensive 1-day course equips scientists, engineers, researchers, and technical professionals to present their science in an understandable, memorable, and persuasive way.  Through a dynamic combination of lecture, discussion, exercises, and video analysis, each participant will walk away with the skills, knowledge, and practice necessary to transform the way their work is presented. Five course objectives are covered:

    1. Transform the scientific presentation skills of participants. Enable participants to utilize effective strategies for content, structure, slide design, and delivery of scientific presentations.
    2. Teach participants to analyze and adapt to their audience.
    3. Help participants understand which scientific details to emphasize in their presentation and which details to filter out.
    4. Equip participants to understand and enact the assertion evidence slide design in their own talks to make their scientific presentation slides more understandable, memorable, and engaging.
    5. Assist participants in developing an engaging and confident delivery style.

    *** Attendees should bring a laptop with them to the session. ***

    Speaker Info:

    Melissa Marshall

    Founder

    Present Your Science

    Melissa Marshall is the leading expert on presenting complex ideas. She is on a mission: to transform how scientists, engineers, and technical professionals present their work. That’s because she believes that even the best science is destined to remain undiscovered unless it’s presented in a clear and compelling way that sparks innovation and drives adoption. For a decade, she’s traveled around the world to work with Fortune 100 corporations, institutions, and universities, teaching the proven strategies she’s mastered through her consulting work and during her decade as a faculty member at Penn State University. From 2019 through 2022, Microsoft named her a Most Valuable Professional (MVP) for her work in transforming the way the scientific community uses PowerPoint to convey research. Melissa has also authored a new online course on LinkedIn Learning. Melissa’s workshops are lively, practical, and transformational. For a sneak peek, check out her TED Talk, “Talk Nerdy to Me.” It’s been watched by over 2.5 million people (and counting).

  • Recommendations for Statistical Analysis of Modeling and Simulation Environment Outputs

    Abstract:

    Modeling and simulation (M&S) environments feature frequently in test and evaluation (T&E) of Department of Defense (DoD) systems. Testers may generate outputs from M&S environments more easily than they can collect live test data, but M&S outputs nevertheless take time and money to generate, require trained personnel, and are accessed directly by only a select group of individuals. Even so, many M&S environments do not suffer from many of the resourcing limitations associated with live testing. We thus recommend that testers apply higher-resolution output generation and analysis techniques than those used for collecting live test data. Doing so will maximize stakeholders’ understanding of an M&S environment’s behavior and help utilize its outputs for activities including M&S verification, validation, and accreditation (VV&A), live test planning, and providing information for non-T&E activities.

    This presentation provides recommendations for collecting outputs from M&S environments such that a higher-resolution analysis can be achieved. Space-filling designs (SFDs) are experimental designs intended to fill the operational space for which M&S predictions are expected. These designs can be coupled with statistical metamodeling techniques that estimate a model that flexibly interpolates or predicts M&S outputs and their distributions at both observed settings and unobserved regions of the operational space. Analysts can use the resulting metamodels as surrogates for M&S outputs in situations where the M&S environment cannot be deployed. They can also study metamodel properties to decide whether an M&S environment adequately represents the original system. IDA has published papers recommending specific space-filling design and metamodeling techniques; this presentation briefly covers the content of those papers.
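
    As a minimal sketch of this workflow, assuming a Latin hypercube design and a Gaussian process metamodel stand in for the specific SFD and metamodel choices in the IDA papers, and a toy function stands in for the M&S environment:

    import numpy as np
    from scipy.stats import qmc
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Space-filling design: Latin hypercube over a 3-factor operational space
    sampler = qmc.LatinHypercube(d=3, seed=1)
    X = qmc.scale(sampler.random(n=60), l_bounds=[0, 0, 0], u_bounds=[1, 10, 5])

    # Stand-in for the M&S environment: any (possibly noisy) output generator
    def run_simulation(x):
        return np.sin(x[0] * 6) + 0.3 * x[1] - 0.1 * x[2] ** 2 + np.random.normal(0, 0.05)

    y = np.array([run_simulation(x) for x in X])

    # Metamodel: Gaussian process surrogate that predicts outputs (with uncertainty) anywhere in the space
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=[1.0, 3.0, 2.0]) + WhiteKernel(), normalize_y=True)
    gp.fit(X, y)
    mean, std = gp.predict(qmc.scale(sampler.random(n=5), [0, 0, 0], [1, 10, 5]), return_std=True)
    print(mean, std)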

    Speaker Info:

    Curtis Miller

    Research Staff Member

    IDA

    Dr. Curtis Miller is a research staff member of the Operational Evaluation Division at the Institute for Defense Analyses. In that role, he advises analysts on effective use of statistical techniques, especially pertaining to modeling and simulation activities and U.S. Navy operational test and evaluation efforts, for the division's primary sponsor, the Director of Operational Test and Evaluation. He obtained a PhD in mathematics from the University of Utah and has several publications on statistical methods and computational data analysis, including an R package, CPAT. In the past, he has done research on topics in economics, including estimating differences in pay between male and female workers in the state of Utah on behalf of Voices for Utah Children, an advocacy group.

  • Reinforcement Learning Approaches to the T&E of AI/ML-based Systems Under Test

    Abstract:

    Designed experiments provide an efficient way to sample the complex interplay of essential factors and conditions during operational testing. Analysis of these designs provides more detailed and rigorous insight into the system under test’s (SUT) performance than top-level summary metrics provide. The introduction of artificial intelligence and machine learning (AI/ML) capabilities in SUTs creates a challenge for test and evaluation because the factors and conditions that constitute the AI SUT's “feature space” are more complex than those of a mechanical SUT. Executing the equivalent of a full-factorial design quickly becomes infeasible.

    This presentation will demonstrate an approach to efficient, yet rigorous, exploration of the AI/ML-based SUT’s feature space that achieves many of the benefits of a traditional design of experiments, allowing more operationally meaningful insight into the strengths and limitations of the SUT than top-level AI summary metrics (like ‘accuracy’) provide. The approach uses an algorithmically defined search method within a reinforcement learning-style test harness for AI/ML SUTs. An adversarial AI (or AI critic) efficiently traverses the feature space and maps the resulting performance of the AI/ML SUT. The process identifies interesting areas of performance that would not otherwise be apparent in a roll-up metric. Identifying 'toxic performance regions', in which combinations of factors and conditions result in poor model performance, provides critical operational insights for both testers and evaluators. The process also enables T&E to explore the SUT's sensitivity and robustness to changes in inputs and the boundaries of the SUT's performance envelope. Feedback from the critic can be used by developers to improve the AI/ML SUT and by evaluators to interpret performance in terms of effectiveness, suitability, and survivability. This procedure can be used for white box, grey box, and black box testing.
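
    As a highly simplified sketch of the idea, with a toy scoring function standing in for the AI/ML SUT and a simple perturb-and-keep search standing in for the reinforcement-learning critic described above:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for the AI/ML system under test: maps a point in feature space to a performance score in [0, 1]
    def sut_performance(x):
        return 1.0 - np.exp(-np.sum((x - np.array([0.8, 0.2, 0.5])) ** 2))

    # Critic: perturb-and-keep search that seeks regions of LOW performance ("toxic regions")
    def critic_search(n_steps=500, step=0.05):
        x = rng.random(3)
        worst = [(sut_performance(x), x.copy())]
        for _ in range(n_steps):
            candidate = np.clip(x + rng.normal(0, step, size=3), 0, 1)
            if sut_performance(candidate) < sut_performance(x):   # "reward" = finding worse performance
                x = candidate
                worst.append((sut_performance(x), x.copy()))
        return worst

    found = critic_search()
    print("lowest performance found:", found[-1])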

    Speaker Info:

    Karen O'Brien

    Principal Data Scientist

    Modern Technology Solutions, Inc

    Karen O'Brien has 20 years of service as a Department of the Army Civilian. She has worked as a physical scientist and ORSA in a wide range of mission areas, from ballistics to logistics, and from S&T to T&E. She was a physics and chemistry nerd as an undergrad but uses her Master's in Predictive Analytics from Northwestern to support DoD agencies in developing artificial intelligence, machine learning, and advanced analytics capabilities. She is currently a principal data scientist at Modern Technology Solutions, Inc.

  • Saving hardware, labor, and time using Bayesian adaptive design of experiments

    Abstract:

    Physical testing in the national security enterprise is often costly. Sometimes this is driven by hardware and labor costs; other times it is driven by finite resources of time or hardware builds. Test engineers must make the most of their available resources to answer high-consequence questions. Bayesian adaptive design of experiments (BADE) is one tool that should be in an engineer’s toolbox for designing and running experiments. BADE is a sequential design of experiments approach that allows early stopping decisions to be made in real time using predictive probabilities (PP), allowing for more efficient data collection. BADE has seen successes in clinical trials, another high-consequence arena, where it has resulted in quicker and more effective assessments of drug trials. BADE has been proposed for testing in the national security space for similar reasons of quicker and cheaper test series. Given the high-consequence nature of the tests performed in the national security space, a strong understanding of new methods is required before they are deployed.

    The main contribution of this research is to assess the robustness of PP in BADE under different modeling assumptions, and to compare PP results to those of its frequentist alternative, conditional power (CP). Comparisons are made based on Type I error rates, statistical power, and time savings through average stopping time. Simulation results show PP has some robustness to distributional assumptions. PP also tends to control Type I error rates better than CP, while maintaining relatively strong power. While CP usually recommends stopping a test earlier than PP, CP also tends to have more inconsistent results, again showing the benefits of PP in a high-consequence application.

    An application to a real problem from Sandia National Laboratories shows the large potential cost savings from using PP. The results of this study suggest BADE can serve as one piece of an evidence package for deciding to stop testing early and pivot, in order to decrease costs and increase flexibility.
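
    The abstract does not specify the underlying model; as a minimal Beta-Binomial sketch of a predictive probability (PP) calculation of the kind used for BADE early-stopping decisions, with illustrative priors, success criterion, and test sizes:

    import numpy as np
    from scipy import stats

    def predictive_probability(successes, n_observed, n_total, p0=0.9, alpha0=1, beta0=1, n_draws=5000):
        """P(final posterior concludes reliability > p0 | data so far), via Monte Carlo over future outcomes."""
        rng = np.random.default_rng(0)
        n_remaining = n_total - n_observed
        # Posterior on the success probability given the data observed so far
        post = stats.beta(alpha0 + successes, beta0 + n_observed - successes)
        wins = 0
        for p in post.rvs(n_draws, random_state=rng):
            future = rng.binomial(n_remaining, p)
            final = stats.beta(alpha0 + successes + future, beta0 + n_total - successes - future)
            # "Success" at end of test: posterior probability that reliability exceeds p0 is at least 0.95
            wins += final.sf(p0) >= 0.95
        return wins / n_draws

    # 28 successes in 30 trials observed, 20 trials remaining in the planned test
    print(predictive_probability(successes=28, n_observed=30, n_total=50))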

    Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

    Speaker Info:

    Daniel Ries

    Senior Member of the Technical Staff

    Sandia National Laboratories

    Daniel Ries is a Senior Member of the Technical Staff at Sandia National Laboratories, where he has been since 2018. His roles at Sandia include statistical test engineer, technical researcher, project manager, and a short stint as acting manager of the Statistical Sciences Department. His work includes developing explainable AI to solve national security problems, applying Bayesian methods to include uncertainty quantification in solutions, and providing test and analysis support to weapon modernization programs. Daniel also serves as an Adjunct Professor at the University of Illinois Urbana-Champaign, where he instructs and mentors statistics majors interested in pursuing a data science career in the national security enterprise. Daniel received his PhD in statistics from Iowa State University in 2017.

  • Seamlessly Integrated Materials Labs at AFRL

    Abstract:

    One of the challenges to conducting research in the Air Force Research Laboratory is that many of our equipment controllers cannot be directly connected to our internal networks, due to older or specialized operating systems and the need for administrative privileges for proper functioning. This means that the current data collection process is often highly manual, with users documenting experiments in physical notebooks and transferring data via CDs or portable hard drives to connected systems for sharing or further processing. In the Materials & Manufacturing Directorate, we have developed a unique approach to seamlessly integrate our labs for more efficient data collection and transfer, specifically designed to help users ensure that data is findable for future reuse. In this talk, we will highlight our two enabling tools: NORMS, which helps users easily generate metadata for direct association with data collected in the lab, eliminating physical notebooks; and Spike, which automates one-way data transfer from isolated systems to databases mirrored on other networks. In these databases, metadata can be used for complex search queries, and data is automatically shared with project members without requiring additional transfers. The impact of this solution has been significantly faster data availability (including searchability) for all project members: a transfer and scanning process that used to take 3 hours now takes a few minutes. Future use cases will also enable Spike to transfer data directly into cloud buckets for in situ analysis, which would streamline collaboration with partners.

    Speaker Info:

    Lauren Ferguson

    Digital Transformation Lead, Materials & Manufacturing Directorate

    Air Force Research Laboratory

    Dr. Lauren Ferguson is the Digital Transformation Lead in the Materials & Manufacturing Directorate of the Air Force Research Laboratory in Dayton, OH. She earned her PhD in mathematics from Texas A&M University where she became interested in mathematical applications to materials science problems through an NSF fellowship. She spent eight years developing state-of-the-art simulation tools for composite materials that accurately model post-processing material state, capture complex damage patterns due to service loads and environments, and predict remaining life. For the last two years, she has pivoted to driving digital transformation efforts at AFRL, including facilitating pilot projects to seamlessly integrate labs for streamlined data collection and analysis, and to make Google Workspace and Cloud tools available to foster collaboration with global partners.

  • Skyborg Data Pipeline

    Abstract:

    The purpose of the Skyborg Data Pipeline is to allow for the rapid turnover of flight data collected during a test event, using collaborative, easily accessed tool sets available in the AFOTEC Data Vault. Ultimately, the goal of this data pipeline is to provide a working, up-to-date dashboard that leadership can use shortly after a test event.

    Speaker Info:

    Alexander Malburg

    Data Analyst

    AFOTEC/EX

    AFOTEC/EX Data Analyst

  • STAR: A Cloud-based Innovative Tool for Software Quality Analysis

    Abstract:

    Traditionally, subject matter experts perform software quality analysis using custom spreadsheets which produce inconsistent output and are challenging to share and maintain across teams. This talk will introduce and demonstrate STAR – a cloud-based, data-driven tool for software quality analysis. The tool is aimed at practitioners who manage software quality and make decisions based on its readiness for delivery. Being web-based and fully automated allows teams to collaborate on software quality analysis across multiple projects and releases. STAR is an integration of SaaS and automated analytics. It is a digital engineering tool for software quality practice.

    To use the tool, all users need to do is upload their defect and (optionally) development effort data and set a couple of planned release milestones, such as the test start date and delivery dates for customer trial and deployment. The provided data is then automatically processed and aggregated into a defect growth curve. The core innovation of STAR is its set of statistically sound algorithms that are used to fit a defect prediction curve to the provided data. This is achieved through the automated identification of inflection points in the original defect data and their use in generating piece-wise exponential models that make up the final prediction curve. Moreover, during the early stages of software development, when no defect data is available, STAR can use the development effort plan and learn from previous software releases' defect and effort data to make predictions for the current release. Finally, the tool implements a range of what-if scenarios that enable practitioners to evaluate several potential actions to correct course.
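
    STAR's piece-wise algorithm is not reproduced here, but as a rough sketch of the general idea of fitting a single exponential-saturation (Goel-Okumoto-style) curve to cumulative defect counts, using SciPy and invented data:

    import numpy as np
    from scipy.optimize import curve_fit

    # Goel-Okumoto mean value function: expected cumulative defects by week t
    def goel_okumoto(t, a, b):
        return a * (1.0 - np.exp(-b * t))

    # Illustrative cumulative defect counts by test week (not real project data)
    weeks = np.arange(1, 16)
    defects = np.array([5, 12, 22, 30, 41, 50, 57, 63, 69, 73, 76, 79, 81, 82, 83])

    (a_hat, b_hat), _ = curve_fit(goel_okumoto, weeks, defects, p0=[100, 0.1])
    print(f"predicted total defects: {a_hat:.0f}, detection rate: {b_hat:.3f}")

    # Forecast residual defects remaining at a planned delivery milestone (week 20)
    remaining = a_hat - goel_okumoto(20, a_hat, b_hat)
    print(f"expected defects still latent at week 20: {remaining:.1f}")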

    Thanks to the use of an earlier version of STAR by a large software development group at Nokia and the current trial collaboration with NASA, the features and accuracy of the tool have improved to outperform traditional single-curve fitting. In particular, the defect prediction is stable several weeks before the planned software release, and the multiple metrics provided by the tool make the analysis of software quality straightforward, guiding users in making an intelligent decision regarding readiness for high-quality software delivery.

    Speaker Info:

    Kazu Okumoto

    CEO

    Sakura Software Solutions (3S) LLC

    Kazu is a well-recognized pioneer in software reliability engineering. He invented a world-famous statistical model for software reliability (the “Goel-Okumoto model”). After retiring from Nokia Bell Labs in 2020 as a Distinguished Member of the Technical Staff, Dr. Okumoto founded Sakura Software Solutions, which has developed a cloud-based innovative tool, STAR, for software quality assurance. His co-authored book on software reliability is the most frequently referenced in this field, and he has an impressive list of book chapters, keynote addresses, and numerous technical papers to his credit. Since joining Bell Labs in 1980, Kazu has worked on many exciting projects for the original AT&T, Lucent, Alcatel-Lucent, and Nokia. He has 13 years of management experience, including serving as a Bell Labs Field Representative in Japan. He completed his Ph.D. (1979) and MS (1976) at Syracuse University and BS (1974) at Hiroshima University. He was an Assistant Professor at Rutgers University.

  • Systems Engineering Applications of UQ in Space Mission Formulation

    Abstract:

    In space mission formulation, it is critical to link the scientific phenomenology under investigation directly to the spacecraft design, mission design, and the concept of operations. With many missions of discovery, the large uncertainty in the science phenomenology and the operating environment necessitates mission architecture solutions that are robust and resilient to these unknowns, in order to maximize the probability of achieving the mission objectives. Feasible mission architectures are assessed against performance, cost, and risk, in the context of large uncertainties. For example, despite Cassini observations of Enceladus, significant uncertainties exist in the moon’s jet properties and the surrounding Enceladus environment. Orbilander or any other mission to Enceladus will need to quantify or bound these uncertainties in order to formulate a viable design and operations trade space that addresses a range of mission objectives within the imposed technical and programmatic constraints. Uncertainty quantification (UQ) utilizes a portfolio of stochastic, data science, and mathematical methods to characterize uncertainty of a system and inform risk and decision-making. This discussion will focus on a formulation of a UQ workflow and an example of an Enceladus mission development use case.

    Speaker Info:

    Kelli McCoy

    Senior Systems Engineer

    NASA Jet Propulsion Laboratory, University of Southern California

    Kelli McCoy began her career at NASA Kennedy Space Center as an Industrial Engineer in the Launch Services Program, following her graduation from Georgia Tech with an M.S. in Industrial and Systems Engineering. She went on to obtain an M.S. in Applied Math and Statistics at Georgetown University, and subsequently developed probability models to estimate cost and schedule during her tenure with the Office of Evaluation at NASA Headquarters. Now at the Jet Propulsion Laboratory, she has found applicability for math and probability models in an engineering environment. She further developed that skillset as the Lead of the Europa Clipper Project Systems Engineering Analysis Team, where she and her team produced three probabilistic risk assessments for the mission using their model-based SE environment. She is currently the modeling lead for a JPL New Frontiers proposal and is a member of JPL's Quantification of Uncertainty Across Disciplines (QUAD) team, which is promoting uncertainty quantification practices across JPL. In parallel, Kelli is pursuing a PhD in UQ at the University of Southern California.

  • Systems Engineering Applications of UQ in Space Mission Formulation

    Abstract:

    In space mission formulation, it is critical to link the scientific phenomenology under investigation directly to the spacecraft design, mission design, and the concept of operations. With many missions of discovery, the large uncertainty in the science phenomenology and the operating environment necessitates mission architecture solutions that are robust and resilient to these unknowns, in order to maximize the probability of achieving the mission objectives. Feasible mission architectures are assessed against performance, cost, and risk, in the context of large uncertainties. For example, despite Cassini observations of Enceladus, significant uncertainties exist in the moon’s jet properties and the surrounding Enceladus environment. Orbilander or any other mission to Enceladus will need to quantify or bound these uncertainties in order to formulate a viable design and operations trade space that addresses a range of mission objectives within the imposed technical and programmatic constraints. Uncertainty quantification (UQ) utilizes a portfolio of stochastic, data science, and mathematical methods to characterize uncertainty of a system and inform risk and decision-making. This discussion will focus on a formulation of a UQ workflow and an example of an Enceladus mission development use case.

    Speaker Info:

    Roger Ghanem

    Senior Systems Engineer

    NASA Jet Propulsion Laboratory, University of Southern California

    Co-author Roger Ghanem is a Professor of Civil Engineering at the University of Southern California.

  • T&E as a Continuum

    Abstract:

    A critical change in how Test and Evaluation (T&E) supports capability delivery is needed to maintain our advantage over potential adversaries. Making this change requires a new paradigm in which T&E provides focused and relevant information supporting decision-making continually throughout capability development and informs decision makers from the earliest stages of Mission Engineering (ME) through Operations and Sustainment (O&S). This new approach improves the quality of T&E by moving from a serial set of activities conducted largely independently of Systems Engineering (SE) and ME activities to a new integrative framework focused on a continuum of activities termed T&E as a Continuum.

    T&E as a Continuum has three key attributes – capability and outcome focused testing; an agile, scalable evaluation framework; and enhanced test design – critical in the conduct of T&E and improving capability delivery. T&E as a Continuum builds off the 2018 DoD Digital Engineering (DE) Strategy’s five critical goals with three key enablers – robust live, virtual, and constructive (LVC) testing; developing model-based environments; and a “digital” workforce knowledgeable of the processes and tools associated with MBSE, model-based T&E, and other model-based processes.

    T&E as a Continuum improves the quality of T&E through integration of traditional SE and T&E activities, providing a transdisciplinary, continuous process coupling design evolution with VV&A.

    Speaker Info:

    Orlando Flores

    Chief Engineer, DTE&A

    OUSD(R&E)

    Mr. Orlando Flores is currently the Chief Engineer within the Office of the Executive Director for Developmental Test, Evaluation, and Assessments (DTE&A). He serves as the principal technical advisor to the Executive Director, Deputies, Cybersecurity Technical Director, Systems Engineering and Technical Assistance (SETA) staff, and outside Federally Funded Research and Development Center (FFRDC) technical support for all DT&E and Systems Engineering (SE) matters.
    Prior to this, he served as the Technical Director for Surface Ship Weapons within the Program Executive Office Integrated Warfare Systems (PEO IWS), where he was responsible for providing technical guidance and direction for the Surface Ship Weapons portfolio of surface missiles, launchers, and gun weapon systems.
    From June 2016 to June 2019 he served as the Director for Surface Warfare Weapons for the Deputy Assistant Secretary of the Navy for Ship Programs (DASN Ships), supporting the Assistant Secretary of the Navy for Research, Development and Acquisition. He was responsible for monitoring and advising DASN Ships on all matters related to surface weapons and associated targets.
    Prior to this from August 2009 through June 2016, Mr. Flores was the Deputy Project Manager and Lead Systems Engineer for the Standard Missile-6 Block IA (SM-6 Blk IA) missile program within PEO IWS. In this role he oversaw the management, requirements definition, design, development and fielding of one of the Navy’s newest surface warfare weapons.
    Beginning in 2002 and through 2009, Mr. Flores served in multiple capacities within the Missile Defense Agency (MDA). His responsibilities included functional lead for test and evaluation of the Multiple Kill Vehicle program; Modeling and Simulation development lead; Command and Control, Battle Management, and Communications systems engineer for the Kinetic Energy Interceptors program; and Legislative Liaison for all U.S. House of Appropriations Committee matters. In 2008 he was selected to serve as a foreign affairs analyst for the Deputy Assistant Secretary of Defense for Nuclear and Missile Defense Policy within the Office of the Under Secretary of Defense for Policy where he developed and oversaw policies, strategies, and concepts pertaining to U.S. Ballistic Missile Defense System operations and deployment across the globe.
    Mr. Flores began his federal career in 1998 as an engineering intern for the Naval Sea Systems Command (NAVSEA) within the Department of the Navy. In July 2000 Mr. Flores graduated from the engineering intern program and assumed the position of Battle Management, Command Control, and Communications (BMC3) systems engineer for the NTW program where he led the design and development of ship-based ballistic missile defense BMC3 systems through 2002.
    Mr. Flores graduated from New Mexico State University in 1998, where he earned a Bachelor of Science degree in Mechanical Engineering. He earned a Master of Business Administration in 2003. Mr. Flores is a member of the Department of Defense Acquisition Professional Community and has achieved two Defense Acquisition Workforce Improvement Act certifications: Level III in Program Management and Level II in Systems Planning, Research, Development and Engineering.

  • T&E Landscape for Advanced Autonomy

    Abstract:

    The DoD is making significant investments in the development of autonomous systems, spanning from basic research, at organizations such as DARPA and ONR, to major acquisition programs, such as PEO USC.
    In this talk we will discuss advanced autonomous systems as complex, fully autonomous systems and systems of systems, rather than specific subgenres of autonomous functions, e.g., basic path-planning autonomy or vessel controllers for moving a vessel from point A to point B.
    As a community, we are still trying to understand how to integrate these systems in the field with the warfighter to fully exploit their added capabilities. A major goal of using autonomous systems is to support multi-domain, distributed operations. We have a vision for how this may work, but we do not know when, or if, these systems will be ready to implement that vision. We must identify trends, analyze bottlenecks, and find scalable approaches to fielding these capabilities, such as identifying certification criteria or optimizing methods of testing and evaluating (T&E) autonomous systems.
    Traditional T&E methods are not sufficient for cutting edge autonomy and artificial intelligence (AI). Not only do we have to test the traditional aspects of system performance (speed, endurance, range, etc.) but also the decision-making capabilities that would have previously been performed by humans.
    This complexity increases when an autonomous system changes based on how it is applied in the real world. Each domain, environment, and platform an autonomy runs on presents unique autonomy considerations.
    Complexity is further compounded when we begin to stack these autonomies and integrate them into a fully autonomous system of systems. Currently, there are no standard processes or procedures for testing these nested, complex autonomies; yet there are numerous areas for growth and improvement in this space.
    We will dive into identified capability gaps in Advanced Autonomy T&E that we have recognized and provide approaches for how the DOD may begin to tackle these issues. It is important that we make critical contributions towards testing, trusting and certifying these complex autonomous systems.
    Primary focus areas that are addressed include:
    - Recommending the use of bulk testing through Modeling and Simulation (M&S), while ensuring that the virtual environment is representative of the operational environment.
    - Developing intelligent tests and test selection tools to locate and discriminate areas of interest faster than through traditional Monte-Carlo sampling methods.
    - Building methods for testing black box autonomies faster than real time, and with fewer computational requirements.
    - Providing data analytics that assess autonomous systems in ways that give human decision makers a means for certification.
    - Expanding the concept of what trust means, how to assess and, subsequently, validate trustworthiness of these systems across stakeholders.
    - Testing experimental autonomous systems in a safe and structured manner that encourages rapid fielding and iteration on novel autonomy components.

    Speaker Info:

    Kathryn Lahman

    Program Manager

    Johns Hopkins University Applied Physics Laboratory

    I am the Program Manager for the Advanced Autonomy Test & Evaluation (AAT&E) program within the Sea Control Mission Area of the Johns Hopkins University Applied Physics Laboratory (JHU/APL).
    My primary focus is guiding the DoD towards a full-scope understanding of what Autonomy T&E truly involves, such as:
    - validation of Modeling and Simulation (M&S) environments and models
    - development of test tools and technologies to improve T&E within M&S environments
    - data collection, analysis and visualization to make smarter decisions more easily
    - improvement and streamlining of the T&E process to optimize continual development, test, and fielding
    - understanding and measuring trust of autonomy
    - supporting rapid experimentation and feedback loop from testing via M&S to testing with physical systems

    In my previous life as a Human Systems Engineer (HSE), I developed skills in Usability Engineering, Storyboarding, Technical Writing, Human Computer Interaction, and Information Design. I am a strong information technology professional with a Master of Science (M.S.) focused in Human Centered Computing from the University of Maryland, Baltimore County (UMBC).

    As I moved to JHU/APL I became focused on HSE involving UxVs (Unmanned Vehicles of multiple domains) and autonomous systems. I further moved into managing projects across autonomous system domains with the Navy as my primary sponsor. As my skillsets and understanding of Autonomy and Planning, Test and Evaluation (PT&E) of those systems grew, I applied a consistent human element to my approach for PT&E.

  • Test and Evaluation Methods for Authorship Attribution and Privacy Preservation

    Abstract:

    The aim of the IARPA HIATUS program is to develop explainable systems for authorship attribution and author privacy preservation through the development of feature spaces which encode the distinguishing stylistic characteristics of authors independently of text genre, topic, or format. In this talk, I will discuss progress towards defining an evaluation framework for this task to provide robust insights into system strengths, weaknesses, and overall performance. Our evaluation strategy includes the use of an adversarial framework between attribution and privacy systems, development of a focused set of core metrics, analysis of system performance dependencies on key data factors, systematic exploration of experimental variables to probe targeted questions about system performance, and investigation of key trade-offs between different performance measures.

    Speaker Info:

    Emily Saldanha

    Senior Data Scientist

    Pacific Northwest National Laboratory

    Dr. Emily Saldanha is a research scientist in the Data Science and Analytics group of the National Security Directorate at Pacific Northwest National Laboratory. Her work focuses on developing machine learning, deep learning, and natural language processing methods for diverse applications with the aim to extract information and patterns from complex and multimodal datasets with weak and noisy signals. Her research efforts have spanned application areas ranging from energy technologies to computational social science. She received her Ph.D. in physics from Princeton University in 2016, where her work focused on the development and application of calibration algorithms for microwave sensors for cosmological observations.

  • Test and Evaluation of AI Cyber Defense Systems

    Abstract:

    Adoption of Artificial Intelligence and Machine Learning powered cybersecurity defenses (henceforth, AI defenses) has outpaced testing and evaluation (T&E) capabilities. Industrial and governmental organizations around the United States are employing AI defenses to protect their networks in ever increasing numbers, with the commercial market for AI defenses currently estimated at $15 billion and expected to grow to $130 billion by 2030. This adoption of AI defenses is powered by a shortage of over 500,000 cybersecurity staff in the United States, by a need to expeditiously handle routine cybersecurity incidents with minimal human intervention and at machine speeds, and by a need to protect against highly sophisticated attacks. It is paramount to establish, through empirical testing, trust and understanding of the capabilities and risks associated with employing AI defenses.

    While some academic work exists for performing T&E of individual machine learning models trained using cybersecurity data, we are unaware of any principled method for assessing the capabilities of a given AI defense within an actual network environment. The ability of AI defenses to learn over time poses a significant T&E challenge, above and beyond those faced when considering traditional static cybersecurity defenses. For example, an AI defense may become more (or less) effective at defending against a given cyberattack as it learns over time. Additionally, a sophisticated adversary may attempt to evade the capabilities of an AI defense by obfuscating attacks to maneuver them into its blind spots, by poisoning the training data utilized by the AI defense, or both.

    Our work provides an initial methodology for performing T&E of on-premises network-based AI defenses on an actual network environment, including the use of a network environment with generated user network behavior, automated cyberattack tools to test the capabilities of AI cyber defenses to detect attacks on that network, and tools for modifying attacks to include obfuscation or data poisoning. Discussion will also center on some of the difficulties with performing T&E on an entire system, instead of just an individual model.

    Speaker Info:

    Shing-hon Lau

    Senior Cybersecurity Engineer

    Software Engineering Institute, Carnegie Mellon University

    Shing-hon Lau is a Senior Cybersecurity Engineer at the CERT Division of the Software Engineering Institute at Carnegie Mellon University, where he investigates the intersection between cybersecurity, artificial intelligence, and machine learning.  His research interests include rigorous testing of artificial intelligence systems, building secure and trustworthy machine learning systems, and understanding the linkage between cybersecurity and adversarial machine learning threats.  One research effort concerns the development of a methodology to evaluate the capabilities of AI-powered cybersecurity defensive tools.

    Prior to joining the CERT Division, Lau obtained his PhD in Machine Learning in 2018 from Carnegie Mellon. His doctoral work focused on the application of keystroke dynamics, or the study of keyboard typing rhythms, for authentication, insider-threat detection, and healthcare applications.

  • Test and Evaluation of Systems with Embedded Artificial Intelligence Components

    Abstract:

    As Artificial Intelligence (AI) continues to advance, it is being integrated into more systems. Often, the AI component represents a significant portion of the system that reduces the burden on the end user or significantly improves the performance of a task. The AI component represents an unknown, complex phenomenon that is learned from collected data without being explicitly programmed. Despite the improvement in performance, the models are black boxes. Evaluating the credibility and the vulnerabilities of AI models poses a gap in current test and evaluation practice. For high-consequence applications, the lack of testing and evaluation procedures represents a significant source of uncertainty and risk. To help reduce that risk, we have developed a red-teaming-inspired methodology to evaluate systems embedded with an AI component. This methodology highlights the key expertise and components that are needed beyond what a typical red team generally requires. In contrast to academic evaluation of AI models, we present a system-level evaluation rather than an evaluation of the AI model in isolation. We outline three axes along which to evaluate an AI component:

    1. Evaluating the performance of the AI component to ensure that the model functions as intended and is developed according to best practices established by the AI community. This process entails more than simply evaluating the learned model: because the model operates on data used for training as well as data perceived by the system, peripheral functions such as feature engineering and the data pipeline need to be included.
    2. AI components necessitate supporting infrastructure in deployed systems. The supporting infrastructure may introduce additional vulnerabilities that are overlooked in traditional test and evaluation processes. Further, the AI component may be subverted by modifying key configuration files or data pipeline components.
    3. AI models introduce possible vulnerabilities to adversarial attacks. These could be attacks designed to evade detection by the model, poison the model, steal the model or its data, or misuse the model to act inappropriately.

    Within the methodology, we highlight tools that may be applicable as well as gaps that need to be addressed by the community.

    SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525

    Speaker Info:

    Michael Smith

    Principal Member

    Sandia National Laboratories

    Michael R. Smith is a Principal Member at Sandia National Laboratories. He earned his PhD at Brigham Young University for his work on instance-level metalearning. In his current role, his research focuses on the explainability, credibility, and validation of machine-learned models in high-consequence applications and their effects on decision making.

  • Test and Evaluation Tool for Stealthy Communication

    Abstract:

    Stealthy communication allows the transfer of information while hiding not only the content of that information but also the fact that any hidden information was transferred. One way of doing this is embedding information into network covert channels, e.g., timing between packets, header fields, and so forth. We describe our work on an integrated system for the design, analysis, and testing of such communication. The system consists of two main components: the analytical component, the NExtSteP (NRL Extensible Stealthy Protocols) testbed, and the emulation component, consisting of CORE (Common Open Research Emulator), an existing open source network emulator, and EmDec, a new tool for embedding stealthy traffic in CORE and decoding the result.

    We developed the NExtSteP testbed as a tool to evaluate the performance and stealthiness of embedders and detectors applied to network traffic. NExtSteP includes modules to: generate synthetic traffic data or ingest it from an external source (e.g., emulation or network capture); embed data using an extendible collection of embedding algorithms; classify traffic, using an extendible collection of detectors, as either containing or not containing stealthy communication; and quantify, using multiple metrics, the performance of a detector over multiple traffic samples. This allows us to systematically evaluate the performance of different embedders (and embedder parameters) and detectors against each other.
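
    As a toy illustration of the kind of embedder and detector NExtSteP evaluates (not the NRL implementations), consider hiding bits in inter-packet delays and flagging the resulting shift in the delay distribution:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def embed(delays, bits, bump=0.4):
        """Toy embedder: lengthen the i-th inter-packet delay when bit i is 1."""
        out = delays.copy()
        out[: len(bits)] += bump * np.array(bits) * delays[: len(bits)]
        return out

    def detector(delays, baseline):
        """Toy detector: two-sample KS test against known clean traffic; a small p-value flags a covert channel."""
        return stats.ks_2samp(delays, baseline).pvalue

    clean = rng.exponential(scale=0.1, size=2000)       # synthetic inter-packet delays
    baseline = rng.exponential(scale=0.1, size=2000)    # reference clean traffic
    covert = embed(clean, rng.integers(0, 2, size=2000))

    print("clean traffic p-value:", detector(clean, baseline))
    print("covert traffic p-value:", detector(covert, baseline))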

    Synthetic data are easy to generate with NExtSteP. We use these data for initial experiments to broadly guide parameter selection and to study asymptotic properties that require numerous long traffic sequences to test. The modular structure of NExtSteP allows us to make our experiments increasingly realistic. We have done this in two ways: by ingesting data from captured traffic and then doing embedding, classification, and detector analysis using NExtSteP, and by using EmDec to produce external traffic data with embedded communication and then using NExtSteP to do the classification and detector analysis.

    The emulation component was developed to build and evaluate proof-of-concept stealthy communications over existing IP networks. The CORE environment provides a full network, consisting of multiple nodes, with minimal hardware requirements and allows testing and orchestration of real protocols. Our testing environment allows for replay of real traffic and generation of synthetic traffic using the MGEN (Multi-Generator) network testing tool. The EmDec software was created with the existing NRL-developed protolib (protocol library). EmDec, running on CORE networks and orchestrated using a set of scripts, generates sets of data that are then evaluated for effectiveness by NExtSteP. In addition to enabling evaluation by NExtSteP, the development of EmDec allowed us to discover multiple novel behaviors that were not apparent while using theoretical models.

    We describe the current status of our work, the results so far, and our future plans.

    Speaker Info:

    Olga Chen

    Computer Scientist

    U.S. Naval Research Laboratory

    Dr. Olga Chen has worked as a Computer Scientist at the U.S. Naval Research Laboratory since 1999. For the last three years, she has been the Principal Investigator for the “Stealthy Communications and Situational Awareness” project. Her current research focuses on network protocols and communications, design of security protocols and architectures, and their analysis and verification. She has published peer-reviewed research on approaches to software security and on design and analysis of stealthy communications. She has a Doctorate in Computer Science from the George Washington University.

  • The Application of Semi-Supervised Learning in Image Classification

    Abstract:

    In today's Army, data science is one of the fastest-growing and most important contributors to the effectiveness of our military. One aspect of this field is image classification, which has applications such as target identification. However, one drawback within this field is that when an analyst begins to deal with a multitude of images, it becomes infeasible for an individual to examine all the images and classify them accordingly. My research presents a methodology for image classification that can be used in a military context, utilizing a typical unsupervised classification approach involving K-Means to classify the majority of the images while pairing it with user input to determine the labels of designated images. The user input comes in the form of manual classification of certain images that are deliberately selected for presentation to the user, allowing this individual to select which group each image belongs in and thereby refine the current image clusters. This shows how a semi-supervised approach to image classification can efficiently improve the accuracy of the results when compared to a traditional unsupervised classification approach.
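
    A minimal sketch of this semi-supervised loop, using scikit-learn's digits dataset as a stand-in for imagery: K-Means clusters the images, the "user" labels only the image nearest each cluster center, and every image inherits its cluster's label.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.cluster import KMeans
    from sklearn.metrics import accuracy_score, pairwise_distances_argmin_min

    X, y = load_digits(return_X_y=True)          # stand-in for a set of unlabeled images

    kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

    # Present the image closest to each cluster center to the "user" for manual labeling
    rep_idx, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)
    user_labels = y[rep_idx]                     # simulate the analyst's answers with the true labels

    # Every image inherits the label its cluster's representative received
    pred = user_labels[kmeans.labels_]
    print("accuracy with only 10 manually labeled images:", accuracy_score(y, pred))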

    Speaker Info:

    Elijah Dabkowski

    Cadet

    United States Military Academy

    CDT Elijah Dabkowski is a senior at the United States Military Academy majoring in Applied Statistics and Data Science. He branched Engineers and hopes to pursue a Master of Science in Data Science through a technical scholarship upon graduation. Within the Army, CDT Dabkowski plans to be a combat engineer stationed in either Germany or Italy for the early portion of his career before transitioning to the Operations Research and Systems Analysis career field in order to use his knowledge to help the Army make informed data-driven decisions. His research is centered around the application of semi-supervised learning in image classification to provide a proof-of-concept for the Army in how data science can be integrated with the subject matter expertise of professional analysts to streamline and improve current practices. He enjoys soccer, fishing, and snowboarding and is a member of the club soccer team as well as a snowboard instructor at West Point.

  • The Automaton General-Purpose Data Intelligence Platform

    Abstract:

    The Automaton general-purpose data intelligence platform abstracts data analysis out to a high level and automates many routine analysis tasks while being highly extensible and configurable – enabling complex algorithms to elucidate mission-level effects. Automaton is built primarily on top of the R Project, and its features enable analysts to build charts and tables, calculate aggregate summary statistics, group data, filter data, pass arguments to functions, generate animated geospatial displays for geospatial time series data, flatten time series data into summary attributes, fit regression models, create interactive dashboards, and conduct rigorous statistical tests. All of these extensive analysis capabilities are automated and enabled from an intuitive configuration file requiring no additional software code. Analysts or software engineers can easily extend Automaton to include new algorithms, however. Automaton’s development was started at Johns Hopkins University Applied Physics Laboratory in 2018 to support an ongoing military mission and perform statistically rigorous analyses that use Bayesian-inference-based Artificial Intelligence to elucidate mission-level effects. Automaton has unfettered Government Purpose Rights and is freely available. One of DOT&E’s strategic science and technology thrusts entails automating data analyses for Operational Test & Evaluation as well as developing data analysis techniques and technologies targeting mission-level effects; Automaton will be used, extended, demonstrated/trained on, and freely shared to accomplish these goals and collaborate with others to drive our Department’s shared mission forward. This tutorial will provide an overview of Automaton’s capabilities (first 30 min, for Action Officers and Senior Leaders) as well as instruction on how to install and use the platform (remaining duration for hands-on time with technical practitioners).

     Installation instructions are below and depend upon the user installing Windows Subsystem for Linux or having access to another Unix environment (e.g., macOS):

     Please install WSL V2 on your machines before the tutorial:

    https://learn.microsoft.com/en-us/windows/wsl/install

    Then please download/unzip the Automaton demo environment and place in your home directory:

    https://www.edaptive.com/dataworks/automaton_2023-04-18_dry_run_1.tar

    Then open up powershell from your home directory and type:

    wsl --import automaton_2023-04-18_dry_run_1 automaton_2023-04-18_dry_run_1 automaton_2023-04-18_dry_run_1.tar
    wsl -d automaton_2023-04-18_dry_run_1

    Speaker Info:

    Jeremy Werner

    Chief Scientist

    DOT&E

    Jeremy Werner, PhD, ST was appointed DOT&E’s Chief Scientist in December 2021 after initially starting at DOT&E as an Action Officer for Naval Warfare in August 2021.  Before then, Jeremy was at Johns Hopkins University Applied Physics Laboratory (JHU/APL), where he founded a data science-oriented military operations research team that transformed the analytics of an ongoing military mission.  Jeremy previously served as a Research Staff Member at the Institute for Defense Analyses where he supported DOT&E in the rigorous assessment of a variety of systems/platforms.  Jeremy received a PhD in physics from Princeton University where he was an integral contributor to the Compact Muon Solenoid collaboration in the experimental discovery of the Higgs boson at the Large Hadron Collider at CERN, the European Organization for Nuclear Research in Geneva, Switzerland.  Jeremy is a native Californian and received a bachelor’s degree in physics from the University of California, Los Angeles where he was the recipient of the E. Lee Kinsey Prize (most outstanding graduating senior in physics).

  • The Calculus of Mixed Meal Tolerance Test Trajectories

    Abstract:

    BACKGROUND
    Post-prandial glucose response resulting from a mixed meal tolerance test is evaluated from trajectory data of measured glucose, insulin, C-peptide, GLP-1, and other measurements of insulin sensitivity and β-cell function. In order to compare responses between populations or different compositions of mixed meals, the trajectories are collapsed into the area under the curve (AUC) or incremental area under the curve (iAUC) for statistical analysis. Both AUC and iAUC are coarse distillations of the post-prandial curves, and important properties of the curve structure are lost.

    METHODS
    Visual Basic for Applications (VBA) code was written to automatically extract seven different key calculus-based curve-shape properties of post-prandial trajectories (glucose, insulin, C-peptide, GLP-1) beyond AUC. Through two-sample t-tests, the calculus-based markers were compared between outcomes (reactive hypoglycemia vs. healthy) and against demographic information.

    RESULTS
    Statistically significant differences (p < .01) in multiple curve properties, in addition to AUC, were found between the health outcomes of subjects for each molecule studied, based on the calculus-based properties of their molecular response curves. A model was created that predicts reactive hypoglycemia based on the individual curve properties most associated with outcomes.

    CONCLUSIONS
    Response-curve properties provide predictive power that is not present when using AUC alone. In future studies, the calculus-based response-curve properties will be used to predict diabetes and other health outcomes. In this sense, response-curve properties can indicate an individual's susceptibility to illness prior to its onset using solely mixed meal tolerance test results.
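
    The curve properties above are extracted in VBA; as a rough Python illustration (with invented sample values) of AUC, iAUC, and two calculus-based shape features for a single post-prandial glucose trajectory:

    import numpy as np

    # Illustrative glucose trajectory (mg/dL) sampled at mixed-meal test time points (minutes)
    t = np.array([0, 15, 30, 45, 60, 90, 120, 150, 180])
    g = np.array([88, 110, 145, 160, 150, 120, 95, 80, 85])

    auc = np.trapz(g, t)                               # total area under the curve
    iauc = np.trapz(np.clip(g - g[0], 0, None), t)     # incremental AUC above the fasting baseline

    slope = np.gradient(g, t)                          # first derivative: rate of rise/fall
    curvature = np.gradient(slope, t)                  # second derivative: how sharply the curve bends

    features = {
        "AUC": auc,
        "iAUC": iauc,
        "peak value": g.max(),
        "time to peak": t[g.argmax()],
        "max rise rate": slope.max(),
        "max curvature magnitude": np.abs(curvature).max(),
    }
    print(features)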

    Speaker Info:

    Skyler Chauff

    Cadet

    United States Military Academy

    Skyler Chauff is a third-year student at the United States Military Academy at West Point. He is studying "Operations Research" and hopes to further pursue a career in data science in the Army. His hobbies include scuba-diving, traveling, and tutoring. Skyler is the head of the West Point tutoring program and helps lead the Army Smart nonprofit in providing free tutoring services to enlisted soldiers pursuing higher-level education. Skyler specializes in bioinformatics given his pre-medical background interwoven with his passion for data science.

  • The Component Damage Vector Method: A Statistically Rigorous Method for Validating AJEM

    Abstract:

    As the Test and Evaluation community increasingly relies on Modeling and Simulation (M&S) to supplement live testing, M&S validation has become critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV method compares components that were damaged in FUSL testing to simulated representations of that damage from AJEM.
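
    The abstract does not detail the comparison statistic, so purely as a hypothetical illustration (invented component names and damage values, not the CDV Method's actual metric), comparing an observed FUSL damage vector against simulated AJEM replications might look like:

    import numpy as np

    # Hypothetical binary damage vectors over a vehicle's component list for one shot
    components = ["engine", "fuel pump", "left track", "optics", "radio", "ammo rack"]
    observed  = np.array([1, 0, 1, 1, 0, 0])   # components damaged in the FUSL shot
    simulated = np.array([[1, 0, 1, 0, 0, 0],  # simulated stochastic replications of the same shot
                          [1, 1, 1, 1, 0, 0],
                          [0, 0, 1, 1, 0, 0]])

    # One simple agreement measure: Jaccard similarity between observed and simulated damage sets
    def jaccard(a, b):
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return inter / union if union else 1.0

    scores = [jaccard(observed, sim) for sim in simulated]
    print("per-replication agreement:", np.round(scores, 2), "mean:", np.mean(scores).round(2))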

    Speaker Info:

    Tom Johnson

    Research Staff Member

    IDA

    Tom works on the LFT&E of Army land-based systems. He has three degrees in Aerospace Engineering and specializes in statistics and experimental design, including the validation of modeling and simulation. Tom has been at IDA for 11 years.

  • The Containment Assurance Risk Framework of the Mars Sample Return Program

    Abstract:

    The Mars Sample Return campaign aims to bring rock and atmospheric samples from Mars to Earth through a series of robotic missions. These missions would collect the samples cached and deposited on Martian soil by the Perseverance rover, place them in a container, and launch them into Martian orbit for subsequent capture by an orbiter that would bring them back. Given that there exists a non-zero probability that the samples contain biological material, precautions are being taken to design systems that would break the chain of contact between Mars and Earth. These include techniques such as sterilization of Martian particles, redundant containment vessels, and a robust reentry capsule capable of accurate landings without a parachute.
    Requirements specify that the probability of containment not assured, that is, of releasing Martian-contaminated material into Earth’s biosphere, be less than one in a million. To demonstrate compliance with this strict requirement, a statistical framework was developed to assess the likelihood of containment loss during each sample return phase and make a statement about the total combined mission probability of containment not assured. The work presented here describes this framework, which considers failure modes or fault conditions that can initiate failure sequences ultimately leading to containment not assured. Reliability estimates are generated from databases, design heritage, component specifications, or expert opinion in the form of probability density functions or point estimates and provided as inputs to the mathematical models that simulate the different failure sequences. The probabilistic outputs are then combined following the logic of several fault trees to compute the ultimate probability of containment not assured. Given the multidisciplinary nature of the problem and the different types of mathematical models used, the statistical tools needed for analysis are required to be computationally efficient. While standard Monte Carlo approaches are used for fast models, a multi-fidelity approach to rare event probabilities is proposed for expensive models. In this paradigm, inexpensive low-fidelity models are developed for computational acceleration purposes while the expensive high-fidelity model is kept in the loop to retain accuracy in the results. This work presents an example of end-to-end application of this framework highlighting the computational benefits of a multi-fidelity approach.
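
    As a minimal sketch of the standard Monte Carlo piece described above, the Python fragment below samples uncertain failure-mode probabilities and combines them through a simplified, fault-tree-like logic; every event name, distribution, and number is a hypothetical placeholder rather than an MSR design value.

    # Minimal sketch: sample uncertain failure-mode probabilities and combine
    # them through a toy fault-tree logic. All names and distributions are
    # hypothetical placeholders, not MSR design values.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    # Epistemic uncertainty on each basic-event probability (hypothetical betas)
    p_seal_failure = rng.beta(2, 2_000, n)        # containment vessel seal fails
    p_sterilization_miss = rng.beta(1, 5_000, n)  # particle sterilization ineffective
    p_reentry_breach = rng.beta(1, 10_000, n)     # reentry capsule breached

    # Toy logic: containment not assured requires the seal to fail AND either
    # sterilization to miss OR the capsule to be breached.
    p_cna = p_seal_failure * (p_sterilization_miss
                              + p_reentry_breach
                              - p_sterilization_miss * p_reentry_breach)

    print(f"mean P(containment not assured): {p_cna.mean():.2e}")
    print(f"95th percentile:                 {np.percentile(p_cna, 95):.2e}")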
    The decision to implement Mars Sample Return will not be finalized until NASA’s completion of the National Environmental Policy Act process. This document is being made available for information purposes only.

    Speaker Info:

    Giuseppe Cataldo

    Head, Planetary Protection, MSR CCRS

    NASA

    Giuseppe Cataldo leads the planetary protection efforts of the Mars Sample Return (MSR) Capture, Containment and Return System (CCRS). His expertise is in the design, testing and management of space systems. He has contributed to a variety of NASA missions and projects including the James Webb Space Telescope, where he developed a Bayesian framework for model validation and a multifidelity approach to uncertainty quantification for large-scale, multidisciplinary systems. Giuseppe holds a PhD in Aeronautics and Astronautics from the Massachusetts Institute of Technology (MIT) and several master's degrees from Italy and France.

  • Tools for Assessing Machine Learning Models' Performance in Real-World Settings

    Abstract:

    Machine learning (ML) systems demonstrate powerful predictive capability, but fielding such systems does not come without risk. ML can catastrophically fail in some scenarios, and in the absence of formal methods to validate most ML models, we require alternative methods to increase trust. While emerging techniques for uncertainty quantification and model explainability may seem to lie beyond the scope of many ML projects, they are essential tools for understanding deployment risk. This talk will share a practical workflow, useful tools, and lessons learned for ML development best practices.

    Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2023-11982A

    Speaker Info:

    Carianne Martinez

    Principal Computer Scientist

    Sandia National Laboratories

    Cari Martinez is a Principal Computer Scientist in the Applied Machine Intelligence Department at Sandia National Laboratories. She is a technical lead for a team that focuses on applied deep learning research to benefit Sandia’s mission across a diverse set of science and engineering disciplines. Her research focuses on improving deep learning modeling capabilities with domain knowledge, uncertainty quantification, and explainability techniques. Cari's work has been applied to modeling efforts in several fields such as materials science, engineering science, structural dynamics, chemical engineering, and healthcare.

  • Topological Data Analysis’ involvement in Cyber Security

    Abstract:

    The purpose of this research is to examine the use and application of Topological Data Analysis (TDA) in the realm of Cyber Security. The methods used in this research include an exploration of different Python libraries and C++ Python interfaces to explore the shape of the data involved using TDA. These methods include, but are not limited to, the GUDHI, GIOTTO, and scikit-tda libraries. The project’s results will show where the literal holes in cyber security lie and will offer methods for better analyzing these holes and breaches.
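
    As a minimal sketch of the kind of computation these libraries support, the fragment below uses GUDHI (one of the libraries named above) to build a Vietoris-Rips complex on a point cloud and extract its persistence diagram; the random points stand in for whatever embedded cyber-security features an actual analysis would use.

    # Minimal sketch with GUDHI: build a Vietoris-Rips complex on a point cloud
    # and compute its persistence diagram. The random points below are
    # placeholders for embedded network or traffic features.
    import numpy as np
    import gudhi

    rng = np.random.default_rng(1)
    points = rng.normal(size=(100, 3))   # stand-in for embedded cyber features

    rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
    simplex_tree = rips.create_simplex_tree(max_dimension=2)
    diagram = simplex_tree.persistence()

    # Long-lived 1-dimensional features ("holes") are the candidates of interest
    holes = [(birth, death) for dim, (birth, death) in diagram if dim == 1]
    print(f"number of 1-dimensional features: {len(holes)}")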

    Speaker Info:

    Anthony Cappetta

    Cadet

    United States Military Academy

    Anthony “Tony” Cappetta is a native of Yardley, Pennsylvania, a senior, and an Operations Research major at the United States Military Academy (USMA) at West Point. Upon graduation, Tony will commission in the United States Army as a Field Artillery Officer. Tony serves as the training staff officer of the Scoutmasters’ Council, French Forum, and Center for Enhanced Performance. He was also on the Crew team for three years as a student-athlete. An accomplished pianist, Tony currently serves as the Cadet-in-Charge of the Department of Foreign Language’s Piano and Voice Mentorship program, which he has been a part of since arriving at the Academy. Tony has planned and conducted independent research in the field of statistical concepts at USMA as well as independent studies in complex mathematics (Topology and Number Theory) with Dr. Andrew Yarmola of Princeton University. Currently, his research interest is in the application of Topological Data Analysis to Cyber Security. He is a current semi-finalist in the Fulbright program, through which he hopes to model and map disease transmission in pursuit of eradicating disease.

  • Topological Data Analysis’ involvement in Cyber Security

    Abstract:

    The purpose of this research is to examine the use and application of Topological Data Analysis (TDA) in the realm of Cyber Security. The methods used in this research include an exploration of different Python libraries and C++ Python interfaces to explore the shape of the data involved using TDA. These methods include, but are not limited to, the GUDHI, GIOTTO, and scikit-tda libraries. The project’s results will show where the literal holes in cyber security lie and will offer methods for better analyzing these holes and breaches.

    Speaker Info:

    Elie Alhajjar

    Cadet

    United States Military Academy

  • Towards Scientific Practices for Situation Awareness Evaluation in Operational Testing

    Abstract:

    Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. In the most general terms, SA can be thought of as an individual’s “perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future.”

    While “situational awareness” is a common suitability parameter for systems under test, there is no standardized method or metric for quantifying SA in operational testing (OT). This leads to varied and suboptimal treatments of SA across programs and test events. Current measures of SA are exclusively subjective and paint an inadequate picture. Future advances in system connectedness and mission complexity will exacerbate the problem. We believe that technological improvements will necessitate increases in the complexity of the warfighters’ mission, including changes to team structures (e.g., integrating human teams with human-machine teams), command and control (C2) processes (e.g., expanding C2 frameworks toward joint all-domain C2), and battlespaces (e.g., overcoming integration challenges for multi-domain operations). Operational complexity increases the information needed for warfighters to maintain high SA, and assessing SA will become increasingly important and difficult to accomplish.

    IDA’s Test Science team has proposed a piecewise approach to improve the measurement of situation awareness in operational evaluations. The aim of this presentation is to promote a scientific understanding of what SA is (and is not) and to encourage discussion amongst practitioners tackling this challenging problem. We will briefly introduce Endsley’s Model of SA, review the trade-offs involved in some existing measures of SA, and discuss a selection of potential ways in which SA measurement during OT may be improved.

    Speaker Info:

    Miriam Armstrong

    Research Staff Member

    IDA

    Dr. Armstrong is a human factors researcher at IDA where she is involved in operational testing of defense systems. Her expertise includes interactions between humans and autonomous systems and psychometrics. She received her PhD in Human Factors Psychology from Texas Tech University in 2021. Coauthors Elizabeth Green, Brian Vickers, and Janna Mantua also conduct human subjects research at IDA.

  • Uncertain Text Classification for Proliferation Detection

    Abstract:

    A key global security concern in the nuclear weapons age is the proliferation and development of nuclear weapons technology, and a crucial part of enforcing non-proliferation policy is developing an awareness of the scientific research being pursued by other nations and organizations. Deep, transformer-based text classification models are an important piece of systems designed to monitor scientific research for this purpose. For applications like proliferation detection involving high-stakes decisions, there has been growing interest in ensuring that we can perform well-calibrated, interpretable uncertainty quantification with such classifier models. However, because modern transformer-based text classification models have hundreds of millions of parameters and the computational cost of uncertainty quantification typically scales with the size of the parameter space, it has been difficult to produce computationally tractable uncertainty quantification for these models. We propose a new variational inference framework that is computationally tractable for large models and meets important uncertainty quantification objectives including producing predicted class probabilities that are well-calibrated and reflect our prior conception of how different classes are related.
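
    For illustration only (this is not the proposed variational inference framework), the sketch below computes the expected calibration error, one common check that a classifier's predicted probabilities are well calibrated; the confidences and labels are random placeholders.

    # Generic illustration (not the authors' framework): expected calibration
    # error (ECE), a common check that predicted class probabilities are well
    # calibrated. Inputs here are random placeholders.
    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """confidences: max predicted probability per sample; correct: 0/1 array."""
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece, n = 0.0, len(confidences)
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += (mask.sum() / n) * gap
        return ece

    rng = np.random.default_rng(0)
    conf = rng.uniform(0.5, 1.0, 5000)                      # placeholder confidences
    correct = (rng.uniform(size=5000) < conf).astype(int)   # roughly calibrated labels
    print(f"ECE = {expected_calibration_error(conf, correct):.3f}")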

    Speaker Info:

    Andrew Hollis

    Graduate Student

    North Carolina State University

    Andrew Hollis was born and raised in Los Alamos, New Mexico. He attended the University of New Mexico as a Regents’ Scholar and received his bachelor’s degree in statistics with minors in computer science and mathematics in spring 2018. During his undergraduate years, he also completed four summer internships at Los Alamos National Laboratory in the Principal Associate Directorate for Global Security. He began the PhD program in Statistics at North Carolina State University in August of 2018 and received his Master of Statistics in December of 2020. While at NCSU, he has conducted research in collaboration with the Laboratory for Analytical Sciences, a research lab focused on building analytical tools for the intelligence community, the Consortium for Nonproliferation Enabling Capabilities, and West Point. He has had opportunities to complete two internships with the Department of Defense, including an internship with the Air Force at the Pentagon in the summer of 2022. He plans to graduate with his PhD in May of 2023 and will begin working with the Air Force as an operations research analyst after graduation.

  • Uncertainty Aware Machine Learning for Accelerators

    Abstract:

    Standard deep learning models for classification and regression applications are ideal for capturing complex system dynamics. Unfortunately, their predictions can be arbitrarily inaccurate when the input samples are not similar to the training data. Distance-aware uncertainty estimation can be used to detect these scenarios and provide a level of confidence associated with the predictions. We present results using Deep Gaussian Process Approximation (DGPA) methods for 1) anomaly detection at the Spallation Neutron Source (SNS) accelerator and 2) an uncertainty-aware surrogate model for the Fermi National Accelerator Laboratory (FNAL) Booster Accelerator Complex.
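
    As a generic illustration of distance-aware uncertainty (not the DGPA models used in this work), the sketch below fits a small exact Gaussian process whose predictive standard deviation grows for inputs far from the training data; the data are synthetic.

    # Generic illustration of distance-aware uncertainty (not DGPA): an exact
    # Gaussian process whose predictive standard deviation grows for inputs
    # far from the training data.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X_train = rng.uniform(-2, 2, size=(40, 1))
    y_train = np.sin(2 * X_train[:, 0]) + 0.1 * rng.normal(size=40)

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_train, y_train)

    X_test = np.array([[0.0], [1.5], [6.0]])   # last point is far out of domain
    mean, std = gp.predict(X_test, return_std=True)
    for x, m, s in zip(X_test[:, 0], mean, std):
        print(f"x = {x:4.1f}: prediction = {m:6.2f}, predictive std = {s:.2f}")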

    Speaker Info:

    Malachi Schram

    Head of Data Science Dept.

    Thomas Jefferson National Accelerator Facility

    Dr. Malachi Schram is the head of the Data Science Department at the Thomas Jefferson National Accelerator Facility. His research spans large-scale distributed computing, applications of data science, and the development of new data science techniques and algorithms. His current research is focused on uncertainty quantification for deep learning and new techniques for design and control.

  • Uncertainty Quantification of High Heat Microbial Reduction for NASA Planetary Protection

    Abstract:

    Planetary Protection is the practice of protecting solar system bodies from harmful contamination by Earth life and protecting Earth from possible life forms or bioactive molecules that may be returned from other solar system bodies. Microbiologists and engineers at NASA’s Jet Propulsion Laboratory (JPL) design microbial reduction and sterilization protocols that reduce the number of microorganisms on spacecraft or eliminate them entirely. These protocols are developed using controlled experiments to understand the microbial reduction process. Often, a phenomenological model (such as a series of differential equations) is posited that captures key behaviors and assumptions of the process being studied. A Sterility Assurance Level (SAL) – the probability that a product, after being exposed to a given sterilization process, contains one or more viable organisms – is a standard metric used to assess risk and define cleanliness requirements in industry and for regulatory agencies. Experiments performed to estimate the SAL of a given microbial reduction or sterilization protocol often have large uncertainties and variability in their results, even under rigorously implemented controls; if not properly quantified, these uncertainties can make it difficult for experimenters to interpret their results and can hamper a credible evaluation of risk by decision makers.

    In this talk, we demonstrate how Bayesian statistics and experimentation can be used to quantify uncertainty in phenomenological models in the case of microorganism survival under short-term high heat exposure. We show how this can help stakeholders make better risk-informed decisions and avoid the unwarranted conservatism that is often prescribed when processes are not well understood. The experiment performed for this study employs a 6 kW infrared heater to test survivability of heat resistant Bacillus canaveralius 29669 to temperatures as high as 350 °C for time durations less than 30 sec. The objective of this study was to determine SALs for various time-temperature combinations, with a focus on those time-temperature pairs that give a SAL of 10^-6. Survival ratio experiments were performed that allow estimation of the number of surviving spores and mortality rates characterizing the effect of the heat treatment on the spores. Simpler but less informative fraction-negative experiments that only provide a binary sterile/not-sterile outcome were also performed once a sterilization temperature regime was established from survival ratio experiments. The phenomenological model considered here is a memoryless mortality model that underlies many heat sterilization protocols in use today.
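
    As a sketch of the memoryless (log-linear) mortality model mentioned above, the fragment below computes the expected number of surviving spores and the corresponding SAL for a few exposure times; the bioburden and D-value are hypothetical, not the study's estimates.

    # Sketch of the memoryless (log-linear) mortality model, with hypothetical
    # parameter values (not the study's estimates).
    import numpy as np

    def expected_survivors(n0, t_sec, d_value_sec):
        """Log-linear model: each D-value of exposure reduces the population 10x."""
        return n0 * 10 ** (-t_sec / d_value_sec)

    def sterility_assurance_level(n0, t_sec, d_value_sec):
        """Probability of one or more surviving spores (Poisson assumption)."""
        lam = expected_survivors(n0, t_sec, d_value_sec)
        return 1.0 - np.exp(-lam)

    n0 = 1e6        # initial spore bioburden (hypothetical)
    d_value = 2.0   # seconds per 10x reduction at the exposure temperature (hypothetical)

    for t in (10, 20, 25, 30):
        print(f"t = {t:2d} s: SAL = {sterility_assurance_level(n0, t, d_value):.2e}")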

    This discussion and poster will outline how the experiment and model were brought together to determine SALs for the heat treatment under consideration. Ramifications to current NASA planetary protection sterilization specifications and current missions under development such as Mars Sample Return will be discussed. This presentation/poster is also relevant to experimenters and microbiologists working on military and private medical device applications where risk to human life is determined by sterility assurance of equipment.

    Speaker Info:

    Michael DiNicola

    Systems Engineer

    Jet Propulsion Laboratory, California Institute of Technology

    Michael DiNicola is a senior systems engineer in the Systems Modeling, Analysis & Architectures Group at the Jet Propulsion Laboratory (JPL). At JPL, Michael has worked on several mission concept developments and flight projects, including Europa Clipper, Europa Lander and Mars Sample Return, developing probabilistic models to evaluate key mission requirements, including those related to planetary protection, and infuse this modeling into trades throughout formulation of the mission concepts. He works closely with microbiologists in the Planetary Protection group to model assay and sterilization methods, and applies mathematical and statistical methods to improve Planetary Protection engineering practices at JPL and across NASA. At the same time, he also works with planetary scientists to characterize the plumes of Enceladus in support of future mission concepts. Michael earned his B.S. in Mathematics from the University of California, Los Angeles and M.A. in Mathematics from the University of California, San Diego.

  • User-Friendly Decision Tools

    Abstract:

    Personal experience and anecdotal evidence suggest that presenting analyses to sponsors, especially technical sponsors, is improved by helping the sponsor understand how results were derived. Providing summaries of analytic results is necessary but can be insufficient when the end goal is to help sponsors make firm decisions. When time permits, engaging sponsors with walk-throughs of how results may change given different inputs is particularly salient in helping sponsors make decisions in the context of the bigger picture. Data visualizations and interactive software are common examples of what we call "decision tools" that can walk sponsors through varying inputs and views of the analysis. Given long-term engagement and regular communication with a sponsor, developing user-friendly decision tools is a helpful practice to support sponsors. This talk presents a methodology for building decision tools that combines leading practices in agile development and STEM education. We will use a Python-based app development tool called Streamlit to show implementations of this methodology.
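
    As a minimal sketch of the kind of Streamlit decision tool described (a hypothetical availability trade, not an actual sponsor analysis), the script below exposes a few sliders, recomputes a result, and plots a sensitivity sweep.

    # decision_tool_app.py -- minimal sketch of a Streamlit decision tool
    # (hypothetical availability trade). Run with: streamlit run decision_tool_app.py
    import numpy as np
    import pandas as pd
    import streamlit as st

    st.title("Fleet availability trade study (illustrative)")

    # Let the sponsor vary the analysis inputs interactively
    mtbf = st.slider("Mean time between failures (hours)", 50, 500, 200)
    mttr = st.slider("Mean time to repair (hours)", 1, 48, 12)
    fleet = st.slider("Fleet size", 1, 100, 20)

    availability = mtbf / (mtbf + mttr)
    st.metric("Steady-state availability", f"{availability:.1%}")
    st.metric("Expected aircraft available", f"{availability * fleet:.1f}")

    # Show how availability changes across a sweep of repair times
    sweep = pd.DataFrame({
        "MTTR (hours)": np.arange(1, 49),
        "Availability": mtbf / (mtbf + np.arange(1, 49)),
    }).set_index("MTTR (hours)")
    st.line_chart(sweep)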

    Speaker Info:

    Clifford Bridges

    Research Staff Member

    IDA

    Clifford is formally trained in theoretical mathematics and has additional experience in education, software development, and data science. He has been working for IDA since 2020 and often uses his math and data science skills to support sponsors' needs for easy-to-use analytic capabilities. Prior to starting at IDA, Clifford cofounded a startup company in the fashion technology space and served as Chief Information Officer for the company.

  • Using Multi-Linear Regression to Understand Cloud Properties' Impact on Solar Radiance

    Abstract:

    With solar energy being the most abundant energy source on Earth, it is no surprise that reliance on solar photovoltaics (PV) has grown exponentially in the past decade. The increasing costs of fossil fuels have made solar PV more competitive and renewable energy more attractive, and the International Energy Agency (IEA) forecasts that solar PV's installed power capacity will surpass that of coal by 2027. Crucial to the management of solar PV power is accurate forecasting of solar irradiance, which is heavily impacted by different types and distributions of clouds. Many studies have aimed to develop models that accurately predict the global horizontal irradiance (GHI) while accounting for the volatile effects of clouds; in this study, we aim to develop a statistical model that helps explain the relationship between various cloud properties and the solar radiance reflected by the clouds themselves. Using 2020 GOES-16 data from the GOES-R Series Advanced Baseline Imager (ABI), we investigated the effect that cloud optical depth, cloud top temperature, solar zenith angle, and look zenith angle have on cloud solar radiance while accounting for differing longitudes and latitudes. Using these variables as the explanatory variables, we developed a linear model using multi-linear regression that, when tested on untrained data sets from different days (at the same time of day as the training set), results in a coefficient of determination (R^2) between 0.70 and 0.75. Lastly, after analyzing the variables' degree of contribution to the cloud solar radiance, we present error maps that highlight areas where the model succeeds and fails in prediction accuracy.
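
    The sketch below shows the general form of such a multiple linear regression using statsmodels, with synthetic placeholder data standing in for the GOES-16 ABI products; only the predictor names are taken from the abstract.

    # Sketch of the kind of multiple linear regression described above, using
    # synthetic placeholder data in place of the GOES-16 ABI products.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 2000
    df = pd.DataFrame({
        "cloud_optical_depth": rng.gamma(2.0, 5.0, n),
        "cloud_top_temperature": rng.normal(240, 20, n),   # K
        "solar_zenith_angle": rng.uniform(0, 80, n),       # deg
        "look_zenith_angle": rng.uniform(0, 60, n),        # deg
    })
    # Synthetic response loosely mimicking reflected cloud radiance
    df["cloud_radiance"] = (
        5.0 * np.log1p(df["cloud_optical_depth"])
        - 0.05 * df["cloud_top_temperature"]
        - 0.10 * df["solar_zenith_angle"]
        + rng.normal(0, 2.0, n)
    )

    model = smf.ols(
        "cloud_radiance ~ cloud_optical_depth + cloud_top_temperature"
        " + solar_zenith_angle + look_zenith_angle",
        data=df,
    ).fit()
    print(model.params)
    print(f"R^2 = {model.rsquared:.3f}")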

    Speaker Info:

    Grant Parker

    Cadet

    United States Military Academy

    CDT Grant Parker attends the United States Military Academy and will graduate and commission in May 2023. He is an Applied Statistics and Data Science major and is currently conducting his senior thesis with Lockheed Martin Space. At the academy, he serves as 3rd Regiment's Operations Officer where he is responsible for planning and coordinating all trainings and events for the regiment. After graduation, CDT Parker hopes to attend graduate school and then start his career as a cyber officer in the US Army.

  • Utilizing Side Information alongside Human Demonstrations for Safe Robot Navigation

    Abstract:

    Rather than wait until the test and evaluation stage of a given system to evaluate safety, this talk proposes a technique that explicitly considers safety constraints during the learning process while providing probabilistic guarantees on performance subject to the stochasticity of the operational environment. We provide evidence that such an approach results in an overall safer system than its non-explicit counterparts in the context of wheeled robotic ground systems learning autonomous waypoint navigation from human demonstrations. Specifically, inverse reinforcement learning (IRL) provides a means by which humans can demonstrate desired behaviors for autonomous systems to learn environmental rewards (or, inversely, costs). The proposed presentation addresses two limitations of existing IRL techniques. First, previous algorithms require an excessive amount of data due to the information asymmetry between the expert and the learner. When a demonstrator avoids a state, it is not clear whether the state is sub-optimal or dangerous. The proposed talk explains how safety can be explicitly incorporated in IRL by using task specifications defined using linear temporal logic. Referred to as side information, this approach enables autonomous ground robots to avoid dangerous states both during training and evaluation. Second, previous IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs), where partial observability induces state uncertainty. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. The intrinsic nonconvexity of the underlying problem is managed in a scalable manner through a sequential linear programming scheme that guarantees local convergence. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the safety specifications while inducing behavior similar to the expert's by leveraging the provided side information.

    Speaker Info:

    Christian Ellis

    Journeyman Fellow

    Army Research Laboratory

    Christian Ellis is a PhD student focused on building safe and robust learning algorithms for autonomous ground systems that can operate in environments beyond where they are trained. His research interests include safe inverse reinforcement learning, environmental uncertainty quantification, test and evaluation, and formal verification. His resulting publications provide applied solutions for completing United States Army missions on real robotic hardware. Christian recently received a best student paper award for the paper titled "Software and System Reliability Engineering for Autonomous Systems incorporating Machine Learning". Before research, Christian worked as a lead software engineer at a startup.

  • Validating the Prediction Profiler with Disallowed Combination: A Case Study

    Abstract:

    The prediction profiler is an interactive display in JMP statistical software that allows a user to explore the relationships between multiple factors and responses. A common use case of the profiler is for exploring the predicted model from a designed experiment. For experiments with a constrained design region defined by disallowed combinations, the profiler was recently enhanced to obey such constraints. In this case study, we show how a DOE based approach to validating statistical software was used to validate this enhancement.

    Speaker Info:

    Yeng Saanchi

    Analytic Software Tester

    JMP Statistical Discovery

    Yeng Saanchi is an Analytic Software Tester at JMP Statistical Discovery LLC, a SAS company. Her research interests include stochastic optimization and applications of optimal experimental designs in precision medicine.

  • Well-Calibrated Uncertainty Quantification for Language Models in the Nuclear Domain

    Abstract:

    A key global and national security concern in the nuclear weapons age is the proliferation and development of nuclear weapons technology. A key component of enforcing non-proliferation policy is developing an awareness of the scientific research being pursued by other nations and organizations. To support non-proliferation goals and contribute to nuclear science research, we trained a RoBERTa deep neural language model on a large set of U.S. Department of Energy Office of Scientific and Technical Information (OSTI) research article abstracts and then fine-tuned this model for classification of scientific abstracts into 60 disciplines, which we call NukeLM. This multi-step approach to training improved classification accuracy over its untrained or partially out-of-domain competitors. While it is important for classifiers to be accurate, there has also been growing interest in ensuring that classifiers are well-calibrated with uncertainty quantification that is understandable to human decision-makers. For example, in the multiclass problem, classes with a similar predicted probability should be semantically related. Therefore, we also introduced an extension of the Bayesian belief matching framework proposed by Joo et al. (2020) that easily scales to large NLP models, such as NukeLM, and better achieves the desired uncertainty quantification properties.

    Speaker Info:

    Karl Pazdernik

    Senior Data Scientist

    Pacific Northwest National Laboratory

    Dr. Karl Pazdernik is a Senior Data Scientist within the National Security Directorate at Pacific Northwest National Laboratory (PNNL), a team lead within the Foundational Data Science group at PNNL, and a Research Assistant Professor at North Carolina State University. He is the program lead for the Open-Source Data Analytics program and a principal investigator on projects that involve disease modeling and image segmentation for materials science.

    His research has focused on the uncertainty quantification and dynamic modeling of multi-modal data with a particular interest in text analytics, spatial statistics, pattern recognition, anomaly detection, Bayesian statistics, and computer vision applied to financial data, networks, combined open-source data, disease prediction, and nuclear materials.

    He received a B.A. in Mathematics from Saint John's University, a Ph.D. in Statistics from Iowa State University, and was a postdoctoral scholar at North Carolina State University under the Consortium for Nonproliferation Enabling Capabilities.

  • A Decision-Theoretic Framework for Adaptive Simulation Experiments

    Abstract:

    We describe a model-based framework for increasing effectiveness of simulation experiments in the presence of uncertainty. Unlike conventionally designed simulation experiments, it adaptively chooses where to sample, based on the value of information obtained. A Bayesian perspective is taken to formulate and update the framework’s four models.

    A simulation experiment is conducted to answer some question. In order to define precisely how informative a run is for answering the question, the answer must be defined as a random variable. This random variable is called a query and has the general form of p(theta | y), where theta is the query parameter and y is the available data.

    Examples of each of the four models employed in the framework are briefly described below:

    1. The continuous correlated beta process model (CCBP) estimates the proportions of successes and failures using beta-distributed uncertainty at every point in the input space. It combines results using an exponentially decaying correlation function. The output of the CCBP is used to estimate the value of a candidate run; a minimal sketch of this idea appears after the list.

    2. The mutual information model quantifies uncertainty in one random variable that is reduced by observing the other one. The model quantifies the mutual information between any candidate runs and the query, thereby scoring the value of running each candidate.

    3. The cost model estimates how long future runs will take, based upon past runs using, e.g., a generalized linear model. A given simulation might have multiple fidelity options that require different run times. It may be desirable to balance information with the cost of a mixture of runs using these multi-fidelity options.

    4. The grid state model and the mutual information model are used together to select the next collection of runs for optimal information per cost, accounting for current grid load.
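
    The following minimal sketch conveys the flavor of model 1 above (it is not the framework's actual CCBP implementation): pass/fail results are pooled with exponentially decaying weights to give beta-distributed uncertainty at a candidate input; the observations are hypothetical.

    # Minimal sketch of the flavor of model 1 (not the framework's actual CCBP
    # implementation): pool nearby pass/fail results with exponentially
    # decaying weights to get beta-distributed uncertainty at a candidate input.
    import numpy as np
    from scipy import stats

    def ccbp_posterior(x_query, x_obs, success_obs, length_scale=0.5, a0=1.0, b0=1.0):
        """Return a Beta posterior for P(success) at x_query."""
        w = np.exp(-np.abs(np.asarray(x_obs) - x_query) / length_scale)
        successes = np.asarray(success_obs, dtype=float)
        a = a0 + np.sum(w * successes)           # correlation-weighted successes
        b = b0 + np.sum(w * (1.0 - successes))   # correlation-weighted failures
        return stats.beta(a, b)

    # Hypothetical simulation results: input setting -> pass (1) / fail (0)
    x_obs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
    passed = [1, 1, 1, 0, 0, 0]

    for xq in (0.3, 0.5, 0.7):
        post = ccbp_posterior(xq, x_obs, passed)
        print(f"x = {xq}: P(success) mean = {post.mean():.2f}, "
              f"95% interval = ({post.ppf(0.025):.2f}, {post.ppf(0.975):.2f})")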

    The framework has been applied to several use cases, including model verification and validation with uncertainty quantification (VVUQ). Given a mathematically precise query, an 80 percent reduction in total runs has been observed.

    Speaker Info:

    Terril Hurst

    Senior Engineering Fellow

    Raytheon Technologies

    Terril N Hurst is a Senior Engineering Fellow at Raytheon Technologies, where he works to ensure that model-based engineering is based upon credible models and protocols that allow uncertainty quantification. Prior to coming to Raytheon in 2005, Dr. Hurst worked at Hewlett-Packard Laboratories, including a post-doctoral appointment in Stanford University’s Logic-Based Artificial Intelligence Group under the leadership of Nils Nilsson.

  • A Framework for Using Priors in a Continuum of Testing

    Abstract:

    A strength of the Bayesian paradigm is that it allows for the explicit use of all available information—to include subject matter expert (SME) opinion and previous (possibly dissimilar) data. While frequentists are constrained to only including data in an analysis (that is to say, only including information that can be observed), Bayesians can easily consider both data and SME opinion, or any other related information that could be constructed. This can be accomplished through the development and use of priors. When prior development is done well, a Bayesian analysis will not only lead to more direct probabilistic statements about system performance, but can result in smaller standard errors around fitted values when compared to a frequentist approach. Furthermore, by quantifying the uncertainty surrounding a model parameter, through the construct of a prior, Bayesians are able to capture the uncertainty across the test space under consideration.

    This presentation develops a framework for thinking about how different priors can be used throughout the continuum of testing. In addition to types of priors, how priors can change or evolve across the continuum of testing—especially when a system changes (e.g., is modified or adjusted) during phases of testing—will be addressed. Priors that strive to provide no information (reference priors) will be discussed, and will build up to priors that contain available information (informative priors). Informative priors—both those based on institutional knowledge or summaries from databases, as well as those developed based on previous testing data—will be discussed, with a focus on how to consider previous data that is dissimilar in some way, relative to the current test event. What priors might be more common in various phases of testing, types of information that can be used in priors, and how priors evolve as information accumulates will all be discussed.
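
    As a minimal sketch of the idea, the fragment below shows how a near-noninformative prior and two informative priors (one built from earlier testing, one discounted for system changes) shift the posterior on a success probability; all counts are hypothetical.

    # Minimal sketch: compare posteriors on a success probability under a
    # reference prior and two informative priors. All numbers are hypothetical.
    from scipy import stats

    n, successes = 20, 16   # current operational test results (hypothetical)

    priors = {
        "reference, Beta(1, 1)":                       (1.0, 1.0),
        "informative from prior testing, Beta(18, 4)": (18.0, 4.0),
        "informative, discounted for changes, Beta(9, 2)": (9.0, 2.0),
    }

    for name, (a0, b0) in priors.items():
        post = stats.beta(a0 + successes, b0 + (n - successes))
        lo, hi = post.ppf([0.025, 0.975])
        print(f"{name:50s} posterior mean = {post.mean():.2f}, "
              f"95% interval = ({lo:.2f}, {hi:.2f})")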

    Speaker Info:

    Victoria Sieck

    Deputy Director / Assistant Professor of Statistics

    Scientific Test & Analysis Techniques Center of Excellence (STAT COE) / Air Force Institute of Technology (AFIT)

    Dr. Victoria R. C. Sieck is the Deputy Director of the Scientific Test & Analysis Techniques (STAT) Center of Excellence (COE), where she works with major acquisition programs within the Department of Defense (DoD) to apply rigor and efficiency to current and emerging test and evaluation methodologies through the application of the STAT process. Additionally, she is an Assistant Professor of Statistics at the Air Force Institute of Technology (AFIT), where her research interests include design of experiments and developing innovative Bayesian approaches to DoD testing. As an Operations Research Analyst in the US Air Force (USAF), her experiences in the USAF testing community include being a weapons and tactics analyst and an operational test analyst. Dr. Sieck has an M.S. in Statistics from Texas A&M University and a Ph.D. in Statistics from the University of New Mexico. Her Ph.D. research was on improving operational testing through the use of Bayesian adaptive testing methods.

  • A New Method for Planning Full-Up System-Level (FUSL) Live Fire Tests

    Abstract:

    Planning Full-Up System-Level (FUSL) Live Fire tests is a complex process that has historically relied solely on subject matter expertise. In particular, there is no established method to determine the appropriate number of FUSL tests necessary for a given program. We developed a novel method that is analogous to the Design of Experiments process that is used to determine the scope of Operational Test events. Our proposed methodology first requires subject matter experts (SMEs) to define all potential FUSL shots. For each potential shot, SMEs estimate the severity of that shot, the uncertainty of that severity estimate, and the similarity of that shot to all other potential shots. We developed a numerical optimization algorithm that uses the SME inputs to generate a prioritized list of FUSL events and a corresponding plot of the total information gained with each successive shot. Together, these outputs can help analysts determine the adequate number of FUSL tests for a given program. We illustrate this process with an example on a notional ground vehicle. Future work is necessary prior to implementation on a program of record.
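
    As a toy sketch only (not the authors' optimization algorithm), the fragment below greedily orders candidate shots by SME-assessed uncertainty while down-weighting shots similar to those already selected, producing a prioritized list and a cumulative-information trace of the general kind described; all inputs are hypothetical.

    # Toy sketch only (not the authors' algorithm): greedily prioritize
    # candidate FUSL shots by SME uncertainty, down-weighting shots similar
    # to ones already selected. All inputs are hypothetical.
    import numpy as np

    shots = ["front RPG", "side RPG", "side IED", "rear small arms"]
    uncertainty = np.array([0.9, 0.8, 0.7, 0.3])   # SME uncertainty in severity
    similarity = np.array([                        # SME similarity, 0..1
        [1.0, 0.7, 0.3, 0.1],
        [0.7, 1.0, 0.4, 0.1],
        [0.3, 0.4, 1.0, 0.2],
        [0.1, 0.1, 0.2, 1.0],
    ])

    remaining = list(range(len(shots)))
    selected, cumulative_info = [], []
    total = 0.0
    while remaining:
        # Information credit shrinks with similarity to already-chosen shots
        def value(i):
            discount = max((similarity[i, j] for j in selected), default=0.0)
            return uncertainty[i] * (1.0 - discount)
        best = max(remaining, key=value)
        total += value(best)
        selected.append(best)
        cumulative_info.append(total)
        remaining.remove(best)

    for rank, (i, info) in enumerate(zip(selected, cumulative_info), 1):
        print(f"{rank}. {shots[i]:16s} cumulative information = {info:.2f}")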

    Speaker Info:

    Lindsey Butler

    Research Staff Member

    IDA

    Dr. Lindsey Butler holds a B.S. in Chemical Engineering from Virginia Tech and a Ph.D. in Biomedical Engineering from the University of South Carolina. She has worked at the Institute for Defense Analyses for 5 years, where she supports the Director of Operational Test and Evaluation. Dr. Butler is the Deputy Live Fire lead at the Institute for Defense Analyses. Her primary projects focus on assessing the survivability of body armor and armored vehicles against operationally realistic threats. She also has expertise in evaluating casualty assessments of personnel after live fire tests.

  • A Practitioner’s Guide to Advanced Topics in DOE

    Abstract:

    Having completed a first course in DOE and begun to apply these concepts, engineers and scientists quickly learn that test and evaluation often demands knowledge beyond the use of classical designs.  This one-day short course, taught by an engineer from a practitioner’s perspective, targets this problem.  Three broad areas are covered:

    • Optimal designs address questions such as how to accommodate constraints in the design space, specify unique models, and fractionate general factorial designs. Choices for methods revolve around objectives. For example, is the priority to best estimate model parameters or to reduce overall response prediction error?
    • Split plot designs allow restrictions to randomization for hard-to-change factors. These designs, including optimal adaptations, are now supported by most commercial experimental design software and within the reach of the practitioner.
    • Sequential design approaches strive to minimize resource requirements at the onset of a test. A phased approach includes a plan to add runs to achieve specific objectives following an initial test analysis.  The approach is especially effective in high-dimensional spaces where the influence of all factors is in question and/or model order is unknown.  One example is the use of screening designs where all factors are included in the first test with the expectation that many will be found insignificant.

    The course format is to introduce relevant background material, discuss case studies, and provide software demonstrations.   Case studies and demonstrations are derived from a variety of sources, including aerospace testing and DOD T&E. Learn design approaches, design comparison metrics, best practices, and lessons learned from the instructor’s experience.   A first course in Design of Experiments is a prerequisite.
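
    As one small example of the design comparison metrics mentioned above, the sketch below computes relative D-efficiency, 100 * |X'X|^(1/p) / n for a coded main-effects model matrix X with p parameters and n runs, for a full factorial and two illustrative 6-run candidate designs.

    # Small sketch of one design comparison metric: relative D-efficiency for
    # +/-1 coded designs under a main-effects model. The two 6-run candidate
    # designs are illustrative only.
    import itertools
    import numpy as np

    def model_matrix(design):
        """Main-effects model matrix with intercept for a +/-1 coded design."""
        design = np.asarray(design, dtype=float)
        return np.column_stack([np.ones(len(design)), design])

    def d_efficiency(design):
        X = model_matrix(design)
        n, p = X.shape
        return 100.0 * np.linalg.det(X.T @ X) ** (1.0 / p) / n

    full_factorial = np.array(list(itertools.product([-1, 1], repeat=3)))

    # Two candidate 6-run designs drawn from the 2^3 candidate set
    balanced = full_factorial[[0, 3, 5, 6, 1, 7]]    # nearly orthogonal choice
    unbalanced = full_factorial[[0, 1, 2, 3, 4, 0]]  # poorly chosen, repeats a run

    print(f"2^3 full factorial: D-efficiency = {d_efficiency(full_factorial):.1f}%")
    print(f"balanced 6-run:     D-efficiency = {d_efficiency(balanced):.1f}%")
    print(f"unbalanced 6-run:   D-efficiency = {d_efficiency(unbalanced):.1f}%")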

    Speaker Info:

    Drew Landman

    Professor

    Old Dominion University

    Drew Landman has 34 years of experience in engineering education as a professor at Old Dominion University. Dr. Landman’s career highlights include 13 years (1996-2009) as chief engineer at the NASA Langley Full-Scale Wind Tunnel in Hampton, VA. Landman was responsible for program management, test design, instrument design and calibration, and served as the lead project engineer for many automotive, heavy truck, aircraft, and unmanned aerial vehicle wind tunnel tests including the Centennial Wright Flyer and the Boeing X-48B and C. His research interests and sponsored programs are focused on wind tunnel force measurement systems and statistically defensible experiment design, primarily to support wind tunnel testing. Dr. Landman has served as a consultant and trainer in the area of statistical engineering to test and evaluation engineers and scientists at AIAA, NASA, Aerovironment, Airbus, Aerion, ATI, USAF, US Navy, US Marines, and the Institute for Defense Analyses. Landman founded a graduate course sequence in statistical engineering within the ODU Department of Mechanical and Aerospace Engineering. He is currently co-authoring a text on wind tunnel test techniques.

  • A Systems Perspective on Bringing Reliability and Prognostics to Machine Learning

    Abstract:

    Machine learning is being deployed into the real world, yet the body of knowledge on testing, evaluating, and maintaining machine learning models is overwhelmingly centered on component-level analysis. But machine learning and engineered systems are tightly coupled. This is evidenced by the extreme sensitivity of ML to changes in system structure and behavior. Thus, reliability, prognostics, and other efforts related to test and evaluation for ML cannot be divorced from the system. That is, machine learning and its system go hand-in-hand. Any other way makes an unjustified assumption about the existence of an independent variable. This talk explores foundational reasons for this phenomenon and the foundational challenges it poses to existing practice. Cases in machine health monitoring and in cyber defense are used to motivate the position that machine learning is not independent of physical changes to the system with which it interacts, and ML is not independent of the adversaries it defends against. By acknowledging these couplings, systems and mission engineers can better align test and evaluation practices with the fundamental character of ML.

    Speaker Info:

    Tyler Cody

    Research Assistant Professor

    Virginia Tech National Security Institute

    Tyler Cody is an Assistant Research Professor at the Virginia Tech National Security Institute. His research interest is in developing principles and best practices for the systems engineering of machine learning and artificial intelligence. His research has been applied to machine learning for engineered systems broadly, including hydraulic actuators, industrial compressors, rotorcraft, telecommunication systems, and computer networks. He received his Ph.D. in systems engineering from the University of Virginia in May 2021 for his work on a systems theory of transfer learning.

  • Adversaries and Airwaves – Compromising Wireless and Radio Frequency Communications

    Abstract:

    Wireless and radio frequency (RF) technologies are ubiquitous in our daily lives, appearing in laptops, key fobs, remote sensors, and antennas. These devices, while oftentimes portable and convenient, can be susceptible to adversarial attack over the air. This breakout session will provide a short introduction to wireless hacking concepts such as passive scanning, active injection, and the use of software defined radios to flexibly sample the RF spectrum. We will also ground these concepts in live demonstrations of attacks against both wireless and wired systems.

    Speaker Info:

    Jason Schlup

    Research Staff Member

    IDA

    Dr. Jason Schlup received his Ph.D. in Aeronautics from the California Institute of Technology in 2018.  He is now a Research Staff Member at the Institute for Defense Analyses and provides analytical support to the Director, Operational Test and Evaluation’s Cyber Assessment Program.  Jason also contributes to IDA’s Cyber Lab capability.

  • Adversaries and Airwaves – Compromising Wireless and Radio Frequency Communications

    Abstract:

    Wireless and radio frequency (RF) technologies are ubiquitous in our daily lives, appearing in laptops, key fobs, remote sensors, and antennas. These devices, while oftentimes portable and convenient, can be susceptible to adversarial attack over the air. This breakout session will provide a short introduction to wireless hacking concepts such as passive scanning, active injection, and the use of software defined radios to flexibly sample the RF spectrum. We will also ground these concepts in live demonstrations of attacks against both wireless and wired systems.

    Speaker Info:

    Mark Herrera

    Research Staff Member

    IDA

    Dr. Mark Herrera is a physicist (PhD – University of Maryland) turned defense analyst who specializes in missile systems, mine warfare, and cyber. As a Project Lead at the Institute for Defense Analyses (IDA), he leads a team of cyber subject matter experts in providing rigorous, responsive, and sound analytic input supporting independent assessments of a wide variety of US Navy platforms. Prior to IDA, Mark was a principal investigator with Heron Systems Inc., leading the development of machine learning algorithms to improve aircraft sensor performance.

  • An Introduction to Data Visualization

    Abstract:

    Data visualization can be used to present findings, explore data, and use the human eye to find patterns that a computer would struggle to locate. Borrowing tools from art, storytelling, data analytics and software development, data visualization is an indispensable part of the analysis process.

    While data visualization usage spans across multiple disciplines and sectors, most never receive formal training in the subject. As such, this tutorial will introduce key data visualization building blocks and how to best use those building blocks for different scenarios and audiences. We will also go over tips on accessibility, design and interactive elements. While this will by no means be a complete overview of the data visualization field, by building a foundation and introducing some rules of thumb, attendees will be better equipped for communicating their findings to their audience.

    Speaker Info:

    Christina Heinich

    AST, Data Systems

    NASA

    Chris Heinich is a software engineer specializing in data visualization at NASA Langley Research Center. Having worked in a diverse set of fields such as web development, software testing, and data analysis, Chris currently acts as the data visualization expert for the Office of the Chief Information Officer (OCIO) Data Science Team. In the past they have developed and maintained dashboards for use cases such as COVID case/vaccine tracking and project management, as well as deployed open source data visualization tools like Dash/Plotly on traditional government servers. Currently they lead an agency-wide data visualization community of practice, which provides training and an online collaboration space for NASA employees to learn and share experiences in data visualization.

  • An Introduction to Sustainment: The Importance and Challenges of Analyzing System Readiness

    Abstract:

    The Department of Defense (DoD) spends the majority of its annual budget on making sure that systems are ready to perform when called to action. Even with large investments, though, maintaining adequate system readiness poses a major challenge for the DoD. Here, we discuss why readiness is so difficult to maintain and introduce the tools IDA has developed to aid readiness and supply chain analysis and decision-making. Particular emphasis is placed on “honeybee,” the tool developed to clean, assemble, and mine data across a variety of sources in a well-documented and reproducible way. Using a notional example, we demonstrate the utility of this tool and others like it in our suite; these tools lower the barrier to performing meaningful analysis, constructing and estimating input data for readiness models, and aiding the DoD’s ability to tie resources to readiness outcomes.

    Speaker Info:

    Megan L. Gelsinger

    Deputy Director, OED (VBL) and Research Staff Member (MLG)

    IDA

    Megan Gelsinger is a Research Staff Member in the Operational Evaluation Division of IDA.  Her work focuses on weapons system sustainment and readiness modeling.  Prior to joining IDA in August 2021, she completed her PhD in statistics at Cornell University. Her graduate work focused on developing computationally efficient methods for implementing big data spatio-temporal statistical models.

  • An Introduction to Sustainment: The Importance and Challenges of Analyzing System Readiness

    Abstract:

    The Department of Defense (DoD) spends the majority of its annual budget on making sure that systems are ready to perform when called to action. Even with large investments, though, maintaining adequate system readiness poses a major challenge for the DoD. Here, we discuss why readiness is so difficult to maintain and introduce the tools IDA has developed to aid readiness and supply chain analysis and decision-making. Particular emphasis is placed on “honeybee,” the tool developed to clean, assemble, and mine data across a variety of sources in a well-documented and reproducible way. Using a notional example, we demonstrate the utility of this tool and others like it in our suite; these tools lower the barrier to performing meaningful analysis, constructing and estimating input data for readiness models, and aiding the DoD’s ability to tie resources to readiness outcomes.

    Speaker Info:

    V. Bram Lillard

    Deputy Director, OED (VBL) and Research Staff Member (MLG)

    IDA

    V. Bram Lillard is the Deputy Director of the Operational Evaluation Division at IDA. In addition to managing the Division’s research staff, he leads IDA’s Sustainment and Readiness Modeling group, which supports multiple sponsors; he directs a growing team of researchers focused on analyzing raw maintenance and supply data, developing software tools to build end-to-end simulations, and identifying what investments are needed to improve weapon system readiness. Dr. Lillard has been at IDA for over 18 years, contributing to a variety of research areas, including readiness, operational testing, cost analyses, F-35, surface, submarine, and anti-submarine warfare systems.  Prior to joining IDA, Dr. Lillard completed his PhD in physics at the University of Maryland.

  • An Overview of NASA’s Low Boom Flight Demonstration

    Abstract:

    NASA will soon begin a series of tests that will collect nationally representative data on how people perceive low noise supersonic overflights. For half a century, civilian aircraft have been required to fly slower than the speed of sound over land to prevent “creating an unacceptable situation” on the ground due to sonic booms. However, new aircraft shaping techniques have led to dramatic changes in how shockwaves from supersonic flight merge together as they travel to the ground. What used to sound like a boom on the ground will be transformed into a thump. NASA is now building a full-scale, piloted demonstration aircraft called the X-59 to demonstrate low noise supersonic flight. In 2024, the X-59 aircraft will commence a national series of community overflight tests to collect data on how people perceive “sonic thumps.” The community response data will be provided to national and international noise regulators as they consider creating new standards that allow supersonic flight over land at acceptably low noise levels.

    Speaker Info:

    Jonathan Rathsam

    Technical Lead

    NASA Langley Research Center

    Jonathan Rathsam is a Senior Research Engineer at NASA’s Langley Research Center in Hampton, Virginia.  He conducts laboratory and field research on human perceptions of low noise supersonic overflights.  He currently serves as Technical Lead of Survey Design and Analysis for Community Test Planning and Execution within NASA’s Commercial Supersonic Technology Project.  Recently he served as co-chair for the annual Defense and Aerospace Test and Analysis Workshop (DATAWorks) and as chair for a NASA Source Evaluation Board.  He holds a Ph.D. in Engineering from the University of Nebraska, a B.A. in Physics from Grinnell College in Iowa, and completed postdoctoral research in acoustics at Ben-Gurion University in Israel.

  • Analysis Apps for the Operational Tester

    Abstract:

    In the acquisition and testing world, data analysts repeatedly encounter certain categories of data, such as time or distance until an event (e.g., failure, alert, detection), binary outcomes (e.g., success/failure, hit/miss), and survey responses. Analysts need tools that enable them to produce quality and timely analyses of the data they acquire during testing. This poster presents four web-based apps that can analyze these types of data. The apps are designed to assist analysts and researchers with simple repeatable analysis tasks, such as building summary tables and plots for reports or briefings. Using software tools like these apps can increase reproducibility of results, timeliness of analysis and reporting, attractiveness and standardization of aesthetics in figures, and accuracy of results. The first app models reliability of a system or component by fitting parametric statistical distributions to time-to-failure data. The second app fits a logistic regression model to binary data with one or two independent continuous variables as predictors. The third calculates summary statistics and produces plots of groups of Likert-scale survey question responses. The fourth calculates the system usability scale (SUS) scores for SUS survey responses and enables the app user to plot scores versus an independent variable. These apps are available for public use on the Test Science Interactive Tools webpage https://new.testscience.org/interactive-tools/.
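
    As a small sketch of the standard System Usability Scale scoring that the fourth app implements (the code below is illustrative, not the app itself): odd-numbered items contribute the response minus one, even-numbered items contribute five minus the response, and the sum is scaled by 2.5 to give a 0-100 score.

    # Standard SUS scoring rule (illustrative sketch, not the app's own code).
    import numpy as np

    def sus_score(responses):
        """responses: 10 Likert answers (1-5) in questionnaire order."""
        r = np.asarray(responses, dtype=float)
        if r.shape != (10,):
            raise ValueError("SUS requires exactly 10 item responses")
        odd_items = r[0::2] - 1.0      # items 1, 3, 5, 7, 9
        even_items = 5.0 - r[1::2]     # items 2, 4, 6, 8, 10
        return 2.5 * (odd_items.sum() + even_items.sum())

    # Hypothetical responses from three operators
    surveys = [
        [4, 2, 4, 1, 5, 2, 4, 2, 5, 1],
        [3, 3, 4, 2, 4, 2, 3, 3, 4, 2],
        [5, 1, 5, 1, 5, 1, 5, 2, 5, 1],
    ]
    scores = [sus_score(s) for s in surveys]
    print(f"individual SUS scores: {scores}")
    print(f"mean SUS score: {np.mean(scores):.1f}")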

    Speaker Info:

    William Raymond Whitledge

    Research Staff Member

    IDA

    Bill Whitledge is a research staff member at the Institute for Defense Analyses (IDA), where he has worked since 2010. He is currently the project leader for the operational test agent task for the Cybersecurity and Infrastructure Security Agency (CISA) under the Department of Homeland Security (DHS). This project aims to provide CISA with rigorous analysis of the operational effectiveness, usability, and reliability of the National Cybersecurity Protection System (NCPS) family of systems.

    In addition to leading the NCPS testing project, Bill works on the IDA data management committee updating IDA's data management practices and procedures. He also co-leads the IDA Connects speaker series, an internal IDA series of lunchtime talks intended to help staff stay informed on current events, meet colleagues, and learn about research across IDA. Bill is passionate about helping people visualize and present information in more elegant and succinct ways. One of his main interests is writing tools in R and other programming languages to automate data collection, analysis, and visualization. He has developed four web applications hosted on the IDA Test Science website enabling easier analysis of system reliability, binary outcomes, system usability, and general survey responses. Bill is also an avid cyclist and golfer, and he is one of the coordinators of the IDA golf league.

    Bill received his Bachelor of Arts in physics with an economics minor from Colby College in 2008. He received his Master of Science in electrical engineering with a focus in optics and optical communication at Stanford University in 2010.

  • Analysis of Target Location Error using Stochastic Differential Equations

    Abstract:

    This paper presents an analysis of target location error (TLE) based on the Cox-Ingersoll-Ross (CIR) model. In brief, this model characterizes TLE as a function of range based on the stochastic differential equation model

    dX(r) = a(b - X(r)) dr + sigma * sqrt(X(r)) dW(r)

    where X(r) is the TLE at range r, b is the long-term (terminal) mean of the TLE, a is the rate of reversion of X(r) to b, sigma is the process volatility, and W(r) is a standard Wiener process.

    Multiple flight test runs under the same conditions exhibit different realizations of the TLE process. This approach to TLE analysis models each flight test run as a realization of the CIR process. Fitting a CIR model to multiple data runs then provides a characterization of the TLE of the system under test.

    This paper presents an example use of the CIR model. Maximum likelihood estimates of the parameters of the CIR model are found from a collection of TLE data runs. The resulting CIR model is then used to characterize overall system TLE performance as a function of range to the target as well as the asymptotic estimate of long-term TLE.
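
    The sketch below simulates TLE-versus-range realizations from the CIR model with an Euler-Maruyama scheme; the parameter values are hypothetical, and the paper's actual analysis fits a, b, and sigma by maximum likelihood rather than by simulation.

    # Minimal sketch: simulate TLE-vs-range realizations from the CIR model
    # with an Euler-Maruyama scheme (hypothetical parameter values).
    import numpy as np

    rng = np.random.default_rng(0)

    a, b, sigma = 0.05, 10.0, 0.5    # reversion rate, long-term TLE, volatility
    x0 = 40.0                        # initial TLE at the starting range
    ranges = np.linspace(0.0, 100.0, 501)
    dr = ranges[1] - ranges[0]

    def simulate_run():
        x = np.empty_like(ranges)
        x[0] = x0
        for k in range(1, len(ranges)):
            dw = rng.normal(0.0, np.sqrt(dr))
            x[k] = x[k - 1] + a * (b - x[k - 1]) * dr \
                   + sigma * np.sqrt(max(x[k - 1], 0.0)) * dw
            x[k] = max(x[k], 0.0)    # keep the Euler step nonnegative
        return x

    runs = np.array([simulate_run() for _ in range(5)])
    print(f"long-term mean b = {b}")
    print(f"mean simulated TLE over the last 10 range steps: {runs[:, -10:].mean():.2f}")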

    Speaker Info:

    James Brownlow

    mathematical statistician

    USAF

    Dr. James Brownlow is a tech expert in statistics with the USAF, Edwards AFB, CA.  His PhD is in time series from UC Riverside.  Dr. Brownlow has developed test and evaluation procedures using Bayesian techniques, and developed Python code to adapt parametric survival models to the analysis of target location error.  He is a coauthor of a paper that used stochastic differential equations to characterize Kalman-filtered estimates of target track state vectors.

  • Applications for Monte Carlo Analysis within Job Shop Planning

    Abstract:

    This presentation provides a summary overview of Discrete Event Simulation (DES) for optimizing scheduling operations in a high-mix, low-volume job shop environment. The DES model employs Monte Carlo simulation to minimize schedule conflicts and prioritize work while taking into account competition for limited resources. Iterative simulation balancing to dampen model results and arrive at a globally optimized schedule plan will be contrasted with traditional deterministic scheduling methodologies.
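
    As a toy illustration of the Monte Carlo idea (not the Lockheed Martin DES model), the sketch below samples uncertain durations for two jobs competing for a single machine and compares the two possible orderings by the simulated probability that both jobs meet their due dates; the jobs, duration distributions, and due dates are invented.

    # Toy Monte Carlo scheduling sketch (illustrative only).
    import numpy as np

    rng = np.random.default_rng(42)
    N = 10_000
    due = {"A": 12.0, "B": 15.0}                # notional due dates (hours)

    def p_on_time(order):
        hits = 0
        for _ in range(N):
            dur = {"A": rng.triangular(4, 6, 10), "B": rng.triangular(5, 7, 12)}  # uncertain durations
            clock, finish = 0.0, {}
            for job in order:                    # jobs run back-to-back on one shared machine
                clock += dur[job]
                finish[job] = clock
            hits += all(finish[j] <= due[j] for j in order)
        return hits / N

    for order in (("A", "B"), ("B", "A")):
        print(order, "P(both on time) ~", round(p_on_time(order), 3))

    Comparing orderings by a probability of on-time completion, rather than by a single deterministic duration estimate, is the basic contrast with traditional deterministic scheduling that the presentation draws.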

    Speaker Info:

    Dominik Alder

    Project Management & Planning Operations Rep Senior Staff

    Lockheed Martin, Program Management

    Master of Science in Systems Engineering from the University of Denver, with an emphasis in discrete event simulation and optimization

    Bachelor of Science in Manufacturing Engineering from Brigham Young University, with an emphasis in statistics and applied research

    More than 13 years of experience with Lockheed Martin Corporation in statistical analysis, materials science, and manufacturing engineering

  • Applications of Equivalence Testing in T&E

    Abstract:

    Traditional hypothesis testing is used extensively in test and evaluation (T&E) to determine if there is a difference between two or more populations. For example, we can analyze a designed experiment using t-tests to determine if a factor affects the response or not. Rejecting the null hypothesis would provide evidence that the factor changes the response value. However, there are many situations in T&E where the goal is to show that things didn’t change: the response is the same (or nearly the same) after some change to the process or system. If we use traditional hypothesis testing to assess this scenario, we would want to “fail to reject” the null hypothesis; however, this doesn’t actually provide evidence that the null hypothesis is true. Instead, we can orient the analysis to the decision that will be made and use equivalence testing. Equivalence testing initially assumes the populations are different; the alternative hypothesis is that they are the same. Rejecting the null hypothesis provides evidence that the populations are the same, matching the objective of the test. This talk provides an overview of equivalence testing with examples demonstrating its applicability in T&E. We also discuss additional considerations for planning a test where equivalence testing will be used, including sample size and what “equivalent” really means.
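
    One common way to implement this idea is the two one-sided tests (TOST) procedure. The Python sketch below applies TOST to notional data with an assumed equivalence margin; the margin, data, and pooled-variance t-test are illustrative choices, not necessarily those used in the talk.

    # Two one-sided tests (TOST) for equivalence of two means with an assumed margin delta.
    # Rejecting both one-sided nulls gives evidence the means differ by less than delta.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    before = rng.normal(100.0, 5.0, size=20)    # notional responses before a process change
    after = rng.normal(101.0, 5.0, size=20)     # notional responses after the change
    delta = 3.0                                 # assumed equivalence margin

    n1, n2 = len(before), len(after)
    diff = after.mean() - before.mean()
    sp2 = ((n1 - 1) * before.var(ddof=1) + (n2 - 1) * after.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2

    t_lower = (diff + delta) / se               # H0: diff <= -delta  vs  H1: diff > -delta
    t_upper = (diff - delta) / se               # H0: diff >= +delta  vs  H1: diff < +delta
    p_tost = max(1 - stats.t.cdf(t_lower, df), stats.t.cdf(t_upper, df))
    print(f"difference = {diff:.2f}, TOST p-value = {p_tost:.4f} (conclude equivalence if p < 0.05)")

    Note that the conclusion depends directly on the choice of delta, which is exactly the "what does 'equivalent' really mean" question raised above.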

    Speaker Info:

    Sarah Burke

    Analyst

    LinQuest

    Sarah Burke is an analyst for The Perduco Group, a LinQuest Company. She earned her doctorate in industrial engineering at Arizona State University. Her research interests include design of experiments, response surface methodology, multi-criteria decision making, and statistical engineering.

  • Applying Design of Experiments to Cyber Testing

    Abstract:

    We describe a potential framework for applying DOE to cyber testing and provide an example of its application to testing of a hypothetical command and control system.

    Speaker Info:

    J. Michael Gilmore

    Research Staff Member

    IDA

    Dr. James M. Gilmore "Mike"
    Research Staff Member | IDA SED

    Fields of Expertise

    Test and evaluation; cost analysis; cost-effectiveness analysis.

    Education

    1976 - 1980 Doctor of Philosophy in Nuclear Engineering at University of Wisconsin
    1972 - 1976 Bachelor of Science in Physics at M.I.T.

    Employment

    2017 - 2018 Principal Physical Scientist, RAND Corporation
    Performed various analyses for federal government clients

    2009 - 2017 Director, Operational Test and Evaluation, Department of Defense
    Senate-confirmed Presidential appointee serving as the principal advisor to the Secretary of Defense regarding the operational effectiveness of all defense systems

    2001 - 2009 Assistant Director for National Security, Congressional Budget Office
    Responsible for the CBO division performing analyses of a broad array of issues in national security for committees of the U.S. Congress

    1994 - 2001 Deputy Director for General Purpose Programs, OSD Program Analysis and Evaluation
    Responsible for four divisions performing analyses and evaluations of all aspects of DoD's conventional forces and associated programs

    1993 - 1994 Division Director, OSD Program Analysis and Evaluation
    Responsible for divisions in the Cost Analysis Improvement Group performing independent cost analyses of major defense acquisition programs

    1990 - 1993 Analyst, OSD Program Analysis and Evaluation
    Performed analysis of strategic defense systems and command, control, and communications systems

    1989 - 1990 Analyst, Falcon Associates
    Performed various analyses for DoD clients

    1985 - 1989 Analyst, McDonnell Douglas
    Performed analysis involving issues in command, control, communications, and intelligence

    1981 - 1985 Scientist, Lawrence Livermore National Laboratory
    Modelled nuclear fusion experiments

  • April 27 Opening Keynote II

    Speaker Info:

    Jill Marlowe

    Digital Transformation Officer, NASA

    Jill Marlowe is NASA’s first Digital Transformation Officer, leading the Agency to conceive, architect, and accelerate enterprise digital solutions that transform NASA’s work, workforce, and workplace to achieve bolder missions faster and more affordably than ever before. Her responsibilities include refinement and integration of NASA’s digital transformation strategy, plans, and policies, and coordination of implementation activities across the NASA enterprise in six strategic thrusts: data, collaboration, modeling, artificial intelligence/machine learning, process transformation, and culture and workforce.

    Prior to this role, Ms. Marlowe was the Associate Center Director, Technical, at NASA’s Langley Research Center in Hampton, Virginia.  Ms. Marlowe led strategy and transformation of the center’s technical capabilities to assure NASA’s future mission success. In this role, she focused on accelerating Langley’s internal and external collaborations as well as the infusion of digital technologies critical for the center to thrive as a modern federal laboratory in an ever more digitally-enabled, hyper-connected, fast-paced, and globally-competitive world.

    In 2008, Ms. Marlowe was selected to the Senior Executive Service as the Deputy Director for Engineering at NASA Langley, and went on to serve as the center’s Engineering Director and Research Director. With the increasing responsibility and scope of these roles, Ms. Marlowe has a broad range of leadership experiences that include: running large organizations of 500 to 1,000 people to deliver solutions to every one of NASA’s mission directorates; sustaining and morphing a diverse portfolio of technical capabilities spanning aerosciences, structures & materials, intelligent flight systems, space flight instruments, and entry descent & landing systems; assuring safe operation of over two million square feet of laboratories and major facilities; architecting partnerships with universities, industry, and other government agencies to leverage and advance NASA’s goals; project management of technology development and flight test experiments; and throughout all of this, incentivizing innovation in very different organizational cultures spanning foundational research, technology invention, flight design and development engineering, and operations. She began her NASA career in 1990 as a structural analyst supporting the development of numerous space flight instruments to characterize Earth’s atmosphere.

    Ms. Marlowe’s formal education includes a Bachelor of Science degree in Aerospace and Ocean Engineering from Virginia Tech in 1988, a Master of Science in Mechanical Engineering from Rensselaer Polytechnic Institute in 1990, and a Degree of Engineer in Civil and Environmental Engineering at George Washington University in 1997.  She serves on advisory boards for Virginia Tech’s Aerospace & Ocean Engineering Department, Sandia National Laboratory’s Engineering Sciences Research Foundation, and Cox Communications’ Digitally Inclusive Communities (regional). She is the recipient of two NASA Outstanding Leadership Medals, was named the 2017 NASA Champion of Innovation, is an AIAA Associate Fellow, and was inducted in 2021 into the Virginia Tech Academy of Aerospace & Ocean Engineering Excellence.  She lives in Yorktown, Virginia, with her husband, Kevin, along with the youngest of their three children and two energetic labradoodles.

  • April 27th Opening Keynote I

    Speaker Info:

    Wendy Masiello

    President, Wendy Mas Consulting LLC

    Wendy Masiello is an independent consultant who retired from the United States Air Force as a Lieutenant General. She is president of Wendy Mas Consulting, LLC and serves as an independent director for KBR Inc., EURPAC Service, Inc., and StandardAero (owned by The Carlyle Group). She is also a Director on the Procurement Round Table and the National ReBuilding Together Board, President-elect of the National Contract Management Association (NCMA) Board, Chair of the Rawls Advisory Council for Texas Tech University’s College of Business, and serves on the Air Force Studies Board under the National Academies of Sciences, Engineering, and Medicine.

    Prior to her July 2017 retirement, she was Director of the Defense Contract Management Agency, where she oversaw a $1.4 billion budget and 12,000 people worldwide in oversight of 20,000 contractors performing 340,000 contracts with more than $2 trillion in contract value. During her 36-year career, General Masiello also served as Deputy Assistant Secretary (Contracting), Office of the Assistant Secretary of the Air Force for Acquisition, and as Program Executive Officer for the Air Force’s $65 billion Service Acquisition portfolio. She also commanded the 96th Air Base Wing at Edwards Air Force Base and deployed to Iraq and Afghanistan as Principal Assistant Responsible for Contracting for Forces.

    General Masiello’s medals and commendations include the Defense Superior Service Medal, Distinguished Service Medal and the Bronze Star.  She earned her Bachelor of Business Administration degree from Texas Tech University, a Master of Science degree in logistics management from the Air Force Institute of Technology, a Master of Science degree in national resource strategy from the Industrial College of the Armed Forces, Fort Lesley J. McNair, Washington, D.C., and is a graduate of Harvard Kennedy School's Senior Managers in Government.

    General Masiello is a 2017 Distinguished Alum of Texas Tech University, was twice (2015 and 2016) named among Executive Mosaic’s Wash 100, the 2014 Greater Washington Government Contractor “Public Sector Partner of the Year,” and recognized by Federal Computer Week as one of “The 2011 Federal 100”.  She is an NCMA Certified Professional Contract Manager and an NCMA Fellow.

  • April 28 Opening Keynote

    Speaker Info:

    Nickolas Guertin

    Director, Operational Test & Evaluation, OSD/DOT&E

    Nickolas H. Guertin was sworn in as Director, Operational Test and Evaluation on December 20, 2021. A Presidential appointee confirmed by the United States Senate, he serves as the senior advisor to the Secretary of Defense on operational and live fire test and evaluation of Department of Defense weapon systems.

    Mr. Guertin has an extensive four-decade combined military and civilian career spanning submarine operations; ship construction and maintenance; development and testing of weapons, sensors, and combat management products, including the improvement of systems engineering; and defense acquisition. Most recently, he has performed applied research for government and academia in software-reliant and cyber-physical systems at Carnegie Mellon University’s Software Engineering Institute.

    Over his career, he has been in leadership of organizational transformation, improving competition, application of modular open system approaches, as well as prototyping and experimentation. He has also researched and published extensively on software-reliant system design, testing and acquisition. He received a BS in Mechanical Engineering from the University of Washington and an MBA from Bryant University. He is a retired Navy Reserve Engineering Duty Officer, was Defense Acquisition Workforce Improvement Act (DAWIA) certified in Program Management and Engineering, and is also a registered Professional Engineer (Mechanical).

    Mr. Guertin is involved with his community as an Assistant Scoutmaster and Merit Badge Counselor for two local Scouts BSA troops as well as being an avid amateur musician. He is a native of Connecticut and now resides in Virginia with his wife and twin children.

  • Assurance Techniques for Learning Enabled Autonomous Systems which Aid Systems Engineering

    Abstract:

    It is widely recognized that the complexity and resulting capabilities of autonomous systems created using machine learning methods, which we refer to as learning enabled autonomous systems (LEAS), pose new challenges to systems engineering test, evaluation, verification, and validation (TEVV) compared to their traditional counterparts. This presentation provides a preliminary attempt to map recently developed technical approaches from the LEAS assurance and TEVV literature to a traditional systems engineering v-model. The mapping categorizes such techniques into three top-level lifecycle phases: development, acquisition, and sustainment. It reviews the latest techniques to develop safe, reliable, and resilient learning enabled autonomous systems, without recommending radical and impractical changes to existing systems engineering processes. By performing this mapping, we seek to assist acquisition professionals by (i) informing comprehensive test and evaluation planning, and (ii) objectively communicating risk to leaders.

    The inability to translate qualitative assessments into quantitative metrics that measure system performance hinders adoption. Without understanding the capabilities and limitations of existing assurance techniques, defining safety and performance requirements that are both clear and testable remains out of reach. We accompany recent literature reviews on autonomy assurance and TEVV by mapping such developments to distinct steps of a well-known systems engineering model chosen due to its prevalence, namely the v-model. For each of the three top-level lifecycle phases (development, acquisition, and sustainment), a section of the presentation is dedicated to outlining recent technical developments in autonomy assurance. This representation helps identify where the latest methods for TEVV fit in the broader systems engineering process while also enabling systematic consideration of potential sources of defects, faults, and attacks. Note that we use the v-model only to assist in classifying where TEVV methods fit; this is not a recommendation to use one software development lifecycle over another.

    Speaker Info:

    Christian Ellis

    Journeyman Fellow

    Army Research Laboratory / University of Mass. Dartmouth

    Christian Ellis is a PhD student in the Department of Electrical and Computer Engineering at the University of Massachusetts Dartmouth, focused on building safe and robust autonomous ground systems for the United States Department of Defense. His research interests include unstructured wheeled ground autonomy, autonomous systems assurance, and safe autonomous navigation from human demonstrations. In 2020, Christian received a student paper award for the paper titled "Software and System Reliability Engineering for Autonomous Systems Incorporating Machine Learning".

  • Bayesian Estimation for Covariate Defect Detection Model Based on Discrete Cox Proportiona

    Abstract:

    Traditional methods to assess software characterize the defect detection process as a function of testing time or effort to quantify failure intensity and reliability. More recent innovations include models incorporating covariates that explain defect detection in terms of underlying test activities. These covariate models are elegant and only introduce a single additional parameter per testing activity. However, the model forms typically exhibit a high degree of non-linearity. Hence, stable and efficient model fitting methods are needed to enable widespread use by the software community, which often lacks mathematical expertise. To overcome this limitation, this poster presents Bayesian estimation methods for covariate models, including the specification of informed priors as well as confidence intervals for the mean value function and failure intensity, which often serves as a metric of software stability. The proposed approach is compared to traditional alternatives such as maximum likelihood estimation. Our results indicate that Bayesian methods with informed priors converge most quickly and achieve the best model fits. Incorporating these methods into tools should therefore encourage widespread use of the models to quantitatively assess software.
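
    As a much-simplified illustration of how an informed prior enters a Bayesian fit (a toy conjugate model, not the poster's covariate defect detection model), the Python sketch below places a gamma prior on an exponential failure rate and compares the posterior to the maximum likelihood estimate; the data and prior settings are invented.

    # Toy Bayesian estimation with an informed prior, compared against the MLE.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    times = rng.exponential(scale=10.0, size=15)        # notional inter-failure times (hours)

    # Informed prior: roughly 0.1 failures/hour, worth about 5 hours of pseudo-data.
    a0, b0 = 0.5, 5.0                                   # gamma(shape, rate) prior on the rate
    a_post, b_post = a0 + len(times), b0 + times.sum()  # conjugate update

    mle = len(times) / times.sum()
    post_mean = a_post / b_post
    interval = stats.gamma.ppf([0.025, 0.975], a=a_post, scale=1.0 / b_post)
    print(f"MLE rate = {mle:.3f}/hr, posterior mean = {post_mean:.3f}/hr, 95% interval = {interval.round(3)}")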

    Speaker Info:

    Priscila Silva

    Graduate Student

    University of Massachusetts Dartmouth

    Priscila Silva is a MS student in the Department of Electrical & Computer Engineering at the University of Massachusetts Dartmouth (UMassD). She received her BS (2017) in Electrical Engineering from the Federal University of Ouro Preto, Brazil.

  • Building Bridges: a Case Study of Assisting a Program from the Outside

    Abstract:

    STAT practitioners often find themselves outsiders to the programs they assist. This session presents a case study that demonstrates some of the obstacles in communicating capabilities, purpose, and expectations that may arise when approaching a project externally. Demonstrating incremental value may open the door to greater collaboration in the future, and this presentation discusses potential solutions for providing greater benefit to testing programs despite the obstacles that come with working from outside the program team.

    DISTRIBUTION STATEMENT A. Approved for public release; distribution is
    unlimited.
    CLEARED on 5 Jan 2022. Case Number: 88ABW-2022-0002

    Speaker Info:

    Anthony Sgambellone

    Huntington Ingalls Industries

    Dr. Tony Sgambellone is a STAT Expert (Huntington Ingalls Industries contractor) at the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE) at the Air Force Institute of Technology (AFIT). The STAT COE provides independent STAT consultation to designated acquisition programs and special projects to improve Test & Evaluation (T&E) rigor, effectiveness, and efficiency. Dr. Sgambellone holds a Ph.D. in Statistics and a graduate minor in College and University Teaching, and has a decade of experience spanning the fields of finance, software, and test and development. His current interests include artificial neural networks and the application of machine learning.

  • Case Study on Applying Sequential Methods in Operational Testing

    Abstract:

    Sequential methods concern statistical evaluation in which the number, pattern, or composition of the data is not determined at the start of the investigation but instead depends on the information acquired during the investigation. Although sequential methods originated in ballistics testing for the Department of Defense (DoD), they are underutilized in the DoD. Expanding the use of sequential methods may save money and reduce test time. In this presentation, we introduce sequential methods, describe their potential uses in operational test and evaluation (OT&E), and present a method for applying them to the test and evaluation of defense systems. We evaluate the proposed method by performing simulation studies and applying the method to a case study. Additionally, we discuss some of the challenges we might encounter when using sequential analysis in OT&E.
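
    One classical example of such a procedure is Wald's sequential probability ratio test (SPRT), sketched below in Python for notional pass/fail trials; the rates, risks, and simulated data are assumed, and the presentation's own method may differ.

    # Wald's SPRT for a pass/fail success rate (illustrative parameters and data).
    import numpy as np

    p0, p1 = 0.80, 0.95                 # notional "unacceptable" and "acceptable" success rates
    alpha, beta = 0.05, 0.10            # risks of wrongly accepting p1 or p0
    A = np.log((1 - beta) / alpha)      # upper decision boundary (accept p1)
    B = np.log(beta / (1 - alpha))      # lower decision boundary (accept p0)

    rng = np.random.default_rng(7)
    llr, n = 0.0, 0
    while B < llr < A:                  # keep testing until a boundary is crossed
        success = rng.random() < 0.93   # notional trial outcome (true rate 0.93)
        llr += np.log(p1 / p0) if success else np.log((1 - p1) / (1 - p0))
        n += 1

    decision = "meets the higher rate" if llr >= A else "does not meet the higher rate"
    print(f"stopped after {n} trials: {decision}")

    The appeal for OT&E is visible here: the number of trials is not fixed in advance, and sufficiently favorable or unfavorable evidence can end the test early.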

    Speaker Info:

    Keyla Pagán-Rivera

    Research Staff Member

    IDA

    Dr. Keyla Pagán-Rivera has a Ph.D. in Biostatistics from The University of Iowa and serves as a Research Staff Member in the Operational Evaluation Division at the Institute for Defense Analyses. She supports the Director, Operational Test and Evaluation (DOT&E) on training, research and applications of statistical methods.

  • Categorical Data Analysis

    Abstract:

    Categorical data is abundant in the 21st century, and its analysis is vital to advance research across many domains. Thus, data-analytic techniques that are tailored for categorical data are an essential part of the practitioner’s toolset. The purpose of this short course is to help attendees develop and sharpen their abilities with these tools. Topics covered will include binary and multi-category logistic regression, ordinal regression, and classification, along with methods to assess the predictive accuracy of these approaches. Data will be analyzed using the R software, and course content will loosely follow Alan Agresti’s excellent textbook “An Introduction to Categorical Data Analysis, Third Edition.”
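
    The course itself uses R; purely to give a flavor of one of the covered topics, here is a short Python sketch (not course material) that fits a binary logistic regression to simulated data and checks predictive accuracy on a held-out set.

    # Binary logistic regression on simulated data with a held-out accuracy check.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2))
    logit = 0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1]            # true linear predictor
    y = (rng.random(300) < 1 / (1 + np.exp(-logit))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = LogisticRegression().fit(X_tr, y_tr)
    print("coefficients:", model.coef_.round(2), "intercept:", model.intercept_.round(2))
    print("held-out accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 3))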

    Speaker Info:

    Chris Franck

    Assistant Professor

    Virginia Tech

    Chris Franck is an Assistant Professor in the Department of Statistics at Virginia Tech.

  • Closing Remarks

    Speaker Info:

    Alyson Wilson

    NCSU

    Dr. Alyson Wilson is the Associate Vice Chancellor for National Security and Special Research Initiatives at North Carolina State University. She is also a professor in the Department of Statistics and Principal Investigator for the Laboratory for Analytic Sciences. Her areas of expertise include statistical reliability, Bayesian methods, and the application of statistics to problems in defense and national security. Dr. Wilson is a leader in developing transformative models for rapid innovation in defense and intelligence.

    Prior to joining NC State, Dr. Wilson was a jointly appointed research staff member at the IDA Science and Technology Policy Institute and Systems and Analyses Center (2011-2013); associate professor in the Department of Statistics at Iowa State University (2008-2011); Scientist 5 and technical lead for Department of Defense Programs in the Statistical Sciences Group at Los Alamos National Laboratory (1999-2008); and senior statistician and operations research analyst with Cowboy Programming Resources (1995-1999). She is currently serving on the National Academy of Sciences Committee on Applied and Theoretical Statistics and on the Board of Trustees for the National Institute of Statistical Sciences. Dr. Wilson is a Fellow of the American Statistical Association, the American Association for the Advancement of Science, and an elected member of the International Statistics Institute.

  • Cloud Computing for Computational Fluid Dynamics (CFD) in T&E

    Abstract:

    In this talk we’ll explore the motivation for using cloud computing for Computational Fluid Dynamics (CFD) in Federal Government Test & Evaluation. Using examples from the automotive, aerospace, and manufacturing sectors, we’ll look at benchmarks for a number of CFD codes on CPUs (x86 and Arm) and GPUs, and at how the development of high-fidelity CFD (e.g., WMLES, HRLES) is accelerating the need for access to large-scale HPC. The onset of COVID-19 has also meant a large increase in the need for remote visualization, with greater numbers of researchers and engineers needing to work from home. This has accelerated the adoption of the same approaches needed for the pre- and post-processing of peta/exa-scale CFD simulations, and we’ll look at how these are more easily accessed via cloud infrastructure. Finally, we’ll explore perspectives on integrating ML/AI into CFD workflows using data lakes from a range of sources, and where the next decade may take us.

    Speaker Info:

    Neil Ashton

    WW Principal CFD Specialist Solution Architect, HPC

    Amazon Web Services

    Neil Ashton is the WW subject matter expert for CFD within AWS. He works with enterprise, startup, and public-sector customers across the globe to help them run their CFD (and often also FEA) workloads on AWS. In addition, he acts as a key advisor to the global product teams to deliver better hardware and software for CFD and broader CAE users. He also remains very active in academic research on deep learning and machine learning, future HPC approaches, and novel CFD approaches (GPUs, numerical methods, turbulence modelling).

  • Combining data from scanners to inform cadet physical performance

    Abstract:

    Digital anthropometry obtained from 3D body scanners has already revolutionized the clothing and fitness industries. Within seconds, these scanners collect hundreds of anthropometric measurements, which are used by tailors to customize an article of clothing or by fitness trainers to track their clients’ progress towards a goal. Three-dimensional body scanners have also been used in military applications, such as predicting injuries at Army basic training and checking a soldier’s compliance with body composition standards. In response to this increased demand, several 3D body scanners have become commercially available, each with a proprietary algorithm for measuring specific body parts. Individual scanners may suffice to collect measurements from a small population; however, they are not practical for use in creating the large data sets necessary to train artificial intelligence (AI) or machine learning algorithms. This study fills the gap between these two applications by correlating body circumferences taken from a small population (n = 109) on three different body scanners and creating a standard scale for pooling data from the different scanners into one large AI-ready data set. This data set is then leveraged in a separate application to understand the relationship between body shape and performance on the Army Combat Fitness Test (ACFT).

    Speaker Info:

    Nicholas Ashby

    Student

    United States Military Academy

    Nicholas (Nick) Ashby is a fourth-year cadet at the United States Military Academy. He was born in California’s Central Coast and grew up in Charlottesville, VA. At West Point he is pursuing a B.S. in Applied Statistics and Data Science and will commission as an Army Aviation officer in May. In his free time Nick works as a student manager and data analyst for Army’s NCAA Division 1 baseball team and he enjoys playing golf as well. His research, under advisor Dr. Diana M. Thomas, has focused on body shape, body composition, and performance on the U.S. Army Combat Fitness Test (ACFT).

  • Computing Statistical Tolerance Regions Using the R Package ‘tolerance’

    Abstract:

    Statistical tolerance intervals of the form (1−α, P) provide bounds to capture at least a specified proportion P of the sampled population with a given confidence level 1−α. The quantity P is called the content of the tolerance interval and the confidence level 1−α reflects the sampling variability. Statistical tolerance intervals are ubiquitous in regulatory documents, especially regarding design verification and process validation. Examples of such regulations are those published by the Food and Drug Administration (FDA), the Environmental Protection Agency (EPA), the International Atomic Energy Agency (IAEA), and the standard 16269-6 of the International Organization for Standardization (ISO). Research and development in the area of statistical tolerance intervals has undoubtedly been guided by the needs and demands of industry experts.

    Some of the broad applications of tolerance intervals include their use in quality control of drug products, setting process validation acceptance criteria, establishing sample sizes for process validation, assessing biosimilarity, and establishing statistically-based design limits. While tolerance intervals are available for numerous parametric distributions, procedures are also available for regression models, mixed-effects models, and multivariate settings (i.e., tolerance regions). Alternatively, nonparametric procedures can be employed when assumptions of a particular parametric model are not met. Tools for computing such tolerance intervals and regions are a necessity for researchers and practitioners alike. This was the motivation for designing the R package ‘tolerance,’ which not only has the capability of computing a wide range of tolerance intervals and regions for both standard and non-standard settings, but also includes some supplementary visualization tools. This session will provide a high-level introduction to the ‘tolerance’ package and its many features. Relevant data examples will be integrated with the computing demonstration, and specifically designed to engage researchers and practitioners from industry and government. A recently-launched Shiny app corresponding to the package will also be highlighted.
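
    The ‘tolerance’ package itself is written in R; as a rough illustration of one calculation it automates, the Python sketch below computes an approximate two-sided (1−α, P) normal tolerance interval using Howe's k-factor approximation on simulated data (the data and settings are invented, and the package offers this and many other procedures).

    # Approximate two-sided (1 - alpha, P) normal tolerance interval via Howe's k-factor.
    import numpy as np
    from scipy import stats

    def normal_tolerance_interval(x, alpha=0.05, P=0.90):
        n, nu = len(x), len(x) - 1
        z = stats.norm.ppf((1 + P) / 2)
        chi2 = stats.chi2.ppf(alpha, nu)                 # lower alpha quantile of chi-square
        k = np.sqrt(nu * (1 + 1 / n) * z**2 / chi2)      # Howe (1969) approximation
        m, s = np.mean(x), np.std(x, ddof=1)
        return m - k * s, m + k * s

    rng = np.random.default_rng(11)
    data = rng.normal(50.0, 2.0, size=25)                # notional measurements
    lo, hi = normal_tolerance_interval(data)
    print(f"(0.95, 0.90) tolerance interval: ({lo:.2f}, {hi:.2f})")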

    Speaker Info:

    Derek Young

    Associate Professor of Statistics

    University of Kentucky

    Derek Young received his PhD in Statistics from Penn State University in 2007, where his research focused on computational aspects of novel finite mixture models. He subsequently worked as a Senior Statistician for the Naval Nuclear Propulsion Program (Bettis Lab) for 3.5 years and then as a Research Mathematical Statistician for the US Census Bureau for 3 years. He then joined the faculty of the Department of Statistics at the University of Kentucky in the fall of 2014, where he is currently a tenured Associate Professor. While at the Bettis Lab, he engaged with engineers and nuclear regulators, often regarding the calculation of tolerance regions. While at the Census Bureau, he wrote several methodological and computational papers for applied survey data analysis, many as the sole author. Since being at the University of Kentucky, he has further progressed his research agenda in finite mixture modeling, zero-inflated modeling, and tolerance regions. He also has extensive teaching experience spanning numerous undergraduate and graduate Statistics courses, as well as professional development presentations in Statistics.

  • Convolutional Neural Networks and Semantic Segmentation for Cloud and Ice Detection

    Abstract:

    Recent research shows the effectiveness of machine learning for image classification and segmentation. The use of artificial neural networks (ANNs) on image datasets such as the MNIST dataset of handwritten digits is highly effective. However, when presented with more complex images, ANNs and other simple computer vision algorithms tend to fail. This research uses Convolutional Neural Networks (CNNs) to determine how we can differentiate between ice and clouds in imagery of the Arctic. Instead of using ANNs, which analyze the problem in one dimension, CNNs identify features using the spatial relationships between the pixels in an image. This technique allows us to extract spatial features, giving us higher accuracy. Using a CNN named the Cloud-Net Model, we analyze how a CNN performs when analyzing satellite images. First, we examine recent research on the Cloud-Net Model’s effectiveness on satellite imagery, specifically from Landsat data, with four channels: red, green, blue, and infrared. We extend and modify this model, allowing us to analyze data from the most common channels used by satellites: red, green, and blue. By training on different combinations of these three channels, we extend this analysis by testing on an entirely different data set: GOES imagery. This gives us an understanding of the impact of each individual channel on image classification. By selecting GOES images from the same geographic locations as the Landsat images and containing both ice and clouds, we test the CNN’s generalizability. Finally, we present the CNN’s ability to accurately identify the clouds and ice in the GOES data versus the Landsat data.

    Speaker Info:

    Prarabdha Ojwaswee Yonzon

    Cadet

    United States Military Academy (West Point)

    CDT Prarabdha “Osho” Yonzon is a first-generation Nepalese American raised in Brooklyn Park, Minnesota. He initially enlisted in the Minnesota National Guard in 2015 as an Aviation Operations Specialist, and he was later accepted into USMAPS in 2017. He is an Applied Statistics and Data Science major at the United States Military Academy. Osho is passionate about his research. He first worked with the West Point Department of Physics to examine impacts on GPS solutions, and he later published several articles on modeling groundwater flow with the Math Department and presented them at the AWRA annual conference. Currently, he is working with the West Point Department of Mathematics and Lockheed Martin to create machine learning algorithms to detect objects in images. He plans to attend graduate school for data science and serve as a cyber officer.

  • Data Integrity For Deep Learning Models

    Abstract:

    Deep learning models are built from algorithm frameworks that fit parameters over a large set of structured historical examples. Model robustness relies heavily on the accuracy and quality of the input training datasets. This mini-tutorial seeks to explore the practical implications of data quality issues when attempting to build reliable and accurate deep learning models. The tutorial will review the basics of neural networks and model building, and then dive deep into data quality considerations using practical examples. An understanding of data integrity and data quality is pivotal for verification and validation of deep learning models, and this tutorial will provide students with a foundation in this topic.

    Speaker Info:

    John Cilli

    US Army, CCDC Armaments Center

  • Data Integrity For Deep Learning Models

    Abstract:

    Deep learning models are built from algorithm frameworks that fit parameters over a large set of structured historical examples. Model robustness relies heavily on the accuracy and quality of the input training datasets. This mini-tutorial seeks to explore the practical implications of data quality issues when attempting to build reliable and accurate deep learning models. The tutorial will review the basics of neural networks and model building, and then dive deep into data quality considerations using practical examples. An understanding of data integrity and data quality is pivotal for verification and validation of deep learning models, and this tutorial will provide students with a foundation in this topic.

    Speaker Info:

    Roshan Patel

    Systems Engineer/Data Scientist

    US Army

    Mr. Roshan Patel is a systems engineer and data scientist working at CCDC Armaments Center. His role focuses on systems engineering infrastructure, statistical modeling, and the analysis of weapon systems. He holds a Master's in Computer Science from Rutgers University, where he specialized in operating systems programming and machine learning. Mr. Patel is the current AI lead for the Systems Engineering Directorate at CCDC Armaments Center.

  • Data Integrity For Deep Learning Models

    Abstract:

    Deep learning models are built from algorithm frameworks that fit parameters over a large set of structured historical examples. Model robustness relies heavily on the accuracy and quality of the input training datasets. This mini-tutorial seeks to explore the practical implications of data quality issues when attempting to build reliable and accurate deep learning models. The tutorial will review the basics of neural networks and model building, and then dive deep into data quality considerations using practical examples. An understanding of data integrity and data quality is pivotal for verification and validation of deep learning models, and this tutorial will provide students with a foundation in this topic.

    Speaker Info:

    Victoria Gerardi

    US Army, CCDC Armaments Center

  • Data Science & ML-Enabled Terminal Effects Optimization

    Abstract:

    Warhead design and performance optimization against a range of targets is a foundational aspect of the Department of the Army’s mission on behalf of the warfighter. The existing procedures used to perform this basic design task do not fully leverage the exponential growth in data science, machine learning, distributed computing, and computational optimization. Although sound in practice and methodology, existing implementations are laborious and computationally expensive, thus limiting the ability to fully explore the trade space of all potentially viable solutions. An additional complicating factor is the fast-paced nature of many Research and Development programs, which require equally fast-paced conceptualization and assessment of warhead designs.
    By incorporating data analytics into the workflow for developing and assessing modern warheads, this effort will enable earlier insights, discovery through advanced visualization, and optimal integration of multiple engineering domains. Additionally, a framework built on machine learning would allow for the exploitation of past studies and designs to better inform future developments. Combining these approaches will allow for rapid conceptualization and assessment of new and novel warhead designs.
    US overmatch capability is quickly eroding across many tactical and operational weapon platforms. Traditional incremental improvement approaches are no longer generating performance improvements appreciable enough to warrant investment. Novel next-generation techniques are required to find efficiencies in designs and leap-forward technologies to maintain US superiority. The proposed approach seeks to shift the existing design mentality to meet this challenge.

    Speaker Info:

    John Cilli

    Computer Scientist

    Picatinny Arsenal

    My name is John Cilli. I am a recent graduate of East Stroudsburg University with a bachelor's degree in Computer Science. I have been working at Picatinny within the Systems Analysis Division as a computer scientist for a little over a year now.

  • Deep learning aided inspection of additively manufactured metals

    Abstract:

    The performance and reliability of additively manufactured (AM) metals are limited by the ubiquitous presence of void- and crack-like defects that form during processing. Many applications require non-destructive evaluation of AM metals to detect potentially critical flaws. To this end, we propose a deep learning approach that can help with the interpretation of inspection reports. Convolutional neural networks (CNNs) are developed to predict the elastic stress fields in images of defect-containing metal microstructures, and therefore directly identify critical defects. A large dataset consisting of the stress response of 100,000 random microstructure images is generated using high-resolution Fast Fourier Transform-based finite element (FFT-FE) calculations, which is then used to train a modified U-Net style CNN model. The trained U-Net model more accurately predicted the stress response compared to previous CNN architectures, exceeded the accuracy of low-resolution FFT-FE calculations, and was evaluated more than 100 times faster than conventional FE techniques. The model was applied to images of real AM microstructures with severe lack-of-fusion defects, and predicted a strong linear increase of maximum stress as a function of pore fraction. This work shows that CNNs can aid the rapid and accurate inspection of defect-containing AM material.

    Speaker Info:

    Brendan Croom

    Postdoctoral Fellow

    JHU Applied Physics Laboratory

    Dr. Croom joined the Applied Physics Laboratory (APL) in 2020 as a Postdoctoral Researcher within the Multifunctional Materials and Nanostructures group. At APL, his work has focused on developing quantitative inspection, analysis, and testing tools to ensure the reliability of additively manufactured metals, which commonly fail due to defects created during processing. This work involves pushing the capabilities of X-ray Computed Tomography imaging techniques in terms of speed and resolution to better resolve defects, and using machine learning to improve defect detection and measurement interpretation. Before joining APL, Dr. Croom was an NRC Postdoctoral Research Fellow at the Materials and Manufacturing Directorate of the Air Force Research Laboratory, where he studied the fiber alignment, defect formation, and fracture behavior of additively manufactured composites. He completed his Ph.D. at the University of Virginia in 2019, where he developed several in situ X-ray Computed Tomography mechanical testing techniques.

  • Enabling Enhanced Validation of NDE Computational Models and Simulations

    Abstract:

    Enabling Enhanced Validation of NDE Computational Models and Simulations

    William C. Schneck, III, Ph.D.
    Elizabeth D. Gregory, Ph.D.
    NASA Langley Research Center

    Computer simulations of physical processes are increasingly used in the development, design, deployment, and life-cycle maintenance of many engineering systems [1] [2]. Non-Destructive Evaluation (NDE) and Structural Health Monitoring (SHM) must employ effective methods to inspect increasingly complex structural and material systems developed for new aerospace systems. Reliably and comprehensively interrogating this multidimensional [3] problem domain from a purely experimental perspective can become cost and time prohibitive. The emerging way to confront these new complexities in a timely and cost-effective manner is to utilize computer simulations. These simulations must be Verified and Validated [4] [5] to assure reliable use for these NDE/SHM applications.

    Beyond the classical use of models for engineering applications for equipment or system design efforts, NDE/SHM are necessarily applied to as-built and as-used equipment. While most structural or CFD models are applied to ascertain performance of as-designed systems, the performance of an NDE/SHM system is necessarily tied to the indications of damage/defects/deviations (collectively, flaws) within as-built and as-used structures and components. Therefore, the models must have sufficient fidelity to determine the influence of these aberrations on the measurements collected during interrogation. To assess the accuracy of these models, the Validation data sets must adequately encompass these flaw states.

    Due to the extensive parametric spaces that this coverage would entail, this talk proposes an NDE Benchmark Validation Data Repository, which should contain inspection data covering representative structures and flaws. This data can be reused from project to project, amortizing the cost of performing high quality Validation testing.

    Works Cited

    [1] Director, Modeling and Simulation Coordination Office, "Department of Defense Standard Practice: Documentation of Verification, Validation, and Accreditation (VV&A) for Models and Simulations," Department of Defense, 2008.
    [2] Under Secretary of Defense (Acquisition, Technology and Logistics), "DoD Modeling and Simulation (M&S) Verification, Validation, and Accreditation (VV&A)," Department of Defense, 2003.
    [3] R. C. Martin, Clean Architecture: A Craftsman's Guide to Software Structure and Design, Boston: Prentice Hall, 2018.
    [4] C. J. Roy and W. L. Oberkampf, "A Complete Framework for Verification, Validation, and Uncertainty Quantification in Scientific Computing (Invited)," in 48th AIAA Aerospace Sciences Meeting, Orlando, 2010.
    [5] ASME Performance Test Code Committee 60, "Guide for Verification and Validation in Computational Solid Mechanics," ASME International, New York, 2016.

    Speaker Info:

    William C. Schneck, III

    Research AST

    NASA LaRC

  • Estimating the time of sudden shift in the location or scale of ergodic-stationary process

    Abstract:

    Autocorrelated sequences arise in many modern-day industrial applications. In this paper, our focus is on estimating the time of sudden shift in the location or scale of a continuous ergodic-stationary sequence following a genuine signal from a statistical process control chart. Our general approach involves “clipping” the continuous sequence at the median or interquartile range (IQR) to produce a binary sequence, and then modeling the joint mass function for the binary sequence using a Bahadur approximation. We then derive a maximum likelihood estimator for the time of sudden shift in the mean of the binary sequence. Performance comparisons are made between our proposed change point estimator and two other viable alternatives. Although the literature contains existing methods for estimating the time of sudden shift in the mean and/or variance of a continuous process, most are derived under strict independence and distributional assumptions. Such assumptions are often too restrictive, particularly when applications involve Industry 4.0 processes where autocorrelation is prevalent and the distribution of the data is likely unknown. The change point estimation strategy proposed in this work easily incorporates autocorrelation and is distribution-free. Consequently, it is widely applicable to modern-day industrial processes.
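
    As a simplified sketch of the clipping idea (assuming independence and ignoring the Bahadur adjustment for autocorrelation that the paper develops), the Python code below clips a simulated sequence at its in-control median and profiles the Bernoulli likelihood over candidate change points; the shift size and sample sizes are invented.

    # Simplified change point sketch: clip at the median, then maximize the Bernoulli likelihood.
    import numpy as np

    rng = np.random.default_rng(5)
    pre = rng.normal(0.0, 1.0, size=120)
    post = rng.normal(0.8, 1.0, size=80)              # sustained location shift at t = 120
    x = np.concatenate([pre, post])

    b = (x > np.median(pre)).astype(int)              # clip at the in-control median

    def bern_loglik(seg):
        p = np.clip(seg.mean(), 1e-6, 1 - 1e-6)
        return seg.sum() * np.log(p) + (len(seg) - seg.sum()) * np.log(1 - p)

    n = len(b)
    candidates = list(range(10, n - 10))
    loglik = [bern_loglik(b[:tau]) + bern_loglik(b[tau:]) for tau in candidates]
    print("estimated change point:", candidates[int(np.argmax(loglik))])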

    Speaker Info:

    Zhi Wang

    Data Scientist Contractor

    Bayer Crop Science

    Zhi Wang is currently a data scientist contractor at Bayer Crop Science focused on advancing field operations using modern data analytic tools and methods. Dr. Wang's research interests include changepoint detection and estimation, statistical process monitoring, business analytics, and geospatial environmental modeling.

  • Everyday Reproducibility

    Abstract:

    Modern data analysis is typically quite computational. Correspondingly, sharing scientific and statistical work now often means sharing code and data in addition to writing papers and giving talks. This type of code sharing faces several challenges. For example, it is often difficult to take code from one computer and run it on another due to software configuration, version, and dependency issues. Even if the code runs, writing code that is easy to understand or interact with can be difficult, which makes it hard to assess third-party code and its findings, for example, in a review process. In this talk we describe a combination of two computing technologies that help make analyses shareable, interactive, and completely reproducible. These technologies are (1) analysis containerization, which leverages virtualization to fully encapsulate analysis, data, code, and dependencies into an interactive and shareable format, and (2) code notebooks, a literate programming format for interacting with analyses. This talk reviews the problems at a high level and provides concrete solutions to the challenges faced. In addition to discussing reproducibility and data/code sharing generally, we will touch upon several such issues that arise specifically in the defense and aerospace communities.

    Speaker Info:

    Gregory J. Hunt

    Assistant Professor

    William & Mary

    Greg is an Assistant Professor of Mathematics at the College of William & Mary. He is an interdisciplinary researcher who builds scientific tools and is trained as a statistician, mathematician, and computer scientist. Currently he works on a diverse set of problems in high-throughput microbiology, research reproducibility, hypersonics, and spectroscopy.

  • Experiment Design and Visualization Techniques for an X-59 Low-boom Variability Study

    Abstract:

    This presentation outlines the design of experiments approach and data visualization techniques for a simulation study of sonic booms from NASA’s X-59 supersonic aircraft. The X-59 will soon be flown over communities across the contiguous USA as it produces a low-loudness sonic boom, or low-boom. Survey data on human perception of low-booms will be collected to support development of potential future commercial supersonic aircraft noise regulatory standards. The macroscopic atmosphere plays a critical role in the loudness of sonic booms. The extensive sonic boom simulation study presented herein was completed to assess climatological, geographical, and seasonal effects on the variability of the X-59’s low-boom loudness and noise exposure region size in order to inform X-59 community test planning. The loudness and extent of the noise exposure region make up the “sonic boom carpet.” Two spatial and temporal resolutions of atmospheric input data to the simulation were investigated. A Fast Flexible Space-Filling Design was used to select the locations across the USA for the two spatial resolutions. Analysis of simulated X-59 low-boom loudness data within a regional subset of the northeast USA was completed using a bootstrap forest to determine the final spatial and temporal resolution of the countrywide simulation study. Atmospheric profiles from NOAA’s Climate Forecast System Version 2 database were used to generate over one million simulated X-59 carpets at the final selected 138 locations across the USA. Effects of aircraft heading, season, geography, and climate zone on low-boom levels and noise exposure region size were analyzed. Models were developed to estimate loudness metrics throughout the USA for X-59 supersonic cruise overflight, and results were visualized on maps to show geographical and seasonal trends. These results inform regulators and mission planners on expected variations in boom levels and carpet extent from atmospheric variations. Understanding potential carpet variability is important when planning community noise surveys using the X-59.

    Speaker Info:

    William J Doebler

    Research Aerospace Engineer

    NASA Langley Research Center

    Will Doebler is a research engineer in NASA Langley’s Structural Acoustic Branch. He supports NASA’s Commercial Supersonic Technology project as a member of the Community Test Planning and Execution team for the X-59 low-boom supersonic aircraft. He has a M.S. in Acoustics from Penn State, and a B.A. in Physics from Gustavus Adolphus College in MN.

  • Exploring the behavior of Bayesian adaptive design of experiments

    Abstract:

    Physical experiments in the national security arena, including nuclear deterrence, are often expensive and time-consuming, resulting in small sample sizes that make it difficult to achieve desired statistical properties. Bayesian adaptive design of experiments (BADE) is a sequential design-of-experiments approach which updates the test design in real time in order to collect data optimally. BADE recommends ending an experiment early by concluding, with sufficiently high probability, that the experiment would have ended in efficacy or futility had the testing completely finished. This is done by using the data already collected, marginalizing over the remaining uncollected data, and updating the Bayesian posterior distribution in near real time. BADE has seen successes in clinical trials, resulting in quicker and more effective assessments of drug trials while also reducing ethical concerns. BADE has typically only been used in futility studies rather than efficacy studies for clinical trials, although there has not been much debate about this current paradigm. BADE has been proposed for testing in the national security space for similar reasons of quicker and cheaper test series. Given the high-consequence nature of the tests performed in the national security space, a strong understanding of new methods is required before they are deployed. The main contribution of this research was to reproduce results seen in previous studies for different aspects of model performance. A large simulation inspired by a real testing problem at Sandia National Laboratories was performed to understand the behavior of BADE under various scenarios, including shifts in the mean, standard deviation, and distributional family, in addition to the presence of outliers. The results help explain the behavior of BADE under various assumption violations. Using the results of this simulation, combined with previous work related to BADE in this field, it is argued this approach could be used as part of an “evidence package” for deciding to stop testing early due to futility or, with stronger evidence, efficacy. The combination of expert knowledge with statistical quantification provides the stronger evidence necessary for a method in its infancy in a high-consequence, new application area such as national security.

    Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
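
    As a toy illustration of the predictive-probability machinery behind BADE (not the Sandia simulation study), the Python sketch below computes, at an interim look of a pass/fail test, the Bayesian predictive probability that the completed test would demonstrate a required reliability; the test size, prior, interim data, and thresholds are all assumed.

    # Toy interim analysis in the spirit of BADE (all numbers assumed).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2024)
    n_total, n_seen, k_seen = 40, 15, 14          # planned trials; interim trials and successes
    requirement, post_conf = 0.80, 0.90           # success = P(reliability > 0.80) >= 0.90 at test end
    a0, b0 = 1.0, 1.0                             # uniform Beta prior on reliability

    sims, n_rem = 20_000, n_total - n_seen
    p_draws = rng.beta(a0 + k_seen, b0 + n_seen - k_seen, size=sims)   # current posterior draws
    k_future = rng.binomial(n_rem, p_draws)                            # simulated remaining outcomes

    final_a = a0 + k_seen + k_future
    final_b = b0 + n_total - k_seen - k_future
    success = (1 - stats.beta.cdf(requirement, final_a, final_b)) >= post_conf
    print("predictive probability of eventual success:", round(success.mean(), 3))

    If this predictive probability is very high or very low, a BADE-style rule would recommend stopping early for efficacy or futility, respectively.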

    Speaker Info:

    Daniel Ries

    Sandia National Laboratories

    Daniel Ries is a Senior Member of the Technical Staff at Sandia National Laboratories in the Statistics and Data Analytics Department. As an applied research statistician, Daniel collaborates with scientists and engineers in fields including nuclear deterrence, nuclear forensics, nuclear non-proliferation, global security, and climate science. His statistical work spans the topics of experimental design, inverse modeling, uncertainty quantification for machine learning and deep learning, spatio-temporal data analysis, and Bayesian methodology. Daniel completed his PhD in statistics at Iowa State University.

  • Forecasting with Machine Learning

    Abstract:

    The Department of Defense (DoD) has a considerable interest in forecasting key quantities of interest including demand signals, personnel flows, and equipment failure. Many forecasting tools exist to aid in predicting future outcomes, and there are many methods to evaluate the quality and uncertainty in those forecasts. When used appropriately, these methods can facilitate planning and lead to dramatic reductions in costs. This talk explores the application of machine learning algorithms, specifically gradient-boosted tree models, to forecasting and presents some of the various advantages and pitfalls of this approach. We conclude with an example where we use gradient-boosted trees to forecast Air National Guard personnel retention.
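
    As a generic, hypothetical illustration of the approach (simulated data, not the Air National Guard retention model), the Python sketch below fits a gradient-boosted tree model to lagged features of a notional monthly series and scores it on a temporal hold-out.

    # Generic gradient-boosted tree forecasting sketch on simulated data.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(8)
    t = np.arange(240)
    y = 100 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, size=t.size)  # notional series

    lags = np.column_stack([y[2:-1], y[1:-2], y[:-3]])   # features: y(t-1), y(t-2), y(t-3)
    target = y[3:]
    split = len(target) - 24                             # hold out the most recent 24 months

    model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
    model.fit(lags[:split], target[:split])
    pred = model.predict(lags[split:])
    print("hold-out MAE:", round(mean_absolute_error(target[split:], pred), 2))

    A temporal hold-out like this is one simple way to gauge forecast quality out of sample.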

    Speaker Info:

    Akshay Jain

    Data Science Fellow

    IDA

    Akshay earned his Bachelor of Arts in Mathematics, Political Science, and Mathematical Methods in the Social Sciences (MMSS) from Northwestern University. He is currently a Data Science Fellow in the Strategy, Forces, and Resources Division at the Institute for Defense Analyses.

  • From Gripe to Flight: Building an End-to-End Picture of DOD Sustainment

    Abstract:

    The DOD has to maintain readiness across a staggeringly diverse array of modern weapon systems, yet no single person or organization in the DOD has an end-to-end picture of the sustainment system that supports them. This shortcoming can lead to bad decisions when it comes to allocating resources in a funding-constrained environment. The underlying problem is driven by stovepiped databases, a reluctance to share data even internally, and a reliance on tribal knowledge of often cryptic data sources. Notwithstanding these difficulties, we need to create a comprehensive picture of the sustainment system to be able to answer pressing questions from DOD leaders. To that end, we have created a documented and reproducible workflow that shepherds raw data from DOD databases through cleaning and curation steps, and then applies logical rules, filters, and assumptions to transform the raw data into concrete values and useful metrics. This process gives us accurate, up-to-date data that we use to support quick-turn studies, and to rapidly build (and efficiently maintain) a suite of readiness models for a wide range of complex weapon systems.

    Speaker Info:

    Benjamin Ashwell

    Research Staff Member

    IDA

    Dr. Benjamin Ashwell has been a Research Staff Member at the Institute for Defense Analyses (IDA) since 2015. A founding member of IDA’s Sustainment Analysis Group, he leads the NAVAIR Sustainment analysis task. This work combines deep data analysis with end-to-end stochastic simulations to tie resource investments to flight line readiness outcomes. Before moving to the Sustainment Group in 2019, Dr. Ashwell spent three years supporting the Director of Operational Test and Evaluation’s (DOT&E’s) analysis of the Navy’s Littoral Combat Ship, specializing in surface warfare and system reliability.  Dr. Ashwell received his PhD in Chemistry from Northwestern University in 2015.

  • Introducing git for reproducible research

    Abstract:

    Version control software manages different versions of files, providing an archive of files, a means to manage multiple versions of a file, and, in some cases, distribution. Perhaps the most popular version control program in the computer science community is git, which serves as the backbone for websites such as Github, Bitbucket, and others. In this mini-tutorial we will introduce the basics of version control in general and of git in particular, and explain what role git plays in a reproducible research context. The goal of the course is to get participants started using git. We will create and clone repositories, add and track files in a repository, and manage git branches. We also discuss a few git best practices.
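    As a rough sketch of the basic workflow the tutorial covers (create a repository, track files, commit, branch), the Python script below drives git through subprocess; it assumes git is installed and on the PATH, and the repository path and file are throwaway examples.

    ```python
    import subprocess
    from pathlib import Path

    def git(*args, cwd="."):
        """Run a git command and return its output; assumes git is on the PATH."""
        return subprocess.run(["git", *args], cwd=cwd, check=True,
                              capture_output=True, text=True).stdout

    repo = Path("demo-repo")                 # hypothetical location for a practice repository
    repo.mkdir(exist_ok=True)

    git("init", cwd=repo)                    # create a new repository
    git("config", "user.name", "Demo User", cwd=repo)    # local identity so commits succeed
    git("config", "user.email", "demo@example.com", cwd=repo)

    (repo / "analysis.R").write_text("# reproducible analysis goes here\n")
    git("add", "analysis.R", cwd=repo)       # start tracking the file
    git("commit", "-m", "Add analysis script", cwd=repo)
    git("checkout", "-b", "feature", cwd=repo)   # create and switch to a branch
    print(git("log", "--oneline", cwd=repo))
    ```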

    Speaker Info:

    Curtis Miller

    Research Staff Member

    IDA

    Curtis Miller is a Research Staff Member at the Institute for Defense Analyses in the Operational Evaluation Division, where he is a member of the Test Science and Naval Warfare groups. He obtained a PhD in Mathematics at the University of Utah in 2020, where he studied mathematical statistics. He provides statistical expertise to the rest of OED and works primarily on design of experiments and analysis of modeling and simulation data.

  • Kernel Regression, Bernoulli Trial Responses, and Designed Experiments

    Abstract:

    Boolean responses are common for both tangible and simulation experiments. Well-known approaches for fitting models to Boolean responses include ordinary regression with normal approximations or variance-stabilizing transforms, and logistic regression. Less well known is kernel regression. This session will present properties of kernel regression, its application to Bernoulli trial experiments, and other lessons learned from using kernel regression in the wild. Kernel regression is a non-parametric method, which requires modifications to many analyses, such as the required sample size. Unlike ordinary regression, the experiment design and model solution interact with each other. Consequently, the number of experiment samples needed for a desired modeling accuracy depends on the true state of nature. There has been a trend toward increasingly large simulation sample sizes as computing horsepower has grown. With kernel regression there is a point of diminishing returns on sample size: once a sufficient sample size is reached, an experiment is better off with more data sites rather than more samples. Confidence interval accuracy is also dependent on the true state of nature, and parsimonious model tuning is required for accurate confidence intervals. Kernel tuning to build a parsimonious model using cross-validation methods will be illustrated.
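    A minimal sketch of kernel regression applied to Bernoulli trial data is shown below, using a Nadaraya-Watson estimator with a Gaussian kernel and a leave-one-out Brier score to tune the bandwidth; the data-generating model is synthetic and the tuning approach is one simple choice among the cross-validation methods the session will discuss.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    # Synthetic Bernoulli-trial experiment: hit probability varies smoothly with x.
    n = 400
    x = rng.uniform(0, 10, n)
    p_true = 1 / (1 + np.exp(-(x - 5)))
    y = rng.binomial(1, p_true)

    def nw_estimate(x_eval, x_data, y_data, h):
        """Nadaraya-Watson kernel regression with a Gaussian kernel."""
        w = np.exp(-0.5 * ((x_eval[:, None] - x_data[None, :]) / h) ** 2)
        return (w @ y_data) / w.sum(axis=1)

    def loo_brier(h):
        """Leave-one-out Brier score used to tune the bandwidth parsimoniously."""
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(w, 0.0)                    # exclude each point from its own fit
        p_hat = (w @ y) / w.sum(axis=1)
        return np.mean((p_hat - y) ** 2)

    bandwidths = np.linspace(0.2, 3.0, 15)
    h_best = bandwidths[np.argmin([loo_brier(h) for h in bandwidths])]

    grid = np.linspace(0, 10, 5)
    print("selected bandwidth  :", round(h_best, 2))
    print("estimated P(success):", np.round(nw_estimate(grid, x, y, h_best), 2))
    print("true P(success)     :", np.round(1 / (1 + np.exp(-(grid - 5))), 2))
    ```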

    Speaker Info:

    John Lipp

    LM Fellow

    Lockheed Martin, Systems Engineering

    Dr. John Lipp received his Electrical Engineering PhD from Michigan Technological University in the area of stochastic signal processing.  He currently is employed at Lockheed Martin where he holds the position of Fellow.  He teaches statistics and probability for engineers, Kalman filtering, design of experiments, and statistical verification and validation.

  • Legal, Moral, and Ethical Implications of Machine Learning

    Abstract:

    Machine learning algorithms can help to distill vast quantities of information to support decision making. However, machine learning also presents unique legal, moral, and ethical concerns – ranging from potential discrimination in personnel applications to misclassifying targets on the battlefield. Building on foundational principles in ethical philosophy, this presentation summarizes key legal, moral, and ethical criteria applicable to machine learning and provides pragmatic considerations and recommendations.

    Speaker Info:

    Alan B. Gelder

    Research Staff Member

    IDA

    Alan earned his PhD in Economics from the University of Iowa in 2014 and currently leads the Human Capital Group in the Strategy, Forces, and Resources Division at the Institute for Defense Analyses.  He specializes in microeconomics, game theory, experimental and behavioral economics, and machine learning, and his research focuses on personnel attrition and related questions for the DOD.

  • Let's stop talking about "transparency" with regard to AI

    Abstract:

    For AI-enabled and autonomous systems, issues of safety, security, and mission effectiveness are not separable—the same underlying data and software give rise to interrelated risks in all of these dimensions. If treated separately, there is considerable unnecessary duplication (and sometimes mutual interference) among efforts needed to satisfy commanders, operators, and certification authorities of the systems’ dependability. Assurance cases, pioneered within the safety and cybersecurity communities, provide a structured approach to simultaneously verifying all dimensions of system dependability with minimal redundancy of effort. In doing so, they also provide a more concrete and useful framework for system development and explanation of behavior than is generally seen in discussions of “transparency” and “trust” in AI and autonomy. Importantly, trust generally cannot be “built in” to systems, because the nature of the assurance arguments needed for various stakeholders requires iterative identification of evidence structures that cannot be anticipated by developers.

    Speaker Info:

    David Sparrow

    Senior Analyst

    IDA

    David A. Sparrow received a Ph.D. in physics in 1974, and spent 12 years as an academic physicist. He joined IDA in 1986 and has been a Research Staff Member ever since, with brief forays into management and government service. He was the first Director of the IDA Simulation Center from 1989 to 1990, and Assistant Director of the Science and Technology Division from 1993 to 1997. He then joined the government for a two-year stint as Science Advisor on Modeling and Simulation to the Director, Operational Test and Evaluation. Since returning to IDA, he has focused on technical issues in system development, especially ground combat systems—expansively defined to include unexploded ordnance (UXO), counter mine, and, occasionally, missile defense. Recent emphasis has been on AI-enabled autonomous systems for defense applications.  He has authored ~100 refereed papers and invited talks on various academic and national security topics.

  • Let's stop talking about "transparency" with regard to AI

    Abstract:

    For AI-enabled and autonomous systems, issues of safety, security, and mission effectiveness are not separable—the same underlying data and software give rise to interrelated risks in all of these dimensions. If treated separately, there is considerable unnecessary duplication (and sometimes mutual interference) among efforts needed to satisfy commanders, operators, and certification authorities of the systems’ dependability. Assurance cases, pioneered within the safety and cybersecurity communities, provide a structured approach to simultaneously verifying all dimensions of system dependability with minimal redundancy of effort. In doing so, they also provide a more concrete and useful framework for system development and explanation of behavior than is generally seen in discussions of “transparency” and “trust” in AI and autonomy. Importantly, trust generally cannot be “built in” to systems, because the nature of the assurance arguments needed for various stakeholders requires iterative identification of evidence structures that cannot be anticipated by developers.

    Speaker Info:

    David Tate

    Senior Analyst

    IDA

    David Tate joined the research staff of IDA’s Cost Analysis and Research Division in 2000.  In his 20 years with CARD, he has worked on a wide variety of topics, including research into

    • test and evaluation challenges of AI and autonomy
    • risks in the defense software industrial base
    • technical barriers to mobile ad hoc networking
    • limiting factors for rapid acquisition.

    Prior to coming to IDA, Dr. Tate was Senior Operations Research Analyst for Telecommunications at Decision-Science Applications, Inc.  Before that, he was an Assistant Professor of Industrial Engineering at the University of Pittsburgh.  Dr. Tate holds bachelor’s degrees in Philosophy and Mathematical Sciences from the Johns Hopkins University, and M.S. and Ph.D. degrees in Operations Research from Cornell University.

  • Leveraging Data Science and Cloud Tools to Enable Continuous Reporting

    Abstract:

    The DoD’s challenge to provide test results at the “Speed of Relevance” has generated many new strategies to accelerate data collection, adjudication, and analysis. As a result, the Air Force Operational Test and Evaluation Center (AFOTEC), in conjunction with the Air Force Chief Data Office’s Visible, Accessible, Understandable, Linked and Trusted Data Platform (VAULT), is developing a Survey Application. This new cloud-based application will be deployable on any AFNET-connected computer or tablet and merges a variety of tools for collection, storage, analytics, and decision-making into one easy-to-use platform. By placing cloud-computing power in the hands of operators and testers, authorized users can view report-quality visuals and statistical analyses the moment a survey is submitted. Because the data is stored in the cloud, demanding computations such as machine learning are run at the data source to provide even more insight into both quantitative and qualitative metrics. The T-7A Red Hawk will be the first operational test (OT) program to utilize the Survey Application. Over 1000 flying and simulator test points have been loaded into the application, with many more coming from developmental test partners. The Survey app development will continue as USAF testing commences. Future efforts will focus on making the Survey Application configurable to other research and test programs to enhance their analytic and reporting capabilities.

    Speaker Info:

    Timothy Dawson

    Lead Mobility Test Operations Analyst

    AFOTEC Detachment 5

    First Lieutenant Timothy Dawson is an operational test analyst assigned to the Air Force Operational Test and Evaluation Center, Detachment 5, at Edwards AFB, Ca. The lieutenant serves as the lead AFOTEC Mobility Training Operations analyst, splitting his work between the T-7A Red Hawk high performance trainer, the KC-46A Pegasus tanker, and the VC-25B presidential transport. Lieutenant Dawson also serves alongside the 416th Flight Test Squadron as a flight test engineer on the T-38C Talon.

     

    Lieutenant Dawson, originally from Olympia, Wa., received his commission as a second lieutenant upon completing ROTC at the University of California, Berkeley in 2019. He served as a student pilot at Vance AFB, Ok., leading data analysis and software development projects before arriving at his current duty location at Edwards.

  • M&S approach for quantifying readiness impact of sustainment investment scenarios

    Abstract:

    Sustainment for weapon systems involves multiple components that influence readiness outcomes through a complex array of interactions. While military leadership can use simple analytical approaches to gain insight into current metrics (e.g., dashboards of top downtime drivers) or historical trends of a given sustainment structure (e.g., correlative studies between stock sizes and backorders), such approaches are inadequate for guiding decision-making because they cannot quantify the impact on readiness. In this talk, we discuss the power of IDA’s end-to-end modeling and simulation (M&S) approach that estimates time-varying readiness outcomes based on real-world data on operations, supply, and maintenance. These models are designed to faithfully emulate fleet operations at the level of individual components and operational units, as well as to incorporate the multi-echelon inventory system used in military sustainment. We showcase a notional example in which our M&S approach produces a set of recommended component-level investments and divestments in wholesale supply that would improve the readiness of a weapon system. We argue for the urgency of increased end-to-end M&S efforts across the Department of Defense to guide the senior leadership in its data-driven decision-making for readiness initiatives.
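    The sketch below is a deliberately simplified, notional stand-in for this kind of end-to-end M&S: a toy Monte Carlo fleet model in which component failures, a spares pool, and a repair turnaround time drive a time-averaged availability metric. It is not IDA's model; it only illustrates how a readiness outcome can be tied to one investment lever (spares depth).

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_availability(n_spares, n_aircraft=20, days=365,
                              mtbf_days=60.0, repair_days=30.0, n_reps=200):
        """Toy fleet simulation: components fail, broken aircraft wait for a spare,
        and removed components return from repair after a fixed turnaround time.
        Returns the mean fraction of aircraft that are mission capable."""
        avail = []
        for _ in range(n_reps):
            spares, down, up, mc_days = n_spares, [], n_aircraft, 0
            for _day in range(days):
                # Failures among currently-up aircraft.
                failures = rng.binomial(up, 1.0 / mtbf_days)
                for _f in range(failures):
                    down.append(repair_days)       # failed component enters repair
                    if spares > 0:
                        spares -= 1                # aircraft restored immediately from stock
                    else:
                        up -= 1                    # aircraft waits for a repaired component
                # Repairs completing today restore a waiting aircraft or replenish stock.
                done = sum(1 for d in down if d <= 1)
                down = [d - 1 for d in down if d > 1]
                for _r in range(done):
                    if up < n_aircraft:
                        up += 1
                    else:
                        spares += 1
                mc_days += up
            avail.append(mc_days / (n_aircraft * days))
        return float(np.mean(avail))

    for s in (0, 2, 5, 10):
        print(f"spares={s:2d}  mean availability={simulate_availability(s):.2f}")
    ```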

    Speaker Info:

    Andrew C. Flack

    Research Staff Member

    IDA (OED)

    Andrew Flack is a Research Staff Member in the Operational Evaluation division at IDA. His work focuses on weapons system sustainment and readiness modeling. Prior to joining IDA in 2016, Andrew was an analyst at the Defense Threat Reduction Agency (DTRA) studying M&S tools for chemical and biological defense.

  • M&S approach for quantifying readiness impact of sustainment investment scenarios

    Abstract:

    Sustainment for weapon systems involves multiple components that influence readiness outcomes through a complex array of interactions. While military leadership can use simple analytical approaches to gain insight into current metrics (e.g., dashboards of top downtime drivers) or historical trends of a given sustainment structure (e.g., correlative studies between stock sizes and backorders), such approaches are inadequate for guiding decision-making because they cannot quantify the impact on readiness. In this talk, we discuss the power of IDA’s end-to-end modeling and simulation (M&S) approach that estimates time-varying readiness outcomes based on real-world data on operations, supply, and maintenance. These models are designed to faithfully emulate fleet operations at the level of individual components and operational units, as well as to incorporate the multi-echelon inventory system used in military sustainment. We showcase a notional example in which our M&S approach produces a set of recommended component-level investments and divestments in wholesale supply that would improve the readiness of a weapon system. We argue for the urgency of increased end-to-end M&S efforts across the Department of Defense to guide the senior leadership in its data-driven decision-making for readiness initiatives.

    Speaker Info:

    Han G. Yi

    Research Staff Member

    IDA (OED)

    Han Yi is a Research Staff Member in the Operational Evaluation division at IDA.  His work focuses on weapons system sustainment and readiness modeling. Prior to joining IDA in 2020, he completed his PhD in Communication Sciences and Disorders at The University of Texas at Austin and served as a Postdoctoral Scholar at the University of California, San Francisco.

  • Machine Learning for Efficient Fuzzing

    Abstract:

    A high level of security in software is a necessity in today’s world; the best way to achieve confidence in security is through comprehensive testing. This paper covers the development of a fuzzer that explores the massive input space of a program using machine learning to find the inputs most associated with errors. A formal-methods model of the software in question is used to generate and evaluate test sets. Using those test sets, a two-part algorithm is applied: inputs are modified according to their Hamming distance from error-causing inputs, and then a tree-based model learns the relative importance of each variable in causing errors. This architecture was tested against a model of an aircraft’s thrust reverser, with predefined model properties offering a starting test set. From there, the Hamming-distance algorithm and importance model expand upon the original set to offer a more informed set of test cases. This system has great potential for producing efficient and effective test sets and has further applications in verifying the security of software programs and cyber-physical systems, contributing to national security in the cyber domain.
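    A toy sketch of the two-part idea is shown below, with a notional stand-in for the system model rather than the thrust-reverser model: known error-causing inputs are mutated at small Hamming distances, and a tree-based classifier then ranks which input bits are most associated with errors.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(3)
    N_BITS = 16

    def system_under_test(bits):
        """Notional stand-in for the formal-methods model: 'errors' occur when
        two particular bits are set and a third is clear (unknown to the fuzzer)."""
        return bool(bits[2] and bits[7] and not bits[11])

    def hamming_mutate(seed, k):
        """Flip k random bits of a known error-causing input (Hamming distance k)."""
        child = seed.copy()
        idx = rng.choice(N_BITS, size=k, replace=False)
        child[idx] ^= 1
        return child

    # Random starting test set, standing in for the property-derived seed set.
    X = rng.integers(0, 2, size=(200, N_BITS))
    y = np.array([system_under_test(b) for b in X])

    for _ in range(5):                        # a few fuzzing rounds
        errors = X[y]
        if len(errors) == 0:
            new = rng.integers(0, 2, size=(50, N_BITS))
        else:
            seeds = errors[rng.integers(0, len(errors), size=50)]
            new = np.array([hamming_mutate(s, rng.integers(1, 4)) for s in seeds])
        X = np.vstack([X, new])
        y = np.array([system_under_test(b) for b in X])

    # Tree-based model ranks which input bits are most associated with errors.
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    ranked = np.argsort(clf.feature_importances_)[::-1]
    print("error rate in final test set:", round(y.mean(), 3))
    print("most error-relevant bits:", ranked[:4])
    ```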

    Speaker Info:

    John Richie

    Cadet

    USAFA

  • Machine Learning for Uncertainty Quantification: Trusting the Black Box

    Abstract:

    Adopting uncertainty quantification (UQ) has become a prerequisite for providing credibility in modeling and simulation (M&S) applications. It is well known, however, that UQ can be computationally prohibitive for problems involving expensive high-fidelity models, since a large number of model evaluations is typically required. A common approach for improving efficiency is to replace the original model with an approximate surrogate model (i.e., metamodel, response surface, etc.) using machine learning that makes predictions in a fraction of the time. While surrogate modeling has been commonplace in the UQ field for over a decade, many practitioners still remain hesitant to rely on “black box” machine learning models over trusted physics-based models (e.g., FEA) for their analyses.

    This talk discusses the role of machine learning in enabling computational speedup for UQ, including traditional limitations and modern efforts to overcome them. An overview of surrogate modeling and its best practices for effective use is first provided. Then, some emerging methods that aim to unify physics-based and data-based approaches for UQ are introduced, including multi-model Monte Carlo simulation and physics-informed machine learning. The use of both traditional surrogate modeling and these more advanced machine learning methods for UQ are highlighted in the context of applications at NASA, including trajectory simulation and spacesuit certification.
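    The sketch below illustrates the basic surrogate-plus-Monte-Carlo pattern, with a Gaussian process from scikit-learn standing in for an expensive physics model; the function, input uncertainty, and sample sizes are notional and unrelated to the NASA applications.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    rng = np.random.default_rng(0)

    def expensive_model(x):
        """Stand-in for a costly physics simulation of a single input variable."""
        return np.sin(3 * x) + 0.5 * x**2

    # Train a surrogate on a small design of "expensive" runs.
    x_train = np.linspace(-2, 2, 15).reshape(-1, 1)
    y_train = expensive_model(x_train).ravel()
    gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
    gp.fit(x_train, y_train)

    # Monte Carlo UQ: propagate input uncertainty through the cheap surrogate.
    x_mc = rng.normal(0.5, 0.6, size=(100_000, 1))
    y_mc, y_sd = gp.predict(x_mc, return_std=True)

    print("surrogate-based output mean:", round(y_mc.mean(), 3))
    print("2.5/97.5 percentiles of the output distribution:",
          np.round(np.percentile(y_mc, [2.5, 97.5]), 3))
    print("max surrogate prediction std (credibility check):", round(y_sd.max(), 3))
    ```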

    Speaker Info:

    James Warner

    Computational Scientist

    NASA Langley Research Center

    Dr. James (Jim) Warner joined NASA Langley Research Center (LaRC) in 2014 as a Research Computer Engineer after receiving his PhD in Computational Solid Mechanics from Cornell University. Previously, he received his B.S. in Mechanical Engineering from SUNY Binghamton University and held temporary research positions at the National Institute of Standards and Technology and Duke University. Dr. Warner is a member of the Durability, Damage Tolerance, and Reliability Branch (DDTRB) at LaRC, where he focuses on developing computationally efficient approaches for uncertainty quantification for a range of applications including structural health management, additive manufacturing, and trajectory simulation. Additionally, he works to bridge the gap between UQ research and NASA mission impact, helping to transition state-of-the-art methods to solve practical engineering problems. To that end, he is currently involved in efforts to certify the xEMU spacesuit and develop guidance systems for Mars entry, descent, and landing. His other research interests include machine learning, high performance computing, and topology optimization.

  • Measuring training efficacy: Structural validation of the Operational Assessment of Training

    Abstract:

    Effective training of the broad set of users/operators of systems has downstream impacts on usability, workload, and ultimate system performance that are related to mission success. In order to measure training effectiveness, we designed a survey called the Operational Assessment of Training Scale (OATS) in partnership with the Army Test and Evaluation Center (ATEC). Two subscales were designed to assess the degrees to which training covered relevant content for real operations (Relevance subscale) and enabled self-rated ability to interact with systems effectively after training (Efficacy subscale). The full list of 15 items was given to over 700 users/operators across a range of military systems and test events (comprising both developmental and operational testing phases). Systems included vehicles, aircraft, C3 systems, and dismounted squad equipment, among other types. We evaluated the reliability of the factor structure across these military samples using confirmatory factor analysis. We confirmed that OATS exhibited a two-factor structure for training relevance and training efficacy. Additionally, a shortened, six-item measure of the OATS with three items per subscale continues to fit observed data well, allowing for quicker assessments of training. We discuss various ways that the OATS can be applied to one-off, multi-day, multi-event, and other types of training events. Additional OATS details and information about other scales for test and evaluation are available at the Institute for Defense Analyses’ web site, https://testscience.mystagingwebsite.com/validated-scales-repository/.

    Speaker Info:

    Brian Vickers

    Research Staff Member

    IDA

    Dr. Brian Vickers received his PhD in Cognition and Cognitive Neuroscience from the University of Michigan in 2015 on the topic of how decision architectures and human factors influence people’s decisions about time, money, and material options. Since that time, he has worked in data and decision science, and is currently a Research Staff Member at the Institute for Defense Analyses supporting topics including operational testing, test science, and artificial intelligence.

  • Method for Evaluating Bayesian Reliability Models for Developmental Testing

    Abstract:

    For analysis of military Developmental Test (DT) data, frequentist statistical models are increasingly challenged to meet the needs of analysts and decision-makers. Bayesian models have the potential to address this challenge. Although there is a substantial body of research on Bayesian reliability estimation, there appears to be a paucity of Bayesian applications to issues of direct interest to DT decision makers. To address this deficiency, this research accomplishes two tasks. First, it provides a motivating example that analyzes reliability for a notional but representative system. Second, to enable the motivated analyst to apply Bayesian methods, it provides a foundation and best practices for Bayesian reliability analysis in DT. The first task is accomplished by applying Bayesian reliability assessment methods to notional DT lifetime data generated using a Bayesian reliability growth planning methodology (Wayne 2018). The tested system is assumed to be a generic complex system with a large number of failure modes. Starting from the Bayesian assessment methodology of Wayne and Modarres (2015), this work explores the sensitivity of the Bayesian results to the choice of the prior distribution and compares the Bayesian results for the reliability point estimate and uncertainty interval with analogous results from traditional reliability assessment methods. The second task is accomplished by establishing a generic structure for systematically evaluating relevant Bayesian statistical models. First, previously implicit reliability issues for DT programs are identified using a structured poll of stakeholders combined with interviews of a selected set of Subject Matter Experts. Second, candidate solutions are identified in the literature. Third, solutions are matched to issues using criteria designed to evaluate the capability of a solution to improve support for decision-makers at critical points in DT programs. The matching process uses a model taxonomy structured according to decisions at each DT phase, plus criteria for model applicability and data availability. The end result is a generic structure that allows an analyst to identify and evaluate a specific model for use with a program and issue of interest.

    Wayne, Martin. 2018. "Modeling Uncertainty in Reliability Growth Plans." 2018 Annual Reliability and Maintainability Symposium (RAMS). 1-6.

    Wayne, Martin, and Mohammad Modarres. 2015. "A Bayesian Model for Complex System Reliability." IEEE Transactions on Reliability 64: 206-220.
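    As a minimal illustration of the kind of comparison described in the first task, the sketch below contrasts a conjugate Bayesian estimate of MTBF (a Gamma prior on the failure rate of exponential lifetimes) with the traditional MLE and chi-square confidence interval; the data and prior are notional, and the sketch does not implement the Wayne and Modarres methodology.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)

    # Notional DT lifetime data: exponential failure times (hours).
    true_mtbf = 200.0
    times = rng.exponential(true_mtbf, size=12)
    n, total_time = len(times), times.sum()

    # Conjugate Gamma(a0, b0) prior on the failure rate lambda (rate parametrization).
    a0, b0 = 2.0, 300.0          # weakly informative prior; prior mean rate = a0 / b0
    a_post, b_post = a0 + n, b0 + total_time

    # Posterior summaries for MTBF = 1 / lambda, via posterior draws.
    lam_draws = rng.gamma(a_post, 1.0 / b_post, size=100_000)
    mtbf_draws = 1.0 / lam_draws
    print("Bayesian MTBF estimate :", round(float(np.median(mtbf_draws)), 1))
    print("90% credible interval  :", np.round(np.percentile(mtbf_draws, [5, 95]), 1))

    # Traditional point estimate and chi-square confidence interval for comparison.
    mtbf_mle = total_time / n
    lo = 2 * total_time / stats.chi2.ppf(0.95, 2 * n)
    hi = 2 * total_time / stats.chi2.ppf(0.05, 2 * n)
    print("MLE MTBF estimate      :", round(mtbf_mle, 1))
    print("90% confidence interval:", (round(lo, 1), round(hi, 1)))
    ```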

    Speaker Info:

    Paul Fanto

    Research Staff Member, System Evaluation Division

    IDA

    Dr. Paul Fanto is a Research Staff Member at the Institute for Defense Analyses.  He received a Ph.D. in Physics from Yale University, where he worked on the application of Monte Carlo methods and high-performance computing to the modeling of atomic nuclei.  His current work involves the study of space systems and the application of Bayesian statistical methods to defense system testing.

  • Method for Evaluating Bayesian Reliability Models for Developmental Testing

    Abstract:

    For analysis of military Developmental Test (DT) data, frequentist statistical models are increasingly challenged to meet the needs of analysts and decision-makers. Bayesian models have the potential to address this challenge. Although there is a substantial body of research on Bayesian reliability estimation, there appears to be a paucity of Bayesian applications to issues of direct interest to DT decision makers. To address this deficiency, this research accomplishes two tasks. First, it provides a motivating example that analyzes reliability for a notional but representative system. Second, to enable the motivated analyst to apply Bayesian methods, it provides a foundation and best practices for Bayesian reliability analysis in DT. The first task is accomplished by applying Bayesian reliability assessment methods to notional DT lifetime data generated using a Bayesian reliability growth planning methodology (Wayne 2018). The tested system is assumed to be a generic complex system with a large number of failure modes. Starting from the Bayesian assessment methodology of Wayne and Modarres (2015), this work explores the sensitivity of the Bayesian results to the choice of the prior distribution and compares the Bayesian results for the reliability point estimate and uncertainty interval with analogous results from traditional reliability assessment methods. The second task is accomplished by establishing a generic structure for systematically evaluating relevant Bayesian statistical models. First, previously implicit reliability issues for DT programs are identified using a structured poll of stakeholders combined with interviews of a selected set of Subject Matter Experts. Second, candidate solutions are identified in the literature. Third, solutions are matched to issues using criteria designed to evaluate the capability of a solution to improve support for decision-makers at critical points in DT programs. The matching process uses a model taxonomy structured according to decisions at each DT phase, plus criteria for model applicability and data availability. The end result is a generic structure that allows an analyst to identify and evaluate a specific model for use with a program and issue of interest.

    Wayne, Martin. 2018. "Modeling Uncertainty in Reliability Growth Plans." 2018 Annual Reliability and Maintainability Symposium (RAMS). 1-6.

    Wayne, Martin, and Mohammad Modarres. 2015. "A Bayesian Model for Complex System Reliability." IEEE Transactions on Reliability 64: 206-220.

    Speaker Info:

    David Spalding

    Research Staff Member, System Evaluation Division

    IDA

    Dr. Spalding is a Research Staff Member at the Institute for Defense Analyses. He has a Ph.D. from the University of Rochester in experimental particle physics and a master’s degree in Computer Science from George Washington University. At the Institute for Defense Analyses, he has analyzed aircraft and missile system issues. For the past decade he has addressed programmatic and statistical problems in developmental testing.

  • Mixed Models: A Critical Tool for Dependent Observations

    Abstract:

    The use of fixed and random effects has a rich history. They often go by other names, including blocking models, variance component models, nested and split-plot designs, hierarchical linear models, multilevel models, empirical Bayes, repeated measures, covariance structure models, and random coefficient models. Mixed models are one of the most powerful and practical ways to analyze experimental data, and investing time to become skilled with them is well worth the effort. Many, if not most, real-life data sets do not satisfy the standard statistical assumption of independent observations. Failure to appropriately model design structure can easily result in biased inferences. With an appropriate mixed model we can estimate primary effects of interest as well as compare sources of variability using common forms of dependence among sets of observations. Mixed models can readily become the handiest method in your analytical toolbox and provide a foundational framework for understanding statistical modeling in general.

    In this course we will cover many types of mixed models, including blocking, split-plot, and random coefficients.
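    A minimal sketch of a random-intercept (blocking) mixed model is shown below using statsmodels on synthetic data; the design and effect sizes are invented for illustration.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(11)

    # Synthetic blocked experiment: 8 blocks, a treatment effect, and block-to-block variation.
    n_blocks, n_per = 8, 6
    block = np.repeat(np.arange(n_blocks), n_per)
    treatment = np.tile([0, 1], n_blocks * n_per // 2)
    block_effect = rng.normal(0, 2.0, n_blocks)[block]      # random block effects
    y = 10 + 3.0 * treatment + block_effect + rng.normal(0, 1.0, block.size)

    df = pd.DataFrame({"y": y, "treatment": treatment, "block": block})

    # Random-intercept mixed model: fixed treatment effect, random block effect.
    model = smf.mixedlm("y ~ treatment", df, groups=df["block"]).fit()
    print(model.summary())
    # The fixed-effect estimate recovers the treatment effect (about 3), while the
    # "Group Var" row estimates the block-to-block variance component (about 4).
    ```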

    Speaker Info:

    Elizabeth Claassen

    Research Statistician Developer

    JMP Statistical Discovery

    Elizabeth A. Claassen, PhD, is a Research Statistician Developer at JMP Statistical Discovery. Dr. Claassen has over a decade of experience in statistical modeling in a variety of software packages. Her chief interest is generalized linear mixed models. Dr. Claassen earned an MS and PhD in statistics from the University of Nebraska–Lincoln, where she received the Holling Family Award for Teaching Excellence from the College of Agricultural Sciences and Natural Resources. She is an author of the third edition of "SAS® for Mixed Models: An Introduction and Basic Applications" (2018) and "JMP® for Mixed Models" (2021).

  • Moderator

    Abstract:

    For organizations to make data-driven decisions, they must be able to understand and organize their mission critical data.  Recently, the DoD, NASA and other federal agencies have declared their intention to become “data-centric” organizations, but transitioning from an existing mode of operation and architecture can be challenging.  Moreover, the DoD is pushing for artificial intelligence-enabled systems (AIES) and wide-scale digital transformation.  These concepts in the abstract seem straightforward, but because they can only evolve when people, processes, and technology change together, they have proven challenging in execution.  Since the structure and quality of an organization’s data limit what the organization can do with that data, it is imperative to get data processes right before embarking on other initiatives that depend on quality data. Despite the importance of data quality, many organizations treat data architecture as an emergent phenomenon and not something to be planned or thought through holistically. In this discussion, panelists will explore what it means to be data-centric, what a data-centric architecture is, how it is different from other data architectures, why an organization might prefer a data-centric approach, and the challenges associated with becoming data-centric.

    Speaker Info:

    Matthew Avery

    Assistant Director, Operational Evaluation

    IDA

    Matthew Avery is an Assistant Director in the Operational Evaluation Division (OED) at the Institute for Defense Analyses (IDA) and part of OED’s Sustainment group. He represents OED on IDA’s Data Governance Council and acts as the Deputy to IDA’s Director of Data Strategy and Chief Data Officer, helping craft data-related strategy and policy.

    Matthew leads IDA’s sustainment modeling efforts for the V-22 fleet, developing end-to-end multi-echelon models to evaluate options for improving mission-capable rates for the CV-22 and MV-22 fleets. Prior to this, Matthew was on the Test Science team, where he helped develop analytical methods and tools for operational test and evaluation. As the Test Science Data Management lead, he was responsible for delivering an annual summary of major activity undertaken by the Office of the Director, Operational Test and Evaluation. Additionally, Matthew wrote and implemented OED policy on data management and reproducible research. In addition to working with the Test Science team, Matthew also led operational test and evaluation efforts of Army and Marine Corps unmanned aircraft systems. In 2018-19 Matthew served as an embedded analyst in the Pentagon’s Office of Cost Assessment and Program Evaluation, where he built state-space models in support of the Space Control Strategic Portfolio Review.

    Matthew earned his PhD in Statistics from North Carolina State University in 2012, his MS in Statistics from North Carolina State in 2009, and a BA from New College of Florida in 2006. He is a member of the American Statistical Association.

  • Next Gen Breaching Technology: A Case Study in Deterministic Binary Response Emulation

    Abstract:

    Combat Capabilities Development Command Armaments Center (DEVCOM AC) is developing the next generation breaching munition, a replacement for the M58 Mine Clearing Line Charge. A series of M&S experiments were conducted to aid with the design of mine-neutralizing submunitions, utilizing space-filling designs, support vector machines, and hyper-parameter optimization. A probabilistic meta-model of the FEA-simulated performance data was generated with Platt Scaling in order to facilitate optimization, which was implemented to generate several candidate designs for follow-up live testing. This paper will detail the procedure used to iteratively explore and extract information from a deterministic process with a binary response.
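    The sketch below strings together the main ingredients named above on notional data: a Latin hypercube (space-filling) design, an SVM fit to a deterministic binary response, and Platt scaling (sigmoid calibration) to obtain a probabilistic meta-model used to shortlist candidate designs. The performance boundary and factor ranges are hypothetical.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.calibration import CalibratedClassifierCV
    from scipy.stats import qmc

    # Space-filling (Latin hypercube) design over two notional design factors,
    # standing in for the FEA runs; the true performance boundary is hypothetical.
    sampler = qmc.LatinHypercube(d=2, seed=2)
    X = qmc.scale(sampler.random(200), [0.0, 0.0], [1.0, 1.0])
    defeat = (X[:, 0] ** 2 + 0.8 * X[:, 1] > 0.9).astype(int)   # deterministic binary response

    # The SVM learns the go/no-go boundary; Platt scaling (sigmoid calibration)
    # turns its decision function into a probabilistic meta-model.
    svm = SVC(kernel="rbf", C=10.0, gamma="scale")
    meta_model = CalibratedClassifierCV(svm, method="sigmoid", cv=5).fit(X, defeat)

    # Use the meta-model to shortlist candidate designs with high predicted success.
    candidates = qmc.scale(sampler.random(1000), [0.0, 0.0], [1.0, 1.0])
    p_defeat = meta_model.predict_proba(candidates)[:, 1]
    best = candidates[np.argsort(p_defeat)[-5:]]
    print("top candidate designs:\n", np.round(best, 3))
    print("predicted P(defeat):", np.round(np.sort(p_defeat)[-5:], 3))
    ```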

    Speaker Info:

    Eli Golden

    Statistician

    US Army DEVCOM Armaments Center

    Eli Golden, GStat is a statistician in the Systems Analysis Division of US Army Combat Capabilities Development Command Armaments Center (DEVCOM AC). He is an experienced practitioner of Design of Experiments, empirical model-building, and data visualization focusing in the domains of conventional munition development, market research, advanced manufacturing, and modelling and simulation, and is an instructor/content curator for the Probability and Statistics courses at the Armament Center's Armament Graduate School (AGS). Mr. Golden has an M.S. in Applied Statistics from New Jersey Institute of Technology, M.S. in Mechanical Engineering from Stevens Institute of Technology, and a B.S. in Mechanical Engineering with a minor in Mathematics from Lafayette College.

  • Nonparametric multivariate profile monitoring using regression trees

    Abstract:

    Monitoring noisy profiles for changes in behavior can be used to validate whether a process is operating under normal conditions over time. Change-point detection and estimation in sequences of multivariate functional observations is a common approach to monitoring such profiles. We propose a nonparametric method that uses Classification and Regression Trees (CART) to build a sequence of regression trees and employs the Kolmogorov-Smirnov statistic to monitor profile behavior. Our novel method compares favorably to existing methods in the literature.
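    A stripped-down sketch of the basic ingredients, not the authors' full procedure, is shown below: a regression tree is fit to an in-control reference profile, and a Kolmogorov-Smirnov statistic compares the residuals of incoming profiles against the reference residuals to flag a change.

    ```python
    import numpy as np
    from scipy.stats import ks_2samp
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(9)

    def profile(n=200, shift=0.0):
        """Noisy nonlinear profile; `shift` perturbs the behavior after a change point."""
        x = np.linspace(0, 1, n)
        y = np.sin(2 * np.pi * x) + shift * (x > 0.5) + rng.normal(0, 0.15, n)
        return x.reshape(-1, 1), y

    # Phase I: fit a regression tree to an in-control reference profile.
    x_ref, y_ref = profile()
    tree = DecisionTreeRegressor(max_depth=4).fit(x_ref, y_ref)
    resid_ref = y_ref - tree.predict(x_ref)

    # Phase II: monitor new profiles by comparing their residuals (relative to the
    # reference tree) against the reference residuals with a KS statistic.
    for label, shift in [("in-control profile", 0.0), ("shifted profile", 0.4)]:
        x_new, y_new = profile(shift=shift)
        resid_new = y_new - tree.predict(x_new)
        stat, pval = ks_2samp(resid_ref, resid_new)
        print(f"{label}: KS statistic = {stat:.3f}, p-value = {pval:.4f}")
    ```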

    Speaker Info:

    Daniel A. Timme

    PhD Candidate

    Florida State University

    Daniel A. Timme is currently a student pursuing his PhD in Statistics from Florida State University. Mr. Timme graduated with a BS in Mathematics from the University of Houston and a BS in Business Management from the University of Houston-Clear Lake. He earned an MS in Systems Engineering with a focus in Reliability and a second MS in Space Systems with focuses in Space Vehicle Design and Astrodynamics, both from the Air Force Institute of Technology. Mr. Timme’s research interests are primarily focused on the areas of reliability engineering, applied mathematics, statistics, optimization, and regression.

  • Opening Remarks

    Speaker Info:

    Bram Lillard

    Director, OED

    IDA

    V. Bram Lillard assumed the role of director of the Operational Evaluation Division (OED) in early 2022. In this position, Bram provides strategic leadership, project oversight, and direction for the division’s research program, which primarily supports the Director, Operational Test and Evaluation (DOT&E) within the Office of the Secretary of Defense. He also oversees OED’s contributions to strategic studies, weapon system sustainment analyses, and cybersecurity evaluations for DOD and anti-terrorism technology evaluations for the Department of Homeland Security.

    Bram joined IDA in 2004 as a member of the research staff. In 2013-14, he was the acting science advisor to DOT&E. He then served as OED’s assistant director in 2014-21, ascending to deputy director in late 2021.

    Prior to his current position, Bram was embedded in the Pentagon where he led IDA’s analytical support to the Cost Assessment and Program Evaluation office within the Office of the Secretary of Defense. He previously led OED’s Naval Warfare Group in support of DOT&E. In his early years at IDA, Bram was the submarine warfare project lead for DOT&E programs. He is an expert in quantitative data analysis methods, test design, naval warfare systems and operations and sustainment analyses for Defense Department weapon systems.

    Bram has both a doctorate and a master’s degree in physics from the University of Maryland. He earned his bachelor’s degree in physics and mathematics from State University of New York at Geneseo. Bram is also a graduate of the Harvard Kennedy School’s Senior Executives in National and International Security program, and he was awarded IDA’s prestigious Goodpaster Award for Excellence in Research in 2017.

  • Opening Remarks

    Speaker Info:

    Norton Schwartz

    President

    IDA

    Norton A. Schwartz serves as President of the Institute for Defense Analyses (IDA), a nonprofit corporation operating in the public interest. IDA manages three Federally Funded Research and Development Centers that answer the most challenging U.S. security and science policy questions with objective analysis leveraging extraordinary scientific, technical, and analytic expertise. At IDA, General Schwartz (U.S. Air Force, retired) directs the activities of more than 1,000 scientists and technologists employed by IDA. General Schwartz has a long and prestigious career of service and leadership that spans over 5 decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his 6-year tenure at BENS, he was also a member of IDA’s Board of Trustees. Prior to retiring from the U.S. Air Force, General Schwartz served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975. General Schwartz is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College. He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. General Schwartz has been married to Suzie since 1981.

  • Operational Cyber Resilience in Engineering and Systems Test

    Abstract:

    Cyber resilience is the ability to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, or compromises on systems that use or are enabled by cyber resources. As a property defined in terms of system behavior, cyber resilience presents special challenges from a test and evaluation perspective. Typically, system requirements are specified in terms of technology function and can be tested through manipulation of the system’s operational environment, controls, or inputs. Resilience, however, is a high-level property relating to the capacity of the system to recover from unwanted loss of function. There are no commonly accepted definitions of how to measure this system property. Moreover, by design, resilience behaviors are exhibited only when the system has lost critical functions. The implication is that the test and evaluation of requirements for operational resilience will involve creating, emulating, or reasoning about the internal system states that might result from successful attacks.

    This tutorial will introduce the Framework for Operational Resilience in Engineering and System Test (FOREST), a framework that supports the derivation of measures and metrics for developmental and operational test plans and activities for cyber resilience in cyber-physical systems. FOREST aims to provide insights to support the development of testable requirements for cyber resilience and the design of systems with immunity to new vulnerabilities and threat tactics. FOREST's elements range from attack sensing to the existence and characterization of resilience modes of operation to operator decisions and forensic evaluation. The framework is meant to be a reusable, repeatable, and practical framework that calls for system designers to describe a system’s operational resilience design in a designated, partitioned manner that aligns with resilience requirements and directly relates to the development of associated test concepts and performance metrics.

    The tutorial introduces model-based systems engineering (MBSE) tools and associated engineering methods that complement FOREST and support the architecting, design, or engineering aspects of cyber resilience. Specifically, it features Mission Aware, a MBSE meta-model and associated requirements and architecture analysis process targeted to decomposition of loss scenarios into testable resilience features in a system design. FOREST, Mission Aware, and associated methodologies and digital engineering tools will be applied to two case studies for cyber resilience: (1) Silverfish, a hypothetical networked munition system and (2) an oil distribution pipeline. The case studies will lead to derivations of requirements for cyber resilience and survivability, along with associated measures and metrics.

    Speaker Info:

    Peter Beling

    Professor

    Virginia Tech

    Peter A. Beling is a professor in the Grado Department of Industrial and Systems Engineering and associate director of the Intelligent Systems Division in the Virginia Tech National Security Institute. Dr. Beling’s research interests lie at the intersections of systems engineering and artificial intelligence (AI) and include AI adoption, reinforcement learning, transfer learning, and digital engineering. He has contributed extensively to the development of methodologies and tools in support of cyber resilience in military systems. He serves on the Research Council of the Systems Engineering Research Center (SERC), a University Affiliated Research Center for the Department of Defense.

    Tom McDermott is the Deputy Director and Chief Technology Officer of the Systems Engineering Research Center at Stevens Institute of Technology in Hoboken, NJ. He leads research on Digital Engineering transformation, education, security, and artificial intelligence applications. Mr. McDermott also teaches system architecture concepts, systems thinking and decision making, and engineering leadership for universities, government, and industry. He serves on the INCOSE Board of Directors as Director of Strategic Integration.

    Tim Sherburne is a research associate in the Intelligent System Division of the Virginia Tech National Security Institute.  Sherburne was previously a member of the systems engineering staff at the University of Virginia supporting Mission Aware research through rapid prototyping of cyber resilient solutions and model-based systems engineering (MBSE) specifications. Prior to joining the University of Virginia, he worked at Motorola Solutions in various Software Development and Systems Engineering roles defining and building mission critical public safety communications systems.

  • Optimal Designs for Multiple Response Distributions

    Abstract:

    Designed experiments can be a powerful tool for gaining fundamental understanding of systems and processes or maintaining or optimizing systems and processes. There are usually multiple performance and quality metrics that are of interest in an experiment, and these multiple responses may include data from nonnormal distributions, such as binary or count data. A design that is optimal for a normal response can be very different from a design that is optimal for a nonnormal response.

    This work includes a two-phase method that helps experimenters identify a hybrid design for a multiple response problem. Mixture and optimal design methods are used with a weighted optimality criterion for a three-response problem that includes a normal, a binary, and a Poisson model, but could be generalized to an arbitrary number and combination of responses belonging to the exponential family. A mixture design is utilized to identify the optimal weights in the criterion presented.
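    The sketch below illustrates, on notional models and assumed (local) parameter values, how a weighted D-type criterion can combine information matrices for normal, binomial, and Poisson responses to score candidate designs; it uses a simple random search rather than the mixture-design weighting approach described above.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def model_matrix(x):
        """Main-effects-plus-interaction model in two factors."""
        return np.column_stack([np.ones(len(x)), x[:, 0], x[:, 1], x[:, 0] * x[:, 1]])

    # Assumed (local) parameter values for the nonnormal responses.
    beta_bin = np.array([0.5, 1.0, -0.8, 0.3])
    beta_pois = np.array([1.0, 0.4, 0.6, -0.2])

    def info_normal(X):
        return X.T @ X

    def info_binomial(X):
        p = 1 / (1 + np.exp(-X @ beta_bin))
        return X.T @ (X * (p * (1 - p))[:, None])

    def info_poisson(X):
        lam = np.exp(X @ beta_pois)
        return X.T @ (X * lam[:, None])

    def weighted_logdet(design, weights=(1 / 3, 1 / 3, 1 / 3)):
        """Weighted D-type criterion combining the three responses' information matrices."""
        X = model_matrix(design)
        infos = [info_normal(X), info_binomial(X), info_poisson(X)]
        return sum(w * np.linalg.slogdet(M)[1] for w, M in zip(weights, infos))

    # Compare random candidate designs (n = 12 runs on [-1, 1]^2) by the criterion.
    best_score, best_design = -np.inf, None
    for _ in range(2000):
        design = rng.uniform(-1, 1, size=(12, 2))
        score = weighted_logdet(design)
        if score > best_score:
            best_score, best_design = score, design
    print("best weighted log-det score:", round(best_score, 3))
    print("best design points:\n", np.round(best_design, 2))
    ```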

    Speaker Info:

    Brittany Fischer

    PhD Candidate

    Arizona State University

    Brittany Fischer is a PhD candidate in industrial engineering at Arizona State University. Prior to ASU, she received her bachelor’s and master’s degrees in statistics from Pennsylvania State University and worked as a statistical engineer for 5 years at Corning Incorporated.

  • Orbital Debris Effects Prediction Tool for Satellite Constellations

    Abstract:

    Based on observations gathered from the IDA Forum on Orbital Debris (OD) Risks and Challenges (October 8-9, 2020), DOT&E needed first-order predictive tools to evaluate the effects of orbital debris on mission risk, catastrophic collision, and collateral damage to DOD spacecraft and other orbital assets – either from unintentional or intentional [Anti-Satellite (ASAT)] collisions. This lack of modeling capability hindered DOT&E’s ability to evaluate the risk to operational effectiveness and survivability of individual satellites and large constellations, as well as risks to the overall use of space assets in the future.

    Part 1 of this presentation describes an IDA-derived Excel-based tool (SatPen) for determining the probability and mission effects of >1mm orbital debris impacts and penetration on individual satellites in low Earth orbit (LEO). IDA estimated the likelihood of satellite mission loss using a Starlink-like satellite as a case study and NASA’s ORDEM 3.1 orbital debris environment as an input, supplemented with typical damage prediction equations to support mission loss predictions.

    Part 2 of this presentation describes an IDA-derived technique (DebProp) to evaluate the debris-propagating effects of large, trackable debris (>5 cm) or antisatellite weapons colliding with satellites within constellations. IDA researchers again used a Starlink-like satellite as a case study and worked with Stellingwerf Associates to modify the Smooth Particle Hydrodynamic Code (SPHC) in order to predict the number and direction of fragments following a collision by a tracked satellite fragment. The result is a file format that is readable as an input file for predicting orbital stability or debris re-entry for thousands of created particles and for predicting additional, short-term OD-induced losses to other satellites in the constellation.

    By pairing these techniques, IDA can predict additional, short-term and long-term OD-induced losses to other satellites in the constellation, and conduct long-term debris growth studies.
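    For orientation, the sketch below shows the standard first-order, Poisson-based penetration-probability calculation that first-order tools of this kind build upon; the flux, area, and mission-duration values are notional, not ORDEM 3.1 outputs or SatPen results.

    ```python
    import math

    # First-order, Poisson-based estimate of the probability of at least one
    # penetrating debris impact: P = 1 - exp(-flux * area * time).
    flux_penetrating = 2.0e-5    # penetrating impacts per m^2 per year (notional, NOT ORDEM 3.1)
    exposed_area = 12.0          # m^2 of vulnerable surface (notional)
    mission_years = 5.0

    expected_hits = flux_penetrating * exposed_area * mission_years
    p_penetration = 1.0 - math.exp(-expected_hits)
    print(f"expected penetrating impacts over the mission: {expected_hits:.4f}")
    print(f"P(at least one penetration): {p_penetration:.4%}")

    # For a constellation of independent, identical satellites, the expected number
    # of satellites penetrated scales the single-satellite probability.
    n_sats = 1000
    print(f"expected satellites penetrated in a {n_sats}-satellite constellation: "
          f"{n_sats * p_penetration:.1f}")
    ```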

    Speaker Info:

    Joel Williamsen

    Research Staff Member

    IDA

    FIELDS OF EXPERTISE
    Air and space vehicle survivability, missile lethality, LFT&E, ballistic response, active protection systems, hypervelocity impact, space debris, crew and passenger casualty assessment

    EDUCATION HISTORY
    1993
    Doctor of Philosophy in Systems Engineering at University of Alabama, Huntsville

    1989
    Master of Science in Engineering Management at University of Alabama, Huntsville

    1983
    Bachelor of Science in Mechanical Engineering at University of Nebraska

    EMPLOYMENT HISTORY
    2003 - Present
    Research Staff Member, IDA, OED

    1998 - 2003
    Director, Center for Space Systems Survivability, University of Denver

    1987 - 1998
    Spacecraft Survivability Design , NASA-Marshall Space Flight Center, NASA

    1983 - 1987
    U.S. Army Missile Command, Research Development and Engineering Center, Warhead Design, U.S. Army

    PROFESSIONAL ACTIVITIES
    American Institute of Aeronautics and Astronautics (Chair, Survivability Technical Committee, 2001-2003)
    Tau Beta Pi Engineering Honorary Society
    Pi Tau Sigma Mechanical Engineering Honorary Society

    HONORS

    IDA Welch Award, 2020.

    National AIAA Survivability Award, 2012. Citation reads, "For outstanding achievement in enhancing spacecraft, aircraft, and crew survivability through advanced meteoroid/orbital debris shield designs, on-orbit repair techniques, risk assessment tools, and live fire evaluation."
    NASA Astronauts' Personal Achievement Award (Silver Snoopy), 2001.
    NASA Exceptional Achievement Medal, Spacecraft Survivability Analysis, 1995.
    Army Research and Development Achievement Award, 1985.
    Patents and Statutory Invention Registrations: Enhanced Hypervelocity Impact Shield, 1997. Joint.
    Patents and Statutory Invention Registrations: Pressure Wall Patch, 1994. Joint.
    Patents and Statutory Invention Registrations: Advanced Anti-Tank Airframe Configuration Tandem Warhead Missile, 1991. Joint.
    Patents and Statutory Invention Registrations: Extendible Shoulder Fired Anti-tank Missile, 1990. Joint.
    Patents and Statutory Invention Registrations: Particulated Density Shaped Charge Liner, 1987.
    Patents and Statutory Invention Registrations: High Velocity Rotating Shaped Charge Warhead, 1986.
    Patents and Statutory Invention Registrations: Missile Canting Shaped Charge Warhead, 1985. Joint.
    NASA Group Achievement Awards (Space Station), 1992-1994.
    NASA Group Achievement Awards (Hubble System Review Team) 1989, 1990.
    Outstanding Performance Awards, 1984-1988, 1990, 1992-1997.
    First NASA-MSFC representative to International Space University, 1989.

     

  • Panelist 1

    Speaker Info:

    Heather Wojton

    Chief Data Officer

    IDA

    Heather Wojton is Director of Data Strategy and Chief Data Officer at IDA, a role she assumed in 2021. In this position, Heather provides strategic leadership, project management, and direction for the corporation’s data strategy. She is responsible for enhancing IDA’s ability to efficiently and effectively accomplish research and business operations by assessing and evolving data systems, data management infrastructure, and data-related practices.

    Heather joined IDA in 2015 as a researcher in the Operational Evaluation Division of IDA’s Systems and Analyses Center. She is an expert in quantitative research methods, including test design and program evaluation. She held numerous research and leadership roles before being named an assistant director in the Operational Evaluation Division.

    As a researcher at IDA, Heather led IDA’s test science research program that facilitates data-driven decision-making within the Department of Defense (DOD) by advancing statistical, behavioral, and data science methodologies and applying them to the evaluation of defense acquisition programs. Heather’s other accomplishments include advancing methods for test design, modeling and simulation validation, data management and curation, and artificial intelligence testing. In this role, she worked closely with academic and DOD partners to adapt existing test design and evaluation methods for DoD use and develop novel methods where gaps persist.

    Heather has a doctorate in experimental psychology from the University of Toledo and a bachelor’s degree in research psychology from Marietta College, where she was a member of the McDonough International Leadership Program. She is a graduate of the George Washington University National Security Studies Senior Management Program and the Maxwell School National Security Management Course at Syracuse University.

  • Panelist 2

    Speaker Info:

    Laura Freeman

    Director, Intelligent Systems Division

    Virginia Tech

    Dr. Laura Freeman is a Research Associate Professor of Statistics and is dual-hatted as the Director of the Intelligent Systems Lab, Virginia Tech National Security Institute, and the Director of the Information Sciences and Analytics Division, Virginia Tech Applied Research Corporation (VT-ARC).   Her research leverages experimental methods for conducting research that brings together cyber-physical systems, data science, artificial intelligence (AI), and machine learning to address critical challenges in national security.  She develops new methods for test and evaluation focusing on emerging system technology.  In her role with VT-ARC she focuses on transitioning emerging research in these areas to solve challenges in Defense and Homeland Security. She is also a hub faculty member in the Commonwealth Cyber Initiative and leads research in AI Assurance. She is the Assistant Dean for Research for the College of Science; in that capacity she works to shape research directions and collaborations across the College of Science in the Greater Washington D.C. area. Previously, Dr. Freeman was the Assistant Director of the Operational Evaluation Division at the Institute for Defense Analyses.  Dr. Freeman has a B.S. in Aerospace Engineering, a M.S. in Statistics and a Ph.D. in Statistics, all from Virginia Tech.  Her Ph.D. research was on design and analysis of experiments for reliability data.

  • Panelist 3

    Speaker Info:

    Jane Pinelis

    Joint Artificial Intelligence Center

    Dr. Jane Pinelis is the Chief of AI Assurance at the Department of Defense Joint Artificial Intelligence Center (JAIC). She leads a diverse team of testers and analysts in rigorous test and evaluation (T&E) and Responsible AI (RAI) implementation for JAIC capabilities, as well as the development of AI Assurance products and standards that will support testing of AI-enabled systems across the DoD.

    Prior to joining the JAIC, Dr. Pinelis served as the Director of Test and Evaluation for USDI’s Algorithmic Warfare Cross-Functional Team, better known as Project Maven. She directed the developmental testing for the AI models, including computer vision, machine translation, facial recognition and natural language processing. Her team developed metrics at various levels of testing for AI capabilities and provided leadership empirically-based recommendations for model fielding. Additionally, she oversaw operational and human-machine teaming testing, and conducted research and outreach to establish standards in T&E of systems using artificial intelligence.

    Dr. Pinelis has spent over 10 years working predominantly in the area of defense and national security. She has largely focused on operational test and evaluation, both in support of the service operational testing commands and at the OSD level. In her previous job as the Test Science Lead at the Institute for Defense Analyses, she managed an interdisciplinary team of scientists supporting the Director and the Chief Scientist of the Department of Operational Test and Evaluation on integration of statistical test design and analysis and data-driven assessments into test and evaluation practice. Before that, in her assignment at the Marine Corps Operational Test and Evaluation Activity, Dr. Pinelis led the design and analysis of the widely publicized study on the effects of integrating women into combat roles in the Marine Corps. Based on this experience, she co-authored a book titled “The Experiment of a Lifetime: Doing Science in the Wild for the United States Marine Corps.”

    In addition to T&E, Dr. Pinelis has several years of experience leading analyses for the DoD in the areas of wargaming, precision medicine, warfighter mental health, nuclear non-proliferation, and military recruiting and manpower planning.

    Her areas of statistical expertise include design and analysis of experiments, quasi-experiments, and observational studies, causal inference, and propensity score methods.

    Dr. Pinelis holds a BS in Statistics, Economics, and Mathematics, an MA in Statistics, and a PhD in Statistics, all from the University of Michigan, Ann Arbor.

  • Panelist 4

    Speaker Info:

    Calvin Robinson

    NASA

    Calvin Robinson is a Data Architect within the Information and Applications Division at NASA Glenn Research Center. He has over 10 years of experience supporting data analysis and simulation development for research, and currently supports several key data management efforts to make data more discoverable and aligned with FAIR principles. Calvin oversees the Center’s Information Management Program and supports individuals leading strategic AI/ML efforts within the Agency. Calvin holds a BS in Computer Science and Engineering from the University of Toledo.

  • Predicting Trust in Automated Systems: Validation of the Trust of Automated Systems Test

    Abstract:

    The number of people using autonomous systems for everyday tasks has increased steadily since the 1960s and has grown dramatically with the invention of smart devices that can be controlled via smartphone. Within the defense community, automated systems are currently used to perform search and rescue missions and to assume control of aircraft to avoid ground collision. Until recently, researchers have only been able to gain insight into trust levels by observing a human’s reliance on the system, making it apparent that researchers needed a validated method of quantifying how much an individual trusts the automated system they are using. IDA researchers developed the Trust of Automated Systems Test (TOAST scale) to serve as a validated scale capable of measuring how much an individual trusts a system. This presentation will outline the nine-item TOAST scale’s understanding and performance elements, and how it can effectively be used in a defense setting. We believe that this scale should be used to evaluate the trust level of any human using any system, including predicting when operators will misuse or disuse complex, automated and autonomous systems.
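
    As a purely hypothetical illustration of how a short trust scale of this kind is typically scored (not the authors' actual scoring procedure), the sketch below averages nine Likert-style items into overall and subscale scores and adds Cronbach's alpha as a quick internal-consistency check; the split of items into "understanding" and "performance" subscales is assumed for illustration only.

    ```python
    import numpy as np

    # Hypothetical responses: 4 participants x 9 Likert items (1-7 agreement).
    responses = np.array([
        [6, 5, 6, 7, 5, 6, 6, 5, 6],
        [3, 4, 2, 3, 4, 3, 2, 3, 3],
        [7, 7, 6, 7, 6, 7, 7, 6, 7],
        [5, 4, 5, 5, 4, 5, 4, 5, 4],
    ])
    understanding_items = [0, 1, 2, 3]   # assumed item grouping
    performance_items = [4, 5, 6, 7, 8]  # assumed item grouping

    overall = responses.mean(axis=1)
    understanding = responses[:, understanding_items].mean(axis=1)
    performance = responses[:, performance_items].mean(axis=1)

    # Cronbach's alpha: internal consistency of the nine items.
    k = responses.shape[1]
    item_var = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    alpha = k / (k - 1) * (1 - item_var / total_var)
    print(overall, understanding, performance, round(alpha, 3))
    ```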

    Speaker Info:

    Caitlan Fealing

    Data Science Fellow

    IDA

    Caitlan Fealing is a Data Science Fellow within the Test Science group of OED. She has a Bachelor of Arts degree in Mathematics, Economics, and Psychology from Williams College. Caitlan uses her background and focus on data science to create data visualizations, support OED’s program management databases, and contribute to the development of the many resources available on IDA’s Test Science website.

  • Profile Monitoring via Eigenvector Perturbation

    Abstract:

    Control charts are often used to monitor the quality characteristics of a process over time to ensure undesirable behavior is quickly detected. The escalating complexity of processes we wish to monitor spurs the need for more flexible control charts such as those used in profile monitoring. Additionally, designing a control chart that has an acceptable false alarm rate for a practitioner is a common challenge. Alarm fatigue can occur if the sampling rate is high (say, once a millisecond) and the control chart is calibrated to an average in-control run length (ARL0) of 200 or 370, as is often done in the literature. As alarm fatigue is not just an annoyance but can have detrimental effects on the quality of the product, control chart designers should seek to minimize the false alarm rate. Unfortunately, reducing the false alarm rate typically comes at the cost of detection delay or average out-of-control run length (ARL1). Motivated by recent work on eigenvector perturbation theory, we develop a computationally fast control chart called the Eigenvector Perturbation Control Chart for nonparametric profile monitoring. The control chart monitors the l_2 perturbation of the leading eigenvector of a correlation matrix and requires only a sample of known in-control profiles to determine control limits. Through a simulation study we demonstrate that it is able to outperform its competition by achieving an ARL1 close to or equal to 1 even when the control limits result in a large ARL0 on the order of 10^6. Additionally, non-zero false alarm rates with a change point after 10^4 in-control observations were only observed in scenarios that are either pathological or truly difficult for a correlation-based monitoring scheme.
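
    As a rough sketch of the kind of statistic described here (assuming in-control profiles are available as rows of a matrix; the chart's actual design, limits, and theory from the talk are not reproduced), one could monitor the l_2 distance between the leading eigenvector of a new window's correlation matrix and that of the in-control data:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def leading_eigvec(X):
        """Leading eigenvector of the correlation matrix of the profiles in X."""
        corr = np.corrcoef(X, rowvar=False)
        vals, vecs = np.linalg.eigh(corr)
        v = vecs[:, -1]
        return v if v[np.argmax(np.abs(v))] > 0 else -v  # fix sign ambiguity

    # Phase I: known in-control profiles (rows = profiles, columns = locations).
    in_control = rng.normal(size=(200, 20)) + np.sin(np.linspace(0, 3, 20))
    v0 = leading_eigvec(in_control)

    # Illustrative control limit from perturbations of in-control subsamples.
    perturb = [np.linalg.norm(leading_eigvec(in_control[i:i + 50]) - v0, 2)
               for i in range(0, 150, 10)]
    limit = np.quantile(perturb, 0.99)

    # Phase II: charting statistic for a new window of profiles.
    new_window = rng.normal(size=(50, 20)) + 1.5 * np.sin(np.linspace(0, 3, 20))
    stat = np.linalg.norm(leading_eigvec(new_window) - v0, 2)
    print(f"statistic {stat:.3f} vs limit {limit:.3f}; signal: {stat > limit}")
    ```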

    Speaker Info:

    Takayuki Iguchi

    PhD Student

    Florida State University

    Takayuki Iguchi is a Captain in the US Air Force and is currently a PhD student under the direction of Dr. Eric Chicken at Florida State University.

  • Quantifying the Impact of Staged Rollout Policies on Software Process and Product Metrics

    Abstract:

    Software processes define specific sequences of activities performed to effectively produce software, whereas tools provide concrete computational artifacts by which these processes are carried out. Tool-independent modeling of processes and related practices enables quantitative assessment of software and of competing approaches. This talk presents a framework to assess an approach employed in modern software development known as staged rollout, which releases new or updated software features to a fraction of the user base in order to accelerate defect discovery without imposing the possibility of failure on all users. The framework quantifies process metrics such as delivery time and product metrics, including reliability, availability, security, and safety, enabling tradeoff analysis to objectively assess the quality of software produced by vendors, establish baselines, and guide process and product improvement. Failure data collected during software testing is employed to emulate the approach as if the project were ongoing. The underlying problem is to identify a policy that decides when to perform various stages of rollout based on the software's failure intensity. The illustrations examine how alternative policies impose tradeoffs between two or more of the process and product metrics.
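
    A toy sketch of the kind of policy question the framework addresses, using simulated failure times and a hypothetical rule that advances the rollout stage whenever the empirical failure intensity drops below a stage-specific threshold; the paper's actual models, policies, and metrics are more elaborate.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    # Simulated cumulative failure times from testing (hours); purely illustrative.
    failure_times = np.cumsum(rng.exponential(scale=np.linspace(5, 40, 30)))

    def empirical_intensity(times, window=5):
        """Failures per hour over the most recent `window` failures."""
        if len(times) < window + 1:
            return np.inf
        return window / (times[-1] - times[-window - 1])

    # Hypothetical policy: expose 10%, then 50%, then 100% of users, advancing a
    # stage whenever the estimated failure intensity falls below its threshold.
    stages = [0.10, 0.50, 1.00]
    thresholds = [0.15, 0.08, 0.04]   # failures per hour

    stage_idx, stage_times = 0, []
    for k in range(1, len(failure_times) + 1):
        while (stage_idx < len(stages)
               and empirical_intensity(failure_times[:k]) < thresholds[stage_idx]):
            stage_times.append((stages[stage_idx], failure_times[k - 1]))
            stage_idx += 1

    for stage, t in stage_times:
        print(f"roll out to {stage:.0%} of users at t = {t:.1f} hours")
    ```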

    Speaker Info:

    Lance Fiondella

    Associate Professor

    University of Massachusetts Dartmouth

    Lance Fiondella is an associate professor of Electrical and Computer Engineering at the University of Massachusetts Dartmouth and the Director of the UMassD Cybersecurity Center, an NSA/DHS-designated Center of Academic Excellence in Cyber Research.

  • Risk Comparison and Planning for Bayesian Assurance Tests

    Abstract:

    Designing a Bayesian assurance test plan requires choosing a test plan that guarantees a product of interest is good enough to satisfy the consumer’s criteria, but not so stringent that it causes the producer concern about failing the test. Bayesian assurance tests are especially useful because they can incorporate previous product information in the test planning and explicitly control levels of risk for the consumer and producer. We demonstrate an algorithm for efficiently computing a test plan given desired levels of risk in binomial and exponential testing. Numerical comparisons with the Operating Characteristic (OC) curve, Probability Ratio Sequential Test (PRST), and a simulation-based Bayesian sample size determination approach are also considered.
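
    As a simplified, brute-force sketch of the kind of search such a planning algorithm performs (shown here with classical binomial consumer and producer risks at two assumed failure probabilities; the talk's algorithm is more efficient and additionally weights these risks with prior information, e.g., a Beta prior):

    ```python
    from scipy.stats import binom

    def binomial_test_plan(p_good, p_bad, producer_risk, consumer_risk, n_max=500):
        """Smallest (n, c): accept if failures <= c out of n trials, such that
        P(reject | failure prob = p_good) <= producer_risk and
        P(accept | failure prob = p_bad)  <= consumer_risk."""
        for n in range(1, n_max + 1):
            for c in range(n + 1):
                prod = 1 - binom.cdf(c, n, p_good)   # rejecting a good product
                cons = binom.cdf(c, n, p_bad)        # accepting a bad product
                if prod <= producer_risk and cons <= consumer_risk:
                    return n, c
        return None

    # Example: a good product fails 2% of the time, a bad product 10% of the time.
    print(binomial_test_plan(p_good=0.02, p_bad=0.10,
                             producer_risk=0.10, consumer_risk=0.10))
    ```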

    Speaker Info:

    Hyoshin Kim

    North Carolina State University

    Hyoshin Kim received her B.Ec. in Statistics from Sungkyunkwan University, South Korea, in 2017, and her M.S. in Statistics from Seoul National University, South Korea, in 2019. She is currently a third-year Ph.D. student in the Department of Statistics at North Carolina State University. Her research interests are Bayesian assurance testing and Bayesian clustering algorithms for high-dimensional correlated outcomes.

  • Safe Machine Learning Prediction and Optimization via Extrapolation Control

    Abstract:

    Uncontrolled model extrapolation leads to two serious kinds of errors: (1) the model may be completely invalid far from the data, and (2) the combinations of variable values may not be physically realizable. Optimizing models that are fit to observational data can lead, without any warning, to extrapolated solutions that are of no practical use. In this presentation we introduce a general approach to identifying extrapolation based on a regularized Hotelling T-squared metric. The metric is robust to certain kinds of messy data and can handle models with both continuous and categorical inputs. The extrapolation model is intended to be used in parallel with a machine learning model to identify when the machine learning model is being applied to data that are not close to that model's training set, or as a non-extrapolation constraint when optimizing the model. The methodology described was introduced into the JMP Pro 16 Profiler.
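
    A minimal sketch of the underlying idea (a lightly regularized Hotelling T-squared distance from the training data, with a cutoff taken from the training scores); the JMP Pro 16 Profiler implementation, its regularization, and its handling of categorical and messy inputs are not reproduced here.

    ```python
    import numpy as np

    def fit_extrapolation_check(X_train, ridge=1e-3):
        """Return a function scoring new points by a regularized Hotelling T^2
        distance from the training data."""
        mean = X_train.mean(axis=0)
        cov = np.cov(X_train, rowvar=False)
        cov_reg = cov + ridge * np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
        cov_inv = np.linalg.inv(cov_reg)
        def t_squared(X_new):
            d = np.atleast_2d(X_new) - mean
            return np.einsum("ij,jk,ik->i", d, cov_inv, d)
        return t_squared

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(300, 4))
    score = fit_extrapolation_check(X_train)

    # Flag points whose T^2 exceeds a cutoff set from the training scores.
    cutoff = np.quantile(score(X_train), 0.99)
    X_new = np.array([[0.2, -0.1, 0.0, 0.3],    # near the training cloud
                      [4.0, 4.0, -4.0, 4.0]])   # far outside it
    print(score(X_new) > cutoff)                # expected: [False  True]
    ```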

    Speaker Info:

    Tom Donnelly

    JMP Statistical Discovery LLC

  • Safe Machine Learning Prediction and Optimization via Extrapolation Control

    Abstract:

    Uncontrolled model extrapolation leads to two serious kinds of errors: (1) the model may be completely invalid far from the data, and (2) the combinations of variable values may not be physically realizable. Optimizing models that are fit to observational data can lead, without any warning, to extrapolated solutions that are of no practical use. In this presentation we introduce a general approach to identifying extrapolation based on a regularized Hotelling T-squared metric. The metric is robust to certain kinds of messy data and can handle models with both continuous and categorical inputs. The extrapolation model is intended to be used in parallel with a machine learning model to identify when the machine learning model is being applied to data that are not close to that model's training set, or as a non-extrapolation constraint when optimizing the model. The methodology described was introduced into the JMP Pro 16 Profiler.

    Speaker Info:

    Laura Lancaster

    JMP Statistical Discovery LLC

  • Sparse Models for Detecting Malicious Behavior in OpTC

    Abstract:

    Host-based sensors are standard tools for generating event data to detect malicious activity on a network. There is often interest in detecting activity using as few event classes as possible in order to minimize host processing slowdowns. Using DARPA's Operationally Transparent Cyber (OpTC) Data Release, we consider the problem of detecting malicious activity using event counts aggregated over five-minute windows. Event counts are categorized by eleven features according to MITRE CAR data model objects. In the supervised setting, we use regression trees with all features to show that malicious activity can be detected at above a 90% true positive rate with a negligible false positive rate. Using forward and exhaustive search techniques, we show the same performance can be obtained using a sparse model with only three features. In the unsupervised setting, we show that the isolation forest algorithm is somewhat successful at detecting malicious activity, and that a sparse three-feature model performs comparably. Finally, we consider various search criteria for identifying sparse models and demonstrate that the RMSE criterion is generally optimal.
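
    A scikit-learn sketch of the two modeling settings described above, run on simulated five-minute event-count windows; the OpTC data, the MITRE CAR feature definitions, and the search procedures from the talk are not reproduced here.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(3)

    # Simulated stand-in: event counts per 5-minute window for 11 features,
    # with a small fraction of malicious windows inflating 3 of the features.
    n, p = 2000, 11
    X = rng.poisson(lam=5, size=(n, p)).astype(float)
    y = np.zeros(n)
    malicious = rng.choice(n, size=60, replace=False)
    y[malicious] = 1
    X[malicious, :3] += rng.poisson(lam=20, size=(60, 3))

    # Supervised: regression tree on all features vs. a sparse 3-feature subset.
    tree_all = DecisionTreeRegressor(max_depth=4).fit(X, y)
    tree_sparse = DecisionTreeRegressor(max_depth=4).fit(X[:, :3], y)
    print("RMSE, all features:  ", np.sqrt(np.mean((tree_all.predict(X) - y) ** 2)))
    print("RMSE, three features:", np.sqrt(np.mean((tree_sparse.predict(X[:, :3]) - y) ** 2)))

    # Unsupervised: isolation forest anomaly scores (lower = more anomalous).
    iso = IsolationForest(random_state=0).fit(X)
    scores = iso.score_samples(X)
    print("mean anomaly score, malicious vs benign:",
          scores[y == 1].mean(), scores[y == 0].mean())
    ```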

    Speaker Info:

    Andrew Mastin

    Operations Research Scientist

    Lawrence Livermore National Laboratory

    Andrew Mastin is an Operations Research Scientist at Lawrence Livermore National Laboratory. He holds a Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. His current research interests include cybersecurity, network interdiction, dynamic optimization, and game theory.

  • STAT and UQ Implementation Lessons Learned

    Abstract:

    David Harrison and Kelsey Cannon from Lockheed Martin Space will present on STAT and UQ implementation lessons learned within Lockheed Martin. Faced with training 60,000 engineers in statistics, David and Kelsey formed a plan to make STAT and UQ processes the standard at Lockheed Martin. The presentation includes a range of information from initial communications plan, to obtaining leader adoption, to training engineers across the corporation. Not all programs initially accepted this process, but implementation lessons have been learned over time as many compounding successes and savings have been recorded.

    ©2022 Lockheed Martin, all rights reserved

    Speaker Info:

    Kelsey Cannon

    Materials Engineer

    Lockheed Martin

    Kelsey Cannon is a Senior Research Scientist at Lockheed Martin Space, having previously completed a Specialty Engineering rotation program in which she worked in a variety of environments and roles. Kelsey currently works with David Harrison, the statistical engineering SME at LM, to implement technical principles and a communications plan throughout the corporation. Kelsey holds a BS in Metallurgical and Materials Engineering from the Colorado School of Mines and is nearing completion of an MS in Computer Science and Data Science.

  • Stochastic Modeling and Characterization of a Wearable-Sensor-Based Surveillance Network

    Abstract:

    Current disease outbreak surveillance practices reflect underlying delays in the detection and reporting of disease cases, relying on individuals who present symptoms to seek medical care and enter the health care system. To accelerate the detection of outbreaks resulting from possible bioterror attacks, we introduce a novel two-tier, human sentinel network (HSN) concept composed of wearable physiological sensors capable of pre-symptomatic illness detection, which prompt individuals to enter a confirmatory stage where diagnostic testing occurs at a certified laboratory. Both the wearable alerts and test results are reported automatically and immediately to a secure online platform via a dedicated application. The platform aggregates the information and makes it accessible to public health authorities. We evaluated the HSN against traditional public health surveillance practices for outbreak detection of 80 Bacillus anthracis (Ba) release scenarios in mid-town Manhattan, NYC. We completed an end-to-end modeling and analysis effort, including the calculation of anthrax exposures and doses based on computational atmospheric modeling of release dynamics, and development of a custom-built probabilistic model to simulate resulting wearable alerts, diagnostic test results, symptom onsets, and medical diagnoses for each exposed individual in the population. We developed a novel measure of network coverage, formulated new metrics to compare the performance of the HSN to public health surveillance practices, completed a Design of Experiments to optimize the test matrix, characterized the performance trade space, and performed sensitivity analyses to identify the most important engineering parameters. Our results indicate that a network covering greater than ~10% of the population would yield approximately a 24-hour time advantage over public health surveillance practices in identifying outbreak onset, and would provide a non-target-specific indication (in the form of a statistically aberrant number of wearable alerts) approximately 36 hours in advance; these earlier detections would enable faster and more effective public health and law enforcement responses to support incident characterization and decrease morbidity and mortality via post-exposure prophylaxis.

    Speaker Info:

    Jane E. Valentine

    Senior Biomedical Engineer

    Johns Hopkins University Applied Physics Laboratory

    Jane Valentine received her B.S. in Mathematics and French, and Ph.D. in Biomedical Engineering, both from Carnegie Mellon University. She then completed a post-doc in Mechanical Engineering at the University of Illinois, and a data science fellowship in the United Kingdom, working with a pharmaceutical company.  She has been working at the Johns Hopkins University Applied Physics Laboratory since 2020, where she works on mathematical modeling and simulation, optimization, and data science, particularly in the areas of biosensors, knowledge graphs, and epidemiological modeling.

  • Structural Dynamic Programming Methods for DOD Research

    Abstract:

    Structural dynamic programming models are a powerful tool to help guide policy under uncertainty. By creating a mathematical representation of the intertemporal optimization problem of interest, these models can answer questions that static models cannot address. Applications range from military personnel policy (how does future compensation affect retention now?) to inventory management (how many aircraft are needed to meet readiness objectives?). Recent advances in statistical methods and computational algorithms allow us to develop dynamic programming models of complex real-world problems that were previously too difficult to solve.
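
    To make the idea concrete, here is a toy finite-horizon dynamic program in the spirit of the retention example (how does future compensation affect the decision to stay now?); the states, pay numbers, and horizon are purely illustrative and are not drawn from the talk.

    ```python
    import numpy as np

    # Toy retention model: each year a service member chooses to stay or leave.
    T = 10                                  # decision horizon (years)
    beta = 0.95                             # discount factor
    military_pay = np.linspace(50, 90, T)   # hypothetical pay path if staying
    civilian_pay = np.full(T, 70.0)         # hypothetical outside offer

    value = np.zeros(T + 1)                 # value[t] = value of facing decision t
    stay = np.zeros(T, dtype=bool)
    for t in range(T - 1, -1, -1):          # backward induction (Bellman recursion)
        v_stay = military_pay[t] + beta * value[t + 1]
        # Leaving yields the discounted stream of civilian pay through the horizon.
        v_leave = civilian_pay[t] * (1 - beta ** (T - t)) / (1 - beta)
        stay[t] = v_stay >= v_leave
        value[t] = max(v_stay, v_leave)

    print("optimal stay decision by year:", stay.astype(int))
    print("value of entering year 0:     ", round(value[0], 1))
    ```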

    Speaker Info:

    Mikhail Smirnov

    Research Staff Member

    IDA

    Mikhail earned his PhD in Economics from Johns Hopkins University in 2017 and recently joined the Strategy, Forces, and Resources Division at the Institute for Defense Analyses after spending several years at CNA.  He specializes in structural and nonparametric econometrics, computational statistics, and machine learning, and his research has focused on questions related to retention and other personnel related decisions for the DOD.

  • Survey Dos and Don'ts

    Abstract:

    How many surveys have you been asked to fill out? How many did you actually complete? Why those surveys? Did you ever feel like the answer you wanted to mark was missing from the list of possible responses? Surveys can be a great tool for data collection if they are thoroughly planned out and well-designed. They are a relatively inexpensive way to collect a large amount of data from hard-to-reach populations. However, if they are poorly designed, the test team might end up with a lot of data and little to no information.
    Join the STAT COE for a short tutorial on the dos and don’ts of survey design and analysis. We’ll point out the five most common survey mistakes, compare and contrast types of questions, discuss the pros and cons for potential analysis methods (such as descriptive statistics, linear regression, principal component analysis, factor analysis, hypothesis testing, and cluster analysis), and highlight how surveys can be used to supplement other sources of information to provide value to an overall test effort.
    DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. CLEARED on 5 Jan 2022. Case Number: 88ABW-2022-0003

    Speaker Info:

    Gina Sigler

    Statistician

    Scientific Test and Analysis Techniques Center of Excellence (STAT COE)

    Gina Sigler is a senior statistician at Huntington Ingalls Industries, working at the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE) at the Air Force Institute of Technology (AFIT), where she provides rigorous test designs and best practices to programs across the Department of Defense (DoD). She was part of the AETC 2019 Air Force Analytic Team of the Year. Before joining the STAT COE, she worked as a faculty associate in the Statistics Department at the University of Wisconsin (UW)-Madison. She earned a B.S. degree in statistics from Michigan State University in 2012, an M.S. in statistics from the UW-Madison in 2014, and is currently pursuing a Ph.D. in Applied Mathematics-Statistics at AFIT.

  • Survey Dos and Don'ts

    Abstract:

    How many surveys have you been asked to fill out? How many did you actually complete? Why those surveys? Did you ever feel like the answer you wanted to mark was missing from the list of possible responses? Surveys can be a great tool for data collection if they are thoroughly planned out and well-designed. They are a relatively inexpensive way to collect a large amount of data from hard-to-reach populations. However, if they are poorly designed, the test team might end up with a lot of data and little to no information.
    Join the STAT COE for a short tutorial on the dos and don’ts of survey design and analysis. We’ll point out the five most common survey mistakes, compare and contrast types of questions, discuss the pros and cons for potential analysis methods (such as descriptive statistics, linear regression, principal component analysis, factor analysis, hypothesis testing, and cluster analysis), and highlight how surveys can be used to supplement other sources of information to provide value to an overall test effort.
    DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. CLEARED on 5 Jan 2022. Case Number: 88ABW-2022-0003

    Speaker Info:

    Alex (Mary) McBride

    Statistician

    Scientific Test and Analysis Techniques Center of Excellence (STAT COE)

    Alex McBride is a senior statistician at Huntington Ingalls Industries, working at the Homeland Security Community of Best Practices (HS CoBP) at the Air Force Institute of Technology (AFIT), where she provides rigorous test designs, analysis, and workforce development to acquisition programs across the Department of Homeland Security (DHS). She was part of the TED Workforce Development Team awarded a 2020 Under Secretary's Award in the category of Science and Engineering. Before joining the HS CoBP, she was a graduate teaching assistant for the Statistics Department at Wright State University. She earned a B.S. degree in statistics from Grand Valley State University in 2017 and an M.S. in statistics from Wright State University in 2019.

  • T&E of Responsible AI

    Abstract:

    Getting Responsible AI (RAI) right is difficult and demands expertise. All AI-relevant skill sets, including ethics, are in high demand and short supply, especially regarding AI’s intersection with test and evaluation (T&E). Frameworks, guidance, and tools are needed to empower working-level personnel across the DOD to generate RAI assurance cases with support from RAI SMEs. At a high level, such a framework should address the following points:

    1. T&E is a necessary piece of the RAI puzzle--testing provides a feedback mechanism for system improvement and builds public and warfighter confidence in our systems, and RAI should be treated just like performance, reliability, and safety requirements.
    2. We must intertwine T&E and RAI across the cradle-to-grave product life cycle. Programs must embrace T&E and RAI from inception; as development proceeds, these two streams must be integrated in tight feedback loops to ensure effective RAI implementation. Furthermore, many AI systems, along with their operating environments and use cases, will continue to update and evolve and thus will require continued evaluation after fielding.
    3. The five DOD RAI principles are a necessary north star, but alone they are not enough to implement or ensure RAI. Programs will have to integrate multiple methodologies and sources of evidence to construct holistic arguments for how much the programs have reduced RAI risks.
    4. RAI must be developed, tested, and evaluated in context--T&E without operationally relevant context will fail to ensure that fielded tools achieve RAI. Mission success depends on technology that must interact with warfighters and other systems in complex environments, while constrained by processes and regulation. AI systems will be especially sensitive to operational context and will force T&E to expand what it considers.

    Speaker Info:

    Rachel Haga

    Research Associate

    IDA

  • Taming the beast: making questions about the supply system tractable by quantifying risk

    Abstract:

    The DoD sustainment system is responsible for managing the supply of millions of different spare parts, most of which are infrequently and inconsistently requisitioned, and many of which have procurement lead times measured in years. The DoD must generally buy items in anticipation of need, yet it simply cannot afford to buy even one copy of every unique part it might be called upon to deliver. Deciding which items to purchase necessarily involves taking risks, both military and financial. However, the huge scale of the supply system makes these risks difficult to quantify. We have developed methods that use raw supply data in new ways to support this decision making process. First, we have created a method to identify areas of potential overinvestment that could safely be reallocated to areas at risk of underinvestment. Second, we have used raw requisition data to create an item priority list for individual weapon systems in terms of importance to mission success. Together, these methods allow DoD decision makers to make better-informed decisions about where to take risks and where to invest scarce resources.

    Speaker Info:

    Joseph Fabritius

    Research Staff Member

    IDA

    Joseph Fabritius earned his Bachelor's degree in Physics from Rochester Institute of Technology in 2012. He earned his Master's degree in Physics from Drexel University in 2017, and he earned his PhD in Physics from Drexel University in 2021. He is currently a Research Staff Member at the Institute for Defense Analyses, where he works on sustainment analyses.

  • Taming the beast: making questions about the supply system tractable by quantifying risk

    Abstract:

    The DoD sustainment system is responsible for managing the supply of millions of different spare parts, most of which are infrequently and inconsistently requisitioned, and many of which have procurement lead times measured in years. The DoD must generally buy items in anticipation of need, yet it simply cannot afford to buy even one copy of every unique part it might be called upon to deliver. Deciding which items to purchase necessarily involves taking risks, both military and financial. However, the huge scale of the supply system makes these risks difficult to quantify. We have developed methods that use raw supply data in new ways to support this decision making process. First, we have created a method to identify areas of potential overinvestment that could safely be reallocated to areas at risk of underinvestment. Second, we have used raw requisition data to create an item priority list for individual weapon systems in terms of importance to mission success. Together, these methods allow DoD decision makers to make better-informed decisions about where to take risks and where to invest scarce resources.

    Speaker Info:

    Kyle Remley

    Research Staff Member

    IDA

    Kyle Remley earned his Bachelor's degree in Nuclear and Radiological Engineering from Georgia Tech in 2013. He earned his Master's degree in Nuclear Engineering from Georgia Tech in 2015, and he earned his PhD in Nuclear and Radiological Engineering from Georgia Tech in 2016. He was an engineer at the Naval Nuclear Laboratory from 2017 to 2020. Since 2020, he has been a Research Staff Member at the Institute for Defense Analyses, where he works on sustainment analyses.

  • Test & Evaluation of ML Models

    Abstract:

    Machine Learning models have been incredibly impactful over the past decade; however, testing those models and comparing their performance has remained challenging and complex. In this presentation, I will demonstrate novel methods for measuring the performance of computer vision object detection models, including running those models against still imagery and video. The presentation will start with an introduction to the pros and cons of various metrics, including traditional metrics like precision, recall, average precision, mean average precision, F1, and F-beta. The talk will then discuss more complex topics such as tracking metrics, handling multiple object classes, visualizing multi-dimensional metrics, and linking metrics to operational impact. Anecdotes will be shared discussing different types of metrics that are appropriate for different types of stakeholders, how system testing fits in, best practices for model integration, best practices for data splitting, and cloud vs on-prem compute lessons learned. The presentation will conclude by discussing what software libraries are available to calculate these metrics, including the MORSE-developed library Charybdis.
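
    As a quick reference for the basic detection metrics mentioned (precision, recall, F1, and F-beta), the sketch below computes them from matched detection counts; the tracking metrics, mAP calculations, and the Charybdis library itself are beyond this snippet.

    ```python
    def detection_metrics(tp, fp, fn, beta=1.0):
        """Precision, recall, and F-beta from true/false positives and false negatives."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision == 0 and recall == 0:
            f_beta = 0.0
        else:
            f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
        return precision, recall, f_beta

    # Example: 80 detections matched to truth, 20 spurious detections, 10 missed objects.
    p, r, f1 = detection_metrics(tp=80, fp=20, fn=10)
    _, _, f2 = detection_metrics(tp=80, fp=20, fn=10, beta=2.0)  # recall-weighted
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} F2={f2:.2f}")
    ```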

    Speaker Info:

    Anna Rubinstein

    Director of Test and Evaluation

    MORSE Corporation

    Dr. Anna Rubinstein serves as the Director of Test and Evaluation for a Department of Defense (DoD) Artificial Intelligence (AI) program. She directs testing for AI models spanning the fields of computer vision, natural language processing, and other forms of machine perception. She leads teams developing metrics and assessing capabilities at the algorithm, system, and operational level, with a particular interest in human-machine teaming.

    Dr. Rubinstein has spent the last five years supporting national defense as a contractor, largely focusing on model and system evaluation. In her previous role as a Science Advisor in the Defense Advanced Research Projects Agency’s (DARPA) Information Innovation Office (I2O), she provided technical insight to research programs modeling cyber operations in the information domain and building secure software-reliant systems. Before that, Dr. Rubinstein served as a Research Staff Member at the Institute for Defense Analyses (IDA), leading efforts to provide verification and validation of nuclear weapons effects modeling codes in support of the Defense Threat Reduction Agency (DTRA). Dr. Rubinstein also has several years of experience developing algorithms for atmospheric forecasting, autonomous data fusion, social network mapping, anomaly detection, and pattern optimization.

    Dr. Rubinstein holds an M.A. in Chemical Engineering and a Ph.D. in Chemical Engineering and Materials Science from Princeton University, where she was a National Science Foundation Graduate Research Fellow. She also received a B.S. in Chemical Engineering, a B.A. in Chemistry, and a B.A. in Chinese, all from the University of Mississippi, where she was a Barry M. Goldwater Scholar.

  • Test and Evaluation Framework for AI Enabled Systems

    Abstract:

    In the current moment, autonomous and artificial intelligence (AI) systems are emerging at a dizzying pace. Such systems promise to expand the capacity and capability of individuals by delegating increasing levels of decision making down to the agent level. In this way, operators can set high-level objectives for multiple vehicles or agents and need only intervene when alerted to anomalous conditions. Test and evaluation efforts at the Joint AI Center are focused on exercising a prescribed test strategy for AI-enabled systems. This new AI T&E Framework recognizes the inherent complexity that follows from incorporating dynamic decision makers into a system (or into a system-of-systems). The AI T&E Framework is composed of four high-level types of testing that examine an AI-enabled system from different angles to provide as complete a picture as possible of the system's capabilities and limitations: algorithmic, system integration, human-system integration, and operational tests. These testing categories provide stakeholders with appropriate qualitative and quantitative assessments that bound the system’s use cases in a meaningful way. The algorithmic tests characterize the AI models themselves against metrics for effectiveness, security, robustness, and responsible AI principles. The system integration tests examine the system itself to ensure it operates reliably, functions correctly, and is compatible with other components. The human-machine testing asks what human operators think of the system, whether they understand what the system is telling them, and whether they trust the system under appropriate conditions. All of this culminates in an operational test that evaluates how the system performs in a realistic environment with realistic scenarios and adversaries. Interestingly, counter to traditional approaches, this framework is best applied during and throughout the development of an AI-enabled system. Our experience is that programs that conduct independent T&E alongside development do not suffer delays, but instead benefit from the feedback and insights gained from incremental and iterative testing, which leads to the delivery of a better overall capability.

    Speaker Info:

    Brian Woolley

    T&E Enclave Manager

    Joint Artificial Intelligence Center

    Lt Col Brian Woolley is a U.S. Air Force officer currently serving as the Test and Evaluation Enclave Manager at the DoD’s Joint Artificial Intelligence Center, Arlington, Virginia. He earned his Doctoral degree in Computer Engineering from the University of Central Florida and a Master of Science in Software Engineering from the Air Force Institute of Technology. During his 19-year military career, Brian has served as a Cyber Operations Officer supporting the Air Force Weather Agency, the Joint Headquarters for the DoD Information Networks, U.S. Cyber Command, and as the Deputy Director for the Autonomy and Navigation Center at the Air Force Institute of Technology.

  • The Mental Health Impact of Local COVID-19 Cases Prior to the Mass Availability of Vaccine

    Abstract:

    During the COVID-19 pandemic, the majority of Americans experienced many new mental health stressors, including isolation, economic instability, fear of exposure to COVID-19, and the effects of themselves or loved ones catching COVID-19. Service members, veterans, and their families experienced these stressors differently from the general public. In this seminar we examine how local COVID-19 case counts affected mental health outcomes prior to the mass availability of vaccines. We show that households we identify as likely military households and TRICARE and Military Health System beneficiaries reported higher mental health quality than their general population peers, but VA beneficiaries did not. We find that local case counts are an important factor in determining which demographic groups reported drops in mental health during the pandemic.

    Speaker Info:

    Zachary Szlendak

    Research Staff Member

    IDA

    Fields of Expertise: Applied econometrics, economics of education, environmental economics, health economics, labor economics, microeconomics, and public economics.

    Zachary Szlendak has been a Research Staff Member for the Institute for Defense Analyses in the Cost Analysis and Research Division since 2020. Zachary has worked on recruitment and COVID-19 projects. His contributions on these projects include causal inference of COVID policy, research on mental health, identifying factors that affect the financial well-being of service members, evaluating alternative market mechanisms, and participating in interviews aimed at increasing diversity, equity, and inclusion. Additionally, Zachary contributes to the economic analysis in IDA’s SAFETY Act work.

    Prior to his work at IDA, Zachary earned his Ph.D. in Economics at the University of Colorado Boulder. He received his B.S. from the Colorado School of Mines in Applied Mathematics.

  • Topological Modeling of Human-Machine Teams

    Abstract:

    A Human-Machine Team (HMT) is a group of agents consisting of at least one human and at least one machine, all functioning collaboratively towards one or more common objectives. As industry and defense find more helpful, creative, and difficult applications of AI-driven technology, the need to effectively and accurately model, simulate, test, and evaluate HMTs will continue to grow and become even more essential.

    Going along with that growing need, new methods are required to evaluate whether a human-machine team is performing effectively as a team in testing and evaluation scenarios. You cannot predict team performance from knowledge of the individual team agents, alone; interaction between the humans and machines – and interaction between team agents, in general – increases the problem space and adds a measure of unpredictability. Collective team or group performance, in turn, depends heavily on how a team is structured and organized, as well as the mechanisms, paths, and substructures through which the agents in the team interact with one another – i.e. the team’s topology. With the tools and metrics for measuring team structure and interaction becoming more highly developed in recent years, we will propose and discuss a practical, topological HMT modeling framework that not only takes into account but is actually built around the team’s topological characteristics, while still utilizing the individual human and machine performance measures.
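
    As a heavily simplified sketch of the kind of topological starting point such a framework might build on (not the framework itself), the snippet below constructs a hypothetical human-machine interaction graph with networkx and computes a few standard structural measures; the node names, edges, and metric choices are assumptions for illustration.

    ```python
    import networkx as nx

    # Hypothetical human-machine team: edges are interaction channels.
    team = nx.Graph()
    team.add_nodes_from(["H1", "H2"], kind="human")
    team.add_nodes_from(["M1", "M2", "M3"], kind="machine")
    team.add_edges_from([("H1", "M1"), ("H1", "M2"), ("H2", "M2"),
                         ("H2", "M3"), ("M1", "M2")])

    # Simple structural characteristics of the team's topology.
    print("density:", nx.density(team))
    print("betweenness:", nx.betweenness_centrality(team))
    print("connected:", nx.is_connected(team))

    # Which links bridge the human and machine sub-teams?
    cross_edges = [(u, v) for u, v in team.edges
                   if team.nodes[u]["kind"] != team.nodes[v]["kind"]]
    print("human-machine interaction links:", cross_edges)
    ```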

    Speaker Info:

    Caitlan Fealing

    Research Staff Member

    IDA

  • Topological Modeling of Human-Machine Teams

    Abstract:

    A Human-Machine Team (HMT) is a group of agents consisting of at least one human and at least one machine, all functioning collaboratively towards one or more common objectives. As industry and defense find more helpful, creative, and difficult applications of AI-driven technology, the need to effectively and accurately model, simulate, test, and evaluate HMTs will continue to grow and become even more essential.

    Going along with that growing need, new methods are required to evaluate whether a human-machine team is performing effectively as a team in testing and evaluation scenarios. You cannot predict team performance from knowledge of the individual team agents, alone; interaction between the humans and machines – and interaction between team agents, in general – increases the problem space and adds a measure of unpredictability. Collective team or group performance, in turn, depends heavily on how a team is structured and organized, as well as the mechanisms, paths, and substructures through which the agents in the team interact with one another – i.e. the team’s topology. With the tools and metrics for measuring team structure and interaction becoming more highly developed in recent years, we will propose and discuss a practical, topological HMT modeling framework that not only takes into account but is actually built around the team’s topological characteristics, while still utilizing the individual human and machine performance measures.

    Speaker Info:

    Jay Wilkins

    Research Staff Member

    IDA

    Dr. Wilkins received his Ph.D. in Mathematics from the University of Tennessee, and then spent several years as a postdoctoral fellow and professor in academia before moving into the realm of defense research.  After three years at the US Army Research Office (part of the US Army Research Lab) managing the Mathematical Analysis and Complex Systems Program, he came to IDA as a research staff member in January 2020.  His mathematical background is in applied topological and geometric analysis, particularly the application of topology and geometry to the solution of problems in networking and optimal transport.  His current project work at IDA includes team modeling and simulation, sequential testing of chemical agent detectors, and helicopter pilot guidance and firing interfaces.

  • TRMC Big Data Analytics Investments & Technology Review

    Abstract:

    To properly test and evaluate today’s advanced military systems, the T&E community must utilize big data analytics (BDA) tools and techniques to quickly process, visualize, understand, and report on massive amounts of data. This session will inform the audience how to transform the current T&E data infrastructure and analysis techniques into one employing enterprise BDA and Knowledge Management (BDKM) that supports the current warfighter T&E needs and the developmental and operational testing of future weapon platforms.

    The TRMC enterprise BDKM will improve acquisition efficiency, keep up with the rapid pace of acquisition technological advancement, and ensure that effective weapon systems are delivered to warfighters at the speed of relevance – all while enabling T&E analysts across the acquisition lifecycle to make better and faster decisions using data previously inaccessible or unusable.

    This capability encompasses a big data architecture framework – its supporting resources, methodologies, and guidance – to properly address the current and future data needs of systems testing and analysis, as well as an implementation framework, the Cloud Hybrid Edge-to-Enterprise Evaluation and Test Analysis Suite (CHEETAS).

    In combination with the TRMC’s Joint Mission Environment Test Capability (JMETC) which provides readily-available connectivity to the Services’ distributed test capabilities and simulations, the TRMC has demonstrated that applying enterprise-distributed BDA tools and techniques to distributed T&E leads to faster and more informed decision-making – resulting in reduced overall program cost and risk.

    Speaker Info:

    Edward Powell

    Lead Architect and Systems Engineer

    Test Resource Management Center

    Dr. Edward T. Powell is a lead architect and systems engineer for the Test Resource Management Center.  He has worked in the military simulation, intelligence, and Test and Evaluation fields during his thirty-year career, specializing in systems and simulation architecture and engineering.  His current focus is on integrating various OSD and Service big data analysis initiatives into a single seamless cloud-based system-of-systems.  He holds a PhD in Astrophysics from Princeton University and is the principal of his own consulting company based in Northern Virginia.

  • TRMC Big Data Analytics Investments & Technology Review

    Abstract:

    To properly test and evaluate today’s advanced military systems, the T&E community must utilize big data analytics (BDA) tools and techniques to quickly process, visualize, understand, and report on massive amounts of data. This session will inform the audience how to transform the current T&E data infrastructure and analysis techniques into one employing enterprise BDA and Knowledge Management (BDKM) that supports the current warfighter T&E needs and the developmental and operational testing of future weapon platforms.

    The TRMC enterprise BDKM will improve acquisition efficiency, keep up with the rapid pace of acquisition technological advancement, and ensure that effective weapon systems are delivered to warfighters at the speed of relevance – all while enabling T&E analysts across the acquisition lifecycle to make better and faster decisions using data previously inaccessible or unusable.

    This capability encompasses a big data architecture framework – its supporting resources, methodologies, and guidance – to properly address the current and future data needs of systems testing and analysis, as well as an implementation framework, the Cloud Hybrid Edge-to-Enterprise Evaluation and Test Analysis Suite (CHEETAS).

    In combination with the TRMC’s Joint Mission Environment Test Capability (JMETC) which provides readily-available connectivity to the Services’ distributed test capabilities and simulations, the TRMC has demonstrated that applying enterprise-distributed BDA tools and techniques to distributed T&E leads to faster and more informed decision-making – resulting in reduced overall program cost and risk.

    Speaker Info:

    Ryan Norman

    Chief Data Officer

    Test Resource Management Center

    RYAN NORMAN serves as the Test Resource Management Center’s (TRMC) Chief Data Officer (CDO), where he is responsible for developing and overseeing execution of the strategic plan, underlying architecture, and related investments necessary to help improve T&E Knowledge Management and analysis capabilities across the DoD.  Mr. Norman also serves as the TRMC’s lead for Joint Mission Environments, where he is responsible for leading the Joint Mission Environment Test Capability (JMETC) team that provides distributed Test & Evaluation (T&E) infrastructure, enterprise services, and technical expertise to facilities across the Department of Defense (DoD).  This role includes acting as the Director for the Test & Training Enabling Architecture (TENA) Software Development Activity (SDA), which maintains and evolves an integration architecture that ensures live-virtual-constructive interoperability within and between test and training ranges.  Since joining the TRMC in 2008, Mr. Norman has held several additional positions including: Project Manager for the Cyber Operations Research and Network Analysis (CORONA) project; TRMC’s Senior Engineer for Cyberspace Test Capabilities; Chief Engineer of the JMETC program; and TRMC Lead for Army Range Oversight.  Mr. Norman previously served at U.S. Army Redstone Test Center (RTC) as Software Architect.  Mr. Norman holds a Bachelor of Science degree in Computer Science from the Georgia Institute of Technology and is a DoD certified acquisition professional in the T&E, Information Technology, and Program Management career fields.

  • Trust Throughout the Artificial Intelligence Lifecycle

    Abstract:

    AI and machine learning have become widespread throughout the defense, government, and commercial sectors. This has led to increased attention on the topic of trust and the role it plays in successfully integrating AI into high-consequence environments where tolerance for risk is low. Driven by recent successes of AI algorithms in a range of applications, users and organizations rely on AI to provide new, faster, and more adaptive capabilities. However, along with those successes have come notable pitfalls, such as bias, vulnerability to adversarial attack, and inability to perform as expected in novel environments. Many types of AI are data-driven, meaning they operate on and learn their internal models directly from data. Therefore, tracking how data were used to build the model (e.g., for training, validation, and testing) is crucial not only to ensure a high-performing model, but also to understand if the AI should be trusted. MLOps, an offshoot of DevSecOps, is a set of best practices meant to standardize and streamline the end-to-end lifecycle of machine learning. In addition to supporting the software development and hardware requirements of AI-based systems, MLOps provides a scaffold by which the attributes of trust can be formally and methodically evaluated. Additionally, MLOps encourages reasoning about trust early and often in the development cycle. To this end, we present a framework that encourages the development of AI-based applications that can be trusted to operate as intended and function safely both with and without human interaction. This framework offers guidance for each phase of the AI lifecycle, utilizing MLOps, through a detailed discussion of pitfalls resulting from not considering trust, metrics for measuring attributes of trust, and mitigation strategies for when risk tolerance is low.

    Speaker Info:

    Lauren H. Perry

    Senior Project Engineer

    The Aerospace Corporation

    Lauren H Perry
    Sr Project Engineer, Space Applications Group
    Ms. Perry’s work with The Aerospace Corporation incorporates AI/ML technologies into traditional software development programs for the IC, DoD, and commercial customers. Previously, she was the analytical lead for a DoD project established to improve joint interoperability within the Integrated Air and Missile Defense (IAMD) Family of Systems and enhance air warfare capability, and a Reliability Engineer at Lockheed Martin Space Systems Company. She has a background in experimental design, applied statistics, and statistical engineering for the aerospace domain.

     

    Dr. Philip C Slingerland
    Sr Engineering Specialist, Machine Intelligence and Exploitation Department
    Dr. Slingerland’s work with The Aerospace Corporation focuses on machine learning and computer vision projects for a variety of IC, DoD, and commercial customers. Previously, he spent four years as a data scientist and software developer at Metron Scientific Solutions in support of many Naval Sea Systems Command (NAVSEA) studies. Dr. Slingerland has a background in sensor modeling and characterization, with a PhD in physics studying the performance of terahertz quantum cascade lasers (QCLs) for remote sensing applications.

  • USE OF DESIGN & ANALYSIS OF COMPUTER EXPERIMENTS (DACE) IN SPACE MISSION TRAJECTORY DESIGN

    Abstract:

    Numerical astrodynamics simulations are characterized by a large input space and complex, nonlinear input-output relationships. Standard Monte Carlo runs of these simulations are typically time-consuming and numerically costly. We adapt the Design and Analysis of Computer Experiments (DACE) approach to astrodynamics simulations to improve runtimes and increase information gain. Space-filling designs such as Latin Hypercube Sampling (LHS), Maximin, and Maximum Projection Sampling, combined with DACE surrogate modelling techniques such as Radial Basis Functions and Gaussian Process Regression, gave significant improvements for astrodynamics simulations, including: reduced run time of Monte Carlo simulations, improved speed of sensitivity analysis, confidence intervals for non-Gaussian behavior, determination of outliers, and identification of extreme output cases not found by standard simulation and sampling methods.
    Four case studies are presented on novel applications of DACE to mission trajectory design & conjunction assessments with space debris: 1) Gaussian Process regression modelling of maneuvers and navigation uncertainties for commercial cislunar and NASA CLPS lunar missions; 2) Development of a surrogate model for predicting collision risk and miss distance volatility between debris and satellites in Low Earth orbit; 3) Prediction of the displacement of an object in orbit using laser photon pressure; 4) Prediction of eclipse durations for the NASA IBEX-extended mission.
    The surrogate models are assessed by k-fold cross validation. The relative performance of surrogate models is verified by the Root Mean Square Error (RMSE) of predictions at untried points. To improve the sampling of manoeuvre and navigational uncertainties within trajectory design for lunar missions, a maximin LHS was used, in combination with the Gates model for thrusting uncertainty. This led to improvements in simulation efficiency, producing a non-parametric ΔV distribution that was processed with Kernel Density Estimation to resolve a ΔV99.9 prediction with confidence bounds.

    In a collaboration with the NASA Conjunction Assessment Risk Analysis (CARA) group, the changes in probability of collision (Pc) for two objects in LEO were predicted using a network of 13 Gaussian Process Regression-based surrogate models that determined the future trends in covariance and miss distance volatility, given the data provided within a conjunction data message. This allowed for determination of the trend in the probability distribution of Pc up to three days from the time of closest approach, as well as the interpretation of this prediction in the form of an urgency metric that can assist satellite operators in the manoeuvre decision process.

    The main challenge in adapting the methods of DACE to astrodynamics simulations was to deliver a direct benefit to mission planning and design. This was achieved by delivering improvements in confidence and predictions for metrics including the propellant required to complete a lunar mission (expressed as ΔV); statistical validation of the simulation models used; and advice on when a sufficient number of simulation runs has been made to verify convergence to an adequate confidence interval. Future applications of DACE for mission design include determining an optimal tracking schedule plan for a lunar mission, and robust trajectory design for low-thrust propulsion.
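
    A minimal sketch of the core DACE workflow described above (a Latin hypercube design, a Gaussian process surrogate, and k-fold cross-validated RMSE), using scipy and scikit-learn with a cheap analytic function standing in for the astrodynamics simulation; none of the case-study models, bounds, or data are reproduced here.

    ```python
    import numpy as np
    from scipy.stats import qmc
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel
    from sklearn.model_selection import cross_val_score

    # Space-filling design: Latin hypercube sample over a 3-D input space.
    sampler = qmc.LatinHypercube(d=3, seed=1)
    X = qmc.scale(sampler.random(n=80), l_bounds=[0, 0, 0], u_bounds=[1, 2, 5])

    # Stand-in for an expensive astrodynamics simulation.
    def simulator(x):
        return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.1 * x[:, 2]

    y = simulator(X)

    # Gaussian process (kriging) surrogate with an anisotropic RBF kernel.
    gp = GaussianProcessRegressor(
        kernel=ConstantKernel() * RBF(length_scale=[1.0, 1.0, 1.0]),
        normalize_y=True)

    # k-fold cross-validated RMSE to assess the surrogate before trusting it.
    rmse = np.sqrt(-cross_val_score(gp, X, y, cv=5,
                                    scoring="neg_mean_squared_error"))
    print("5-fold RMSE:", rmse.round(4))

    # Once validated, the surrogate replaces the simulator for cheap predictions.
    gp.fit(X, y)
    X_new = qmc.scale(sampler.random(n=5), l_bounds=[0, 0, 0], u_bounds=[1, 2, 5])
    mean, std = gp.predict(X_new, return_std=True)
    print(mean.round(3), std.round(3))
    ```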

    Speaker Info:

    David Shteinman

    CEO/Managing Director

    Industrial Sciences Group

    David Shteinman is a professional engineer and industrial entrepreneur with 34 years’ experience in manufacturing, mining and transport. David has a passion for applying advanced mathematics and statistics to improve business outcomes. He leads The Industrial Sciences Group, a company formed from two of Australia’s leading research centres: The Australian Research Council and the University of NSW. He has been responsible for over 30 projects to date that combine innovative applications of  mathematics and statistics to several branches of engineering (astronautics, space missions, geodesy, transport, mining & mineral processing, plant control systems and energy)  in Australia, the US and Israel.

    Major projects  in the Space sector include projects with Google Lunar X and Space IL Lunar Mission; NASA Goddard; The Australian Space Agency;  Space Environment Research Centre and EOS Space Systems; Geoscience Australia; AGI, the University of Colorado (Boulder), Space Exploration Engineering (contractors to NASA CLPS Missions).

  • Using R Markdown & the Tidyverse to Create Reproducible Research

    Abstract:

    R is one of the major platforms for doing statistical analysis and research.  This course introduces the powerful and popular R software through the use of the RStudio IDE.  This course covers the use of the tidyverse suite of packages to import raw data (readr), do common data manipulations (dplyr and tidyr), and summarize data numerically (dplyr) and graphically (ggplot2).  In order to promote reproducibility of analyses, we will discuss how to code using R Markdown - a method of R coding that allows one to easily create PDF and HTML documents that interweave narrative, R code, and results.
    List of packages to install: tidyverse, GGally, Lahman, tinytex

    Speaker Info:

    Justin Post

    Teaching Associate Professor

    NCSU

    Justin Post is a Teaching Associate Professor and the Director of Online Education in the Department of Statistics at North Carolina State University. Teaching has always been his passion and that is his main role at NCSU. He teaches undergraduate and graduate courses in both face-to-face and distance settings.  Justin is an R enthusiast and has taught many short courses on R, the tidyverse, R shiny, and more.

  • Using Sensor Stream Data as Both an Input and Output in a Functional Data Analysis

    Abstract:

    A case study will be presented where patients wearing continuous glycemic monitoring systems provide sensor stream data of their glucose levels before and after consuming 1 of 5 different types of snacks. The goal is to be able to better predict a new patient’s glycemic-response-over-time trace after being given a particular type of snack. Functional Data Analysis (FDA) is used to extract eigenfunctions that capture the longitudinal shape information of the traces and principal component scores that capture the patient-to-patient variation. FDA is used twice. First it is used on the “before” baseline glycemic-response-over-time traces. Then a separate analysis is done on the snack-induced “after” response traces. The before FPC scores and the type of snack are then used to model the after FPC scores. This final FDA model will then be used to predict the glycemic response of new patients given a particular snack and their existing baseline response history. Although the case study is for medical sensor data, the methodology employed would work for any sensor stream where an event perturbs the system thus affecting the shape of the sensor stream post event.
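
    A bare-bones numpy sketch of the two-stage idea (principal component scores for the "before" and "after" traces extracted separately via SVD, then the after-scores modeled from the before-scores and snack type); the case study's actual FDA machinery, smoothing, and data are not reproduced, and all values below are simulated stand-ins.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n_patients, n_times = 120, 60
    t = np.linspace(0, 1, n_times)

    # Simulated stand-ins for the glucose traces (rows = patients).
    before = 100 + 10 * rng.normal(size=(n_patients, 1)) * np.sin(2 * np.pi * t)
    snack = rng.integers(0, 5, size=n_patients)          # 5 snack types
    bump = np.exp(-((t - 0.3) ** 2) / 0.02)              # post-snack response shape
    after = (before + (20 + 5 * snack)[:, None] * bump
             + rng.normal(scale=2, size=(n_patients, n_times)))

    def fpc_scores(curves, n_pc=3):
        """Eigenfunctions and principal component scores via SVD of centered curves."""
        mean = curves.mean(axis=0)
        U, S, Vt = np.linalg.svd(curves - mean, full_matrices=False)
        return mean, Vt[:n_pc], U[:, :n_pc] * S[:n_pc]

    _, _, scores_before = fpc_scores(before)
    mean_after, eigfun_after, scores_after = fpc_scores(after)

    # Model after-scores from before-scores and snack type (one-hot), least squares.
    snack_dummies = np.eye(5)[snack]
    design = np.column_stack([np.ones(n_patients), scores_before, snack_dummies[:, 1:]])
    coef, *_ = np.linalg.lstsq(design, scores_after, rcond=None)

    # Predicted trace for the first patient (the same formula applies to a new patient).
    pred_trace = mean_after + (design[:1] @ coef) @ eigfun_after
    print(pred_trace.shape)   # (1, 60): a predicted glycemic-response curve
    ```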

    Speaker Info:

    Thomas A Donnelly

    Principal Systems Engineer

    JMP Statistical Discovery LLC

    Tom Donnelly works as a Systems Engineer for JMP Statistical Discovery LLC supporting users of JMP in the Defense and Aerospace sector.  He has been actively using and teaching Design of Experiments (DOE) methods for the past 38 years to develop and optimize products, processes, and technologies.  Donnelly joined JMP in 2008 after working as an analyst for the Modeling, Simulation & Analysis Branch of the US Army’s Edgewood Chemical Biological Center (now CCDC CBC).  There, he used DOE to develop, test, and evaluate technologies for detection, protection, and decontamination of chemical and biological agents.  Prior to working for the Army, Tom was a partner in the first DOE software company for 20 years where he taught over 300 industrial short courses to engineers and scientists.  Tom received his PhD in Physics from the University of Delaware.

  • Using the R ecosystem to produce a reproducible data analysis pipeline

    Abstract:

    Advances in open-source software have brought powerful machine learning and data analysis tools requiring little more than a few coding basics. Unfortunately, the very nature of rapidly changing software can contribute to legitimate concerns surrounding the reproducibility of research and analysis. Borrowing from current practices in data science and software engineering fields, a more robust process using the R ecosystem to produce a version-controlled data analysis pipeline is proposed. By integrating the data cleaning, model generation, manuscript writing, and presentation scripts, a researcher or data analyst can ensure that small changes at any step are automatically reflected throughout, using the rmarkdown, targets, renv, and xaringan R packages.
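
    A minimal sketch of such a pipeline (the target names, file paths, and model formula are hypothetical; this is a generic illustration, not the author's code): the workflow lives in a _targets.R file and is rebuilt with targets::tar_make(), which reruns only the steps affected by a change.

      # _targets.R
      library(targets)
      library(tarchetypes)                      # tar_render() for R Markdown targets
      tar_option_set(packages = c("tidyverse"))

      list(
        tar_target(raw_file, "data/raw.csv", format = "file"),              # tracked input
        tar_target(raw_data, readr::read_csv(raw_file)),                    # data cleaning
        tar_target(clean_data, dplyr::filter(raw_data, !is.na(outcome))),
        tar_target(model, lm(outcome ~ predictor, data = clean_data)),      # model generation
        tar_render(report, "manuscript.Rmd")                                # manuscript
      )

    Pairing the pipeline with renv (renv::init() and renv::snapshot()) pins package versions, and slides can be added as an additional rendered target built with xaringan.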

    Speaker Info:

    Andrew Farina

    Assistant Professor- Department of Behavioral Sciences and Leadership

    United States Military Academy

    Andrew G. Farina is an Assistant Professor at the United States Military Academy, Department of Behavioral Sciences and Leadership. He has ten combat deployments, serving with both conventional and special operations units. His research interests include leadership, character development, and risk-taking propensity.

  • Utilizing Machine Learning Models to Predict Success in Special Operations Assessment

    Abstract:

    The 75th Ranger Regiment is an elite Army unit responsible for some of the most physically and mentally challenging missions. Entry to the unit is based on an assessment process called Ranger Regiment Assessment and Selection (RASP), which consists of a variety of tests and challenges of strength, intellect, and grit. This study explores the psychological and physical profiles of candidates who attempt to pass RASP. Using a random forest model and a penalized logistic regression model, we identify initial entry characteristics that are predictive of success in RASP. We focus on the differences between racial sub-groups and military occupational specialty (MOS) sub-groups to provide information for recruiters to identify underrepresented groups who are likely to succeed in the selection process.
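
    A hedged sketch of the modeling approach in R (the data file, column names, and tuning choices below are illustrative assumptions, not the study's actual setup):

      library(randomForest)
      library(glmnet)

      rasp <- read.csv("rasp_candidates.csv")      # hypothetical: pass (0/1) plus entry measures

      # Random forest classifier
      rf_fit <- randomForest(factor(pass) ~ ., data = rasp, ntree = 500)
      varImpPlot(rf_fit)                           # which characteristics drive predictions

      # Penalized (lasso) logistic regression
      x <- model.matrix(pass ~ . - 1, data = rasp)
      y <- rasp$pass
      lasso_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)
      coef(lasso_fit, s = "lambda.min")            # characteristics retained by the penalty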

    Speaker Info:

    Anna Vinnedge

    Student

    United States Military Academy

    Anna Vinnedge is a fourth-year cadet at the United States Military Academy. She is a mathematics major originally from Seattle, WA, and the captain of the Varsity Women's Crew team. After graduation, she will serve as a cyber officer in the Army. She has conducted previous research in quantum- and DNA-based cryptography and coding theory, and is currently focusing on machine learning research for statistical analysis.

  • What statisticians should do to improve M&S validation studies

    Abstract:

    It is often said that many research findings -- from social sciences, medicine, economics, and other disciplines -- are false. This claim is trumpeted in the media and by many statisticians. There are several reasons that false research is published, but to what extent should we be worried about these issues in defense testing, and in particular in modeling and simulation validation studies?

    In this talk I will present several recommendations for actions that statisticians and data scientists can take to improve the quality of our validations and evaluations.

    Speaker Info:

    John Haman

    RSM

    IDA

    Dr. John Haman is a research staff member at the Institute for Defense Analyses, where he leads a team of analysts that develop methods and tools for analyzing test data. He has worked with a variety of Army, Navy, and Air Force systems, including counter-UAS and electronic warfare systems.

  • A DOE Case Study: Multidisciplinary Approach to Design an Army Gun Propulsion Charge

    Abstract:

    This session will focus on the novel application of a design of experiments approach to optimize a propulsion charge configuration for a U.S. Army artillery round. The interdisciplinary design effort included contributions from subject matter experts in statistics, propulsion charge design, computational physics and experimentation. The process, which we will present in this session, consisted of an initial, low fidelity modeling and simulation study to reduce the parametric space by eliminating inactive variables and reducing the ranges of active variables for the final design. The final design used a multi-tiered approach that consolidated data from multiple sources including low fidelity modeling and simulation, high fidelity modeling and simulation and live test data from firings in a ballistic simulator. Specific challenges of the effort that will be addressed include: integrating data from multiple sources, a highly constrained design space, functional response data, multiple competing design objectives and real-world test constraints. The result of the effort is a final, optimized propulsion charge design that will be fabricated for live gun firing.

    Speaker Info:

    Sarah Longo

    Data Scientist

    US Army CCDC Armaments Center

    Sarah Longo is a data scientist in the US Army CCDC Armaments Center's Systems Analysis Division. She has a background in Chemical and Mechanical Engineering and ten years' experience in gun propulsion and armament engineering. Ms. Longo's gun-propulsion expertise has played a part in enabling the successful implementation of Design of Experiments, Empirical Modeling, Data Visualization and Data Mining for mission-critical artillery armament and weapon system design efforts.

  • A DOE Case Study: Multidisciplinary Approach to Design an Army Gun Propulsion Charge

    Abstract:

    This session will focus on the novel application of a design of experiments approach to optimize a propulsion charge configuration for a U.S. Army artillery round. The interdisciplinary design effort included contributions from subject matter experts in statistics, propulsion charge design, computational physics, and experimentation. The process, which we will present in this session, consisted of an initial, low fidelity modeling and simulation study to reduce the parametric space by eliminating inactive variables and reducing the ranges of active variables for the final design. The final design used a multi-tiered approach that consolidated data from multiple sources including low fidelity modeling and simulation, high fidelity modeling and simulation and live test data from firings in a ballistic simulator. Specific challenges of the effort that will be addressed include: integrating data from multiple sources, a highly constrained design space, functional response data, multiple competing design objectives, and real-world test constraints. The result of the effort is a final, optimized propulsion charge design that will be fabricated for live gun firing.

    Speaker Info:

    Melissa Jablonski

    Statistician

    US Army CCDC Armaments Center

    Melissa Jablonski is a statistician at the US Army Combat Capabilities Development Command Armaments Center.  She graduated from Stevens Institute of Technology with a Bachelor’s and Master’s degree in Mechanical Engineering and started her career in the area of finite element analysis.  She now works as a statistical consultant focusing in the areas of Design and Analysis of Computer Experiments (DACE) and Uncertainty Quantification (UQ).  She also acts as a technical expert and consultant in Design of Experiments (DOE), Probabilistic System Optimization, Data Mining/Machine Learning, and other statistical analysis areas for munition and weapon systems.  She is currently pursuing a Master’s degree in Applied Statistics from Pennsylvania State University.

  • A Framework for Efficient Operational Testing through Bayesian Adaptive Design

    Abstract:

    When developing a system, it is important to consider system performance from a user perspective. This can be done through operational testing---assessing the ability of representative users to satisfactorily accomplish tasks or missions with the system in operationally-representative environments. This process can be expensive and time-consuming, but is critical for evaluating a system. We show how an existing design of experiments (DOE) process for operational testing can be leveraged to construct a Bayesian adaptive design. This method, nested within the larger design created by the DOE process, allows interim analyses using predictive probabilities to stop testing early for success or futility. Furthermore, operational environments with varying probabilities of being encountered are directly used in product evaluation. Representative simulations demonstrate how these interim analyses can be used in an operational test setting, and reductions in necessary test events are shown. The method allows for using either weakly informative priors when data from previous testing is not available, or priors built using developmental testing data when it is available. The proposed method for creating priors using developmental testing data allows for more flexibility in which data can be incorporated into the analysis than the current process does, and demonstrates that it is possible to get more precise parameter estimates. This method will allow future testing to be conducted in less time and at less expense, on average, without compromising the ability of the existing process to verify that the system meets the user's needs.
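
    To illustrate the interim-analysis idea only, a toy beta-binomial version in R (the prior, thresholds, and sample sizes are invented; the designs discussed in the talk are considerably richer):

      # Predictive probability that, given interim data, the completed test will show at
      # least s_req successes out of n_total trials (Beta(a0, b0) prior on the success rate)
      pred_prob_success <- function(succ, n_obs, n_total, s_req, a0 = 1, b0 = 1) {
        n_rem  <- n_total - n_obs
        k_need <- max(s_req - succ, 0)
        if (k_need > n_rem) return(0)
        k <- k_need:n_rem
        a <- a0 + succ
        b <- b0 + n_obs - succ
        sum(choose(n_rem, k) * beta(a + k, b + n_rem - k) / beta(a, b))   # beta-binomial tail
      }

      pred_prob_success(succ = 14, n_obs = 16, n_total = 40, s_req = 32)
      # Stop early for success if this exceeds, say, 0.95; stop for futility if below, say, 0.05.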

    Speaker Info:

    Victoria Sieck

    Student / Operations Research Analyst

    University of New Mexico / Air Force Institute of Technology

    Victoria R.C. Sieck is a PhD Candidate in Statistics at the University of New Mexico. She is also an Operations Research Analyst in the US Air Force (USAF), with experience in the USAF testing community as a weapons and tactics analyst and an operational test analyst. Her research interests include design of experiments and improving operational testing through the use of Bayesian methods.

  • A Great Test Requires a Great Plan

    Abstract:

    The Scientific Test and Analysis Techniques (STAT) process is designed to provide structure for a test team to progress from a requirement to decision-quality information. The four phases of the STAT process are Plan, Design, Execute, and Analyze. Within the Test and Evaluation (T&E) community we tend to focus on the quantifiable metrics and the hard science of testing, which are the Design and Analyze phases. At the STAT Center of Excellence (COE) we have emphasized an increased focus on the planning phase, and in this presentation we focus on the elements necessary for a comprehensive planning session. In order to efficiently and effectively test a system, it is vital that the test team understand the requirements, the System Under Test (SUT) to include any subsystems that will be tested, and the test facility. To accomplish this, the right team members with the necessary knowledge must be in the room and prepared to present their information and have an educated discussion to arrive at a comprehensive agreement about the desired end state of the test. Our recommendations for the initial planning meeting are based on a thorough study of the STAT process and on lessons learned from actual planning meetings.

    Speaker Info:

    Aaron Ramert

    STAT Analyst

    Scientific Test and Analysis Techniques Center of Excellence (STAT COE)

    Mr. Ramert is a graduate of the US Naval Academy and the Naval Postgraduate School and a 20-year veteran of the Marine Corps. During his career in the Marines he served tours in operational air and ground units as well as academic assignments. He joined the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE) in 2016, where he works with major acquisition programs within the Department of Defense to apply rigor and efficiency to their test and evaluation methodology through the application of the STAT process.

  • A Metrics-based Software Tool to Guide Test Activity Allocation

    Abstract:

    Existing software reliability growth models are limited to parametric models that characterize the number of defects detected as a function of testing time or the number of vulnerabilities discovered with security testing. However, the amount and types of testing effort applied are rarely considered. This lack of detail regarding specific testing activities limits the application of software reliability growth models to general inferences such as the additional amount of testing required to achieve a desired failure intensity, mean time to failure, or reliability (period of failure free operation).

    This presentation provides an overview of an open source software reliability tool implementing covariate software reliability models [1] to aid DoD organizations and their contractors who desire to quantitatively measure and predict the reliability and security improvement of software. Unlike traditional software reliability growth models, the models implemented in the tool can accept multiple discrete time series corresponding to the amount of each type of test activity performed as well as dynamic metrics computed in each interval. When applied in the context of software failure or vulnerability discovery data, the parameters of each activity can be interpreted as the effectiveness of that activity to expose reliability defects or security vulnerabilities. Thus, these enhanced models provide the structure to assess existing and emerging techniques in an objective framework that promotes thorough testing and process improvement, motivating the collection of relevant metrics and precise measurements of the time spent performing various testing activities.

    References
    [1] Vidhyashree Nagaraju, Chathuri Jayasinghe, Lance Fiondella, Optimal test activity allocation for covariate software reliability and security models, Journal of Systems and Software, Volume 168, 2020, 110643.
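
    As a deliberately simplified stand-in for the covariate models in [1] (not the tool's implementation; the file and column names are hypothetical), the core idea of relating defect discoveries to the amount of each testing activity can be sketched as a regression of per-interval counts on per-interval effort:

      # Defects discovered per interval vs. testing effort expended in that interval
      intervals <- read.csv("test_intervals.csv")
      fit <- glm(defects ~ unit_test_hours + fuzzing_hours + code_review_hours,
                 family = poisson, data = intervals)
      summary(fit)   # larger coefficients suggest activities more effective at exposing defects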

    Speaker Info:

    Jacob Aubertine

    Graduate Research Assistant

    University of Massachusetts Dartmouth

    Jacob Aubertine is pursuing a MS degree in the Department of Electrical and Computer Engineering at the University of Massachusetts Dartmouth, where he also received his BS (2020) in Computer Engineering. His research interests include software reliability, performance engineering, and statistical modeling.

  • Advancements in Characterizing Warhead Fragmentation Events

    Abstract:

    Fragmentation analysis is a critical piece of the live fire test and evaluation (LFT&E) of lethality and vulnerability aspects of warheads. But the traditional methods for data collection are expensive and laborious. New optical tracking technology is promising to increase the fidelity of fragmentation data, and decrease the time and costs associated with data collection.

    However, the new data will be complex, three-dimensional 'fragmentation clouds', possibly with a time component as well. This raises questions about how testers can effectively summarize spatial data to draw conclusions for sponsors.

    In this briefing, we will discuss the Bayesian spatial models that are fast and effective for characterizing the patterns in fragmentation data, along with several exploratory data analysis techniques that help us make sense of the data. Our analytic goals are to
    - Produce simple statistics and visuals that help the live fire analyst compare and contrast warhead fragmentations;
    - Characterize important performance attributes or confirm design/spec compliance; and
    - Provide data methods that ensure higher fidelity data collection translates to higher fidelity modeling and simulation down the line.

    This talk presents a first-step feasibility study at IDA; we hope for much more to come as we continue to work on this important topic.
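
    As a rough, non-Bayesian stand-in for the kind of spatial summary involved (the data file, zone width, and polynomial degree are illustrative assumptions, not the models used in the study):

      frags <- read.csv("fragments.csv")                   # one row per recovered fragment
      zone  <- cut(frags$polar_angle, breaks = seq(0, 180, by = 10))
      counts <- as.data.frame(table(zone))
      counts$midpoint <- seq(5, 175, by = 10)

      fit <- glm(Freq ~ poly(midpoint, 3), family = poisson, data = counts)
      plot(counts$midpoint, counts$Freq,
           xlab = "Polar angle (deg)", ylab = "Fragment count")
      lines(counts$midpoint, predict(fit, type = "response"))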

    Speaker Info:

    John Haman

    Research Staff Member

    Institute for Defense Analyses

    Dr. John Haman is a statistician at the Institute for Defense Analyses, where he develops methods and tools for analyzing test data. He has worked with a variety of Army, Navy, and Air Force systems, including counter-UAS and electronic warfare systems. Currently, John is providing technical support on operational testing to the Joint Artificial Intelligence Center.

  • An Adaptive Approach to Shock Train Detection

    Abstract:

    Development of new technology always incorporates model testing. This is certainly true for hypersonics, where flight tests are expensive and testing of component- and system-level models has significantly advanced the field. Unfortunately, model tests are often limited in scope, being only approximations of reality and typically only partially covering the range of potential realistic conditions. In this talk, we focus on the problem of real-time detection of the shock train leading edge in high-speed air-breathing engines, such as dual-mode scramjets. Detecting and controlling the shock train leading edge is important to the performance and stability of such engines, and a problem that has seen significant model testing on the ground and some flight testing. Often, methods developed for shock train detection are specific to the model used. Thus, they may not generalize well when tested in another facility or in flight, as they typically require a significant amount of prior characterization of the model and flow regime. A successful method for shock train detection needs to be robust to changes in features like isolator geometry, inlet and combustor states, flow regimes, and available sensors. Such characterization data can be difficult or impossible to obtain if the isolator operating regime is large. To this end, we propose an approach for real-time detection of the isolator shock train. Our approach uses real-time pressure measurements to adaptively estimate the shock train position in a data-driven manner. We show that the method works well across different isolator models, placement of pressure transducers, and flow regimes. We believe that a data-driven approach is the way forward for bridging the gap between testing and reality, saving development time and money.
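
    A toy sketch of the data-driven flavor of such a detector (tap locations, pressures, and the rise threshold are invented for illustration; the proposed method is adaptive and considerably more sophisticated):

      # Flag the most upstream pressure tap whose reading rises well above the
      # undisturbed level measured at the first tap.
      detect_shock_train <- function(p, x, rise = 1.5) {
        baseline <- p[1]
        idx <- which(p / baseline >= rise)
        if (length(idx) == 0) return(NA_real_)
        x[min(idx)]                    # estimated leading-edge location
      }

      x <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30)   # tap locations along the isolator (m)
      p <- c(20, 21, 22, 38, 55, 60)               # wall pressures (kPa)
      detect_shock_train(p, x)                     # returns 0.20 for these made-up data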

    Speaker Info:

    Greg Hunt

    Assistant Professor

    William & Mary

    Greg is an interdisciplinary researcher who builds scientific tools. He is trained as a statistician, mathematician, and computer scientist. Currently he works on a diverse set of problems in biology, physics, and engineering.

  • Army's Open Experimentation Test Range for Internet of Battlefield Things: MSA-DPG

    Abstract:

    One key feature of future Multi-Domain Operations (MDO) is expected to be the ubiquity of devices providing information connected in an Internet of Battlefield Things (IoBT). To this end, the U.S. Army aims to advance the underlying science of pervasive and heterogeneous IoBT sensing, networking, and actuation. In this effort, an IoBT experimentation testbed is an integral part of capability development, evaluating and validating scientific theories, algorithms, and technologies integrated with C2 systems under military scenarios. Originally conceived for this purpose, the Multi-Purpose Sensing Area Distributed Proving Ground (MSA-DPG) is an open-range test bed developed by the Army Research Laboratory (ARL). We discuss the vision and development of MSA-DPG and its fundamental role in research serving the military sciences community.

    Speaker Info:

    Jade Freeman

    Research Scientist

    U.S. Army DEVCOM Army Research Laboratory

    Dr. Jade Freeman currently serves as the Associate Branch Chief and a Team Lead at the Battlefield Information Systems Branch. In this capacity, Dr. Freeman oversees information systems and engineering research projects and analyses. Prior to joining ARL, Dr. Freeman served as the Senior Statistician at the Office of Cybersecurity and Communications, Department of Homeland Security. Throughout her career, her work in operations and research has included cyber threat analyses, large survey design and analyses, experimental design, survival analysis, and missing data imputation methods. Dr. Freeman is also a PMP-certified project manager, experienced in leading and managing IT development projects. Dr. Freeman obtained a Ph.D. in Statistics from the George Washington University.

  • Assessing Human-Autonomy Interaction in Driving-Assist Settings

    Abstract:

    In order to determine how the perception, Autopilot, and driver monitoring systems of Tesla Model 3s interact with one another, and also to determine the scale of between- and within-car variability, a series of four on-road tests were conducted. Three sets of tests were conducted on a closed track and one was conducted on a public highway. Results show wide variability across and within three Tesla Model 3s, with excellent performance in some cases but also likely catastrophic performance in others. This presentation will not only highlight how such interactions can be tested, but also how results can inform requirements and designs of future autonomous systems.

    Speaker Info:

    Mary "Missy" Cummings

    Professor

    Duke University

    Professor Mary (Missy) Cummings received her B.S. in Mathematics from the US Naval Academy in 1988, her M.S. in Space Systems Engineering from the Naval Postgraduate School in 1994, and her Ph.D. in Systems Engineering from the University of Virginia in 2004. A naval pilot from 1988-1999, she was one of the U.S. Navy's first female fighter pilots. She is currently a Professor in the Duke University Electrical and Computer Engineering Department and the Director of the Humans and Autonomy Laboratory. She is an AIAA Fellow and a member of the Veoneer, Inc. Board of Directors

  • Assessing Next-Gen Spacesuit Reliability: A Probabilistic Analysis Case Study at NASA

    Abstract:

    Under the Artemis program, the Exploration Extravehicular Mobility Unit (xEMU) spacesuit will ensure the safety of NASA astronauts during the targeted 2024 return to the moon. Efforts are currently underway to finalize and certify the xEMU design. There is a delicate balance between producing a spacesuit that is robust enough to safely withstand potential fall events while still satisfying stringent mass and mobility requirements. The traditional approach of considering worst case-type loading and applying conservative factors of safety (FoS) to account for uncertainties in the analysis was unlikely to meet the narrow design margins. Thus, the xEMU design requirement was modified to include a probability of no impact failure (PnIF) threshold that must be verified through probabilistic analysis.

    As part of a broader one-year effort to help integrate modern uncertainty quantification (UQ) methodology into engineering practice at NASA, the certification of the xEMU spacesuit was selected as the primary case study. The project, led by NASA Langley Research Center (LaRC) under the Engineering Research & Analysis (R&A) Program in 2020, aimed to develop an end-to-end UQ workflow for engineering problems and to help facilitate reliability-based design at NASA. The main components of the UQ workflow included 1) sensitivity analysis to identify the most influential model parameters, 2) model calibration to quantify model parameter uncertainties using experimental data, and 3) uncertainty propagation for producing probabilistic model predictions and estimating reliability. Particular emphasis was placed on overcoming the common practical barrier of prohibitive computational expense associated with probabilistic analysis by leveraging state-of-the-art UQ methods and high performance computing (HPC). In lieu of mature computational models and test data for the xEMU at the time of the R&A Program, the UQ workflow for estimating PnIF was demonstrated using existing models and data from the previous generation of spacesuits (the Z-2). However, the lessons learned and capabilities developed in the process of the R&A are directly transferable to the ongoing xEMU certification effort and are currently being integrated in 2021.

    This talk provides an overview of the goals of and findings under NASA's UQ R&A project, focusing on the spacesuit certification case study. The steps of the UQ workflow applied to the Z-2 spacesuit using the available finite element method (FEM) models and impact test data will be detailed. The ability to quantify uncertainty in the most influential subset of FEM model input parameters and then propagate that uncertainty to estimates of PnIF is demonstrated. Since the FEM model of the full Z-2 assembly took nearly 1 day to execute just once, the advanced UQ methods and HPC utilization required to make the probabilistic analysis tractable are discussed. Finally, the lessons learned from conducting the case study are provided along with planned ongoing/future work for the xEMU certification in 2021.
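
    As a hedged illustration of the uncertainty-propagation step alone (the input distributions and the closed-form stand-in for the finite element model are invented; the actual workflow propagates calibrated uncertainties through FEM-based surrogates on HPC):

      set.seed(1)
      n <- 1e4
      stiffness <- rnorm(n, mean = 1.0, sd = 0.1)       # calibrated input (illustrative)
      impact_v  <- rlnorm(n, meanlog = 0, sdlog = 0.2)  # impact severity (illustrative)
      peak_strain <- 0.02 * impact_v / stiffness        # stand-in for the FEM prediction
      PnIF <- mean(peak_strain < 0.03)                  # probability of no impact failure
      PnIF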

    Speaker Info:

    James Warner

    Computational Scientist

    NASA Langley Research Center

    Dr. James Warner joined NASA Langley Research Center (LaRC) in 2014 as a Research Computer Engineer after receiving his PhD in Computational Solid Mechanics from Cornell University. Previously, he received his B.S. in Mechanical Engineering from SUNY Binghamton University and held temporary research positions at the National Institute of Standards and Technology and Duke University. Dr. Warner is a member of the Durability, Damage Tolerance, and Reliability Branch (DDTRB) at LaRC, where he focuses on developing computationally-efficient approaches for uncertainty quantification for a range of applications including structural health management and space radiation shielding design. His other research interests include high performance computing, inverse methods, and topology optimization.

  • Automated Test Case Generation for Human-Machine Interaction

    Abstract:

    The growing complexity of interactive systems requires increasing amounts of effort to ensure reliability and usability. Testing is an effective approach for finding and correcting problems with implemented systems. However, testing is often regarded as the most intellectually demanding, time-consuming, and expensive part of system development. Furthermore, it can be difficult (if not impossible) for testers to anticipate all of the conditions that need to be evaluated. This is especially true of human-machine systems, because the human operator (who is attempting to achieve his or her task goals) is an additional concurrent component of the system and one whose behavior is not strictly governed by the implementation of designed system elements.

    To address these issues, researchers have developed approaches for automatically generating test cases. Among these are formal methods: rigorous, mathematical languages, tools, and techniques for modeling, specifying, and verifying (proving properties about) systems. These support model-based approaches (almost exclusively used in computer engineering) for creating tests that are efficient and provide guarantees about their completeness (at least with respect to the model). In particular, model checking can be used for automated test case generation. In this, efficient and exhaustive algorithms search a system model to find traces (test cases) through that model that satisfy specified coverage criteria: descriptions of the conditions the tests should encounter during execution.

    This talk focuses on a formal automated test generation method developed in my lab for creating cases for human-system interaction. This approach makes use of task models. Task models are a standard human factors method for describing how humans normatively achieve goals when interacting with a system. When these models are given formal semantics, they can be paired with models of system behavior to account for human-system interaction. Formal, automated test case generation can then be performed for coverage criteria asserted over the system (for example, to cover the entire human interface) or human task (to ensure all human activities or actions are performed). Generated tests, when manually executed with the system, can serve two purposes. First, testers can observe whether the human behavior in the test always produces the system behavior from the test. This can help analysts validate the models and, if no problems are found, be sure that any desirable properties exhibited by the model hold in the actual system. Second, testers will be able to use their insights about system usability and performance to subjectively evaluate the system under all of the conditions contained in the tests. Given the coverage guarantees provided by the process, this means that testers can be confident they have seen every system condition relevant to the coverage criteria.

    In this talk, I will describe this approach to automated test case generation and illustrate its utility with a simple example. I will then describe how this approach could be extended to account for different dimensions of human cognitive performance and emerging challenges in human-autonomy interaction.

    Speaker Info:

    Matthew Bolton

    Associate Professor

    University at Buffalo, the State University of New York

    Dr. Bolton is an Associate Professor of Industrial and Systems Engineering at the University at Buffalo (UB). He obtained his Ph.D. in Systems Engineering from the University of Virginia, Charlottesville, in 2010. Before joining UB, he worked as a Senior Research Associate at NASA’s Ames Research Center and as an Assistant Professor of Industrial Engineering at the University of Illinois at Chicago. Dr. Bolton is an expert on the use of formal methods in human factors engineering and has published widely in this area. He has successfully applied his research to safety-critical applications in aerospace, medicine, defense, and cybersecurity.  He has received funding on projects sponsored by the European Space Agency, NSF, NASA, AHRQ, and DoD.  This includes a Young Investigator Program Award from the Army Research Office. He is an associate editor for the IEEE Transactions on Human Machine Systems and the former Chair of the Human Performance Modeling Technical Group for the Human Factors and Ergonomics Society. He was appointed as a Senior Member of IEEE in 2015 and received the Human Factors and Ergonomics Society’s William C. Howell Young Investigator award in 2018.

  • Certification by Analysis: A 20-year Vision for Virtual Flight and Engine Testing

    Abstract:

    Analysis-based means of compliance for airplane and engine certification, commonly known as “Certification by Analysis” (CbA), provides a strong motivation for the development and maturation of current and future flight and engine modeling technology. The most obvious benefit of CbA is streamlined product certification testing programs at lower cost while maintaining equivalent levels of safety. The current state of technologies and processes for analysis is not sufficient to adequately address most aspects of CbA today, and concerted efforts to drastically improve analysis capability are required to fully bring the benefits of CbA to fruition.

    While the short-term cost and schedule benefits of reduced flight and engine testing are clearly visible, the fidelity of analysis capability required to realize CbA across a much larger percentage of product certification is not yet sufficient.  Higher-fidelity analysis can help reduce the product development cycle and avoid costly and unpredictable performance and operability surprises that sometimes happen late in the development cycle. Perhaps the greatest long-term value afforded by CbA is the potential to accelerate the introduction of more aerodynamically and environmentally efficient products to market, benefitting not just manufacturers, but also airlines, passengers, and the environment.

    A far-reaching vision for CbA has been constructed to offer guidance in developing lofty yet realizable expectations regarding technology development and maturity through stakeholder involvement. This vision is composed of the following four elements:

    The ability to numerically simulate the integrated system performance and response of full-scale airplane and engine configurations in an accurate, robust, and computationally efficient manner.
    The development of quantified flight and engine modeling uncertainties to establish appropriate confidence in the use of numerical analysis for certification.
    The rigorous validation of flight and engine modeling capabilities against full-scale data from critical airplane and engine testing.
    The use of flight and engine modeling to enable Certification by Simulation.

    Key technical challenges include the ability to accurately predict airplane and engine performance for a single discipline, the robust and efficient integration of multiple disciplines, and the appropriate modeling of system-level assessment. Current modeling methods lack the capability to adequately model conditions that exist at the edges of the operating envelope where the majority of certification testing generally takes place.  Additionally, large-scale engine or airplane multidisciplinary integration has not matured to the level where it can be reliably used to efficiently model the intricate interactions that exist in current or future aerospace products.

    Logistical concerns center primarily on the future High Performance Computing capability needed to perform the large number of computationally intensive simulations needed for CbA. Complex, time-dependent, multidisciplinary analyses will require a computing capacity increase several orders of magnitude greater than is currently available.

    Developing methods to ensure credible simulation results is critically important for regulatory acceptance of CbA. Confidence in analysis methodology and solutions is examined so that application validation cases can be properly identified. Other means of measuring confidence such as uncertainty quantification and “validation-domain” approaches may increase the credibility and trust in the predictions.

    Certification by Analysis is a challenging long-term endeavor that will motivate many areas of simulation technology development, while driving the potential to decrease cost, improve safety, and improve airplane and engine efficiency. Requirements to satisfy certification regulations provide a measurable definition for the types of analytical capabilities required for success. There is general optimism that CbA is a goal that can be achieved, and that a significant amount of flight testing can be reduced in the next few decades.

    Speaker Info:

    Timothy Mauery

    Boeing

    For the past 20 years, Timothy Mauery has been involved in the development of low-speed CFD design processes. In this capacity, he has had the opportunity to interact with users and provide CFD support and training throughout the product development cycle. Prior to moving to the Commercial Airplanes division of The Boeing Company, he worked at the Lockheed Martin Aircraft Center, providing aerodynamic liaison support on a variety of military modification and upgrade programs. At Boeing, he has had the opportunity to support both future products as well as existing programs with CFD analysis and wind tunnel testing. Over the past ten years, he has been closely involved in the development and evaluation of analysis-based certification processes for commercial transport vehicles, for both derivative programs as well as new airplanes. Most recently he was the principal investigator on a NASA research announcement for developing requirements for airplane certification by analysis. Timothy received his bachelor's degree from Brigham Young University, and his master's degree from The George Washington University, where he was also a research assistant at NASA-Langley.

  • Challenges in Verification and Validation of CFD for Industrial Aerospace Applications

    Abstract:

    Verification and validation represent important steps for appropriate use of CFD codes and it is presently considered the user’s responsibility to ensure that these steps are completed. Inconsistent definitions and use of these terms in aerospace complicate the effort. For industrial-use CFD codes, there are a number of challenges that can further confound these efforts including varying grid topology, non-linearities in the solution, challenges in isolating individual components, and difficulties in finding validation experiments. In this presentation, a number of these challenges will be reviewed with some specific examples that demonstrate why verification is much more involved and challenging than typically implied in numerical method courses, but remains an important exercise.

    Some of the challenges associated with validation will also be highlighted using a range of different cases, from canonical flow elements to complete aircraft models. Benchmarking is often used to develop confidence in CFD solutions for engineering purposes, but falls short of validation in the absence of being able to predict bounds on the simulation error. The key considerations in performing benchmarking and validation will be highlighted and some current shortcomings in practice will be presented, leading to recommendations for conducting validation exercises. CFD workshops have considerably improved in their application of these practices, but there continues to be need for additional steps.

    Speaker Info:

    Andrew Cary

    Technical Fellow

    Boeing Research and Technology

    Andrew Cary is a technical fellow of the Boeing Company in CFD and is the focal for the BCFD solver. In this capacity, he has a strong focus on supporting users of the code across the Boeing enterprise as well as leading the development team. These responsibilities align with his interests in verification, validation, and uncertainty quantification as an approach to ensure reliable results as well as in algorithm development, CFD-based shape optimization, and unsteady fluid dynamics. Since hiring into the CFD team in 1996, he has led CFD application efforts across a full range of Boeing products as well as working in grid generation methods, flow solver algorithms, post-processing approaches, and process automation. These assignments have given him the opportunity to work with teams around the world, both inside and outside Boeing. Andrew has been an active member of the American Institute of Aeronautics and Astronautics, serving in multiple technical committees, including his present role on the CFD Vision 2030 Integration Committee.

    Andrew has also been an adjunct professor at Washington University since 1999, teaching graduate classes in CFD and fluid dynamics. Andrew received a Ph.D. (97) in Aerospace Engineering from the University of Michigan and a B.S. (92) and M.S. (97) in Aeronautical and Astronautical Engineering from the University of Illinois Urbana-Champaign.

  • Characterizing Human-Machine Teaming Metrics for Test and Evaluation

    Abstract:

    As advanced technologies and capabilities are enabling machines to engage in tasks that only humans have done previously, new challenges have emerged for the rigorous testing and evaluation (T&E) of human-machine teaming (HMT) concepts. We draw the distinction between an HMT and a human using a tool, and new challenges are enumerated: Agents’ mental models are opaque, machine-to-human communications need to be evaluated, and self-tasking and autonomy need to be evaluated. We argue that a focus on mission outcomes cannot fully characterize team performance due to the increased problem space evaluated, and that the T&E community needs to develop and refine new metrics for agents of teams and teammate interactions. Our IDA HMT framework outlines major categories for HMT evaluation, emphasizing team metrics and parallelizing agent metrics across humans and machines. Major categories are tied to the literature and proposed as a starting point for additional T&E metric specification for robust evaluation.

    Speaker Info:

    Brian Vickers

    Research Staff Member

    Institute for Defense Analyses

    Brian is a Research Staff Member at the Institute for Defense Analyses where he applies rigorous statistics and study design to evaluate, test, and report on various programs.

    Dr. Vickers holds a Ph.D. from the University of Michigan, Ann Arbor where he researched various factors that influence decision making, with a focus on how people allocate their money, time, and other resources.

  • Closing Remarks

    Abstract:

    Mr. William (Allen) Kilgore serves as Director, Research Directorate at NASA Langley Research Center. He previously served as Deputy Director of Aerosciences providing executive leadership and oversight for the Center’s Aerosciences fundamental and applied research and technology capabilities with the responsibility over Aeroscience experimental and computational research. After being appointed to the Senior Executive Service (SES) in 2013, Mr. Kilgore served as the Deputy Director, Facilities and Laboratory Operations in the Research Directorate.

    Prior to this position, Mr. Kilgore spent over twenty years in the operations of NASA Langley’s major aerospace research facilities including budget formulation and execution, maintenance, strategic investments, workforce planning and development, facility advocacy, and integration of facilities’ schedules. During his time at Langley, he has worked in nearly all of the major wind tunnels with a primary focus on process controls, operations and testing techniques supporting aerosciences research. For several years, Mr. Kilgore led the National Transonic Facility, the world’s largest cryogenic wind tunnel. Mr. Kilgore has been at NASA Langley Research Center since 1989, starting as a graduate student.

    Mr. Kilgore earned a B.S. and M.S. in Mechanical Engineering with concentration in dynamics and controls from Old Dominion University in 1984 and 1989, respectively. He is the recipient of NASA’s Exceptional Engineering Achievement Medal in 2008 and Exceptional Service Medal in 2012.

    Speaker Info:

    William "Allen" Kilgore

    Director, Research Directorate

    NASA Langley Research Center

  • Cognitive Work Analysis - From System Requirements to Validation and Verification

    Abstract:

    Human-system interaction is a critical yet often neglected aspect of the system development process. It is most commonly incorporated into system performance assessments late in the design process, leaving little opportunity for any substantive changes to be made to ensure satisfactory system performance is achieved. As a result, workarounds and compromises become a patchwork of “corrections” that end up in the final fielded system. But what if mission outcomes, the work context, and performance expectations could be articulated earlier in the process, thereby influencing the development process throughout?

    This presentation will discuss how a formative method from the field of cognitive systems engineering, cognitive work analysis, can be leveraged to derive design requirements compatible with traditional systems engineering processes. This method establishes not only requirements from which system designs can be constructed, but also how system performance expectations can be more acutely defined a priori to guide the validation and verification process. Cognitive work analysis methods will be described to highlight how ‘cognitive work’ and ‘information relationship’ requirements can be derived and will be showcased in a case-study application of building a decision support system for future human spaceflight operations. Specifically, a description of the testing campaign employed to verify and validate the fielded system will be provided. In summary, this presentation will cover how system requirements can be established early in the design phase, guide the development of design solutions, and subsequently be used to assess the operational performance of the solutions within the context of the work domain it is intended to support.

    Speaker Info:

    Matthew Miller

    Exploration Research Engineer

    Jacobs/NASA Johnson Space Center

    Matthew J. Miller is an Exploration Research Engineer within the Astromaterials Research and Exploration Sciences (ARES) division at NASA Johnson Space Center. His work focuses on advancing present-day tools, technologies and techniques to improve future EVA operations by applying cognitive systems engineering principles. He has over seven years of EVA flight operations and NASA analog experience where he has developed and deployed various EVA support systems and concept of operations. He received a B.S. (2012), M.S. (2014) and Ph.D. (2017) in aerospace engineering from the Georgia Institute of Technology.

  • Collaborative Human AI Red Teaming

    Abstract:

    The Collaborative Human AI Red Teaming (CHART) project is an effort to develop an AI Collaborator which can help human test engineers quickly develop test plans for AI systems. CHART was built around processes developed for cybersecurity red-teaming, using a goal-focused approach based upon iteratively testing and attacking a system and then updating the tester's model to discover novel failure modes not found by traditional T&E processes. Red teaming is traditionally a time-intensive process which requires subject matter experts to study the system they are testing for months in order to develop attack strategies. CHART will accelerate this process by guiding the user through the process of diagramming the AI system under test and drawing upon a pre-established body of knowledge to identify the most probable vulnerabilities.

    CHART was provided internal seedling funds during FY20 to perform a feasibility study of the technology. During this period the team developed a taxonomy of AI vulnerabilities and an ontology of AI irruptions, where irruptions are events (caused either by a malicious actor or by unintended consequences) that trigger a vulnerability and lead to an undesirable result. Using this taxonomy we built a threat modeling tool that allows users to diagram their AI system and identifies all the possible irruptions which could occur. This initial demonstration was based around two scenarios: a smartphone-based ECG system for telemedicine and a UAV trained with reinforcement learning to avoid mid-air collisions.

    In this talk we will first discuss how Red Teaming differs from adversarial machine learning and traditional testing and evaluation. Next, we will provide an overview of how industry is approaching the problem of AI Red Teaming and how our approach differs. Finally, we will discuss how we developed our taxonomy of AI vulnerabilities, how to apply goal-focused testing to AI systems, and our strategy for automatically generating test plans.

    Speaker Info:

    Galen Mullins

    Senior AI Researcher

    Johns Hopkins University Applied Physics Laboratory

    Dr. Galen Mullins is a senior staff scientist in the Robotics Group of the Intelligent Systems branch at the Johns Hopkins Applied Physics Laboratory. His research is focused on developing intelligent testing techniques and adversarial tools for finding the vulnerabilities of AI systems. His recent project work has included the development of new imitation learning frameworks for modeling the behavior of autonomous vehicles, creating algorithms for generating adversarial environments, and developing red teaming procedures for AI systems.  He is the secretary for the IEEE/RAS working group on Guidelines for Verification of Autonomous Systems and teaches the Introduction to Robotics course at the Johns Hopkins Engineering for Professionals program.

    Dr. Galen Mullins received his B.S. degrees in Mechanical Engineering and Mathematics from Carnegie Mellon University in 2007 and joined APL the same year. Since then he earned his M.S. in Applied Physics from Johns Hopkins University in 2010, and his Ph.D. in Mechanical Engineering from the University of Maryland in 2018. His doctoral research was focused on developing active learning algorithms for generating adversarial scenarios for autonomous vehicles.

  • Combinatorial Interaction Testing

    Abstract:

    This mini-tutorial provides an introduction to combinatorial interaction testing (CIT). The main idea behind CIT is to pseudo-exhaustively test software and hardware systems by covering combinations of components in order to detect faults. In 90 minutes, we provide an overview of this domain that includes the following topics: the role of CIT in software and hardware testing, how it complements and differs from design of experiments, considerations such as variable strength and constraints, the typical combinatorial arrays used for constructing test suites, and existing tools for test suite construction. Last, defense systems are increasingly relying on software with embedded machine learning (ML), yet ML poses unique challenges to applying conventional software testing due to characteristics such as the large input space, effort required for white box testing, and emergent behaviors apparent only at integration or system levels. As a well-studied black box approach to testing integrated systems with a pseudo-exhaustive strategy for handling large input spaces, CIT provides a good foundation for testing ML. In closing, we present recent research adapting concepts of combinatorial coverage to test design for ML.
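
    As a concrete example of the pseudo-exhaustive idea, a strength-2 (pairwise) covering array for three two-level factors needs only four runs, versus eight for the exhaustive design, yet every pair of factors still sees all four of its level combinations:

      ca <- matrix(c(0, 0, 0,
                     0, 1, 1,
                     1, 0, 1,
                     1, 1, 0),
                   ncol = 3, byrow = TRUE,
                   dimnames = list(NULL, c("A", "B", "C")))
      ca   # every pair of columns contains (0,0), (0,1), (1,0), and (1,1)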

    Speaker Info:

    Erin Lanus

    Research Assistant Professor

    Virginia Tech

    Erin Lanus is a Research Assistant Professor at the Hume Center for National Security and Technology at Virginia Tech. She has a Ph.D. in Computer Science with a concentration in cybersecurity  from Arizona State University. Her experience includes work as a Research Fellow at University of Maryland Baltimore County and as a High Confidence Software and Systems Researcher with the Department of Defense. Her current interests are software and combinatorial testing, machine learning in cybersecurity, and artificial intelligence assurance.

  • Cybersecurity Metrics and Quantification: Problems, Some Results, and Research Directions

    Abstract:

    Cybersecurity Metrics and Quantification is a fundamental but notoriously hard problem. It is one of the pillars underlying the emerging Science of Cybersecurity. In this talk, I will describe a number of cybersecurity metrics and quantification research problems that are encountered in evaluating the effectiveness of a range of cyber defense tools. I will review the research results we have obtained in recent years. I will also discuss future research directions, including those being undertaken in my research group.

    Speaker Info:

    Shouhuai Xu

    Professor

    University of Colorado Colorado Springs

    Shouhuai Xu is the Gallogly Chair Professor in the Department of Computer Science, University of Colorado Colorado Springs (UCCS). Prior to joining UCCS, he was with the Department of Computer Science, University of Texas at San Antonio. He pioneered a systematic approach, dubbed Cybersecurity Dynamics, to modeling and quantifying cybersecurity from a holistic perspective. This approach has three orthogonal research thrusts: metrics (for quantifying security, resilience and trustworthiness/uncertainty, to which this talk belongs), cybersecurity data analytics, and cybersecurity first-principle modeling (for seeking cybersecurity laws). His research has won a number of awards, including the 2019 worldwide adversarial malware classification challenge organized by the MIT Lincoln Lab. His research has been funded by AFOSR, AFRL, ARL, ARO, DOE, NSF and ONR. He co-initiated the International Conference on Science of Cyber Security (SciSec) and is serving as its Steering Committee Chair. He has served as Program Committee co-chair for a number of international conferences and as Program Committee member for numerous international conferences.  He is/was an Associate Editor of IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), IEEE Transactions on Information Forensics and Security (IEEE T-IFS), and IEEE Transactions on Network Science and Engineering (IEEE TNSE). More information about his research can be found at https://xu-lab.org.

  • Dashboard for Equipment Failure Reports

    Abstract:

    Equipment Failure Reports (EFRs) describe equipment failures and the steps taken as a result of these failures. EFRs contain both structured and unstructured data. Currently, analysts manually read through EFRs to understand failure modes and make recommendations to reduce future failures. This is a tedious process where important trends and information can get lost. This motivated the creation of an interactive dashboard that extracts relevant information from the unstructured (i.e., free-form text) data and combines it with structured data like failure date, corrective action, and part number. The dashboard is an RShiny application that utilizes numerous text mining and visualization packages, including tm, plotly, edgebundler, and topicmodels. It allows the end-user to filter to the EFRs that they care about and visualize meta-data, such as geographic region where the failure occurred, over time, allowing previously unknown trends to be seen. The dashboard also applies topic modeling to the unstructured data to identify key themes. Analysts are now able to quickly identify frequent failure modes and look at time- and region-based trends in these common equipment failures.
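
    A minimal sketch of the topic-modeling piece (the example narratives and number of topics are invented; the dashboard wraps this kind of step in RShiny with filtering and visualization):

      library(tm)
      library(topicmodels)

      efr_text <- c("pump seal cracked during operation",       # hypothetical EFR narratives
                    "foreign debris found in pump housing",
                    "void detected in weld during inspection")

      corpus <- VCorpus(VectorSource(efr_text))
      corpus <- tm_map(corpus, content_transformer(tolower))
      corpus <- tm_map(corpus, removePunctuation)
      corpus <- tm_map(corpus, removeWords, stopwords("english"))
      dtm <- DocumentTermMatrix(corpus)

      lda <- LDA(dtm, k = 2, control = list(seed = 1))   # extract key themes
      terms(lda, 5)                                      # top terms per topic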

    Speaker Info:

    Robert Cole Molloy

    Johns Hopkins University Applied Physics Laboratory

    Robert Molloy is a data scientist for the Johns Hopkins University Applied Physics Laboratory's Systems Analysis Group, where he supports a variety of projects including text mining on unstructured text data, applying machine learning techniques to text and signal data, and implementing and modifying existing natural language models. He graduated from the University of Maryland, College Park in May 2020 with a dual degree in computer science and mathematics with a concentration in statistics.

  • Debunking Stress Rupture Theories Using Weibull Regression Plots

    Abstract:

    As statisticians, we are always working on new ways to explain statistical methodologies to non-statisticians. It is in this realm that we never underestimate the value of graphics and patience! In this presentation, we present a case study that involves stress rupture data where a Weibull regression is needed to estimate the parameters. The context of the case study results from a multi-stage project supported by the NASA Engineering and Safety Center (NESC), where the objective was to assess the safety of composite overwrapped pressure vessels (COPVs). The analytical team was tasked with devising a test plan to model stress rupture failure risk in carbon fiber strands that encase the COPVs, with the goal of understanding the reliability of the strands at use conditions for the expected mission life. While analyzing the data, we found that the proper analysis contradicts accepted theories about the stress rupture phenomenon. In this talk, we will introduce ways to graph the stress rupture data to better explain the proper analysis and also explore assumptions.
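
    In R, the backbone of such an analysis can be sketched with the survival package (the data file and variable names are hypothetical, and the study's actual test plan and likelihood are richer than this):

      library(survival)

      # time = hours to rupture (or end of test), status = 1 if ruptured / 0 if censored,
      # stress = applied stress ratio for each strand
      strands <- read.csv("strand_tests.csv")

      fit <- survreg(Surv(time, status) ~ stress, data = strands, dist = "weibull")
      summary(fit)   # Weibull shape is 1/fit$scale; the location parameter depends on stress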

    Speaker Info:

    Anne Driscoll

    Associate Collegiate Professor

    Virginia Tech

    Anne Ryan Driscoll is an Associate Collegiate Professor in the Department of Statistics at Virginia Tech. She received her PhD in Statistics from Virginia Tech. Her research interests include statistical process control, design of experiments, and statistics education. She is a member of ASQ and ASA.

  • Empirical Analysis of COVID-19 in U.S. States and Counties

    Abstract:

    The zoonotic emergence of the coronavirus SARS-CoV-2 at the beginning of 2020 and the subsequent global pandemic of COVID-19 has caused massive disruptions to economies and health care systems, particularly in the United States. Using the results of serology testing, we have developed true prevalence estimates for COVID-19 case counts in the U.S. over time, which allows for more clear estimates of infection and case fatality rates throughout the course of the pandemic. In order to elucidate policy, demographic, weather, and behavioral factors that contribute to or inhibit the spread of COVID-19, IDA compiled panel data sets of empirically derived, publicly available COVID-19 data and analyzed which factors were most highly correlated with increased and decreased spread within U.S. states and counties. These analyses lead to several recommendations for future pandemic response preparedness.

    Speaker Info:

    Emily Heuring

    Research Staff Member

    Institute for Defense Analyses

    Dr. Emily Heuring received her PhD in Biochemistry, Cellular, and Molecular Biology from the Johns Hopkins University School of Medicine in 2004 on the topic of human immunodeficiency virus and its impact on the central nervous system. Since that time, she has been a Research Staff Member at the Institute for Defense Analyses, supporting operational testing of chemical and biological defense programs. More recently, Dr. Heuring has supported OSD-CAPE on Army and Marine Corps programs and the impact of COVID-19 on the general population and DOD.

  • Entropy-Based Adaptive Design for Contour Finding and Estimating Reliability

    Abstract:

    In reliability, methods used to estimate failure probability are often limited by the costs associated with model evaluations. Many of these methods, such as multi-fidelity importance sampling (MFIS), rely upon a cheap surrogate model, such as a Gaussian process (GP), to quickly generate predictions. The quality of the GP fit, at least in the vicinity of the failure region(s), is instrumental in propping up such estimation strategies. We introduce an entropy-based GP adaptive design that, when paired with MFIS, provides more accurate failure probability estimates with higher confidence. We show that our greedy data acquisition scheme better identifies multiple failure regions compared to existing contour-finding schemes. We then extend the method to batch selection. Illustrative examples are provided on benchmark data as well as an application to the impact damage simulator of a NASA spacesuit design.
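
    A minimal sketch of the underlying idea, though not the authors' exact acquisition: fit a GP to a toy limit-state function and score candidate inputs by the entropy of the predicted failure indicator, which peaks where the GP is least certain about which side of the failure contour a point falls on.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def g(x):                        # toy limit-state function
        return np.sin(3 * x).ravel()

    threshold = 0.5                  # "failure" when g(x) > threshold
    X = np.random.uniform(0, 2, size=(8, 1))
    y = g(X)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(X, y)

    candidates = np.linspace(0, 2, 200).reshape(-1, 1)
    mu, sd = gp.predict(candidates, return_std=True)
    p = norm.cdf((mu - threshold) / np.maximum(sd, 1e-9))       # P(failure | GP)
    entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))

    x_next = candidates[np.argmax(entropy)]    # most informative point near the contour
    print(x_next)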

    Speaker Info:

    Austin Cole

    PhD Candidate

    Virginia Tech

    Austin Cole is a statistics PhD candidate at Virginia Tech. He previously taught high school math and statistics courses, and holds a Bachelor’s in Mathematics and Master’s in Secondary Education from the College of William and Mary. Austin has worked with dozens of researchers as a lead collaborator in Virginia Tech’s Statistical Applications and Innovations Group (SAIG). Under the supervision of Dr. Robert Gramacy, Austin has conducted research in the area of computer experiments with a focus on Bayesian optimization, sparse covariance matrices, and importance sampling. He is currently collaborating with researchers at NASA Langley to evaluate the safety of the next generation of spacesuits.

  • Estimating Pure-Error from Near Replicates in Design of Experiments

    Abstract:

    In design of experiments, setting exact replicates of factor settings enables estimation of pure error: a model-independent estimate of experimental error that is useful for communicating inherent system noise and for testing model lack-of-fit. Often in practice, the factor levels for replicates are precisely measured rather than precisely set, resulting in near-replicates. This can result in inflated estimates of pure error due to uncompensated set-point variation. In this article, we review previous strategies for estimating pure error from near-replicates and propose a simple alternative. We derive key analytical properties and investigate them via simulation. Finally, we illustrate the new approach with an application.
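
    For orientation, the exact-replicate case that the proposed method generalizes is straightforward: pool squared deviations of each response from its replicate-group mean and divide by the pooled degrees of freedom. The factor settings and responses below are hypothetical.

    import pandas as pd

    # Hypothetical runs: three replicate groups defined by (temp, press)
    runs = pd.DataFrame({
        "temp":  [100, 100, 100, 150, 150, 200, 200],
        "press": [ 10,  10,  10,  20,  20,  30,  30],
        "y":     [5.1, 5.4, 4.9, 7.2, 7.6, 9.0, 9.3],
    })

    groups = runs.groupby(["temp", "press"])["y"]
    ss_pe = ((runs["y"] - groups.transform("mean")) ** 2).sum()   # deviations from replicate means
    df_pe = int((groups.size() - 1).sum())                        # sum of (n_i - 1)
    print("pure-error variance estimate:", ss_pe / df_pe)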

    Speaker Info:

    Caleb King

    Research Statistician Developer

    SAS Institute

  • Fast, Unbiased Uncertainty Propagation with Multi-model Monte Carlo

    Abstract:

    With the rise of machine learning and artificial intelligence, there has been a huge surge in data-driven approaches to solve computational science and engineering problems. In the context of uncertainty propagation, machine learning is often employed for the construction of efficient surrogate models (i.e., response surfaces) to replace expensive, physics-based simulations. However, relying solely on surrogate models without any recourse to the original high-fidelity simulation will produce biased estimators and can yield unreliable or non-physical results.

    This talk discusses multi-model Monte Carlo methods that combine predictions from fast, low-fidelity models with reliable, high-fidelity simulations to enable efficient and accurate uncertainty propagation. For instance, the low-fidelity models could arise from coarsened discretizations in space/time (e.g., Multilevel Monte Carlo - MLMC) or from general data-driven or reduced order models (e.g., Multifidelity Monte Carlo - MFMC; Approximate Control Variates - ACV). Given a fixed computational budget and a collection of models of varying cost/accuracy, the goal of these methods is to optimally allocate and combine samples across the models. The talk will also present a NASA-developed open-source Python library that acts as a general multi-model uncertainty propagation capability. The effectiveness of the discussed methods and Python library is demonstrated on a trajectory simulation application. Here, orders-of-magnitude improvements in computational speed and accuracy are obtained for predicting the landing location of an umbrella heat shield under significant uncertainties in initial state, atmospheric conditions, etc.
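
    The core idea can be sketched with a simple two-model control-variate estimator (illustrative only; this is not the NASA library itself): a small number of paired high- and low-fidelity runs plus many extra cheap low-fidelity samples combine into an estimate of the high-fidelity mean. The toy models below stand in for expensive simulations.

    import numpy as np

    rng = np.random.default_rng(0)

    def f_hi(x):                     # "expensive" high-fidelity model
        return np.sin(x) + 0.05 * x**2

    def f_lo(x):                     # cheap, biased low-fidelity model
        return np.sin(x)

    n_hi, n_lo = 50, 5000
    x_hi = rng.normal(size=n_hi)     # shared inputs for the paired evaluations
    x_lo = rng.normal(size=n_lo)     # extra cheap-only samples

    y_hi, y_lo_paired = f_hi(x_hi), f_lo(x_hi)
    cov = np.cov(y_hi, y_lo_paired)
    alpha = cov[0, 1] / cov[1, 1]    # control-variate weight from the paired runs

    # Control-variate estimate of E[f_hi]; the correction term has mean zero
    est = y_hi.mean() + alpha * (f_lo(x_lo).mean() - y_lo_paired.mean())
    print(est, "vs plain MC on the 50 high-fidelity runs:", y_hi.mean())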

    Speaker Info:

    Geoffrey Bomarito

    Materials Research Engineer

    NASA Langley Research Center

    Dr. Geoffrey Bomarito is a Materials Research Engineer at NASA Langley Research Center.  Before joining NASA in 2014, he earned a PhD in Computational Solid Mechanics from Cornell University.  He also holds an MEng from the Massachusetts Institute of Technology and a BS from Cornell University, both in Civil and Environmental Engineering.  Dr. Bomarito's work centers around machine learning and uncertainty quantification as applied to aerospace materials and structures.  His current topics of interest are physics informed machine learning, symbolic regression, additive manufacturing, and trajectory simulation.

  • Finding the Human in the Loop: Considerations for AI in Decision Making

    Speaker Info:

    Joe Lyons

    Lead for the Collaborative Interfaces and Teaming Core Research Area

    711 Human Performance Wing at Wright-Patterson AFB

    Joseph B. Lyons is the Lead for the Collaborative Interfaces and Teaming Core Research Area within the 711 Human Performance Wing at Wright-Patterson AFB, OH. Dr. Lyons received his PhD in Industrial/Organizational Psychology from Wright State University in Dayton, OH, in 2005.  Some of Dr. Lyons’ research interests include human-machine trust, interpersonal trust, human factors, and influence. Dr. Lyons has worked for the Air Force Research Laboratory as a civilian researcher since 2005, and between 2011-2013 he served as the Program Officer at the Air Force Office of Scientific Research where he created a basic research portfolio to study both interpersonal and human-machine trust as well as social influence. Dr. Lyons has published in a variety of peer-reviewed journals, and is an Associate Editor for the journal Military Psychology. Dr. Lyons is a Fellow of the American Psychological Association and the Society for Military Psychologists.

  • Finding the Human in the Loop: Evaluating HSI with AI-Enabled Systems: What should you consider in a TEMP?

    Speaker Info:

    Jane Pinelis

    Chief of the Test, Evaluation, and Assessment branch

    Department of Defense Joint Artificial Intelligence Center (JAIC)

    Dr. Jane Pinelis is the Chief of the Test, Evaluation, and Assessment branch at the Department of Defense Joint Artificial Intelligence Center (JAIC). She leads a diverse team of testers and analysts in rigorous test and evaluation (T&E) for JAIC capabilities, as well as development of T&E-specific products and standards that will support testing of AI-enabled systems across the DoD.

    Prior to joining the JAIC, Dr. Pinelis served as the Director of Test and Evaluation for USDI’s Algorithmic Warfare Cross-Functional Team, better known as Project Maven. She directed the developmental testing for the AI models, including computer vision, machine translation, facial recognition, and natural language processing. Her team developed metrics at various levels of testing for AI capabilities and provided leadership with empirically based recommendations for model fielding. Additionally, she oversaw operational and human-machine teaming testing and conducted research and outreach to establish standards in T&E of systems using artificial intelligence.

    Dr. Pinelis has spent over 10 years working predominantly in the area of defense and national security. She has largely focused on operational test and evaluation, both in support of the service operational testing commands and at the OSD level. In her previous job as the Test Science Lead at the Institute for Defense Analyses, she managed an interdisciplinary team of scientists supporting the Director and the Chief Scientist of the Department of Operational Test and Evaluation on integration of statistical test design and analysis and data-driven assessments into test and evaluation practice. Before that, in her assignment at the Marine Corps Operational Test and Evaluation Activity, Dr. Pinelis led the design and analysis of the widely publicized study on the effects of integrating women into combat roles in the Marine Corps. Based on this experience, she co-authored a book titled “The Experiment of a Lifetime: Doing Science in the Wild for the United States Marine Corps.”

    In addition to T&E, Dr. Pinelis has several years of experience leading analyses for the DoD in the areas of wargaming, precision medicine, warfighter mental health, nuclear non-proliferation, and military recruiting and manpower planning.

    Her areas of statistical expertise include design and analysis of experiments, quasi-experiments, and observational studies, causal inference, and propensity score methods.

    Dr. Pinelis holds a BS in Statistics, Economics, and Mathematics, an MA in Statistics, and a PhD in Statistics, all from the University of Michigan, Ann Arbor.

  • Finding the Human in the Loop: Evaluating Warfighters’ Ability to Employ AI Capabilities

    Abstract:

    Although artificial intelligence may take over tasks traditionally performed by humans or power systems that act autonomously, humans will still interact with these systems in some way. The need to ensure these interactions are fluid and effective does not disappear; if anything, this need only grows with AI-enabled capabilities. These technologies introduce multiple new hazards for achieving high-quality human-system integration. Testers will need to evaluate both traditional HSI issues and these novel concerns in order to establish the trustworthiness of a system for activity in the field, and we will need to develop new T&E methods in order to do this. In this session, we will hear how three national security organizations are preparing for these HSI challenges, followed by a broader panel discussion on which of these problems is most pressing and which is most promising for DoD research investments.

    Speaker Info:

    Dan Porter

    Research Staff Member

    Institute for Defense Analyses

  • Finding the Human in the Loop: HSI | Trustworthy AI

    Abstract:

    Recent successes and shortcomings of AI implementations have highlighted the importance of understanding how to design and interpret trustworthiness. AI Assurance is becoming a popular objective for some stakeholders; however, assurance and trustworthiness are context-sensitive concepts that rely not only on software performance and cybersecurity, but also on human-centered design. This talk summarizes Cognitive Engineering principles in the context of resilient AI engineering. It also introduces approaches for successful Human-Machine Teaming in high-risk work domains.

    Speaker Info:

    Stoney Trent

    Research Professor and Principal Advisor for Research and Innovation; Founder

    Virginia Tech; The Bulls Run Group, LLC

    Stoney is a Cognitive Engineer and Military Intelligence and Cyber Warfare veteran, who specializes in human-centered innovation.  As an Army officer, Stoney designed and secured over $350M to stand up the Joint Artificial Intelligence Center (JAIC) for the Department of Defense.  As the Chief of Missions in the JAIC, Stoney established product lines to deliver human-centered AI to improve warfighting and business functions in the world’s largest bureaucracy.  Previously, he established and directed U.S. Cyber Command’s $50M applied research lab, which develops and assesses products for the Cyber Mission Force.  Stoney has served as a Strategic Policy Research Fellow with the RAND Arroyo Center and is a former Assistant Professor in the Department of Behavioral Science and Leadership at the United States Military Academy.   He has served in combat and stability operations in Iraq, Kosovo, Germany, and Korea.  Stoney is a graduate of the Army War College and former Cyber Fellow at the National Security Agency.

  • Finding the Human in the Loop: Panelist

    Speaker Info:

    Rachel Haga

    Research Associate

    Institute for Defense Analyses

    Rachel is a Research Associate at the Institute for Defense Analyses where she applies rigorous statistics and study design to evaluate, test, and report on various programs. She specializes in human system integration.

  • Finding the Human in the Loop: Panelist

    Speaker Info:

    Chad Bieber

    Director, Test and Evaluation. Senior Research Engineer.

    Johns Hopkins University Applied Physics Laboratory

    Chad Bieber is a Senior Research Engineer at the Johns Hopkins University Applied Physics Lab, is currently working as the Test and Evaluation Director for Project Maven, and was previously a Research Staff Member at IDA. A former pilot in the US Air Force, he received his Ph.D. in Aerospace Engineering from North Carolina State University. Chad is interested in how humans interact with complex, and increasingly autonomous, systems.

  • Finding the Human in the Loop: Panelist

    Speaker Info:

    Poornima Madhavan

    Principal Scientist and Capability Lead for Social and Behavioral Sciences

    MITRE

    Dr. Poornima Madhavan is a Principal Scientist and Capability Lead for Social and Behavioral Sciences at the MITRE Corporation. She has more than 15 years of experience studying human-systems integration issues in sociotechnical systems including trust calibration, decision making, and risk perception. Dr. Madhavan spent the first decade of her career as a professor of Human Factors Psychology at Old Dominion University where she studied threat detection, risk analysis and human decision making in aviation and border security. This was followed by a stint as the Director of the Board on Human-Systems Integration at the National Academies of Sciences, Engineering and Medicine where she served as the primary spokesperson to the federal government on policy issues related to human-systems integration. Just before joining MITRE, Dr. Madhavan’s work focused on modeling human behavioral effects of non-lethal weapons and human-machine teaming for autonomous systems at the Institute for Defense Analyses. Dr. Madhavan received her M.A. and Ph.D. in Engineering Psychology from the University of Illinois at Urbana-Champaign and completed her post-doctoral fellowship in Social and Decision Sciences at Carnegie Mellon University.

  • Identifying Challenges and Solutions to T&E of Non-IP Networks

    Abstract:

    Many systems within the Department of Defense (DoD) contain networks that use both Internet Protocol (IP) and non-IP forms of information exchange. While IP communication is widely understood among the cybersecurity community, expertise and available test tools for non-IP protocols such as Controller Area Network (CAN), MIL-STD-1553, and SCADA are not as commonplace. Over the past decade, the DoD has repeatedly identified gaps in data collection and analysis when assessing the cybersecurity of non-IP buses. This roundtable is intended to open a discussion among testers and evaluators on the existing measurement and analysis tools for non-IP buses used across the community and also propose solutions to recurring roadblocks experienced when performing operational testing on non-IP components.

    Specific topics of discussion will include:
    What tools do you or your supporting teams use during cybersecurity events to attack, scan, and monitor non-IP communications?

    What raw quantitative data do you collect that captures the adversarial activity and/or system response from cyber aggression to non-IP components? Please provide examples of test instrumentation and data collection methods.

    What data analysis tools do you use to draw conclusions from measured data?

    What types of non-IP buses, including components on those buses, have you personally been able to test?

    What components were you not able to test? Why were you not able to test them? Was it due to safety concerns, lack of permission, lack of available tools and expertise, or other? Had you been given authority to test those components, do you think it would have improved the quality of test and comprehensiveness of the assessment?

    Speaker Info:

    Peter Mancini

    Research Staff Member

    Institute for Defense Analyses

    Peter Mancini works at the Institute for Defense Analyses, supporting the Director, Operational Test and Evaluation (DOT&E) as a Cybersecurity OT&E analyst.

  • Intelligent Integration of Limited-Knowledge IoT Services in a Cross-Reality Environment

    Abstract:

    The recent emergence of affordable, high-quality augmented-, mixed-, and virtual-reality (AR, MR, VR) technologies presents an opportunity to dramatically change the way users consume and interact with information. It has been shown that these immersive systems can be leveraged to enhance comprehension and accelerate decision-making in situations where data can be linked to spatial information, such as maps or terrain models. Furthermore, when immersive technologies are networked together, they allow for decentralized collaboration and provide perspective-taking not possible with traditional displays. However, enabling this shared space requires novel techniques in intelligent information management and data exchange. In this experiment, we explored a framework for leveraging distributed AI/ML processing to enable clusters of low-power, limited-functionality devices to deliver complex capabilities in aggregate to users distributed across the country collaborating simultaneously in a shared virtual environment. We deployed a motion-detecting camera whose detection events sent information, using a distributed request/reply worker framework, to a remotely located YOLO image classification cluster. This work demonstrates the capability for various IoT and IoBT systems to invoke functionality without a priori knowledge of the specific endpoint that will execute it, instead submitting a request based on a desired capability concept (e.g., image classification) and requiring only: 1) knowledge of the broker location, 2) a valid public/private key pair to authenticate with the broker, and 3) the capability concept UUID and knowledge of the request/reply formats used by that concept.

    Speaker Info:

    Mark Dennison

    Research Psychologist

    U.S. Army DEVCOM Army Research Laboratory

    Mark Dennison is a research psychologist with DEVCOM U.S. Army Research Laboratory in the Computational and Information Sciences Directorate, Battlefield Information Systems Branch. He leads a team of government researchers and contractors focused on enabling cross-reality technologies to enhance lethality across domains through information management across echelons. Dr. Dennison earned his bachelor’s, master’s, and Ph.D. degrees from the University of California, Irvine, all in psychology with a specialization in cognitive neuroscience. He is stationed at ARL-West in Playa Vista, CA.

  • Introduction to Neural Networks for Deep Learning with Tensorflow

    Abstract:

    This mini-tutorial session discusses the practical application of neural networks from a lay person's perspective and will walk through a hands-on case study in which we build, train, and analyze a few neural network models using TensorFlow. The course will review the basics of neural networks and touch on more complex neural network architecture variants for deep learning applications. Deep learning techniques are becoming more prevalent throughout the development of autonomous and AI-enabled systems, and this session will provide students with the foundational intuition needed to understand these systems.
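
    As a flavor of what the hands-on portion involves, a small fully connected network in TensorFlow/Keras can be built and trained in a few lines; the sketch below uses synthetic data and is not necessarily the tutorial's own case study.

    import numpy as np
    import tensorflow as tf

    # Synthetic data: 10 features, binary target
    X = np.random.rand(1000, 10).astype("float32")
    y = (X.sum(axis=1) > 5).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)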

    Speaker Info:

    Roshan Patel

    Data Scientist

    US Army CCDC Armaments Center

    Mr. Roshan Patel is a systems engineer and data scientist working at CCDC Armaments Center. His role focuses on systems engineering infrastructure, statistical modeling, and the analysis of weapon systems. He holds a Master’s in Computer Science from Rutgers University, where he specialized in operating systems programming and machine learning. At Rutgers, Mr. Patel was a part-time lecturer for systems programming and data science seminars. Mr. Patel is the current AI lead for the Systems Engineering Directorate at CCDC Armaments Center.

  • Introduction to Qualitative Methods - Part 1

    Abstract:

    Qualitative data, captured through freeform comment boxes, interviews, focus groups, and activity observation, is heavily employed in test and evaluation (T&E). The qualitative research approach can offer many benefits, but knowledge of how to implement methods, collect data, and analyze data according to rigorous qualitative research standards is not broadly understood within the T&E community. This tutorial offers insight into the foundational concepts of method and practice that embody defensible approaches to qualitative research. We discuss where qualitative data comes from, how it can be captured, what kind of value it offers, and how to capitalize on that value through methods and best practices.

    Speaker Info:

    Kristina Carter

    Research Staff Member

    Institute for Defense Analyses

    Dr. Kristina Carter is a Research Staff Member at the Institute for Defense Analyses in the Operational Evaluation Division where she supports the Director, Operational Test and Evaluation (DOT&E) in the use of statistics and behavioral science in test and evaluation. She joined IDA full time in 2019 and her work focuses on the measurement and evaluation of human-system interaction. Her areas of expertise include design of experiments, statistical analysis, and psychometrics. She has a Ph.D. in Cognitive Psychology from Ohio University, where she specialized in quantitative approaches to judgment and decision making.

  • Introduction to Qualitative Methods - Part 2

    Speaker Info:

    Daniel Hellman

    Research Staff Member

    Institute for Defense Analyses

    Dr. Daniel Hellmann is a Research Staff Member in the Operational Evaluation Division at the Institute for Defense Analyses. He is also a prior-service U.S. Marine with multiple combat tours. Currently, Dr. Hellmann specializes in mixed methods research on topics related to distributed cognition, institutions and organizations, and Computer Supported Cooperative Work (CSCW).

  • Introduction to Qualitative Methods - Part 3

    Speaker Info:

    Emily Fedele

    Research Staff Member

    Institute for Defense Analyses

    Emily Fedele is a Research Staff Member at the Institute for Defense Analyses in the Science and Technology Division. She joined IDA in 2018 and her work focuses on conducting and evaluating behavioral science research on a variety of defense related topics. She has expertise in research design, experimental methods, and statistical analysis.

  • Introduction to Structural Equation Modeling: Implications for Human-System Interactions

    Abstract:

    Structural Equation Modeling (SEM) is an analytical framework that offers unique opportunities for investigating human-system interactions. SEM is used heavily in the social and behavioral sciences, where emphasis is placed on (1) explanation rather than prediction, and (2) measuring variables that are not observed directly (e.g., perceived performance, satisfaction, quality, trust, etcetera). The framework facilitates modeling of survey data through confirmatory factor analysis and latent (i.e., unobserved) variable regression models. We provide a general introduction to SEM by describing what it is, the unique features it offers to analysts and researchers, and how it is easily implemented in JMP Pro 16.0. Attendees will learn how to perform path analysis and confirmatory factor analysis, assess model fit, compare alternative models, and interpret results provided in SEM. The presentation relies on a real-data example everyone can relate to. Finally, we shed light on a few published studies that have used SEM to unveil insights on human performance factors and the mechanisms by which performance is affected. The key goal of this presentation is to provide general exposure to a modeling tool that is likely new to most in the fields of defense and aerospace.
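
    In the standard notation the tutorial builds on, the measurement model (the confirmatory factor analysis piece) and the structural model (the latent-variable regression piece) can be written as follows; this is shown only for orientation and is not taken from the presentation itself:

    $$\mathbf{y} = \boldsymbol{\Lambda}_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}, \qquad \mathbf{x} = \boldsymbol{\Lambda}_x \boldsymbol{\xi} + \boldsymbol{\delta}, \qquad \boldsymbol{\eta} = \mathbf{B}\boldsymbol{\eta} + \boldsymbol{\Gamma}\boldsymbol{\xi} + \boldsymbol{\zeta}$$

    Here y and x are observed indicators (e.g., survey items), eta and xi are latent outcome and predictor variables, the Lambda matrices contain factor loadings, and epsilon, delta, and zeta are residual terms.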

    Speaker Info:

    Laura Castro-Schilo

    Sr. Research Statistician Developer

    SAS Institute

    Laura Castro-Schilo works on structural equations models in JMP. She is interested in multivariate analysis and its application to different kinds of data; continuous, discrete, ordinal, nominal and even text. Previously, she was Assistant Professor at the L. L. Thurstone Psychometric Laboratory at the University of North Carolina at Chapel Hill. Dr. Castro-Schilo obtained her PhD in quantitative psychology from the University of California, Davis.

  • Machine Learning Reveals that the Russian IRA’s Twitter Topic Patterns Evolved over Time

    Abstract:

    Introduction: Information Operations (IO) are a key component of our adversaries’ strategy to undermine U.S. military power without escalating to more traditional (and more easily identifiable) military strikes. Social media activity is one method of IO. In 2017 and 2018, Twitter suspended thousands of accounts likely belonging to the Kremlin-backed Internet Research Agency (IRA). Clemson University archived a large subset of these tweets (2.9M tweets posted by over 2800 IRA accounts), tagged each tweet with metadata (date, time, language, supposed geographical region, number of followers, etc.), and published this dataset on the polling aggregation website FiveThirtyEight.
    Methods: Machine Learning researchers at the Institute for Defense Analyses (IDA) downloaded Clemson’s dataset from FiveThirtyEight and analyzed both the content of the IRA tweets and their accompanying metadata. Using unsupervised learning techniques (Latent Dirichlet Allocation), IDA researchers mapped out how the patterns in the IRA’s tweet topics evolved over time.
    Results: Results showed that the IRA started tweeting in/before February 2012, but ramped up significantly in May/June 2015. Most tweets were in English, and most likely targeted the U.S. The IRA created new accounts after the first Twitter suspension in November 2017, with each new account quickly establishing an audience. Between at least January 2015 and October 2017, the IRA’s English tweet topics evolved over time, becoming tighter, more specific, more negative, and more polarizing, with the final pattern emerging in late 2015.
    Discussion: The United States government must expect that our adversaries' social media activity will continue to evolve over time. Efficient processing pipelines are needed for semi-automated analyses of time-evolving social media activity.
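
    A minimal sketch of the temporal piece (not IDA's actual pipeline; the topic weights and dates are hypothetical): once LDA has produced per-tweet topic proportions, topic prevalence over time reduces to a resampled average.

    import numpy as np
    import pandas as pd

    dates = pd.to_datetime(["2015-01-03", "2015-01-20", "2015-02-05", "2015-02-17"])
    weights = np.array([[0.7, 0.2, 0.1],    # hypothetical document-topic proportions
                        [0.6, 0.3, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.1, 0.8, 0.1]])

    monthly = (pd.DataFrame(weights, index=dates, columns=["topic_0", "topic_1", "topic_2"])
                 .resample("M").mean())     # average topic prevalence by month
    print(monthly)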

    Speaker Info:

    Emily Parrish

    Research Associate

    Institute for Defense Analyses

    Emily Parrish is a Research Associate in the Science and Technology Division at the Institute for Defense Analyses (IDA). Her research at IDA ranges from model verification and validation and countermine system developmental testing to using natural language processing (NLP) to interpret collections of data of interest to DoD sponsors. Emily graduated from the College of William and Mary in 2015 with a B.S. in chemistry and is currently earning her M.S. in data analytics at The George Washington University, with a focus on machine learning and NLP.

  • Metrics for Assessing Underwater Demonstrations for Detection and Classification of UXO

    Abstract:

    Receiver Operating Characteristic curves (ROC curves) are often used to assess the performance of detection and classification systems. ROC curves can have unexpected subtleties that make them difficult to interpret. For example, the Strategic Environmental Research and Development Program and the Environmental Security Technology Certification Program (SERDP/ESTCP) is sponsoring the development of novel systems for the detection and classification of Unexploded Ordnance (UXO) in underwater environments. SERDP is also sponsoring underwater testbeds to demonstrate the performance of these novel systems. The Institute for Defense Analyses (IDA) is currently designing and implementing the scoring process for these underwater demonstrations that addresses the subtleties of ROC curve interpretation. This presentation will provide an overview of the main considerations for ROC curve parameter selection when scoring underwater demonstrations for UXO detection and classification.
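
    For orientation, a ROC curve is traced by sweeping a detection threshold over scores and comparing against ground truth; a minimal sketch with hypothetical detector scores:

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    truth  = np.array([1, 1, 0, 1, 0, 0, 1, 0])               # 1 = UXO present
    scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1])

    fpr, tpr, thresholds = roc_curve(truth, scores)            # one operating point per threshold
    print("AUC:", auc(fpr, tpr))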

    Speaker Info:

    Jacob Bartel

    Research Associate

    Institute for Defense Analyses

    Jacob Bartel is a Research Associate at the Institute for Defense Analyses (IDA). His research focuses on computational modeling and verification and validation (V&V), primarily in the field of nuclear engineering. Recently, he has worked with SERDP/ESTCP to develop and implement scoring processes for testing underwater UXO detection and classification systems. Prior to joining IDA, his graduate research focused on the development of novel algorithms to model fuel burnup in nuclear reactors. Jacob earned his master’s degree in Nuclear Engineering and his bachelor’s degree in Physics from Virginia Tech.

  • Modeling and Simulation in Support of the Decision Analysis Process

    Abstract:

    Informed enterprise and program decision making is central to the purpose statement of DoD’s Digital Engineering Strategy. Decision analysis serves as a key mechanism to link the voice of the sponsor/end user with the voices of the engineer and the budgetary analyst, enabling a closed-loop requirements-writing approach informed by rigorous assessments of a broad range of system-level alternatives across a thorough set of stakeholder value criteria, including life-cycle cost, schedule, performance, and long-term viability. The decision analytics framework employed by the U.S. Army’s Combat Capabilities Development Command (CCDC) Armaments Center (AC) is underpinned by a state-of-the-art modeling and simulation framework called PRISM (Performance Related and Integrated Suite of Models), developed at CCDC-AC. PRISM was designed to allow performance estimates of a weapon system to evolve as more information and higher-fidelity representations of those systems become available. PRISM feeds the most up-to-date performance estimates into the decision analysis framework so that decision makers have the best information available when making complex strategic decisions. This briefing will unpack PRISM and highlight the model design elements that make it the foundation of CCDC-AC’s weapon system architecture and design decision-making process.

    Speaker Info:

    Michael Greco

    Computer Scientist

    U.S. Army CCDC Armaments Center

    Mr. Greco is the lead architect and developer of the Performance Related and Integrated Suite of Models (PRISM) modeling and simulation framework. The PRISM tool has been used to support the analysis of many projects locally for CCDC-AC and externally for the larger Army analytical community. Mr. Greco has over 10 years of experience working in force effectiveness analysis with the use of operational and system performance simulations.

  • Multi-Agent Adaptive Coordinated Autonomy in Contested Battlefields

    Abstract:

    Autonomous multi-robot systems have the potential to augment the future force with enhanced capability while reducing the risk to human personnel in multi-domain operations (MDO). Mobile robots can constitute nodes in a heterogeneous Internet of Battlefield Things (IoBT); they can offer additional capability in the form of mobility to effectively make observations useful for planning and executing military operations against adversaries. In this talk, I will present the result of a series of field experiments where robots are tasked to perform military-relevant missions in realistic environments, in addition to describing the integration of mobile robot assets in the Multi-Purpose Sensing Array Distributed Proving Ground (MSA-DPG) for the purpose of augmenting IoBT systems.

    Speaker Info:

    John Rogers

    Senior Research Scientist

    U.S. Army DEVCOM Army Research Laboratory

    John Rogers is a research scientist specializing in autonomous mobile robotics at the Army Research Laboratory’s Intelligent Robotics Branch of the Computational and Information Sciences Directorate (CISD). John’s research has focused on autonomy for multi-robot teams as well as distributed multi-robot state estimation. John is currently leading the Tactical Behaviors group in the AI for mobility and maneuver essential research program, which is focused on developing deep learning and game-theoretic maneuver for ground robots against adversaries. Prior to this, John led the multi-robot mapping research on the Autonomy Research Pilot Initiative (ARPI) on “autonomous collective defeat of hard and deeply buried targets” in collaboration with his colleagues at the Air Force Research Laboratory. John has also partnered with DCIST, MAST, and Robotics CTA partners to extend funded research programs with in-house collaborative projects.

    John completed his Ph.D. degree at the Georgia Institute of Technology in 2012 with his advisor, Prof. Henrik Christensen from the Robotics and Intelligent Machines center. While at Georgia Tech, John participated in a variety of sponsored research projects, including the Micro Autonomous Systems and Technology (MAST) project from the Army Research Laboratory and a counter-terrorism “red team” project for the Naval Research Laboratory, in addition to his thesis research on semantic mapping and reasoning, culminating in his thesis “Life-long mapping of objects and places for domestic robots”. Prior to attending Georgia Tech, John completed an M.S. degree in Computer Science at Stanford (2006) while working with Prof. Sebastian Thrun and Prof. Andrew Ng on the DARPA Learning Applied to Ground Robots (LAGR) project. John also holds M.S. and B.S. degrees in Electrical and Computer Engineering from Carnegie Mellon University (2002). John has authored or co-authored over 50 scientific publications in Robotics and Computer Vision journals and conferences. John’s current research interests are automatic exploration and mapping of large-scale indoor, outdoor, and subterranean environments, place recognition in austere locations, semantic scene understanding, and probabilistic reasoning for autonomous mobile robots.

    Google Scholar: https://scholar.google.com/citations?user=uH_LDocAAAAJ&hl=en

  • Opening Remarks

    Abstract:

    Norton A. Schwartz serves as President of the Institute for Defense Analyses (IDA), a nonprofit corporation operating in the public interest. IDA manages three Federally Funded Research and Development Centers that answer the most challenging U.S. security and science policy questions with objective analysis leveraging extraordinary scientific, technical, and analytic expertise. At IDA, General Schwartz (U.S. Air Force, retired) directs the activities of more than 1,000 scientists and technologists employed by IDA.
    General Schwartz has a long and prestigious career of service and leadership that spans over 5 decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his 6-year tenure at BENS, he was also a member of IDA’s Board of Trustees.

    Prior to retiring from the U.S. Air Force, General Schwartz served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975.
    General Schwartz is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College.
    He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. General Schwartz has been married to Suzie since 1981.

    Speaker Info:

    Norton Schwartz

    President

    Institute for Defense Analyses

  • Opening Remarks

    Abstract:

    Dr. O'Toole is the Acting Director, Operational Test and Evaluation as of January 20, 2021. Dr. O'Toole was appointed as the Principal Deputy Director, Operational Test and Evaluation in February 2020. In this capacity he is the principal staff assistant for all functional areas assigned to the office. He participates in the formulation, development, advocacy, and oversight of policies of the Secretary of Defense and in the development and implementation of test and test resource programs. He supports the Director in the planning, conduct, evaluation, and reporting of operational and live fire testing. He serves as the Appropriation Director and Comptroller for the Operational Test and Evaluation, Defense Appropriation and as the principal advisor to the Director on all Planning, Programming, and Budgeting System matters.
    Dr. O'Toole is the former Deputy Director for Naval Warfare within DOT&E. He oversaw the operational and live-fire testing of ships and submarines and their associated sensors, combat and communications systems, and weapons. He was also responsible for overseeing the adequacy of the test infrastructure and resources to support operational and live-fire testing for all acquisition programs across the Defense Department.
    Dr. O'Toole was previously an employee of the Naval Sea Systems Command as the Deputy Group Director of Aircraft Carrier Design and Systems Engineering. Prior to that, he was the Director of the Systems Engineering Division (Submarines and Undersea Systems), where he led a diverse team of engineers who supported all Submarine Program Managers. His other assignments include serving as a Ship Design Manager and the Navy's Technical Authority for the USS VIRGINIA Class submarines during design and new construction and for Amphibious Ships, Auxiliary Ships, and Command & Control Ships during in-service operations.
    Dr. O'Toole has also held other positions within the Department of Defense, such as Deputy Program Executive Officer (Maritime and Rotary Wing) at the United States Special Operations Acquisition Command, Staff to the Deputy Assistant Secretary of the Navy for Research, Development & Acquisition (Ship Programs), and Deputy Director of Regional Maintenance for COMPACFLT (N43).
    In addition, Dr. O'Toole has over 30 years of experience as a Naval Officer (Active and Reserve), retiring at the rank of CAPTAIN. His significant tours include 5 Commanding Officer tours.
    Dr. Raymond D. O'Toole, Jr. is a native of Long Island, NY, and a graduate of the State University of New York - Maritime College, earning a Bachelor of Engineering in Marine Engineering. He also holds a Master of Engineering degree in Systems Engineering from Virginia Polytechnic Institute and State University, a Master of Science degree in National Resource Strategy from the Industrial College of the Armed Forces, and a Doctorate in Engineering in the field of Engineering Management from the George Washington University, where he is now a Professional Lecturer of Engineering Management and Systems Engineering. He has received the SECDEF Meritorious Civilian Service Award and the USN Meritorious and Superior Civilian Service Awards.

    Speaker Info:

    Raymond O'Toole

    Acting Director, Operational Test and Evaluation

    DOT&E

  • Operational Cybersecurity Test and Evaluation of Non-IP and Wireless Networks

    Abstract:

    Nearly all land, air, and sea maneuver systems (e.g., vehicles, ships, aircraft, and missiles) are becoming more software-reliant and blending internal communication across both Internet Protocol (IP) and non-IP buses. IP communication is widely understood among the cybersecurity community, whereas expertise and available test tools for non-IP protocols such as Controller Area Network (CAN) and MIL-STD-1553 are not as commonplace. However, a core tenet of operational cybersecurity testing is to assess all potential pathways of information exchange present on the system, to include IP and non-IP. In this presentation, we will introduce a few non-IP protocols (e.g., CAN, MIL-STD-1553) and provide a live demonstration of how to attack a CAN network using malicious message injection. We will also discuss how potential cyber effects on non-IP buses can lead to catastrophic mission effects on the target system.

    Speaker Info:

    Peter Mancini

    Research Staff Member

    Institute for Defense Analyses

    Peter Mancini works at the Institute for Defense Analyses, supporting the Director, Operational Test and Evaluation (DOT&E) as a Cybersecurity OT&E analyst.

  • Opportunities and Challenges for Openly Publishing Statistics Research for National Defense

    Abstract:

    Openly publishing on statistics for defense and national security poses certain challenges and is often not straightforward, but research in this area is important to share with the open community to advance the field. Since statistical research for national defense applications is rather niche, target journals and audiences are challenging to identify. Adding an additional hurdle, much of the data for practical implementation is sensitive and surrogate datasets must be relied upon for publication. Lastly, many statisticians in these areas do not face the same expectations to openly publish as their colleagues. This roundtable is an opportunity for statisticians in the defense and national security community to come together and discuss the importance and challenges for publishing within this space. Participants will be asked to share challenges and successes related to publishing research for national defense applications. A handout summarizing common challenges and tips for succeeding will be provided. Specific topics for discussion will include: What expectations exist for statisticians to publish in this community? Are these expectations reasonable? Are you encouraged and supported by your funding institution to openly publish? Are there opportunities for collaborative work across institutions that might further encourage publications?

    Speaker Info:

    Lyndsay Shand

    R&D Statistician

    Sandia National Laboratories

    Lyndsay Shand received her PhD in Statistics from the University of Illinois Urbana-Champaign in 2017 with a focus on spatio-temporal analysis. At Sandia, Lyndsay continues to conduct, lead, and publish research centered on space-time statistics, with applications to materials science, climate science, and disease modeling, to name a few. Specific research topics have included accounting for missing data in space-time point processes, Bayesian functional data registration, hierarchical disease modeling, spatial extremes, and design of experiments for computer models.

    Lyndsay currently leads efforts to quantify the effects of ship-emissions on clouds to reduce uncertainty of aerosol processes in climate models and serves as programmatic lead and statistical subject matter expert of a team developing experimental design and surrogate modeling approaches for material modeling efforts at Sandia. Additionally, Lyndsay serves as a statistical consultant to many projects across the lab and was nominated for an employee recognition award for her leadership in 2019.

    Beyond her project work, Lyndsay is actively engaged in joint research with university partners including the University of Illinois, Brigham Young University, and the University of Washington. She is also a member of the International Society for Bayesian Analysis (ISBA), the American Statistical Association, and the American Geophysical Union, serving as co-web editor for ISBA and treasurer for the environmental section of ISBA.

  • Organizing and Sharing Data within the T&E Community

    Abstract:

    Effective data sharing requires alignment of personnel, systems, and policies. Data are costly and precious, and to get the most value out of the data we collect, it is important that we share and reuse it whenever possible and appropriate. Data are typically collected and organized with a single specific use or goal in mind, and after that goal has been achieved (e.g., the report is published), the data are no longer viewed as important or useful. This process is self-fulfilling, as future analysts who might want to use these data will not be able to find them or will be unable to understand the data sufficiently due to lack of documentation and metadata. Implementing data standards and facilitating sharing are challenging in the national security environment. There are many data repositories within the DoD, but most of them are specific to certain organizations and are accessible by a limited number of people. Valid concerns about security make the process of sharing particular data sets challenging, and the opacity of data ownership often complicates the issue. This roundtable will facilitate discussion of these issues. Participants will have opportunities to share their experiences trying to share data and make use of data from previous testing. We hope to identify useful lessons learned and find ways to encourage data sharing within the community.

    Speaker Info:

    Matthew Avery

    Research Staff Member

    Institute for Defense Analyses

    Dr. Matthew Avery is a Research Staff Member at the Institute for Defense Analyses in the Operational Evaluation Division. As the Tactical Aerial Reconnaissance Systems Project Leader, he provides input to the DoD on test plans and reports for Army, Marine Corps, and Navy tactical unmanned aircraft systems. Dr. Avery co-chairs IDA’s Data Governance Committee and leads the Data Management Group within IDA’s Test Science team, where he helps develop internal policies and trainings on data usage, access and storage. From 2017 to 2018, Dr. Avery worked as an embedded analyst at the Department of Defense Office of Cost Analysis and Program Evaluation (CAPE), focusing on Space Control and Mobilization. Dr. Avery received his PhD in Statistics from North Carolina State University in 2012.

  • Overcoming Challenges and Applying Sequential Procedures to T&E

    Abstract:

    The majority of statistical analyses involve observing a fixed set of data and analyzing those data after the final observation has been collected to draw some inference about the population from which they came. Unlike these traditional methods, sequential analysis is concerned with situations for which the number, pattern, or composition of the data is not determined at the start of the investigation but instead depends upon the information acquired throughout the course of the investigation. Expanding the use of sequential analysis in DoD testing has the potential to save substantial test dollars and decrease test time. However, switching from traditional to sequential planning will likely induce unique challenges. The goal of this roundtable is to provide an open forum for topics related to sequential analyses. We aim to discuss potential challenges, identify potential ways to overcome them, and talk about success stories of sequential analysis implementation and lessons learned. Specific questions for discussion will be provided to participants prior to the event.
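
    As one concrete example of the sequential machinery under discussion, Wald's sequential probability ratio test for a success probability stops as soon as the accumulated log-likelihood ratio crosses either decision boundary. The hypotheses and error rates in the sketch below are hypothetical.

    import numpy as np

    p0, p1 = 0.80, 0.90                 # H0 and H1 success probabilities
    alpha, beta = 0.05, 0.10            # type I and type II error targets
    A, B = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))

    def sprt(outcomes):
        llr = 0.0
        for n, y in enumerate(outcomes, start=1):
            llr += y * np.log(p1 / p0) + (1 - y) * np.log((1 - p1) / (1 - p0))
            if llr >= A:
                return n, "reject H0"   # evidence favors p1
            if llr <= B:
                return n, "accept H0"   # evidence favors p0
        return len(outcomes), "continue testing"

    print(sprt(np.random.default_rng(1).binomial(1, 0.9, size=100)))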

    Speaker Info:

    Rebecca Medlin

    Research Staff Member

    Institute for Defense Analyses

    Dr. Rebecca Medlin is a Research Staff Member at the Institute for Defense Analyses.  She supports the Director, Operational Test and Evaluation (DOT&E) on the use of statistics in test & evaluation and has designed tests and conducted statistical analyses for several major defense programs including tactical vehicles, mobility aircraft, radars, and electronic warfare systems.  Her areas of expertise include design of experiments, statistical modeling, and reliability.  She has a Ph.D. in Statistics from Virginia Tech.

  • Physics-Informed Deep Learning for Modeling and Simulation under Uncertainty

    Abstract:

    Certification by analysis (CBA) involves the supplementation of expensive physical testing with modeling and simulation. In high-risk fields such as defense and aerospace, it is critical that these models accurately represent the real world, and thus they must be verified and validated and must provide measures of uncertainty. While machine learning (ML) algorithms such as deep neural networks have seen significant success in low-risk sectors, they are typically opaque, difficult to interpret, and often fail to meet these stringent requirements. Recently, a Department of Energy (DOE) report was released on the concept of scientific machine learning (SML) [1] with the aim of generally improving confidence in ML and enabling broader use in the scientific and engineering communities. The report identified three critical attributes that ML algorithms should possess: domain-awareness, interpretability, and robustness.

    Recent advances in physics-informed neural networks (PINNs) are promising in that they can provide both domain awareness and a degree of interpretability [2, 3, 4] by using governing partial differential equations as constraints during training. In this way, PINNs output physically admissible, albeit deterministic, solutions. Another noteworthy deep learning algorithm is the generative adversarial network (GAN), which can learn probability distributions [5] and provide robustness through uncertainty quantification. A limited number of works have recently demonstrated success by combining these two methods into what is referred to as a physics-informed GAN, or PIGAN [6, 7]. The PIGAN has the ability to produce physically admissible, non-deterministic predictions as well as solve non-deterministic inverse problems, potentially meeting the goals of domain awareness, interpretability, and robustness. This talk will present an introduction to PIGANs as well as an example of current NASA research implementing these networks.
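
    A minimal sketch of the PINN idea (illustrative only, not the NASA implementation): a small TensorFlow network fit to the toy equation u' = -u with u(0) = 1, where the equation residual at collocation points enters the training loss as a soft constraint.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="tanh", input_shape=(1,)),
        tf.keras.layers.Dense(32, activation="tanh"),
        tf.keras.layers.Dense(1),
    ])
    opt = tf.keras.optimizers.Adam(1e-3)
    x_col = tf.random.uniform((128, 1), 0.0, 2.0)   # collocation points in the domain
    x0 = tf.zeros((1, 1))                           # location of the initial condition

    for step in range(2000):
        with tf.GradientTape() as outer:
            with tf.GradientTape() as inner:
                inner.watch(x_col)
                u = model(x_col)
            du_dx = inner.gradient(u, x_col)
            residual = du_dx + u                                    # governing-equation residual: u' + u = 0
            loss = (tf.reduce_mean(tf.square(residual))             # physics term
                    + tf.reduce_mean(tf.square(model(x0) - 1.0)))   # initial condition u(0) = 1
        grads = outer.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))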

    REFERENCES
    [1] Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm, Manish Parashar, Abani Patra, James Sethian, Stefan Wild, Karen Willcox, and Steven Lee. Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence. Technical report, USDOE Office of Science (SC) Washington, DC (United States), 2019.
    [2] Maziar Raissi, Paris Perdikaris, and George Karniadakis. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686-707, 2019.
    [3] Alexandre Tartakovsky, Carlos Ortiz Marrero, Paris Perdikaris, Guzel Tartakovsky, and David Barajas-Solano. Learning parameters and constitutive relationships with physics informed deep neural networks. arXiv preprint arXiv:1808.03398, 2018.
    [4] Julia Ling, Andrew Kurzawski, and Jeremy Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155-166, 2016.
    [5] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.
    [6] Liu Yang, Dongkun Zhang, and George Karniadakis. Physics-informed generative adversarial networks for stochastic differential equations. arXiv preprint arXiv:1811.02033, 2018.
    [7] Yibo Yang and Paris Perdikaris. Adversarial uncertainty quantification in physics-informed neural networks. Journal of Computational Physics, 394:136-152, 2019.

    Speaker Info:

    Patrick Leser

    Aerospace Technologist

    NASA Langley Research Center

    Dr. Patrick Leser is a researcher in the Durability, Damage Tolerance, and Reliability Branch (DDTRB) at NASA Langley Research Center (LaRC) in Hampton, VA. After receiving a B.S. in Aerospace Engineering from North Carolina State University (NCSU), he became a NASA civil servant under the Pathways Intern Employment Program in 2014. In 2017, he received his PhD in Aerospace Engineering, also from NCSU. Dr. Leser’s research focuses primarily on uncertainty quantification (UQ), model calibration, and fatigue crack growth in metallic materials. Dr. Leser has applied these interests to various topics including structural health management, digital twin, additive manufacturing, battery health management, and various NASA Engineering and Safety Center (NESC) assessments, primarily focusing on the fatigue life of composite overwrapped pressure vessels (COPVs). A primary focus of his work has been the development of computationally efficient UQ methods that utilize fields such as high performance computing and machine learning.

  • Prior Formulation in a Bayesian Analysis of Biomechanical Data

    Abstract:

    Biomechanical experiments investigating the failure modes of biological tissue require a significant investment of time and money due to the complexity of procuring, preparing, and testing tissue. Furthermore, the potentially destructive nature of these tests makes repeated testing infeasible. This leads to experiments with notably small sample sizes in light of the high variance common to biological material. When the goal is to estimate parameters for an analytic artifact such as an injury risk curve (IRC), which relates an input quantity to a probability of injury, small sample sizes result in undesirable uncertainty. One way to ameliorate this effect is through a Bayesian approach, incorporating expert opinion and previous experimental data into a prior distribution. This has the advantage of leveraging the information contained in expert opinion and related experimental data to obtain faster convergence to an appropriate parameter estimation with a desired certainty threshold.

    We explore several ways of implementing Bayesian methods in a biomechanical setting, including permutations on the use of expert knowledge and prior experimental data. Specifically, we begin with a set of experimental data from which we generate a reference IRC. We then elicit expert predictions of the 10th and 90th quantiles of injury, and use them to formulate both uniform and normal prior distributions. We also generate priors from qualitatively similar experimental data, both directly on the IRC parameters and on the injury quantiles, and explore the use of weighting schemes to assign more influence to better datasets. By adjusting the standard deviation and shifting the mean, we can create priors of variable quality. Using a subset of the experimental data in conjunction with our derived priors, we then re-fit the IRC and compare it to the reference curve. For all methods we will measure the certainty, speed of convergence, and accuracy relative to the reference IRC, with the aim of recommending a best practices approach for the application of Bayesian methods in this setting. Ultimately an optimized approach for handling small samples sizes with Bayesian methods has the potential to increase the information content of individual biomechanical experiments by integrating them into the context of expert knowledge and prior experimentation.
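
    The simplest of the prior formulations described, turning an expert's 10th and 90th percentile judgments into a normal prior, reduces to solving two linear equations for the mean and standard deviation; the numbers below are hypothetical.

    from scipy.stats import norm

    q10, q90 = 2.1, 5.3                      # expert's 10th/90th percentile judgments (e.g., kN)
    z10, z90 = norm.ppf(0.10), norm.ppf(0.90)

    # Solve mu + z * sigma = quantile for both quantiles
    sigma = (q90 - q10) / (z90 - z10)
    mu = q10 - z10 * sigma

    print(f"Normal prior: mean={mu:.2f}, sd={sigma:.2f}")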

    Speaker Info:

    Amanda French

    Data Scientist

    Johns Hopkins University Applied Physics Laboratory

    Amanda French is a data scientist at Johns Hopkins University Applied Physics Laboratory. She obtained her PhD in mathematics from UNC Chapel Hill and went on to perform data science for a variety of government agencies, including the Department of State, Military Health System, and Department of Defense. Her expertise includes statistics, machine learning, and experimental design.

  • Pseudo-Exhaustive Testing - Part 1

    Abstract:

    Exhaustive testing is infeasible when testing complex engineered systems. Fortunately, a combinatorial testing approach can be almost as effective as exhaustive testing but at dramatically lower cost. The effectiveness of this approach is due to the underlying construct on which it is based, that is, a mathematical construct known as a covering array. This tutorial is divided into two sections. Section 1 introduces covering arrays and a few covering array metrics, and then shows how covering arrays are used in combinatorial testing methodologies. Section 2 focuses on practical applications of combinatorial testing, including a commercial aviation example, an example that focuses on a widely used machine learning library, plus other examples that illustrate how common testing challenges can be addressed. In the process of working through these examples, an easy-to-use tool for generating covering arrays will be demonstrated.
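
    As a small illustration of the underlying construct, the base-R sketch below measures how much 2-way (pairwise) coverage a candidate test suite achieves; a strength-2 covering array is exactly a suite whose score is 1. The factors, levels, and chosen rows are hypothetical.

    ```r
    # Fraction of all 2-way level combinations exercised by a candidate test suite (base R).
    levels_list <- list(os = c("win", "linux"), db = c("pg", "mysql"), net = c("lan", "wan"))
    full_suite  <- expand.grid(levels_list)     # exhaustive testing: 8 runs
    candidate   <- full_suite[c(1, 4, 6, 7), ]  # a 4-run candidate suite (hypothetical choice)

    pairwise_coverage <- function(design, levels_list) {
      pairs <- combn(length(levels_list), 2)
      covered <- total <- 0
      for (j in seq_len(ncol(pairs))) {
        a <- pairs[1, j]; b <- pairs[2, j]
        total   <- total + length(levels_list[[a]]) * length(levels_list[[b]])
        covered <- covered + length(unique(paste(design[[a]], design[[b]])))
      }
      covered / total    # equals 1 for a covering array of strength 2
    }

    pairwise_coverage(candidate, levels_list)
    pairwise_coverage(full_suite, levels_list)
    ```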

    Speaker Info:

    Ryan Lekivetz

    Research Statistician Developer

    SAS Institute

    Ryan Lekivetz is a Principal Research Statistician Developer for the JMP Division of SAS where he implements features for the Design of Experiments platforms in JMP software.

  • Pseudo-Exhaustive Testing - Part 2

    Speaker Info:

    Joseph Morgan

    Principal Research Statistician

    SAS Institute

    Joseph Morgan is a Principal Research Statistician/Developer in the JMP Division of SAS Institute Inc. where he implements features for the Design of Experiments platforms in JMP software. His research interests include combinatorial testing, empirical software engineering and algebraic design theory.

  • Spatio-Temporal Modeling of Pandemics

    Abstract:

    The spread of COVID-19 across the United States provides an interesting case study in the modeling of spatio-temporal data. In this breakout session we will provide an overview of commonly used spatio-temporal models and demonstrate how Bayesian Inference can be performed using both exact and approximate inferential techniques. Using COVID data, we will demonstrate visualization techniques in R and introduce "off-the-shelf" spatio-temporal models. We will introduce participants to the Integrated Nested Laplace Approximation (INLA) methodology and show how results from this technique compare to using Markov Chain Monte Carlo (MCMC) techniques. Finally, we will demonstrate the shortfalls of using "off-the-shelf" models, show how epidemiologically motivated partial differential equations can be used to generate spatio-temporal models, and discuss inferential issues when we move away from common models.
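
    For orientation, a minimal R-INLA call for an "off-the-shelf" model of the kind discussed might look like the sketch below. The data frame covid_df and its columns are assumptions, and a fully spatial specification would replace the iid county effect with a "besag" or "bym2" term plus an adjacency graph.

    ```r
    # Minimal sketch: Poisson spatio-temporal model with INLA (assumed data layout:
    # one row per county-week with observed 'cases' and expected counts 'expected').
    library(INLA)

    formula <- cases ~ 1 +
      f(week,      model = "rw1") +   # smooth temporal trend (random walk of order 1)
      f(county_id, model = "iid")     # exchangeable county-level heterogeneity

    fit <- inla(formula, family = "poisson", data = covid_df, E = expected)
    summary(fit)
    ```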

    Speaker Info:

    Nicholas Clark

    Assistant Professor

    West Point

    LTC Nicholas Clark is an Academy Professor at the United States Military Academy where he heads the Center for Data Analysis and Statistics. Nick received his PhD in Statistics from Iowa State University and his research interests include spatio-temporal statistics and Bayesian methodology.

  • Statistical Approaches to V&V and Adaptive Sampling in M&S - Part 1

    Abstract:

    Leadership has placed a high premium on analytically defensible results for M&S Verification and Validation. This mini-tutorial will provide a quick overview of relevant standard methods to establish equivalency in mean, variance, and distribution shape such as Two One-Sided Tests (TOST), K-S tests, Fisher’s Exact, and Fisher’s Combined Probability. The focus will be on more advanced methods such as the equality between model parameters in statistical emulators versus live tests (Hotelling T2, loglinear variance), equivalence of output curves (functional data analysis), and bootstrap methods. Additionally, we introduce a new method for near real-time adaptive sampling that places the next set of M&S runs at boundary regions of high gradient in the responses to more efficiently characterize complex surfaces such as those seen in autonomous systems.
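
    As a flavor of the standard methods covered, here is a base-R sketch of the Two One-Sided Tests (TOST) procedure for mean equivalence between simulation output and live data; the equivalence margin delta and the simulated numbers are illustrative assumptions.

    ```r
    # TOST for mean equivalence between M&S output and live-test results (base R).
    tost <- function(sim, live, delta, alpha = 0.05) {
      p_lower <- t.test(sim, live, mu = -delta, alternative = "greater")$p.value
      p_upper <- t.test(sim, live, mu =  delta, alternative = "less")$p.value
      p_tost  <- max(p_lower, p_upper)          # reject both one-sided nulls to claim equivalence
      list(p_tost = p_tost, equivalent = p_tost < alpha)
    }

    set.seed(1)
    sim  <- rnorm(30, mean = 10.1, sd = 1)      # hypothetical M&S runs
    live <- rnorm(12, mean = 10.0, sd = 1)      # hypothetical live-test results
    tost(sim, live, delta = 0.5)
    ks.test(sim, live)                          # companion check on distribution shape
    ```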

    Speaker Info:

    Jim Wisnowski

    Principal Consultant

    Adsurgo LLC

    Jim Wisnowski is Principal Consultant and Co-founder at Adsurgo, LLC. He currently provides applied statistics training and consulting services across numerous industries and government departments with particular emphasis on Design of Experiments and Test & Evaluation. Previously, he was a commander and engineer in the Air Force, statistics professor at the US Air Force Academy, and Joint Staff officer. He received his PhD in Industrial Engineering from Arizona State University.

  • Statistical Approaches to V&V and Adaptive Sampling in M&S - Part 2

    Speaker Info:

    Jim Simpson

    Principal

    JK Analytics

    Jim Simpson is the Principal of JK Analytics where he currently coaches and trains across various industries and organizations. He has blended practical application and industrial statistics leadership with academic experience focused on researching new methods, teaching excellence and the development and delivery of statistics courseware for graduate and professional education. Previously, he led the Air Force’s largest test wing as Chief Operations Analyst.  He has served as full-time faculty at the Air Force Academy and Florida State University, and is now an Adjunct Professor at the Air Force Institute of Technology (AFIT) and the University of Florida. He received his PhD in Industrial Engineering from Arizona State University.

  • Statistical Engineering in Practice

    Speaker Info:

    Peter Parker

    Team Lead for Advanced Measurement Systems

    NASA Langley

    Dr. Peter Parker is Team Lead for Advanced Measurement Systems at the National Aeronautics and Space Administration’s Langley Research Center in Hampton, Virginia. He serves as an Agency-wide statistical expert across all of NASA’s mission directorates of Exploration, Aeronautics, and Science to infuse statistical thinking, engineering, and methods. His expertise is in collaboratively integrating research objectives, measurement sciences, modeling and simulation, and test design to produce actionable knowledge that supports rigorous decision-making for aerospace research and development. After eight years in private industry, Dr. Parker joined Langley Research Center in 1997.

    He holds a B.S. in Mechanical Engineering from Old Dominion University, an M.S. in Applied Physics and Computer Science from Christopher Newport University, and an M.S. and Ph.D. in Statistics from Virginia Tech. He is a licensed Professional Engineer in the Commonwealth of Virginia. Dr. Parker is a senior member of the American Institute of Aeronautics and Astronautics, American Society for Quality, and American Statistical Association. He currently serves as Chair-Elect of the International Statistical Engineering Association. Dr. Parker is the past-Chair of the American Society for Quality’s Publication Management Board and Editor Emeritus of the journal Quality Engineering.

  • Statistical Engineering in Practice

    Speaker Info:

    Alex Varbanov

    Principal Scientist

    Procter and Gamble

    Dr. Alex Varbanov is a Principal Scientist at Procter & Gamble. He was born in Bulgaria. He received his Ph.D. from the School of Statistics, University of Minnesota in 1999. Dr. Varbanov has been with P&G R&D for 21 years and provides statistical support for a variety of company business units (e.g., Fabric & Home Care) and brands (e.g., Tide, Swiffer). He performs experimental design, statistical analysis, and advanced modeling for many different areas including genomics, consumer research, and product claims. He is currently developing scent character end-to-end models for perfume optimization in P&G consumer products.

  • Statistical Engineering in Practice

    Abstract:

    Problems faced in defense and aerospace often go well beyond textbook problems presented in academic settings. Textbook problems are typically very well defined, and can be solved through application of one "correct" tool. Conversely, many defense and aerospace problems are ill-defined, at least initially, and require an overall strategy to attack, one involving multiple tools and often multiple disciplines. Statistical engineering is an approach recently developed for addressing large, complex, unstructured problems, particularly those for which data can be effectively utilized. This session will present a brief overview of statistical engineering, and how it can be applied to engineer solutions to complex problems. Following this introduction, two case studies of statistical engineering will be presented, to illustrate the concepts.

    Speaker Info:

    Roger Hoerl

    Associate Professor of Statistics

    Union College

    Dr. Roger W. Hoerl is the Brate-Peschel Associate Professor of Statistics at Union College in Schenectady, NY. Previously, he led the Applied Statistics Lab at GE Global Research. While at GE, Dr. Hoerl led a team of statisticians, applied mathematicians, and computational financial analysts who worked on some of GE’s most challenging research problems, such as developing personalized medicine protocols, enhancing the reliability of aircraft engines, and management of risk for a half-trillion dollar portfolio.

    Dr. Hoerl has been named a Fellow of the American Statistical Association and the American Society for Quality, and has been elected to the International Statistical Institute and the International Academy for Quality. He has received the Brumbaugh and Hunter Awards, as well as the Shewhart Medal, from the American Society for Quality, and the Founders Award and Deming Lectureship Award from the American Statistical Association. While at GE Global Research, he received the Coolidge Fellowship, honoring one scientist a year from among the four global GE Research and Development sites for lifetime technical achievement. His book with Ron Snee, Statistical Thinking: Improving Business Performance, now in its 3rd edition, was called “the most practical introductory statistics textbook ever published in a business context” by the journal Technometrics.

  • Statistical Engineering in Practice

    Abstract:

    Problems faced in defense and aerospace often go well beyond textbook problems presented in academic settings. Textbook problems are typically very well defined, and can be solved through application of one "correct" tool. Conversely, many defense and aerospace problems are ill-defined, at least initially, and require an overall strategy to attack, one involving multiple tools and often multiple disciplines. Statistical engineering is an approach recently developed for addressing large, complex, unstructured problems, particularly those for which data can be effectively utilized. This session will present a brief overview of statistical engineering, and how it can be applied to engineer solutions to complex problems. Following this introduction, two case studies of statistical engineering will be presented, to illustrate the concepts.

    Speaker Info:

    Angie Patterson

    Chief Consulting Engineer

    GE Aviation

  • Surrogate Models and Sampling Plans for Multi-fidelity Aerodynamic Performance Databases

    Abstract:

    Generating aerodynamic coefficients can be computationally expensive, especially for the viscous CFD solvers in which multiple complex models are iteratively solved. When filling large design spaces, utilizing only a high accuracy viscous CFD solver can be infeasible. We apply state-of-the-art methods for design and analysis of computer experiments to efficiently develop an emulator for high-fidelity simulations. First, we apply a cokriging model to leverage information from fast low-fidelity simulations to improve predictions with more expensive high-fidelity simulations. Combining space-filling designs with a Gaussian process model-based sequential sampling criterion allows us to efficiently generate sample points and limit the number of costly simulations needed to achieve the desired model accuracy. We demonstrate the effectiveness of these methods with an aerodynamic simulation study using a conic shape geometry.

    This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

    Release Number: LLNL-ABS-818163
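
    A rough sketch of the workflow described above, under simplifying assumptions: a one-dimensional input, stand-in functions for the low- and high-fidelity solvers, and the lhs and DiceKriging packages in place of a full cokriging implementation that estimates the scaling between fidelities.

    ```r
    # Two-fidelity emulation in the spirit of cokriging, plus a simple sequential criterion.
    library(lhs); library(DiceKriging)

    f_lo <- function(x) sin(8 * x) + 0.3 * x              # stand-in for the fast, low-fidelity solver
    f_hi <- function(x) sin(8 * x) + 0.3 * x + 0.2 * x^2  # stand-in for the expensive viscous solver

    x_lo <- maximinLHS(40, 1)[, 1]; y_lo <- f_lo(x_lo)    # many cheap runs
    x_hi <- maximinLHS(8, 1)[, 1];  y_hi <- f_hi(x_hi)    # few expensive runs

    gp_lo <- km(design = data.frame(x = x_lo), response = y_lo, covtype = "matern5_2")
    lo_at_hi <- predict(gp_lo, newdata = data.frame(x = x_hi), type = "UK")$mean
    gp_d  <- km(design = data.frame(x = x_hi), response = y_hi - lo_at_hi, covtype = "matern5_2")

    grid <- data.frame(x = seq(0, 1, length.out = 200))
    p_lo <- predict(gp_lo, grid, type = "UK"); p_d <- predict(gp_d, grid, type = "UK")
    y_mf <- p_lo$mean + p_d$mean                  # multi-fidelity prediction
    next_run <- grid$x[which.max(p_d$sd)]         # place the next high-fidelity run where the
                                                  # discrepancy emulator is most uncertain
    ```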

    Speaker Info:

    Kevin Quinlan

    Applied Statistician

    Lawrence Livermore National Laboratory

  • Test Design and Analysis for Modeling & Simulation Validation

    Abstract:

    System evaluations increasingly rely on modeling and simulation (M&S) to supplement live testing. It is thus crucial to thoroughly validate these M&S tools using rigorous data collection and analysis strategies. At this roundtable, we will identify and discuss some of the core challenges currently associated with implementing M&S validation for T&E. First, appropriate design of experiments (DOE) for M&S is not universally adopted across the T&E community. This arises in part due to limited knowledge of gold-standard techniques from academic research (e.g., space-filling designs; Gaussian Process emulators) as well as lack of expertise with the requisite software tools. Second, T&E poses unique demands in testing, such as extreme constraints in live testing conditions and reliance on binary outcomes. There is no consensus on how to incorporate these needs into the existing academic framework for M&S. Finally, some practical considerations lack clear solutions yet have direct consequences on design choice. In particular, we may discuss the following: (1) sample size determination when calculating power and confidence is not applicable, and (2) non-deterministic M&S output with high levels of noise, which may benefit from replication samples as in classical DOE.

    Speaker Info:

    Kelly Avery

    Research Staff Member

    Institute for Defense Analyses

    Kelly M. Avery is a Research Staff Member at the Institute for Defense Analyses. She supports the Director, Operational Test and Evaluation (DOT&E) on the use of statistics in test & evaluation and modeling & simulation, and has designed tests and conducted statistical analyses for several major defense programs including tactical aircraft, missile systems, radars, satellite systems, and computer-based intelligence systems. Her areas of expertise include statistical modeling, design of experiments, modeling & simulation validation, and statistical process control. Dr. Avery has a B.S. in Statistics, an M.S. in Applied Statistics, and a Ph.D. in Statistics, all from Florida State University.

  • The Keys to Successful Collaborations during Test and Evaluation: Moderator

    Abstract:

    The defense industry faces increasingly complex systems in test and evaluation (T&E) that require interdisciplinary teams to successfully plan testing. A critical aspect of test planning is successful collaboration between T&E experts, subject matter experts, program leadership, statisticians, and others. Drawing on their own experiences as consulting statisticians, the panelists will discuss elements that lead to successful collaborations, barriers encountered during collaboration, and recommendations for improving collaboration during T&E planning.

    Speaker Info:

    Christine Anderson-Cook

    Los Alamos National Lab

  • The Keys to Successful Collaborations during Test and Evaluation: Panelist

    Speaker Info:

    Sarah Burke

    STAT Expert

    STAT Center of Excellence

    Dr. Sarah Burke is a scientific test and analysis techniques (STAT) Expert for the STAT Center of Excellence. She works with acquisition programs in the Air Force, Army, and Navy to improve test efficiency, plan tests effectively, and analyze the resulting test data to inform decisions on system development. She received her M.S. in Statistics and Ph.D. in Industrial Engineering from Arizona State University.

  • The Keys to Successful Collaborations during Test and Evaluation: Panelist

    Speaker Info:

    John Haman

    Research Staff Member

    Institute for Defense Analyses

    Dr. John Haman is a statistician at the Institute for Defense Analyses, where he develops methods and tools for analyzing test data. He has worked with a variety of Army, Navy, and Air Force systems, including counter-UAS and electronic warfare systems. Currently, John is supporting the Joint Artificial Intelligence Center.

  • The Keys to Successful Collaborations during Test and Evaluation: Panelist

    Speaker Info:

    Peter Parker

    Team Lead for Advanced Measurement Systems

    NASA Langley

    Dr. Peter Parker is Team Lead for Advanced Measurement Systems at the National Aeronautics and Space Administration’s Langley Research Center in Hampton, Virginia. He serves as an Agency-wide statistical expert across all of NASA’s mission directorates of Exploration, Aeronautics, and Science to infuse statistical thinking, engineering, and methods. His expertise is in collaboratively integrating research objectives, measurement sciences, modeling and simulation, and test design to produce actionable knowledge that supports rigorous decision-making for aerospace research and development.

  • The Keys to Successful Collaborations during Test and Evaluation: Panelist

    Speaker Info:

    Willis Jensen

    HR Analyst

    W.L. Gore & Associates

    Dr. Willis Jensen is a member of the HR Analytics team at W.L. Gore & Associates, where he supports people-related analytics work across the company. At Gore, he previously spent 12 years as a statistician and as the Global Statistics Team Leader, where he led a team of statisticians that provided statistical support and training across the globe. He holds degrees in Statistics from Brigham Young University and a Ph.D. in Statistics from Virginia Tech.

  • The Role of the Statistics Profession in the DoD’s Current AI Initiative

    Abstract:

    In 2019, the DoD unveiled comprehensive strategies related to Artificial Intelligence, Digital Modernization, and Enterprise Data Analytics.  Recognizing that data science and analytics are fundamental to these strategies, in October 2020 the DoD issued a comprehensive Data Strategy for national security and defense.  For over a hundred years, statistical sciences have played a pivotal role in our national defense, from quality assurance and reliability analysis of munitions fielded in WWII, to operational analyses defining battlefield force structure and tactics, to helping optimize the engineering design of complex products, to rigorous testing and evaluation of Warfighter systems.  The American Statistical Association (ASA) in 2015 recognized in its statement on The Role of Statistics in Data Science that “statistics is foundational to data science… and its use in this emerging field empowers researchers to extract knowledge and obtain better results from Big Data and other analytics projects.”  It is clearly recognized that data as information is a key asset to the DoD.  The challenge we face is how to transform existing talent to add value where it counts.

    Speaker Info:

    Laura Freeman

    Research Associate Professor of Statistics and Director of the Intelligent Systems Lab

    Virginia Tech

    Dr. Laura Freeman is a Research Associate Professor of Statistics and the Director of the Intelligent Systems Lab at the Virginia Tech Hume Center. Her research leverages experimental methods to bring together cyber-physical systems, data science, artificial intelligence (AI), and machine learning to address critical challenges in national security. She is also a hub faculty member in the Commonwealth Cyber Initiative and leads research in AI Assurance. She develops new methods for test and evaluation focusing on emerging system technology. In addition, she serves as the Assistant Dean for Research in the National Capital Region; in that capacity, she works to shape research directions and collaborations across the College of Science in the National Capital Region.

    Previously, Dr. Freeman was the Assistant Director of the Operational Evaluation Division at the Institute for Defense Analyses. In that position, she established and developed an interdisciplinary analytical team of statisticians, psychologists, and engineers to advance scientific approaches to DoD test and evaluation. During 2018, Dr. Freeman served as the acting Senior Technical Advisor for the Director, Operational Test and Evaluation (DOT&E). As the Senior Technical Advisor, Dr. Freeman provided leadership, advice, and counsel to all personnel on technical aspects of testing military systems. She reviewed test strategies, plans, and reports for all systems under DOT&E oversight.

    Dr. Freeman has a B.S. in Aerospace Engineering, a M.S. in Statistics and a Ph.D. in Statistics, all from Virginia Tech.  Her Ph.D. research was on design and analysis of experiments for reliability data.

  • Uncertainty Quantification and Sensitivity Analysis Methodology for AJEM

    Abstract:

    The Advanced Joint Effectiveness Model (AJEM) is a joint forces model developed by the U.S. Army that is used in vulnerability and lethality (V/L) predictions for threat/target interactions. This complex model primarily generates a probability response for various components, scenarios, loss of capabilities, or summary conditions. Sensitivity analysis (SA) and uncertainty quantification (UQ), referred to jointly as SA/UQ, are disciplines that provide the working space for how model estimates change with respect to changes in input variables. A comparative measure was developed to characterize the effect of an input change on the predicted outcome; it is reviewed and illustrated in this presentation. This measure provides a practical context that stakeholders can better understand and utilize. We show graphical and tabular results using this measure.

    Speaker Info:

    Craig Andres

    Mathematical Statistician

    U.S. Army CCDC Data & Analysis Center

    Craig Andres is a Mathematical Statistician at the recently formed DEVCOM Data & Analysis Center in the Materiel M&S Branch, working primarily on the uncertainty quantification, as well as the verification and validation, of the AJEM vulnerability model. He is currently on developmental assignment with the Capabilities Projection Team. He has a master's degree in Applied Statistics from Oakland University and a master's degree in Mathematics from Western Michigan University.

  • Verification and Validation of Elastodynamic Simulation Software for Aerospace Research

    Abstract:

    Physics-based simulation of nondestructive evaluation (NDE) inspection can help to advance the inspectability and reliability of mechanical systems. However, NDE simulations applicable to non-idealized mechanical components often require large compute domains and long run times. This has prompted development of custom NDE simulation software tailored to high performance computing (HPC) hardware. Verification and validation (V&V) is an integral part of developing this software to ensure implementations are robust and applicable to inspection problems, producing tools and simulations suitable for computational NDE research.

    This presentation addresses factors common to V&V of several elastodynamic simulation codes applicable to ultrasonic NDE. Examples are drawn from in-house simulation software at NASA Langley Research Center, ranging from ensuring reliability in a 1D heterogeneous media wave equation solver to the V&V needs of 3D cluster-parallel elastodynamic software. Factors specific to a research environment are addressed, where individual simulation results can be as relevant as the software product itself. Distinct facets of V&V are discussed including testing to establish software reliability, employing systematic approaches for consistency with fundamental conservation laws, establishing the numerical stability of algorithms, and demonstrating concurrence with empirical data.

    This talk also addresses V&V practices for small groups of researchers. This includes establishing resources (e.g. time and personnel) for V&V during project planning to mitigate and control the risk of setbacks. Similarly, we identify ways for individual researchers to use V&V during simulation software development itself to both speed up the development process and reduce incurred technical debt.
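
    As one concrete example of the kind of automated verification check described, the base-R sketch below solves a one-dimensional advection problem (a stand-in for an elastodynamic solver) with a first-order upwind scheme, compares against the exact solution, and estimates the observed order of accuracy from two grid resolutions. The problem setup is an illustrative assumption, not the NASA codes themselves.

    ```r
    # Order-of-accuracy verification for a toy 1-D advection solver (periodic domain).
    advect_error <- function(nx, speed = 1, t_end = 0.5, L = 1) {
      dx <- L / nx
      dt <- 0.5 * dx / speed                        # CFL number 0.5 for stability
      x  <- (0:(nx - 1)) * dx
      u  <- sin(2 * pi * x)                         # initial condition
      nsteps <- round(t_end / dt)
      for (n in seq_len(nsteps)) {
        u <- u - speed * dt / dx * (u - u[c(nx, 1:(nx - 1))])   # periodic upwind update
      }
      exact <- sin(2 * pi * (x - speed * nsteps * dt))
      max(abs(u - exact))                           # discretization error, max norm
    }

    e_coarse <- advect_error(nx = 200)
    e_fine   <- advect_error(nx = 400)
    observed_order <- log2(e_coarse / e_fine)       # should be close to 1 for this scheme
    ```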

    Speaker Info:

    Erik Frankforter

    Research Engineer

    NASA Langley Research Center

    Erik Frankforter is a research engineer in the Nondestructive Evaluation Sciences branch at NASA Langley Research Center. His research areas include the development of high performance computing nondestructive evaluation simulation software, advancement of inspection mechanics in advanced material systems, and application of physics-based simulation for inspection guidance. In 2017, he obtained a Ph.D. in mechanical engineering from the University of South Carolina and served as a postdoctoral research scholar at the National Institute of Aerospace.

  • 21st Century Screening Designs

    Abstract:

    Since 2000 there have been several innovations in screening experiment design and analysis. Many of these methods are now available in commercial-off-the-shelf (COTS) software. Developments include improved Nearly Orthogonal Arrays (NOAs) (2002, 2006), Definitive Screening Designs (DSDs) (2011), weighted A-optimal designs, and Group-orthogonal Supersaturated Designs (GO SSDs) (2019).
    • NOAs have proven effective for finding well-balanced screening designs – especially when many or all factors are categorical at different numbers of levels.
    • DSDs are capable of collapsing into response surface designs. When too many factors are significant to support a response surface model, DSDs can be efficiently augmented to do so.
    • A-optimal designs allow the experimenter to leverage their knowledge to weight the importance of model terms, improving design performance relative to D-optimal and I-optimal designs.
    • GO SSDs allow experimenters to run fewer trials than there are factors, with a strong likelihood that significant factors will be orthogonal. If they are not orthogonal, they can be made so efficiently by folding over and adding four new rows.
    This tutorial will give examples of each of these approaches to screening many factors and provide rules of thumb for choosing which to apply for any specific problem type.
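
    As a pointer to how one of these criteria works in practice, the base-R sketch below evaluates a weighted A-criterion for a candidate design's model matrix; the design, model, and weights are hypothetical, and COTS tools such as JMP automate this kind of search.

    ```r
    # Weighted A-criterion: weighted average variance of the coefficient estimates (smaller is better).
    a_criterion <- function(X, w = rep(1, ncol(X))) {
      V <- solve(crossprod(X))       # covariance of the least-squares estimates, up to sigma^2
      sum(w * diag(V)) / sum(w)
    }

    d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))   # 8-run candidate design
    X <- model.matrix(~ (A + B + C)^2, data = d)                 # main effects + 2-factor interactions
    a_criterion(X)                                # unweighted A-criterion
    a_criterion(X, w = c(0, 3, 3, 3, 1, 1, 1))    # emphasize main effects, ignore the intercept
    ```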

    Speaker Info:

    Thomas Donnelly

    Principal Systems Engineer

    SAS Institute Inc.

  • A HellerVVA Problem: The Catch-22 for Simulated Testing of Fully Autonomous Systems

    Abstract:

    In order to verify, validate, and accredit (VV&A) a simulation environment for testing the performance of an autonomous system, testers must examine more than just sensor physics; they must also provide evidence that the environmental features that drive system decision making are represented at all. When systems are black boxes, though, these features are fundamentally unknown, necessitating that we first test to discover them. An umbrella of approaches known as “model induction” offers ways of demystifying black boxes and obtaining models of their decision making, but the current state of the art assumes testers can input large quantities of operationally relevant data. When systems only make passive perceptual decisions or operate in purely virtual environments, these assumptions are typically met. However, this will not be the case for black-box, fully autonomous systems. These systems can make decisions about the information they acquire, which cannot be changed in pre-recorded passive inputs, and a major reason to obtain a decision model is to VV&A the simulation environment, preventing the valid use of a virtual environment to obtain a model. Furthermore, the current consensus is that simulation will be used to get limited safety releases for live testing. This creates a catch-22 of needing data to obtain the decision model, but needing the decision model to validly obtain the data. In this talk, we provide a brief overview of this challenge and possible solutions.

    Speaker Info:

    Daniel Porter

    Research Staff Member

    IDA

  • A Notional Case Study of Uncertainty Analysis in Live Fire Modeling and Simulation

    Abstract:

    A vulnerability assessment, which evaluates the ability of an armored combat vehicle and its crew to withstand the damaging effects of an anti-armor weapon, presents a unique challenge because vehicles are expensive and testing is destructive. This limits the number of full-up-system-level tests to quantities that generally do not support meaningful statistical inference. The prevailing solution to this problem is to obtain more affordable test data from sources that include component- and subsystem-level testing. This creates a new challenge that forms the premise of this paper: how can lower-level data sources be connected to provide a credible system-level prediction of vehicle vulnerability? This paper presents a case study that demonstrates an approach to this problem that emphasizes the use of fundamental statistical techniques --- design of experiments, statistical modeling, and propagation of uncertainty --- in the context of a combat scenario that depicts a ground vehicle being engaged by indirect artillery.

    Speaker Info:

    Thomas Johnson

    Research Staff Member

    IDA

  • A Path to Fielding Autonomous Systems

    Abstract:

    Challenges in adapting the system design process, including test and evaluation, to autonomous systems have been well documented. These include the inability to verify autonomous behaviors, designing systems for assurance, and dealing with adaptive systems post-fielding. This talk will briefly recap those challenges and then propose a path toward overcoming them for Army ground autonomous systems. The path begins with unmanned systems designed for test. Training and experimentation within a safe infrastructure can then be used to specify system characteristics and inform decisions as to which behaviors should be autonomous. Assurance comes in the form of assurance arguments that are updated as the system technology and understanding increase.

    Speaker Info:

    Craig Lennon

    Autonomy CoI TEVV co-lead

    CCDC Army Research Lab

  • A Practical Introduction To Gaussian Process Regression

    Abstract:

    Gaussian process regression is ubiquitous in spatial statistics, machine learning, and the surrogate modeling of computer simulation experiments.  Fortunately, its prowess as an accurate predictor, along with an appropriate quantification of uncertainty, does not derive from difficult-to-understand methodology or cumbersome implementation.  We will cover the basics, and provide a practical tool-set ready to be put to work in diverse applications.  The presentation will involve accessible slides authored in Rmarkdown, with reproducible examples spanning bespoke implementation to add-on packages.
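
    In the spirit of the tutorial's bespoke implementations, a bare-bones Gaussian process regression in base R might look like the following sketch (squared-exponential kernel with fixed, hand-picked hyperparameters on toy data):

    ```r
    # Minimal Gaussian process regression in base R.
    sqexp <- function(X1, X2, ell = 0.2, tau2 = 1) {
      D <- outer(X1, X2, function(a, b) (a - b)^2)
      tau2 * exp(-D / (2 * ell^2))          # squared-exponential covariance
    }

    set.seed(7)
    x  <- runif(15)                          # training inputs
    y  <- sin(2 * pi * x) + rnorm(15, sd = 0.1)
    xx <- seq(0, 1, length.out = 200)        # prediction grid
    g  <- 0.1^2                              # noise variance (nugget)

    K   <- sqexp(x, x) + diag(g, length(x))
    Ks  <- sqexp(xx, x)
    Kss <- sqexp(xx, xx)
    Ki  <- solve(K)

    mu  <- Ks %*% Ki %*% y                   # posterior predictive mean
    Sig <- Kss - Ks %*% Ki %*% t(Ks)         # posterior predictive covariance
    upper <- mu + 1.96 * sqrt(pmax(diag(Sig), 0) + g)
    lower <- mu - 1.96 * sqrt(pmax(diag(Sig), 0) + g)
    ```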

    Speaker Info:

    Robert "Bobby" Gramacy

    Virginia Tech

    Robert Gramacy is a Professor of Statistics in the College of Science at Virginia Polytechnic and State University (Virginia Tech). Previously he was an Associate Professor of Econometrics and Statistics at the Booth School of Business, and a fellow of the Computation Institute at The University of Chicago. His research interests include Bayesian modeling methodology, statistical computing, Monte Carlo inference, nonparametric regression, sequential design, and optimization under uncertainty.  Professor Gramacy is a computational statistician. He specializes in areas of real-data analysis where the ideal modeling apparatus is impractical, or where the current solutions are inefficient and thus skimp on fidelity. Such endeavors often require new models, new methods, and new algorithms. His goal is to be impactful in all three areas while remaining grounded in the needs of a motivating application. His aim is to release general purpose software for consumption by the scientific community at large, not only other statisticians.  Professor Gramacy is the primary author on six R packages available on CRAN, two of which (tgp, and monomvn) have won awards from statistical and practitioner communities.

  • A Simulation Study of Binary Response Models for the Small-Caliber Primer Sensitivity Test

    Abstract:

    In ammunition, the primer is a cartridge component containing an explosive mix designed to initiate the propellant. It is the start of the chain of events leading to projectile launch. Most primers are forcibly struck by a firing pin, and the sensitivity of the primer determines whether the cartridge fires. The primer sensitivity test is used to determine whether a batch of primers is over- or under-sensitive. Over-sensitive primers can lead to accidental discharges, while under-sensitive primers can lead to misfires. The current Government test method relies on a “hand-calculation” based on the normal distribution to determine lot acceptance. Although the normal distribution may be a good approximation, it is not known whether the true process follows it. A simulation study was conducted to evaluate relative lot-acceptance risk using the hand-calculation compared to various generalized linear models. It is shown that asymptotic behavior, and therefore lot acceptance, is very sensitive to model selection. This sensitivity is quantified and will be used to augment current empirical research. As a result, a more appropriate test method and acceptance model can be determined.
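
    To illustrate how link choice moves tail estimates, here is a base-R sketch comparing probit and logit generalized linear models on simulated sensitivity data; the stimulus scale, sample sizes, and the 99.9% fire level used for comparison are hypothetical choices.

    ```r
    # Probit vs. logit fits on simulated go/no-go sensitivity data, compared in the upper tail.
    set.seed(42)
    energy <- rep(seq(1, 10, by = 1), each = 20)
    p_true <- pnorm(energy, mean = 5, sd = 1.5)       # "true" process assumed normal here
    fire   <- rbinom(length(energy), 1, p_true)

    fit_probit <- glm(fire ~ energy, family = binomial(link = "probit"))
    fit_logit  <- glm(fire ~ energy, family = binomial(link = "logit"))

    # Stimulus level with 99.9% predicted fire probability under each model
    q999 <- function(fit, p = 0.999) {
      b <- coef(fit)
      (family(fit)$linkfun(p) - b[1]) / b[2]
    }
    c(probit = q999(fit_probit), logit = q999(fit_logit))   # tail estimates diverge by link
    ```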

    Speaker Info:

    Zach Krogstad

    Data Scientist

    U.S. Army

  • A Validation Case Study: The Environment Centric Weapons Analysis Facility

    Abstract:

    Reliable modeling and simulation (M&S) allows the undersea warfare community to understand torpedo performance in scenarios that could never be created in live testing, and do so for a fraction of the cost of an in-water test. The Navy hopes to use the Environment Centric Weapons Analysis Facility (ECWAF), a hardware-in-the-loop simulation, to predict torpedo effectiveness and supplement live operational testing. In order to trust the model’s results, the T&E community has applied rigorous statistical design of experiments techniques to both live and simulation testing. As part of ECWAF’s two-phased validation approach, we ran the M&S experiment with the legacy torpedo and developed an empirical emulator of the ECWAF using logistic regression. Comparing the emulator’s predictions to actual outcomes from live test events supported the test design for the upgraded torpedo. This talk overviews the ECWAF’s validation strategy, decisions that have put the ECWAF on a promising path, and the metrics used to quantify uncertainty.

    Speaker Info:

    Elliot Bartis

    Research Staff Member

    IDA

    Elliot Bartis is a research staff member at the Institute for Defense Analyses where he works on test and evaluation of undersea warfare systems such as torpedoes and torpedo countermeasures. Prior to coming to IDA, Elliot received his B.A. in physics from Carleton College and his Ph.D. in materials science and engineering from the University of Maryland in College Park. For his doctoral dissertation, he studied how cold plasma interacts with biomolecules and polymers. Elliot was introduced to model validation through his work on a torpedo simulation called the Environment Centric Weapons Analysis Facility. In 2019, Elliot and others involved in the MK 48 torpedo program received a Special Achievement Award from the International Test and Evaluation Association in part for their work on this simulation. Elliot lives in Falls Church, VA with his wife Jacqueline and their cat Lily.

  • Adoption Challenges in Artificial Intelligence and Machine Learning for Analytic Work Environments

    Speaker Info:

    Laura McNamara

    Distinguished Member of Technical Staff

    Sandia National Laboratories

    Dr. Laura A. McNamara is a Distinguished Member of Technical Staff at Sandia National Laboratories. She has spent her career partnering with computer scientists, software engineers, physicists, human factors experts, organizational psychologists, remote sensing and imagery scientists, and national security analysts in a wide range of settings. She has expertise in user-centered technology design and evaluation, information visualization/visual analytics, and mixed qualitative/quantitative social science research. Most of her projects involve challenges in sensor management, technology usability, and innovation feasibility and adoption. She enjoys working in Agile and Agile-like environments and is a skilled leader of interdisciplinary engineering, scientific, and software teams. She is passionate about ensuring usability, utility, and adaptability of visualization, operational, and analytic software.

    Dr. McNamara's current work focuses on operational and analytic workflows in remote sensing environments.  She is also an expert on visual cognitive workflows in team environments, focused on the role of user interfaces and analytic technologies to support exploratory data analysis and information creation with large, disparate, unwieldy datasets, from text to remote sensing.  Dr. McNamara has longstanding interest in the epistemology and practices of computational modeling and simulation, verification and validation, and uncertainty quantification.  She has worked with the National Geospatial-Intelligence Agency, the Missile Defense Agency, the Defense Intelligence Agency, and the nuclear weapons programs at Sandia and Los Alamos National Laboratories to enhance the effective use of modeling and simulation in interdisciplinary R&D projects.

  • AI for Cyber

    Abstract:

    Recent developments in artificial intelligence have captured the popular imagination and the attention of governments and businesses across the world. From digital assistants, to smart homes, to self-driving cars, AI appears to be on the verge of taking over many parts of our daily lives. Meanwhile, as the world becomes more networked, industry, governments, and individuals are facing a growing array of cybersecurity threats. In this talk, we will discuss the intersection of artificial intelligence, machine learning, and cybersecurity. In particular, we will look at how some popular machine learning methods will and will not change how we do cybersecurity, we will separate what in the AI and ML space is realistic from what is science fiction, and we will attempt to identify the true potential of ML to positively impact cybersecurity.

    Speaker Info:

    Adam Cardinal-Stakenas

    Data Science Lead

    NSA

  • Aliased Informed Model Selection for Nonregular Designs

    Abstract:

    Nonregular designs are a preferable alternative to regular resolution IV designs because they avoid confounding two-factor interactions. As a result, nonregular designs can estimate and identify a few active two-factor interactions. However, due to the sometimes complex alias structure of nonregular designs, standard screening strategies can fail to identify all active effects. In this talk, two-level nonregular screening designs with orthogonal main effects will be discussed. By utilizing knowledge of the alias structure, we propose a design-based model selection process for analyzing nonregular designs. Our Aliased Informed Model Selection (AIMS) strategy is a design-specific approach that is compared to three generic model selection methods: stepwise regression, the Lasso, and the Dantzig selector. The AIMS approach substantially increases the power to detect active main effects and two-factor interactions relative to these generic methodologies.

    Speaker Info:

    Carly Metcalfe

    PhD Candidate

    Arizona State University

  • An Approach to Assessing Sterilization Probabilistically

    Abstract:

    Sterility Assurance Level (SAL) is the probability that a product, after being exposed to a given sterilization process, contains one or more viable organisms. The SAL is a standard way of defining cleanliness requirements on a product or acceptability of a sterilization procedure in industry and regulatory agencies. Since the SAL acknowledges the inherent probabilistic nature of detecting sterility – that we cannot be absolutely sure that sterility is achieved – a probabilistic approach to its assessment is required that considers the actual end-to-end process involved with demonstrating sterility. Provided here is one such approach.

    We assume the process of demonstrating sterility is based on the scientific method, and therefore starts with a scientific hypothesis of the model that generates the life/death outcomes of the organisms that will be observed in experiment. Experiments are then designed (e.g., environmental conditions determined, reference/indicator organisms selected, number of samples/replicates, instrumentation) and performed, and an initial conclusion regarding the appropriate model is drawn from the observed numbers of organisms remaining once exposed to the sterilization process. Ideally, this is then validated through future experiments and independent scientific inquiry, or the results are used to develop new hypotheses and the process repeats. Ultimately, a decision is made regarding the validity of the sterilization process by means of comparison with a SAL. Bayesian statistics naturally lends itself to developing a probability distribution from this process to compare against a given SAL, which is the approach taken in this paper. We include an example application, based on actual experiments performed, as a simple demonstration of the approach, and discuss its relevance to the future NASA mission, Mars Sample Return.
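
    A minimal sketch of the Bayesian piece of such an argument, in base R: a beta-binomial model for the per-organism survival probability compared against a SAL-like threshold. The prior, counts, and threshold are hypothetical, and a real assessment would chain this with bioburden and recovery models across the end-to-end process.

    ```r
    # Beta-binomial posterior for survival probability, compared against a SAL requirement.
    a0 <- 0.5; b0 <- 0.5           # Jeffreys prior on the per-organism survival probability
    n  <- 1e6                      # reference organisms exposed in the validation experiments
    k  <- 0                        # survivors observed

    a1 <- a0 + k; b1 <- b0 + n - k # posterior is Beta(a1, b1)
    SAL <- 1e-6
    prob_meets_SAL <- pbeta(SAL, a1, b1)   # posterior probability that survival prob <= SAL
    prob_meets_SAL
    ```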

    Speaker Info:

    Mike Dinicola

    System Engineer

    Jet Propulsion Laboratory

  • An end-to-end uncertainty quantification framework in predictive ocean data science

    Abstract:

    Because of the formidable challenge of observing the time-evolving full-depth global ocean circulation, numerical simulations play an essential role in quantifying the ocean’s role in climate variability and long-term change. For the same reason, predictive capabilities are confounded by the high-dimensional space of uncertain variables (initial conditions, internal parameters, external forcings, and model inadequacy). Bayesian inverse methods (loosely known in approximate form as data assimilation) that optimally extract and merge information from sparse, heterogeneous observations and models are powerful tools to enable rigorously calibrated and initialized predictive models to optimally learn from the sparse data. A key enabling computational approach is the use of derivative (adjoint and Hessian) methods for solving a deterministic nonlinear least-squares optimization problem. Such a parameter and state estimation system is practiced by the NASA-supported Estimating the Circulation and Climate of the Ocean (ECCO) consortium. An end-to-end system that propagates uncertainties from observational data to relevant oceanographic metrics or quantities of interest within an inverse modeling framework should address, within a joint approach, all sources of uncertainties, including those in (1) observations (measurement, sampling and representation errors), (2) the model (parametric and structural model error), (3) the data assimilation method (algorithmic approximations and data ingestion), (4) initial and boundary conditions (external forcings, bathymetry), and (5) the prior knowledge (error covariances). Here we lay out a vision for such an end-to-end framework. Recent developments in computational science and engineering are beginning to render theoretical concepts practical in real-world applications.

    Speaker Info:

    Patrick Heimbach

    Associate Professor

    University of Texas at Austin

  • Analyzing Miss Distance

    Speaker Info:

    Kevin Kirshenbaum

    Research Staff Member

    IDA

  • Assessing the reliability of prediction intervals from Bayesian Neural Networks

    Abstract:

    Neural networks (NN) have become popular models because of their predictive power in a variety of applications. Users are beginning to use NN to automate tasks previously done by humans. One criticism of NN is that they provide no uncertainty with their predictions, which is problematic in high-risk applications. Bayesian neural networks (BNN) provide one approach to quantifying uncertainty by putting NN in a probabilistic framework, placing priors on all weights and computing posterior predictive distributions. Using a simulation study, we assess the quality of the uncertainty given by BNN estimated with Markov Chain Monte Carlo (MCMC) and variational inference (VI). These results are also compared to Concrete Dropout, another way to provide uncertainty for NN, and to a Gaussian Process model. The effect of network architecture on uncertainty quantification is also explored. BNN fit via MCMC gave uncertainty results similar to those of the Gaussian Process, which performed better than BNN fit via VI or Concrete Dropout. Results also show the significant effects of network architecture on interpolation and reveal additional issues with over- and underfitting.
    Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

    Speaker Info:

    Daniel Ries

    Sandia National Lab

  • Automated Feature Extraction

    Abstract:

    Pivotal to current US military operations is the quick identification of buildings on Gridded Reference Graphics (GRGs), gridded satellite images of an objective. At present, the United States Special Operations Command (SOCOM) identifies these buildings by hand through the work of individual intelligence officers. Recent advances in Convolutional Neural Networks (CNNs), however, allow the possibility for this process to be streamlined through the use of object detection algorithms. In this presentation, we describe an object detection algorithm designed to quickly identify and highlight every building present on a GRG. Our work leverages both the U-Net and the Mask R-CNN architectures as well as a four-city dataset to produce an algorithm that accurately identifies a large breadth of buildings. Our model reports an accuracy of 87% and is capable of detecting buildings on a diverse set of test images.

    Speaker Info:

    Samuel Humphries

    Student

    United States Military Academy

  • Automated Road Extraction from Satellite Images

    Abstract:

    In this presentation we will discuss a methodology to automatically extract road networks from aerial images using machine learning algorithms. There are many civilian applications for this technology, such as maintaining GPS maps, real-estate monitoring, and disaster relief, but this presentation is aimed at military applications for analyzing remote sensing data. The algorithm we propose identifies road networks and classifies each road as either improved or unimproved. Implementing this system in a military context will require our model to work across a wide range of environments. We will discuss the model's effectiveness in that setting.

    Speaker Info:

    Trevor Parker

    Cadet

    United States Military Academy

  • Balancing Human & Virtual Decision Making for Identifying Fraudulent Credit Card Activity

    Abstract:

    According to Forbes, merchants in the United States lose approximately $190 billion annually to credit card fraud (Shaughnessy 2012). Nordstrom, a leading retailer in the fashion industry, alone incurs losses upwards of $100 million every year. While these losses hurt the company financially, the more pressing concern for Nordstrom is the negative impact on the customer. If fraud is incorrectly flagged and an account is unnecessarily frozen, a customer will be dissatisfied with their experience. Conversely, if legitimate fraud goes undetected, valued customers can experience monetary loss and lose faith in the company. To minimize losses while maximizing customer satisfaction, Nordstrom Card Services created a fraud detection machine that analyzes every transaction and will take one of four actions based on the transaction’s characteristics: approve, approve and send to a queue, decline and send to a queue, or decline. Once a transaction is sent to the queue, it is assessed by either a human analyst or a virtual analyst. Those analysts will determine if any further action must be taken within the customer’s account. This project focuses on how to appropriately assign human and virtual decision-makers to maximize accuracy when determining whether or not a credit card transaction is fraud.

    Speaker Info:

    Rao Abdul Hannan

    Cadet

    United States Air Force Academy

  • Bayesian Experimental Design to augment a sensor network

    Abstract:

    The problem of augmenting an existing sensor network can be solved by considering how best to answer the question of interest. In nuclear explosion monitoring, this could include where best to place a new seismic sensor to most accurately and precisely predict the unknown location of a seismic event. In this talk, we will solve this problem using a Bayesian Design of Experiments (DoE) approach where each design consists of the existing sensor network coupled with a new, possible location. We will incorporate complex computer simulation and experimental data to rank the possible sensor locations with respect to their ability to provide accurate and precise event location estimates. This will result in a map of desirability for new locations, so that if the most highly ranked location is not available or possible, regions of similar predictability can be identified. The Bayesian DoE approach also allows for incorporating denied access locations due to either geographical or political constraints.
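
    One way to make the ranking step concrete is the linear-Gaussian sketch below: each candidate sensor contributes a linearized observation row, and candidates are scored by how much they tighten the posterior on the event location (a D-optimality-style criterion). The sensitivities, noise level, and prior are illustrative assumptions, not the monitoring network's actual physics or the criterion used in the talk.

    ```r
    # Rank candidate sensor locations by the information they add about an event location.
    score_candidates <- function(G_existing, G_candidates, prior_prec, noise_var = 1) {
      base_prec <- prior_prec + crossprod(G_existing) / noise_var
      apply(G_candidates, 1, function(g) {
        new_prec <- base_prec + tcrossprod(g) / noise_var
        as.numeric(determinant(new_prec, logarithm = TRUE)$modulus)  # larger = more informative
      })
    }

    set.seed(3)
    G_existing   <- matrix(rnorm(10 * 2), 10, 2)   # 10 existing sensors, 2-D event location
    G_candidates <- matrix(rnorm(25 * 2), 25, 2)   # 25 possible new locations
    scores <- score_candidates(G_existing, G_candidates, prior_prec = diag(1e-2, 2))
    best <- which.max(scores)                      # top-ranked new location; map all scores in practice
    ```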

    Speaker Info:

    Emily Casleton

    Staff Scientist

    Los Alamos National Laboratory

  • Bayesian Logistic Regression with Separated Data

    Abstract:

    When analyzing binary responses, logistic regression is sometimes difficult when some scenarios tested result in only successes or failures; this is called separation in the data. Using typical frequentist methods, logistic regression often fails because of separation. However, Bayesian logistic regression does not suffer from this limitation. This talk will walk through a Bayesian logistic regression of data with separation using the R brms package.
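
    A minimal sketch of the workflow described, with a hypothetical separated data set and a weakly informative prior (the prior keeps the posterior proper where maximum likelihood diverges):

    ```r
    # Bayesian logistic regression with brms on completely separated data.
    library(brms)

    # Complete separation: every trial below range 5 fails and every trial at or above succeeds
    dat <- data.frame(
      hit   = c(0, 0, 0, 0, 1, 1, 1, 1),
      range = c(1, 2, 3, 4, 5, 6, 7, 8)
    )

    fit <- brm(
      hit ~ range, data = dat, family = bernoulli(),
      prior = set_prior("student_t(3, 0, 2.5)", class = "b")   # weakly informative slope prior
    )
    summary(fit)
    fitted(fit, newdata = data.frame(range = 4.5))   # posterior estimate of P(hit) at a new range
    ```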

    Speaker Info:

    Jason Martin

    Test Design and Analysis Lead

    U.S. Army CCDC Aviation and Missile Center

  • Build Better Graphics

    Speaker Info:

    Brian Vickers

    Research Staff Member

    IDA

  • Building an End-to-End Model of Super Hornet Readiness

    Abstract:

    Co-authors: Benjamin Ashwell, Edward Beall, and V. Bram Lillard

    Bottom-up emulations of real sustainment systems that explicitly model spares, personnel, operations, and maintenance are a powerful way to tie funding decisions to their impact on readiness, but they are not widely used. The simulations require extensive data to properly model the complex and variable processes involved in a sustainment system, and the raw data used to populate the simulation are often scattered across multiple organizations.

    The Navy has encountered challenges in sustaining the desired number of F/A-18 Super Hornets in mission-capable states. IDA was asked to build an end-to-end model of the Super Hornet sustainment system using the OPUS/SIMLOX suite of tools to investigate the strategic levers that drive readiness. IDA built an R package (“honeybee”) that aggregates and interprets Navy sustainment data using statistical techniques to create component-level metrics, and a second R package (“stinger”) that uses these metrics to create a high-fidelity representation of the Navy’s operational tempo for OPUS/SIMLOX.

    Speaker Info:

    Benjamin Ashwell

    Research Staff Member

    IDA

  • Calibrate, Emulate, Sample

    Abstract:

    The calibration of complex models to data is both a challenge and an opportunity. It can be posed as an Inverse Problem. This work focuses on the interface of Ensemble Kalman algorithms used for inversion or posterior sampling (EKI/EKS), Gaussian process emulation (GPE), and Markov Chain Monte Carlo (MCMC) for the calibration of, and quantification of uncertainty in, parameters learned from data. The goal is to perform uncertainty quantification in predictions made from complex models, reflecting uncertainty in these parameters, with relatively few computationally expensive forward model evaluations. This is achieved by propagating approximate posterior samples obtained by judicious combination of ideas from EKI/EKS, GPE and MCMC. The strategy will be illustrated with idealized models related to climate modeling.
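
    To give a feel for the "calibrate" stage, here is a toy, single-iteration ensemble Kalman inversion update in base R; the forward model, data, and noise level are illustrative assumptions, and the emulate (GPE) and sample (MCMC) stages are omitted.

    ```r
    # One ensemble Kalman inversion (EKI) update on a toy two-parameter inverse problem.
    G <- function(theta) c(theta[1] + theta[2], theta[1] * theta[2])  # toy forward model
    y     <- c(3, 2)                       # data consistent with theta = (1, 2) or (2, 1)
    Gamma <- diag(0.1^2, 2)                # observation noise covariance

    set.seed(11)
    J <- 100                               # ensemble size
    Theta <- cbind(rnorm(J, 0, 2), rnorm(J, 0, 2))     # prior ensemble (J x 2 parameters)
    Gvals <- t(apply(Theta, 1, G))                     # forward evaluations (J x 2 outputs)

    K <- cov(Theta, Gvals) %*% solve(cov(Gvals) + Gamma)   # Kalman-type gain
    noise <- matrix(rnorm(J * 2), J, 2) %*% chol(Gamma)    # perturbed observations
    innov <- matrix(y, J, 2, byrow = TRUE) + noise - Gvals
    Theta <- Theta + innov %*% t(K)        # updated ensemble; iterate in practice
    colMeans(Theta)                        # ensemble mean moves toward values consistent with y
    ```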

    Speaker Info:

    Alfredo Garbuno-Inigo

    Postdoctoral Scholar

    Caltech

  • Can AI Predict Human Behavior?

    Abstract:

    Given the rapid increase of novel machine learning applications in cybersecurity and people analytics, there is significant evidence that these tools can give meaningful and actionable insights. Even so, great care must be taken to ensure that automated decision-making tools are deployed in such a way as to mitigate bias in predictions and promote security of user data. In this talk, Dr. Burns will take a deep dive into an open source data set in the area of people analytics, demonstrating the application of basic machine learning techniques while discussing limitations and potential pitfalls of using an algorithm to predict human behavior. In the end, he will draw a comparison between predicting a person's propensity for behaviors such as becoming an insider threat and the way assisted-diagnosis tools are used in medicine to predict the development or recurrence of illness.

    Speaker Info:

    Dustin Burns

    Senior Scientist

    Exponent

    Dr. Dustin Burns is a Senior Scientist in the Statistical and Data Sciences practice at Exponent, a multidisciplinary scientific and engineering consulting firm dedicated to responding to the world’s most impactful business problems. Combining his background in laboratory experiments with his expertise in data analytics and machine learning, Dr. Burns works across many industries, including security, consumer electronics, utilities, and health sciences. He supports clients’ goals to modernize data collection and analytics strategies, extract information from unused data such as images and text, and test and validate existing systems.

  • Categorical Data Analysis

    Abstract:

    Categorical data is abundant in the 21st century, and its analysis is vital to advance research across many domains. Thus, data-analytic techniques that are tailored for categorical data are an essential part of the practitioner’s toolset. The purpose of this short course is to help attendees develop and sharpen their abilities with these tools. Topics covered in this short course will include logistic regression, ordinal regression, and classification, and methods to assess predictive accuracy of these approaches will be discussed. Data will be analyzed using the R software package, and course content will loosely follow Alan Agresti’s excellent textbook An Introduction to Categorical Data Analysis, Third Edition.
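
    The course itself works in R; the Python/statsmodels sketch below merely illustrates the kind of logistic-regression fit and odds-ratio interpretation the course covers, using simulated data with assumed variable names (dose, response).

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(1)
        n = 500
        df = pd.DataFrame({"dose": rng.uniform(0, 10, n)})
        p = 1 / (1 + np.exp(-(-3 + 0.8 * df["dose"])))           # true success probability
        df["response"] = rng.binomial(1, p)

        fit = smf.logit("response ~ dose", data=df).fit(disp=0)  # logit link for a binary outcome
        print(fit.summary())
        print(np.exp(fit.params["dose"]))                        # estimated odds ratio per unit dose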

    Speaker Info:

    Chris Franck

    Virginia Tech

    Chris Franck is an assistant professor in the Department of Statistics at Virginia Tech.  His research focuses on Bayesian model selection and averaging, objective Bayes, and spatial statistics.  Much of his work has a specific emphasis on health applications.

  • Characterizing the Orbital Debris Environment Using Satellite Perturbation Anomaly Data

    Abstract:

    The untracked orbital debris environment has been described as one of the most serious risks to the survivability of satellites in high-traffic low Earth orbits, where acute satellite population growth is taking place. This paper describes a method for correlating observed satellite orbital changes with orbital debris impacts, and demonstrates how populations of small debris (< 1 cm) can be characterized by directly examining the orbit and attitude changes of individual satellites within constellations. The paper also presents means for detecting unusual movements and other anomalies (e.g., communication losses) in individual satellites and satellite constellations using the Space Surveillance Network, other space surveillance sensors, and in situ methods. Finally, the paper discusses how an anomaly data archive and policy repository might be established, supporting an improved definition of the orbital debris environment in harmony with the President’s Space Policy Directive 3.

    Speaker Info:

    Joel Williamsen

    Research Staff Member

    IDA

  • Combining Physics-Based Simulation & Machine Learning for Fast Uncertainty Quantification

    Abstract:

    With the rise of machine learning and artificial intelligence, there has been a huge surge in data-driven approaches to solve computational science and engineering problems. In the context of uncertainty quantification (UQ), a common use case for machine learning is in the construction of efficient surrogate models (i.e., response surfaces) to replace expensive, physics-based simulations. However, relying solely on data-driven models for UQ without any further recourse to the original high-fidelity simulation will generally produce biased estimators and can yield unreliable or non-physical results, especially when training data is sparse or predictions are required outside of the training data domain.
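
    As a minimal sketch of the surrogate-model idea (not NASA's specific workflow), the example below fits a Gaussian process emulator to a handful of runs of a stand-in "simulation" and then propagates input uncertainty through the cheap surrogate by Monte Carlo; the simulator, kernel, and input distribution are all illustrative assumptions.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        def expensive_simulation(x):                          # stand-in for a physics-based code
            return np.sin(3 * x) + 0.5 * x**2

        rng = np.random.default_rng(0)
        X_train = np.linspace(0, 2, 12).reshape(-1, 1)        # a small simulation campaign
        y_train = expensive_simulation(X_train).ravel()

        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
        gp.fit(X_train, y_train)                              # surrogate (response surface)

        # propagate input uncertainty through the cheap surrogate instead of the simulator
        x_samples = rng.normal(1.0, 0.2, size=(10_000, 1))
        y_mean, y_std = gp.predict(x_samples, return_std=True)
        print("output mean:", y_mean.mean(), " output std:", y_mean.std())
        # the surrogate's own predictive std flags regions where the emulator is untrustworthy
        print("max surrogate std dev over samples:", y_std.max())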

    Speaker Info:

    James Warner

    Computational Scientist

    NASA Langley Research Center

  • Connecting Software Reliability Growth Models to Software Defect Tracking

    Abstract:

    Co-Author: Melanie Luperon.

    Most software reliability growth models track only defect discovery. A practical concern, however, is the removal of high-severity defects, yet defect removal is often assumed to occur instantaneously. More recently, several defect removal models have been formulated as differential equations in terms of the number of defects discovered but not yet resolved and the rate of resolution. The limitation of this approach is that it does not take into consideration the data contained in a defect tracking database.

    This talk describes our recent efforts to analyze data from a NASA program. Two methods to model defect resolution are developed, namely (i) distributional and (ii) Markovian approaches. The distributional approach employs times between defect discovery and resolution to characterize the mean resolution time and derives a software defect resolution model from the corresponding software reliability growth model to track defect discovery. The Markovian approach develops a state model from the stages of the software defect lifecycle as well as a transition probability matrix and the distributions for each transition, providing a semi-Markov model. Both the distributional and Markovian approaches employ a censored estimation technique to identify the maximum likelihood estimates, in order to handle the case where some but not all of the defects discovered have been resolved. Furthermore, we apply a hypothesis test to determine if a first or second order Markov chain best characterizes the defect lifecycle. Our results indicate that a first order Markov chain was sufficient to describe the data considered and that the Markovian approach achieves modest improvements in predictive accuracy, suggesting that the simpler distributional approach may be sufficient to characterize the software defect resolution process during test. Practical inferences from such models include an estimate of the time required to discover and remove all defects.
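
    The sketch below is not the authors' model; it only illustrates the censored-likelihood idea in Python for an assumed exponential resolution-time distribution, where resolved defects contribute the density and still-open (right-censored) defects contribute the survival probability.

        import numpy as np
        from scipy.optimize import minimize_scalar

        rng = np.random.default_rng(0)
        true_rate = 0.1                                    # resolutions per day (assumed)
        times = rng.exponential(1 / true_rate, size=200)   # true resolution times
        cutoff = 20.0                                      # analysis date: later resolutions unobserved
        observed = np.minimum(times, cutoff)
        resolved = times <= cutoff                         # False -> right-censored

        def neg_log_lik(rate):
            # resolved defects contribute the exponential density,
            # unresolved defects contribute the survival probability exp(-rate * t)
            ll = np.sum(np.log(rate) - rate * observed[resolved])
            ll += np.sum(-rate * observed[~resolved])
            return -ll

        mle = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded")
        print("estimated resolution rate:", mle.x)         # close to 0.1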

    Speaker Info:

    Lance Fiondella

    Associate Professor

    University of Massachusetts

    Lance Fiondella is an associate professor of Electrical and Computer Engineering at the University of Massachusetts Dartmouth. He received his PhD (2012) in Computer Science and Engineering from the University of Connecticut. Dr. Fiondella’s papers have received eleven conference paper awards, including six with his students. His software and system reliability and security research has been funded by the DHS, NASA, Army Research Laboratory, Naval Air Warfare Center, and National Science Foundation, including a CAREER Award.

  • Connecting Software Reliability Growth Models to Software Defect Tracking Poster

    Speaker Info:

    Melanie Luperon

    Student

    University of Massachusetts

  • Creating formal characterizations of routine contingency management in commercial aviation

    Abstract:

    Traditional approaches to safety management focus on collection of data describing unwanted states (i.e., accidents and incidents) and analysis of undesired behaviors (i.e., faults and errors) that precede those states. Thus, in the traditional view, safety is both defined and measured by its absence. In extremely high-confidence systems like commercial air transport, however, opportunities to measure the absence of safety are relatively rare. Ironically, a critical barrier to measuring safety and the impact of mitigation strategies in commercial aviation is the lack of opportunities for measurement.
    While traditional approaches to safety that focus only on minimizing undesired outcomes have proven utility, they represent, at best, an incomplete view of safety in complex sociotechnical domains such as aviation. For example, pilots and controllers successfully manage contingencies during routine, everyday operations that contribute to the safety of the national airspace system. However, events that result in successful outcomes are not systematically collected or analyzed. Characterization and measurement of routine safety-producing behaviors would create far more opportunities for measurement of safety, potentially increasing the temporal sensitivity, utility and forensics capability of safety assurance methods that can leverage these metrics.
    The current study describes an initial effort to characterize how pilots and controllers manage contingencies during routine everyday operations and specify that characterization within a cognitive architecture that could potentially be transformed into a formal framework for verification. Rather than focus on rare events in which things went wrong, this study focused on frequent events in which operators adjusted their work to ensure things went right. Namely, this study investigated how operators responded to expected and unexpected disturbances during Area Navigation (RNAV) arrivals into Charlotte Douglas International Airport (KCLT). Event reports submitted to NASA’s Aviation Safety Reporting System (ASRS) that referenced one or more of the KCLT RNAV arrivals were examined. The database search returned 29 event reports that described air carrier operations on one of the RNAV arrivals. Those 29 event reports included 39 narratives, which were examined to identify statements describing safety-producing performance using the Resilience Analysis Grid (RAG) framework (Hollnagel, 2011). The RAG identifies four basic capabilities of resilience performance: anticipating, monitoring for, responding to, and learning from disruptions. Analysis of the 39 ASRS narratives revealed 99 statements describing resilient behaviors, which were categorized to create a taxonomy of 19 resilient performance strategies.
    The strategies in this taxonomy can be classified and tagged, and can then be formally described as a scenario that leads to either the preservation or degradation of safety. These scenarios can be abstracted and translated into temporal logic formulae, which serve as procedural rules in a knowledge database. This procedure outlines the means by which a set of relational rules that represent the communal knowledge of the system are captured and utilized. We are then able to check whether the knowledge base is consistent and create classes and subclasses which allows for generalization of a particular strategic instance. This procedure enables the future development of a classifier inference engine.

    Speaker Info:

    Jon Holbrook

    Cognitive Scientist

    NASA

  • Creating Insight into Joint Cognitive System Performance

    Abstract:

    The DOD’s ability to collect data is far outstripping its ability to convert that data into actionable information. This problem is not unique to the military. With vast improvements in our ability to collect data, many organizations are drowning in that data. As a result, the need for advanced algorithms (Artificial Intelligence, Machine Learning, etc.) to support human work has never been greater. However, algorithm performance and the performance of the larger work system, composed of human and machine agents, are not synonymous. In order to build and field high performance work systems that consistently provide positive mission impact, the S&T community must measure the performance of the entire Joint Cognitive System (JCS). Algorithm performance, by itself, is necessary but not sufficient for predicting the performance of these systems in the field. Better predicting JCS performance once a system is fielded requires modelling, or at least learning about, the humans’ cognitive work and how the proposed technology will change that work. For example, providing practitioners with a new Machine Learning enabled support tool may impose unintended cognitive work as practitioners calibrate themselves to the performance envelope of the new tools. In addition, the practitioners’ cognitive work is shaped by many “soft constraints” which are often not captured by technologists. For example, real world time constraints may lead practitioners to use simple decision making heuristics, rather than deliberative hypothesis testing, when making critical decisions. The technology requirements needed to support both deliberate and heuristic decision making may be very different.

    This talk will discuss several approaches to gain insight on JCS performance as part of a larger cycle of iteratively discovering, building, and testing these systems. Additionally, this talk will give an overview of a small pilot study conducted by the Air Force Research Lab to measure the impact of a simulated Machine Learning agent designed to support AF intelligence analysts exploiting Full Motion Video data.

    Speaker Info:

    Taylor Murphy

    Cognitive Systems Engineer

    Air Force Research Laboratory

  • CVN 78 Sortie Generation Rate

    Speaker Info:

    Dean Thomas

    Deputy Director

    IDA

  • Cyber Tutorial - Jason Schlup

    Speaker Info:

    Jason Schlup

    IDA

  • Cyber Tutorial - Kelly Tran

    Speaker Info:

    Kelly Tran

    IDA

  • Cyber Tutorial - Lee Allison

    Speaker Info:

    Lee Allison

    IDA

  • Cyber Tutorial - Mark Herrera

    Speaker Info:

    Mark Herrera

    IDA

  • Cyber Tutorial - Peter Mancini

    Abstract:

    Cyberattacks are in the news every day, from data breaches of banks and stores to ransomware attacks shutting down city governments and delaying school years. In this mini-tutorial, we introduce key cybersecurity concepts and methods for conducting cybersecurity test and evaluation.  We walk you through a live demonstration of a cyberattack and provide real-world examples of each major step we take. The demonstration shows an attacker gaining command and control of a Nerf turret. We leverage tools commonly used by red teams to explore an attack scenario involving phishing, network scanning, password cracking, pivoting, and finally creating a mission effect. We also provide a defensive view and analytics that show artifacts left by the attack path.

    Speaker Info:

    Peter Mancini

    IDA

  • D-Optimally Based Sequential Test Method for Ballistic Limit Testing

    Abstract:

    Ballistic limit testing of armor is testing in which a kinetic energy threat is shot at armor at varying velocities. The striking velocity and whether the threat completely or partially penetrated the armor are recorded. The probability of penetration is modeled as a function of velocity using a generalized linear model. The parameters of the model serve as inputs to MUVES, a DoD software tool used to analyze weapon system vulnerability and munition lethality.

    Generally, the probability of penetration is assumed to be monotonically increasing with velocity. However, in cases in which there is a change in penetration mechanism, such as the shatter gap phenomenon, the probability of penetration can no longer be assumed to be monotonically increasing and a more complex model is necessary. One such model was developed by Chang and Bodt to model the probability of penetration as a function of velocity over a velocity range in which there are two penetration mechanisms.

    This paper proposes a D-optimally based sequential shot selection method to efficiently select threat velocities during testing. Two cases are presented: the case in which the penetration mechanism for each shot is known (via high-speed or post-shot X-ray) and the case in which the penetration mechanism is not known. This method may be used to support an improved evaluation of armor performance for cases in which there is a change in penetration mechanism.
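
    As a simplified illustration of the D-optimal idea (for the ordinary monotone logistic model, not the two-mechanism Chang-Bodt model), the Python sketch below picks the next shot velocity that maximizes the determinant of the Fisher information at the current parameter estimates; the estimates, past velocities, and candidate grid are assumed values.

        import numpy as np

        def d_optimal_next_velocity(beta, past_v, candidates):
            """Pick the candidate velocity that maximizes det(Fisher information)
            for a logistic P(penetration) model, given current parameter estimates."""
            b0, b1 = beta
            def info(velocities):
                X = np.column_stack([np.ones(len(velocities)), velocities])
                p = 1 / (1 + np.exp(-(b0 + b1 * np.asarray(velocities))))
                W = p * (1 - p)
                return (X * W[:, None]).T @ X
            base = info(past_v)
            dets = [np.linalg.det(base + info([v])) for v in candidates]
            return candidates[int(np.argmax(dets))]

        beta_hat = np.array([-30.0, 0.02])          # illustrative current estimates
        past_shots = [1400.0, 1500.0, 1600.0]       # striking velocities already tested (m/s)
        grid = np.linspace(1300, 1800, 51)
        print("next shot velocity:", d_optimal_next_velocity(beta_hat, past_shots, grid))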

    Speaker Info:

    Leonard Lombardo

    Mathematician

    U.S. Army Aberdeen Test Center

    Leonard currently serves as an analyst for the RAM/ILS Engineering and Analysis Division at the U.S. Army Aberdeen Test Center (ATC).  At ATC, he is the lead analyst for both ballistic testing of helmets and fragmentation analysis.  Previously, while on a developmental assignment at the U.S. Army Evaluation Center, he worked towards increasing the use of generalized linear models in ballistic limit testing.  Since then, he has contributed towards the implementation of generalized linear models within the test center through test design and analysis.

  • Dashboard for Equipment Failure Reports

    Abstract:

    Equipment Failure Reports (EFRs) describe equipment failures and the steps taken as a result of these failures. EFRs contain both structured and unstructured data. Currently, analysts manually read through EFRs to understand failure modes and make recommendations to reduce future failures. This is a tedious process where important trends and information can get lost. This motivated the creation of an interactive dashboard that extracts relevant information from the unstructured (i.e. free-form text) data and combines it with structured data like failure date, corrective action, and part number. The dashboard is an RShiny application that utilizes numerous text mining and visualization packages, including tm, plotly, edgebundler, and topicmodels. It allows the end user to filter to the EFRs that they care about and visualize metadata, such as the geographic region where the failure occurred, over time, allowing previously unknown trends to be seen. The dashboard also applies topic modeling to the unstructured data to identify key themes. Analysts are now able to quickly identify frequent failure modes and look at time and region-based trends in these common equipment failures.
    DISTRIBUTION STATEMENT A - APPROVED FOR PUBLIC RELEASE; DISTRIBUTION IS UNLIMITED.
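
    The dashboard itself is an RShiny application built on tm and topicmodels; the Python sketch below only illustrates the equivalent topic-modeling step (LDA on free-form failure narratives), with a few made-up example narratives standing in for real EFR text.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        narratives = [
            "hydraulic pump leaked fluid during preflight inspection",
            "radar display froze and required a power cycle",
            "pump seal failure caused hydraulic pressure loss",
            "intermittent radar fault traced to a loose connector",
        ]
        vec = CountVectorizer(stop_words="english").fit(narratives)
        X = vec.transform(narratives)                       # document-term counts

        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
        terms = vec.get_feature_names_out()
        for k, topic in enumerate(lda.components_):
            top = [terms[i] for i in topic.argsort()[-4:][::-1]]
            print(f"topic {k}:", ", ".join(top))            # key themes per topic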

    Speaker Info:

    Cole Molloy

    Johns Hopkins Applied Physics Lab

  • Data Analysis for Special Operations Selection Programs

    Abstract:

    This study assesses the relationship between psychometric screening of candidates for our partner special operations unit and successful completion of the unit’s candidate selection program. Improving the candidate selection program is our primary goal for this study. Our partner unit maintains a comprehensive database of summary information on previous candidates but has not yet conducted a robust analysis of candidate attributes. We sought to achieve this goal by using statistical methods to determine predictors associated with successful completion of the selection program. Our results suggest that we may identify predictors associated with success but may struggle in constructing an effective predictive model due to the inherent differences of candidates. Our predictors are scales from standardized psychometric evaluations administered by the selection program intended originally to identify candidates with psychopathologies. Our outcome is a binary variable indicating successful completion of the program. Analyzing the demographics of the candidate selection population is our secondary goal for this study. Our analysis helped our partner unit improve its selection program by identifying characteristics associated with success. Other members of the special operations community may generalize our results to improve their respective programs as well. We do not intend for units to use these results to draw definitive conclusions regarding the ability of a candidate to pass a selection program.

    Speaker Info:

    Nicholas Cunningham

    Cadet

    United States Military Academy

  • Data Science from Scratch in the DoD

    Abstract:

    Although new organizations and companies are created every day in the private sector, in the public sector this is much rarer. As such, the establishment of Army Futures Command was the single most significant Army reorganization since 1973. The concept of the command is simple: build a better Army for years to come, in part by harnessing artificial intelligence and big data analysis to quickly process information and identify trends to shape modernization efforts. This presentation will share the lessons learned from standing up a Data and Decision Sciences Directorate within Army Futures Command and address the pitfalls associated with developing an enduring strategy and capability for a command managing a $30B+ modernization portfolio from day one.

    Speaker Info:

    Cade Saie

    Director of Data and Decision Sciences

    US Army Futures Command

    Lieutenant Colonel (Promotable) Cade Saie is an active officer in the United States Army and serves as an Operations Research Systems Analyst (ORSA). He was commissioned in the Army as an Infantry officer and has served in the 82nd Airborne and 1st Armored Divisions, commanding an infantry company in Ramadi, Iraq from 2005-2006. After command, he transitioned to FA49 (Operations Research Systems Analyst) and was assigned to the TRADOC Analysis Center-Ft Lee before attending the Air Force Institute of Technology (AFIT), where he received master’s and doctoral degrees. Upon graduation, he was assigned to the U.S. Army Cyber Command at Ft. Belvoir, VA, where he established and led the command’s data science team from 2014-2017. In October of 2017 he was selected to be part of the Army Futures Command Task Force upon its standup and served in that capacity until the command was established, at which point he took a position as the AFC Chief Data Officer and the Director of Data and Decision Sciences.
    He has been published in the European Journal of Operational Research, the Journal of Algorithms and Computational Technology, and The R Journal. He was awarded the Dr. Wilbur B. Payne Memorial Award for Excellence in Analysis in 2009 and 2015.
    He has a B.S. in Computer Information Systems from Norwich University in Northfield, VT, an M.S. in Systems Engineering, and a PhD in Operations Research.

  • Deep Learning Models for Image Analysis & Object Recognition

    Speaker Info:

    Ridhima Amruthesh

    DATA SOCIETY

    Ridhima is a data scientist who enjoys using the Python and R programming languages to explore, analyze, visualize, and present data. Ridhima believes that anyone, regardless of their background, can learn and benefit from technical skills. In addition to teaching at Data Society, Ridhima has a strong background in computer science, which pushed her to pursue her master's degree in Information Systems. She realized her passion was to help educate others, across all fields, on the importance of using data to derive insights. She is currently a manager at Data Society and loves to help build and grow the data science team as well as educate others. She is passionate about creating interactive, insightful visualizations for clients. Ridhima holds an MS in Information Systems from the University of Maryland, College Park and a BE in Computer Science and Engineering.

  • Design Fractals: Visualizing the Coverage Properties of Covering Arrays

    Abstract:

    Identifying test cases that maximize the chance of discovering faults at minimal cost is a challenge that software test engineers constantly face. Combinatorial testing is an effective test case selection strategy to address this challenge. The idea is to select test cases that ensure that all possible combinations of settings from two (or more) inputs are accounted for, regardless of which subset of inputs is selected. This is accomplished by using a covering array as the test case selection mechanism. However, for any given testing scenario, there are usually several alternative covering arrays that may satisfy budgetary and coverage requirements. Unfortunately, it is often unclear how to choose from these alternatives. Practitioners often want a way to explore how the input space is being covered before deciding which alternative is best suited for the testing scenario. In this presentation we will provide an overview of a new graphical method that may be used to visualize the input space of covering arrays. We use the phrase "design fractals" to refer to this graphical method.
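
    As a small worked example of the underlying object (independent of the design-fractal visualization itself), the Python sketch below shows a strength-2 covering array for three two-level factors and verifies that every pair of columns covers all level combinations.

        from itertools import combinations, product

        # A strength-2 covering array for three two-level factors in four runs:
        # every pair of columns contains every combination of levels at least once.
        runs = [(0, 0, 0),
                (0, 1, 1),
                (1, 0, 1),
                (1, 1, 0)]

        def covers_all_pairs(array, levels=(0, 1)):
            cols = range(len(array[0]))
            for i, j in combinations(cols, 2):
                seen = {(row[i], row[j]) for row in array}
                if seen != set(product(levels, repeat=2)):
                    return False
            return True

        print(covers_all_pairs(runs))   # True: 4 runs instead of 2**3 = 8 exhaustive tests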

    Speaker Info:

    Joseph Morgan

    Research Statistician

    SAS Institute Inc

  • Designing Competitions that Reward Robust Algorithm Performance

    Abstract:

    Supervised learning competitions such as those hosted by Kaggle provide a quantitative, objective way to evaluate and compare algorithms. However, the typical format of evaluating competitors against a fixed set of data can encourage overfitting to that particular set. If the competition host is interested in motivating development of approaches that will perform well against a more general “in the wild” problem outside of the competition, the winning algorithms of these competitions might fall short because they’re tuned too closely to the competition data. Furthermore, the idea of having a single final ranking of the competitors based on the competition data set ignores the possibility that that ranking might change substantially if a different data set were used, even one that has the same statistical characteristics as the original.

    We present an approach for designing training and test sets that reward more robust algorithms and discourage overfitting with the intent of improved performance for scenarios beyond the competition. These carefully designed sets also enable a rich and nuanced analysis and comparison of the performance of competing algorithms, including a more flexible final ranking. We illustrate these methods with two competitions recently designed and hosted by Los Alamos National Laboratory to improve detection, identification, and location of radiological threats in urban environments.

    Speaker Info:

    Kary Myers

    Deputy Group Leader, Statistical Sciences Group

    Los Alamos National Laboratory

  • Development and Analytic Process Used to Develop a 3-Dimensional Graphical User Interface System for Baggage Screening

    Abstract:

    The Transportation Security Administration (TSA) uses several types of screening technologies for the purposes of threat detection at airports and federal facilities across the country.   Computed Tomography (CT) systems afford TSA personnel in the Checked Baggage setting a quick and effective method to screen property with less need to physically inspect property due to their advanced imaging capabilities.  Recent reductions in size, cost, and processing speed for CT systems spurred an interest in incorporating these advanced imaging systems at the Checkpoint to increase the speed and effectiveness of scanning personal property as well as passenger satisfaction during travel.  The increase in speed and effectiveness of scanning personal property with fewer physical property inspections stems from several qualities native to CT imaging that current 2D X-Ray based Advanced Technology 2 (AT2) systems typically found at Checkpoints lack.  Specifically, the CT offers rotatable 3D images and advanced identification algorithms that allow TSA personnel to more readily identify items requiring review on-screen without requesting that passengers remove them from their bag.
    The introduction of CT systems at domestic airports led to the identification of a few key Human Factors issues, however.  Several vendors used divergent strategies to produce the CT systems introduced at domestic airport Checkpoints.  Each system offered users different 3D visualizations, informational displays, and identification algorithms, offering a range of views, tools, layouts, and material colorization for users to sort through. The disparity in system similarity and potential for multiple systems to operate at a single airport resulted in unnecessarily complex training, testing, certification, and operating procedures.  In response, a group of human factors engineers (HFEs) was tasked with creating requirements for a single common Graphical User Interface (GUI) for all CT systems that would provide a standard look, feel, and interaction across systems.
    We will discuss the development and analytic process used to 1.) gain an understanding of the tasks that CT systems must accomplish at the Checkpoint (i.e. focus groups), 2.) identify what tools Transportation Security Officers (TSOs) tend to use and why (i.e. focus groups and rank-ordered surveys), and 3.) determine how changes during iterative testing affect performance (i.e. A/B testing while collecting response time, accuracy, and tool usage). The data collection effort described here resulted in a set of requirements that produced a highly usable CT interface as measured by several valid and reliable objective and subjective measures.  Perceptions of the CGUI’s usability (e.g., the System Usability Scale; SUS) were aligned with TSO performance (i.e., Pd, PFA, and Throughput) during use of the CGUI prototype. Iterative testing demonstrated an increase in the SUS score and performance measures for each revision of the requirements used to produce the common CT interface.  User perspectives, feedback, and performance data also offered insight toward the determination of necessary future efforts that will increase user acceptance of the redesigned CT interface.  Increasing user acceptance offers TSA the opportunity to improve user engagement, reduce errors, and increase the likelihood that the system will stay in service without a mandate.

    Speaker Info:

    Charles McKee

    President and CEO

    Taverene Analytics LLC

    Mr. McKee provides Test and Evaluation, Systems Engineering, Human Factors Engineering, Strategic Planning, Capture Planning, and Proposal Development support to companies supporting the Department of Defense and Department of Homeland Security.  Recently served as President of the Board of Directors, International Test and Evaluation Association (ITEA), 2013 – 2015. Security Clearance: Secret, previously cleared for TS SCI. Homeland Security Vetted.

    TSA Professional Engineering Logistics Support Services (PELSS2) (May 2016 – present) for Global System Technologies (GST) and TSA Operational Test & Evaluation Support Services (OTSS) and Test & Evaluation Support Services (TESS) (Aug 2014 – 2016) for Engility. Provides Acquisition Management, Systems Engineering, Operational Test & Evaluation (OT&E), Human Factors Engineering (HFE), test planning, design of experiments, data collection, data analysis, statistics, and evaluation reporting on Transportation Security Equipment (TSE) systems deployed to Airports and Intermodal facilities. Led the design and development of a Common Graphical User Interface (CGUI) for new Checkpoint Computed Tomography systems. The CGUI design maximized the Probability of Detection, minimized probability of false alarms, while improving throughput time for screening accessible property by Transportation Security Officers (TSO’s) at airports.

    Division Manager, Alion Science and Technology, 2009-2014. Oversaw program management and technical support for the Test and Evaluation Division.  Provided analysis support to multiple clients such as:  Army Program Executive Office (PEO) Simulation Training Instrumentation (STRI) STARSHIP program and DISA Test and Evaluation (T&E) Mission Support Services contract. Provided Subject Matter Expertise to all client on program management, test and evaluation, statistical analysis, modeling and simulation, training, human factors engineering / human systems integration, and design of experiments.

    Operations Manager, SAIC, 2006-2009. Oversaw the program management and technical support for the Test, Evaluation, and Analysis Operation (TEAO). Provided analysis support to multiple clients such as the Director, Operational Test and Evaluation (DOT&E), Joint Test & Evaluation (JT&E), Test Resource Management Center (TRMC), OSD AT&L Systems Engineering, Defense Modeling and Simulation Office (DMSO), Air Force Test and Evaluation (AF/TE), US Joint Forces Command (USJFCOM) Joint Forces Integration and Interoperability Test (JFIIT), and the Air Combat Command (ACC) Red Flag Exercise Support. Provided Subject Matter Expertise to all clients on program management, test and evaluation, statistical analysis, modeling and simulation, training, human factors engineering / human systems integration, and design of experiments.

    Senior Program Manager, Human Factors Engineer. BDM / TRW / NGC (1989-2000 and 2003-2006).  Provided Human Factors Engineering / Manpower Personnel Integration support to the Army Test and Evaluation Command (ATEC) / Army Evaluation Center (AEC), FAA Systems Engineering Integrated Product Team (SEIPT), and TSA Data Collection, Reduction, Analysis, Reporting, and Archiving (DCRARA) Support. Developed Evaluation Plans, Design of Experiments (DOE), requirements analysis, test planning, test execution, data collection, reduction, analysis, statistical analysis, and military assessments of Army programs.  Supported HFE / MANPRINT working groups and System MANPRINT Management Plans. Conducted developmental assessments of System Safety, Manpower, Personnel, Training, and Human Systems Interfaces. MS, Industrial Engineering, NCSU, 1989. Major: Human Factors Engineering. Minor: Occupational Safety and Health.  Scholarship from the National Institute of Occupational Safety and Health (NIOSH). Master’s Thesis on Cumulative Trauma Disorders in occupations with repetitive motion.

     

  • Development of Predictive Models for Brain-Computer Interface Systems: A Case Study

    Abstract:

    Recent advances in brain-computer interfaces (BCI) have brought interest in exploring the possibility of BCI-like technology for future armament systems. In an experiment conducted at the Tactical Behavioral Lab (TBRL), electroencephalogram (EEG) data was collected during simulated engagement scenarios in order to study the relationship between a soldier’s state of mind and his biophysical signals. One of the goals was to determine if it was possible to anticipate the decision to fire a gun. The nature of EEG data presents a unique set of challenges. For example, the high sensitivity of EEG electrodes coupled with their close proximity on the scalp means recorded data is noisy and highly correlated. Special attention needs to be paid to data pre-processing and feature engineering in building predictive models.
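
    The sketch below is not the TBRL pipeline; it is only a minimal Python illustration of one common EEG feature-engineering step, computing band-power features per channel with Welch's method, under an assumed sampling rate, channel count, and choice of frequency bands.

        import numpy as np
        from scipy.signal import welch

        fs = 256                                        # assumed EEG sampling rate (Hz)
        rng = np.random.default_rng(0)
        window = rng.normal(size=(8, fs * 2))           # 8 channels x 2 s of (synthetic) EEG

        def band_power(signals, fs, band):
            """Average power in a frequency band, per channel, via Welch's method."""
            freqs, psd = welch(signals, fs=fs, nperseg=fs)
            lo, hi = band
            mask = (freqs >= lo) & (freqs <= hi)
            return psd[:, mask].mean(axis=1)

        features = np.concatenate([
            band_power(window, fs, (8, 12)),            # alpha band
            band_power(window, fs, (13, 30)),           # beta band
        ])
        print(features.shape)                           # one feature vector per time window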

    Speaker Info:

    Kevin Eng

    Statistician

    US Army CCDC Armaments Center

  • Dynamic Model Updating for Streaming Classification and Clustering

    Abstract:

    A common challenge in the cybersecurity realm is the proper handling of high-volume streaming data. Typically in this setting, analysts are restricted to techniques with computationally cheap model-fitting and prediction algorithms. In many situations, however, it would be beneficial to use more sophisticated techniques. In this talk, a general framework is proposed that adapts a broad family of statistical and machine learning techniques to the streaming setting. The techniques of interest are those that can generate computationally cheap predictions, but which require iterative model-fitting procedures. This broad family of techniques includes various clustering, classification, regression, and dimension reduction algorithms. We discuss applied and theoretical issues that arise when using these techniques for streaming data whose distribution is evolving over time.
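
    A minimal Python sketch of the streaming pattern described above, using scikit-learn's partial_fit interface on a linear classifier: predictions stay cheap while the model is updated incrementally as mini-batches arrive. The drifting synthetic stream is an assumption for illustration, and the talk's framework covers a much broader family of techniques.

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.default_rng(0)
        clf = SGDClassifier()                               # linear model; cheap predictions
        classes = np.array([0, 1])

        def next_batch(t, n=200):
            """Synthetic stream whose class boundary drifts slowly over time (illustrative)."""
            X = rng.normal(size=(n, 2))
            y = (X[:, 0] + 0.02 * t * X[:, 1] > 0).astype(int)
            return X, y

        for t in range(100):                                # successive mini-batches arrive
            X, y = next_batch(t)
            if t > 0 and t % 20 == 0:
                print(f"batch {t:3d} accuracy: {clf.score(X, y):.2f}")  # score before updating
            clf.partial_fit(X, y, classes=classes)          # cheap incremental model update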

    Speaker Info:

    Alexander Foss

    Senior Statistician

    Sandia National Laboratories

  • Employing Design of Experiments (DOE) in an Electronic Warfare Test Strategy/Design

    Abstract:

    Electronic warfare systems generate diverse signals in a complex environment. This briefing covers how DOE was employed for a particular system to develop the overall test strategy, determine which tests would utilize DOE, and generate specific designs. It will cover the technical and resource challenges encountered throughout the process leading up to testing. We will discuss the difficulty in defining responses and factors and offer ideas on how to resolve these issues. Lastly, we will cover impacts from shortened schedules and how DOE enabled a quantitative risk assessment.

    Speaker Info:

    Michael Harman

    Statistical Test Designer

    STAT COE

  • Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning

    Abstract:

    We study the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how our proposed calibration strategies can help achieve dramatically better data efficiency and expressive power while provably preserving the classification accuracy of the original classifier. When calibrating a 50-layer Wide ResNet on the ImageNet classification task, the proposed strategies improve the expressivity of temperature scaling by 17% and the data efficiency of isotonic regression by a factor of 258, while preserving classification accuracy.
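
    For orientation only, the sketch below implements plain temperature scaling, the accuracy-preserving baseline that the proposed methods build upon, by choosing a single temperature that minimizes negative log-likelihood on held-out logits. The toy logits are simulated, and this is not the authors' proposed ensemble or compositional method.

        import numpy as np
        from scipy.optimize import minimize_scalar

        def nll(logits, labels, T):
            """Negative log-likelihood of labels under temperature-scaled softmax."""
            z = logits / T
            z -= z.max(axis=1, keepdims=True)                 # numerical stability
            log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
            return -log_probs[np.arange(len(labels)), labels].mean()

        def fit_temperature(logits, labels):
            # a single scalar T rescales the logits; the argmax (accuracy) is unchanged
            res = minimize_scalar(lambda T: nll(logits, labels, T),
                                  bounds=(0.05, 20), method="bounded")
            return res.x

        # toy "validation set": deliberately overconfident logits
        rng = np.random.default_rng(0)
        labels = rng.integers(0, 5, size=1000)
        logits = 4.0 * rng.normal(size=(1000, 5))
        logits[np.arange(1000), labels] += 2.0
        T = fit_temperature(logits, labels)
        print("fitted temperature:", T)     # T > 1 softens overconfident predictions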

    Speaker Info:

    Jize Zhang

    Postdoctoral Research Staff Member

    Lawrence Livermore National Laboratory

  • Expanding the Systems Test Engineer’s Toolbox: A Multi-Criteria Decision Analysis Technique Using the One-Sided Tolerance Interval for Analysis of DOE Planned Tests

    Abstract:

    Co-Authors: Luis Cortes (MITRE) and Alethea Duhon (Air Force Agency for Modeling and Simulation)

    This talk highlights the advantage of a multi-criteria decision analysis (MCDA) technique that utilizes the one-sided tolerance interval (OSTI) for the analysis of response data resulting from design of experiments (DOE) structured testing.  Tolerance Intervals (TI) can be more intuitive for representing data and, when combined with DOE structured tests results, are excellent for providing decision quality information.  The use of statistical techniques in planning provides a rigorous approach for the analysis of test data, an ideal input for a MCDA technique.  This article’s findings demonstrate the value of constructing the OSTI for the interpretation of DOE structured tests.  We also demonstrate the utility of using an optimized model combined with the OSTI as part of a MCDA technique for choosing between alternatives, which offers an efficient and powerful method to sift through test results rapidly and efficiently.  This technique provides an alternative for analyzing data across multiple alternatives when traditional analysis techniques, such as the descriptive statistics (including the Confidence Interval [CI]), potentially obscure information from the untrained eye.  Finally, the technique provides a level playing field--a critical feature given today’s acquisition protest culture.
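
    As a minimal sketch of the OSTI computation under standard normal-theory assumptions (not necessarily the exact formulation used in the talk), the one-sided tolerance factor can be obtained from a noncentral t quantile, as in the Python example below with assumed coverage and confidence levels.

        import numpy as np
        from scipy.stats import norm, nct

        def upper_tolerance_limit(x, coverage=0.90, confidence=0.95):
            """One-sided upper tolerance limit for normal data: with the stated confidence,
            at least `coverage` of the population lies below the returned limit."""
            n = len(x)
            delta = norm.ppf(coverage) * np.sqrt(n)              # noncentrality parameter
            k = nct.ppf(confidence, df=n - 1, nc=delta) / np.sqrt(n)
            return np.mean(x) + k * np.std(x, ddof=1)

        rng = np.random.default_rng(0)
        response = rng.normal(100, 5, size=30)                   # e.g., a DOE response at one setting
        print(upper_tolerance_limit(response))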

    Speaker Info:

    Michael Sheehan

    Principal Engineer

    MITRE

  • Finding a Cheater: Machine Learning and the Dice Game “Craps”

    Abstract:

    Machine learning is becoming increasingly embedded in military systems, so how do we know if it’s working or, as the name implies, learning? This presentation and demonstration is a thought experiment for testers to consider how to measure and test systems with embedded machine learning. This presentation involves a physical demonstration of elements of the popular casino game craps. Craps is a fast-paced dice game that offers players the opportunity to try to beat the odds, which are stacked in the casino’s favor. But what if a player could flip the odds on the casino by cheating? Can the audience detect the cheater? How about a machine learning algorithm? The audience will be able to make their own guesses and compare them to A-Dell, a machine learning algorithm designed to find craps cheaters.
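
    A-Dell itself is not shown here; as a point of comparison only, the Python sketch below gives a simple statistical baseline for the same question, a chi-square goodness-of-fit test on simulated fair and loaded dice (the loading probabilities are assumed for illustration).

        import numpy as np
        from scipy.stats import chisquare

        rng = np.random.default_rng(0)
        fair = rng.integers(1, 7, size=600)                           # fair die
        loaded = rng.choice(np.arange(1, 7), size=600,
                            p=[0.12, 0.12, 0.12, 0.12, 0.12, 0.40])   # cheater favors sixes

        for name, rolls in [("fair", fair), ("loaded", loaded)]:
            counts = np.bincount(rolls, minlength=7)[1:]              # counts of faces 1..6
            stat, p_value = chisquare(counts)                         # test against equal frequencies
            print(f"{name}: chi-square p-value = {p_value:.4f}")      # small p-value -> likely cheater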

    Speaker Info:

    Paul Johnson

    Scientific Advisor

    MCOTEA

  • Generalized Linear Multilevel Model Applied to a Forced-Choice Psychoacoustic Test

    Abstract:

    Linear regression is often extended to either (1) multilevel models in which response data cannot be assumed independent, e.g., nested data, or (2) generalized linear models in which response data is not normally distributed. Applying both extensions to binary responses results in generalized linear multilevel models in which a sigmoid link function gives the relationship between the mean of the response variable and a linear combination of predictors. Such models are well-suited to analyze psychophysical experiments, in which fitted sigmoids give the relationship between physical stimulus level and the probability of a human perceptual response. In this work, a generalized linear multilevel model is applied to a three-alternative forced-choice psychoacoustic test in which human test subjects were asked to identify a sound signal presented at different levels relative to a background noise. Since the guessing rate at low stimulus levels must converge to 1/3, a custom link function is applied. In this test, the grouping variable is the subject, because within-subject responses are assumed to be more alike than between-subject responses. This leads to readily available information about population-level parameters, such as how auditory thresholds are distributed within the group, how steep the psychometric functions are and if the differences are statistically significant. The multilevel model also demonstrates the effect of shrinkage in which the partially-pooled regression parameters are closer to the population mean than parameters found by un-pooled analyses.
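
    The sketch below is a single-subject, fully pooled simplification of the model described above, not the multilevel analysis itself: it fits the 1/3-guessing-rate psychometric function by maximum likelihood to simulated 3AFC data, with assumed stimulus levels and parameter values.

        import numpy as np
        from scipy.optimize import minimize

        def p_correct(level, mu, sigma):
            """3AFC psychometric function: guessing floor of 1/3 plus a logistic rise."""
            return 1/3 + (2/3) / (1 + np.exp(-(level - mu) / sigma))

        def neg_log_lik(params, levels, correct):
            mu, log_sigma = params
            p = p_correct(levels, mu, np.exp(log_sigma))
            return -np.sum(correct * np.log(p) + (1 - correct) * np.log(1 - p))

        # synthetic single-subject data (the talk's multilevel model pools many subjects)
        rng = np.random.default_rng(0)
        levels = np.repeat(np.linspace(-10, 10, 9), 20)        # signal level re: background (dB)
        correct = rng.binomial(1, p_correct(levels, mu=2.0, sigma=3.0))

        fit = minimize(neg_log_lik, x0=[0.0, 0.0], args=(levels, correct))
        print("threshold (mu):", fit.x[0], " slope scale (sigma):", np.exp(fit.x[1]))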

    Speaker Info:

    Matthew Boucher

    Research Engineer

    NASA Langley Research Center

  • Graph link prediction in computer networks using Poisson Matrix Factorisation

    Abstract:

    Graph link prediction is an important task in cyber-security: relationships between entities within a computer network, such as users interacting with computers, or clients connecting to servers, can provide key insights into adversary behaviour. Poisson matrix factorization (PMF) is a popular model for link prediction in large networks, particularly useful for its scalability. An extension of PMF to include scenarios that are commonly encountered in cyber-security applications is presented. Specifically, an extension is proposed to include known covariates associated with the graph nodes as well as a seasonal variation to handle dynamic networks.
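
    As a minimal sketch of the baseline model only (without the proposed covariate and seasonal extensions), the code below fits a Poisson matrix factorization, counts N modeled as Poisson(U V^T), using multiplicative updates equivalent to KL-divergence NMF; the toy count matrix stands in for, say, client-server connection counts.

        import numpy as np

        def poisson_mf(N, rank=5, iters=200, seed=0):
            """Factorize a count matrix N ~ Poisson(U @ V.T) by multiplicative updates
            (equivalent to KL-divergence NMF, i.e., Poisson maximum likelihood)."""
            rng = np.random.default_rng(seed)
            n, m = N.shape
            U = rng.random((n, rank)) + 0.1
            V = rng.random((m, rank)) + 0.1
            for _ in range(iters):
                R = N / (U @ V.T + 1e-10)                  # ratio of observed to predicted rates
                U *= (R @ V) / V.sum(axis=0)
                R = N / (U @ V.T + 1e-10)
                V *= (R.T @ U) / U.sum(axis=0)
            return U, V

        # toy client-server connection counts; scores for unseen links come from U @ V.T
        rng = np.random.default_rng(1)
        N = rng.poisson(rng.random((40, 1)) @ rng.random((1, 30)) * 5)
        U, V = poisson_mf(N)
        scores = U @ V.T                                   # higher score -> more plausible link
        print(scores.shape)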

    Speaker Info:

    Melissa Turcotte

    Research Scientist

    Los Alamos National Laboratory

  • I have the Power! Power Calculation in Complex (and Not So Complex) Modeling Situations Part 1

    Abstract:

    Invariably, any analyst who has been in the field long enough has heard the dreaded questions: “Is X number of samples enough? How much data do I need for my experiment?” Ulterior motives aside, any investigation involving data must ultimately answer the question of “How many?” to avoid risking either insufficient data to detect a scientifically significant effect or having too much data leading to a waste of valuable resources. This can become particularly difficult when the underlying model is complex (e.g. longitudinal designs with hard-to-change factors, time-to-event response with censoring, binary responses with non-uniform test levels, etc.). Even in the supposedly simpler case of categorical factors, where run size is often chosen using a lower bound power calculation, a simple approach can mask more “powerful” techniques. In this tutorial, we will spend the first half exploring how to use simulation to perform power calculations in complex modeling situations drawn from relevant defense applications. Techniques will be illustrated using both R and JMP Pro. In the second half, we will investigate the case of categorical factors and illustrate how treating the unknown effects as random variables induces a distribution on statistical power, which can then be used as a new way to assess experimental designs.
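
    The tutorial works in R and JMP Pro; the Python sketch below only illustrates the core simulation recipe for a simple two-sample design: simulate the planned experiment under an assumed effect size, run the planned test, and report the fraction of significant results as the estimated power.

        import numpy as np
        from scipy.stats import ttest_ind

        def simulated_power(n_per_group, effect_size, n_sims=5000, alpha=0.05, seed=0):
            """Monte Carlo power: simulate the planned experiment under an assumed effect,
            run the planned analysis, and count how often it detects the effect."""
            rng = np.random.default_rng(seed)
            hits = 0
            for _ in range(n_sims):
                control = rng.normal(0.0, 1.0, n_per_group)
                treated = rng.normal(effect_size, 1.0, n_per_group)
                if ttest_ind(control, treated).pvalue < alpha:
                    hits += 1
            return hits / n_sims

        for n in (10, 20, 40, 80):
            print(n, simulated_power(n, effect_size=0.5))   # power rises with sample size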

    Instructor Bio: Caleb King is a Research Statistician Tester for the DOE platform in the JMP software. He received his MS and PhD in Statistics from Virginia Tech and worked for three years as a statistical scientist at Sandia National Laboratories prior to arriving at JMP. His areas of expertise include optimal design of experiments, accelerated testing, reliability analysis, and small-sample theory

    Speaker Info:

    Caleb King

    JMP Division, SAS Institute Inc.

  • I have the Power! Power Calculation in Complex (and Not So Complex) Modeling Situations Part 2

    Abstract:

    Instructor Bio: Ryan Lekivetz is a Senior Research Statistician Developer for the JMP Division of SAS where he implements features for the Design of Experiments platforms in JMP software.

    Speaker Info:

    Ryan Lekivetz

    JMP Division, SAS Institute Inc.

  • Improving Cyber Resiliency through the Application of Behavioral Science to Cybersecurity

    Abstract:

    The mission of the Human Behavior and Cybersecurity Capability at MITRE is to leverage human behavior to reduce cybersecurity risk using behavioral sciences to understand and strengthen the human firewall. The Capability consists of a team of experienced behavioral scientists who bring applied subject-matter-expertise in human behavior to cybersecurity challenges, with demonstrated successes and thought leadership in insider threat and usable security. This presentation will introduce the four focus areas of the Capability: Insider Threat; Usable Security and Technology Adoption; Cybersecurity Assessment and Exercise Support; and Cybersecurity Risk Perceptions and Awareness. We will then discuss the methods and metrics used to study human behavior within each of these focal areas.

    Insider Threat Assessment: Our insider threat behavioral risk assessments use qualitative research practices to elicit discussion and disclosure of risks from the perspective of potential insiders. MITRE’s Insider Threat Framework, developed to address the challenges experienced by the National Critical Infrastructure in classifying and assessing insider threats, is a data-driven framework that includes psycho-social and cyber-physical characteristics that could be common, observable indicators for insider attacks. The continuous approach to developing this framework includes consistently structuring, hand-coding, collating and analyzing a large dataset (5,000-10,000) of raw insider threat investigation case files shared directly from multiple organizations. The aggregated framework, but not the sensitive raw data, is shared with insider threat programs to operationalize and facilitate the identification, prevention, detection and mitigation of insider threats.

    Usable Security and Technology Adoption: Our goal of studying the human aspects of usable security technologies is to improve decision making and enhance technology adoption by users. We will present examples of how we have applied psychological principles and behavioral methods to design valid and reliable approaches to evaluate the feasibility of new security programs, products, and resources such as AI-enabled security technology.

    Cybersecurity Assessment and Exercise Support: Complex collaborations among cybersecurity teams are becoming increasingly important but can trigger new challenges that impact mission success. We assess, evaluate and train individuals and teams to improve knowledge among cyber professionals during cybersecurity exercises. Examples of our work include cognitive adaptability and exercise assessments, development of team collaboration and performance metrics, team sense-making, and cyber team resilience.

    Cybersecurity Risk Perceptions and Awareness: Perceptions of cybersecurity risks and threats, and resulting decisions about how to approach or mitigate them have the potential to impact the effectiveness of cybersecurity programs. We evaluate how people, processes and technology impact tactical and strategic risk-based decisions and apply behavioral concepts to inform the cybersecurity risk framework and awareness programs.

    Speaker Info:

    Poornima Madhavan

    Principal Behavioral Scientist

    MITRE

  • Integrating Systems Engineering into Test Strategy Development and Systems Evaluation

    Abstract:

    As defense systems become more complex, multi-domain, and interdependent, the problem arises: What is the best way to determine what we need to test, and how much testing is adequate? A methodology, based on systems engineering, was developed specifically for use in Live Fire Test and Evaluation (LFT&E); however, the process can be applied to operational or developmental testing as well.

    The use of Systems Engineering principles involves understanding the prioritized warfighter (user) needs, and applies a definition process that identifies critical test issues. The main goals of this methodology are to clearly define the system performance priorities, based on the critical mission risks if performance is insufficient, and then to generate various solutions that will mitigate those associated risks. The systems engineering process also helps develop a common language and framework so all parties can discuss tradeoffs between cost, schedule and performance. It also produces specific products for each step of the process that ensure that each step has been adequately addressed.

    Most importantly, the methodology should enable better communication among and within the program, test, and oversight teams by clarifying mission and test priorities, clarifying test objectives, evaluating risks from proposed testing and demonstrated performance, and reporting decision-quality information. The result of implementing the methodology is two-fold: It produces a test strategy that prioritizes testing based on the criticality and uncertainty of the system’s performance; and it guides the development of a system evaluation that clearly links test outcomes to overall mission risks.

    Speaker Info:

    Charlie Middleton

    Test and Evaluation Expert

    OSD Scientific Test and Analysis Techniques Center of Excellence

  • International Journal of Astrobiology

    Abstract:

    The Europa Clipper mission must comply with the NASA Planetary Protection requirement in NASA Procedural Requirement (NPR) 8020.12D, which states: “The probability of inadvertent contamination of an ocean or other liquid water body must be less than 1×10⁻⁴ per mission”. Mathematical approaches designed to assess compliance with this requirement have been offered in the past, but no accepted methodology was in place to trace the end-to-end probability of contamination: from terrestrial microorganisms surviving in a non-terrestrial ocean, to the impact scenario that put them there, to potential flight system failures that led to impact, back to the initial bioburden launched with the spacecraft. As a result, hardware could presumably be either over- or under-cleaned. Over-specified microbial reduction protocols can greatly add to the cost (and schedule) of a project. On the other hand, if microbes on hardware are not sufficiently eliminated, there is increased risk of potentially contaminating another body with terrestrial organisms – adversely affecting scientific exploration and possibly conflicting with international treaty obligations.

    The anticipated Mars Sample Return Campaign would be subject to a similar challenge regarding returning Martian material to Earth. A proposed requirement is “The MSR campaign shall have a probability of releasing unsterilized Martian particles, with diameters ≥ 50 nanometers (TBC), into Earth’s biosphere ≤ 1×10⁻⁶”. A similar question arises: what is required in terms of ensuring sterilization or containing Martian particles in order to meet the requirement?
    The mathematical framework and other interesting sensitivities that the analysis has revealed for each mission are discussed.

    Speaker Info:

    Kelli Mccoy

    Mars Sample Return Campaign Risk Manager

    JPL

  • Introduction to Machine Learning: Classification Algorithms

    Abstract:

    This short course discusses the applications of machine learning from a lay person's perspective and presents the landscape of approaches and their utility. We then dive into a technical hands-on workshop implementing classification algorithms to build predictive models, tune them, and interpret their results. Applications include forecasting behaviors and events. Topics that will be covered include: introduction to machine learning & its applications, introduction to classification and supervised machine learning, classification algorithms, and classification performance metrics.
    *Pre-requisites: Attendees must be comfortable using R to manipulate data and must know how to create basic visualizations with ggplot2.

    Speaker Info:

    Martin Skarzynski

    DATA SOCIETY

    My primary research interest is in understanding health risk factors by combining scientific expertise from diverse fields with machine intelligence.
    I believe I am uniquely equipped to bridge the gaps between scientific disciplines and deliver on the promise of data science in health research.
    My preferred tools are R and Python, open source programming languages kept on the cutting edge by their active and supportive communities.
    Through research and teaching, I am constantly improving my ability to obtain, tidy, explore, transform, visualize, model, and communicate data.
    I aim to utilize my technical skills and science background to become a leader among the next generation of multidisciplinary data scientists.

  • Introduction to Structural Equation Modeling: Implications for Human-System Interactions

    Abstract:

    Structural Equation Modeling (SEM) is an analytical framework that offers unique opportunities for investigating human-system interactions. SEM is used heavily in the social and behavioral sciences, where emphasis is placed on (1) explanation rather than prediction, and (2) measuring variables that are not observed directly. The framework facilitates modeling of survey data through confirmatory factor analysis and latent (i.e., unobserved) variable regression models. We provide a general introduction to SEM by describing what it is, the unique features it offers to analysts and researchers, and how it is easily implemented in JMP Pro 15.1. The introduction relies on a fun example everyone can relate to. Then, we shed light on a few published studies that have used SEM to unveil insights on human performance factors and the mechanisms by which performance is affected. The key goal of this presentation is to provide general exposure to a modeling tool that is likely new to most in the fields of defense and aerospace.

    Speaker Info:

    Laura Castro-Schilo

    Research Statistician Developer

    JMP Division, SAS Institute, Inc.

  • Introduction to Uncertainty Quantification for Practitioners and Engineers

    Abstract:

    Uncertainty is an inescapable reality that can be found in nearly all types of engineering analyses. It arises from sources like measurement inaccuracies, material properties, boundary and initial conditions, and modeling approximations. Uncertainty Quantification (UQ) is a systematic process that puts error bands on results by incorporating real world variability and probabilistic behavior into engineering and systems analysis. UQ answers the question: what is likely to happen when the system is subjected to uncertain and variable inputs. Answering this question facilitates significant risk reduction, robust design, and greater confidence in engineering decisions. Modern UQ techniques use powerful statistical models to map the input-output relationships of the system, significantly reducing the number of simulations or tests required to get accurate answers.

    This tutorial will present common UQ processes that operate within a probabilistic framework. These include statistical Design of Experiments, statistical emulation methods used to create the simulation inputs to response relationship, and statistical calibration for model validation and tuning to better represent test results. Examples from different industries will be presented to illustrate how the covered processes can be applied to engineering scenarios. This is purely an educational tutorial and will focus on the concepts, methods, and applications of probabilistic analysis and uncertainty quantification. SmartUQ software will only be used for illustration of the methods and examples presented. This is an introductory tutorial designed for practitioners and engineers with little to no formal statistical training. However, statisticians and data scientists may also benefit from seeing the material presented from a more practical use than a purely technical perspective.

    There are no prerequisites other than an interest in UQ. Attendees will gain an introductory understanding of Probabilistic Methods and Uncertainty Quantification, basic UQ processes used to quantify uncertainties, and the value UQ can provide in maximizing insight, improving design, and reducing time and resources.

    Instructor Bio: Gavin Jones, Sr. SmartUQ Application Engineer, is responsible for performing simulation and statistical work for clients in aerospace, defense, automotive, gas turbine, and other industries. He is also a key contributor in SmartUQ’s Digital Twin/Digital Thread initiative. Mr. Jones received a B.S. in Engineering Mechanics and Astronautics and a B.S. in Mathematics from the University of Wisconsin-Madison.

    Speaker Info:

    Gavin Jones

    Sr. Application Engineer

    SmartUQ

  • KC-46A Adaptive Relevant Testing Strategies to Enable Incremental Evaluation

    Abstract:

    The DoD’s challenge to provide capability at the “Speed of Relevance” has generated many new strategies to adapt to rapid development and acquisition. As a result, Operational Test Agencies (OTA) have had to adjust their test processes to accommodate rapid, but incremental delivery of capability to the warfighter. The Air Force Operational Test and Evaluation Center (AFOTEC) developed the Adaptive Relevant Testing (ART) concept to answer the challenge. In this session, AFOTEC Test Analysts will brief examples and lessons learned from implementing the ART principles on the KC-46A acquisition program to identify problems early and promote the delivery of individual capabilities as they are available to test. The AFOTEC goal is to accomplish these incremental tests while maintaining a rigorous statistical evaluation in a relevant and timely manner. This discussion will explain in detail how the KC-46A Initial Operational Test and Evaluation (IOT&E) was accomplished in a unique way that allowed the test team to discover, report on, and correct major system deficiencies much earlier than traditional methods.

    Speaker Info:

    J. Quinn Stank

    Lead KC-46 Analyst

    AFOTEC

    First Lieutenant J. Quinn Stank is the Lead Analyst for the Air Force Operational Test and Evaluation Center Detachment 5 at Operating Location Everett, Washington. The lieutenant serves as the advisor to the Operational Test and Evaluation team for the KC-46A.
    Lieutenant Stank, originally from Knoxville, Tenn., received his commission as a second lieutenant upon graduation from the United States Air Force Academy in 2016.

    EDUCATION:

    2016 Bachelor of Science degree in operations research, United States Air Force Academy, Colorado Springs, Colo.
    2018 Operations Research/Systems Analysis-Military Applications Course, Fort Lee, Va.

    ASSIGNMENTS:

    1. August 2016 – April 2018, student, undergraduate pilot training, Sheppard AFB, Texas
    2. April 2018 – present, Lead Analyst, KC-46A, AFOTEC Det 5 OL-EW, Seattle, Wash.

    MAJOR AWARDS AND DECORATIONS:

    National Defense Service Medal

    Air Force Outstanding Unit Award

    Air Force Commendation Medal

    EFFECTIVE DATES OF PROMOTION:

    Second Lieutenant June 02, 2016
    First Lieutenant June 02, 2018
  • Lunch Keynote

    Speaker Info:

    Yisroel Brumer

    Principal Deputy Director

    CAPE

    Dr. Yisroel Brumer is the Principal Deputy Director of Cost Assessment and Program Evaluation (CAPE) in the Office of the Secretary of Defense. In this role, he oversees all CAPE analysis and activities including strategic studies, programmatic analysis, and cost estimates across the entire Department of Defense. In particular, he leads CAPE’s involvement in the annual program and budget review process, providing oversight and decision support for content and funding for the entire DoD five-year fiscal year defense program. Within the acquisitions process, he leads CAPE’s oversight of all major investment programs across the DoD, with particular emphasis on the analysis of alternative investment strategies, cost estimation, cost-benefit analyses, economic analyses, and programmatic tradeoffs. He also guides the Strategic Portfolio Reviews – cross-cutting studies requested by the Secretary of Defense on high-interest issues critical to the success of the Department. Finally, he oversees Independent Cost Estimates and cost analysis, ensuring that such processes provide accurate information and realistic estimates of acquisition program cost to senior Department leaders and Congressional defense committees.
    From 2017 to 2018, Dr. Brumer was CAPE’s Deputy Director for Analysis and Innovation, where he was responsible for executing major cross-cutting analyses across the entire Defense portfolio. Beginning in 2012, he led CAPE’s Strategic, Defensive, and Space Programs Division, providing advice and analysis on over $60B per year of programs ranging from antiterrorism to intercontinental ballistic missiles. In that role, he was hand-picked to oversee the Secretary of Defense’s number one priority, a multibillion dollar revamp of the entire nuclear enterprise, including a cultural overhaul and the initial stages of the Nuclear Triad modernization. From 2010 to 2012, he served as Director of CAPE’s Program Analysis Division, where he led major cross-cutting analyses and all DoD Front End Assessments, all selected by and briefed directly to the Secretary of Defense. Dr. Brumer first joined CAPE in 2005 as an Operations Research Analyst, where he conducted analysis and provided advice to senior leaders on analytical tradeoffs in the DoD's science and technology, homeland defense, nuclear command and control, and combating weapons of mass destruction portfolios.
    Dr. Brumer holds a Ph.D. in Chemical Physics and a Master of Science in Chemistry from Harvard University, as well as a Bachelor of Science in Chemistry from the University of Toronto. After conducting postdoctoral research at Harvard on the physics of complex biological systems, he joined the Department of Homeland Security's Science and Technology Directorate as a fellow with the American Association for the Advancement of Science (AAAS), where he was a pioneering member of a number of key programs.
    Dr. Brumer has received the Presidential Rank Award of Meritorious Executive, the Secretary of Defense Medal for Meritorious Civilian Service (with Bronze Palm), the Secretary of Defense Award for Excellence, the Space and Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance High Impact Analysis Award, and the Daniel Wilson Scholarship in Chemistry, as well as other awards. When not in the Pentagon, Dr. Brumer enjoys spending time with his fantastic wife Kim and their two excellent children, Eliana and Netanel.

  • Machine Learning for cybersecurity - Self Learning Systems for Threat Detection

    Speaker Info:

    Nisha Iyer

    DATA SOCIETY

    Nisha is a data scientist who enjoys using the Python and R programming languages to explore, analyze, visualize and present data. Nisha believes that anyone, regardless of their background, can learn and benefit from technical skills. In addition to teaching at Data Society, Nisha has worked in corporate consulting and media to build and grow data science teams. During this time, she not only built the teams but also educated others in the company on the importance of and need for data science. Data Society has helped her grow her passion for spreading data literacy across commercial and government clients.
    Nisha holds an MS in Data Science from the George Washington University in Washington, DC, and a BA in Communication from the University of Maryland, College Park.

  • Machine Learning Reveals that Russian IRA's Twitter Topic Patterns Evolved over Time

    Abstract:

    Introduction: Information Operations (IO) are a key component of our adversaries' strategy to undermine U.S. military power without escalating to more traditional (and more easily identifiable) military strikes. Social media activity is one method of IO. In 2017 and 2018, Twitter suspended thousands of accounts likely belonging to the Kremlin-backed Internet Research Agency (IRA). Clemson University archived a large subset of these tweets (2.9M tweets posted by over 2800 IRA accounts), tagged each tweet with metadata (date, time, language, supposed geographical region, number of followers, etc.), and published this dataset on the polling aggregation website FiveThirtyEight.

    Speaker Info:

    Emily Parrish

    Research Associate

    IDA

  • Modeling Human-System Interaction in UAM: Design and Application of Human-Autonomy Teaming

    Abstract:

    Authors: Vincent E. Houston, Joshua L. Prinzel
    Urban Air Mobility (UAM) is defined as “…a safe and efficient system for air passenger and cargo transportation within an urban area. It is inclusive of small package delivery and other urban unmanned aerial system services and supports a mix of onboard/ground-piloted and increasingly autonomous operations” [1] (p. 4). UAM operations likely require autonomous systems to enable functions ranging from simplified vehicle operations [2] to fleet and resource management [3]. Although automation has had a significant and ubiquitous role in aviation, it has generally been limited in capability and characterized by poor human-system design, with numerous disastrous consequences [4]. Autonomy, however, represents a significant evolutionary step up from automation. Autonomous systems are characterized by the capabilities to “independently assume functions typically assigned to human operators, with less human intervention overall and for longer periods of time” [5]. Autonomous systems are self-directed, self-sufficient, and non-deterministic [6] [7].
    The system requirements and architectures for UAM represent a variety of functions that can be termed “work-as-imagined,” a notion that characterizes how people think work is done as opposed to how work is actually done [8]. Work-as-imagined is defined through three basic sources: experience of work-as-done; knowledge and understanding of work-as-prescribed; and exposure to work-as-disclosed [9]. Because UAM represents a revolutionary approach to aviation, the gap between work-as-imagined and, ultimately, work-as-done may be significant. An example is represented by the substantial system architecture concepts and technological solutions based on assumptions that UAM will be fully autonomous rather than increasingly autonomous in application.
    The emerging field of human-autonomy teaming represents a new paradigm that explores the various mechanisms by which humans and machines can work and think together [5] [10] [11]. A team is defined as “a distinguishable set of two or more agents who interact, dynamically, interdependently, and adaptively toward a common and valued goal/objective/mission” [11] (p. 4). The literature points to pitfalls associated with some automation implementation strategies (e.g., inadequate supervisory control, poor vigilance, skill loss, etc.). A comprehensive, coherent, cohesive, prioritized, research-driven, and empirically data-based approach has been hypothesized to ensure the future success of UAM. This work shows that performance is better with human-autonomy teaming [11].
    The proposal shall discuss the burgeoning field of human-autonomy teaming with emphasis on the challenges of identifying data requirements and human-autonomy teaming research needs for UAM. The innovative vision of community air taxi operations currently remains primarily conceptual and ill-defined, with few practical and working prototypes. The Autonomous System Technologies for Resilient Airspace Operations (ASTRAO) effort describes one of NASA’s increasingly autonomous technologies showcasing the potential of the human-autonomy teaming design approach. ASTRAO is a Simplified Vehicle Operation [2] UAM application that utilizes machine learning and data algorithms, coupled with human-autonomy teaming principles and human factors optimization. The technology research and development effort is intended to provide design solutions for future air taxi flight decks flown by less experienced and trained pilots, as well as for remote supervisory operations in which many vehicles are managed by a single ground station / remote pilot and/or air traffic service provider.

    Speaker Info:

    Vincent E. Houston

    Computer/Machine Learning Research Engineer

    NASA

  • Morning Keynote

    Speaker Info:

    Michael Seablom

    Chief Technologist

    Science Mission Directorate

    Michael Seablom is the Chief Technologist for the Science Mission Directorate at NASA Headquarters. He has the responsibility for surveying and assessing technology needs for the Heliophysics, Astrophysics, Earth Science, and Planetary Science Divisions, and is the primary liaison to the NASA Office of Chief Technologist and the Space Technology Mission Directorate.

  • Morning Keynote - Greg Zacharias

    Speaker Info:

    Greg Zacharias

    Chief Scientist

    DOT&E

    Dr. Greg Zacharias serves as Chief Scientist to the Director of Operational Test and Evaluation, providing scientific and technical (S&T) guidance on the overall approach to assessing the operational effectiveness, suitability, and survivability of major DOD weapon systems. He advises the DOT&E in critical S&T areas including: emerging technologies; modeling and simulation (M&S); human-systems integration; and test design/analysis. Dr. Zacharias also represents the DOT&E on technical groups focused on policy, programs, and technology assessments, interacting with the DOD, industry, and academia.

    Before this appointment, Dr. Zacharias was the Chief Scientist of the US Air Force (USAF), advising the Secretary and the Chief of Staff, providing assessments on a range of S&T issues affecting the Air Force mission, and interacting with other Air Staff principals, acquisition organizations, and S&T communities. He served on the Executive Committee of the Air Force Scientific Advisory Board (SAB), and was the principal USAF S&T representative to the civilian scientific/engineering community and the public. His office published an autonomous systems roadmap entitled “Autonomous Horizons: The Way Forward.”

    Earlier, Dr. Zacharias served as President and Senior Principal Scientist of Charles River Analytics, providing strategic direction for the Government Services and Commercial Solutions Divisions. Before co-founding Charles River, he was a Senior Scientist at Raytheon/BBN, where he developed and applied models of human decision-making in multi-agent dynamic environments. Earlier, as a Research Engineer at the CS Draper Laboratory, Dr. Zacharias focused on advanced human/machine interface design issues for the Space Shuttle, building on an earlier USAF assignment at NASA, where he was responsible for preliminary design definition of the Shuttle reentry flight control system.

    Dr. Zacharias served on the Air Force SAB for eight years, contributing to nine summer studies, including chairing a study on “Future Operations Concepts for Unmanned Aircraft Systems.” As a SAB member he also chaired the Human System Wing Advisory Group, was a member of Air Combat Command’s Advisory Group, and served as a technical program reviewer for the Air Force Research Laboratory. He was a member of the National Research Council (NRC) Committee on Human-Systems Integration for over ten years, supporting several NRC studies including a DMSO-sponsored study of military human behavior models, and co-chairing a follow-up USAF-sponsored study to identify promising DOD S&T investments in the area. He has served on the DOD Human Systems Technology Area Review and Assessment (TARA) Panel, Embry-Riddle’s Research Advisory Board, MIT’s Engineering Systems Division Advisory Board, the Board of the Small Business Technology Council (SBTC), and was the founding Chair of the Human Systems Division of the National Defense Industrial Association (NDIA).

    Dr. Zacharias obtained his BS, MS, and PhD degrees in Aeronautics and Astronautics at MIT, where he was an MIT Sloan Scholar. He is a Distinguished Graduate and Distinguished Alumnus of USAF Officer Training School (OTS), and has received the USAF Exceptional Civilian Service Award, and twice received the USAF Meritorious Civilian Service Award.

  • Natural Language Processing for Safety-Critical Requirements

    Abstract:

    Requirements specification flaws are still the biggest contributing factor to most accidents related to software. Most NASA projects have safety-critical requirements that, if implemented incorrectly, could lead to serious safety implications and/or mission-ending scenarios. There are normally thousands of system-/subsystem-/component-level requirements that need to be analyzed for safety criticality early in the project development life cycle. Manually processing such requirements is typically time-consuming and prone to error. To address this, we implemented and tested text classification models to identify requirements that are safety-critical within project documentation. We found that a naïve Bayes classifier was able to identify all safety-critical requirements with an average false positive rate of 41.35%. Future models trained on larger project requirement datasets may achieve even better performance, reducing the burden of processing requirements on safety and mission assurance personnel and improving the safety of NASA projects.
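
    To make the approach concrete, a minimal sketch of a naïve Bayes requirement classifier in Python with scikit-learn is shown below; the toy requirement sentences and labels are invented for illustration and are not NASA project data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus; a real project would use thousands of labeled requirements.
requirements = [
    "The system shall terminate thrust if chamber pressure exceeds the safe limit.",
    "The display shall use a sans-serif font for all menu labels.",
    "The software shall inhibit deployment while personnel are inside the hazard zone.",
    "The user manual shall be delivered in PDF format.",
]
labels = [1, 0, 1, 0]  # 1 = safety-critical, 0 = not safety-critical

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(requirements, labels)

new_req = ["The valve shall close automatically upon loss of power."]
p_critical = clf.predict_proba(new_req)[0, 1]
# A low decision threshold trades false positives for catching every critical requirement.
print(f"P(safety-critical) = {p_critical:.3f}")
```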

    Speaker Info:

    Ying Shi

    Safety and Mission Assurance

    NASA GSFC

  • Network Analysis

    Abstract:

    Understanding the connections and dependencies that exist in our data is becoming ever more important. This one-day course on network analysis will introduce many of the basic concepts of networks, including descriptive statistics (e.g., centrality, prestige, etc.), community detection, and an introduction to nonparametric inferential tests. Additionally, cutting-edge methods for creating so-called “psychometric networks” that focus on the connections between variables will be covered. Throughout, we will discuss visualization methods that can highlight the nature of connections between entities in the network, whether they are observations, variables, or both.
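
    As a small taste of the descriptive statistics and community detection covered in the course, the sketch below uses Python's networkx on a built-in example graph; the graph and package choice are illustrative assumptions, not course materials.

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()  # stand-in for an observed network of entities and connections

# Descriptive statistics: who is most connected, and who lies on the most shortest paths.
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
print("highest-degree node:", max(degree, key=degree.get))
print("highest-betweenness node:", max(betweenness, key=betweenness.get))

# Community detection via greedy modularity maximization.
for i, nodes in enumerate(community.greedy_modularity_communities(G)):
    print(f"community {i}: {sorted(nodes)}")
```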

    Speaker Info:

    Doug Steinley

    University of Missouri

    Doug Steinley is a Professor in the Psychological Sciences Department at the University of Missouri. His research focuses on multivariate statistical methodology, with a primary interest in cluster analysis and social network analysis. His research in cluster analysis covers both traditional cluster analytic procedures (e.g., k-means cluster analysis) and more modern techniques (e.g., mixture modeling). Because the general partitioning problem can be formulated in graph-theoretic terms, his research also involves combinatorics and social network analysis.

  • Opening Keynote - Norty Schwartz

    Speaker Info:

    Norton "Norty" Schwartz

    President and CEO

    IDA

    General Norton A. Schwartz serves as President and CEO of the Institute for Defense Analyses (IDA), a nonprofit corporation operating in the public interest. IDA manages three Federally Funded Research and Development Centers that answer the most challenging U.S. security and science policy questions with objective analysis leveraging extraordinary scientific, technical, and analytic expertise. At IDA, General Schwartz (U.S. Air Force, retired) directs the activities of more than 1,000 scientists and technologists employed by IDA.
    General Schwartz has a long and prestigious career of service and leadership that spans over five decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his 6-year tenure at BENS, he was also a member of IDA’s Board of Trustees.
    Prior to retiring from the U.S. Air Force, General Schwartz served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975. General Schwartz is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College. He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. General Schwartz has been married to Suzie since 1981.

  • Optimal Designs for Multiple Response Distributions

    Abstract:

    Having multiple objectives is common in experimental design. However, a design that is optimal for a normal response can be very different from a design that is optimized for a nonnormal response. This application uses a weighted optimality criterion to identify an optimal design with continuous factors for three different response distributions. Both linear and nonlinear models are incorporated with normal, binomial, and Poisson response variables. A JMP script employs a coordinate exchange algorithm that seeks to identify a design that is useful for all three of these responses. The impact from varying the prior distributions on the nonlinear parameters as well as changing the weights on the responses in the criterion is considered.
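
    The talk's implementation is a JMP script; for readers unfamiliar with the algorithm itself, the sketch below is a stripped-down coordinate exchange search for a D-optimal design under a single first-order linear model in Python, with the weighted multi-distribution criterion omitted. The run size, candidate levels, and model form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def model_matrix(X):
    """First-order (main effects) model matrix with an intercept."""
    return np.column_stack([np.ones(len(X)), X])

def d_criterion(X):
    """Log-determinant of the information matrix (larger is better)."""
    sign, logdet = np.linalg.slogdet(model_matrix(X).T @ model_matrix(X))
    return logdet if sign > 0 else -np.inf

def coordinate_exchange(n_runs=12, n_factors=3, n_starts=5, passes=20):
    levels = np.linspace(-1, 1, 21)  # candidate settings for each continuous factor
    best_X, best_val = None, -np.inf
    for _ in range(n_starts):                      # random restarts
        X = rng.choice(levels, size=(n_runs, n_factors))
        for _ in range(passes):
            improved = False
            for i in range(n_runs):                # visit each run...
                for j in range(n_factors):         # ...and each coordinate in turn
                    current, keep = d_criterion(X), X[i, j]
                    for lev in levels:             # try every candidate level
                        X[i, j] = lev
                        if d_criterion(X) > current + 1e-10:
                            current, keep, improved = d_criterion(X), lev, True
                    X[i, j] = keep                 # retain the best setting found
            if not improved:
                break
        if d_criterion(X) > best_val:
            best_X, best_val = X.copy(), d_criterion(X)
    return best_X, best_val

design, logdet = coordinate_exchange()
print(design, logdet)
```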

    Speaker Info:

    Brittany Fischer

    Arizona State University

  • Orbital position in space: What is Truth? A study comparing two astrodynamics systems

    Abstract:

    The Air Force maintains a space catalog of orbital information on tens of thousands of space objects, including active satellites and satellite debris. They also computationally project where they expect objects to be in the future. Each day, the Air Force issues warnings to satellite owner-operators about potential conjunctions (space objects passing near each other), which often results in one or both of the satellites maneuvering (if possible) for safety of flight. This problem grows worse as mega-constellations, such as SpaceX's Starlink, are launched.

    Speaker Info:

    Jason Sheldon

    Research Staff Member

    IDA

  • Overarching Tracker: A Trend Analysis of System Performance Data

    Speaker Info:

    Caitlan Fealing

    Data Science Fellow

    IDA

  • Physics-Informed Deep Learning for Modeling and Simulation under Uncertainty

    Abstract:

    Recently, a Department of Energy (DOE) report was released on the concept of scientific machine learning (SML), which is broadly defined as "a computational technology that can be trained, with scientific data, to augment or automate human skills." [1] As the demand for machine learning (ML) in science and engineering rapidly increases, it is important to have confidence that the output of the ML algorithm is representative of the phenomena, processes, or physics being modeled. This is especially important in high-stakes fields such as defense and aerospace. In the DOE report, three research themes were highlighted with the aim of providing confidence in ML implementations. In particular, ML algorithms should be domain-aware, interpretable, and robust.
    Deep learning has become a ubiquitous term over the past decade due to its ability to model high-dimensional complex processes, but domain awareness, interpretability, and robustness in these large neural networks (NNs) are often hard to achieve. Recent advances in physics-informed neural networks (PINNs) are promising in that they can provide both domain awareness and a degree of interpretability [2, 3, 4]. These algorithms take advantage of the breadth of scientific knowledge built over centuries by fusing governing partial differential equations into the NN training process. In this way, PINNs output physically admissible solutions. However, PINNs are generally deterministic, meaning interpretability and robustness suffer as it is unclear how uncertainty affects the model.
    Another noteworthy deep learning algorithm is the generative adversarial network (GAN). GANs are capable of modeling probability distributions in both forward and inverse problems, and thus have received a flood of interest with over 15,000 citations of the seminal paper [5] in six years. A natural next step is to combine both PINNs and GANs to address all three themes laid out in [1]. The resultant physics-informed GAN (PI-GAN) is capable of both modeling physical processes and simultaneously quantifying uncertainty. A limited number of works have already demonstrated the success of PI-GANs [6, 7]. This talk will present an introduction to PI-GANs as well as an example of current NASA research implementing these networks.
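
    For readers new to the idea, a toy physics-informed neural network in Python/PyTorch is sketched below for the simple ODE u' + u = 0 with u(0) = 1 (exact solution e^-x); the network size, optimizer settings, and problem are illustrative assumptions and bear no relation to the NASA application discussed in the talk.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x0 = torch.zeros(1, 1)  # location of the initial condition u(0) = 1

for step in range(5000):
    opt.zero_grad()
    x = 2.0 * torch.rand(64, 1)          # collocation points in [0, 2]
    x.requires_grad_(True)
    u = net(x)
    du_dx = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    physics_residual = du_dx + u         # governing equation u' + u = 0 fused into the loss
    loss = (physics_residual ** 2).mean() + (net(x0) - 1.0).pow(2).mean()
    loss.backward()
    opt.step()

x_test = torch.linspace(0.0, 2.0, 5).reshape(-1, 1)
print(torch.cat([net(x_test), torch.exp(-x_test)], dim=1))  # network vs. exact exp(-x)
```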

    Speaker Info:

    Patrick Leser

    NASA Langley Research Center

  • Post-hoc Uncertainty Quantification for Remote Sensing Observing Systems

    Abstract:

    The ability of spaceborne remote sensing data to address important Earth and climate science problems rests crucially on how well the underlying geophysical quantities can be inferred from these observations. Remote sensing instruments measure parts of the electromagnetic spectrum and use computational algorithms to infer the unobserved true physical states. However, the accompanying uncertainties, if they are provided at all, are usually incomplete. There are many reasons for this, including but not limited to unknown physics, computational artifacts and compromises, unknown uncertainties in the inputs, and more.

    In this talk I will describe a practical methodology for uncertainty quantification of physical state estimates derived from remote sensing observing systems. The method we propose combines Monte Carlo simulation experiments with statistical modeling to approximate conditional distributions of unknown true states given point estimates produced by imperfect operational algorithms. Our procedure is carried out post-hoc; that is, after the operational processing step because it is not feasible to redesign and rerun operational code. I demonstrate the procedure using four months of data from NASA's Orbiting Carbon Observatory-2 mission, and compare our results to those obtained by validation against data from the Total Carbon Column Observing Network where it exists.
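
    A heavily simplified sketch of the simulate-then-model idea (not the OCO-2 methodology itself) is shown below in Python: synthetic true states are pushed through a deliberately imperfect "operational" estimator, and a simple regression of truth on estimate with constant residual spread then supplies post-hoc intervals. All numbers and the linear/Gaussian form are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Monte Carlo experiment: simulate "true" states and an imperfect operational retrieval.
x_true = rng.normal(400.0, 10.0, size=20_000)                          # hypothetical true states
x_hat = 0.9 * x_true + 35.0 + rng.normal(0.0, 3.0, size=x_true.size)   # biased, noisy point estimates

# Statistical model of the conditional distribution of truth given the estimate:
# a simple linear regression with constant residual spread (an assumption for illustration).
A = np.column_stack([np.ones_like(x_hat), x_hat])
coef, *_ = np.linalg.lstsq(A, x_true, rcond=None)
resid_sd = np.std(x_true - A @ coef, ddof=2)

def posthoc_interval(estimate, level=0.95):
    """Approximate central interval for the true state given an operational point estimate."""
    center = coef[0] + coef[1] * estimate
    half = norm.ppf(0.5 + level / 2.0) * resid_sd
    return center - half, center + half

print(posthoc_interval(395.0))
```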

    Speaker Info:

    Amy Braverman

    Principal Statistician

    Jet Propulsion Laboratory, California Institute of Technology

  • Practical Applications for Functional Data Analysis in T&E

    Abstract:

    Testing today’s complex systems often requires advanced statistical methods to properly characterize and optimize measures of performance as a function of input factors. One promising area to more precisely model system behavior when the response is a curve or function over several measured time units is Functional Data Analysis (FDA). Some input factors such as sensor data could also be functions rather than held constant as is often assumed over the duration of the test run. Recent enhancements in common statistical software programs used across DoD and NASA now make FDA much more accessible to the analytical test community. This presentation will address the fundamental principles, workflow, and interpretation of results from FDA using a designed experiment for an autonomous system as an example. Additionally, we will address how to use FDA to establish a level of equivalence for modeling & simulation verification & validation efforts.
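
    As a minimal illustration of treating a curve-valued response, the sketch below simulates a set of functional responses on a common time grid and extracts functional principal component scores via an SVD of the mean-centered curve matrix; those scores could then feed a conventional DOE analysis. The simulated curves and two-component truncation are assumptions for illustration, not the software workflow shown in the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)   # common time grid for every test run
n_curves = 40

# Simulated functional responses: a shared signal plus run-to-run variation and noise.
scores_true = rng.normal(size=(n_curves, 2))
curves = (np.sin(2 * np.pi * t)
          + scores_true[:, [0]] * np.cos(2 * np.pi * t)
          + scores_true[:, [1]] * t
          + rng.normal(scale=0.05, size=(n_curves, t.size)))

# Functional PCA via SVD of the mean-centered curve matrix.
mean_curve = curves.mean(axis=0)
U, s, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
explained = s**2 / np.sum(s**2)
fpc_scores = U[:, :2] * s[:2]    # per-run scores; usable as scalar responses in a DOE model

print("variance explained by first two components:", np.round(explained[:2], 3))
```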

    Speaker Info:

    James Wisnowski

    Principal Consultant and Co-owner

    Adsurgo LLC

  • Predicting System Failures - A Statistical Approach to Reliability Growth Modeling

    Abstract:

    Reliability, in colloquial terms, is the ability of a system or piece of equipment to perform some required function when and where we need it to. We argue that new and unproven military equipment can benefit from a statistical approach for modeling reliability growth. The modern “standard” for these programs is the AMSAA Planning Model based on Projection Methodology (PM2). We describe how to augment PM2 with a statistical perspective to make reliability prediction more “data informed.” We have developed teaching “modules” to help elucidate this process from the ground up.
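
    PM2 itself involves planning parameters beyond what fits here, but the closely related power-law (Crow/AMSAA) NHPP model conveys the statistical flavor; the sketch below computes its maximum-likelihood estimates from hypothetical failure times in Python, with the data and truncation time invented for illustration.

```python
import numpy as np

# Cumulative failure times (hours) from a hypothetical development test, time-truncated at T.
failure_times = np.array([45.0, 110.0, 196.0, 310.0, 452.0, 608.0, 810.0, 1090.0])
T = 1200.0

# Maximum-likelihood estimates for the power-law (Crow/AMSAA) NHPP model, E[N(t)] = lam * t**beta.
n = failure_times.size
beta_hat = n / np.sum(np.log(T / failure_times))
lam_hat = n / T**beta_hat

current_mtbf = 1.0 / (lam_hat * beta_hat * T**(beta_hat - 1.0))
print(f"beta = {beta_hat:.3f} (beta < 1 indicates reliability growth)")
print(f"expected cumulative failures by 2000 h: {lam_hat * 2000.0**beta_hat:.1f}")
print(f"current (instantaneous) MTBF: {current_mtbf:.0f} h")
```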

    Speaker Info:

    Kate Sanborn

    Invited Speaker

    North Carolina State University

  • Process/Workflow-Oriented View for Decision-Support Systems

    Abstract:

    Across application domains, analysts are tasked with an untenable situation of manually completing a big data analysis of a mix of quantitative and qualitative information sets. Human decision-making requires that evidence gathered from sources such as experiments, engineering analysis, and expert judgment be transformed into an appropriate format and presentation style. Distillation and interpretation of multi-source data can be supported through tools or decision-support systems that include automated features to reduce the mental burden on human analysts. Analysts benefit from a data-informed support tool that provides the correct information, at the right time, to arrive at the correct solution under uncertainty. My research has coupled a process/workflow-oriented view with knowledge and skill elicitation techniques to predict information analysts need and how they interact with and transform that information. Thus, this data-informed approach allows for a mapping of process steps and tools that will benefit analysts. As the state of data as it exists today is only likely to grow, not diminish over time, an approach to efficiently organizing and interpreting the data is crucial. Ultimately, improved decision making is realized across an entire workflow that is sustainable across time.

    Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

    Speaker Info:

    Nicole Murchison

    Systems Research and Analysis

    Sandia National Laboratories

  • Quantifying Computational Uncertainty in Supersonic Flow Predictions without Experimental Measurements

    Abstract:

    With the advancement of computational modeling, there is a push to reduce historically necessary experimental testing requirements for assessing vehicle component and system level performance. As a result, uncertainty quantification is a necessary part of predictive modeling, particularly regarding the modeling approach. In the absence of experimental data, model-form uncertainty may not be easily determined and model calibration may not be possible. Therefore, quantifying the potential variability as a result of model selection is required to accurately quantify performance, robustness, and reliability. This talk will outline a proposed approach to quantifying uncertainty in a variety vehicle applications with focus given particularly to supersonic flow applications. The aim is to first identify key sources of uncertainty in computational modeling, such as spatial and temporal discretization and turbulence modeling. Then, the classification and treatment of uncertainty sources is discussed, along with the potential impact of these uncertainties on performance predictions. Lastly, a description of five upcoming tests in the NASA Langley Unitary Plan Wind Tunnel designed to test predicative capability are briefly described.

    Speaker Info:

    Thomas West

    NASA Langley

  • Quantifying Science Needs and Mission Requirements through Uncertainty Quantification

    Abstract:

    With the existing breadth of Earth observations and our ever-increasing knowledge about our home planet, it proves more and more difficult to identify gaps in our current knowledge and the observations to fill these gaps. Sensitivity analyses (SA) and uncertainty quantification (UQ) in existing data and models can help to not only identify and quantify gaps but also provide the means to suggest how much impact a targeted new observation will have on the science in this field. This presentation will discuss general approaches and specific examples (e.g. Sea Level Rise research) how we use SA and UQ to systematically assess where gaps in our current understanding of a science area lie, how to identify the observations that fills the gap, and how to evaluate the expected impact of the observation on the scientific understanding in this area.

    Speaker Info:

    Carmen Boening

    Deputy Manager Earth Science Section/Research Scientist

    Jet Propulsion Laboratory/California Institute of Technology

    Dr. Carmen Boening is a Climate Scientist and Deputy Manager of the Earth Science Section at NASA’s Jet Propulsion Laboratory, California Institute of Technology. She received her PhD in physics with a focus on physical oceanography and space geodesy from the University of Bremen, Germany in 2009. After a postdoctoral appointment 2009-2011, she started her current position in climate science at JPL. Since 2015, her responsibilities have expanded to management roles, including the group supervisor role for the Sea Level and Ice group (2015-2018), Project Scientist of the Gravity Recovery and Climate Experiment (GRACE), and since 2018 Deputy Manager of the Earth Science Section at JPL. Her research focus is sea level science with an emphasis on how interannual fluctuations in the global water cycle influence sea level in the short and long term. Motivated by the fact that interannual and decadal variability have a significant impact on trend estimates and associated uncertainties, the estimation of uncertainties in climate predictions has become a significant part of her research.

  • Quantifying Uncertainty in Reporting System Usability Scale Results

    Abstract:

    Much work has been conducted over the last two decades on standardizing usability surveys for determining the usability of a new system. In this presentation, we analyze not what we ask users, but rather how we report results from standard usability surveys such as the System Usability Scale (SUS). When the number of individuals surveyed is large, classical statistical techniques can be leveraged; however, as we will demonstrate, due to the skewness in the data these techniques may be suboptimal when the number of individuals surveyed is small. In such small-sample circumstances, we argue for use of the bias-corrected and accelerated bootstrap confidence interval. We further demonstrate how Bayesian inference can be leveraged to take advantage of the over 10 years' worth of data that exists for SUS surveys. Finally, we demonstrate an online app that we have built to aid practitioners in quantifying the uncertainty in their SUS surveys.
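
    For small SUS samples, the bias-corrected and accelerated bootstrap interval mentioned above can be computed directly with SciPy, as in the sketch below; the example scores are made up for illustration and are unrelated to the presenters' app or data.

```python
import numpy as np
from scipy.stats import bootstrap

# Hypothetical SUS scores from a small test event (n = 8 users).
sus_scores = np.array([72.5, 85.0, 47.5, 90.0, 67.5, 77.5, 55.0, 82.5])

res = bootstrap((sus_scores,), np.mean, confidence_level=0.95,
                n_resamples=9999, method="BCa", random_state=0)
print(f"mean SUS = {sus_scores.mean():.1f}")
print(f"95% BCa interval: ({res.confidence_interval.low:.1f}, {res.confidence_interval.high:.1f})")
```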

    Speaker Info:

    Nick Clark

    Assistant Professor

    West Point - Math Department

  • Reevaluating Planetary Protection Bioburden Accounting and Verification Reporting

    Abstract:

    Biological cleanliness of launched spacecraft destined for celestial bodies is governed by International Planetary Protection Guidelines. In particular, spacecraft that are targeting bodies which are of scientific interest in understanding the origins of life or thought to harbor extant life must undergo microbial reduction and recontamination prevention regimes throughout the hardware assembly, test, and launch operations that result in a direct verification of biological cleanliness of spacecraft surfaces. As a result of this verification, associated biologicals are enumerated on petri dishes and then numerically treated to account for sample volumes, sample device efficiency, and laboratory processing efficiencies to arrive at a bioburden density. The current NASA approach utilizes a 1950s Viking-based mathematical treatment which factors in the raw colony count, sample device and processing efficiency, and fraction of extract analyzed. Historically, per NASA direction, samples are grouped based upon flight hardware structural and functional proximity, and if the value of a raw colony count is zero it is changed to one. In previous missions that launched from 1996 to 2018, a combination of Poisson and Gaussian statistics that evolved from mission to mission was utilized. In 2019, the statistical approach for performing bioburden accounting and verification reporting was re-evaluated to develop a technique that is both mathematically and biologically valid. Multiple mission datasets have been analyzed at a high level, and it has been determined that since there is a significant data set for each mission with low incidence rates of biological counts, a Bayesian model would be appropriate. Data from the InSight mission was then utilized to demonstrate the application of these models and this approach on spacecraft sample data using both informed and non-informed priors; the results were subsequently compared to the current and historical mission mathematical treatments. The Bayesian models were within family with the previous and heritage approaches, and as an added benefit were able to provide a range and associated confidence intervals for the reported values. From the preliminary work, we propose that these models present a valid mathematical and biological approach for reporting spacecraft bioburden to be utilized in final requirements reporting and as an input into the initial bioburden population used for probabilistic risk assessments. Further development of the models will include a full spacecraft bioburden verification comparison as well as the utilization of ground truth experiments, as deemed necessary.
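
    As a much-simplified illustration of the Bayesian direction described above (not the models developed for InSight), the sketch below treats colony counts as Poisson observations of an underlying bioburden density with a conjugate gamma prior, yielding a posterior mean and credible interval; all counts, areas, efficiencies, and prior parameters are invented for illustration.

```python
import numpy as np
from scipy import stats

# Observed colony counts and the effective area each sample represents (all values invented),
# after folding in sampling-device and laboratory processing efficiencies.
counts = np.array([0, 1, 0, 0, 2, 0, 1, 0])         # colonies per sample
effective_area = np.full(8, 0.25) * 0.3 * 0.5        # m^2 x device efficiency x processing efficiency

# Conjugate Gamma(a, b) prior on bioburden density (spores per m^2); weakly informative here.
a_prior, b_prior = 0.5, 0.1
a_post = a_prior + counts.sum()                      # Poisson counts update the shape
b_post = b_prior + effective_area.sum()              # total effective exposure updates the rate

posterior = stats.gamma(a_post, scale=1.0 / b_post)
print("posterior mean density:", round(posterior.mean(), 2))
print("95% credible interval:", tuple(round(v, 2) for v in posterior.interval(0.95)))
```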

    Speaker Info:

    James Benardini

    Sr. Planetary Protection Engineer

    Jet Propulsion Laboratory

  • Resilience and Productive Safety Considerations for In-Time System-Wide Safety Assurance

    Abstract:

    Authors: Lawrence J. Prinzel III, Jon B. Holbrook, Kyle E. Ellis, Chad L. Stephens
    New, innovative technologies and operational concepts will be required to meet the ever-increasing global demands on air transportation. The NASA System-Wide Safety (SWS) project is focused on how future aviation advances can meet demand while maintaining today’s ultra-safe system safety levels. As aviation safety evolves, it shall require new ways of thinking about safety: integrating a wide range of existing and new safety systems and practices, creating and enhancing tools and technologies, leveraging access to system-wide data and data fusion, improving data analysis capabilities, and developing new methods for in-time risk monitoring and detection, hazard prioritization and mitigation, safety assurance decision support, and in-time integrated system analytics [NRC, 2018]. To meet these needs, the SWS project has developed research priorities including In-time System-wide Safety Assurance (ISSA) and development of an In-time Aviation Safety Management System (IASMS) [Ellis et al., 2019]. As part of this effort, the concepts of “resilience” and “productive safety” are being studied. Traditional approaches to aviation safety have focused on what can go wrong and how to prevent it. Another approach to thinking about system safety should reflect not only “avoiding things that go wrong” (protective safety) but also “ensuring that things go right” (productive safety), which together enable a system to exhibit resilience. Ongoing SWS research is focused on application of these concepts for ISSA and design of an IASMS.
    NASA identified significant challenges and research needs for ISSA and IASMS for Urban Air Mobility (UAM) [NRC, 2018] [Ellis et al., 2019]. UAM is an emerging concept of operation that features small aircraft providing on-demand transportation over relatively short distances within urban areas. The SWS project has focused on development of UAM-domain safety monitoring and alerting tools, integrated predictive technologies for UAM-level application, and adaptive in-time safety threat management [Ellis et al., 2019]. Significant research challenges include how to identify data sources and indicators for in-time safety-critical risks, how to analyze those data to detect and prioritize risks, and how to optimize safety awareness and safety action decision support. UAM is also being used to evaluate the safety paradigm of “work-as-imagined,” which characterizes how people think their work is done in comparison to how work is actually done, as the two are all too often not the same. The challenges associated with ISSA and development of IASMS are significant even for existing air transportation system operations, where work-as-imagined and work-as-done can actually be compared. However, because UAM currently exists only as work-as-imagined, the safety challenges are far greater for meeting ISSA needs and design of IASMS.
    The present proposal shall discuss resilience and productive safety considerations for in-time safety assurance and safety management systems for UAM. Topics include challenges of collecting productive safety in-time data, granularity of data types and measurement, the need for new analytical methods, issues for identifying in-time productive safety metrics and indicators, and potential approaches toward quantification of resilience indices. Recommendations and future research directions shall also be described.

    Speaker Info:

    Lawrence J. Prinzel III

    Senior Aerospace Research Engineer

    NASA Langley Research Center

  • Sequential Testing and Simulation Validation for Autonomous Systems

    Abstract:

    Autonomous systems are expected to play a significant role in the next generation of DoD acquisition programs. New methods need to be developed and vetted, particularly for two groups we know well that will be facing the complexities of autonomy: a) test and evaluation, and b) modeling and simulation. For test and evaluation, statistical methods that are routinely and successfully applied throughout DoD need to be adapted to be most effective for autonomy, and some of our practices need to be stressed. One is sequential testing and analysis, which we illustrate to allow testers to learn and improve incrementally. The other group needing to rethink which practices are best for autonomy is the modeling and simulation community. Proposed are some statistical methods appropriate for modeling and simulation validation for autonomous systems. We look forward to your comments and suggestions.
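
    One classical building block for sequential testing and analysis is Wald's sequential probability ratio test, sketched below in Python for a pass/fail success probability; the hypothesized rates, risk levels, and trial outcomes are assumptions for illustration and are not the specific methods proposed in the talk.

```python
import math

def sprt_bernoulli(outcomes, p0=0.80, p1=0.90, alpha=0.05, beta=0.10):
    """Wald's sequential probability ratio test for a success probability (H0: p = p0 vs H1: p = p1)."""
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this log-likelihood ratio
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below it
    llr = 0.0
    for n, success in enumerate(outcomes, start=1):
        llr += math.log(p1 / p0) if success else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1 (meets the higher rate)", n
        if llr <= lower:
            return "accept H0", n
    return "continue testing", len(outcomes)

# Example: a run of mostly successful autonomy trials, scored pass/fail.
trials = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(sprt_bernoulli(trials))
```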

    Speaker Info:

    Jim Simpson

    Principal

    JK Analytics

  • Software Testing

    Abstract:

    Systematically testing software for errors and demonstrating that it meets the system specification is a necessary component of assuring trustworthiness of information systems. Software testing is often costly and time-consuming when conducted correctly, but the cost of poor-quality testing is even higher, especially for critical systems. This short course will provide an introduction to software testing, including the process of testing within the software development lifecycle, as well as techniques and considerations in choosing test cases for constructing comprehensive test suites to achieve coverage of the code and/or input space as relevant to the system under test. Existing tools for test automation and test suite construction will be presented.
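
    One common input-space coverage technique is combinatorial (pairwise) testing; the Python sketch below greedily builds a small test suite that covers every pair of parameter values. The parameters and values are invented for illustration, and production work would normally use a dedicated covering-array tool rather than this brute-force search.

```python
from itertools import combinations, product

def required_pairs(params):
    """Every (factor, value) pair combination that a pairwise suite must cover."""
    return {((f1, v1), (f2, v2))
            for f1, f2 in combinations(params, 2)
            for v1, v2 in product(params[f1], params[f2])}

def pairs_in(test, params):
    return {((f1, test[f1]), (f2, test[f2])) for f1, f2 in combinations(params, 2)}

def greedy_pairwise(params):
    """Greedily pick test cases that cover the most not-yet-covered pairs."""
    candidates = [dict(zip(params, values)) for values in product(*params.values())]
    uncovered, suite = required_pairs(params), []
    while uncovered:
        best = max(candidates, key=lambda t: len(pairs_in(t, params) & uncovered))
        suite.append(best)
        uncovered -= pairs_in(best, params)
    return suite

params = {"os": ["linux", "windows"],
          "browser": ["chrome", "firefox", "safari"],
          "db": ["postgres", "sqlite"]}
suite = greedy_pairwise(params)
print(f"{len(suite)} tests cover all {len(required_pairs(params))} pairs "
      f"(vs. {2 * 3 * 2} exhaustive combinations)")
```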

    Speaker Info:

    Erin Lanus

    Virginia Tech

    Erin Lanus is a Research Assistant Professor at the Hume Center for National Security and Technology at Virginia Tech. She has a Ph.D. in Computer Science with a concentration in cybersecurity from Arizona State University. Her experience includes work as a Research Fellow at University of Maryland Baltimore County and as a High Confidence Software and Systems Researcher with the Department of Defense. Her current interests are software and combinatorial testing, machine learning in cybersecurity, and artificial intelligence assurance.

  • Spectral Embedding and Cyber Networks

    Abstract:

    We are given a time series of graphs, for example those defined by connections between computers in network flows. Several questions are relevant to cyber security:
    1) What are the natural groupings (clustering) of the computers on the network, and how do these evolve in time?
    2) Is the graph "abnormal" for this day/time and hence indicative of a large scale problem?
    3) Are certain nodes acting "abnormally" or "suspiciously"?
    4) Given that some computers cannot be uniquely resolved (due to various types of dynamic IP address assignments), can we pair a newly observed computer with its previous instantiation in an earlier hour?
    In this talk, I will give a very brief introduction to some spectral graph methods that have shown promise for answering some of these questions, and present some preliminary results. These methods are much more widely applicable, and if time permits I will discuss some of the areas to which they are currently being applied.
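
    To make "spectral graph methods" concrete, the sketch below embeds a toy graph using eigenvectors of its normalized Laplacian and clusters the nodes in that embedding; the example graph and two-cluster choice are illustrative assumptions, not the network-flow data discussed in the talk.

```python
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans

# Toy graph standing in for hosts (nodes) and observed network-flow connections (edges).
G = nx.karate_club_graph()
L = nx.normalized_laplacian_matrix(G).toarray()

# Spectral embedding: eigenvectors of the normalized Laplacian for the k smallest eigenvalues.
k = 2
eigvals, eigvecs = np.linalg.eigh(L)
embedding = eigvecs[:, :k]
embedding = embedding / (np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12)

# Natural groupings of the nodes come from clustering in the embedded space.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
print(dict(zip(G.nodes(), labels)))
```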

    Speaker Info:

    David Marchette

    Principal Scientist

    Naval Surface Warfare Center, Dahlgren Division

  • STAT COE Autonomy Test and Evaluation Workshop Highlights

    Abstract:

    The Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE) and Science of Test Research Consortium Advancements in Test and Evaluation of Autonomous Systems (ATEAS) Workshop, held 29-31 October 2019 at the Wright Brothers Institute, is part of a study being conducted on behalf of the Office of the Secretary of Defense (OSD). The goal of the study is to determine the current state of autonomous systems used within the Department of Defense (DoD), industry, and academia, with a focus on the test, evaluation, verification, and validation of those systems. The workshop addressed two overarching study objectives: 1) identify and develop methods and processes, and identify lessons learned needed to enable rigorous test and evaluation (T&E) of autonomous systems; and 2) refine current challenges and gaps in DoD methods, processes, and test ranges to rigorously test and evaluate autonomous systems. The workshop also introduced the STAT COE’s data call to the DoD, industry, and academia. This data call further informs the autonomy community on the specific efforts of, and challenges faced by, collaborators in T&E of autonomous systems. Finally, the workshop provided STAT COE members with an excellent opportunity to form partnerships with members of the DoD with the intent of finding an autonomous system pilot program in each service branch that could benefit from STAT COE support and provide first-hand experience in T&E of autonomous systems. Major takeaways from the presentations and panels included updated information on the challenges originally identified by the STAT COE in 2015, as well as new challenges related to modeling and simulation and data ownership, storage, and sharing within the DoD. The workshop also yielded information about current efforts for T&E of autonomous systems in DoD, industry, and academia; two completed data calls; a targeted population to which to send the data call; and several potential pilot programs which the STAT COE could support. The next steps of the STAT COE will be to distribute the workshop report and data call, and engage with pilot programs to get first-hand experience in T&E of autonomous systems and attempt to find solutions to the challenges currently identified.

    Speaker Info:

    Troy Welker

    Analyst

    STAT COE

  • Statistical Analysis of a Transonic Aerodynamic Calibration

    Abstract:

    The Monte Carlo method of uncertainty analysis was used to characterize the uncertainty of tunnel conditions within the calibration of a wind tunnel. To calibrate the tunnel, a long static pipe consisting of 444 static pressure ports was used, and data from this calibration provided the inputs for the analysis. The Monte Carlo analysis examines the range of potential outcomes by propagating the generated data through the equations for tunnel conditions. A method of uncertainty analysis would normally encompass precision, bias, and fossilized uncertainty; however, for this particular analysis, precision uncertainty is assumed to be negligible. All remaining uncertainties were calculated using a sigma value of 2, corresponding to a 95% confidence level for a normal Gaussian distribution. At the end of the analysis, a comparison was made to assess the effect of fossilized uncertainty on the overall uncertainty, which showed that this often overlooked component does cause a noticeable change in the overall uncertainty of the value. With the consideration of fossilized uncertainty, the Monte Carlo method is a useful approach for characterizing the uncertainty of tunnel conditions for a wind tunnel calibration.
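
    A heavily simplified sketch of Monte Carlo propagation for one tunnel condition (Mach number from an isentropic total-to-static pressure ratio) is shown below in Python; the nominal pressures and 2-sigma uncertainties are invented for illustration and are not the calibration's values.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
gamma = 1.4

# Nominal measurements (made-up values) with assumed 2-sigma bias/fossilized uncertainties.
p_total = rng.normal(101_325.0, 120.0 / 2.0, size=n)   # stagnation pressure, Pa
p_static = rng.normal(72_000.0, 90.0 / 2.0, size=n)    # static-pipe pressure, Pa

# Propagate the sampled pressures through the isentropic relation for Mach number.
mach = np.sqrt(2.0 / (gamma - 1.0) * ((p_total / p_static) ** ((gamma - 1.0) / gamma) - 1.0))

lo, hi = np.percentile(mach, [2.5, 97.5])
print(f"Mach = {mach.mean():.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```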

    Speaker Info:

    Lindsey Drone

    Data Engineer

    NASA Ames Research Center

  • Statistical Engineering for Service Life Prediction of Polymers

    Abstract:

    Economically efficient selection of materials depends on knowledge of not just the immediate properties, but the durability of those properties. For example, when selecting building joint sealant, the initial properties are critical to successful design. These properties change over time and can result in failure in the application (buildings leak, glass falls). A NIST-led industry consortium has a research focus on developing new measurement science to determine how the properties of the sealant change with environmental exposure. In this talk, the two-decade history of the NIST-led effort will be examined through the lens of Statistical Engineering, specifically its 6 phases: (1) Identify the problem. (2) Provide structure. (3) Understand the context. (4) Develop a strategy. (5) Develop and execute tactics. (6) Identify and deploy a solution.

    Phases 5 and 6 will be the primary focus of this talk, but all of the phases will be discussed. The tactics of phase 5 were often themselves multi-month or multi-year research problems. Our approach to predicting outdoor degradation based only on accelerated weathering in the laboratory has been revised and improved many times over several years. In phase 6, because of NIST’s unique mission of promoting U.S. innovation and industrial competitiveness, the focus has been outward on technology transfer and the advancement of test standards. This may differ from industry and other government agencies where the focus may be improvement of processes inside of the organization.

    Speaker Info:

    Adam Pintar

    Mathematical Statistician

    National Institute of Standards and Technology

    Adam Pintar is a Mathematical Statistician at the National Institute of Standards and Technology.  He applies statistical methods and thinking to diverse application areas including Physics, Chemistry, Biology, Engineering, and more recently Social Science.  He received a PhD in Statistics from Iowa State University.

  • Strategies for Sequential Experimentation

    Abstract:

    Design of experiments is typically presented as a “one shot” approach. However, it may be more efficient to divide the experiment into smaller pieces, thus expending resources in a smarter, more adaptive manner. This sequential approach becomes especially suitable when experimenters begin with very little information about the process, for example, when scaling up a new product. It allows for better definition of the design space, adaptation to unexpected results, estimation of variability, reduction in waste, and validation of the results.

    The statistical literature primarily focuses on sequential experimentation in the context of screening, which in our experience is only the beginning of an overall strategy for experimentation. This tutorial begins with screening and then goes well beyond this first step for more complete coverage of this important topic:

    1. Screening before experimentation
    2. Adding and removing factors during the experiment
    3. Expanding, shrinking, or repairing the experimental design space during the experiment
    4. Validation experiments
    5. And more!

    Several real-world examples will be provided and demonstrated using software, thus providing attendees with a solid foundation for the practical application of sequential experimentation.

    Speaker Info:

    Martin Bezener

    Director of Research & Development

    Stat-Ease

  • Taking Down a Turret: Introduction to Cyber Operational Test and Evaluation

    Abstract:

    Cyberattacks are in the news every day, from data breaches of banks and stores to ransomware attacks shutting down city governments and delaying school years. In this mini-tutorial, we introduce key cybersecurity concepts and methods for conducting cybersecurity test and evaluation. We walk you through a live demonstration of a cyberattack and provide real-world examples of each major step we take. The demonstration shows an attacker gaining command and control of a Nerf turret. We leverage tools commonly used by red teams to explore an attack scenario involving phishing, network scanning, password cracking, pivoting, and finally creating a mission effect. We also provide a defensive view and analytics that show artifacts left by the attack path.

    Speaker Info:

    OED Cyber Lab

    IDA

  • The Challenge of Data for Predicting Human Errors

    Abstract:

    The field of human reliability analysis (HRA) seeks to predict sources of human error for safety-critical systems. Much of the dominant research in HRA was historically conducted for nuclear power, where regulations required risk models of hardware and humans to ensure the safe operation of nuclear power plants in the face of potential accident situations. A challenge for HRA is that the incidence of nuclear accidents has fortunately been extremely low. Thus, actuarial data do not serve as a good source for informing predictions. Simulators make it possible to train performance for rare events, but data collection has largely focused on performance where human errors have occurred, omitting a denominator of good and bad performance that would be useful for prediction. As HRA has branched into new domains, there is the additional challenge of determining to what extent the available data, which come primarily from control room operations, can be generalized to other types of human activities. To address these shortcomings, new data collection efforts are underway, focusing on (1) more comprehensive logging of operator performance in simulators, (2) development of new testbeds for collecting data, and (3) data mining of existing human performance data that were not originally collected for human error analysis. In this talk, I’ll review underlying assumptions of HRA and map new data sources to solving data needs in new domains such as defense and aerospace.

    Speaker Info:

    Thomas Ulrich

    Human Factors & Reliability Associate Scientist

    Idaho National Laboratory

  • The OTA Perspective on the Challenges of Testing in the Evolving Acquisition Environment and Proposed Mitigations

    Abstract:

    During the fall of 2019, AFOTEC led a cross-OTA rapid improvement event to identify and mitigate challenges Operational Test (OT) teams from all services are facing in the evolving acquisition environment. Surveys were sent out to all of the service OTAs, AFOTEC compiled the results, and a cross-OTA meeting was held at the Institute for Defense Analyses (IDA) to determine the most significant challenges to address. This presentation discusses the selected challenges and the proposed mitigations that were briefed to the Director, Operational Test and Evaluation (DOT&E) and the Service OTA Commanders at the OTA Roundtable in November 2019.

    Speaker Info:

    Leisha Scheiss

    AFOTEC

  • The Path to Assured Autonomy

    Abstract:

    Autonomous systems are becoming increasingly ubiquitous throughout society. They are used to optimize our living and work environments, protect our institutions and critical infrastructure, transport goods and people across the world, and so much more. However, there are fundamental challenges to (1) designing and verifying the safe and reliable operation of autonomous systems, (2) ensuring their security and resilience to adversarial attack, (3) predictably and seamlessly integrating autonomous systems into complex human ecosystems, and (4) ensuring the beneficial impact of autonomous systems on human society. In collaboration with government, industry, and academia, the Johns Hopkins Institute for Assured Autonomy addresses these challenges across three core pillars of technology, ecosystem, and policy & governance in order to drive a future where autonomous systems are trusted contributors to society.

    Speaker Info:

    Cara LaPointe

    Director

    Johns Hopkins University Institute for Assured Autonomy

    Dr. Cara LaPointe is a futurist who focuses on the intersection of technology, policy, ethics, and leadership. She works at the Johns Hopkins Applied Physics Laboratory where she serves as the Interim Co-Director of the Johns Hopkins Institute for Assured Autonomy to ensure that autonomous systems are safe, secure, and trustworthy as they are increasingly integrated into every aspect of our lives.
    During more than two decades in the United States Navy, Dr. LaPointe held numerous roles in the areas of autonomous systems, acquisitions, ship design and production, naval force architecture, power and energy systems, and unmanned vehicle technology integration. At the Deep Submergence Lab of the Woods Hole Oceanographic Institution (WHOI), she conducted research in underwater autonomy and robotics, developing sensor fusion algorithms for deep-ocean autonomous underwater vehicle navigation.
    Dr. LaPointe was previously a Senior Fellow at Georgetown University’s Beeck Center for Social Impact + Innovation where she created the “Blockchain Ethical Design Framework” as a tool to drive social impact and ethics into blockchain technology. Dr. LaPointe has served as an advisor to numerous global emerging technology initiatives and she is a frequent speaker on autonomy, artificial intelligence, blockchain, and other emerging technologies at a wide range of venues such as the United Nations, the World Bank, the Organization for Economic Co-operation and Development, SXSW, and the Aspen Institute in addition to various universities. Dr. LaPointe is a patented engineer, a White House Fellow, and a French American Foundation Young Leader. She served for two Presidents across an administration transition as the Interim Director of the President’s Commission on White House Fellowships.
    Cara holds a Doctor of Philosophy awarded jointly by the Massachusetts Institute of Technology (MIT) and WHOI, a Master of Science and a Naval Engineer degree from MIT, a Master of Philosophy from the University of Oxford, and a Bachelor of Science from the United States Naval Academy.

  • The Role of Sensitivity Analysis in Evaluating the Credibility of Machine Learning

    Abstract:

    We discuss several shock-physics applications we are pursuing using deep neural network approaches. These range from equation-of-state (EOS) inference for simple flyer-plate impact experiments when velocity time series are observed to radiographic inversion of high-speed impact simulations with composite materials. We put forward a methodology to leverage privileged information available from high-fidelity simulations to train a deep network prior to application on experimental observation. Within this process several sources of uncertainty are active and we discuss ways to mitigate these by careful structuring of the simulated training set. Once a network is trained the credibility of its inference, beyond performance on a test set, must be assessed. Without robust feature detection tools available for deep neural networks we show that much can be gained by applying classical sensitivity analysis techniques to the trained network. We show some results of this sensitivity analysis for our physics applications and discuss the caveats and pitfalls that arise when applying sensitivity analysis to deep machine learning algorithms.
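
    As a rough illustration of the kind of classical sensitivity analysis the talk describes, the sketch below applies one-at-a-time finite-difference sensitivities to a trained model treated as a black-box callable. The predict function and input values are hypothetical stand-ins, not the authors' network.

    ```python
    import numpy as np

    # Stand-in for a trained deep network mapping simulation inputs to a predicted
    # quantity of interest; in practice this would wrap the trained model's
    # predict() call. (Hypothetical example function.)
    def predict(x):
        return 3.0 * x[0] ** 2 + 0.5 * x[0] * x[1] + np.sin(x[2])

    def local_sensitivities(f, x0, rel_step=1e-3):
        """One-at-a-time finite-difference sensitivities, normalized so each
        index approximates (x_i / f) * df/dx_i at the nominal point x0."""
        x0 = np.asarray(x0, dtype=float)
        f0 = f(x0)
        indices = np.zeros_like(x0)
        for i in range(x0.size):
            h = rel_step * max(abs(x0[i]), 1.0)
            x_hi, x_lo = x0.copy(), x0.copy()
            x_hi[i] += h
            x_lo[i] -= h
            dfdx = (f(x_hi) - f(x_lo)) / (2.0 * h)
            indices[i] = dfdx * x0[i] / f0
        return indices

    print(local_sensitivities(predict, [1.0, 2.0, 0.5]))
    ```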

    Speaker Info:

    Kyle Hickmann

    Scientist

    Los Alamos National Laboratory

  • The Role of Statistical Engineering in Creating Solutions for Complex Opportunities

    Abstract:

    Statistical engineering is the art and science of addressing complex organizational opportunities with data.  The span of statistical engineering ranges from the “problems that keep CEOs awake at night” to the analysts dealing with the results of the experimentation necessary for the success of their most current project. This talk introduces statistical engineering and its full spectrum of approaches to complex opportunities with data.  The purpose of this talk is to set the stage for the two specific case studies that follow it. Too often, people lose sight of the big picture of statistical engineering through too narrow a focus on the specific case studies.  Too many people walk away thinking “This is what I have been doing for years.  It is simply good applied statistics.”  These people fail to see how sharing our experiences can teach others to create solutions more efficiently and effectively.  It is this big picture that is the focus of this talk.

    Speaker Info:

    Geoff Vining

    Professor

    Virginia Tech

    Geoff Vining is a Professor of Statistics at Virginia Tech, where from 1999 – 2006, he also was the department head.  He holds an Honorary Doctor of Technology from Luleå University of Technology.  He is an Honorary Member of the ASQ (the highest lifetime achievement award in the field of Quality), an Academician of the International Academy for Quality, a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute.  He is the Founding and Current Past-Chair of the International Statistical Engineering Association (ISEA).  He is a founding member of the US DoD Science of Test Research Consortium.
    Dr. Vining won the 2010 Shewhart Medal, the ASQ career award given to the person who has demonstrated the most outstanding technical leadership in the field of modern quality control.  He also received the 2015 Box Medal from the European Network for Business and Industrial Statistics (ENBIS). This medal recognizes a statistician who has remarkably contributed to the development and the application of statistical methods in European business and industry. In 2013, he received an Engineering Excellence Award from the NASA Engineering and Safety Center.  He received the 2011 William G. Hunter Award from the ASQ Statistics Division for excellence in statistics as a communicator, consultant, educator, innovator, and integrator of statistics with other disciplines and an implementer who obtains meaningful results.
    Dr. Vining is the author of three textbooks.  He is an internationally recognized expert in the use of experimental design for quality, productivity, and reliability improvement and in the application of statistical process control.  He has extensive consulting experience, most recently with the U.S. Department of Defense through the Science of Test Research Consortium and with NASA.

  • The Role of Uncertainty Quantification in Machine Learning

    Abstract:

    Uncertainty is an inherent, yet often under-appreciated, component of machine learning and statistical modeling. Data-driven modeling often begins with noisy data from error-prone sensors collected under conditions for which no ground-truth can be ascertained. Analysis then continues with modeling techniques that rely on a myriad of design decisions and tunable parameters. The resulting models often provide demonstrably good performance, yet they illustrate just one of many plausible representations of the data – each of which may make somewhat different predictions on new data.

    This talk provides an overview of recent, application-driven research at Sandia Labs that considers methods for (1) estimating the uncertainty in the predictions made by machine learning and statistical models, and (2) using the uncertainty information to improve both the model and downstream decision making. We begin by clarifying the data-driven uncertainty estimation task and identifying sources of uncertainty in machine learning. We then present results from applications in both supervised and unsupervised settings. Finally, we conclude with a summary of lessons learned and critical directions for future work.
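
    One common, simple realization of the prediction-uncertainty estimation described above is a bootstrap ensemble: refit the same model on resampled data and use the spread of the members' predictions as a per-point uncertainty estimate. The sketch below is a generic illustration under invented data, not Sandia's method; the choice of regressor is arbitrary.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor  # any regressor would do

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)    # noisy observations

    # Bootstrap ensemble: refit the same model on resampled data, then use the
    # spread of the members' predictions as a simple uncertainty estimate.
    members = []
    for _ in range(30):
        idx = rng.integers(0, len(X), len(X))
        members.append(GradientBoostingRegressor().fit(X[idx], y[idx]))

    X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
    preds = np.stack([m.predict(X_new) for m in members])    # shape (30, 5)
    mean, std = preds.mean(axis=0), preds.std(axis=0)
    for x, m, s in zip(X_new[:, 0], mean, std):
        print(f"x = {x:+.2f}   prediction = {m:+.3f} +/- {2 * s:.3f}")
    ```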

    Speaker Info:

    David Stracuzzi

    Research Scientist

    Sandia National Laboratories

  • The Science of Trust of Autonomous Unmanned Systems

    Abstract:

    The world today is witnessing a significant investment in autonomy and artificial intelligence that most certainly will result in ever-increasing capabilities of unmanned systems. Driverless vehicles are a great example of systems that can make decisions and perform very complex actions. The reality, though, is that while it is well understood what these systems are doing, it is not well understood how their intelligence engines generate the decisions that accomplish those actions. Therein lies the underlying challenge of accomplishing formal test and evaluation of these systems and, relatedly, of how to engender trust in their performance. This presentation will outline and define the problem space, discuss those challenges, and offer solution constructs.

    Speaker Info:

    Reed Young

    Program Manager for Robotics and Autonomy

    Johns Hopkins University Applied Physics Laboratory

  • Title Coming Soon

    Speaker Info:

    Kedar Phadke

    Phadke Associates, Inc

  • Trusted Collaborative Autonomous Systems for Multi Domain Operations

    Abstract:

    Every service chief in the DoD asserts that the key to winning future wars is the ability to rapidly understand our enemies and integrate maneuver across the domains of land, sea, air, and space - what is commonly referred to as multi-domain operations (MDO). Each service has some notion of how it will accomplish this and what it means for the pace and resilience of US warfighting, but there is no universally agreed-upon method to get there, or even to define what “there” is. However, as far back as 2011, the DoD recognized the strategic significance of managing the disparate missions of communication and control, strike, surveillance, navigation, and electronic warfare as a single, integrated electromagnetic spectrum (ES) “maneuver space,” a new strategic domain spanning the traditional warfighting domains. This ES domain will be managed across many platforms (manned and unmanned) with many unique ES-enabled payloads across vast distances in land, sea, air, and space. Only by extending these platforms some degree of autonomy and providing them with the ability to collaborate will we achieve dominance in the ES domain. In essence, collaborative autonomous software services will form the backbone of MDO. We are already seeing this play out in a number of defense and commercial efforts, and there is little doubt that we (and our adversaries) are already on the path.

    Using traditional DoD methods of test and evaluation may be the greatest obstacle to deploying these capabilities. The DevOps (a compound of development and operations) movement emerged in earnest in the early 2010s as a way for large enterprise software service companies (e.g., Amazon, Google, Netflix) to continuously innovate and deliver new products to the market, and it has since proliferated across the globe. Three major ideas underlying the DevOps culture are of interest to us: (1) breaking down monolithic autonomous software services into multiple loosely coupled “microservices;” (2) containerizing services for portability and baked-in security; and (3) performing “continuous testing.” In this presentation, we will discuss how adopting widely available DevOps tools and methodologies will make verification and validation of collaborative autonomous systems faster, more robust, and more transparent, and will enable the DoD’s goal of achieving truly effective and scalable multi-domain operations.

    Speaker Info:

    Robert Murphey

    Principal Research Engineer

    Georgia Tech Research Institute

  • Uncertainty Quantification and Check Standard Testing at NASA Glenn Research Facilities

    Abstract:

    Uncertainty quantification has been performed in various NASA Glenn Research Center (GRC) facilities over the past several years, primarily in the wind tunnel facilities. Uncertainty propagation analysis has received the bulk of the focus and effort to date, and while it provides a vital aspect of the overall uncertainty picture, it must be supplemented by consistent data analysis and check standard programs in order to achieve and maintain a statistical basis for proven data quality.

    This presentation will briefly highlight the uncertainty propagation effort at NASA GRC, its usefulness, and the questions that remain unanswered. It will show how performing regular check standard testing fills in the gaps in the current UQ effort, will propose high-level test plans in two to three GRC facilities, and discuss considerations that need to be addressed in the planning process.

    Speaker Info:

    Erin Hubbard

    Data Engineer

    Jacobs / NASA Glenn Research Center

  • Uncertainty Quantification and Decomposition Methods for Risk-Sensitive Machine Learning

    Abstract:

    Machine learning methods have attracted a lot of research attention in scientific applications. However, model credibility remains a major challenge toward the reliable deployment of these models to the field in costly or risky use cases. In this talk, we explore uncertainty quantification techniques to assess the quality of neural network predictions in regression and classification problems. To understand model variability, we examine different sources of randomness associated with training samples, data observation order, weight initialization, dropout, and ensemble formations. Motivated by typical scientific computing applications, we assume a limited sample budget and suggest approaches for reporting and possibly reducing uncertainty.
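
    As a minimal illustration of separating sources of randomness like those listed above, the sketch below decomposes the variance of an ensemble's predictions at one point into a training-sample component and a weight-initialization component via the law of total variance. The model, data, and ensemble sizes are hypothetical and not the authors' setup.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(-2, 2, size=(300, 1))
    y = X[:, 0] ** 3 + rng.normal(scale=0.3, size=300)
    x_star = np.array([[1.5]])                       # point at which uncertainty is examined

    n_data, n_seed = 8, 8
    preds = np.zeros((n_data, n_seed))
    for i in range(n_data):                          # randomness source 1: training sample
        idx = rng.integers(0, len(X), len(X))        # bootstrap resample
        for j in range(n_seed):                      # randomness source 2: weight initialization
            model = MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                                 max_iter=2000, random_state=j).fit(X[idx], y[idx])
            preds[i, j] = model.predict(x_star)[0]

    # Law of total variance: total = E_data[Var_seed] + Var_data[E_seed]
    within_seed = preds.var(axis=1).mean()           # variability from initialization
    between_data = preds.mean(axis=1).var()          # variability from the training sample
    print(f"total variance      : {preds.var():.4f}")
    print(f"from initialization : {within_seed:.4f}")
    print(f"from training sample: {between_data:.4f}")
    ```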

    Speaker Info:

    Ahmad Rushdi

    Member of Technical Staff

    Sandia National Laboratories

  • Unlock Trends in Instrument Performance with Recurrence Studies and Text Analyses

    Abstract:

    Recurrence analysis models the frequency of recurrent events, such as breakdowns or repairs, to obtain the total repairs per unit as a function of time. Text analytics is used to extract useful summary information from maintenance records. Coupling the two techniques results in dynamic, actionable information used to ensure optimal management of test instruments. This session provides a practical example of recurrence analysis and text analytics on data collected from multiple units of chromatography instrumentation. The analyses include liberal use of data visualization and practical interpretation of results, giving attendees concrete guidance for extracting the best performance from test equipment.
    Testing and evaluation typically involves the use of sophisticated instrumentation. The information contributed by instruments must be precise, accurate, and reliable because critical decisions are made from the data. Regular repairs and maintenance are needed to ensure robust instruments that operate properly. Recurrence analysis is used to obtain a mean cumulative function (MCF) to better understand instrument performance over time. The MCF can be used to estimate maintenance costs, explain repair tendencies, and compare units.
    Test instrumentation typically includes a large amount of documentation within use logs and maintenance records. Text analytics is used to quickly summarize documented information into word clouds, document term matrices, or clusters for enhanced understanding. Common themes that arise from text analytics help to focus preventive maintenance and to ensure that the most needed parts and resources are available to mitigate testing delays. Latent semantic analyses enhance the information to better understand the topic vectors present within the documents.
    The dynamic linking of techniques provides for optimal understanding of the performance of test instrumentation over the life of the equipment. The results of the combined analyses allow for data-driven maintenance planning, sound justification for instrument replacement, and the assurance of robust test results.
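
    For readers unfamiliar with the mean cumulative function, the sketch below computes a simple nonparametric MCF estimate from illustrative repair records (the unit names, repair times, and observation windows are invented); it is not the presenter's JMP workflow.

    ```python
    # Repair histories for several instruments: repair times (days in service) and
    # the end of each unit's observation window. Illustrative data only.
    events = {"unit_A": [120, 300, 410], "unit_B": [90, 350], "unit_C": [200]}
    end_of_observation = {"unit_A": 500, "unit_B": 400, "unit_C": 450}

    def mean_cumulative_function(events, end_obs):
        """Nonparametric MCF: at each repair time t, the increment is
        (# repairs at t) / (# units still under observation at t)."""
        times = sorted({t for ts in events.values() for t in ts})
        mcf, cum = [], 0.0
        for t in times:
            at_risk = sum(1 for end in end_obs.values() if end >= t)
            repairs = sum(ts.count(t) for ts in events.values())
            cum += repairs / at_risk
            mcf.append((t, cum))
        return mcf

    for t, value in mean_cumulative_function(events, end_of_observation):
        print(f"t = {t:4d} days   mean cumulative repairs = {value:.3f}")
    ```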

    Speaker Info:

    Rob Lievense

    Systems Engineer

    SAS/JMP

  • 3D Mapping, Plotting, and Printing in R with Rayshader

    Abstract:

    Is there ever a place for the third dimension in visualizing data? Is the use of 3D inherently bad, or can a 3D visualization be used as an effective tool to communicate results? In this talk, I will show you how you can create beautiful 2D and 3D maps and visualizations in R using the rayshader package. Additionally, I will talk about the value of 3D plotting and how good aesthetic choices can more clearly communicate results to stakeholders.

    Rayshader is a free and open source package for transforming geospatial data into engaging visualizations using a simple, scriptable workflow. It provides utilities to interactively map, plot, and 3D print data from within R. It was nominated by Hadley Wickham to be one of 2018’s Data Visualizations of the Year for the online magazine Quartz.

    Speaker Info:

    Tyler Morgan-Wall

  • A 2nd-Order Uncertainty Quantification Framework Applied to a Turbulence Model Validation Effort

    Abstract:

    Computational fluid dynamics is now considered to be an indispensable tool for the design and development of scramjet engine components. Unfortunately, the quantification of uncertainties is rarely addressed with anything other than sensitivity studies, so the degree of confidence associated with the numerical results remains exclusively with the subject matter expert who generated them. This practice must be replaced with a formal uncertainty quantification process for computational fluid dynamics to play an expanded role in the system design, development, and flight certification process. Given the limitations of current hypersonic ground test facilities, this expanded role is believed by some in the hypersonics community to be a requirement if scramjet engines are to be given serious consideration as a viable propulsion system. The present effort describes a simple, relatively low cost, nonintrusive approach to uncertainty quantification that includes the basic ingredients required to handle both aleatoric (random) and epistemic (lack of knowledge) sources of uncertainty. The nonintrusive nature of the approach allows the computational fluid dynamicist to perform the uncertainty quantification with the flow solver treated as a "black box". Moreover, a large fraction of the process can be automated, allowing the uncertainty assessment to be readily adapted into the engineering design and development workflow. In the present work, the approach is applied to a model scramjet isolator problem where the desire is to validate turbulence closure models in the presence of uncertainty. In this context, the relevant uncertainty sources are determined and accounted for to allow the analyst to delineate turbulence model-form errors from other sources of uncertainty associated with the simulation of the facility flow.
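
    The mixed-uncertainty treatment described above is often implemented as a nested sampling loop: an outer sweep over epistemic (interval-valued) inputs and an inner Monte Carlo sample over aleatory inputs, yielding a family of output CDFs. The sketch below shows only that pattern; the model function and input names are hypothetical stand-ins for the flow solver, not the authors' setup.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    def model(inflow_mach, wall_roughness):
        """Stand-in for the black-box flow solver: returns a scalar quantity of
        interest. (Hypothetical response; not a real CFD code.)"""
        return 2.0 * inflow_mach + 5.0 * wall_roughness + 0.1 * inflow_mach ** 2

    # Aleatory input: inflow Mach number with random run-to-run variation.
    # Epistemic input: wall roughness known only to lie within an interval.
    roughness_interval = (0.01, 0.05)
    n_epistemic, n_aleatory = 20, 2000

    samples = []
    for rough in np.linspace(*roughness_interval, n_epistemic):   # outer: epistemic sweep
        mach = rng.normal(loc=2.5, scale=0.05, size=n_aleatory)   # inner: aleatory sample
        samples.append(model(mach, rough))                        # one empirical CDF per sweep point

    # The envelope of the resulting CDF family (a probability box) summarizes the
    # mixed uncertainty; here we report the spread of one summary statistic.
    q95 = [np.quantile(s, 0.95) for s in samples]
    print(f"95th percentile of the QoI ranges from {min(q95):.3f} to {max(q95):.3f} "
          "across the epistemic interval")
    ```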

    Speaker Info:

    Robert Baurle

  • A Causal Perspective on Reliability Assessment

    Abstract:

    Causality in an engineered system pertains to how a system output changes due to a controlled change or intervention on the system or system environment. Engineered systems designs reflect a causal theory regarding how a system will work, and predicting the reliability of such systems typically requires knowledge of this underlying causal structure. The aim of this work is to introduce causal modeling tools that inform reliability predictions based on biased data sources and illustrate how these tools can inform data integration in practice. We present a novel application of the popular structural causal modeling framework to reliability estimation in an engineering application, illustrating how this framework can inform whether reliability is estimable and how to estimate reliability using data integration given a set of assumptions about the subject matter and data generating mechanism. When data are insufficient for estimation, sensitivity studies based on problem-specific knowledge can inform how much reliability estimates can change due to biases in the data and what information should be collected next to provide the most additional information. We apply the approach to a pedagogical example related to a real, but proprietary, engineering application, considering how two types of biases in data can influence a reliability calculation.

    Speaker Info:

    Lauren Hund

  • A Quantitative Assessment of the Science Robustness of the Europa Clipper Mission

    Abstract:

    Existing characterization of Europa’s environment is enabled by the Europa Clipper mission’s successful predecessors: Pioneer, Voyager, Galileo, and most recently, Juno. These missions reveal high-intensity energetic particle fluxes at Europa’s orbit, posing a multidimensional design challenge to ensure mission success (i.e., meeting Level 1 science requirements).
    Risk-averse JPL Design Principles and the Europa Environment Requirement Document (ERD) dictate practices and policy which, if masterfully followed, are designed to protect Clipper from failure or degradation due to radiation. However, even if workmanship is flawless and no waivers are assessed, modeling errors, shielding uncertainty, and natural variation in the Europa environment are cause for residual concern. While failure and part degradation are of paramount concern, the occurrence of temporary outages, causing loss or degradation of science observations, is also a critical mission risk, left largely unmanaged by documents like the ERD.
    The referenced risk is monitored and assessed through a Project Systems Engineering-led mission robustness effort, which attempts to balance the risk of science data loss against the potential design cost and increased mission complexity required to mitigate such risk. The Science Sensitivity Model (SSM) was developed to assess mission and science robustness, with its primary goal being to ensure a high probability of achieving Level 1 (L1) science objectives by informing the design of a robust spacecraft, instruments, and mission design.
    This discussion will provide an overview of the problem, the model, and solution strategies. Subsequent presentations discuss the experimental design used to understand the problem space and the graphics and visualization used to reveal important conclusions.

    Speaker Info:

    Kelli McCoy

  • A Statistical Approach for Uncertainty Quantification with Missing Data

    Abstract:

    Uncertainty quantification (UQ) has emerged as the science of quantitative characterization and reduction of uncertainties in simulation and testing. Stretching across applied mathematics, statistics, and engineering, UQ is a multidisciplinary field with broad applications. A popular UQ method to analyze the effects of input variability and uncertainty on the system responses is generalized Polynomial Chaos Expansion (gPCE). This method was developed using applied mathematics and does not require knowledge of a simulation’s physics. Thus, gPCE may be used across disparate industries and is applicable to both individual component and system level simulations.
    The gPCE method can encounter problems when any of the input configurations fail to produce valid simulation results. gPCE requires that results be collected on a sparse grid Design of Experiment (DOE), which is generated based on probability distributions of the input variables. A failure to run the simulation at any one input configuration can result in a large decrease in the accuracy of a gPCE. In practice, simulation data sets with missing values are common because simulations regularly yield invalid results due to physical restrictions or numerical instability.
    We propose a statistical approach to mitigating the cost of missing values. This approach yields accurate UQ results even when simulation failures would otherwise make gPCE methods unreliable. The proposed approach addresses the missing data problem by introducing an iterative machine learning algorithm that allows gPCE modeling to handle missing values in the sparse grid DOE. The study will demonstrate the convergence characteristics of the methodology as it reaches steady-state values for the missing points, using a series of simulations and numerical results. Remarks about the convergence rate and the advantages and feasibility of the proposed methodology will be provided.
    Several examples are used to demonstrate the proposed framework and its utility including a secondary air system example from the jet engine industry and several non-linear test functions. This is based on joint work with Dr. Mark Andrews at SmartUQ.
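
    A highly simplified, one-dimensional sketch of the iterative-imputation idea is shown below: missing runs are initialized from the valid results, a Legendre polynomial chaos expansion is fit by least squares, and the surrogate's predictions replace the missing values until the iteration stabilizes. The design, polynomial degree, and tolerance are illustrative; this is not the authors' algorithm or software.

    ```python
    import numpy as np
    from numpy.polynomial import legendre

    # Design points on [-1, 1] (a stand-in for the sparse-grid DOE) and simulation
    # outputs, some of which failed and are recorded as NaN. Illustrative only.
    x = np.linspace(-1.0, 1.0, 15)
    y = np.exp(x) + 0.3 * np.sin(3.0 * x)
    y[[3, 8, 11]] = np.nan                                # "failed" runs

    missing = np.isnan(y)
    y_work = y.copy()
    y_work[missing] = np.nanmean(y)                       # crude initial guess

    # Alternate between fitting a Legendre polynomial chaos expansion by least
    # squares (uniform input assumed) and re-predicting the missing runs.
    for _ in range(50):
        coef = legendre.legfit(x, y_work, deg=5)
        new_vals = legendre.legval(x[missing], coef)
        if np.max(np.abs(new_vals - y_work[missing])) < 1e-8:
            break
        y_work[missing] = new_vals

    print("imputed values:", np.round(y_work[missing], 4))
    print("true values   :", np.round(np.exp(x[missing]) + 0.3 * np.sin(3.0 * x[missing]), 4))
    ```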

    Speaker Info:

    Mark Andrews

  • A Survey of Statistical Methods in Aeronautical Ground Testing

    Speaker Info:

    Drew Landman

  • A User-Centered Design Approach to Military Software Development

    Abstract:

    This case study highlights activities performed during the front-end process of a software development effort undertaken by the Fire Support Command and Control Program Office. This program office provides the U.S. Army, Joint and coalition commanders with the capability to plan, execute and deliver both lethal and non-lethal fires. Recently, the program office has undertaken modernization of its primary field artillery command and control system that has been in use for over 30 years. The focus of this case study is on the user-centered design process and activities taken prior to and immediately following contract award.

    A modified waterfall model, comprised of three cyclic yet overlapping phases (observation, visualization, and evaluation), provided structure for the iterative, user-centered design process. Gathering and analyzing data collected during focus groups, observational studies, and workflow process mapping enabled the design team to identify 1) design patterns across the role/duty, unit and echelon matrix (a hierarchical organization structure), 2) opportunities to automate manual processes, 3) opportunities to increase efficiencies for fire mission processing, 4) bottlenecks and workarounds to be eliminated through design of the modernized system, 5) shortcuts that can be leveraged in design, 6) relevant and irrelevant content for each user population for streamlining access to functionality, and 7) a usability baseline for later comparison (e.g., the number of steps and time taken to perform a task as captured in workflows for comparison to the same task in the modernized system), and provided the basis for creating visualizations using wireframes. Heuristic evaluations were conducted early to obtain initial feedback from users. In the next few months, usability studies will enable users to provide feedback based on actual interaction with the newly designed software.

    Included in this case study are descriptions of the methods used to collect user-centered design data, how results were visualized/documented for use by the development team, and lessons learned from applying user-centered design techniques during software development of a military field artillery command and control system.

    Speaker Info:

    Pam Savage-Knepshield

  • Accelerating Uncertainty Quantification for Complex Computational Models

    Abstract:

    Scientific computing has undergone extraordinary growth in sophistication in recent years, enabling the simulation of a wide range of complex multiphysics and multiscale phenomena. Along with this increase in computational capability is the growing recognition that uncertainty quantification (UQ) must go hand-in-hand with numerical simulation in order to generate meaningful and reliable predictions for engineering applications. If not rigorously considered, uncertainties due to manufacturing defects, material variability, modeling assumptions, etc. can cause a substantial disconnect between simulation and reality. Packaging these complex computational models within a UQ framework, however, can be a significant challenge due to the need to repeatedly evaluate the model when even a single evaluation is time-consuming.

    This talk discusses efforts at NASA Langley Research Center (LaRC) to enable rapid UQ for problems with expensive computational models. Under the High Performance Computing Incubator (HPCI) program at LaRC, several open-source software libraries are being developed and released to provide access to general-purpose, state-of-the-art UQ algorithms. The common denominator of these methods is that they all expose parallelism among the model evaluations needed for UQ and, as such, are implemented to leverage HPC resources when available to achieve tremendous computational speedup. While the methods and software presented are broadly applicable, they will be demonstrated in the context of applications that have particular interest at NASA, including structural health management and trajectory simulation.
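
    The LaRC libraries themselves are not reproduced here, but the parallelism pattern the abstract describes is simple to sketch: the model evaluations needed for sampling-based UQ are independent, so they can be farmed out to workers and collected afterwards. The model function and inputs below are hypothetical.

    ```python
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def expensive_model(params):
        """Stand-in for a time-consuming simulation (hypothetical)."""
        stiffness, load = params
        return load / stiffness + 0.01 * load ** 2

    def main():
        rng = np.random.default_rng(3)
        samples = list(zip(rng.normal(10.0, 0.5, 500), rng.normal(2.0, 0.1, 500)))
        # The Monte Carlo evaluations are independent, so they can be distributed
        # across processes (or, with other executors, across HPC nodes).
        with ProcessPoolExecutor() as pool:
            outputs = list(pool.map(expensive_model, samples, chunksize=25))
        print(f"mean = {np.mean(outputs):.4f}, std = {np.std(outputs):.4f}")

    if __name__ == "__main__":
        main()
    ```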

    Speaker Info:

    James Warner

  • Adapting Operational Test to Rapid-Acquisition Programs

    Abstract:

    During the past several years, the DoD has begun applying rapid prototyping and fielding authorities—granted by Congress in the FY2016-FY2018 National Defense Authorization Acts (NDAA)—to many acquisition programs. Other programs have implemented an agile acquisition strategy where incremental capability is delivered in iterative cycles. As a result, Operational Test Agencies (OTA) have had to adjust their test processes to accommodate shorter test timelines and periodic delivery of capability to the warfighter. In this session, representatives from the Service OTAs will brief examples where they have implemented new practices and processes for conducting Operational Test on acquisition programs categorized as agile, DevOps, and/or Section 804 rapid-acquisition efforts. During the final 30 minutes of the session, a panel of OTA representatives will field questions from the audience concerning the challenges and opportunities related to test design, data collection, and analysis, that rapid-acquisition programs present.

    Speaker Info:

    Panel Discussion

  • Adopting Optimized Software Test Design Methods at Scale

    Abstract:

    Using Combinatorial Test Design methods to select software test scenarios has repeatedly delivered large efficiency and thoroughness gains - which raises the questions:

    • Why are these proven methods not used everywhere?
    • Why do some efforts to promote adoption of new approaches stagnate?
    • What steps can leaders take to successfully introduce and spread new test design methods?

    For more than a decade, Justin Hunter has helped large global organizations across six continents adopt new test design techniques at scale. Working in some environments, he has felt like Sisyphus, forever condemned to roll a boulder uphill only to watch it roll back down again. In other situations, things clicked; teams smoothly adopted new tools and techniques, and impressive results were quickly achieved. In this presentation, Justin will discuss several common challenges faced by large organizations, explain why adopting test design tools is more challenging than adopting other types of development and testing tools, and share actionable recommendations to consider when you roll out new test design approaches.

    Speaker Info:

    Justin Hunter

  • AI & ML in Complex Environment

    Abstract:

    The U.S. Army Research Laboratory's (ARL) Essential Research Program (ERP) on Artificial Intelligence & Machine Learning (AI & ML) seeks to research, develop and employ a suite of AI-inspired and ML techniques and systems to assist teams of soldiers and autonomous agents in dynamic, uncertain, complex operational conditions. Systems will be robust, scalable, and capable of learning and acting with varying levels of autonomy, to become integral components of networked sensors, knowledge bases, autonomous agents, and human teams. Three specific research gaps will be examined: (i) Learning in Complex Data Environments, (ii) Resource-constrained AI Processing at the Point-of-Need and (iii) Generalizable & Predictable AI. The talk will highlight ARL's internal research efforts over the next 3-5 years that are connected, cumulative and converging to produce tactically-sensible AI-enabled capabilities for decision making at the tactical edge, specifically addressing topics in: (1) adversarial distributed machine learning, (2) robust inference & machine learning over heterogeneous sources, (3) adversarial reasoning integrating learned information, (4) adaptive online learning and (5) resource-constrained adaptive computing. The talk will also highlight collaborative research opportunities in AI & ML via ARL’s Army AI Innovation Institute (A2I2) which will harness the distributed research enterprise via the ARL Open Campus & Regional Campus initiatives.

    Speaker Info:

    Tien Pham

  • Air Force Human Systems Integration Program

    Abstract:

    The Air Force (AF) Human Systems Integration (HSI) program is led by the 711th Human Performance Wing’s Human Systems Integration Directorate (711 HPW/HP). 711 HPW HP provides direct support to system program offices and AF Major Commands (MAJCOMs) across the acquisition lifecycle from requirements development to fielding and sustainment in addition to providing home office support. With an ever-increasing demand signal for support, HSI practitioners within 711 HPW/HP assess HSI domain areas for human-centered risks and strive to ensure systems are designed and developed to safely, effectively, and affordably integrate with human capabilities and limitations. In addition to system program offices and MAJCOMs, 711 HPW/HP provides HSI support to AF Centers (e.g., AF Sustainment Center, AF Test Center), the AF Medical Service, and special cases as needed. The AF Global Strike Command (AFGSC) is the largest MAJCOM with several Programs of Record (POR), such as the B-1, B-2, and B-52 bombers, Intercontinental Ballistic Missiles (ICBM), Ground-Based Strategic Deterrent (GBSD), Airborne Launch Control System (ALCS), and other support programs/vehicles like the UH-1N. Mr. Anthony Thomas (711 HPW/HP), the AFGSC HSI representative, will discuss how 711 HPW/HP supports these programs at the MAJCOM headquarters level and in the system program offices.

    Speaker Info:

    Anthony Thomas

  • An Overview of Uncertainty-Tolerant Decision Support Modeling for Cybersecurity

    Abstract:

    Cyber system defenders face the challenging task of continually protecting critical assets and information from a variety of malicious attackers. Defenders typically function within resource constraints, while attackers operate at relatively low costs. As a result, design and development of resilient cyber systems that support mission goals under attack, while accounting for the dynamics between attackers and defenders, is an important research problem. This talk will highlight decision support modeling challenges under uncertainty within non-cooperative cybersecurity settings. Multiple attacker-defender game formulations under uncertainty are discussed with steps for further research.
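
    As a toy example of the attacker-defender game formulations mentioned above, the sketch below computes a defender's minimax mixed strategy for a small zero-sum game via linear programming. The loss matrix and posture/tactic labels are purely illustrative.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    # Expected-loss matrix for a toy zero-sum game: rows are defender postures,
    # columns are attacker tactics. The values are purely illustrative.
    L = np.array([[4.0, 1.0, 3.0],     # posture A
                  [2.0, 3.0, 1.5]])    # posture B
    m, n = L.shape

    # Defender's minimax mixed strategy: minimize v subject to
    #   sum_i x_i * L[i, j] <= v for every attacker tactic j,  sum_i x_i = 1,  x >= 0.
    c = np.concatenate([np.zeros(m), [1.0]])            # objective: minimize v
    A_ub = np.hstack([L.T, -np.ones((n, 1))])           # L^T x - v <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    x, v = res.x[:m], res.x[m]
    print("defender mixed strategy:", np.round(x, 3))
    print("game value (expected loss):", round(v, 3))
    ```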

    Speaker Info:

    Samrat Chatterjee

  • Anatomy of a Cyberattack: Standardizing Data Collection for Adversarial and Defensive Analyses

    Abstract:

    Hardly a week goes by without news of a cybersecurity breach or an attack by cyber adversaries against a nation’s infrastructure. These incidents have wide-ranging effects, including reputational damage and lawsuits against corporations with poor data handling practices. Further, these attacks do not require the direction, support, or funding of technologically advanced nations; instead, significant damage can be – and has been – done with small teams, limited budgets, modest hardware, and open source software. Due to the significance of these threats, it is critical to analyze past events to predict trends and emerging threats.

    In this document, we present an implementation of a cybersecurity taxonomy and a methodology to characterize and analyze all stages of a cyberattack. The chosen taxonomy, MITRE ATT&CK™, allows for detailed definitions of aggressor actions which can be communicated, referenced, and shared uniformly throughout the cybersecurity community. We translate several open source cyberattack descriptions into the analysis framework, thereby constructing cyberattack data sets.

    These data sets (supplemented with notional defensive actions) illustrate example Red Team activities. The data collection procedure, when used during penetration testing and Red Teaming, provides valuable insights about the security posture of an organization, as well as the strengths and shortcomings of the network defenders. Further, these records can support past trends and future outlooks of the changing defensive capabilities of organizations.

    From these data, we are able to gather statistics on the timing of actions, detection rates, and cyberattack tool usage. Through analysis, we are able to identify trends in the results and compare the findings to prior events, different organizations, and various adversaries.
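
    A minimal sketch of the data-collection idea follows: once each Red Team action is tagged with a taxonomy identifier and an outcome, aggregate statistics such as detection rates and action durations fall out directly. The log entries and technique labels below are invented for illustration and are not drawn from the authors' data sets.

    ```python
    from collections import defaultdict

    # Illustrative Red Team action log, each entry tagged with a MITRE ATT&CK
    # technique ID (IDs shown as examples) and whether defenders detected it.
    actions = [
        {"technique": "T1566", "name": "Phishing",            "detected": True,  "minutes": 12},
        {"technique": "T1566", "name": "Phishing",            "detected": False, "minutes": 9},
        {"technique": "T1059", "name": "Command Interpreter", "detected": False, "minutes": 4},
        {"technique": "T1021", "name": "Remote Services",     "detected": True,  "minutes": 30},
        {"technique": "T1021", "name": "Remote Services",     "detected": True,  "minutes": 22},
    ]

    stats = defaultdict(lambda: {"attempts": 0, "detected": 0, "minutes": 0})
    for a in actions:
        s = stats[(a["technique"], a["name"])]
        s["attempts"] += 1
        s["detected"] += int(a["detected"])
        s["minutes"] += a["minutes"]

    for (tid, name), s in sorted(stats.items()):
        rate = s["detected"] / s["attempts"]
        print(f"{tid} {name:<20} detection rate {rate:4.0%}  "
              f"mean duration {s['minutes'] / s['attempts']:.1f} min")
    ```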

    Speaker Info:

    Jason Schlup

  • Applying Functional Data Analysis throughout Aerospace Testing

    Abstract:

    Sensors abound in aerospace testing, and while many scientists look at the data from a physics perspective, it is the comparative statistical information that drives decisions. A multi-company project compared launch data from the 1980s to a current set of data that included 30 sensors. Each sensor was designed to gather 3,000 data points during the 3-second launch event. The data included temperature, acceleration, and pressure information. This talk will compare the data analysis methods developed for this project as well as the use of the new Functional Data Analysis tool within JMP for its ability to discern in-family launch performances.

    Speaker Info:

    David Harrison

  • Area Validation for Applications with Mixed Uncertainty

    Abstract:

    Model validation is a process for determining how accurate a model is when compared to a true value. The methodology uses uncertainty analysis to assess the discrepancy between a measured and a predicted value. In the literature, several area metrics have been introduced to handle these types of discrepancies. These area metrics have been applied to problems that include aleatory uncertainty, epistemic uncertainty, and mixed uncertainty. However, these methodologies lack the ability to fully characterize the true differences between the experimental and prediction data when mixed uncertainty exists in the measurements and/or in the predictions. This work will introduce a new area metric validation approach which aims to compensate for the shortcomings in current techniques. The approach will be described in detail and comparisons with existing metrics will be shown. To demonstrate its applicability, the new area metric will be applied to surface predictions for a stagnation-point calibration probe at low-enthalpy conditions. For this application, testing was performed in the Hypersonic Materials Environmental Test System (HYMETS) facility located at NASA Langley Research Center.
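
    For context, the classical area metric that this work extends can be computed as the area between the empirical CDFs of the measurements and the model predictions; a minimal sketch with made-up numbers is below. The mixed-uncertainty extension described in the talk is not reproduced here.

    ```python
    import numpy as np

    def ecdf(sample, x):
        """Empirical CDF of `sample` evaluated at the points `x`."""
        sample = np.sort(sample)
        return np.searchsorted(sample, x, side="right") / sample.size

    def area_metric(measured, predicted):
        """Classical area validation metric: the area between the empirical CDFs
        of the measurements and the model predictions."""
        grid = np.sort(np.unique(np.concatenate([measured, predicted])))
        # Both step CDFs are constant between consecutive grid points, so the
        # area reduces to a sum of rectangle areas.
        widths = np.diff(grid)
        gaps = np.abs(ecdf(measured, grid[:-1]) - ecdf(predicted, grid[:-1]))
        return np.sum(widths * gaps)

    measured = np.array([101.2, 99.8, 100.5, 102.1, 100.9])          # e.g., measured heat flux
    predicted = np.array([100.1, 100.4, 99.9, 100.2, 100.3, 100.0])  # model outputs
    print(f"area metric = {area_metric(measured, predicted):.3f}")
    ```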

    Speaker Info:

    Laura White

  • Bayesian Analysis

    Abstract:

    This course will cover the basics of the Bayesian approach to practical and coherent statistical inference. Particular attention will be paid to computational aspects, including MCMC. Examples and practical hands-on exercises will run the gamut from toy illustrations to real-world data analysis from all areas of science, with R implementations/coaching provided. The course closely follows P.D. Hoff’s "A First Course in Bayesian Statistical Methods" (Springer, 2009). Some examples are borrowed from two other texts which are nice references to have: J. Albert’s "Bayesian Computation with R" (Springer, 2nd ed., 2009) and A. Gelman, J.B. Carlin, H.S. Stern, D. Dunson, A. Vehtari, and D.B. Rubin’s "Bayesian Data Analysis" (3rd ed., 2013).
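
    For a flavor of the MCMC material, the sketch below implements a random-walk Metropolis sampler for a beta-binomial posterior whose exact answer is known, which makes the output easy to check. The course itself works in R; this illustrative version is in Python with invented data.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Data: 7 successes in 20 Bernoulli trials; prior on the success probability
    # theta is Beta(2, 2). The posterior is Beta(9, 15) in closed form, which
    # provides a check on the sampler.
    successes, trials = 7, 20

    def log_posterior(theta):
        if not 0.0 < theta < 1.0:
            return -np.inf
        log_prior = np.log(theta) + np.log(1 - theta)            # Beta(2, 2), up to a constant
        log_like = successes * np.log(theta) + (trials - successes) * np.log(1 - theta)
        return log_prior + log_like

    # Random-walk Metropolis: propose a small Gaussian step and accept with
    # probability min(1, posterior ratio).
    theta, chain = 0.5, []
    for _ in range(20000):
        proposal = theta + rng.normal(scale=0.1)
        if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
            theta = proposal
        chain.append(theta)

    draws = np.array(chain[5000:])                               # discard burn-in
    print(f"posterior mean ~ {draws.mean():.3f} (exact 9/24 = {9/24:.3f})")
    print(f"95% credible interval ~ ({np.quantile(draws, 0.025):.3f}, "
          f"{np.quantile(draws, 0.975):.3f})")
    ```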

    Speaker Info:

    Robert Gramacy

    Virginia Tech

  • Bayesian Component Reliability Estimation: F-35 Case Study

    Abstract:

    A challenging aspect of a system reliability assessment is integrating
    multiple sources of information, including component, subsystem, and
    full-system data, previous test data, or subject matter expert opinion. A
    powerful feature of Bayesian analyses is the ability to combine these
    multiple sources of data and variability in an informed way to perform
    statistical inference. This feature is particularly valuable in assessing
    system reliability where testing is limited and only a small number (or no
    failures at all) are observed.

    The F-35 is DoD's largest program; approximately one-third of the operations
    and sustainment cost is attributed to the cost of spare parts and the
    removal, replacement, and repair of components. The failure rate of those
    components is the driving parameter for a significant portion of the
    sustainment cost, and yet for many of these components, poor estimates of
    the failure rate exist. For many programs, the contractor produces estimates
    of component failure rates, based on engineering analysis and legacy systems
    with similar parts. While these are useful, the actual removal rates can
    provide a more accurate estimate of the removal and replacement rates the
    program anticipates to experience in future years. In this presentation, we
    show how we applied a Bayesian analysis to combine the engineering
    reliability estimates with the actual failure data to overcome the problems
    of cases where few data exist. Our technique is broadly applicable to any
    program where multiple sources of reliability information need be combined
    for the best estimation of component failure rates and ultimately
    sustainment costs.
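
    A minimal conjugate sketch of the idea (not the presenters' actual model): encode the contractor's engineering estimate as a gamma prior on the component failure rate and update it with observed removals over accumulated fleet hours. All numbers below are illustrative.

    ```python
    from scipy import stats

    # Engineering estimate: roughly 1 failure per 2,000 flight hours, held with the
    # weight of about 4,000 hours of "pseudo-data". Encoded as a Gamma prior on the
    # failure rate lambda (failures per hour). All values are illustrative.
    prior_a, prior_b = 2.0, 4000.0           # shape (pseudo-failures), rate (pseudo-hours)

    # Observed field data: removals recorded over accumulated fleet flight hours.
    observed_failures, observed_hours = 9, 12000.0

    # With Poisson-distributed failure counts, the Gamma prior is conjugate:
    post_a = prior_a + observed_failures
    post_b = prior_b + observed_hours
    posterior = stats.gamma(a=post_a, scale=1.0 / post_b)

    print(f"prior mean rate      : {prior_a / prior_b:.2e} per hour")
    print(f"posterior mean rate  : {posterior.mean():.2e} per hour")
    lo, hi = posterior.ppf([0.05, 0.95])
    print(f"90% credible interval: ({lo:.2e}, {hi:.2e}) per hour")
    print(f"implied mean time between removals: {1.0 / posterior.mean():.0f} hours")
    ```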

    Speaker Info:

    Rebecca Medlin

  • Bayesian Component Reliability Estimation: F-35 Case Study

    Abstract:

    A challenging aspect of a system reliability assessment is integrating
    multiple sources of information, including component, subsystem, and
    full-system data, previous test data, or subject matter expert opinion. A
    powerful feature of Bayesian analyses is the ability to combine these
    multiple sources of data and variability in an informed way to perform
    statistical inference. This feature is particularly valuable in assessing
    system reliability where testing is limited and only a small number (or no
    failures at all) are observed.

    The F-35 is DoD's largest program; approximately one-third of the operations
    and sustainment cost is attributed to the cost of spare parts and the
    removal, replacement, and repair of components. The failure rate of those
    components is the driving parameter for a significant portion of the
    sustainment cost, and yet for many of these components, poor estimates of
    the failure rate exist. For many programs, the contractor produces estimates
    of component failure rates, based on engineering analysis and legacy systems
    with similar parts. While these are useful, the actual removal rates can
    provide a more accurate estimate of the removal and replacement rates the
    program anticipates to experience in future years. In this presentation, we
    show how we applied a Bayesian analysis to combine the engineering
    reliability estimates with the actual failure data to overcome the problems
    of cases where few data exist. Our technique is broadly applicable to any
    program where multiple sources of reliability information need be combined
    for the best estimation of component failure rates and ultimately
    sustainment costs.

    Speaker Info:

    V. Bram Lillard

  • Behavioral Analytics: Paradigms and Performance Tools of Engagement in System Cybersecurity

    Abstract:

    The application opportunities for behavioral analytics in the cybersecurity space are based upon simple realities.

    1. The great majority of breaches across all cybersecurity venues are due to human choices and human error.

    2. With communication and information technologies making for rapid availability of data, as well as behavioral strategies of bad actors getting cleverer, there is need for expanded perspectives in cybersecurity prevention.

    3. Internally-focused paradigms must now be explored that place endogenous protection from security threats as an important focus and integral dimension of cybersecurity prevention.

    The development of cybersecurity monitoring metrics and tools as well as the creation of intrusion prevention standards and policies should always include an understanding of the underlying drivers of human behavior. As temptation follows available paths, cyber-attacks follow technology, business models, and behavioral habits.

    The human element will always be the most significant part in the anatomy of any final decision. Choice options – from input, to judgement, to prediction, to action – need to be better understood for their relevance to cybersecurity work. Behavioral Performance Indexes harness data about aggregate human participation in an active system, helping to capture some of the detail and nuances of this critically important dimension of cybersecurity.

    Speaker Info:

    Robert Gough

  • Categorical Data Analysis

    Abstract:

    Categorical data is abundant in the 21st century, and its analysis is vital to advance research across many domains. Thus, data-analytic techniques that are tailored for categorical data are an essential part of the practitioner’s toolset. The purpose of this short course is to help attendees develop and sharpen their abilities with these tools. Topics covered in this short course will include logistic regression, ordinal regression, and classification, and methods to assess predictive accuracy of these approaches will be discussed. Data will be analyzed using the R software package, and course content will loosely follow Alan Agresti’s excellent textbook An Introduction to Categorical Data Analysis, Third Edition.
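
    As a small taste of the course content, the sketch below fits a logistic regression to synthetic binary data and assesses its predictive accuracy on held-out observations. The course itself uses R; this parallel illustration is in Python with invented data.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, roc_auc_score

    rng = np.random.default_rng(0)

    # Synthetic binary outcome (e.g., pass/fail) driven by two continuous predictors.
    X = rng.normal(size=(400, 2))
    logit = 0.8 * X[:, 0] - 1.2 * X[:, 1]
    y = rng.uniform(size=400) < 1.0 / (1.0 + np.exp(-logit))

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                        random_state=0)
    model = LogisticRegression().fit(X_train, y_train)

    # Assess predictive accuracy on held-out data, one of the topics named above.
    print("coefficients:", np.round(model.coef_[0], 2))
    print("accuracy    :", round(accuracy_score(y_test, model.predict(X_test)), 3))
    print("ROC AUC     :", round(roc_auc_score(y_test,
                                 model.predict_proba(X_test)[:, 1]), 3))
    ```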

    Speaker Info:

    Christopher Franck

    Virginia Tech

  • Challenges in Test and Evaluation of AI: DoD's Project Maven

    Abstract:

    The Algorithmic Warfare Cross Functional Team (AWCFT or Project Maven) organizes DoD stakeholders to enhance intelligence support to the warfighter through the use of automation and artificial intelligence. The AWCFT’s objective is to turn the enormous volume of data available to DoD into actionable intelligence and insights at speed. This requires consolidating and adapting existing algorithm-based technologies as well as overseeing the development of new solutions. This brief will describe some of the methodological challenges in test and evaluation that the Maven team is working through to facilitate speedy and agile acquisition of reliable and effective AI / ML capabilities.

    Speaker Info:

    Jane Pinelis

  • Communicating Statistical Concepts and Results: Lessons Learned from the US Service Academies

    Abstract:

    Communication is critical both for analysts and for decision-makers who rely on analysis to inform their choices. The Service Academies are responsible for educating men and women who may serve in both roles over the course of their careers. Analysts must be able to summarize their results concisely and communicate them to the decision-maker in a way that is relevant and actionable. Decision-makers must understand that analytical results may carry uncertainty with them and must be able to incorporate this uncertainty properly when evaluating different options. This panel explores the role of the US Service Academies in preparing their students for both roles. Featuring representatives from the US Air Force Academy, the US Naval Academy, and the US Military Academy, this panel will cover how future US Officers are taught to use and communicate with data. Topics include developing and motivating numerical literacy, understanding of uncertainty, how analysts should frame uncertainty to decision-makers, and how decision-makers should understand information presented with uncertainty. Panelists will discuss what they think the academies do well and areas that are ripe for improvement.

    Speaker Info:

    Panel Discussion

  • Comparison of Methods for Testing Uniformity to Support the Validation of Simulation Models used for Live-Fire Testing

    Abstract:

    Goodness-of-fit (GOF) testing is used in many applications, including statistical hypothesis testing to determine whether a set of data comes from a hypothesized distribution. In addition, combined probability tests are extensively used in meta-analysis to combine results from several independent tests to assess an overall null hypothesis. This paper summarizes a study conducted to determine which GOF and/or combined probability test(s) can be used to determine whether a set of data with a relatively small sample size comes from the standard uniform distribution, U(0,1). The power against different alternative hypotheses of several GOF tests and combined probability methods was examined. The GOF methods included Anderson-Darling, Chi-Square, Kolmogorov-Smirnov, Cramér-von Mises, Neyman-Barton, Dudewicz-van der Meulen, Sherman, Quesenberry-Miller, Frosini, and Hegazy-Green, while the combined probability test methods included Fisher's Combined Probability Test, Mean Z, Mean P, Maximum P, Minimum P, Logit P, and Sum Z. While no one method was determined to provide the best power in all situations, several useful methods to support model validation were identified.
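
    Two of the methods named above are easy to illustrate: a Kolmogorov-Smirnov test of a small sample against U(0,1) and Fisher's combined probability test. The sketch below uses simulated p-values; the study's full power comparison is not reproduced.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    p_values = rng.beta(0.7, 1.0, size=12)    # a small sample that is *not* quite U(0,1)

    # Kolmogorov-Smirnov goodness-of-fit test against the standard uniform.
    ks_stat, ks_p = stats.kstest(p_values, "uniform")
    print(f"KS test:       D = {ks_stat:.3f}, p = {ks_p:.3f}")

    # Fisher's combined probability test: under H0 (all p-values uniform),
    # -2 * sum(log p_i) follows a chi-squared distribution with 2k degrees of freedom.
    chi2_stat = -2.0 * np.sum(np.log(p_values))
    fisher_p = stats.chi2.sf(chi2_stat, df=2 * len(p_values))
    print(f"Fisher's test: X2 = {chi2_stat:.2f}, p = {fisher_p:.3f}")

    # scipy also provides this (and other combining methods) directly:
    print(stats.combine_pvalues(p_values, method="fisher"))
    ```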

    Speaker Info:

    Shannon Shelburne

  • Constructing Designs for Fault Location

    Abstract:

    While fault testing a system with many factors, each appearing at some number of levels, it may not be possible to test all combinations of factor levels. Most faults are caused by interactions of only a few factors, so testing interactions up to size t will often find all faults in the system without executing an exhaustive test suite. Call an assignment of levels to t of the factors a t-way interaction. A covering array is a collection of tests that ensures that every t-way interaction is covered by at least one test in the test suite. Locating arrays extend covering arrays with the additional feature that they not only indicate the presence of faults but locate the faulty interactions when there are no more than d faults in the system. If an array is (d, t)-locating, then for every pair of sets of t-way interactions of size d, the interactions do not appear in exactly the same tests. This ensures that the faulty interactions can be differentiated from non-faulty interactions by the results of some test in which interactions from one set or the other, but not both, are tested. When the property holds for t-way interaction sets of size up to d, the notation (d̄, t) is used. In addition to fault location, locating arrays have also been used to identify significant effects in screening experiments.

    Locating arrays are fairly new, and few techniques have been explored for their construction. Most of the available work is limited to finding only one fault (d = 1). Known general methods require a covering array of strength t + d and produce many more tests than are needed. In this talk, we present Partitioned Search with Column Resampling (PSCR), a computational search algorithm that verifies whether an array is (d̄, t)-locating by partitioning the search space to decrease the number of comparisons. If a candidate array is not locating, random resampling is performed until a locating array is constructed or an iteration limit is reached. Algorithmic parameters determine which factor columns to resample and when to add additional tests to the candidate array. We use a 5 × 5 × 3 × 2 × 2 full factorial design to analyze the performance of the algorithmic parameters and provide guidance on how to tune parameters to prioritize speed, accuracy, or a combination of both. Last, we compare our results to the number of tests in locating arrays constructed by other methods for the factors and levels of real-world systems.
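
    The covering property underlying these designs is straightforward to verify computationally; the sketch below checks whether every t-way interaction appears in at least one test. The locating-array check and the PSCR algorithm from the talk are more involved and are not reproduced here.

    ```python
    from itertools import combinations, product

    def is_covering_array(tests, levels, t):
        """Check that every t-way interaction (an assignment of levels to t factors)
        appears in at least one test. `levels[k]` is the number of levels of factor k."""
        k = len(levels)
        for factors in combinations(range(k), t):
            needed = set(product(*(range(levels[f]) for f in factors)))
            seen = {tuple(test[f] for f in factors) for test in tests}
            if needed - seen:
                return False
        return True

    # Example: 3 binary factors; these 4 tests cover all 2-way interactions
    # (a classic strength-2 covering array), whereas the full factorial needs 8.
    tests = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print(is_covering_array(tests, levels=[2, 2, 2], t=2))   # True
    print(is_covering_array(tests, levels=[2, 2, 2], t=3))   # False: not exhaustive
    ```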

    Speaker Info:

    Erin Lanus

  • Decentralized Signal Processing and Distributed Control for Collaborative Autonomous Sensor Networks

    Abstract:

    Collaborative autonomous sensor networks have recently been used in many applications including inspection, law enforcement, search and rescue, and national security. They offer scalable, low cost solutions which are robust to the loss of multiple sensors in hostile or dangerous environments. While often comprised of less capable sensors, the performance of a large network can approach the performance of far more capable and expensive platforms if nodes are effectively coordinating their sensing actions and data processing. This talk will summarize work to date at LLNL on distributed signal processing and decentralized optimization algorithms for collaborative autonomous sensor networks, focusing on ADMM-based solutions for detection/estimation problems and sequential greedy optimization solutions which maximize submodular functions, e.g. mutual information.
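
    Below is a minimal sketch of the sequential greedy optimization mentioned above, using set coverage as a stand-in for the submodular information objectives: at each step the sensor with the largest marginal gain is added. The candidate sensors and their coverage sets are randomly generated for illustration and do not reflect LLNL's implementations.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Candidate sensor locations and the set of grid cells each one can observe.
    # Coverage (the number of distinct cells observed) is monotone submodular,
    # a simple stand-in for the information-based objectives used in practice.
    n_sensors, n_cells = 12, 40
    coverage = {s: set(rng.choice(n_cells, size=8, replace=False)) for s in range(n_sensors)}

    def greedy_select(coverage, budget):
        """Sequential greedy: at each step add the sensor with the largest marginal gain."""
        chosen, covered = [], set()
        for _ in range(budget):
            best = max((s for s in coverage if s not in chosen),
                       key=lambda s: len(coverage[s] - covered))
            chosen.append(best)
            covered |= coverage[best]
        return chosen, covered

    chosen, covered = greedy_select(coverage, budget=4)
    print("selected sensors:", chosen)
    print(f"cells covered: {len(covered)} of {n_cells}")
    ```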

    Speaker Info:

    Ryan Goldhahn

  • Deep Reinforcement Learning

    Abstract:

    This talk provides an overview of Deep Reinforcement Learning and its recent successes in creating high-performing agents, covering its application in “easy” environments up to massively complex multi-agent strategic environments. We will analyze the behaviors learned, discuss research challenges, and imagine future possibilities.

    Speaker Info:

    Benjamin Bell

  • Demystifying the Black Box: A Test Strategy for Autonomy

    Abstract:

    Systems with autonomy are beginning to permeate civilian, industrial, and military sectors. Though these technologies have the potential to revolutionize our world, they also bring a host of new challenges in evaluating whether these tools are safe, effective, and reliable. The Institute for Defense Analyses is developing methodologies to enable testing systems that can, to some extent, think for themselves. In this talk, we share how we think about this problem and how this framing can help you develop a test strategy for your own domain.

    Speaker Info:

    Dan Porter

  • Design and Analysis of Experiments for Europa Clipper’s Science Sensitivity Model

    Abstract:

    The Europa Clipper Science Sensitivity Model (SSM) can be thought of as a graph in which the nodes are mission requirements at ten levels in a hierarchy, and edges represent how requirements at one level of the hierarchy depend on those at lower levels. At the top of the hierarchy, there are ten nodes representing ten Level 1 science requirements for the mission. At the bottom of the hierarchy, there are 100 or so nodes representing instrument-specific science requirements. In between, nodes represent intermediate science requirements with complex interdependencies. Meeting, or failing to meet, bottom-level requirements depends on the frequency of faults and the lengths of recovery times on the nine Europa Clipper instruments and the spacecraft.
    Our task was to design and analyze the results of a Monte Carlo experiment to estimate the probabilities of meeting the Level 1 science requirements based on parameters of the distributions of time between failures and of recovery times. We simulated an ensemble of synthetic missions in which failures and recoveries were random realizations from those distributions. The pass-fail status of the bottom-level instrument-specific requirements were propagated up the graph for each of the synthetic missions. Aggregating over the collection of synthetic missions produced estimates of the pass-fail probabilities for the Level 1 requirements. We constructed a definitive screening design and supplemented it with additional space-filling runs, using JMP 14 software. Finally, we used the vectors of failure and recovery parameters as predictors, and the pass-fail probabilities of the high-level requirements as responses, and built statistical models to predict the latter from the former. In this talk, we will describe the design considerations and review the fitted models and their implications for mission success.

    Speaker Info:

    Amy Braverman

  • Design of Experiments

    Abstract:

    Overview/Course Outcomes-
    Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance.
    The course outcomes are:
    • Ability to plan and execute experiments
    • Ability to collect data and analyze and interpret these data to provide the knowledge required for business success
    • Knowledge of a wide range of modern experimental tools that enable practitioners to customize their experiment to meet practical resource constraints
    The topics covered during the course are:
    • Fundamentals of DOX - randomization, replication, and blocking.
    • Planning for a designed experiment - type and size of design, factor selection, levels and ranges, response measurement, sample sizes.
    • Graphical and statistical approaches to DOX analysis.
    • Blocking to eliminate the impact of nuisance factors on experimental results.
    • Factorial experiments and interactions.
    • Fractional factorials - efficient and effective use of experimental resources.
    • Optimal designs
    • Response surface methods
    • A demonstration illustrating and comparing the effectiveness of different experimental design strategies.

    This course is focused on helping you and your organization make the most effective use of DOX. Software usage is fully integrated into the course.

    Who Should Attend-
    The course is suitable for participants from an engineering or technical background. Participants will need some previous experience and background in statistical methods.

    Reference Materials-
    The course is based on the textbook Design and Analysis of Experiments, 9th Edition, by Douglas C. Montgomery. JMP Software will be discussed and illustrated.

    Speaker Info:

    Dr. Doug Montgomery

    Arizona State University, JMP

  • Design of Experiments

    Abstract:

    Overview/Course Outcomes-
    Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance.
    The course outcomes are:
    • Ability to plan and execute experiments
    • Ability to collect data and analyze and interpret these data to provide the knowledge required for business success
    • Knowledge of a wide range of modern experimental tools that enable practitioners to customize their experiment to meet practical resource constraints
    The topics covered during the course are:
    • Fundamentals of DOX - randomization, replication, and blocking.
    • Planning for a designed experiment - type and size of design, factor selection, levels and ranges, response measurement, sample sizes.
    • Graphical and statistical approaches to DOX analysis.
    • Blocking to eliminate the impact of nuisance factors on experimental results.
    • Factorial experiments and interactions.
    • Fractional factorials - efficient and effective use of experimental resources.
    • Optimal designs
    • Response surface methods
    • A demonstration illustrating and comparing the effectiveness of different experimental design strategies.

    This course is focused on helping you and your organization make the most effective use of DOX. Software usage is fully integrated into the course.

    Who Should Attend-
    The course is suitable for participants from an engineering or technical background. Participants will need some previous experience and background in statistical methods.

    Reference Materials-
    The course is based on the textbook Design and Analysis of Experiments, 9th Edition, by Douglas C. Montgomery. JMP Software will be discussed and illustrated.

    Speaker Info:

    Dr. Caleb King

    Arizona State University, JMP

  • Engineering first, Statistics second: Deploying Statistical Test Optimization (STO) for Cyber

    Abstract:

    Due to the immense number of potential use cases, configurations, and threat behaviors, thorough and efficient cyber testing is a significant challenge for the defense community. In this presentation, Phadke will present case studies where STO was successfully deployed for cyber testing, resulting in higher assurance, reduced schedule, and reduced testing cost. Phadke will also discuss the importance of first focusing on the engineering and science analysis and, only after that is complete, implementing statistical methods.

    Speaker Info:

    Kedar Phadke

  • Exploring Problems in Shipboard Air Defense with Modeling

    Abstract:

    One of the primary roles of Navy surface combatants is defending high-value units against attack by anti-ship cruise missiles (ASCMs). They accomplish this either by launching their interceptor missiles and shooting the ASCMs down with rapid-firing guns (hard kill), or through the use of deceptive jamming, decoys, or other non-kinetic means (soft kill) to defeat the threat. The wide range of hostile ASCM capabilities and the different properties of friendly defenses, combined with the short time-scale for defeating these ASCMs, make this a difficult problem to study.
    IDA recently completed a study focusing on the extent to which friendly forces were vulnerable to massed ASCM attacks, and possible avenues for improvement. To do this we created a pair of complementary models with the combined flexibility to explore a wide range of questions. The first model employed a set of closed-form equations, and the second a time-dependent Monte Carlo simulation. This presentation discusses the thought processes behind the models and their relative strengths and weaknesses.
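    A stripped-down version of the time-dependent Monte Carlo approach (purely illustrative, with invented engagement parameters and a much simpler structure than the IDA models) might look like:

        import numpy as np

        rng = np.random.default_rng(1)

        def p_no_leakers(n_ascm, interceptors, p_hardkill, p_softkill, n_trials=10_000):
            """Estimate the probability that no ASCM leaks through a two-layer defense."""
            leak_events = 0
            for _ in range(n_trials):
                leaked = 0
                shots_left = interceptors
                for _ in range(n_ascm):
                    killed = False
                    # Hard kill: expend up to two interceptors per inbound threat.
                    for _ in range(2):
                        if shots_left > 0 and not killed:
                            shots_left -= 1
                            killed = rng.random() < p_hardkill
                    # Soft kill: jamming/decoys as a final layer.
                    if not killed and rng.random() < p_softkill:
                        killed = True
                    leaked += not killed
                leak_events += leaked > 0
            return 1.0 - leak_events / n_trials

        print(p_no_leakers(n_ascm=8, interceptors=12, p_hardkill=0.7, p_softkill=0.5))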

    Speaker Info:

    Ralph Donnelly

  • Exploring Problems in Shipboard Air Defense with Modeling

    Abstract:

    One of the primary roles of Navy surface combatants is defending high-value units against attack by anti-ship cruise missiles (ASCMs). They accomplish this either by launching their interceptor missiles and shooting the ASCMs down with rapid-firing guns (hard kill), or through the use of deceptive jamming, decoys, or other non-kinetic means (soft kill) to defeat the threat. The wide range of hostile ASCM capabilities and the different properties of friendly defenses, combined with the short time-scale for defeating these ASCMs, make this a difficult problem to study.
    IDA recently completed a study focusing on the extent to which friendly forces were vulnerable to massed ASCM attacks, and possible avenues for improvement. To do this we created a pair of complementary models with the combined flexibility to explore a wide range of questions. The first model employed a set of closed-form equations, and the second a time-dependent Monte Carlo simulation. This presentation discusses the thought processes behind the models and their relative strengths and weaknesses.

    Speaker Info:

    Benjamin Ashwell

  • Functional Data Analysis for Design of Experiments

    Abstract:

    With nearly continuous recording of sensor values now common, a new type of data called “functional data” has emerged. Rather than the individual readings being modeled, the shape of the stream of data over time is modeled. As an example, one might model many historical vibration-over-time streams of a machine at start-up to identify functional data shapes associated with the onset of system failure. Functional Principal Components (FPC) analysis is a new and increasingly popular method for reducing the dimensionality of functional data so that only a few FPCs are needed to closely approximate any of a set of unique data streams. When combined with Design of Experiments (DoE) methods, the response to be modeled in as few tests as possible is now the shape of a stream of data over time. Example analyses will be shown where the form of the curve is modeled as a function of several input variables, allowing one to determine the input settings associated with shapes indicative of good or poor system performance. This allows the analyst to predict the shape of the curve as a function of the input variables.
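    A minimal sketch of the dimension-reduction step (assuming each row of curves is one sensor stream sampled on a common time grid; a real analysis would typically smooth with splines first):

        import numpy as np

        def fpc_scores(curves, n_components=3):
            """Approximate FPCA via the SVD of the centered curve matrix.

            curves : (n_streams, n_timepoints) array, one data stream per row.
            Returns the mean curve, the leading eigenfunctions, and per-stream scores.
            """
            mean_curve = curves.mean(axis=0)
            centered = curves - mean_curve
            # Rows of Vt are the (discretized) eigenfunctions; U * S gives the scores.
            U, S, Vt = np.linalg.svd(centered, full_matrices=False)
            scores = U[:, :n_components] * S[:n_components]
            return mean_curve, Vt[:n_components], scores

        # Toy example: 50 noisy start-up curves that differ in ramp rate and overshoot.
        rng = np.random.default_rng(2)
        t = np.linspace(0, 1, 200)
        curves = np.array([a * np.tanh(8 * t) + b * np.exp(-20 * (t - 0.3) ** 2)
                           + 0.02 * rng.standard_normal(t.size)
                           for a, b in rng.uniform(0.5, 1.5, size=(50, 2))])
        mean_curve, eigenfunctions, scores = fpc_scores(curves)
        print(scores[:5])   # low-dimensional summaries usable as DOE responses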

    Speaker Info:

    Tom Donnelly

  • Human in the Loop Experiment Series Evaluating Synthetic Vision Displays for Enhanced Airplane State Awareness

    Abstract:

    Recent data from Boeing’s Statistical Summary of Commercial Jet Airplane Accidents shows that Loss of Control - In Flight (LOC-I) is the leading cause of fatalities in commercial aviation accidents worldwide. The Commercial Aviation Safety Team (CAST), a joint government and industry effort tasked with reducing the rate of fatal accidents, requested that the National Aeronautics and Space Administration (NASA) conduct research on virtual day-visual meteorological conditions displays, such as synthetic vision, in order to combat LOC-I. NASA recently concluded a series of experiments using commercial pilots from various backgrounds to evaluate synthetic vision displays. This presentation will focus on the two most recent experiments: one conducted with the Navy’s Disorientation Research Device and one completed at NASA Langley Research Center that utilized the Microsoft HoloLens to display synthetic vision. Statistical analysis was done on aircraft performance data, pilot inputs, and a range of subjective questionnaires to assess the efficacy of the displays.

    Speaker Info:

    Kathryn Ballard

  • Hypergames for Control System Security

    Abstract:

    The identification of the Stuxnet worm in 2010 provided a highly publicized example of a cyber attack on an industrial control system. This raised public awareness about the possibility of similar attacks against other industrial targets – including critical infrastructure. Here, we use hypergames to analyze how adversarial perturbations can be used to manipulate a system using optimal control. Hypergames form an extension of game theory that models strategic interactions between players with significantly different perceptions of the game(s) they are playing. Previous work on hypergames has been limited to simple interactions, where a small set of discrete choices are available to each player. However, we apply hypergames to larger systems with continuous variables. Our results highlight that manipulating constraints can be a more effective attacker strategy than directly manipulating objective function parameters. Moreover, the attacker need not influence the underlying system to carry out a successful attack – it may be sufficient to deceive the defender controlling the system. Finally, we identify several characteristics that will make our analysis amenable to higher-dimensional control systems.

    Speaker Info:

    Arnab Bhattacharya

  • Identifying and Contextualizing Maximum Instrument Fault Rates and Minimum Instrument Recovery Times for Europa Clipper Science through Applied Statistics and Strategic Visualizations

    Abstract:

    Using the right visualizations as part of broad system and statistical Monte Carlo analysis supports interpretation of key drivers and relationships between variables, provides context about the full system, and communicates to non-statistician stakeholders.
    An experimental design was used to understand the relationships between instrument and spacecraft fault rates and recovery times and the probability of achieving Europa Clipper science objectives during the Europa Clipper tour. Given spacecraft and instrument outages, requirement achievement checks were performed to determine the probability of meeting scientific objectives. Visualizations of the experimental design output enabled analysis of the full parameter set. Correlation between individual instruments and specific scientific objectives is not straightforward; some scientific objectives require a single instrument to be on at certain times and during varying conditions across the trajectory, while other science objectives require multiple instruments to function concurrently.
    By examining the input conditions that meet scientific objectives with the highest probability, and comparing those to trials with the lowest probability of meeting scientific objectives, key relationships could be visualized, enabling valuable mission and engineering design insights. Key system drivers of scientific success were identified, such as fault rate tolerance and recovery time required for each instrument and the spacecraft.
    Key steps, methodologies, difficulties and result-highlights are presented, along with a discussion of next steps and options for refinement and future analysis.

    Speaker Info:

    Thomas Youmans

  • Improved Surface Gunnery Analysis with Continuous Data

    Abstract:

    Swarms of small, fast speedboats can challenge even the most capable modern warships, especially when they operate in or near crowded shipping lanes. As part of the Navy’s operational testing of new ships and systems, at-sea live-fire tests against remote-controlled targets allow us to test our capability against these threats. To ensure operational realism, these events are minimally scripted and allow the crew to respond in accordance with their training. This is a trade-off against designed experiments, which ensure statistically optimal sampling of data from across the factor space, but introduce many artificialities.
    A recent test provided data on the effectiveness of naval gunnery. However, standard binomial (hit/miss) analyses fell short, as the number of misses was much larger than the number of hits. This prevented us from fitting more than a few factors and resulted in error bars so large as to be almost useless. In short, binomial analysis taught us nothing we did not already know. Recasting gunfire data from binomial (hit/miss) to continuous (time-to-kill) allowed us to draw statistical conclusions with tactical implications from these free-play, live-fire surface gunnery events. Using a censored-data analysis approach enabled us to make this switch and avoid the shortcomings of other statistical methods. Ultimately, our analysis provided the Navy with suggestions for improvements to its tactics and the employment of its weapons.
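    One way to implement the recasting described above is a right-censored Weibull time-to-kill fit by maximum likelihood; the sketch below uses invented data and treats engagements that ended without a kill as censored at the end of the observation window:

        import numpy as np
        from scipy.optimize import minimize

        def fit_censored_weibull(times, observed):
            """MLE for a Weibull time-to-kill model with right censoring.

            times    : kill times, or censoring times for engagements with no kill
            observed : 1 if a kill was observed, 0 if the engagement was censored
            """
            times = np.asarray(times, float)
            observed = np.asarray(observed, int)

            def neg_log_lik(params):
                k, lam = np.exp(params)          # optimize on the log scale
                z = times / lam
                log_pdf = np.log(k / lam) + (k - 1) * np.log(z) - z ** k   # observed kills
                log_surv = -(z ** k)                                       # censored runs
                return -np.sum(observed * log_pdf + (1 - observed) * log_surv)

            res = minimize(neg_log_lik, x0=[0.0, np.log(times.mean())], method="Nelder-Mead")
            return np.exp(res.x)                 # (shape, scale)

        # Invented engagement data: times in seconds; zeros in `observed` mark censored runs.
        times = [42, 55, 63, 90, 90, 38, 71, 90]
        observed = [1, 1, 1, 0, 0, 1, 1, 0]
        print(fit_censored_weibull(times, observed))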

    Speaker Info:

    Benjamin Ashwell

  • Improved Surface Gunnery Analysis with Continuous Data

    Abstract:

    Swarms of small, fast speedboats can challenge even the most capable modern warships, especially when they operate in or near crowded shipping lanes. As part of the Navy’s operational testing of new ships and systems, at-sea live-fire tests against remote-controlled targets allow us to test our capability against these threats. To ensure operational realism, these events are minimally scripted and allow the crew to respond in accordance with their training. This is a trade-off against designed experiments, which ensure statistically optimal sampling of data from across the factor space, but introduce many artificialities.
    A recent test provided data on the effectiveness of naval gunnery. However, standard binomial (hit/miss) analyses fell short, as the number of misses was much larger than the number of hits. This prevented us from fitting more than a few factors and resulted in error bars so large as to be almost useless. In short, binomial analysis taught us nothing we did not already know. Recasting gunfire data from binomial (hit/miss) to continuous (time-to-kill) allowed us to draw statistical conclusions with tactical implications from these free-play, live-fire surface gunnery events. Using a censored-data analysis approach enabled us to make this switch and avoid the shortcomings of other statistical methods. Ultimately, our analysis provided the Navy with suggestions for improvements to its tactics and the employment of its weapons.

    Speaker Info:

    V. Bram Lillard

  • Machine Learning Prediction With Streamed Sensor Data: Fitting Neural Networks using Functional Principal Components

    Abstract:

    Sensors that record sequences of measurements are now embedded in many products from wearable exercise watches to chemical and semiconductor manufacturing equipment. There is information in the shapes of the sensor stream curves that is highly predictive of a variety of outcomes such as the likelihood of a product failure event or batch yield. Despite this data now being common and readily available, it is often being used either inefficiently or not at all due to lack of knowledge and tools for how to properly leverage it. In this presentation, we will propose fitting splines to sensor streams and extracting features called functional principal component scores that offer a highly efficient low dimensional compression of the signal data. Then, we use these features as inputs into machine learning models like neural networks and LASSO regression models. Once one sees sensor data in this light, answering a wide variety of applied questions becomes a straightforward two stage process of data cleanup/functional feature extraction followed by modeling using those features as inputs.
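    As a hedged illustration of the two-stage workflow (functional feature extraction followed by supervised modeling), functional principal component scores can be fed into off-the-shelf learners; the data and names below are invented:

        import numpy as np
        from sklearn.linear_model import LassoCV
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(3)

        # Stage 1 (assumed already done): the first five functional principal
        # component scores extracted from each run's sensor stream.
        fpc_scores = rng.standard_normal((200, 5))
        batch_yield = (80 + 4 * fpc_scores[:, 0] - 2 * fpc_scores[:, 1]
                       + rng.normal(scale=1.0, size=200))    # synthetic outcome

        # Stage 2: model the outcome from the low-dimensional functional features.
        lasso = LassoCV(cv=5).fit(fpc_scores, batch_yield)
        nnet = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                            random_state=0).fit(fpc_scores, batch_yield)

        print("LASSO coefficients:", np.round(lasso.coef_, 2))
        print("Neural net R^2:", round(nnet.score(fpc_scores, batch_yield), 3))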

    Speaker Info:

    Chris Gotwalt

  • Multivariate Data Analysis

    Abstract:

    In this one-day workshop, we will explore five techniques that are commonly used to model human behavior: principal component analysis, factor analysis, cluster analysis, mixture modeling, and multidimensional scaling. Brief discussions of the theory of each method will be provided, along with some examples showing how the techniques work and how the results are interpreted in practice. Accompanying R-code will be provided so attendees are able to implement these methods on their own.

    Speaker Info:

    Doug Steinley

    University of Missouri

  • Multivariate Density Estimation and Data-enclosing Sets Using Sliced-Normal Distributions

    Abstract:

    This talk focuses on a means to characterize the variability in multivariate data. This characterization, given in terms of both probability distributions and closed sets, is instrumental in assessing and improving the robustness/reliability properties of system designs. To this end, we propose the Sliced-Normal (SN) class of distributions. The versatility of SNs enables modeling complex multivariate dependencies with minimal modeling effort. A polynomial mapping is defined which injects the physical space into a higher dimensional (so-called) feature space on which a suitable normal distribution is defined. Optimization-based strategies for the estimation of SNs from data in both physical and feature space are proposed. The formulations in physical space yield non-convex optimization programs whose solutions often outperform the solutions in feature space. However, the formulations in feature space yield either an analytical solution or a convex program thereby facilitating their application to problems in high dimensions. The superlevel sets of a SN density have a closed semi-algebraic form making them amenable to rigorous uncertainty quantification methods. Furthermore, we propose a chance-constrained optimization framework for identifying and eliminating the effects of outliers in the prescription of such regions. These strategies can be used to mitigate the conservatism intrinsic to many methods in system identification, fault detection, robustness/reliability analysis, and robust design caused by assuming parameter independence and by including outliers in the dataset.
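    A bare-bones version of the feature-space construction (a sketch under simplifying assumptions, not the authors' estimator) maps the data through a polynomial basis, fits a normal distribution there, and uses a density superlevel set as the data-enclosing region:

        import numpy as np
        from sklearn.preprocessing import PolynomialFeatures

        def fit_sliced_normal(data, degree=2):
            """Fit a normal distribution to polynomial features of the data."""
            phi = PolynomialFeatures(degree, include_bias=False)
            feats = phi.fit_transform(data)
            mean = feats.mean(axis=0)
            cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
            return phi, mean, np.linalg.inv(cov)

        def mahalanobis_sq(points, phi, mean, cov_inv):
            d = phi.transform(points) - mean
            return np.einsum("ij,jk,ik->i", d, cov_inv, d)

        # Data concentrated near a half circle: a dependency that a single normal
        # in physical space would capture poorly.
        rng = np.random.default_rng(4)
        theta = rng.uniform(0, np.pi, 500)
        data = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.standard_normal((500, 2))

        phi, mean, cov_inv = fit_sliced_normal(data)
        # Superlevel set of the fitted density = sublevel set of the Mahalanobis distance;
        # pick the threshold so that roughly 95% of the training data is enclosed.
        threshold = np.quantile(mahalanobis_sq(data, phi, mean, cov_inv), 0.95)
        test_points = np.array([[0.0, 1.0], [0.0, 0.0]])
        print(mahalanobis_sq(test_points, phi, mean, cov_inv) <= threshold)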

    Speaker Info:

    Luis Crespo

  • Open Architecture Tradeoffs (OAT): A simple, computational game engine for rapidly exploring hypotheses in Battle Management Command and Control (BMC2)

    Abstract:

    We created the Open Architecture Tradeoffs (OAT) tool, a simple, computational game engine for rapidly exploring hypotheses about mission effectiveness in Battle Management Command and Control (BMC2). Each run of an OAT game simulates a military mission in contested airspace. Game objects represent U.S., adversary, and allied assets, each of which moves through the simulated airspace. Each U.S. asset has a Command and Control (C2) package that controls its actions—currently, neural networks form the basis of each U.S. asset’s C2 package. The weights of the neural network are randomized at the beginning of each game and are updated over the course of the game as the U.S. asset learns which of its actions lead to rewards, e.g., intercepting an adversary. Weights are updated via a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) altered to accommodate a Reinforcement Learning paradigm.
    OAT allows a user to winnow down the trade space that should be considered when setting up more expensive and time-consuming campaign models. OAT could be used to weed out bad ideas for “fast failure”, thus avoiding waste of campaign modeling resources. Questions can be explored via OAT such as: Which combination of system capabilities is likely to be more or less effective in a particular military mission?
    For example, in an early analysis, OAT was used to test the hypothesis that increases in U.S. assets’ sensor range always lead to increases in mission effectiveness, quantified as the percent of adversaries intercepted. We ran over 2500 OAT games, each time varying the sensor range of U.S. assets and the density of adversary assets. Results show that increasing sensor range did lead to an increase in military effectiveness—but only up to a certain point. Once the sensor range surpassed approximately 10-15% of the simulated airspace size, no further gains were made in the percent of adversaries intercepted. Thus, campaign modelers should hesitate to devote resources to exploring sensor range in isolation.
    More recent OAT analyses are exploring more complex hypotheses regarding the trade space between sensor range and communications range.

    Speaker Info:

    Shelley Cazares

  • Probabilistic Data Synthesis to Provide a Defensible Risk Assessment for Army Munition

    Abstract:

    Military-grade energetics are, by design, required to operate under extreme conditions. As such, warheads in a munition must demonstrate a high level of structural integrity in order to ensure safe and reliable operation by the Warfighter. In this example, which involved an artillery munition, a systematic analytics-driven approach was executed that synthesized physical test data results with probabilistic analysis, non-destructive evaluation, modeling and simulation, and comprehensive risk analysis tools in order to determine the probability of a catastrophic event. Once the severity, probability of detection, and occurrence were synthesized, a model was built to determine the risk of a catastrophic event during firing, accounting for defect growth occurring as a result of rough handling. This comprehensive analysis provided a defensible, credible, and dynamic snapshot of risk while allowing for a transparent assessment of each input's contribution to risk through sensitivity analyses. This paper will illustrate the intersection of product safety, reliability, systems-safety policy, and analytics, and highlight the impact of a holistic multidisciplinary approach. The benefits of this rigorous assessment included quantifying risk to the user, supporting effective decision-making, improving the resultant safety and reliability of the munition, and supporting triage and prioritization of future Non-Destructive Evaluation (NDE) screening efforts by identifying at-risk subpopulations.

    Speaker Info:

    Kevin Singer

  • Reasoning about Uncertainty with the Stan Modeling Language

    Abstract:

    This briefing discusses the practical advantages of using the probabilistic programming language (PPL) Stan to answer statistical questions, especially those related to the quantification of uncertainty. Stan is a relatively new statistical tool that allows users to specify probability models and reason about the processes that generate the data they encounter.

    Stan has quickly become a popular language for writing statistical models because it allows one to specify rich (or sparse) Bayesian models using high level language. Further, Stan is fast, memory efficient, and robust.

    Stan requires users to be explicit about the model they wish to evaluate, which makes the process of statistical modeling more transparent to users and decision makers. This is valuable because it forces practitioners to consider assumptions at the beginning of the model-building procedure, rather than at the end (or not at all). In this sense, Stan is the opposite of a “black box” modeling approach. This approach may be tedious and labor intensive at first, but the pay-offs are large. For example, once a model is set up, inferential tasks are all essentially automatic, since changing the model does not change how one analyzes the data. This is a generic approach to inference.

    To illustrate these points, we use Stan to study a ballistic miss distance problem. In ballistic missile testing, the p-content circular error probable (CEP) is the circle that contains p percent of future shots fired, on average. Statistically, CEP is a bivariate prediction region, constrained by the model to be circular.

    In Frequentist statistics, the determination of CEP is highly dependent on the model fit, and a different calculation of CEP must be produced for each plausible model. However, with Stan, we can approach the CEP calculation invariant of the model we use to fit the data.

    We show how to use Stan to calculate CEP and uncertainty intervals for the parameters using summary statistics.

    Statistical practitioners can access Stan from several programming languages, including R and Python.
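    To make the idea concrete without reproducing the talk's Stan program, the sketch below takes posterior draws for a simple bivariate miss-distance model (simulated here in place of actual MCMC output, and assuming independent x/y errors) and computes the p-content CEP as a posterior predictive radius:

        import numpy as np

        rng = np.random.default_rng(5)

        # Stand-ins for posterior draws of the model parameters; in practice these
        # would come from Stan (e.g., via cmdstanpy or rstan).
        n_draws = 4000
        mu_x = rng.normal(0.5, 0.1, n_draws)
        mu_y = rng.normal(-0.2, 0.1, n_draws)
        sigma_x = np.abs(rng.normal(2.0, 0.2, n_draws))
        sigma_y = np.abs(rng.normal(2.5, 0.2, n_draws))

        def cep(p=0.5):
            """p-content CEP: radius of the circle containing p percent of future shots, on average."""
            # One simulated future shot per posterior draw (posterior predictive sample).
            x = rng.normal(mu_x, sigma_x)
            y = rng.normal(mu_y, sigma_y)
            return np.quantile(np.hypot(x, y), p)

        print("50%-content CEP:", round(cep(0.5), 2))
        print("90%-content CEP:", round(cep(0.9), 2))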

    Speaker Info:

    John Haman

  • Sample Size Calculations for Quiet Sonic Boom Community Surveys

    Abstract:

    NASA is investigating the dose-response relationship between quiet sonic boom exposure and community noise perceptions. This relationship is the key to possible future regulations that would replace the ban on commercial supersonic flights with a noise limit. We have built several Bayesian statistical models using pilot community study data. Using goodness of fit measures, we downselected to a subset of models which are the most appropriate for the data. From this subset of models we demonstrate how to calculate sample size requirements for a simplified example without any missing data. We also suggest how to modify the sample size calculation to account for missing data.
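    A simplified, simulation-based version of the sample-size question (no missing data, and a plain logistic dose-response standing in for the actual Bayesian models; all parameter values are invented) could look like:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(6)
        TRUE_B0, TRUE_B1 = -6.0, 0.08       # assumed pilot-study dose-response parameters

        def precision_ok(n, n_sims=300, tol=3.0):
            """Fraction of simulated surveys in which the estimated 50%-annoyance
            dose falls within +/- tol dB of the assumed true value."""
            true_d50 = -TRUE_B0 / TRUE_B1
            hits = 0
            for _ in range(n_sims):
                dose = rng.uniform(60, 90, n)                    # exposure levels (dB)
                p = 1 / (1 + np.exp(-(TRUE_B0 + TRUE_B1 * dose)))
                annoyed = rng.binomial(1, p)
                if annoyed.min() == annoyed.max():               # skip degenerate samples
                    continue
                fit = LogisticRegression(C=1e6).fit(dose.reshape(-1, 1), annoyed)
                d50_hat = -fit.intercept_[0] / fit.coef_[0, 0]
                hits += abs(d50_hat - true_d50) <= tol
            return hits / n_sims

        # Choose the smallest sample size whose precision criterion is met often enough.
        for n in (100, 200, 400, 800):
            print(n, round(precision_ok(n), 2))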

    Speaker Info:

    Jasme Lee

  • Satellite Affordability in LEO (SAL)

    Abstract:

    The Satellite Affordability in LEO (SAL) model identifies the cheapest constellation capable of providing a desired level of performance within certain constraints. SAL achieves this using a combination of analytical models, statistical emulators, and geometric relationships. SAL is flexible and modular, allowing users to customize certain components while retaining default behavior in other cases. This is desirable if users wish to consider an alternative cost formulation or different types of payload. Uses for SAL include examining cost tradeoffs with respect to factors like constellation size and desired performance level, evaluating the sensitivity of constellation costs to different assumptions about cost behavior, and providing a first-pass look at what proliferated smallsats might be capable of. At this point, SAL is limited to Walker constellations with sun-synchronous, polar orbits.

    Speaker Info:

    Matthew Avery

  • Screening Designs for Resource Constrained Deterministic M&S Experiments: A Munitions Case Study

    Abstract:

    In applications where modeling and simulation runs are quick and cheap, space-filling designs will give the tester all the information they need to make decisions about their system. In some applications, however, this luxury does not exist, and each M&S run can be time consuming and expensive. In these scenarios, a sequential test approach provides an efficient solution where an initial screening is conducted, followed by an augmentation to fit specified models of interest. Until this point, no dedicated screening designs for UQ applications in resource-constrained situations existed. Because the Army frequently encounters this type of situation, the need sparked a collaboration between Picatinny’s Statistical Methods and Analysis group and Professor V. Roshan Joseph of Georgia Tech, in which a new type of UQ screening design was created. This paper provides a brief introduction to the design, its intended use, and a case study in which this new methodology was applied.

    Speaker Info:

    Christopher Drake

  • Sequential Testing for Fast Jet Life Support Systems

    Abstract:

    The concept of sequential testing has many disparate meanings. Often, for statisticians it takes on a purely mathematical context, while for some practitioners it may mean multiple disconnected test events. Here we present a pedagogical approach to creating test designs involving constrained factors using JMP software. Recent experience testing one of the U.S. military’s fast jet life support systems (LSS) serves as a case study and backdrop to support the presentation. The case study discusses several lessons learned during LSS testing, applicable to all practitioners of scientific test and analysis techniques (STAT) and design of experiments (DOE). We conduct a short analysis to specifically determine a test region with a set of factors pertinent to modeling human breathing and the use of breathing machines as part of the laboratory setup. A comparison of several government and industry laboratory test points and regions with governing documentation is made, along with our proposal for determining a necessary and sufficient test region for tests involving human breathing as a factor.

    Speaker Info:

    Darryl Ahner

    Steven Thorsen, Sarah Burke &

  • SLS Structural Dynamics Sensor Optimization Study

    Abstract:

    A crucial step in the design and development of a flight vehicle, such as NASA's Space Launch System (SLS), is understanding its vibration behavior while in flight. Vehicle designers rely on low-cost finite element analysis (FEA) to predict the vibration behavior of the vehicle. During ground and flight tests, sensors are strategically placed at predefined locations that contribute the most vibration information under the assumption that the FEA is accurate, producing points to validate the FEA models. This collaborative work focused on developing optimal sensor placement algorithms to validate FEA models against test data and to characterize the vehicle's vibration characteristics.
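    One classical baseline for this kind of problem is a greedy D-optimality heuristic (related in spirit to effective independence) applied to the FEA mode-shape matrix; the sketch below uses a random stand-in matrix rather than actual SLS model data:

        import numpy as np

        def greedy_sensor_placement(mode_shapes, n_sensors):
            """Greedily pick sensor locations (rows of the mode-shape matrix) that
            maximize the determinant of the Fisher information matrix Phi^T Phi."""
            n_locations, n_modes = mode_shapes.shape
            chosen = []
            for _ in range(n_sensors):
                best_loc, best_logdet = None, -np.inf
                for loc in range(n_locations):
                    if loc in chosen:
                        continue
                    phi = mode_shapes[chosen + [loc], :]
                    # slogdet of Phi^T Phi plus a small ridge for numerical stability.
                    _, logdet = np.linalg.slogdet(phi.T @ phi + 1e-9 * np.eye(n_modes))
                    if logdet > best_logdet:
                        best_loc, best_logdet = loc, logdet
                chosen.append(best_loc)
            return chosen

        rng = np.random.default_rng(7)
        mode_shapes = rng.standard_normal((500, 6))   # stand-in for FEA mode shapes
        print(greedy_sensor_placement(mode_shapes, n_sensors=10))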

    Speaker Info:

    Ken Toro

  • SLS Structural Dynamics Sensor Optimization Study

    Abstract:

    A crucial step in the design and development of a flight vehicle, such as NASA's Space Launch System (SLS), is understanding its vibration behavior while in flight. Vehicle designers rely on low-cost finite element analysis (FEA) to predict the vibration behavior of the vehicle. During ground and flight tests, sensors are strategically placed at predefined locations that contribute the most vibration information under the assumption that the FEA is accurate, producing points to validate the FEA models. This collaborative work focused on developing optimal sensor placement algorithms to validate FEA models against test data and to characterize the vehicle's vibration characteristics.

    Speaker Info:

    Jon Stallrich

  • Software Reliability and Security Assessment: Automation and Frameworks

    Abstract:

    Software reliability models enable several quantitative predictions such as the number of faults remaining, failure rate, and reliability (probability of failure-free operation for a specified period of time in a specified environment). This talk will describe recent efforts in collaboration with NASA, including (1) the development of an automated script for the SFRAT (Software Failure and Reliability Assessment Tool) to streamline application of software reliability methods to ongoing programs, (2) application to a NASA program, (3) lessons learned, and (4) future directions for model and tool development to support the practical needs of software reliability and security assessment frameworks.
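    For readers unfamiliar with these models, a compact example of the kind of quantitative prediction involved is a Goel-Okumoto reliability-growth fit (a standard model, not necessarily the one used in the NASA work), sketched here with invented failure times:

        import numpy as np
        from scipy.optimize import minimize

        def fit_goel_okumoto(failure_times):
            """MLE for the Goel-Okumoto NHPP model with mean value function
            m(t) = a * (1 - exp(-b t)); returns (a, b)."""
            t = np.sort(np.asarray(failure_times, float))
            n, T = len(t), t[-1]

            def neg_log_lik(params):
                a, b = np.exp(params)        # optimize on the log scale for positivity
                # NHPP log-likelihood: sum of log intensities at failures minus m(T).
                return -(n * np.log(a * b) - b * t.sum() - a * (1 - np.exp(-b * T)))

            res = minimize(neg_log_lik, x0=[np.log(n), np.log(1.0 / T)], method="Nelder-Mead")
            return np.exp(res.x)

        # Invented cumulative failure times (hours of test exposure).
        times = [10, 26, 45, 70, 102, 140, 185, 240, 310, 400, 520, 680]
        a_hat, b_hat = fit_goel_okumoto(times)
        print(round(a_hat, 1), round(b_hat, 4), round(a_hat - len(times), 1))  # a_hat - n = expected remaining faults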

    Speaker Info:

    Lance Fiondella

  • Sources of Error and Bias in Experiments with Human Subjects

    Abstract:

    No set of experimental data is perfect, and researchers are aware that data from experimental studies invariably contain some margin of error. This is particularly true of studies with human subjects, since human behavior is vulnerable to a range of intrinsic and extrinsic influences beyond the variables being manipulated in a controlled experimental setting. Potential sources of error may lead to wide variations in the interpretation of results and the formulation of subsequent implications. This talk will discuss specific sources of error and bias in the design of experiments and present systematic ways to overcome these effects. First, some of the basic errors in general experimental design will be discussed, including human errors, systematic errors, and random errors. Second, we will explore specific types of experimental error that appear in human subjects research. Lastly, we will discuss the role of bias in experiments with human subjects. Bias is a type of systematic error that is introduced into the sampling or testing phase and encourages one outcome over another. Often, bias is the result of the intentional or unintentional influence that an experimenter may exert on the outcomes of a study. We will discuss some common sources of bias in research with human subjects, including biases in sampling, selection, response, performance execution, and measurement. The talk will conclude with a discussion of how errors and bias influence the validity of human subjects research and will explore some strategies for controlling these errors and biases.

    Speaker Info:

    Poornima Madhavan

  • Statistical Engineering and M&S in the Design and Development of DoD Systems

    Abstract:

    This presentation will use a notional armament system case study to illustrate the use of M&S DOE, surrogate modeling, sensitivity analysis, multi-objective optimization, and model calibration during early lifecycle development and design activities in the context of a new armament system. In addition to focusing on the statistician’s, data scientist’s, or analyst’s role and the key statistical techniques in engineering DoD systems, this presentation will also emphasize the non-statistical / engineering domain-specific aspects in a multidisciplinary design and development process which make use of these statistical approaches at the subcomponent and subsystem level as well as in end-to-end system modeling. A statistical engineering methodology which emphasizes the use of ‘virtual’ DOE-based model emulators developed at the subsystem level and integrated using a systems-engineering architecture framework can yield a more tractable engineering problem compared to traditional ‘design-build-test-fix’ cycles or direct simulation of computationally expensive models. This supports a more informed prototype design for physical experimentation while providing a greater variety of materiel solutions, thereby reducing development and testing cycles and the time to field complex systems.
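    A minimal example of the "virtual DOE plus emulator" idea (a generic Gaussian-process surrogate and a toy function standing in for the armament-specific subsystem models discussed in the talk):

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, ConstantKernel

        def expensive_simulation(x):
            """Stand-in for a computationally expensive subsystem model."""
            return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1]) + 0.1 * x[:, 1] ** 2

        # Small space-filling DOE over two normalized inputs (random points used
        # here for brevity; a Latin hypercube or optimal design would be typical).
        rng = np.random.default_rng(8)
        X_doe = rng.uniform(0, 1, size=(30, 2))
        y_doe = expensive_simulation(X_doe)

        # Fit the emulator once, then reuse it for cheap predictions and trade studies.
        gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([0.2, 0.2]),
                                      normalize_y=True).fit(X_doe, y_doe)

        X_new = rng.uniform(0, 1, size=(5, 2))
        y_pred, y_std = gp.predict(X_new, return_std=True)
        print(np.round(y_pred, 3), np.round(y_std, 3))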

    Speaker Info:

    Doug Ray

  • Statistical Engineering and M&S in the Design and Development of DoD Systems

    Abstract:

    This presentation will use a notional armament system case study to illustrate the use of M&S DOE, surrogate modeling, sensitivity analysis, multi-objective optimization, and model calibration during early lifecycle development and design activities in the context of a new armament system. In addition to focusing on the statistician’s, data scientist’s, or analyst’s role and the key statistical techniques in engineering DoD systems, this presentation will also emphasize the non-statistical / engineering domain-specific aspects in a multidisciplinary design and development process which make use of these statistical approaches at the subcomponent and subsystem level as well as in end-to-end system modeling. A statistical engineering methodology which emphasizes the use of ‘virtual’ DOE-based model emulators developed at the subsystem level and integrated using a systems-engineering architecture framework can yield a more tractable engineering problem compared to traditional ‘design-build-test-fix’ cycles or direct simulation of computationally expensive models. This supports a more informed prototype design for physical experimentation while providing a greater variety of materiel solutions, thereby reducing development and testing cycles and the time to field complex systems.

    Speaker Info:

    Melissa Jablonski

  • Statistical Methods for Modeling and Simulation Verification and Validation

    Abstract:

    Statistical Methods for Modeling and Simulation Verification and Validation is a 1-day tutorial in applied statistical methods for the planning, designing and analysis of simulation experiments and live test events for the purposes of verifying and validating models and simulations. The course covers the fundamentals of verification and validation of models and simulations, as it is currently practiced and as is suggested for future applications. The first session is largely an introduction to modeling and simulation concepts, verification and validation policies, along with a basic introduction to data visualization and statistical methods. Session 2 covers the essentials of experiment design for simulations and live test events. The final session focuses on analysis techniques appropriate for designed experiments and tests, as well as observational data, for the express purpose of simulation validation. We look forward to your participation in this course.

    Speaker Info:

    Dr. Jim Simpson

    JK Analytics LLC, Adsurgo LLC, Analytical Arts LLC

  • Statistical Methods for Modeling and Simulation Verification and Validation

    Abstract:

    Statistical Methods for Modeling and Simulation Verification and Validation is a 1-day tutorial in applied statistical methods for the planning, designing and analysis of simulation experiments and live test events for the purposes of verifying and validating models and simulations. The course covers the fundamentals of verification and validation of models and simulations, as it is currently practiced and as is suggested for future applications. The first session is largely an introduction to modeling and simulation concepts, verification and validation policies, along with a basic introduction to data visualization and statistical methods. Session 2 covers the essentials of experiment design for simulations and live test events. The final session focuses on analysis techniques appropriate for designed experiments and tests, as well as observational data, for the express purpose of simulation validation. We look forward to your participation in this course.

    Speaker Info:

    Dr. Jim Wisnowski

    JK Analytics LLC, Adsurgo LLC, Analytical Arts LLC

  • Statistical Methods for Modeling and Simulation Verification and Validation

    Abstract:

    Statistical Methods for Modeling and Simulation Verification and Validation is a 1-day tutorial in applied statistical methods for the planning, designing and analysis of simulation experiments and live test events for the purposes of verifying and validating models and simulations. The course covers the fundamentals of verification and validation of models and simulations, as it is currently practiced and as is suggested for future applications. The first session is largely an introduction to modeling and simulation concepts, verification and validation policies, along with a basic introduction to data visualization and statistical methods. Session 2 covers the essentials of experiment design for simulations and live test events. The final session focuses on analysis techniques appropriate for designed experiments and tests, as well as observational data, for the express purpose of simulation validation. We look forward to your participation in this course.

    Speaker Info:

    Dr. Stargel Doane

    JK Analytics LLC, Adsurgo LLC, Analytical Arts LLC

  • Statistical Process Control and Capability Study on the Water Content Measurements in NASA Glenn’s Icing Research Tunnel

    Abstract:

    The Icing Research Tunnel (IRT) at NASA Glenn Research Center follows the recommended practice for icing tunnel calibration outlined in SAE’s ARP5905 document. The calibration team has followed a schedule of a full calibration every five years with a check calibration every six months thereafter. The liquid water content of the IRT has maintained stability within the specification presented to customers: variation within +/- 10% of the calibrated target measurement. Recent measurement and instrumentation errors prompted a more thorough assessment of error sources. By constructing statistical process control charts, the team gained the ability to determine how the instrument varies in the short, mid, and long term. The control charts offer a view of instrument error, facility error, or installation changes. A shift from the target to the mean baseline was discovered, leading to a study of the overall capability indices of the liquid water content measuring instrument and its ability to perform within the specifications defined for the IRT. This presentation describes data processing procedures for the Multi-Element Sensor in the IRT, including collision efficiency corrections, canonical correlation analysis, Chauvenet’s criterion for rejection of data, distribution checks of the data, and the mean, median, and mode used to construct control charts. Further data are presented to describe the repeatability of the IRT with the Multi-Element Sensor and its ability to maintain a stable process over the defined calibration schedule.
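    As a generic illustration of the control-chart and capability computations mentioned above (simulated measurements, not IRT data), an individuals chart and the Cpk index can be computed as follows:

        import numpy as np

        def individuals_chart_and_cpk(x, lsl, usl):
            """Individuals (I-MR) control limits plus the Cpk capability index."""
            x = np.asarray(x, float)
            moving_range = np.abs(np.diff(x))
            sigma_within = moving_range.mean() / 1.128      # d2 constant for n = 2
            center = x.mean()
            lcl, ucl = center - 3 * sigma_within, center + 3 * sigma_within
            cpk = min(usl - center, center - lsl) / (3 * sigma_within)
            return center, (lcl, ucl), cpk

        # Simulated liquid-water-content readings around a 1.0 g/m^3 target,
        # with specification limits at +/- 10% of the target.
        rng = np.random.default_rng(9)
        lwc = rng.normal(1.02, 0.02, size=60)
        center, (lcl, ucl), cpk = individuals_chart_and_cpk(lwc, lsl=0.9, usl=1.1)
        print(round(center, 3), round(lcl, 3), round(ucl, 3), round(cpk, 2))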

    Speaker Info:

    Emily Timko

  • Target Location Error Estimation Using Parametric Models

    Speaker Info:

    James Brownlow

  • Test and Evaluation of Emerging Technologies

    Speaker Info:

    Dr. Greg Zacharias

    Chief Scientist Operational Test and Evaluation

  • The 80/20 rule, can and should we break it using efficient data management tools?

    Abstract:

    Data scientists spend approximately 80% of their time preparing, cleaning, and feature-engineering data sets. In this talk I will share use cases that show why this is important and why we need to do it. I will also describe the Earth System Grid Federation (ESGF), an open-source effort providing a robust, distributed data and computation platform that enables worldwide access to peta/exa-scale scientific data. ESGF will help reduce the amount of effort needed for climate data preprocessing by integrating the necessary analysis and data sharing tools.

    Speaker Info:

    Ghaleb Abdulla

  • The Isle of Misfit Designs: A Guided Tour of Optimal Designs That Break the Mold

    Abstract:

    Whether it was in a Design of Experiments course or through your own work, you’ve no doubt seen and become well acquainted with the standard experimental design. You know the features: they’re “orthogonal” (no messy correlations to deal with), their correlation matrices are nice pretty diagonals, and they can only happen with run sizes of 4, 8, 12, 16, and so on. Well, what if I told you that there exist optimal designs that defy convention? What if I told you that, yes, you can run an optimal design with, say, 5 factors in 9 runs. Or 10. Or even 11 runs! Join me as I show you a strange new world of optimal designs that are the best at what they do, even though they might not look very nice.

    Speaker Info:

    Caleb King

  • Thursday Keynote Speaker I

    Speaker Info:

    Wendy Martinez

    Director, Mathematical Statistics Research Center, Bureau of Labor Statistics

    ASA President-Elect (2020)

    Wendy Martinez has been serving as the Director of the Mathematical Statistics Research Center at the Bureau of Labor Statistics (BLS) for six years. Prior to this, she served in several research positions throughout the Department of Defense. She held the position of Science and Technology Program Officer at the Office of Naval Research, where she established a research portfolio comprised of academia and industry performers developing data science products for the future Navy and Marine Corps. Her areas of interest include computational statistics, exploratory data analysis, and text data mining. She is the lead author of three books on MATLAB and statistics. Dr. Martinez was elected as a Fellow of the American Statistical Association (ASA) in 2006 and is an elected member of the International Statistical Institute. She was honored by the American Statistical Association when she received the ASA Founders Award at the JSM 2017 conference. Wendy is also proud and grateful to have been elected as the 2020 ASA President.

  • Thursday Keynote Speaker II

    Speaker Info:

    Michael Little

    Program Manager, Advanced Information Systems Technology

    Earth Science Technology Office, NASA Headquarters

    Over the past 45 years, Mike’s primary focus has been on the management of research and development, with an emphasis on making the results more useful in meeting the needs of the user community. Since 1984, he has specialized in communications, data, and processing systems, including projects in NASA, the US Air Force, the FAA, and the Census Bureau. Before that, he worked on Major System Acquisition Programs in the Department of Defense, including Marine Corps combat vehicles and US Navy submarines.

    Currently, Mike manages a comprehensive program to provide NASA’s Earth Science research efforts with the information technologies it will need in the 2020-2035 time-frame to characterize, model and understand the Earth. This Program addresses the full range of data lifecycle from generating data using instruments and models, through the management of the data and including the ways in which information technology can help to exploit the data. Of particular interest today are the ways in which NASA can measure and understand transient and transitional phenomena and the impact of climate change. The AIST Program focuses the application of applied math and statistics, artificial intelligence, case-based reasoning, machine learning and automation to improve our ability to use observational data and model output in understanding Earth’s physical processes and natural phenomena.

    Training and odd skills:
    Application of cloud computing
    US Government Computer Security
    US Navy Nuclear Propulsion operations and maintenance on two submarines

  • Thursday Lunchtime Keynote Speaker

    Speaker Info:

    T. Charles Clancy

    Bradley Professor of Electrical and Computer Engineering

    Virginia Tech

    Charles Clancy is the Bradley Professor of Electrical and Computer Engineering at Virginia Tech where he serves as the Executive Director of the Hume Center for National Security and Technology. Clancy leads a range of strategic programs at Virginia Tech related to security, including the Commonwealth Cyber Initiative. Prior to joining VT in 2010, Clancy was an engineering leader in the National Security Agency, leading research programs in digital communications and signal processing. He received his PhD from the University of Maryland, MS from University of Illinois, and BS from the Rose-Hulman Institute of Technology. He is co-author to over 200 peer-reviewed academic publications, six books, over twenty patents, and co-founder to five venture-backed startup companies.

  • Time Machine Learning: Getting Navy Maintenance Duration Right

    Abstract:

    In support of the Navy’s effort to obtain improved outcomes through data-driven decision making, the Center for Naval Analyses’ Data Science Program (CNA/DSP) supports the performance-to-plan (P2P) forum, which is co-chaired by the Vice Chief of Naval Operations and the Assistant Secretary of the Navy (RD&A). The P2P forum provides senior Navy leadership with forward-looking performance forecasts, which are foundational to articulating Navy progress toward readiness and capability goals.
    While providing analytical support for this forum, CNA/DSP leveraged machine learning techniques, including Random Forests and Artificial Neural Networks, to develop improved estimates of future maintenance durations for the Navy. When maintenance durations exceed their estimated timelines, these delays can affect training, manning, and deployments in support of operational commanders. Currently, the Navy creates maintenance estimates during numerous timeframes, including the program objective memorandum (POM) process, the Presidential Budget (PB), and at contract award, leading to evolving estimates over time. The limited historical accuracy of these estimates, especially the POM and PB estimates, has persisted over the last decade. These errors have led to a gap between planned funding and actual costs, in addition to changes in the assets available to operational commanders each year.
    The CNA/DSP prediction model reduces the average error in forecasted maintenance duration days from 128 days to 31 days for POM estimates. Improvements in duration accuracy for the PB and contract award time frames were also achieved using similar ML processes. The data curation for these models involved numerous data sources of varying quality and required significant feature engineering to provide usable model inputs that could allow for forecasts over the Future Years Defense Program (FYDP) in order to support improved resource allocation and scheduling in support of the optimized fleet response training plan (OFRTP).
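    A toy-scale sketch of the prediction step (scikit-learn's random forest on invented features, not the CNA/DSP model or Navy data):

        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import mean_absolute_error
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(10)
        n = 2000

        # Invented availability features; the real inputs come from curated Navy data.
        df = pd.DataFrame({
            "ship_age_years": rng.uniform(2, 35, n),
            "planned_duration_days": rng.uniform(60, 400, n),
            "work_items": rng.integers(50, 600, n),
            "shipyard_backlog": rng.uniform(0, 1, n),
        })
        actual_days = (df["planned_duration_days"] * (1 + 0.4 * df["shipyard_backlog"])
                       + 0.3 * df["work_items"] + 2 * df["ship_age_years"]
                       + rng.normal(0, 25, n))

        X_train, X_test, y_train, y_test = train_test_split(df, actual_days, random_state=0)
        model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
        print("Mean absolute error (days):",
              round(mean_absolute_error(y_test, model.predict(X_test)), 1))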

    Speaker Info:

    Tim Kao

  • Toward Real-Time Decision Making in Experimental Settings

    Abstract:

    Materials scientists, computer scientists and statisticians at LANL have teamed up to investigate how to make near real time decisions during fast-paced experiments. For instance, a materials scientist at a beamline typically has a short window in which to perform a number of experiments, after which they analyze the experimental data, determine interesting new experiments and repeat. In typical circumstances, that cycle could take a year. The goal of this research and development project is to accelerate that cycle so that interesting leads are followed during the short window for experiments, rather than in years to come. We detail some of our UQ work in materials science, including emulation, sensitivity analysis, and solving inverse problems, with an eye toward real-time decision making in experimental settings.

    Speaker Info:

    Devin Francom

  • Tuesday Keynote

    Speaker Info:

    David Chu

    President

    Institute for Defense Analyses

    David Chu serves as President of the Institute for Defense Analyses. IDA is a non-profit corporation operating in the public interest. Its three federally funded research and development centers provide objective analyses of national security issues and related national challenges, particularly those requiring extraordinary scientific and technical expertise.

    As president, Dr. Chu directs the activities of more than 1,000 scientists and technologists. Together, they conduct and support research requested by federal agencies involved in advancing national security and advising on science and technology issues.

    Dr. Chu served in the Department of Defense as Under Secretary of Defense for Personnel and Readiness from 2001-2009, and earlier as Assistant Secretary of Defense and Director for Program Analysis and Evaluation from 1981-1993.

    From 1978-1981 he was the Assistant Director of the Congressional Budget Office for National Security and International Affairs.

    Dr. Chu served in the U. S. Army from 1968-1970. He was an economist with the RAND Corporation from 1970-1978, director of RAND’s Washington Office from 1994-1998, and vice president for its Army Research Division from 1998-2001.

    He earned a bachelor of arts in economics and mathematics, and his doctorate in economics, from Yale University.

    Dr. Chu is a member of the Defense Science Board and a Fellow of the National Academy of Public Administration. He is a recipient of the Department of Defense Medal for Distinguished Public Service with Gold Palm, the Department of Veterans Affairs Meritorious Service Award, the Department of the Army Distinguished Civilian Service Award, the Department of the Navy Distinguished Public Service Award, and the National Academy of Public Administration’s National Public Service Award.

  • Tutorial: Combinatorial Methods for Testing and Analysis of Critical Software and Security Systems

    Abstract:

    Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost, but when are these methods practical and cost-effective? This tutorial includes two sections on the basis and application of combinatorial test methods: The first section explains the background, process, and tools available for combinatorial testing, with illustrations from industry experience with the method. The focus is on practical applications, including an industrial example of testing to meet FAA-required standards for life-critical software for commercial aviation. Other example applications include modeling and simulation, mobile devices, network configuration, and testing for a NASA spacecraft. The discussion will also include examples of measured resource and cost reduction in case studies from a variety of application domains.

    The second part explains combinatorial testing-based techniques for effective security testing of software components and large-scale software systems. It will develop quality assurance and effective re-verification for security testing of web applications and testing of operating systems. It will further address how combinatorial testing can be applied to ensure proper error-handling of network security protocols and provide the theoretical guarantees for detecting Trojans injected in cryptographic hardware. Procedures and techniques, as well as workarounds will be presented and captured as guidelines for a broader audience.
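    For orientation, here is a very small greedy generator for pairwise (2-way) covering arrays, illustrating the core idea behind combinatorial test tools (real tools such as NIST's ACTS are far more sophisticated and scale to much larger factor spaces):

        from itertools import combinations, product

        def pairwise_tests(factors):
            """Greedily build a small test suite covering every pair of factor values.

            factors : dict mapping factor name -> list of possible values
            """
            names = list(factors)
            uncovered = {((f1, v1), (f2, v2))
                         for f1, f2 in combinations(names, 2)
                         for v1 in factors[f1] for v2 in factors[f2]}
            tests = []
            while uncovered:
                best_test, best_covered = None, set()
                # Score every full-factorial candidate; fine for small factor spaces.
                for values in product(*(factors[n] for n in names)):
                    test = dict(zip(names, values))
                    covered = {((f1, test[f1]), (f2, test[f2]))
                               for f1, f2 in combinations(names, 2)} & uncovered
                    if len(covered) > len(best_covered):
                        best_test, best_covered = test, covered
                tests.append(best_test)
                uncovered -= best_covered
            return tests

        config = {"os": ["linux", "windows"], "browser": ["chrome", "firefox", "safari"],
                  "protocol": ["ipv4", "ipv6"], "db": ["mysql", "postgres"]}
        suite = pairwise_tests(config)
        print(len(suite), "tests instead of", 2 * 3 * 2 * 2)   # all pairs covered with far fewer runs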

    Speaker Info:

    Rick Kuhn

    National Institute of Standards & Technology

  • Tutorial: Combinatorial Methods for Testing and Analysis of Critical Software and Security Systems

    Abstract:

    Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost, but when are these methods practical and cost-effective? This tutorial includes two sections on the basis and application of combinatorial test methods: The first section explains the background, process, and tools available for combinatorial testing, with illustrations from industry experience with the method. The focus is on practical applications, including an industrial example of testing to meet FAA-required standards for life-critical software for commercial aviation. Other example applications include modeling and simulation, mobile devices, network configuration, and testing for a NASA spacecraft. The discussion will also include examples of measured resource and cost reduction in case studies from a variety of application domains.

    The second part explains combinatorial testing-based techniques for effective security testing of software components and large-scale software systems. It will develop quality assurance and effective re-verification for security testing of web applications and testing of operating systems. It will further address how combinatorial testing can be applied to ensure proper error-handling of network security protocols and provide the theoretical guarantees for detecting Trojans injected in cryptographic hardware. Procedures and techniques, as well as workarounds will be presented and captured as guidelines for a broader audience.

    Speaker Info:

    Dimitris Simos

    National Institute of Standards & Technology

  • Tutorial: Combinatorial Methods for Testing and Analysis of Critical Software and Security Systems

    Abstract:

    Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost, but when are these methods practical and cost-effective? This tutorial includes two sections on the basis and application of combinatorial test methods: The first section explains the background, process, and tools available for combinatorial testing, with illustrations from industry experience with the method. The focus is on practical applications, including an industrial example of testing to meet FAA-required standards for life-critical software for commercial aviation. Other example applications include modeling and simulation, mobile devices, network configuration, and testing for a NASA spacecraft. The discussion will also include examples of measured resource and cost reduction in case studies from a variety of application domains.

    The second part explains combinatorial testing-based techniques for effective security testing of software components and large-scale software systems. It will develop quality assurance and effective re-verification for security testing of web applications and testing of operating systems. It will further address how combinatorial testing can be applied to ensure proper error-handling of network security protocols and provide the theoretical guarantees for detecting Trojans injected in cryptographic hardware. Procedures and techniques, as well as workarounds will be presented and captured as guidelines for a broader audience.

    Speaker Info:

    Raghu Kacker

    National Institute of Standards & Technology

  • Tutorial: Cyber Attack Resilient Weapon Systems

    Abstract:

    This tutorial is an abbreviated version of a 36-hour short course recently provided by UVA to a class composed of engineers working at the Defense Intelligence Agency. The tutorial provides a definition for cyber attack resilience that is an extension of earlier definitions of system resilience that were not focused on cyber attacks. Based upon research results derived by the University of Virginia over an eight-year period through DoD/Army/AF/Industry funding, the tutorial will illuminate the following topics: 1) a Resilience Design Requirements methodology and the need for supporting analysis tools, 2) a System Architecture approach for achieving resilience, 3) example resilience design patterns and example prototype implementations, 4) experimental results regarding resilience-related roles and readiness of system operators, and 5) test and evaluation issues. The tutorial will be presented by UVA Munster Professor Barry Horowitz.

    Speaker Info:

    Barry Horowitz

    Professor, Systems Engineering

    University of Virginia

  • Tutorial: Developing Valid and Reliable Scales

    Abstract:

    The DoD uses psychological measurement to aid in decision-making about a variety of issues including the mental health of military personnel before and after combat, and the quality of human-systems interactions. To develop quality survey instruments (scales) and interpret the data obtained from these instruments appropriately, analysts and decision-makers must understand the factors that affect the reliability and validity of psychological measurement. This tutorial covers the basics of scale development and validation and discusses current efforts by IDA, DOT&E, ATEC, and JITC to develop validated scales for use in operational test and evaluation.
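
    Scale reliability is one of the quantities involved in this work; as a simple illustration (not drawn from the tutorial, with made-up ratings), the Python sketch below computes Cronbach's alpha, a common internal-consistency reliability estimate, for a hypothetical four-item scale.

        import numpy as np

        def cronbach_alpha(items):
            """items: 2-D array-like, rows = respondents, columns = scale items."""
            items = np.asarray(items, dtype=float)
            k = items.shape[1]
            item_vars = items.var(axis=0, ddof=1)
            total_var = items.sum(axis=1).var(ddof=1)
            return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

        # Made-up 5-point ratings from six operators on a four-item usability scale
        ratings = [[4, 5, 4, 4],
                   [3, 3, 2, 3],
                   [5, 5, 4, 5],
                   [2, 2, 3, 2],
                   [4, 4, 4, 5],
                   [3, 4, 3, 3]]
        print("Cronbach's alpha:", round(cronbach_alpha(ratings), 2))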

    Speaker Info:

    Heather Wojton

    IDA / USARMY ATEC

  • Tutorial: Developing Valid and Reliable Scales

    Abstract:

    The DoD uses psychological measurement to aid in decision-making about a variety of issues including the mental health of military personnel before and after combat, and the quality of human-systems interactions. To develop quality survey instruments (scales) and interpret the data obtained from these instruments appropriately, analysts and decision-makers must understand the factors that affect the reliability and validity of psychological measurement. This tutorial covers the basics of scale development and validation and discusses current efforts by IDA, DOT&E, ATEC, and JITC to develop validated scales for use in operational test and evaluation.

    Speaker Info:

    Shane Hall

    IDA / USARMY ATEC

  • Tutorial: Learning Python and Julia

    Abstract:

    In recent years, the programming language Python, with its supporting ecosystem, has established itself as a significant capability for supporting the activities of the typical data scientist. Recently, version 1.0 of the programming language Julia was released; from a software engineering perspective, it can be viewed as a modern alternative. This tutorial presents both Python and Julia from a user and a developer point of view. From a user’s point of view, the basic syntax of each, along with fundamental prerequisite knowledge, is presented. From a developer’s point of view, the underlying infrastructure of the programming language / interpreter / compiler is discussed.
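
    As a flavor of the user-level material (illustrative only; the data are invented), a few lines of typical data-science Python are shown below; the Julia equivalents use closely analogous syntax.

        # User-level Python for a typical data-science task (invented data)
        import numpy as np
        import pandas as pd

        df = pd.DataFrame({"config": ["A", "A", "B", "B", "B"],
                           "response": [1.2, 1.4, 2.1, 1.9, 2.3]})

        def zscore(x):
            """Standardize a numeric vector."""
            return (x - np.mean(x)) / np.std(x, ddof=1)

        summary = df.groupby("config")["response"].agg(["mean", "std", "count"])
        standardized = [round(v, 2) for v in zscore(df["response"])]  # list comprehension
        print(summary)
        print(standardized)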

    Speaker Info:

    Douglas Hodson

    Associate Professor

    Air Force Institute of Technology

  • Tutorial: Reproducible Research

    Abstract:

    Analyses are “reproducible” if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors. These changes are easier to make when reproducible research best practices have been considered from the start.
    Poor reproducibility habits result in analyses that are difficult or impossible to review, are prone to compounded mistakes, and are inefficient to re-run in the future. They can lead to duplication of effort or even loss of accumulated knowledge when a researcher leaves your organization. With larger and more complex datasets, along with more complex analysis techniques, reproducibility is more important than ever.
    Although reproducibility is critical, it is often not prioritized either due to a lack of time or an incomplete understanding of end-to-end opportunities to improve reproducibility.
    This tutorial will discuss the benefits of reproducible research and will demonstrate ways that analysts can introduce reproducible research practices during each phase of the analysis workflow: preparing for an analysis, performing the analysis, and presenting results. A motivating example will be carried throughout to demonstrate specific techniques, useful tools, and other tips and tricks where appropriate. The discussion of specific techniques and tools is non-exhaustive; we focus on things that are accessible and immediately useful for someone new to reproducible research. The methods will focus mainly on work performed using R, but the general concepts underlying reproducible research techniques can be implemented in other analysis environments, such as JMP and Excel, and are briefly discussed.
    By implementing the approaches and concepts discussed during this tutorial, analysts in defense and aerospace will be equipped to produce more credible and defensible analyses of T&E data.
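
    The tutorial's worked examples use R; purely as a language-agnostic sketch of the same habits (a single entry point, a fixed seed, raw data kept separate from generated outputs), a hypothetical Python analysis script might look like the following. Paths and data are invented for illustration.

        """Minimal reproducible-analysis script: one entry point, a fixed seed,
        raw data kept separate from generated outputs, everything re-created on rerun."""
        from pathlib import Path
        import numpy as np
        import matplotlib
        matplotlib.use("Agg")            # render without a display so reruns are identical
        import matplotlib.pyplot as plt

        RAW = Path("data/raw/miss_distances.csv")   # hypothetical raw-data location
        OUT = Path("output")
        SEED = 20240117                             # fixed seed for any resampling

        def main():
            if not RAW.exists():                    # demo only: synthesize raw data once
                RAW.parent.mkdir(parents=True, exist_ok=True)
                np.savetxt(RAW, np.random.default_rng(1).normal(4.5, 0.6, 30), delimiter=",")
            data = np.loadtxt(RAW, delimiter=",")   # raw data are never edited by hand
            rng = np.random.default_rng(SEED)
            boot = rng.choice(data, size=(2000, data.size), replace=True).mean(axis=1)

            OUT.mkdir(exist_ok=True)
            np.savetxt(OUT / "bootstrap_means.csv", boot, delimiter=",")
            plt.hist(boot, bins=40)
            plt.xlabel("bootstrapped mean")
            plt.savefig(OUT / "bootstrap_means.png", dpi=150)

        if __name__ == "__main__":
            main()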

    Speaker Info:

    Andrew Flack

    IDA

  • Tutorial: Reproducible Research

    Abstract:

    Analyses are “reproducible” if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors. These changes are easier to make when reproducible research best practices have been considered from the start.
    Poor reproducibility habits result in analyses that are difficult or impossible to review, are prone to compounded mistakes, and are inefficient to re-run in the future. They can lead to duplication of effort or even loss of accumulated knowledge when a researcher leaves your organization. With larger and more complex datasets, along with more complex analysis techniques, reproducibility is more important than ever.
    Although reproducibility is critical, it is often not prioritized either due to a lack of time or an incomplete understanding of end-to-end opportunities to improve reproducibility.
    This tutorial will discuss the benefits of reproducible research and will demonstrate ways that analysts can introduce reproducible research practices during each phase of the analysis workflow: preparing for an analysis, performing the analysis, and presenting results. A motivating example will be carried throughout to demonstrate specific techniques, useful tools, and other tips and tricks where appropriate. The discussion of specific techniques and tools is non-exhaustive; we focus on things that are accessible and immediately useful for someone new to reproducible research. The methods will focus mainly on work performed using R, but the general concepts underlying reproducible research techniques can be implemented in other analysis environments, such as JMP and Excel, and are briefly discussed.
    By implementing the approaches and concepts discussed during this tutorial, analysts in defense and aerospace will be equipped to produce more credible and defensible analyses of T&E data.

    Speaker Info:

    John Haman

    IDA

  • Tutorial: Reproducible Research

    Abstract:

    Analyses are “reproducible” if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors. These changes are easier to make when reproducible research best practices have been considered from the start.
    Poor reproducibility habits result in analyses that are difficult or impossible to review, are prone to compounded mistakes, and are inefficient to re-run in the future. They can lead to duplication of effort or even loss of accumulated knowledge when a researcher leaves your organization. With larger and more complex datasets, along with more complex analysis techniques, reproducibility is more important than ever.
    Although reproducibility is critical, it is often not prioritized either due to a lack of time or an incomplete understanding of end-to-end opportunities to improve reproducibility.
    This tutorial will discuss the benefits of reproducible research and will demonstrate ways that analysts can introduce reproducible research practices during each phase of the analysis workflow: preparing for an analysis, performing the analysis, and presenting results. A motivating example will be carried throughout to demonstrate specific techniques, useful tools, and other tips and tricks where appropriate. The discussion of specific techniques and tools is non-exhaustive; we focus on things that are accessible and immediately useful for someone new to reproducible research. The methods will focus mainly on work performed using R, but the general concepts underlying reproducible research techniques can be implemented in other analysis environments, such as JMP and Excel, and are briefly discussed.
    By implementing the approaches and concepts discussed during this tutorial, analysts in defense and aerospace will be equipped to produce more credible and defensible analyses of T&E data.

    Speaker Info:

    Kevin Kirshenbaum

    IDA

  • Tutorial: Statistics Boot Camp

    Abstract:

    In the test community, we frequently use statistics to extract meaning from data. These inferences may be drawn with respect to topics ranging from system performance to human factors. In this mini-tutorial, we will begin by discussing the use of descriptive and inferential statistics, before exploring the basics of interval estimation and hypothesis testing. We will introduce common statistical techniques and when to apply them, and conclude with a brief discussion of how to present your statistical findings graphically for maximum impact.
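
    As a small, self-contained illustration of the interval-estimation and hypothesis-testing ideas covered (using invented data not tied to any system), consider the following Python snippet.

        import numpy as np
        from scipy import stats

        # Invented miss-distance data (meters) from 12 test events
        miss = np.array([4.1, 3.8, 5.0, 4.6, 3.9, 4.4, 5.2, 4.0, 4.7, 4.3, 3.6, 4.9])

        # Descriptive statistics
        print("mean:", miss.mean(), " sample std dev:", miss.std(ddof=1))

        # Interval estimation: 95% confidence interval on the mean
        ci = stats.t.interval(0.95, df=miss.size - 1, loc=miss.mean(), scale=stats.sem(miss))
        print("95% CI on the mean:", ci)

        # Hypothesis test: is the mean miss distance less than a 5 m requirement?
        t_stat, p_value = stats.ttest_1samp(miss, popmean=5.0, alternative="less")
        print("one-sided p-value:", p_value)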

    Speaker Info:

    Kelly Avery

    IDA

  • Uncertainty Quantification

    Abstract:

    We increasingly rely on mathematical and statistical models to predict phenomena ranging from nuclear power plant design to profits made in financial markets. When assessing the feasibility of these predictions, it is critical to quantify uncertainties associated with the models, inputs to the models, and data used to calibrate the models. The synthesis of statistical and mathematical techniques, which can be used to quantify input and response uncertainties for simulation codes that can take hours to days to run, comprises the evolving field of uncertainty quantification. The use of data, to improve the predictive accuracy of models, is central to uncertainty quantification so we will begin by providing an overview of how Bayesian techniques can be used to construct distributions for model inputs. We will subsequently describe the computational issues associated with propagating these distributions through complex models to construct prediction intervals for statistical quantities of interest such as expected profits or maximal reactor temperatures. Finally, we will describe the use of sensitivity analysis to isolate critical model inputs and surrogate model construction for simulation codes that are too complex for direct statistical analysis. All topics will be motivated by examples arising in engineering, biology, and economics.
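
    As a toy illustration of propagating input uncertainty through a model to obtain a prediction interval (the model and input distributions below are invented stand-ins, not taken from the presentation):

        import numpy as np

        def peak_temperature(k, q, h):
            # Invented stand-in for an expensive simulation code: peak temperature
            # as a function of conductivity k, heat load q, and film coefficient h.
            return 300 + q / (4 * k) + q / h

        rng = np.random.default_rng(42)
        n = 100_000
        # Input distributions (in practice, constructed from data via Bayesian inference)
        k = rng.normal(2.0, 0.15, n)
        q = rng.normal(500.0, 40.0, n)
        h = rng.lognormal(mean=np.log(25.0), sigma=0.2, size=n)

        temps = peak_temperature(k, q, h)
        lo, hi = np.percentile(temps, [2.5, 97.5])
        print(f"mean = {temps.mean():.1f}, 95% prediction interval = ({lo:.1f}, {hi:.1f})")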

    Speaker Info:

    Ralph Smith

    North Carolina State University

  • Uncertainty Quantification: Combining Large Scale Computational Models with Physical Data for Inference

    Abstract:

    Combining physical measurements with computational models is key to many investigations involving validation and uncertainty quantification (UQ). This talk surveys some of the many approaches taken for validation and UQ with large-scale computational models. Experience with such applications suggests classifications of different types of problems with common features (e.g. data size, amount of empiricism in the model, computational demands, availability of data, extent of extrapolation required, etc.). More recently, social and socio-technical systems are being considered for similar analyses, bringing new challenges to this area. This talk will discuss approaches for such problems and will highlight what might be new research directions for application and methodological development in UQ.

    Speaker Info:

    Dave Higdon

  • Using Bayesian Neural Networks for Uncertainty Quantification of Hyperspectral Image Target Detection

    Abstract:

    Target detection in hyperspectral images (HSI) has broad value in defense applications, and neural networks have recently begun to be applied for this problem. A common criticism of neural networks is they give a point estimate with no uncertainty quantification (UQ). In defense applications, UQ is imperative because the cost of a false positive or negative is high. Users desire high confidence in either “target” or “not target” predictions, and if high confidence cannot be achieved, more inspection is warranted. One possible solution is Bayesian neural networks (BNN). Compared to traditional neural networks which are constructed by choosing a loss function, BNN take a probabilistic approach and place a likelihood function on the data and prior distributions for all parameters (weights and biases), which in turn implies a loss function. Training results in posterior predictive distributions, from which prediction intervals can be computed, rather than only point estimates. Heatmaps show where and how much uncertainty there is at any location and give insight into the physical area being imaged as well as possible improvements to the model.
    Using PyTorch and Pyro software, we test BNN on a simulated HSI scene produced using the Rochester Institute of Technology (RIT) Digital Imaging and Remote Sensing Image Generation (DIRSIG) model. The scene geometry used is also developed by RIT and is a detailed representation of a suburban neighborhood near Rochester, NY, named “MegaScene.” Target panels were inserted for this effort, using paint reflectance and bi-directional reflectance distribution function (BRDF) data acquired from the Nonconventional Exploitation Factors Database System (NEFDS). The target panels range in size from large to subpixel, with some targets only partially visible. Multiple renderings of this scene are created under different times of day and with different atmospheric conditions to assess model generalization. We explore the uncertainty heatmap for different times and environments on MegaScene as well as individual target predictive distributions to gain insight into the power of BNN.
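
    For readers unfamiliar with the modeling pattern, the sketch below shows a generic Bayesian neural-network classifier in Pyro with variational inference and posterior predictive sampling; the network size, data, and training settings are invented placeholders and are not the authors' model.

        import torch
        import pyro
        import pyro.distributions as dist
        from pyro.nn import PyroModule, PyroSample
        from pyro.optim import Adam
        from pyro.infer import SVI, Trace_ELBO, Predictive
        from pyro.infer.autoguide import AutoDiagonalNormal

        class BayesianDetector(PyroModule):
            """Tiny Bayesian classifier: spectral bands in, P(target) out."""
            def __init__(self, n_bands, n_hidden=16):
                super().__init__()
                self.fc1 = PyroModule[torch.nn.Linear](n_bands, n_hidden)
                self.fc1.weight = PyroSample(dist.Normal(0., 1.).expand([n_hidden, n_bands]).to_event(2))
                self.fc1.bias = PyroSample(dist.Normal(0., 1.).expand([n_hidden]).to_event(1))
                self.fc2 = PyroModule[torch.nn.Linear](n_hidden, 1)
                self.fc2.weight = PyroSample(dist.Normal(0., 1.).expand([1, n_hidden]).to_event(2))
                self.fc2.bias = PyroSample(dist.Normal(0., 1.).expand([1]).to_event(1))

            def forward(self, x, y=None):
                logits = self.fc2(torch.relu(self.fc1(x))).squeeze(-1)
                with pyro.plate("pixels", x.shape[0]):
                    pyro.sample("obs", dist.Bernoulli(logits=logits), obs=y)
                return logits

        # Hypothetical training pixels: 64 spectral bands, binary target labels
        x_train = torch.randn(500, 64)
        y_train = (torch.rand(500) < 0.1).float()

        model = BayesianDetector(n_bands=64)
        guide = AutoDiagonalNormal(model)
        svi = SVI(model, guide, Adam({"lr": 1e-3}), loss=Trace_ELBO())
        for step in range(2000):
            svi.step(x_train, y_train)

        # Posterior predictive draws give a per-pixel distribution, not a point estimate;
        # their spread is what gets rendered as an uncertainty heatmap.
        x_test = torch.randn(100, 64)
        samples = Predictive(model, guide=guide, num_samples=300)(x_test)["obs"]
        p_target = samples.mean(0)    # predictive mean per pixel
        spread = samples.std(0)       # predictive uncertainty per pixel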

    Speaker Info:

    Daniel Ries

  • Validation and Uncertainty Quantification of Complex Models

    Abstract:

    Advances in high performance computing have enabled detailed simulations of real-world physical processes, and these simulations produce large datasets. Even as detailed as they are, these simulations are only approximations of imperfect mathematical models, and furthermore, their outputs depend on inputs that are themselves uncertain. The main goal of a validation and uncertainty quantification methodology is to determine the uncertainty, that is, the relationship between the true value of a quantity of interest and its prediction by the simulation. The value of the computational results is limited unless the uncertainty can be quantified or bounded. Bayesian calibration is a common method for estimating model parameters and quantifying their associated uncertainties; however, calibration becomes more complicated when the data arise from different types of experiments. Using an example from material science, we will employ two types of data and demonstrate how one can obtain a set of material strength models that agree with both data sources.

    Speaker Info:

    Kassie Fronczyk

  • Valuing Human Systems Integration: A Test and Data Perspective

    Abstract:

    Technology advances are accelerating at a rapid pace, with the potential to enable greater capability and power to the Warfighter. However, if human capabilities and limitations are not central to concepts, requirements, design, and development then new/upgraded weapons and systems will be difficult to train, operate, and maintain, may not result in the skills, job, grade, and manpower mix as projected, and may result in serious human error, injury or Soldier loss.
    The Army Human Systems Integration (HSI) program seeks to overcome these challenges by ensuring appropriate consideration and integration of seven technical domains: Human Factors Engineering (e.g., usability), Manpower, Personnel, Training, Safety and Occupational Health, Habitability, and Force Protection and Survivability. The tradeoffs, constraints, and limitations occurring among and between these technical domains allow HSI to execute a coordinated, systematic process for putting the warfighter at the center of the design process – equipping the warfighter rather than manning equipment. To that end, the Army HSI Headquarters, currently a directorate within the Army Headquarters Deputy Chief of Staff (DCS), G-1, develops strategies and ensures human systems factors are early key drivers in concepts, strategy, and requirements, and are fully integrated throughout system design, development, testing and evaluation, and sustainment.
    The need to consider HSI factors early in the development cycle is critical. Too often, man-machine interface issues are not addressed until late in the development cycle (i.e. production and deployment phase) after the configuration of a particular weapon or system has been set. What results is a degraded combat capability, suboptimal system and system-of-systems integration, increased training and sustainment requirements, or fielded systems not in use.
    Acquisition test data are also good sources from which to glean HSI return on investment (ROI) metrics. Defense acquisition reports, such as test and evaluation operational assessments, identify HSI factors as root causes when Army programs experience increased cost, schedule overruns, or low performance. This is identifiable by the number and type of systems that require follow-on test and evaluation (FOT&E), over-reliance on field service representatives (FSRs), costly and time-consuming engineering change requests (ECRs), or failures in achieving reliability, availability, and maintainability (RAM) key performance parameters (KPPs) and key system attributes (KSAs).
    In this presentation, we will present these data and submit several return on investment (ROI) metrics, closely aligned to the defense acquisition process, to emphasize and illustrate the value of HSI. Optimizing Warfighter-System performance and reducing human errors, minimizing risk of Soldier loss or injury, and reducing personnel and materiel life cycle costs produces data that are inextricably linked to early, iterative, and measurable HSI processes within the defense acquisition system.

    Speaker Info:

    Jeffrey Thomas

  • Waste Not, Want Not: A Methodological Illustration of Quantitative Text Analysis

    Abstract:

    "The wise use of one's resources will keep one from poverty." This is the definition of the proverbial saying "waste not, want not" according to www.dictionary.com. Indeed, one of the most common resources analysts encounter is text in free-form. This text might come from survey comments, feedback, websites, transcriptions of interviews, videos, etcetera. Notably, researchers have used wisely the information conveyed in text for many years. However, in many instances, the qualitative methods employed require numerous hours of reading, training, coding, and validating, among others.
    As technology continues to evolve, simple access to text data is blooming. For example, analysts conducting online studies can have thousands of text entries from participants' comments. Even without recent advances in technology, analysts have had access to text in books, letters, and other archival data for centuries. One important challenge, however, is figuring out how to make sense of text data without investing the large amount of time, resources, and effort involved in qualitative methodology or "old-school" quantitative approaches (such as reading a collection of 200 books and counting the occurrence of important terms in the text). This challenge has been solved in the information retrieval field (a branch of computer science) with the implementation of a technique called latent semantic analysis (LSA; Manning, Raghavan, & Schütze, 2008) and a closely related technique called topic analysis (TA; SAS Institute Inc., 2018). Undoubtedly, other quantitative methods for text analysis, such as latent Dirichlet allocation (Blei, Ng, & Jordan, 2003), are also apt for the task of unveiling knowledge from text data, but we restrict the discussion in this presentation to LSA and TA because these exclusively deal with the underlying structure of the text rather than identifying clusters.
    In this presentation, we aim to make quantitative text analysis (specifically LSA and TA) accessible to researchers and analysts from a variety of disciplines. We do this by leveraging understanding of a popular multivariate technique: principal components analysis (PCA). We start by describing LSA and TA by drawing comparisons and equivalencies to PCA. We make these comparisons in an intuitive, user-friendly manner and then through a technical description of mathematical statements, which rely on the singular value decomposition of a document-term matrix. Moreover, we explain the implementation of LSA and TA using statistical software to enable simple application of these techniques. Finally, we show a practical application of LSA and TA with empirical data of aircraft incidents.
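
    As a minimal illustration of LSA's core step (a truncated singular value decomposition of a document-term matrix), the Python sketch below uses scikit-learn on a handful of invented incident narratives; the presentation itself demonstrates these techniques in statistical software with empirical data.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD

        # Invented free-text incident narratives
        docs = ["bird strike on approach damaged the left engine cowling",
                "engine flameout during climb, crew executed restart checklist",
                "hard landing after wind shear on short final approach",
                "hydraulic leak found during post-flight inspection of landing gear",
                "crew reported wind shear warning and executed go-around"]

        tfidf = TfidfVectorizer(stop_words="english")
        X = tfidf.fit_transform(docs)                 # document-term matrix
        lsa = TruncatedSVD(n_components=2, random_state=0)
        doc_scores = lsa.fit_transform(X)             # documents in the latent semantic space

        terms = tfidf.get_feature_names_out()
        for i, comp in enumerate(lsa.components_):
            top = comp.argsort()[-5:][::-1]
            print(f"dimension {i}:", [terms[j] for j in top])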

    Speaker Info:

    Laura Castro-Schilo

  • Wednesday Keynote Speaker I

    Speaker Info:

    Peter Parker

    Team Lead, Advanced Measurement Systems

    NASA

    Dr. Parker is Team Lead for Advanced Measurement Systems at the National Aeronautics and Space Administration’s Langley Research Center in Hampton, Virginia. He serves as an Agency-wide statistical expert across all of NASA’s mission directorates of Exploration, Aeronautics, and Science to infuse statistical thinking, engineering, and methods including statistical design of experiments, response surface methodology, and measurement system characterization. His expertise is in collaboratively integrating research objectives, measurement sciences, test design, and statistical methods to produce actionable knowledge for aerospace research and development.

    He holds a B.S. in Mechanical Engineering, an M.S. in Applied Physics and Computer Science, and an M.S. and Ph.D. in Statistics from Virginia Tech. Dr. Parker is a senior member of the American Institute for Aeronautics and Astronautics, American Society for Quality, and the American Statistical Association. Dr. Parker currently chairs the American Society for Quality’s Publication Management Board and previously served as Editor-in-Chief of the journal Quality Engineering.

  • Wednesday Keynote Speaker II

    Speaker Info:

    Laura Freeman

    Associate Director, ISL

    Hume Center for National Security and Technology, Virginia Tech

    Dr. Laura Freeman is an Assistant Director of the Operational Evaluation Division at the Institute for Defense Analyses. In that position, she established and developed an interdisciplinary analytical team of statisticians, psychologists, and engineers to advance scientific approaches to DoD test and evaluation. Her focus areas include test design, statistical data analysis, modeling and simulation validation, human-system interactions, reliability analysis, software testing, and cybersecurity testing. Dr. Freeman currently leads a research task for the Chief Management Officer (CMO) aiming to reform DoD testing. She guides an interdisciplinary team in recommending changes and developing best practices. Reform initiatives include incorporating mission context early in the acquisition lifecycle, integrating all test activities, and improving data management processes.

    During 2018, Dr. Freeman served as the acting Senior Technical Advisor for the Director, Operational Test and Evaluation (DOT&E). As the Senior Technical Advisor, Dr. Freeman provided leadership, advice, and counsel to all personnel on technical aspects of testing military systems. She served as a liaison with Service technical advisors, General Officers, and members of the Senior Executive Service on key technical issues. She reviewed test strategies, plans, and reports from all systems under DOT&E oversight.

    During her tenure at IDA, Dr. Freeman has designed tests and conducted statistical analyses for programs of national importance including weapon systems, missile defense, undersea warfare systems, command and control systems, and most recently the F-35. She prioritizes supporting the analytical community in the DoD workforce. She developed and taught numerous courses on advanced test design and statistical analysis, including two new Defense Acquisition University (DAU) Courses on statistical methods. She is a founding organizer of DATAWorks (Defense and Aerospace Test and Analysis Workshop), a workshop designed to share new methods, provide training, and share best practices between NASA, the DoD, and National Labs.

    Dr. Freeman is the recipient of the 2017 IDA Goodpaster Award for Excellence in Research and the 2013 International Test and Evaluation Association (ITEA) Junior Achiever Award. She is a member of the American Statistical Association, the American Society for Quality, the International Statistical Engineering Association, and ITEA. She serves on the editorial boards for Quality Engineering, Quality Reliability Engineering International, and the ITEA Journal. Her areas of statistical expertise include designed experiments, reliability analysis, and industrial statistics.

    Prior to joining IDA in 2010, Dr. Freeman worked at SAIC providing statistical guidance to the Director, Operational Test and Evaluation. She also consulted with NASA on various projects. In 2008, Dr. Freeman established the Laboratory for Interdisciplinary Statistical Analyses at Virginia Tech and served as its inaugural Director.

    Dr. Freeman has a B.S. in Aerospace Engineering, an M.S. in Statistics, and a Ph.D. in Statistics, all from Virginia Tech. Her Ph.D. research was on design and analysis of experiments for reliability data.

  • Wednesday Keynote Speaker III

    Speaker Info:

    Timothy Dare

    Deputy Director, Developmental Test, Evaluation, and Prototyping

    SES OUSD(R&E)

    Mr. Timothy S. Dare is the Deputy Director for Developmental Test, Evaluation and Prototyping (DD(DTEP)). As the DD(DTEP), he serves as the principal advisor on developmental test and evaluation (DT&E) to the Secretary of Defense, Under Secretary of Defense for Research and Engineering, and Director of Defense Research and Engineering for Advanced Capabilities. Mr. Dare is responsible for DT&E policy and guidance in support of the acquisition of major Department of Defense (DoD) systems, and providing advocacy, oversight, and guidance to the DT&E acquisition workforce. He informs policy and advances leading edge technologies through the development of advanced technology concepts, and developmental and operational prototypes. By working closely with interagency partners, academia, industry and governmental labs, he identifies, develops and demonstrates multi-domain technologies and concepts that address high-priority DoD, multi-Service, and Combatant Command warfighting needs.

    Prior to his appointment in December 2018, Mr. Dare was a Senior Program Manager for program management and capture at Lockheed Martin (LM) Space. In this role he was responsible for the capture and execution phases of multiple Intercontinental Ballistic Missile programs for Minuteman III, including a new airborne Nuclear Command and Control (NC2) development program. His major responsibilities included establishing program working environments at multiple locations, policies, processes, staffing, budget and technical baselines.

    Mr. Dare has extensive T&E and prototyping experience. As the Engineering Program Manager for the $1.8B Integrated Space C2 programs for NORAD/NORTHCOM systems at Cheyenne Mountain, Mr. Dare was the Integration and Test lead focusing on planning, executing, and evaluating the integration and test phases (developmental and operational T&E) for Missile Warning and Space Situational Awareness (SSA) systems. Mr. Dare has also been the Engineering Lead/Integration and Test lead on other systems such as the Hubble Space Telescope; international border control systems; artificial intelligence (AI) development systems (knowledge-based reasoning); Service-based networking systems for the UK Ministry of Defence; Army C2 systems; Space Fence C2; and foreign intelligence, surveillance, and reconnaissance systems.

    As part of the Department’s strategic defense portfolio, Mr. Dare led the development of advanced prototypes in SSA C2 (Space Fence), Information Assurance (Single Sign-on), AI systems, and was the sponsoring program manager for NC2 capability development.

    Mr. Dare is a graduate of Purdue University and is a member of both the Association for Computing Machinery and Program Management Institute. He has been recognized by the U.S. Air Force for his contributions supporting NORAD/NORTHCOM’s strategic defense missions, and the National Aeronautics and Space Administration for his contributions to the original Hubble Space Telescope program. Mr. Dare holds a U.S. Patent for Single Sign-on architectures.

  • Wednesday Lunchtime Keynote Speaker

    Speaker Info:

    Jared Freeman

    Chief Scientist of Aptima and Chair of the Human Systems Division

    National Defense Industry Association

    Jared Freeman, Ph.D., is Chief Scientist of Aptima and Chair of the Human Systems Division of the National Defense Industry Association. His research and publications address measurement, assessment, and enhancement of human learning, cognition, and performance in technologically complex military environments.

  • When Validation Fails: Analysis of Data from an Imperfect Test Chamber

    Abstract:

    For chemical/biological testing, test chambers are sometimes designed with a vapor or aerosol homogeneity requirement. For example, a test community may require that the difference in concentration between any two test locations in a chamber be no greater than 20 percent. To validate the chamber, testers must demonstrate that such a requirement is met with a specified amount of certainty, such as 80 percent. With a validated chamber, multiple systems can be simultaneously tested at different test locations with the assurance that each system is exposed to nearly the same concentration.

    In some cases, however, homogeneity requirements are difficult to achieve. This presentation demonstrates a valid Bayesian method for testing probability of detection as a function of concentration in a chamber that fails to meet a homogeneity requirement. The demonstrated method of analysis is based on recent experience with an actual test chamber. Multiple systems are tested simultaneously at different locations in the chamber. Because systems tested in the chamber are exposed to different concentrations depending on these locations, the differences must be quantified to the greatest extent possible. To this end, data from the failed validation efforts are used to specify informative prior distributions for probability-of-detection modeling. Because these priors quantify and incorporate uncertainty in model parameters, they ensure that the final probability-of-detection model constitutes a valid comparison of the performance of the different systems.
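
    As a simplified sketch of the general approach (not the actual analysis), the snippet below fits a Bayesian logistic probability-of-detection curve by grid approximation, with informative normal priors standing in for information carried over from the chamber-characterization data; all numbers are invented.

        import numpy as np
        from scipy.special import expit
        from scipy.stats import norm, binom

        # Invented detection data: concentration and detect counts out of 10 trials
        conc = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
        trials = np.array([10, 10, 10, 10, 10])
        detects = np.array([1, 3, 6, 9, 10])
        x = np.log(conc)

        # Grid over the logistic intercept (a) and slope (b)
        a_grid = np.linspace(-6, 6, 241)
        b_grid = np.linspace(0, 6, 241)
        A, B = np.meshgrid(a_grid, b_grid, indexing="ij")

        # Informative priors (hypothetically encoded from prior chamber data)
        log_prior = norm.logpdf(A, loc=-1.0, scale=2.0) + norm.logpdf(B, loc=2.0, scale=1.0)
        log_lik = np.zeros_like(A)
        for xi, ni, yi in zip(x, trials, detects):
            p = expit(A + B * xi)
            log_lik += binom.logpmf(yi, ni, p)

        log_post = log_prior + log_lik
        post = np.exp(log_post - log_post.max())
        post /= post.sum()

        # Posterior mean probability of detection at a new concentration
        p_new = expit(A + B * np.log(3.0))
        print("P(detect at 3.0) ~", round(float(np.sum(post * p_new)), 3))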

    Speaker Info:

    Kendal Ferguson

  • Your Mean May Not Mean What You Mean It to Mean

    Abstract:

    The average and standard deviation of, say, strength or dimensional test data are basic engineering math, simple to calculate. What those resulting values actually mean, however, may not be simple, and can be surprisingly different from what a researcher wants to calculate and communicate. Mistakes can lead to overlarge estimates of spread, structures that are over- or under-designed and other challenges to understanding or communicating what your data is really telling you.

    This talk will discuss some common errors and missed opportunities seen in engineering and scientific analyses along with mitigations that can be applied through smart and efficient test planning and analysis. It will cover when - and when not - to report a simple mean of a dataset based on the way the data was taken; why ignoring this often either hides or overstates risk; and a standard method for planning tests and analyses to avoid this problem. And it will cover what investigators can correctly (or incorrectly) say about means and standard deviations of data, including how and why to describe uncertainty and assumptions depending on what a value will be used for.

    The presentation is geared toward the engineer, scientist or project manager charged with test planning, data analysis or understanding findings from tests and other analyses. A basic understanding of quantitative data analysis is recommended for attendees; more-experienced participants will grasp correspondingly more nuance from the pitch. Some knowledge of statistics is helpful, but not required. Participants will be challenged to think about an average as not just “the average”, but a valuable number that can and must relate to the engineering problem to be solved, and must be firmly based in the data. Attendees will leave the talk with a more sophisticated understanding of this basic, ubiquitous but surprisingly nuanced statistic and greater appreciation of its power as an engineering tool.
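
    As a tiny invented example of the kind of pitfall discussed, the snippet below shows how a pooled ("grand") mean and a mean of per-lot means answer different questions when group sizes are unequal.

        import numpy as np

        # Invented strength data: three material lots with unequal numbers of coupons
        lots = {"lot_A": [102, 104, 103],                 # 3 coupons
                "lot_B": [95, 96],                        # 2 coupons
                "lot_C": [110, 111, 109, 112, 110, 111]}  # 6 coupons

        all_values = np.concatenate([np.array(v, float) for v in lots.values()])
        grand_mean = all_values.mean()                    # weights every coupon equally
        mean_of_lot_means = np.mean([np.mean(v) for v in lots.values()])  # weights each lot equally

        print(grand_mean, mean_of_lot_means)  # ~105.7 vs 103.0 -- not the same quantity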

    Speaker Info:

    Ken Johnson

  • A Multi-method, Triangulation Approach to Operational Testing

    Abstract:

    Humans are not produced in quality-controlled assembly lines, and we typically are much more variable than the mechanical systems we employ. This mismatch means that when characterizing the effectiveness of a system, the system must be considered in the context of its users. Accurate measurement is critical to this endeavor, yet while human variability is large, effort to reduce measurement error of those humans is relatively small. The following talk discusses the importance of using multiple measurement methods—triangulation—to reduce error and increase confidence when characterizing the quality of HSI. A case study from an operational test of an attack helicopter demonstrates how triangulation enables more actionable recommendations.

    Speaker Info:

    Daniel Porter

    Research Staff Member

    IDA

  • Application of Adaptive Sampling to Advance the Metamodeling and Uncertainty Quantification Process

    Abstract:

    Over the years the aerospace industry has continued to implement design of experiments and metamodeling (e.g., response surface methodology) in order to shift the knowledge curve forward in the systems design process. While the adoption of these methods is still incomplete across aerospace sub-disciplines, they comprise the state-of-the-art during systems design and for design evaluation using modeling and simulation or ground testing. In the context of modeling and simulation, while national infrastructure in high performance computing becomes higher performance, so do the demands placed on those resources in terms of simulation fidelity and number of researchers. Furthermore, with recent emphasis placed on the uncertainty quantification of aerospace system design performance, the number of simulation cases needed to properly characterize a system’s uncertainty across the entire design space increases by orders of magnitude, further stressing available resources. This leads to advanced development groups either sticking to ad hoc estimates of uncertainty (e.g., subject matter expert estimates based on experience) or neglecting uncertainty quantification altogether. Advancing the state-of-the-art of aerospace systems design and evaluation requires a practical adaptive sampling scheme that responds to the characteristics of the underlying design or uncertainty space. For example, when refining a system metamodel gradually, points should be chosen for design variable combinations that are located in high curvature regions or where metamodel uncertainty is the greatest. The latter method can be implemented by defining a functional form of the metamodel variance and using it to define the next best point to sample. For schemes that require n points to be sampled simultaneously, considerations can be made to ensure proper sample dispersion. The implementation of adaptive sampling schemes to the design and evaluation process will enable similar fidelity with fewer samples of the design space compared to fixed or ad hoc sampling methods (i.e., shorter time or human resources required). Alternatively, the uncertainty of the design space can be reduced to a greater extent for the same number of samples or with fewer samples using higher fidelity simulations. The purpose of this presentation will be to examine the benefits of adaptive sampling as applied to challenging design problems. Emphasis will be placed on methods that are accessible to engineering
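
    As a minimal sketch of the variance-based adaptive sampling idea described above (with an invented one-dimensional stand-in for the expensive simulation), each iteration below adds the candidate point where the metamodel's predictive uncertainty is largest.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, ConstantKernel

        def expensive_sim(x):                        # stand-in for a long-running simulation
            return np.sin(3 * x) + 0.5 * x

        rng = np.random.default_rng(1)
        X = rng.uniform(0, 3, size=(4, 1))           # small initial design
        y = expensive_sim(X).ravel()
        candidates = np.linspace(0, 3, 301).reshape(-1, 1)

        for _ in range(10):                          # adaptive refinement loop
            gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
            gp.fit(X, y)
            _, std = gp.predict(candidates, return_std=True)
            x_next = candidates[np.argmax(std)]      # largest predictive uncertainty
            X = np.vstack([X, x_next])
            y = np.append(y, expensive_sim(x_next)[0])

        print(len(y), "samples placed where the metamodel was least certain")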

    Speaker Info:

    Erik Axdahl

    Hypersonic Airbreathing Propulsion Branch NASA Langley Research Center

  • Application of Adaptive Sampling to Advance the Metamodeling and Uncertainty Quantification Process

    Abstract:

    Over the years the aerospace industry has continued to implement design of experiments and metamodeling (e.g., response surface methodology) in order to shift the knowledge curve forward in the systems design process. While the adoption of these methods is still incomplete across aerospace sub-disciplines, they comprise the state-of-the-art during systems design and for design evaluation using modeling and simulation or ground testing. In the context of modeling and simulation, while national infrastructure in high performance computing becomes higher performance, so do the demands placed on those resources in terms of simulation fidelity and number of researchers. Furthermore, with recent emphasis placed on the uncertainty quantification of aerospace system design performance, the number of simulation cases needed to properly characterize a system’s uncertainty across the entire design space increases by orders of magnitude, further stressing available resources. This leads to advanced development groups either sticking to ad hoc estimates of uncertainty (e.g., subject matter expert estimates based on experience) or neglecting uncertainty quantification altogether. Advancing the state-of-the-art of aerospace systems design and evaluation requires a practical adaptive sampling scheme that responds to the characteristics of the underlying design or uncertainty space. For example, when refining a system metamodel gradually, points should be chosen for design variable combinations that are located in high curvature regions or where metamodel uncertainty is the greatest. The latter method can be implemented by defining a functional form of the metamodel variance and using it to define the next best point to sample. For schemes that require n points to be sampled simultaneously, considerations can be made to ensure proper sample dispersion. The implementation of adaptive sampling schemes to the design and evaluation process will enable similar fidelity with fewer samples of the design space compared to fixed or ad hoc sampling methods (i.e., shorter time or human resources required). Alternatively, the uncertainty of the design space can be reduced to a greater extent for the same number of samples or with fewer samples using higher fidelity simulations. The purpose of this presentation will be to examine the benefits of adaptive sampling as applied to challenging design problems. Emphasis will be placed on methods that are accessible to engineering

    Speaker Info:

    Robert Baurle

    Hypersonic Airbreathing Propulsion Branch NASA Langley Research Center

  • Application of Design of Experiments to a Calibration of the National Transonic Facility

    Abstract:

    Recent work at the National Transonic Facility (NTF) at the NASA Langley Research Center has shown that a substantial reduction in freestream pressure fluctuations can be achieved by positioning the moveable model support walls and plenum re-entry flaps to choke the flow just downstream of the test section. This choked condition reduces the upstream propagation of disturbances from the diffuser into the test section, resulting in improved Mach number control and reduced freestream variability. The choked conditions also affect the Mach number gradient and distribution in the test section, so a calibration experiment was undertaken to quantify the effects of the model support wall and re-entry flap movements on the facility freestream flow using a centerline static pipe. A design of experiments (DOE) approach was used to develop restricted-randomization experiments to determine the effects of total pressure, reference Mach number, model support wall angle, re-entry flap gap height, and test section longitudinal location on the centerline static pressure and local Mach number distributions for a reference Mach number range from 0.7 to 0.9. Tests were conducted using air as the test medium at a total temperature of 120 °F as well as for gaseous nitrogen at cryogenic total temperatures of -50, -150, and -250 °F. The resulting data were used to construct quadratic polynomial regression models for these factors using a Restricted Maximum Likelihood (REML) estimator approach. Independent validation data were acquired at off-design conditions to check the accuracy of the regression models. Additional experiments were designed and executed over the full Mach number range of the facility (0.2 ≤ Mref ≤ 1.1) at each of the four total temperature conditions, but with the model support walls and re-entry flaps set to their nominal positions, in order to provide calibration regression models for operational experiments where a choked condition downstream of the test section is either not feasible or not required. This presentation focuses on the design, execution, analysis, and results for the two experiments performed using air at a total temperature of 120 °F. Comparisons are made between the regression model output and validation data, as well as the legacy NTF calibration results, and future work is discussed.
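
    As a generic, hypothetical sketch of the type of analysis described (not the NTF calibration itself), the snippet below fits a quadratic response-surface model with a whole-plot random effect by REML, using invented split-plot style data.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        # Invented split-plot style data: whole plots set a hard-to-change wall setting;
        # Mach and flap gap settings vary within each whole plot.
        n_wp, n_sub = 8, 6
        df = pd.DataFrame({
            "whole_plot": np.repeat(np.arange(n_wp), n_sub),
            "wall": np.repeat(rng.uniform(-1, 1, n_wp), n_sub),
            "mach": rng.uniform(-1, 1, n_wp * n_sub),
            "gap": rng.uniform(-1, 1, n_wp * n_sub),
        })
        wp_effect = np.repeat(rng.normal(0, 0.05, n_wp), n_sub)
        df["local_mach_dev"] = (0.02 * df.wall + 0.05 * df.mach - 0.03 * df.mach**2
                                + 0.01 * df.wall * df.gap + wp_effect
                                + rng.normal(0, 0.01, len(df)))

        # Quadratic response-surface model with a whole-plot random effect, fit by REML
        model = smf.mixedlm(
            "local_mach_dev ~ wall + mach + gap + I(mach**2) + I(wall**2) + wall:gap",
            df, groups=df["whole_plot"])
        fit = model.fit(reml=True)
        print(fit.summary())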

    Speaker Info:

    Matt Rhode

    NASA

  • Application of Design of Experiments to a Calibration of the National Transonic Facility

    Abstract:

    Recent work at the National Transonic Facility (NTF) at the NASA Langley Research Center has shown that a substantial reduction in freestream pressure fluctuations can be achieved by positioning the moveable model support walls and plenum re-entry flaps to choke the flow just downstream of the test section. This choked condition reduces the upstream propagation of disturbances from the diffuser into the test section, resulting in improved Mach number control and reduced freestream variability. The choked conditions also affect the Mach number gradient and distribution in the test section, so a calibration experiment was undertaken to quantify the effects of the model support wall and re-entry flap movements on the facility freestream flow using a centerline static pipe. A design of experiments (DOE) approach was used to develop restricted-randomization experiments to determine the effects of total pressure, reference Mach number, model support wall angle, re-entry flap gap height, and test section longitudinal location on the centerline static pressure and local Mach number distributions for a reference Mach number range from 0.7 to 0.9. Tests were conducted using air as the test medium at a total temperature of 120 °F as well as for gaseous nitrogen at cryogenic total temperatures of -50, -150, and -250 °F. The resulting data were used to construct quadratic polynomial regression models for these factors using a Restricted Maximum Likelihood (REML) estimator approach. Independent validation data were acquired at off-design conditions to check the accuracy of the regression models. Additional experiments were designed and executed over the full Mach number range of the facility (0.2 ≤ Mref ≤ 1.1) at each of the four total temperature conditions, but with the model support walls and re-entry flaps set to their nominal positions, in order to provide calibration regression models for operational experiments where a choked condition downstream of the test section is either not feasible or not required. This presentation focuses on the design, execution, analysis, and results for the two experiments performed using air at a total temperature of 120 °F. Comparisons are made between the regression model output and validation data, as well as the legacy NTF calibration results, and future work is discussed.

    Speaker Info:

    Matt Bailey

    Jacobs Technology Inc

  • Application of Statistical Methods and Designed Experiments to Development of Technical Requirements

    Abstract:

    The Army relies heavily on the voice of the customer to develop and refine technical requirements for developmental systems, but too often the approach is reactive. The ARDEC (Armament Research, Development & Engineering Center) Statistics Group at Picatinny Arsenal, NJ, working closely with subject matter experts, has been implementing market research and web development techniques and Design of Experiments (DOE) best practices to design and analyze surveys that provide insight into the customer’s perception of utility for various developmental commodities. Quality organizations tend to focus on ensuring products meet technical requirements, with far less of an emphasis placed on whether or not the specification actually captures customer needs. The employment of techniques and best practices spanning the fields of Market Research, Design of Experiments, and Web Development (choice design, conjoint analysis, contingency analysis, psychometric response scales, stratified random sampling) converges towards a more proactive and risk-mitigating approach to the development of technical and training requirements, and encourages strategic decision-making when faced with the inarticulate nature of human preference. Establishing a hierarchy of customer preference for objective and threshold values of key performance parameters enriches the development process of emerging systems by making the process simultaneously more effective and more efficient.

    Speaker Info:

    Eli Golden

    U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT & ENGINEERING CENTER

  • Asparagus is the most articulate vegetable ever

    Abstract:

    During the summer of 2001, Microsoft launched Windows XP, which was lauded by many users as the most reliable and usable operating system at the time. Miami Herald columnist Dave Barry responded to this praise by stating that “this is like saying asparagus is the most articulate vegetable ever.” Whether you agree or disagree with Dave Barry, these users’ reactions are relative (to other operating systems and to other past and future versions). This is due to an array of technological factors that have facilitated human-system improvements. Automation is often cited as improving human-system performance across many domains. It is true that when the human and automation are aligned, performance improves. But, what about the times that this is not the case? This presentation will describe the myths and facts about human-system performance and increasing levels of automation through examples of human-system R&D conducted on a satellite ground system. Factors that affect human-system performance and a method to characterize mission performance as it relates to increasing levels of automation will also be discussed.

    Speaker Info:

    Kerstan Cole

  • Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study

    Abstract:

    In today’s manufacturing, inspection, and testing world, understanding the capability of the measurement system being used via the use of Measurement Systems Analyses (MSA) is a crucial activity that provides the foundation for the use of Design of Experiments (DOE) and Statistical Process Control (SPC). Although undesirable, there are times when human observation is the only measurement system available. In these types of situations, traditional MSA tools are often ineffectual due to the nature of the data collected. When there are no other alternatives, we need some method for assessing the adequacy and effectiveness of the human observations. When multiple observers are involved, Attribute Agreement Analyses are a powerful tool for quantifying the Agreement and Effectiveness of a visual inspection system. This talk will outline best practices and rules of thumb for Attribute Agreement Analyses, and will highlight a recent Army case study to further demonstrate the tool’s use and potential.
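
    As a small invented example of the agreement and effectiveness measures involved, the Python snippet below compares two hypothetical inspectors to each other (chance-corrected kappa) and to a reference standard.

        import numpy as np
        from sklearn.metrics import cohen_kappa_score

        # Invented visual-inspection calls on 12 parts ("pass"/"fail")
        reference   = ["pass","fail","pass","pass","fail","pass","fail","pass","pass","fail","pass","pass"]
        inspector_1 = ["pass","fail","pass","fail","fail","pass","fail","pass","pass","fail","pass","pass"]
        inspector_2 = ["pass","fail","pass","pass","fail","pass","pass","pass","pass","fail","fail","pass"]

        # Agreement between appraisers (chance-corrected)
        print("kappa(1 vs 2):", round(cohen_kappa_score(inspector_1, inspector_2), 2))

        # Effectiveness: agreement of each appraiser with the reference standard
        for name, calls in [("inspector_1", inspector_1), ("inspector_2", inspector_2)]:
            eff = np.mean([c == r for c, r in zip(calls, reference)])
            print(name, "effectiveness:", round(float(eff), 2))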

    Speaker Info:

    Christopher Drake

    Lead Statistician

    QE&SA Statistical Methods & Analysis Group

  • B-52 Radar Modernization Test Design Considerations

    Abstract:

    Inherent system processes, restrictions on collection, or cost may impact the practical execution of an operational test. This study presents the use of blocking and split-plot designs when complete randomization is not feasible in operational test. Specifically, the USAF B-52 Radar Modernization Program test design is used to present tradeoffs of different design choices and the impacts of those choices on cost, operational relevance, and analytical rigor.

    Speaker Info:

    Stuart Corbett

    AFOTEC

  • B-52 Radar Modernization Test Design Considerations

    Abstract:

    Inherent system processes, restrictions on collection, or cost may impact the practical execution of an operational test. This study presents the use of blocking and split-plot designs when complete randomization is not feasible in operational test. Specifically, the USAF B-52 Radar Modernization Program test design is used to present tradeoffs of different design choices and the impacts of those choices on cost, operational relevance, and analytical rigor.

    Speaker Info:

    Joseph Maloney

    AFOTEC

  • Bayesian Calibration and Uncertainty Analysis: A Case Study Using a 2-D CFD Turbulence Model

    Abstract:

    The growing use of simulations in the engineering design process promises to reduce the need for extensive physical testing, decreasing both development time and cost. However, as mathematician and statistician George E. P. Box said, “Essentially, all models are wrong, but some are useful.” There are many factors that determine simulation or, more broadly, model accuracy. These factors can be condensed into noise, bias, parameter uncertainty, and model form uncertainty. To counter these effects and ensure that models faithfully match reality to the extent required, simulation models must be calibrated to physical measurements. Further, the models must be validated, and their accuracy must be quantified before they can be relied on in lieu of physical testing. Bayesian calibration provides a solution for both requirements: it optimizes tuning of model parameters to improve simulation accuracy, and estimates any remaining discrepancy, which is useful for model diagnosis and validation. Also, because model discrepancy is assumed to exist in this framework, it enables robust calibration even for inaccurate models.
    In this paper, we present a case study to investigate the potential benefits of using Bayesian calibration, sensitivity analyses, and Monte Carlo analyses for model improvement and validation. We will calibrate a 7-parameter k-𝜎 CFD turbulence model simulated in COMSOL Multiphysics®. The model predicts coefficient of lift and drag for an airfoil defined using a 6049-series airfoil parameterization from the National Advisory Committee for Aeronautics (NACA). We will calibrate model predictions using publicly available wind tunnel data from the University of Illinois Urbana-Champaign’s (UIUC) database.
    Bayesian model calibration requires intensive sampling of the simulation model to determine the most likely distribution of calibration parameters, which can be a large computational burden. We greatly reduce this burden by following a surrogate modeling approach, using Gaussian process emulators to mimic the CFD simulation. We train the emulator by sampling the simulation space using a Latin Hypercube Design (LHD) as the Design of Experiments (DOE), and assess the accuracy of the emulator using leave-one-out Cross Validation (CV) error.
    The Bayesian calibration framework involves calculating the discrepancy between simulation results and physical test results. We also use Gaussian process emulators to model this discrepancy. The discrepancy emulator will be used as a tool for model validation; characteristic trends in residual errors after calibration can indicate underlying model form errors which were not addressed via tuning the model calibration parameters. In this way, we will separate and quantify model form uncertainty and parameter uncertainty.
    The results of a Bayesian calibration include a posterior distribution of calibration parameter values. These distributions will be sampled using Monte Carlo methods to generate model predictions, whereby new predictions have a distribution of values which reflects the uncertainty in the tuned calibrated parameter. The resulting output distributions will be compared against physical data and the uncalibrated model to assess the effects of the calibration and discrepancy model. We will also perform global, variance-based sensitivity analysis on the uncalibrated model and the calibrated models, and investigate any changes in the sensitivity indices from uncalibrated to calibrated.
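
    As a stripped-down sketch of two of the building blocks mentioned above (a Latin hypercube design and a Gaussian process emulator checked by leave-one-out cross-validation), with an invented stand-in for the CFD code rather than the actual COMSOL model:

        import numpy as np
        from scipy.stats import qmc
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, ConstantKernel
        from sklearn.model_selection import LeaveOneOut, cross_val_predict

        def simulator(theta):
            # Invented stand-in for the expensive simulation: maps three calibration
            # parameters (scaled to [0, 1]) to a lift-coefficient prediction.
            return 1.2 * theta[:, 0] - 0.4 * theta[:, 1] ** 2 + 0.1 * np.sin(6 * theta[:, 2])

        # Space-filling Latin hypercube design over the calibration parameters
        sampler = qmc.LatinHypercube(d=3, seed=0)
        theta_train = sampler.random(n=60)
        cl_train = simulator(theta_train)

        # Gaussian process emulator of the simulator
        gp = GaussianProcessRegressor(ConstantKernel() * RBF(length_scale=[0.3] * 3),
                                      normalize_y=True)

        # Leave-one-out cross-validation to assess emulator accuracy
        cl_loo = cross_val_predict(gp, theta_train, cl_train, cv=LeaveOneOut())
        print("LOO-CV RMSE:", np.sqrt(np.mean((cl_loo - cl_train) ** 2)))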

    Speaker Info:

    Peter Chien

  • Building A Universal Helicopter Noise Model Using Machine Learning

    Abstract:

    Helicopters serve a number of useful roles within the community; however, community acceptance of helicopter operations is often limited by the resulting noise. Because the noise characteristics of helicopters depend strongly on the operating condition of the vehicle, effective noise abatement procedures can be developed for a particular helicopter type, but only when the noisy regions of the operating envelope are identified. NASA Langley Research Center—often in collaboration with other US Government agencies, industry, and academia—has conducted noise measurements for a wide variety of helicopter types, from light commercial helicopters to heavy military utility helicopters. While this database is expansive, it covers only a fraction of helicopter types in current commercial and military service and was measured under a limited set of ambient conditions and vehicle configurations. This talk will describe a new “universal” helicopter noise model suitable for planning helicopter noise abatement procedures. Modern machine learning techniques will be combined with the principle of nondimensionalization and applied to NASA’s helicopter noise data in order to develop a model capable of estimating the noisy operating states of any conventional helicopter under any specific ambient conditions and vehicle configurations.

    Speaker Info:

    Eric Greenwood

    Aeroacoustics Branch

  • Cases of Second-Order Split-Plot Designs

    Abstract:

    The fundamental principles of experiment design are factorization, replication, randomization, and local control of error. In many industries, however, departure from these principles is commonplace. Often, complete randomization is not feasible because the factor-level settings are hard, impractical, or inconvenient to change, or because the resources available to execute under homogeneous conditions are limited. These restrictions in randomization lead to split-plot experiments. We are also often interested in fitting second-order models, leading to second-order split-plot experiments. Although response surface methodology has grown tremendously since 1951, alternatives for second-order split-plot designs remain largely unexplored. The literature and textbooks offer limited examples and provide guidelines that are often too general. This deficit of information leaves practitioners ill-prepared to face the many roadblocks associated with these types of designs. This presentation provides practical strategies to help practitioners deal with second-order split-plot and, by extension, split-split-plot experiments, including an innovative approach for the construction of a response surface design referred to as the second-order sub-array Cartesian product split-plot design. This new type of design, which is an alternative to other classes of split-plot designs currently in use in defense and industrial applications, is economical, has low prediction variance of the regression coefficients, and has low aliasing between model terms. Based on an assessment using well-accepted design evaluation criteria, second-order sub-array Cartesian product split-plot designs perform as well as historical designs that have been considered standards up to this point.

    Speaker Info:

    Luis Cortes

    MITRE

  • Challenger Challenge: Pass-Fail Thinking Increases Risk Measurably

    Abstract:

    Binomial (pass-fail) response metrics are far more commonly used in test, requirements, quality, and engineering than they need to be. In fact, there is even an engineering school of thought that holds they are superior to continuous-variable metrics. This is a serious, even dangerous problem in aerospace and other industries: think of the Space Shuttle Challenger accident. There are better ways. This talk will cover some examples of methods available to engineers and statisticians in common statistical software. It will not dig far into the mathematics of the methods, but will walk through where each method might be most useful and some of the pitfalls inherent in their use, including potential sources of misinterpretation and suspicion by your teammates and customers. The talk is geared toward engineers, managers, and professionals in the -ilities who run into frustrations dealing with pass-fail data and thinking.
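
    As a hypothetical illustration of the talk's theme (not an example from the talk itself), the simulation below scores the same 30 trials two ways: as pass/fail outcomes only, and as continuous margins fit with a normal model. The continuous treatment pins down the failure probability far more tightly.

      # Hypothetical comparison of pass/fail scoring vs. a continuous-variable model
      # for estimating a small failure probability from the same 30 trials.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2024)
      true_mu, true_sigma, n = 3.0, 1.0, 30          # margin ~ N(3, 1); failure if margin < 0
      margins = rng.normal(true_mu, true_sigma, n)

      # Pass/fail view: only "margin < 0" is recorded.
      failures = int((margins < 0).sum())
      pf_lo, pf_hi = stats.beta.ppf([0.025, 0.975], failures + 0.5, n - failures + 0.5)  # Jeffreys interval

      # Continuous view: fit a normal model and estimate P(margin < 0), with a bootstrap interval.
      boot = []
      for _ in range(2000):
          m = rng.choice(margins, size=n, replace=True)
          boot.append(stats.norm.cdf(0.0, loc=m.mean(), scale=m.std(ddof=1)))
      ct_lo, ct_hi = np.percentile(boot, [2.5, 97.5])

      print(f"pass/fail 95% interval for P(fail):  [{pf_lo:.2e}, {pf_hi:.2e}]")
      print(f"continuous 95% interval for P(fail): [{ct_lo:.2e}, {ct_hi:.2e}]")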

    Speaker Info:

    Ken Johnson

    Applied Statistician

    NASA Engineering and Safety Center

  • Closing Remarks

    Speaker Info:

    Robert Behler

    Director

    DOT&E

    Robert F. Behler was sworn in as Director of Operational Test and Evaluation on December 11, 2017. A Presidential appointee confirmed by the United States Senate, he serves as the senior advisor to the Secretary of Defense on operational and live fire test and evaluation of Department of Defense weapon systems. Prior to his appointment, he was the Chief Operating Officer and Deputy Director of the Carnegie Mellon University Software Engineering Institute (SEI), a Federally Funded Research and Development Center. SEI is a global leader in advancing software development and cybersecurity to solve the nation’s toughest problems through focused research, development, and transition to the broader software engineering community. Before joining the SEI, Mr. Behler was the President and CEO of SRC, Inc. (formerly the Syracuse Research Corporation). SRC is a not-for-profit research and development corporation with a for-profit manufacturing subsidiary that focuses on radar, electronic warfare and cybersecurity technologies. Prior to working at SRC, Mr. Behler was the General Manager and Senior Vice President of the MITRE Corp., where he provided leadership to more than 2,500 technical staff in 65 worldwide locations. He joined MITRE from the Johns Hopkins University Applied Physics Laboratory, where he was a General Manager for more than 350 scientists and engineers as they made significant contributions to critical Department of Defense (DOD) precision engagement challenges. General Behler served 31 years in the United States Air Force, retiring as a Major General in 2003. During his military career, he was the Principal Adviser for Command and Control, Intelligence, Surveillance and Reconnaissance (C2ISR) to the Secretary and Chief of Staff of the U.S. Air Force (USAF). International assignments as a general officer included the Deputy Commander for NATO’s Joint Headquarters North in Stavanger, Norway. He was the Director of the Senate Liaison Office for the USAF during the 104th Congress. Mr. Behler also served as the assistant for strategic systems to the Director of Operational Test and Evaluation. As an experimental test pilot, he flew more than 65 aircraft types. Operationally, he flew worldwide reconnaissance missions in the fastest aircraft in the world, the SR-71 Blackbird. Mr. Behler is a Fellow of the Society of Experimental Test Pilots and an Associate Fellow of the American Institute of Aeronautics and Astronautics. He is a graduate of the University of Oklahoma, where he received a B.S. and M.S. in aerospace engineering, has an MBA from Marymount University, and was a National Security Fellow at the JFK School of Government at Harvard University. Mr. Behler has recently served on several National Research Council studies for the National Academy of Sciences, including “Critical Code: Software Producibility,” “Achieving Effective Acquisition of Information Technology in the Department of Defense,” and “Development Planning: A Strategic Approach to Future Air Force Capabilities.”

  • Combining Human Factors Data and Models of Human Performance

    Abstract:

    As systems and missions become increasingly complex, the roles of humans throughout the mission life cycle are evolving. In areas such as maintenance and repair, hands-on tasks still dominate; however, new technologies have changed many tasks. For example, some critical human tasks have moved from manual control to supervisory control, often of systems at great distances (e.g., remotely piloting a vehicle, or science data collection on Mars). While achieving mission success remains the key human goal, almost all human performance metrics focus on failures rather than successes. This talk will examine the role of humans in creating mission success as well as new approaches for system validation testing needed to keep up with evolving systems and human roles.

    Speaker Info:

    Cynthia Null

    Technical Fellow for Human Factors

  • Comparing M&S Output to Live Test Data: A Missile System Case Study

    Abstract:

    In the operational testing of DoD weapons systems, modeling and simulation (M&S) is often used to supplement live test data in order to support a more complete and rigorous evaluation. Before the output of the M&S is included in reports to decision makers, it must first be thoroughly verified and validated to show that it adequately represents the real world for the purposes of the intended use. Part of the validation process should include a statistical comparison of live data to M&S output. This presentation includes an example of one such validation analysis for a tactical missile system. In this case, the goal is to validate a lethality model that predicts the likelihood of destroying a particular enemy target. Using design of experiments, along with basic analysis techniques such as the Kolmogorov-Smirnov test and Poisson regression, we can explore differences between the M&S and live data across multiple operational conditions and quantify the associated uncertainties.
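
    The two comparisons named above can be run with standard libraries. The sketch below uses simulated placeholder data, not the missile-system data, to show the basic calls: a two-sample Kolmogorov-Smirnov test on a continuous output and a Poisson regression with a live-versus-simulation source term.

      # Placeholder illustration of a live-vs-M&S comparison.
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf
      from scipy import stats

      rng = np.random.default_rng(7)

      # Two-sample Kolmogorov-Smirnov test: does a continuous output (e.g., miss distance)
      # have the same distribution in live data and M&S output?
      live_miss = rng.gamma(shape=2.0, scale=1.0, size=20)
      sim_miss = rng.gamma(shape=2.0, scale=1.2, size=200)
      ks_stat, ks_p = stats.ks_2samp(live_miss, sim_miss)
      print(f"K-S statistic {ks_stat:.3f}, p-value {ks_p:.3f}")

      # Poisson regression: does a count response differ by data source (live vs. sim)
      # after adjusting for an operational condition (here, a notional range)?
      df = pd.DataFrame({
          "count": rng.poisson(lam=4, size=60),
          "source": rng.choice(["live", "sim"], size=60),
          "range_km": rng.uniform(1, 10, size=60),
      })
      fit = smf.glm("count ~ source + range_km", data=df, family=sm.families.Poisson()).fit()
      print(fit.summary())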

    Speaker Info:

    Kelly Avery

    Research Staff Member

    IDA

  • Consensus Building

    Speaker Info:

    Antonio Possolo

    NIST Fellow, Chief Statistician

    National Institute of Standards and Technology.

    Antonio Possolo holds a Ph.D. in statistics from Yale University, and has been practicing the statistical arts for more than 35 years, in industry (General Electric, Boeing), academia (Princeton University, University of Washington in Seattle, Classical University of Lisboa), and government. He is committed to the development and application of probabilistic and statistical methods that contribute to advances in science and technology, and in particular to measurement science.

  • Creating Shiny Apps in R for Sharing Automated Statistical Products

    Abstract:

    Interactive web apps can be built straight from R with the R package Shiny. Shiny apps are becoming more prevalent as a way to automate statistical products and share them with others who do not know R. This tutorial will cover Shiny app syntax and how to create basic Shiny apps. Participants will create basic apps by working through several examples and explore how to change and improve these apps. Participants will leave the session with the tools to create their own, more complicated applications. Participants will need a computer with R, RStudio, and the shiny R package installed.

    Speaker Info:

    Randy Griffiths

    U.S. Army Evaluation Center

  • CYBER Penetration Testing and Statistical Analysis in DT&E

    Abstract:

    Reconnaissance, footprinting, and enumeration are critical steps in the CYBER penetration testing process because if these steps are not fully and extensively executed, the information available for finding a system’s vulnerabilities may be limited. During the CYBER testing process, penetration testers often find themselves doing the same initial enumeration scans over and over for each system under test. Because of this, automated scripts have been developed that take these mundane and repetitive manual steps and perform them automatically with little user input. Once automation is present in the penetration testing process, Scientific Test and Analysis Techniques (STAT) can be incorporated. By combining automation and STAT in the CYBER penetration testing process, Mr. Tim McLean at Marine Corps Tactical Systems Support Activity (MCTSSA) coined a new term called CYBERSTAT. CYBERSTAT is applying scientific test and analysis techniques to offensive CYBER penetration testing tools with an important realization that CYBERSTAT assumes the system under test is the offensive penetration test tool itself. By applying combinatorial testing techniques to the CYBER tool, the CYBER tool’s scope is expanded beyond “one at a time” uses as the combinations of the CYBER tool’s capabilities and options are explored and executed as test cases against the target system. In CYBERSTAT, the additional test cases produced by STAT can be run automatically using scripts. This talk will show how MCTSSA is preparing to use CYBERSTAT in the Developmental Test and Evaluation process of USMC Command and Control systems.
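
    As a toy illustration of the combinatorial idea (the tool options shown are hypothetical, not MCTSSA's actual scripts), the snippet below enumerates every combination of a scanner's option settings so that each combination can be scripted as an automated test case; in practice a covering array would usually be used to keep the case count manageable.

      # Hypothetical enumeration of test cases for a penetration-testing tool treated as
      # the system under test: each combination of option settings becomes one scripted case.
      from itertools import product

      options = {
          "scan_type": ["SYN", "connect", "UDP"],
          "timing": ["T2", "T3", "T4"],
          "port_range": ["top-100", "top-1000", "all"],
          "service_detection": [False, True],
      }

      cases = [dict(zip(options, combo)) for combo in product(*options.values())]
      print(f"{len(cases)} full-factorial test cases")      # 3 * 3 * 3 * 2 = 54
      for case_id, case in enumerate(cases[:3], start=1):   # show the first few
          print(case_id, case)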

    Speaker Info:

    Timothy McLean

  • Demystifying Data Science

    Abstract:

    Data science is the new buzzword: it is being touted as the solution for everything from curing cancer to self-driving cars. How is data science related to traditional statistical methods? Is data science just another name for “big data”? In this mini-tutorial, we will begin by discussing what data science is (and is not). We will then discuss some of the key principles of data science practice and conclude by examining the classes of problems and methods that are included in data science.

    Speaker Info:

    Alyson Wilson

    Laboratory for Analytic Sciences, North Carolina State University

  • Design and Analysis of Nonlinear Models for the Mars 2020 Rover

    Abstract:

    The Mars Rover 2020 team commonly faces nonlinear behavior across the test program that is often closely related to the underlying physics. Classical and newer response surface designs do well with quadratic approximations while space filling designs have proven useful for modeling & simulation of complex surfaces. This talk specifically covers fitting nonlinear equations based on engineering functional forms as well as sigmoid and exponential decay curves. We demonstrate best practices on how to design and augment nonlinear designs using the Bayesian D-Optimal Criteria. Several examples, to include drill bit degradation, illustrate the relative ease of implementation with popular software and the utility of these methods.
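
    A minimal example of fitting one of the nonlinear forms mentioned above, using a notional exponential-decay degradation curve and synthetic data rather than the actual Mars 2020 drill measurements:

      # Fit a notional exponential-decay degradation curve y = a * exp(-b * x) + c
      # to synthetic data with scipy's nonlinear least squares.
      import numpy as np
      from scipy.optimize import curve_fit

      def decay(x, a, b, c):
          return a * np.exp(-b * x) + c

      rng = np.random.default_rng(0)
      hours = np.linspace(0, 50, 25)
      performance = decay(hours, a=1.0, b=0.08, c=0.2) + rng.normal(0, 0.02, hours.size)

      popt, pcov = curve_fit(decay, hours, performance, p0=[1.0, 0.1, 0.0])
      perr = np.sqrt(np.diag(pcov))                     # standard errors of a, b, c
      print("estimates:", popt, "std errors:", perr)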

    Speaker Info:

    Jim Wisnowski

  • Determination of Power for Complex Experimental Designs

    Abstract:

    Power tells us the probability of rejecting the null hypothesis for an effect of a given size, and helps us select an appropriate design prior to running the experiment. The key to computing power for an effect is determining the size of the effect. We describe a general approach for sizing effects that covers a wide variety of designs including two-level factorials, multilevel factorials with categorical levels, split-plot and response surface designs. The application of power calculations to DoE is illustrated by way of several case studies. These case studies include both continuous and binomial responses. In the case of response surface designs, the fitted model is usually used for drawing contour maps, 3D surfaces, making predictions, or performing optimization. For these purposes, it is important that the model adequately represent the response behavior over the region of interest. Therefore, power to detect individual model parameters is not a good measure of what we are designing for. A discussion and pertinent examples will show attendees how the precision of the fitted surface (i.e. the precision of the predicted response) relative to the noise is a critical criterion in design selection. In this presentation, we introduce a process to determine if the design has adequate precision for DoE needs.
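
    For a single two-level factor, the power calculation reduces to a two-sample comparison. The snippet below is a generic illustration (not the presenters' workflow): it sizes a continuous effect in standard-deviation units and adds a binomial-response analogue.

      # Generic power calculations: (1) power to detect a two-level factor effect of
      # size delta (in standard-deviation units) with n runs per level, and (2) power
      # to detect a difference between two proportions.
      from statsmodels.stats.power import TTestIndPower, NormalIndPower
      from statsmodels.stats.proportion import proportion_effectsize

      # Continuous response: effect of 1.5 standard deviations, 8 runs per factor level.
      power_cont = TTestIndPower().power(effect_size=1.5, nobs1=8, alpha=0.05)

      # Binomial response: detect an improvement from 70% to 90% success, 50 trials per level.
      es = proportion_effectsize(0.9, 0.7)
      power_bin = NormalIndPower().power(effect_size=es, nobs1=50, alpha=0.05)

      print(f"power (continuous, delta = 1.5 sd, n = 8/level): {power_cont:.2f}")
      print(f"power (binomial, 0.7 vs 0.9, n = 50/level):      {power_bin:.2f}")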

    Speaker Info:

    Pat Whitcomb

    Stat-Ease, Inc

  • Development of a Locking Setback Mass for Cluster Munition Applications: A UQ Case Study

    Abstract:

    The Army is currently developing a cluster munition that is required to meet a functional reliability requirement of 99%. This effort focuses on the design process for a setback lock within the safe and arm (S&A) device in the submunition fuze. This lock holds the arming rotor in place, preventing the fuze from beginning its arming sequence until the setback lock retracts during a launch event. The setback lock is therefore required to not arm (remain in place) during a drop event (safety) and to arm during a launch event (reliability). In order to meet these requirements, uncertainty quantification techniques were used to evaluate setback lock designs. We designed a simulation experiment, simulated the setback lock behavior in a drop event and in a launch event, fit a model to the results, and optimized the design for safety and reliability. Currently, 8 candidate designs that meet the requirements are being manufactured, and adaptive sensitivity testing is planned to inform the surrogate models and improve their predictive capability. A final optimized design will be chosen based on the improved models, and realistic drop safety and arm reliability predictions will be obtained using Monte Carlo simulations of the surrogate models.
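
    The final step described above, propagating input variation through a surrogate to estimate arming reliability, can be sketched as follows; the quadratic surrogate and the input distributions are made-up placeholders, not the actual S&A model.

      # Placeholder Monte Carlo reliability estimate: sample manufacturing/launch variation,
      # push it through a made-up surrogate for the setback lock's arming margin, and
      # count the fraction of samples with positive margin.
      import numpy as np

      rng = np.random.default_rng(42)
      n = 200_000

      spring_k = rng.normal(1.00, 0.03, n)      # normalized spring stiffness
      friction = rng.normal(0.20, 0.02, n)      # friction coefficient
      setback_g = rng.normal(60.0, 5.0, n)      # launch setback acceleration (g)

      # Made-up quadratic surrogate for arming margin (the lock arms when margin > 0).
      margin = 0.02 * setback_g - 2.0 * friction - 0.5 * spring_k ** 2

      reliability = (margin > 0).mean()
      half_width = 1.96 * np.sqrt(reliability * (1 - reliability) / n)
      print(f"estimated arming reliability: {reliability:.4f} +/- {half_width:.4f}")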

    Speaker Info:

    Melissa Jablonski

    U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT & ENGINEERING CENTER

  • DOE and Test Automation for System of Systems T&E

    Abstract:

    Rigorous, efficient, and effective test science techniques are individually taking hold in many software-centric DoD acquisition programs, in both developmental and operational test regimes. These techniques include agile software development, cybersecurity test and evaluation (T&E), design and analysis of experiments, and automated software testing. Many software-centric programs must also be tested together with other systems to demonstrate that they can be successfully integrated into a more complex system of systems. This presentation focuses on the two test science disciplines of designed experiments (DOE) and automated software testing (AST) and describes how they can be used effectively and leverage one another in planning for and executing a system of systems test strategy. We use the Navy’s Distributed Common Ground System as an example.

    Speaker Info:

    Jim Simpson

    JK Analytics

  • Doppler Assisted Sensor Fusion for Tracking and Exploitation

    Abstract:

    We have developed a new sensor fusion approach called Doppler Assisted Sensor Fusion (DASF), which pairs a range-rate profile from one moving sensor with location accuracy with another range-rate profile from another sensor with high location accuracy. This pairing provides accurate identification, location, and tracking of moving emitters, with low association latency. The approach we use for data fusion is distinct from previous approaches. In the conventional approach, post-detection data from each sensor is overlaid with data from another sensor in an attempt to associate the data outputs. For the DASF approach, the fusion is at the sensor level: the first sensor collects data and provides the standard identification in addition to a unique emitter range-rate profile. This profile is used to associate the emitter signature to a range-rate signature obtained by the geolocation sensor. The geolocation sensor then provides the desired location accuracy. We will provide results using real tracking data scenarios.

    Speaker Info:

    J. Derek Tucker

    Sandia National Laboratories

  • Evaluating Deterministic Models of Time Series by Comparison to Observations

    Abstract:

    A standard paradigm for assessing the quality of model simulations is to compare what these models produce to experimental or observational samples of what the models seek to predict. Often these comparisons are based on simple summary statistics, even when the objects of interest are time series. Here, we propose a method of evaluation through probabilities derived from tests of hypotheses that model-simulated and observed time sequences share common signals. The probabilities are based on the behavior of summary statistics of model output and observational data, over ensembles of pseudo-realizations. These are obtained by partitioning the original time sequences into signal and noise components, and using a parametric bootstrap to create pseudo-realizations of the noise. We demonstrate with an example from climate model evaluation for which this methodology was developed.
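
    A bare-bones version of the pseudo-realization idea is sketched below, assuming a lowess trend as the signal, an AR(1) model for the noise, and correlation with the model series as the summary statistic; all of these choices are illustrative and not the specifics of the climate application.

      # Bare-bones parametric-bootstrap ensemble: split an observed series into a smooth
      # signal and AR(1) noise, regenerate pseudo-observations, and look at the spread of
      # a summary statistic (correlation with the model series) over the ensemble.
      import numpy as np
      from statsmodels.nonparametric.smoothers_lowess import lowess

      rng = np.random.default_rng(3)
      t = np.arange(240)

      # Illustrative "observed" and "model" series sharing a common trend.
      common = 0.01 * t + 0.5 * np.sin(2 * np.pi * t / 60)
      obs = common + rng.normal(0, 0.3, t.size)
      model = common + rng.normal(0, 0.2, t.size)

      # Signal/noise partition of the observations.
      signal = lowess(obs, t, frac=0.25, return_sorted=False)
      resid = obs - signal
      phi = np.corrcoef(resid[:-1], resid[1:])[0, 1]            # AR(1) coefficient
      sigma = np.std(resid) * np.sqrt(1 - phi ** 2)             # innovation std dev

      def ar1_noise(n):
          e = np.empty(n)
          e[0] = rng.normal(0, sigma / np.sqrt(1 - phi ** 2))
          for i in range(1, n):
              e[i] = phi * e[i - 1] + rng.normal(0, sigma)
          return e

      # Ensemble of pseudo-observations and the resulting spread of the summary statistic.
      corrs = [np.corrcoef(model, signal + ar1_noise(t.size))[0, 1] for _ in range(1000)]
      print(f"model-vs-pseudo-obs correlation: {np.mean(corrs):.3f} "
            f"(95% ensemble range {np.percentile(corrs, 2.5):.3f} to {np.percentile(corrs, 97.5):.3f})")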

    Speaker Info:

    Amy Braverman

    Jet Propulsion Laboratory, California Institute of Technology

  • Evolving Statistical Tools

    Abstract:

    In this session, researchers from the Institute for Defense Analyses (IDA) present a collection of statistical tools designed to meet ongoing and emerging needs for planning, designing, and evaluating operational tests. We first present a suite of interactive applications hosted on testscience.org that are designed to address common analytic needs in the operational test community. These freely available resources include tools for constructing confidence intervals, computing statistical power, comparing distributions, and computing Bayesian reliability. Next, we discuss four dedicated software tools:
    JEDIS – a JMP Add-In for automating power calculations for designed experiments
    skpr – an R package for generating optimal experimental designs and easily evaluating power for normal and non-normal response variables
    ciTools – an R package for quickly and simply generating confidence intervals and quantifying uncertainty for simple and complex linear models
    nautilus – an R package for visualizing and analyzing aspects of sensor performance, such as detection range and track completeness

    Speaker Info:

    Matthew Avery

    Research Staff Member

    IDA

  • Evolving Statistical Tools

    Abstract:

    In this session, researchers from the Institute for Defense Analyses (IDA) present a collection of statistical tools designed to meet ongoing and emerging needs for planning, designing, and evaluating operational tests. We first present a suite of interactive applications hosted on testscience.org that are designed to address common analytic needs in the operational test community. These freely available resources include tools for constructing confidence intervals, computing statistical power, comparing distributions, and computing Bayesian reliability. Next, we discuss four dedicated software tools:
    JEDIS – a JMP Add-In for automating power calculations for designed experiments
    skpr – an R package for generating optimal experimental designs and easily evaluating power for normal and non-normal response variables
    ciTools – an R package for quickly and simply generating confidence intervals and quantifying uncertainty for simple and complex linear models
    nautilus – an R package for visualizing and analyzing aspects of sensor performance, such as detection range and track completeness

    Speaker Info:

    Tyler Morgan-Wall

    Research Staff Member

    IDA

  • Evolving Statistical Tools

    Abstract:

    In this session, researchers from the Institute for Defense Analyses (IDA) present a collection of statistical tools designed to meet ongoing and emerging needs for planning, designing, and evaluating operational tests. We first present a suite of interactive applications hosted on testscience.org that are designed to address common analytic needs in the operational test community. These freely available resources include tools for constructing confidence intervals, computing statistical power, comparing distributions, and computing Bayesian reliability. Next, we discuss four dedicated software tools:
    JEDIS – a JMP Add-In for automating power calculations for designed experiments
    skpr – an R package for generating optimal experimental designs and easily evaluating power for normal and non-normal response variables
    ciTools – an R package for quickly and simply generating confidence intervals and quantifying uncertainty for simple and complex linear models
    nautilus – an R package for visualizing and analyzing aspects of sensor performance, such as detection range and track completeness

    Speaker Info:

    Benjamin Ashwell

    Research Staff Member

    IDA

  • Evolving Statistical Tools

    Abstract:

    In this session, researchers from the Institute for Defense Analyses (IDA) present a collection of statistical tools designed to meet ongoing and emerging needs for planning, designing, and evaluating operational tests. We first present a suite of interactive applications hosted on testscience.org that are designed to address common analytic needs in the operational test community. These freely available resources include tools for constructing confidence intervals, computing statistical power, comparing distributions, and computing Bayesian reliability. Next, we discuss four dedicated software tools:
    JEDIS – a JMP Add-In for automating power calculations for designed experiments
    skpr – an R package for generating optimal experimental designs and easily evaluating power for normal and non-normal response variables
    ciTools – an R package for quickly and simply generating confidence intervals and quantifying uncertainty for simple and complex linear models
    nautilus – an R package for visualizing and analyzing aspects of sensor performance, such as detection range and track completeness

    Speaker Info:

    Kevin Kirshenbaum

    Research Staff Member

    IDA

  • Evolving Statistical Tools

    Abstract:

    In this session, researchers from the Institute for Defense Analyses (IDA) present a collection of statistical tools designed to meet ongoing and emerging needs for planning, designing, and evaluating operational tests. We first present a suite of interactive applications hosted on testscience.org that are designed to address common analytic needs in the operational test community. These freely available resources include tools for constructing confidence intervals, computing statistical power, comparing distributions, and computing Bayesian reliability. Next, we discuss four dedicated software tools:
    JEDIS – a JMP Add-In for automating power calculations for designed experiments
    skpr – an R package for generating optimal experimental designs and easily evaluating power for normal and non-normal response variables
    ciTools – an R package for quickly and simply generating confidence intervals and quantifying uncertainty for simple and complex linear models
    nautilus – an R package for visualizing and analyzing aspects of sensor performance, such as detection range and track completeness

    Speaker Info:

    Stephanie Lane

    Research Staff Member

    IDA

  • Evolving Statistical Tools

    Abstract:

    In this session, researchers from the Institute for Defense Analyses (IDA) present a collection of statistical tools designed to meet ongoing and emerging needs for planning, designing, and evaluating operational tests. We first present a suite of interactive applications hosted on testscience.org that are designed to address common analytic needs in the operational test community. These freely available resources include tools for constructing confidence intervals, computing statistical power, comparing distributions, and computing Bayesian reliability. Next, we discuss four dedicated software tools:
    JEDIS – a JMP Add-In for automating power calculations for designed experiments
    skpr – an R package for generating optimal experimental designs and easily evaluating power for normal and non-normal response variables
    ciTools – an R package for quickly and simply generating confidence intervals and quantifying uncertainty for simple and complex linear models
    nautilus – an R package for visualizing and analyzing aspects of sensor performance, such as detection range and track completeness

    Speaker Info:

    Jason Sheldon

    Research Staff Member

    IDA

  • Experimental Design of a Unique Force Measurement System Calibration

    Abstract:

    Aerodynamic databases for space flight vehicles rely on wind-tunnel tests utilizing precision force measurement systems (FMS). Recently, NASA’s Space Launch System (SLS) program has conducted numerous wind-tunnel tests. This presentation will focus on the calibration of a unique booster FMS through the use of design of experiments (DoE) and regression modeling. Utilization of DoE resulted in a sparse, time-efficient design with results exceeding the researchers’ expectations.

    Speaker Info:

    Ken Toro

  • Exploratory Data Analysis

    Abstract:

    After decades of seminal methodological research on the subject—accompanied by a myriad of applications—John Tukey formally created the statistical discipline known as EDA with the publication of his book “Exploratory Data Analysis” in 1977. The breadth and depth of this book was staggering, and its impact pervasive, running the gamut from today’s routine teaching of box plots in elementary schools, to the existent core philosophy of data exploration “in-and-for-itself” embedded in modern day statistics and AI/ML. As important as EDA was at its inception, it is even more essential now, with data sets increasing in both complexity and size. Given a science & engineering problem/question, and given an existing data set, we argue that the most important deliverable in the problem-solving process is data-driven insight; EDA visualization techniques lie at the core of extracting that insight. This talk has 3 parts:
    1. Data Diamond: In light of the focus of DATAWorks to share essential methodologies for operational testing/evaluation, we first present a problem-solving framework (simple in form but rich in content) constructed and fine-tuned over 4 decades of scientific/engineering problem-solving: the data diamond. This data-centric structure has proved essential for systematically approaching a variety of research and operational problems, for determining if the data on hand has the capacity to answer the question at hand, and for identifying weaknesses in the total experimental effort that might compromise the rigor/correctness of derived solutions.
    2. EDA Methods & Block Plot: We discuss those EDA graphical tools that have proved most important/insightful (for the presenter) in attacking the wide variety of physical/chemical/biological/engineering/infotech problems existent in the NIST environment. Aside from some more commonly-known EDA tools in use, we discuss the virtues/applications of the block plot, which is a tool specifically designed for the “comparative” problem type: ascertaining whether the (yes/no) conclusion about the statistical significance of a single factor under study is in fact robustly true over the variety of other factors (material/machine/method/operator/environment, etc.) that co-exist in most systems. The testing of Army bullet-proof vests is used as an example.
    3. 10-Step DEX Sensitivity Analysis: Since the rigor/robustness of testing & evaluation conclusions are dictated not only by the choice of (post-data) analysis methodologies, but more importantly by the choice of (pre-data) experiment design methodologies, we demonstrate a recommended procedure for the important “sensitivity analysis” problem: determining what factors most affect the output of a multi-factor system. The deliverable is a ranked list (ordered by magnitude) of main effects (and interactions). Design-wise, we demonstrate the power and efficiency of orthogonal fractionated 2-level designs for this problem; analysis-wise, we present a structured 10-step graphical analysis which provides detailed data-driven insight into what “drives” the system, what optimal settings exist for the system, what prediction model exists for the system, and what direction future experiments should take to further optimize the system. The World Trade Center collapse analysis is used as an example.

    Speaker Info:

    Jim Filliben

  • Highly Effective Statistical Collaboration: The Art and the Science

    Speaker Info:

    Peter Parker

    Team Lead

    Advanced Measurement Systems, NASA

    Dr. Parker is Team Lead for Advanced Measurement Systems at the National Aeronautics and Space Administration’s Langley Research Center in Hampton, Virginia. He serves as an Agency-wide statistical expert across all of NASA’s mission directorates of Exploration, Aeronautics, and Science to infuse statistical thinking, engineering, and methods including statistical design of experiments, response surface methodology, and measurement system characterization. His expertise is in collaboratively integrating research objectives, measurement sciences, test design, and statistical methods to produce actionable knowledge for aerospace research and development. He holds a B.S. in Mechanical Engineering, an M.S. in Applied Physics and Computer Science, and an M.S. and Ph.D. in Statistics from Virginia Tech. Dr. Parker is a senior member of the American Institute of Aeronautics and Astronautics, the American Society for Quality, and the American Statistical Association. He currently chairs the American Society for Quality’s Publication Management Board and previously served as Editor-in-Chief of the journal Quality Engineering.

  • Illustrating the Importance of Uncertainty Quantification (UQ) in Munitions Modeling

    Abstract:

    The importance of the incorporation of Uncertainty Quantification (UQ) techniques into the design and analysis of Army systems is discussed. Relevant examples are presented where UQ would have been extremely useful. The intent of the presentation is to show the broad relevance of UQ and how, in the future, it will greatly improve the time to fielding and quality of developmental items.

    Speaker Info:

    Donald Calucci

  • Infrastructure Lifetimes

    Abstract:

    Infrastructure refers to the structures, utilities, and interconnected roadways that support the work carried out at a given facility. In the case of the Lawrence Livermore National Laboratory, infrastructure is considered exclusive of scientific apparatus and safety and security systems. LLNL inherited its infrastructure management policy from the University of California, which managed the site during LLNL’s first 5 decades. This policy is quite different from that used in commercial property management. Commercial practice weighs reliability over cost by replacing infrastructure at industry-standard lifetimes. LLNL practice weighs overall lifecycle cost, seeking to mitigate reliability issues through inspection. To formalize this risk management policy, a careful statistical study was undertaken using 20 years of infrastructure replacement data. In this study, care was taken to adjust for left truncation as well as right censoring. 57 distinct infrastructure class data sets were fitted using MLE to the Generalized Gamma distribution. This distribution is useful because it produces a weighted blending of discrete failure (Weibull model) and complex system failure (Lognormal model). These parametric fittings then yielded median lifetimes and conditional probabilities of failure. From the conditional probabilities, bounds on budget costs could be computed as expected values. This has provided a scientific basis for rational budget management as well as aided operations by prioritizing inspection, repair, and replacement activities.
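
    A stripped-down version of the lifetime-fitting step is sketched below, using a Weibull model (a special case of the generalized gamma) and synthetic right-censored replacement records; the left-truncation adjustment and the full generalized gamma fit used at LLNL are omitted for brevity.

      # Fit a Weibull lifetime model to synthetic right-censored infrastructure records by
      # maximizing the censored log-likelihood, then report the median lifetime.
      import numpy as np
      from scipy.optimize import minimize
      from scipy.stats import weibull_min

      rng = np.random.default_rng(11)
      true_shape, true_scale = 1.8, 40.0                    # years
      life = weibull_min.rvs(true_shape, scale=true_scale, size=300, random_state=1)
      censor_at = rng.uniform(10, 60, size=300)             # observation windows
      time = np.minimum(life, censor_at)
      failed = life <= censor_at                            # False = still in service (censored)

      def neg_log_lik(params):
          shape, scale = np.exp(params)                     # optimize on the log scale
          ll = np.where(failed,
                        weibull_min.logpdf(time, shape, scale=scale),
                        weibull_min.logsf(time, shape, scale=scale))
          return -ll.sum()

      fit = minimize(neg_log_lik, x0=np.log([1.0, 30.0]), method="Nelder-Mead")
      shape_hat, scale_hat = np.exp(fit.x)
      median_life = scale_hat * np.log(2.0) ** (1.0 / shape_hat)
      print(f"shape {shape_hat:.2f}, scale {scale_hat:.1f} years, median life {median_life:.1f} years")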

    Speaker Info:

    Erika Taketa

    Lawrence Livermore National Laboratory

  • Infrastructure Lifetimes

    Abstract:

    Infrastructure refers to the structures, utilities, and interconnected roadways that support the work carried out at a given facility. In the case of the Lawrence Livermore National Laboratory, infrastructure is considered exclusive of scientific apparatus and safety and security systems. LLNL inherited its infrastructure management policy from the University of California, which managed the site during LLNL’s first 5 decades. This policy is quite different from that used in commercial property management. Commercial practice weighs reliability over cost by replacing infrastructure at industry-standard lifetimes. LLNL practice weighs overall lifecycle cost, seeking to mitigate reliability issues through inspection. To formalize this risk management policy, a careful statistical study was undertaken using 20 years of infrastructure replacement data. In this study, care was taken to adjust for left truncation as well as right censoring. 57 distinct infrastructure class data sets were fitted using MLE to the Generalized Gamma distribution. This distribution is useful because it produces a weighted blending of discrete failure (Weibull model) and complex system failure (Lognormal model). These parametric fittings then yielded median lifetimes and conditional probabilities of failure. From the conditional probabilities, bounds on budget costs could be computed as expected values. This has provided a scientific basis for rational budget management as well as aided operations by prioritizing inspection, repair, and replacement activities.

    Speaker Info:

    William Romine

    Lawrence Livermore National Laboratory

  • Initial Investigation into the Psychoacoustic Properties of Small Unmanned Aerial System Noise

    Abstract:

    For the past several years, researchers at NASA Langley have been engaged in a series of projects to study the degree to which existing facilities and capabilities, originally created for work on full-scale aircraft, are extensible to smaller scales – those of the small unmanned aerial systems (sUAS, also UAVs and, colloquially, ‘drones’) that have been showing up in the nation’s airspace. This paper follows an effort that has led to an initial human-subject psychoacoustic test regarding the annoyance generated by sUAS noise. This effort spans three phases: 1. the collection of the sounds through field recordings, 2. the formulation and execution of a psychoacoustic test using those recordings, and 3. the analysis of the data from that test. The data suggest a lack of parity between the noise of the recorded sUAS and that of a set of road vehicles that were also recorded and included in the test, as measured by a set of contemporary noise metrics.

    Speaker Info:

    Andrew Christian

    Structural Acoustics Branch

  • Initial Validation of the Trust of Automated System Test

    Abstract:

    Automated systems are technologies that actively select data, transform information, make decisions, and control processes. The U.S. military uses automated systems to perform search and rescue and reconnaissance missions, and to assume control of aircraft to avoid ground collision. Facilitating appropriate trust in automated systems is essential to improving the safety and performance of human-system interactions. In two studies, we developed and validated an instrument to measure trust in automated systems. In study 1, we demonstrated that the scale has a 2-factor structure and demonstrates concurrent validity. We replicated these results using an independent sample in study 2.

    Speaker Info:

    Heather Wojton

    Research Staff Member

    IDA

  • Insights, Predictions, and Actions: Descriptive Definitions of Data Science, Machine Learning, and Artificial Intelligence

    Abstract:

    The terms “Data Science”, “Machine Learning”, and “Artificial Intelligence” have become increasingly common in popular media, professional publications, and even in the language used by DoD leadership. But these terms are often not well understood, and may be used incorrectly and interchangeably. Even a textbook definition of these fields is unlikely to help with the distinction, as many definitions tend to lump everything under the umbrella of computer science or introduce unnecessary buzzwords. Leveraging a framework first proposed by David Robinson, Chief Data Scientist at DataCamp, we forgo the textbook definitions and instead focus on practical distinctions between the work of practitioners in each field, using examples relevant to the test and evaluation community where applicable.

    Speaker Info:

    Andrew Flack

    Research Staff Member

    IDA

  • Interface design for analysts in a data and analysis-rich environment

    Abstract:

    Increasingly humans will rely on the outputs of our computational partners to make sense of the complex systems in our world. To be employed, the statistical and algorithmic analysis tools that are deployed to analysts’ toolboxes must afford their proper use and interpretation. Interface design for these tool users should provide decision support appropriate for the current stage of sensemaking. Understanding how users build, test, and elaborate their mental models of complex systems can guide the development of robust interfaces.

    Speaker Info:

    Karin Butler

    Sandia National Laboratories

  • Introduction of Uncertainty Quantification and Industry Challenges

    Abstract:

    Uncertainty is an inescapable reality that can be found in nearly all types of engineering analyses. It arises from sources like measurement inaccuracies, material properties, boundary and initial conditions, and modeling approximations. For example, the increasing use of numerical simulation models throughout industry promises improved design and insight at significantly lower costs and shorter timeframes than purely physical testing. However, the addition of numerical modeling has also introduced complexity and uncertainty to the process of generating actionable results. It has become not only possible, but vital to include Uncertainty Quantification (UQ) in engineering analysis. The competitive benefits of UQ include reduced development time and cost, improved designs, better understanding of risk, and quantifiable confidence in analysis results and engineering decisions. Unfortunately, there are significant cultural and technical challenges which prevent organizations from utilizing UQ methods and techniques in their engineering practice. This presentation will introduce UQ methodology and discuss the past and present strategies for addressing these challenges, making it possible to use UQ to enhance engineering processes with fewer resources and in more situations. Looking to the future, anticipated challenges will be discussed along with an outline of the path towards making UQ a common practice in engineering.

    Speaker Info:

    Peter Chien

  • Introduction to R

    Abstract:

    This course is designed to introduce participants to the R programming language and the RStudio editor. R is free and open-source software for summarizing data, creating visuals of data, and conducting statistical analyses. R can offer many advantages over programs such as Excel, including faster computation, customized analyses, access to the latest statistical techniques, automation of tasks, and the ability to easily reproduce research. After completing this course, a new user should be able to:
    • Import/export data from/to external files.
    • Create and manipulate new variables.
    • Conduct basic statistical analyses (such as t-tests and linear regression).
    • Create basic graphs.
    • Install and use R packages.
    Participants should bring a laptop for the interactive components of the course.

    Speaker Info:

    Justin Post

    North Carolina State University

  • Journey to a Data-Centric Approach for National Security

    Speaker Info:

    Marcey Hoover

    Quality Assurance Director

    Sandia National Laboratories

    As Quality Assurance Director, Dr. Marcey Hoover is responsible for designing and sustaining the Laboratories’ quality assurance system and the associated technical capabilities needed for flawless execution of safe, secure, and efficient work to deliver exceptional products and services to its customers. Marcey previously served as the Senior Manager responsible for developing the science and engineering underpinning efforts to predict and influence the behavior of complex, highly interacting systems critical to our nation’s security posture. In her role as Senior Manager and Chief of Operations for Sandia’s Energy and Climate program, Marcey was responsible for strategic planning, financial management, business development, and communications. In prior positions, she managed organizations responsible for (1) quality engineering on new product development programs, (2) research and development of advanced computational techniques in the engineering sciences, and (3) development and execution of nuclear weapon testing and evaluation programs. Marcey has also led several executive office functions, including corporate-level strategic planning. Active in both the American Statistical Association and the American Society for Quality (ASQ), Marcey served two terms as the elected ASQ Statistics Division Treasurer. She was recognized as the Outstanding Alumni of the Purdue University Statistics Department in 2009 and nominated in 2011 for the YWCA Middle Rio Grande Women on the Move award. She currently serves on both the Purdue Strategic Research Advisory Council and the Statistics Alumni Advisory Board, and as a mentor for Big Brothers Big Sisters of New Mexico. Marcey received her bachelor of science degree in mathematics from Michigan State University, and her master of science and doctor of philosophy degrees in mathematical statistics from Purdue University.

  • Leveraging Anomaly Detection for Aircraft System Health Data Stability Reporting

    Abstract:

    Detecting and diagnosing aircraft system health poses a unique challenge as system complexity increases and software is further integrated. Anomaly detection algorithms systematically highlight unusual patterns in large datasets and are a promising methodology for monitoring aircraft system health. The F-35A fighter aircraft is driven by complex, integrated subsystems with both software and hardware components. The F-35A operational flight program is the software that manages each subsystem within the aircraft and the flow of required information and support between subsystems. This information and support are critical to the successful operation of many subsystems. For example, the radar system supplies information to the fusion engine, without which the fusion engine would fail. ACC operational testing can be thought of as equivalent to beta testing for operational flight programs. As in other software, many faults result in minimal loss of functionality and are often unnoticed by the user. However, there are times when a software fault might result in catastrophic functionality loss (i.e., subsystem shutdown). It is critical to catch software problems that will result in catastrophic functionality loss before the flight software is fielded to the combat air forces. Subsystem failures and degradations can be categorized and quantified using simple system health data codes (e.g., degrade, fail, healthy). However, because of the integrated nature of the F-35A, a subsystem degradation may be caused by another subsystem. The 59th Test and Evaluation Squadron collects autonomous system data, pilot questionnaires, and health report codes for F-35A subsystems. Originally, this information was analyzed using spreadsheet tools (i.e., Microsoft Excel). Using this method, analysts were unable to examine all subsystems or attribute cause for subsystem faults. The 59th TES is developing a new process that leverages anomaly detection algorithms to isolate flights with unusual patterns of subsystem failures and, within those flights, highlight which subsystem faults are correlated with increased subsystem failures. This presentation will compare the performance of several anomaly detection algorithms (e.g., K-means, K-nearest neighbors, support vector machines) using simulated F-35A data.
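
    A minimal sketch of the screening step, using simulated per-flight subsystem health-code counts (not F-35A data) and two of the algorithm families named above to flag flights with unusual fault patterns:

      # Flag flights with unusual subsystem fault patterns in simulated health-code counts,
      # using two anomaly detectors: a one-class SVM and a k-nearest-neighbor distance score.
      import numpy as np
      from sklearn.neighbors import NearestNeighbors
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import OneClassSVM

      rng = np.random.default_rng(59)
      n_flights, n_subsystems = 300, 12

      # Typical flights, plus a handful with elevated fault counts in a few subsystems.
      X = rng.poisson(lam=2.0, size=(n_flights, n_subsystems)).astype(float)
      X[:5, :4] += rng.poisson(lam=12.0, size=(5, 4))

      Xs = StandardScaler().fit_transform(X)

      # One-class SVM: -1 marks flights outside the learned "normal" region.
      svm_flags = OneClassSVM(nu=0.05, gamma="scale").fit_predict(Xs) == -1

      # k-NN score: flights whose 5th-nearest-neighbor distance is unusually large.
      dist, _ = NearestNeighbors(n_neighbors=6).fit(Xs).kneighbors(Xs)
      knn_score = dist[:, -1]
      knn_flags = knn_score > np.quantile(knn_score, 0.95)

      print("flights flagged by one-class SVM:", np.flatnonzero(svm_flags)[:10])
      print("flights flagged by k-NN distance:", np.flatnonzero(knn_flags)[:10])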

    Speaker Info:

    Kyle Gartrell

  • Machine Learning to Assess Pilots' Cognitive State

    Abstract:

    The goal of the Crew State Monitoring (CSM) project is to use machine learning models trained with physiological data to predict unsafe cognitive states in pilots, such as Channelized Attention (CA) and Startle/Surprise (SS). These models will be used in a real-time system that predicts a pilot’s mental state every second, a tool that can be used to help pilots recognize and recover from these mental states. Pilots wore sensors that collected physiological data such as 20-channel electroencephalography (EEG), respiration, and galvanic skin response (GSR). Pilots performed non-flight benchmark tasks designed to induce these states, and a flight simulation with “surprising” or “channelizing” events. The team created a pipeline to generate pilot-dependent models that train on benchmark data, tune on a portion of a flight task, and are deployed on the remaining flight task. The model is a series of anomaly-detection-based ensembles, where each ensemble focuses on predicting a single state. Ensembles comprised several anomaly detectors, such as one-class SVMs, each focusing on a different subset of sensor data. We will discuss the performance of these models, as well as ongoing research on generalizing models across pilots and improving accuracy.

    Speaker Info:

    Tina Heinich

    Computer Engineer, OCIO Data Science Team

    AST, Data Systems

  • Method for Evaluating the Quality of Cybersecurity Defenses

    Abstract:

    This presentation discusses a methodology to use knowledge of cyber attacks and defender responses from operational assessments to gain insights into the defensive posture and to inform a strategy for improvement. The concept is to use the attack thread as the instrument to probe and measure the detection capability of the cyber defenses. The data enable a logistic regression approach to provide a quantitative basis for the analysis and recommendations.
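
    The regression step can be illustrated with made-up assessment data: each row below is one attack action with a binary detected outcome and two notional covariates (attack phase and whether the action crossed a boundary sensor). The covariates, effect sizes, and data are all hypothetical.

      # Notional logistic regression of detection (0/1) on attack characteristics from a
      # cyber assessment; coefficients indicate where the defensive posture is weakest.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(8)
      n = 400
      phase = rng.choice(["recon", "lateral_movement", "exfiltration"], size=n)
      crosses_boundary = rng.integers(0, 2, size=n)

      # Simulated truth: boundary-crossing actions and exfiltration are detected more often.
      logit = -2.0 + 1.5 * crosses_boundary + np.where(phase == "exfiltration", 1.0, 0.0)
      detected = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

      df = pd.DataFrame({"detected": detected.astype(int), "phase": phase,
                         "crosses_boundary": crosses_boundary})
      fit = smf.logit("detected ~ phase + crosses_boundary", data=df).fit(disp=False)
      print(fit.summary())
      print("estimated detection probabilities by phase (off-boundary actions):")
      print(fit.predict(pd.DataFrame({"phase": ["recon", "lateral_movement", "exfiltration"],
                                      "crosses_boundary": [0, 0, 0]})))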

    Speaker Info:

    Shawn Whetstone

    Research Staff Member

    IDA

  • Metrics to Characterize Temporal patterns in Lifespans of Artifacts

    Abstract:

    Over the past decade, uncertainty quantification has become an integral part of engineering design and analysis. Both NASA and the DoD are making significant investments to advance the science of uncertainty quantification, increase the knowledge base, and strategically expand its use. This increased use of uncertainty-based results improves investment strategies and decision making. However, in complex systems, many challenges still exist when dealing with uncertainty in cases that have sparse, unreliable, poorly understood, and/or conflicting data. Oftentimes, assumptions are made regarding the statistical nature of data that may not be well grounded, and the impact of those assumptions is not well understood, which can lead to ill-informed decision making. This talk will focus on the quantification of uncertainty when both well-characterized (aleatory) and poorly known (epistemic) uncertainty sources exist. Particular focus is given to the treatment and management of epistemic uncertainty. A summary of non-probabilistic methods will be presented along with the propagation of mixed uncertainty and optimization under uncertainty. A discussion of decision making under uncertainty is also included to illustrate the use of uncertainty quantification.

    Speaker Info:

    Soumyo Moitra

    Software Engineering Institute, Carnegie Mellon University

  • Mitigating Pilot Disorientation with Synthetic Vision Displays

    Abstract:

    Loss of control in flight has been a leading cause of accidents and incidents in commercial aviation worldwide. The Commercial Aviation Safety Team (CAST) requested studies on virtual day-visual meteorological conditions displays, such as synthetic vision, in order to combat loss of control. Over the last four years NASA has conducted a series of experiments evaluating the efficacy of synthetic vision displays for increased spatial awareness. Commercial pilots with various levels of experience from both domestic and international airlines were used as subjects. This presentation describes the synthetic vision research and how pilot subjects affected experiment design and statistical analyses.

    Speaker Info:

    Kathryn Ballard

    NASA

  • Model credibility in statistical reliability analysis with limited data

    Abstract:

    Due to financial and production constraints, it has become increasingly common for analysts and test planners in defense applications to find themselves working with smaller amounts of data than seen in industry. These same analysts are also being asked to make strong statistical statements based on this limited data. For example, a common goal is ‘demonstrating’ a high reliability requirement with sparse data. In such situations, strong modeling assumptions are often used to achieve the desired precision. Such model-driven actions contain levels of risk that customers may not be aware of and may be too high to be considered acceptable. There is a need to articulate and mitigate risk associated with model form error in statistical reliability analysis. In this work, we review different views on model credibility from the statistical literature and discuss how these notions of credibility apply in data-limited settings. Specifically, we consider two different perspectives on model credibility: (1) data-driven credibility metrics for model fit, (2) credibility assessments based on consistency of analysis results with prior beliefs. We explain how these notions of credibility can be used to drive test planning and recommend an approach to presenting analysis results in data-limited settings. We apply this approach to two case studies from reliability analysis: Weibull analysis and Neyer-D optimal test plans.
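
    As a concrete instance of the "demonstrating high reliability with sparse data" problem mentioned above (a textbook zero-failure binomial calculation, not material from the talk), the snippet below computes how many failure-free trials are needed to demonstrate a reliability R at confidence C.

      # Classic zero-failure (success-run) demonstration: number of consecutive successes
      # needed to show reliability R at confidence C, n = ln(1 - C) / ln(R).
      import math

      def zero_failure_trials(reliability, confidence):
          return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

      for r in (0.90, 0.95, 0.99):
          for c in (0.80, 0.90):
              print(f"R = {r:.2f} at {c:.0%} confidence: {zero_failure_trials(r, c)} failure-free trials")

    The rapid growth in required trials as R approaches 1 is exactly the pressure that pushes analysts toward the stronger modeling assumptions the talk examines.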

    Speaker Info:

    Caleb King

    Sandia National Laboratories

  • Modern Response Surface Methods & Computer Experiments

    Abstract:

    This course details statistical techniques at the interface between mathematical modeling via computer simulation, computer model meta-modeling (i.e., emulation/surrogate modeling), calibration of computer models to data from field experiments, and model-based sequential design and optimization under uncertainty (a.k.a. Bayesian optimization). The treatment will include some of the historical methodology in the literature, and canonical examples, but will primarily concentrate on modern statistical methods, computation and implementation, as well as modern application/data type and size. The course will return at several junctures to real-world experiments coming from the physical and engineering sciences, such as studying the aeronautical dynamics of a rocket booster re-entering the atmosphere; modeling the drag on satellites in orbit; designing a hydrological remediation scheme for water sources threatened by underground contaminants; and studying the formation of supernovae via radiative shock hydrodynamics. The course material will emphasize deriving and implementing methods over proving theoretical properties.

    Speaker Info:

    Robert Gramacy

    Virginia Polytechnic Institute and State University

  • NASA AERONAUTICS

    Speaker Info:

    Bob Pearce

    Deputy Associate Administrator

    Strategy, Aeronautics Research Mission Directorate, NASA

    Mr. Pearce is responsible for leading aeronautics research mission strategic planning to guide the conduct of the agency’s aeronautics research and technology programs, as well as leading ARMD portfolio planning and assessments, mission directorate budget development and approval processes, and review and evaluation of all of NASA’s aeronautics research mission programs for strategic progress and relevance. Pearce is also currently acting director for ARMD’s Airspace Operations and Safety Program, and responsible for the overall planning, management and evaluation of foundational air traffic management and operational safety research. Previously he was director for strategy, architecture and analysis for ARMD, responsible for establishing a strategic systems analysis capability focused on understanding the system-level impacts of NASA’s programs, the potential for integrated solutions, and the development of high-leverage options for new investment and partnership. From 2003 until July 2010, Pearce was the deputy director of the FAA-led Next Generation Air Transportation System (NextGen) Joint Planning and Development Office (JPDO). The JPDO was an interagency office tasked with developing and facilitating the implementation of a national plan to transform the air transportation system to meet the long-term transportation needs of the nation. Prior to the JPDO, Pearce held various strategic and program management positions within NASA. In the mid-1990s he led the development of key national policy documents including the National Science and Technology Council’s “Goals for a National Partnership in Aeronautics Research and Technology” and the “Transportation Science and Technology Strategy.” These two documents provided a substantial basis for NASA’s expanded investment in aviation safety and airspace systems. He began his career as a design engineer at the Grumman Corporation, working on such projects as the Navy’s F-14 Tomcat fighter and DARPA’s X-29 Forward Swept Wing Demonstrator. Pearce also has experience from the Department of Transportation’s Volpe National Transportation Systems Center where he made contributions in the area of advanced concepts for intercity transportation systems. Pearce has received NASA’s Exceptional Service Medal for sustained excellence in planning and advocating innovative aeronautics programs in conjunction with the White House and other federal agencies. He received NASA’s Exceptional Achievement Medal for outstanding leadership of the JPDO in support of the transformation of the nation’s air transportation system. Pearce has also received NASA’s Cooperative External Achievement Award and several Exceptional Performance and Group Achievement Awards. He earned a bachelor of science degree in mechanical and aerospace engineering from Syracuse University, and a master of science degree in technology and policy from the Massachusetts Institute of Technology.

  • NASA’s Human Exploration Research Analog (HERA): An analog mission for isolation, confinement, and remote conditions in space exploration scenarios

    Abstract:

    Shelley Cazares served as a crewmember of the 14th mission of NASA’s Human Exploration Research Analog (HERA). In August 2017, Dr. Cazares and her three crewmates were enclosed in an approximately 600-sq. ft. simulated spacecraft for an anticipated 45 days of confined isolation at Johnson Space Center, Houston, TX. In preparation for long-duration missions to Mars in the 2030s and beyond, NASA seeks to understand what types of diets, habitats, and activities can keep astronauts healthy and happy on deep space voyages. To collect this information, NASA is conducting several analog missions simulating the conditions astronauts face in space. HERA is a set of experiments to investigate the effects of isolation, confinement, and remote conditions in space exploration scenarios. Dr. Cazares will discuss the application procedure, the pre-mission training process, the life and times inside the habitat during the mission, and her crew’s emergency evacuation from the habitat due to the risk of rising floodwaters in Hurricane Harvey.

    Speaker Info:

    Shelley Cazares

    NASA

  • Opening Keynote

    Speaker Info:

    David Chu

    President

    IDA

    David Chu serves as President of the Institute for Defense Analyses. IDA is a non-profit corporation operating in the public interest. Its three federally funded research and development centers provide objective analyses of national security issues and related national challenges, particularly those requiring extraordinary scientific and technical expertise. As president, Dr. Chu directs the activities of more than 1,000 scientists and technologists. Together, they conduct and support research requested by federal agencies involved in advancing national security and advising on science and technology issues. Dr. Chu served in the Department of Defense as Under Secretary of Defense for Personnel and Readiness from 2001-2009, and earlier as Assistant Secretary of Defense and Director for Program Analysis and Evaluation from 1981-1993. From 1978-1981 he was the Assistant Director of the Congressional Budget Office for National Security and International Affairs. Dr. Chu served in the U. S. Army from 1968-1970. He was an economist with the RAND Corporation from 1970-1978, director of RAND’s Washington Office from 1994-1998, and vice president for its Army Research Division from 1998-2001. He earned a bachelor of arts in economics and mathematics, and his doctorate in economics, from Yale University. Dr. Chu is a member of the Defense Science Board and a Fellow of the National Academy of Public Administration. He is a recipient of the Department of Defense Medal for Distinguished Public Service with Gold Palm, the Department of Veterans Affairs Meritorious Service Award, the Department of the Army Distinguished Civilian Service Award, the Department of the Navy Distinguished Public Service Award, and the National Academy of Public Administration’s National Public Service Award.

  • Opening Keynote

    Speaker Info:

    Dave Duma

    Assistant Director

    Operational Test and Evaluation

    Mr. Duma is the Acting Director, Operational Test and Evaluation as of January 20, 2017. Mr. Duma was appointed as the Principal Deputy Director, Operational Test and Evaluation in January 2002. In this capacity he is responsible for all functional areas assigned to the office. He participates in the formulation, development, advocacy, and oversight of policies of the Secretary of Defense and in the development and implementation of test and test resource programs. He oversees the planning, conduct, analysis, evaluation, and reporting of operational and live fire testing. He serves as the Appropriation Director and Comptroller for the Operational Test and Evaluation, Defense Appropriation and coordinates all Planning, Programming, and Budgeting Execution matters. He previously served as Acting Director, Operational Test and Evaluation from February 2005 to July 2007 and again from May 2009 to September 2009. Mr. Duma also served as the Acting Deputy Director, Operational Test and Evaluation from January 1992 to June 1994. In this capacity he was responsible for oversight of the planning, conduct, analysis, and reporting of operational test and evaluation for all major conventional weapons systems in the Department of Defense. He supervised the development of evaluation plans and test program strategies, observed the conduct of operational test events, evaluated operational field tests of all armed services and submitted final reports for Congress. Mr. Duma returned to government service from the commercial sector. In private industry he worked on a variety of projects involving test and evaluation; requirements generation; command, control, communications, intelligence, surveillance and reconnaissance; modeling and simulation; and software development. Mr. Duma has 30 years of naval experience during which he was designated as a Joint Service Officer. He served as the Director, Test and Evaluation Warfare Systems for the Chief of Naval Operations, the Deputy Commander, Submarine Squadron TEN, and he commanded the nuclear powered submarine USS SCAMP (SSN 588). Mr. Duma holds Master of Science degrees in National Security and Strategic Studies and in Management. He holds a Bachelor of Science degree in Nuclear Engineering. He received the U.S. Presidential Executive Rank Award on two occasions: the Meritorious Executive Award in 2008 and the Distinguished Executive Rank Award in 2015. He is a member of the International Test and Evaluation Association.

  • Operational Evaluation of a Flight-deck Software Application

    Abstract:

    Traffic Aware Strategic Aircrew Requests (TASAR) is a NASA-developed operational concept for flight efficiency and route optimization for the near-term airline flight deck. TASAR provides the aircrew with a cockpit automation tool that leverages a growing number of information sources on the flight deck to make fuel- and time-saving route optimization recommendations while en route. In partnership with a commercial airline, research prototype software that implements TASAR has been installed on three aircraft to enable the evaluation of this software in operational use. During the flight trials, data are being collected to quantify operational performance, which will enable NASA to improve algorithms and enhance functionality in the software based on real-world user experience. This presentation highlights statistical challenges and discusses lessons learned during the initial stages of the operational evaluation.

    Speaker Info:

    Sara Wilson

    NASA

  • Operational Testing of Cyber Systems

    Abstract:

    Previous operational tests that included cybersecurity focused on vulnerabilities discovered at the component level and on ad hoc system-level exploitation attacks during adversarial assessments. The subsequent evaluations of vulnerabilities and attacks as they relate to the overall resilience of the system were largely qualitative in nature and chock-full of human-centered biases, making them unreliable estimators of system resilience in a cyber-contested environment. To mitigate these shortcomings, this tutorial will present an approach for more structured operational tests based on common search algorithms, and more rigorous quantitative measurements and analysis based on actuarial methods for estimating resilience.

    Speaker Info:

    Paul Johnson

    MCOTEA

  • Optimizing for Mission Success in Highly Uncertain Scenarios

    Abstract:

    Optimization under uncertainty increases the complexity of a problem as well as the computing resources required to solve it. As the amount of uncertainty is increased, these difficulties are exacerbated. However, when optimizing for mission-level objectives, rather than component- or system-level objectives, an increase in uncertainty is inevitable. Previous research has found methods to perform optimization under uncertainty, such as robust design optimization or reliability-based design optimization. These are generally executed at a product component quality level, to minimize variability and stay within design tolerances, but are not tailored to capture the high amount of variability in a mission-level problem. In this presentation, an approach for formulating and solving highly stochastic mission-level optimization problems is described. A case study is shown using an unmanned aerial system (UAS) on a search mission while an “enemy” UAS attempts to interfere. This simulation, modeled in the Unity Game Engine, has highly stochastic outputs, where the time to mission success varies by multiple orders of magnitude, but the ultimate goal is a binary output representing mission success or failure. The results demonstrate the capabilities and challenges of optimization in these types of mission scenarios.
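
    As a hypothetical illustration of scoring candidate settings against a binary mission outcome, the sketch below wraps an invented stochastic "mission simulator" in a Monte Carlo estimate of success probability and picks the best setting by brute force; the simulator, factors, and rates are placeholders, not the Unity model used in the study.

        import numpy as np

        rng = np.random.default_rng(0)

        def mission_sim(search_speed, n_waypoints):
            """Toy stochastic mission model: returns True on mission success."""
            detect_time = rng.exponential(scale=60.0 / search_speed)     # highly variable outcome
            interference = rng.exponential(scale=20.0 * n_waypoints)
            return detect_time < interference

        def success_prob(setting, n_reps=2000):
            return np.mean([mission_sim(*setting) for _ in range(n_reps)])

        candidates = [(s, w) for s in (1.0, 2.0, 3.0) for w in (2, 4, 8)]
        scores = {c: success_prob(c) for c in candidates}
        best = max(scores, key=scores.get)
        print("best setting:", best, "estimated P(success):", round(scores[best], 3))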

    Speaker Info:

    Brian Chell

  • Overview of Design of Experiments

    Abstract:

    Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. The course outcomes are:
    • Ability to plan and execute experiments.
    • Ability to collect data and analyze and interpret these data to provide the knowledge required for business success.
    • Knowledge of a wide range of modern experimental tools that enable practitioners to customize their experiment to meet practical resource constraints.
    The topics covered during the course are:
    • Fundamentals of DOX – randomization, replication, and blocking.
    • Planning for a designed experiment – type and size of design, factor selection, levels and ranges, response measurement, sample sizes.
    • Graphical and statistical approaches to DOX analysis.
    • Blocking to eliminate the impact of nuisance factors on experimental results.
    • Factorial experiments and interactions.
    • Fractional factorials – efficient and effective use of experimental resources.
    • Optimal designs.
    • Response surface methods.
    • A demonstration illustrating and comparing the effectiveness of different experimental design strategies.
    This course is focused on helping you and your organization make the most effective utilization of DOX. Software usage is fully integrated into the course.
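
    As a small companion to the topics listed above, the sketch below builds a replicated two-level factorial in coded units and estimates main effects and two-factor interactions by least squares; the response generator and effect sizes are invented for illustration and are not part of the course.

        import itertools
        import numpy as np

        levels = [-1, 1]
        runs = np.array(list(itertools.product(levels, repeat=3)), dtype=float)
        runs = np.vstack([runs, runs])                    # two replicates give a pure-error estimate
        rng = np.random.default_rng(7)
        y = 10 + 3*runs[:, 0] - 2*runs[:, 1] + 1.5*runs[:, 0]*runs[:, 1] + rng.normal(0, 1, len(runs))

        # Model matrix: intercept, A, B, C, AB, AC, BC
        X = np.column_stack([
            np.ones(len(runs)),
            runs[:, 0], runs[:, 1], runs[:, 2],
            runs[:, 0]*runs[:, 1], runs[:, 0]*runs[:, 2], runs[:, 1]*runs[:, 2],
        ])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        for name, b in zip(["mean", "A", "B", "C", "AB", "AC", "BC"], beta):
            print(f"{name:>4}: {b: .2f}")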

    Speaker Info:

    Bradley Jones

    Distinguished Research Fellow

    JMP Division/SAS

  • Overview of Design of Experiments

    Abstract:

    Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. The course outcomes are:
    • Ability to plan and execute experiments.
    • Ability to collect data and analyze and interpret these data to provide the knowledge required for business success.
    • Knowledge of a wide range of modern experimental tools that enable practitioners to customize their experiment to meet practical resource constraints.
    The topics covered during the course are:
    • Fundamentals of DOX – randomization, replication, and blocking.
    • Planning for a designed experiment – type and size of design, factor selection, levels and ranges, response measurement, sample sizes.
    • Graphical and statistical approaches to DOX analysis.
    • Blocking to eliminate the impact of nuisance factors on experimental results.
    • Factorial experiments and interactions.
    • Fractional factorials – efficient and effective use of experimental resources.
    • Optimal designs.
    • Response surface methods.
    • A demonstration illustrating and comparing the effectiveness of different experimental design strategies.
    This course is focused on helping you and your organization make the most effective utilization of DOX. Software usage is fully integrated into the course.

    Speaker Info:

    Doug Montgomery

    Regents’ Professor of Industrial Engineering and Statistics

    ASU Foundation Professor of Engineering, Arizona State University

  • Quality Control and Statistical Process Control

    Abstract:

    formance. On the other hand, the need to draw causal inference about factors not under the researchers’ control, calls for a specialized set of techniques developed for observational studies. The persuasiveness and adequacy of such an analysis depends in part on the ability to recover metrics from the data that would approximate those of an experiment. This tutorial will provide a brief overview of the common problems encountered with lack of randomization, as well as suggested approaches for rigorous analysis of observational studies.

    Speaker Info:

    Jane Pinelis

    Research Staff Member

    IDA

  • Reliability Fundamentals and Analysis Lessons Learned

    Abstract:

    Although reliability analysis is a part of Operational Test and Evaluation, it is uncommon for analysts to have a background in reliability theory or experience applying it. This presentation highlights some lessons learned from reliability analysis conducted on several AFOTEC test programs. Topics include issues related to censored data, limitations and alternatives to using the exponential distribution, and failure rate analysis using test data.
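
    One of the lessons alluded to above, moving beyond the exponential assumption when data are right-censored, can be sketched as follows; the failure times, censoring flags, and starting values are fabricated for illustration.

        import numpy as np
        from scipy.optimize import minimize

        t = np.array([120., 340., 410., 500., 500., 620., 700., 700.])   # operating hours
        cens = np.array([0, 0, 0, 1, 1, 0, 1, 1], dtype=bool)            # True = still running (right-censored)

        def neg_log_lik(params):
            shape, scale = np.exp(params)            # optimize on the log scale to keep both positive
            z = t / scale
            logpdf = np.log(shape / scale) + (shape - 1) * np.log(z) - z**shape
            logsurv = -z**shape
            return -(np.sum(logpdf[~cens]) + np.sum(logsurv[cens]))

        res = minimize(neg_log_lik, x0=np.log([1.0, 500.0]), method="Nelder-Mead")
        shape, scale = np.exp(res.x)
        print(f"Weibull shape {shape:.2f} (shape = 1 would be exponential), scale {scale:.0f} h")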

    Speaker Info:

    Dan Telford

    AFOTEC

  • Robust Parameter Design

    Abstract:

    The Japanese industrial engineer Taguchi introduced the concept of robust parameter design in the 1950s. Since then, it has seen widespread, successful application in the automotive and aerospace industries. Engineers have applied this methodology to both physical and computer experimentation. This tutorial provides a basic introduction to these concepts, with an emphasis on how robust parameter design provides a proper basis for the evaluation and confirmation of system performance. The goal is to show how to modify basic robust parameter designs to meet the specific needs of the weapons testing community. This tutorial targets systems engineers, analysts, and program managers who must evaluate and confirm complex system performance. The tutorial illustrates new ideas that are useful for the evaluation and confirmation of the performance of such systems. What students will learn:
    • The basic concepts underlying robust parameter design
    • The importance of the statistical concept of interaction to robust parameter design
    • How statistical interaction is the key concept underlying much of the evaluation and confirmation of system performance, particularly of weapon systems

    Speaker Info:

    Geoff Vining

  • SAGE III SEU Statistical Analysis Model

    Abstract:

    The Stratospheric Aerosol and Gas Experiment (SAGE III) aboard the International Space Station (ISS) was experiencing a series of anomalies called Single Event Upsets (SEUs). Booz Allen Hamilton was tasked with conducting a statistical analysis to model the incidence of SEUs in the SAGE III equipment aboard the ISS. The team identified factors correlated with SEU incidences, set up a model to track degradation of SAGE III, and showed current and past probabilities as a function of the space environment. The space environment of SAGE III was studied to identify possible causes of SEUs. The analysis revealed the variables most correlated with the anomalies, including solar wind strength, solar and geomagnetic field behavior, and the location/orientation of the ISS, sun, and moon. The data was gathered from a variety of sources including US government agencies, foreign and domestic academic centers, and state-of-the-art simulation algorithms and programs. Logistic regression was used to analyze SEUs and gain preliminary results. The data was divided into small time intervals to approximate independence and allow logistic regression. Due to the rarity of events, the initial model results were based on few SEUs. The team set up a Graphical User Interface (GUI) program to automatically analyze new data as it became available to the SAGE III team. A GUI was built to allow the addition of more data over the life of the SAGE III mission. As more SEU incidents occur and are entered into the model, its predictive power will grow significantly. The GUI enables the user to easily rerun the regression analysis and visualize its results to inform operational decision making.
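
    The modeling step described above can be sketched, in simplified form, as a logistic regression on short time intervals; the simulated covariates, rates, and coefficients below are invented and are not the SAGE III data or the Booz Allen model.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        n = 5000                                        # number of short time intervals
        solar_wind = rng.normal(400, 80, n)             # notional covariates
        geomag = rng.normal(0, 1, n)
        lin = -6 + 0.004 * (solar_wind - 400) + 0.5 * geomag
        seu = rng.binomial(1, 1 / (1 + np.exp(-lin)))   # rare SEU events per interval

        X = sm.add_constant(np.column_stack([solar_wind, geomag]))
        fit = sm.Logit(seu, X).fit(disp=0)
        for name, coef in zip(["const", "solar_wind", "geomag"], fit.params):
            print(f"{name:>10}: {coef: .3f}")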

    Speaker Info:

    Ray McCollum

    Booz Allen Hamilton

  • Screening Experiments with Partial Replication

    Abstract:

    Small screening designs are frequently used in the initial stages of experimentation with the goal of identifying important main effects as well as to gain insight on potentially important two-factor interactions. Commonly utilized experimental designs for screening (e.g., resolution III or IV two-level fractional factorials, Plackett-Burman designs, etc.) are unreplicated and as such, provide no unbiased estimate of experimental error. However, if statistical inference is considered an integral part of the experimental analysis, one view is that inferential procedures should be performed using the unbiased pure error estimate. As full replication of an experiment may be quite costly, partial replication offers an alternative for obtaining a model independent error estimate. Gilmour and Trinca (2012, Applied Statistics) introduce criteria for the design of optimal experiments for statistical inference (providing for the optimal selection of replicated design points). We begin with an extension of their work by proposing a Bayesian criterion for the construction of partially replicated screening designs with less dependence on an assumed model. We then consider the use of the proposed criterion within the context of multi-criteria design selection where estimation and protection against model misspecification are considered. Insights for analysis and model selection in light of partial replication will be provided.

    Speaker Info:

    David Edwards

    Virginia Commonwealth University

  • Sound level recommendations for quiet sonic boom dose-response community surveys

    Abstract:

    The current ban on commercial overland supersonic flight may be replaced by a noise limit on sonic boom sound level. NASA is establishing a quiet sonic boom database to guide the new regulation. The database will consist of multiple community surveys used to model the dose-response relationship between sonic boom sound levels and human annoyance. There are multiple candidate dose-response modeling techniques, such as classical logistic regression and multilevel modeling. To plan for these community surveys, recommendations for data collection will be developed from pilot community test data. Two important aspects are selecting sample size and sound level range. Selection of sample size must be strategic as large sample sizes are costly whereas small sample sizes may result in more uncertainty in the estimates. Likewise, there are trade-offs associated with selection of the sound level range. If the sound level range includes excessively high sound levels, the public may misunderstand the potential impact of quiet sonic booms, resulting in a negative backlash against a promising technological advancement. Conversely, a narrow range that includes only low sound levels might exclude the eventual noise limit. This presentation will focus on recommendations for sound level range given the expected shape of the dose-response curve.
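
    To make the trade-off concrete, the sketch below simulates annoyance responses from an assumed logistic dose-response curve and compares how precisely the slope is estimated under a wide versus a narrow sound level range; the curve, levels, and sample sizes are illustrative assumptions only.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(11)

        def simulate(levels_db, n_per_level):
            x = np.repeat(levels_db, n_per_level)
            p = 1 / (1 + np.exp(-(-15 + 0.2 * x)))       # notional true dose-response curve
            y = rng.binomial(1, p)
            fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
            return fit.bse                               # standard errors of intercept and slope

        wide = simulate(np.arange(65, 86, 5), 200)       # 65-85 dB range
        narrow = simulate(np.arange(70, 79, 2), 200)     # 70-78 dB range
        print("slope std. error, wide range:  ", round(wide[1], 4))
        print("slope std. error, narrow range:", round(narrow[1], 4))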

    Speaker Info:

    Jasme Lee

    North Carolina State University

  • Space-Filling Designs for Robustness Experiments

    Abstract:

    To identify the robust settings of the control factors, it is very important to understand how they interact with the noise factors. In this article, we propose space-filling designs for computer experiments that are more capable of accurately estimating the control-by-noise interactions. Moreover, the existing space-filling designs focus on uniformly distributing the points in the design space, which is not suitable for noise factors because they usually follow non-uniform distributions such as the normal distribution. This would suggest placing more points in the regions with high probability mass. However, noise factors also tend to have a smooth relationship with the response, and therefore placing more points towards the tails of the distribution is also useful for accurately estimating the relationship. These two opposing effects make the experimental design methodology a challenging problem. We propose optimal and computationally efficient solutions to this problem and demonstrate their advantages using simulated examples and a real industry example involving a manufacturing packing line.
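
    For contrast with the proposed designs, a plain space-filling baseline can be sketched as follows: a Latin hypercube over the factors with the noise-factor column pushed through its assumed normal distribution. This is ordinary Latin hypercube sampling, not the optimal construction of the article, and the dimensions and distribution are assumptions.

        import numpy as np
        from scipy.stats import qmc, norm

        sampler = qmc.LatinHypercube(d=3, seed=42)     # columns: control x1, control x2, noise z
        u = sampler.random(n=20)

        design = np.column_stack([
            u[:, 0],                                   # control factor 1 on [0, 1]
            u[:, 1],                                   # control factor 2 on [0, 1]
            norm.ppf(u[:, 2], loc=0.0, scale=1.0),     # noise factor mapped to N(0, 1)
        ])
        print(np.round(design, 3))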

    Speaker Info:

    Roshan Vengazhiyil

  • Statistics Boot Camp

    Abstract:

    In the test community, we frequently use statistics to extract meaning from data. These inferences may be drawn with respect to topics ranging from system performance to human factors. In this mini-tutorial, we will begin by discussing the use of descriptive and inferential statistics. We will continue by discussing commonly used parametric and nonparametric statistics within the defense community, ranging from comparisons of distributions to comparisons of means. We will conclude with a brief discussion of how to present your statistical findings graphically for maximum impact.
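
    As a tiny illustration of the parametric versus nonparametric comparisons mentioned above, the sketch below compares two simulated groups with a two-sample t-test and with the Mann-Whitney rank test; the data are invented.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(5)
        baseline = rng.normal(10.0, 2.0, 30)
        upgraded = rng.normal(11.2, 2.0, 30)

        t_stat, t_p = stats.ttest_ind(baseline, upgraded)
        u_stat, u_p = stats.mannwhitneyu(baseline, upgraded)
        print(f"t-test p = {t_p:.3f}, Mann-Whitney p = {u_p:.3f}")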

    Speaker Info:

    Stephanie Lane

    Research Staff Member

    IDA

  • Survey Construction and Analysis

    Abstract:

    In this course, we introduce the main concepts of the survey methodology process – from survey sampling design to analyzing the data obtained from complex survey designs. The course topics include: 1. Introduction to the Survey Process
    2. R Tools
    3. Sampling Designs – Simple Random Sampling, Cluster Sampling, Stratified Sampling, and more
    4. Weighting and Variance Estimation
    5. Exploratory Data Analysis
    6. Complex Survey Analysis
    We use a combination of lectures and hands-on exercises using R. Students are expected to have R and associated packages installed on their computers. We will send a list of required packages before the course. We also use data from Department of Defense surveys, where appropriate.

    Speaker Info:

    Wendy Martinez

    Bureau of Labor Statistics

  • Survey Construction and Analysis

    Abstract:

    In this course, we introduce the main concepts of the survey methodology process – from survey sampling design to analyzing the data obtained from complex survey designs. The course topics include: 1. Introduction to the Survey Process
    2. R Tools
    3. Sampling Designs – Simple Random Sampling, Cluster Sampling, Stratified Sampling, and more
    4. Weighting and Variance Estimation
    5. Exploratory Data Analysis
    6. Complex Survey Analysis
    We use a combination of lectures and hands-on exercises using R. Students are expected to have R and associated packages installed on their computers. We will send a list of required packages before the course. We also use data from Department of Defense surveys, where appropriate.

    Speaker Info:

    MoonJung Cho

    Bureau of Labor Statistics

  • System-Level Uncertainty Quantification for Low-Boom Supersonic Flight Vehicles

    Abstract:

    Under current FAA regulations, civilian aircraft may not operate at supersonic speeds over land. However, over the past few decades, there have been renewed efforts to invest in technologies to mitigate sonic boom from supersonic aircraft through advances in both vehicle design and sonic boom prediction. NASA has heavily invested in tools and technologies to enable commercial supersonic flight and currently has several technical challenges related to sonic boom reduction. One specific technical challenge relates to the development of tools and methods to predict, under uncertainty, the noise on the ground generated by an aircraft flying at supersonic speeds. In attempting to predict ground noise, many factors from multiple disciplines must be considered. Further, classification and treatment of uncertainties in coupled systems, multifidelity simulations, experimental data, and community responses are all concerns in system-level analysis of sonic boom prediction. This presentation will introduce the various methodologies and techniques utilized for uncertainty quantification with a focus on the build-up to system-level analysis. An overview of recent research activities and case studies investigating the impact of various disciplines and factors on variance in ground noise will be discussed.

    Speaker Info:

    Ben Phillips

  • Test Planning for Observational Studies using Poisson Process Modeling

    Abstract:

    Operational Test (OT) is occasionally conducted after a system is already fielded. Unlike a traditional test based on Design of Experiments (DOE) principles, it is often not possible to vary the levels of the factors of interest. Instead the test is of an observational nature. Test planning for observational studies involves choosing where, when, and how long to evaluate a system in order to observe the possible combinations of factor levels that define the battlespace. This presentation discusses a test-planning method that uses Poisson process modeling as a way to estimate the length of time required to observe factor level combinations in the operational environment.
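
    The flavor of the calculation can be sketched as follows, under the assumption that each factor-level combination is encountered as its own Poisson process with a known rate; the rates below are invented, and the method in the talk may differ in detail.

        import numpy as np

        rng = np.random.default_rng(8)
        rates_per_week = np.array([2.0, 1.5, 0.8, 0.6, 0.3, 0.1])   # assumed rate per combination

        def weeks_until_all_seen(rates, n_sims=10_000):
            # First occurrence of a Poisson process with rate r is exponential with mean 1/r;
            # the test must run until the slowest combination has appeared at least once.
            first_times = rng.exponential(1.0 / rates, size=(n_sims, len(rates)))
            return first_times.max(axis=1)

        t = weeks_until_all_seen(rates_per_week)
        print("median weeks needed:", round(float(np.median(t)), 1))
        print("90th percentile:    ", round(float(np.percentile(t, 90)), 1))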

    Speaker Info:

    Brian Stone

    AFOTEC

  • Testing and Analytical Challenges on the Path to Hypersonic Flight

    Speaker Info:

    Mark Lewis

    Director

    IDA-STPI

    Dr. Mark J. Lewis is the Director of IDA’s Science and Technology Policy Institute, a federally funded research and development center. He leads an organization that provides analysis of national and international science and technology issues to the Office of Science and Technology Policy in the White House, as well as other Federal agencies including the National Institutes of Health, the National Science Foundation, NASA, the Department of Energy, Homeland Security, and the Federal Aviation Administration, among others. Prior to taking charge of STPI, Dr. Lewis served as the Willis Young, Jr. Professor and Chair of the Department of Aerospace Engineering at the University of Maryland. A faculty member at Maryland for 24 years, Dr. Lewis taught and conducted basic and applied research. From 2004 to 2008, Dr. Lewis was the Chief Scientist of the U.S. Air Force. From 2010 to 2011, he was President of the American Institute of Aeronautics and Astronautics (AIAA). Dr. Lewis attended the Massachusetts Institute of Technology, where he received a Bachelor of Science degree in aeronautics and astronautics, a Bachelor of Science degree in earth and planetary science (1984), a Master of Science (1985), and a Doctor of Science (1988) in aeronautics and astronautics. Dr. Lewis is the author of more than 300 technical publications and has been an adviser to more than 70 graduate students. Dr. Lewis has also served on various advisory boards for NASA, the Air Force, and DoD, including two terms on the Air Force Scientific Advisory Board, the NASA Advisory Council, and the Aeronautics and Space Engineering Board of the National Academies. Dr. Lewis’s awards include the Meritorious Civilian Service Award and Exceptional Civilian Service Award; he was also recognized as the 1994 AIAA National Capital Young Scientist/Engineer of the Year, received the IECEC/AIAA Lifetime Achievement Award, and is an Aviation Week and Space Technology Laureate (2007).

  • Testing Autonomous Systems

    Abstract:

    Autonomous robotic systems (hereafter referred to simply as autonomous systems) have attracted interest in recent years as capabilities improve to operate in unstructured, dynamic environments without continuous human guidance. The acquisition of autonomous systems potentially decreases personnel costs and provides a capability to operate in dirty, dull, or dangerous mission segments or to achieve greater operational performance. Autonomy enables a particular action of a system to be automatic or, within programmed boundaries, self-governing. For our purposes, autonomy is defined as the system having a set of intelligence-based capabilities (i.e., learned behaviors) that allows it to respond to situations that were not pre-programmed or anticipated (i.e., learning-based responses) prior to system deployment. Autonomous systems have a degree of self-governance and self-directed behavior, possibly with a human’s proxy for decisions. Because of these intelligence-based capabilities, autonomous systems pose new challenges in conducting test and evaluation that assures adequate performance, safety, and cybersecurity outcomes. We propose an autonomous systems architecture concept and map the elements of a decision theoretic view of a generic decision problem to the components of this architecture. These models offer a foundation for developing a decision-based, common framework for autonomous systems. We also identify some of the various challenges faced by the Department of Defense (DoD) test and evaluation community in assuring the behavior of autonomous systems as well as test and evaluation requirements, processes, and methods needed to address these challenges.

    Speaker Info:

    Darryl Ahner

    Director

    AFIT

  • The Development and Execution of Split-Plot Designs In Navy Operational Test and Evaluation: A Practitioner’s Perspective

    Abstract:

    Randomization is one of the basic principles of experimental design and the associated statistical methods. In Navy operational testing, complete randomization is often not possible due to scheduling or execution constraints. Given these constraints, operational test designers often utilize split-plot designs to accommodate the hard-to-change nature of various factors of interest. Several case studies will be presented to provide insight into the challenges associated with Navy operational test design and execution.

    Speaker Info:

    Stargel Doane

  • The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size

    Abstract:

    Mixed models are ideally suited to analyzing nested data from within-persons designs, designs that are advantageous in applied research. Mixed models have the advantage of enabling modeling of random effects, facilitating an accounting of the intra-person variation captured by multiple observations of the same participants and suggesting further lines of control to the researcher. However, the sampling requirements for mixed models are prohibitive for other areas that could greatly benefit from them. This simulation study examines the impact of small sample sizes at both levels of the model on the fixed effect bias, type I error, and power of a simple mixed model analysis. Despite the need for adjustments to control for type I error inflation, findings indicate that smaller samples than previously recognized can be used for mixed models under certain conditions prevalent in applied research. Examination of the marginal benefit of increases in subject and observation sample sizes provides applied researchers with guidance for developing mixed-model repeated measures designs that maximize power.
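
    A stripped-down version of the simulation described above might look like the sketch below: generate nested data for a small number of subjects and observations, fit a random-intercept mixed model, and inspect the fixed effect; the sample sizes and variance components are arbitrary placeholders, not the study's conditions.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(2)
        n_subjects, n_obs = 8, 5                       # deliberately small level-2 and level-1 sizes
        subject = np.repeat(np.arange(n_subjects), n_obs)
        x = rng.normal(size=n_subjects * n_obs)
        u = rng.normal(0, 1.0, n_subjects)[subject]    # random intercepts
        y = 0.5 * x + u + rng.normal(0, 1.0, n_subjects * n_obs)

        data = pd.DataFrame({"y": y, "x": x, "subject": subject})
        fit = smf.mixedlm("y ~ x", data, groups=data["subject"]).fit()
        print(fit.summary())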

    Speaker Info:

    Kristina Carter

    Research Staff Member

    IDA

  • The Use of DOE vs OFAT in the Calibration of AEDC Wind Tunnels

    Abstract:

    The use of statistically rigorous methods to support testing at Arnold Engineering Development Complex (AEDC) has been an area of focus in recent years. As part of this effort, the use of Design of Experiments (DOE) has been introduced for calibration of AEDC wind tunnels. Historical calibration efforts used One-Factor-at-a-Time (OFAT) test matrices, with a concentration on conditions of interest to test customers. With the introduction of DOE, the number of test points collected during the calibration decreased, and the points were not necessarily located at historical calibration points. To validate the use of DOE for calibration purposes, the 4-ft Aerodynamic Wind Tunnel 4T was calibrated using both DOE and OFAT methods. The results from the OFAT calibration were compared to the model developed from the DOE data points, and it was determined that the DOE model sufficiently captured the tunnel behavior within the desired levels of uncertainty. DOE analysis also showed that within Tunnel 4T, systematic errors are insignificant, as indicated by the agreement noted between the two methods. Based on the results of this calibration, a decision was made to apply DOE methods to future tunnel calibrations, as appropriate. The development of the DOE matrix in Tunnel 4T required the consideration of operational limitations, measurement uncertainties, and differing tunnel behavior over the performance map. Traditional OFAT methods allowed tunnel operators to set conditions efficiently while minimizing time-consuming plant configuration changes. DOE methods, however, require the use of randomization, which had the potential to add significant operation time to the calibration. Additionally, certain tunnel parameters, such as variable porosity, are only of interest in a specific region of the performance map. In addition to operational concerns, measurement uncertainty was an important consideration for the DOE matrix. At low tunnel total pressures, the uncertainty in the Mach number measurements increases significantly. Aside from introducing non-constant variance into the calibration model, the large uncertainties at low pressures can increase overall uncertainty in the calibration in high pressure regions where the uncertainty would otherwise be lower. At high pressures and transonic Mach numbers, low Mach number uncertainties are required to meet drag count uncertainty requirements. To satisfy both the operational and calibration requirements, the DOE matrix was divided into multiple independent models over the tunnel performance map. Following the Tunnel 4T calibration, AEDC calibrated the Propulsion Wind Tunnel 16T, Hypersonic Wind Tunnels B and C, and the National Full-Scale Aerodynamics Complex (NFAC). DOE techniques were successfully applied to the calibration of Tunnel B and NFAC, while a combination of DOE and OFAT test methods was used in Tunnel 16T because of operational and uncertainty requirements over a portion of the performance map. Tunnel C was calibrated using OFAT because of operational constraints. The cost of calibrating these tunnels has not been significantly reduced through the use of DOE, but the characterization of test condition uncertainties is firmly based in statistical methods.

    Speaker Info:

    Rebecca Rought

    AEDC/TSTA

  • Uncertainty Quantification

    Abstract:

    We increasingly rely on mathematical and statistical models to predict phenomena ranging from nuclear power plant design to profits made in financial markets. When assessing the feasibility of these predictions, it is critical to quantify uncertainties associated with the models, inputs to the models, and data used to calibrate the models. The synthesis of statistical and mathematical techniques, which can be used to quantify input and response uncertainties for simulation codes that can take hours to days to run, comprises the evolving field of uncertainty quantification. The use of data, to improve the predictive accuracy of models, is central to uncertainty quantification so we will begin by providing an overview of how Bayesian techniques can be used to construct distributions for model inputs. We will subsequently describe the computational issues associated with propagating these distributions through complex models to construct prediction intervals for statistical quantities of interest such as expected profits or maximal reactor temperatures. Finally, we will describe the use of sensitivity analysis to isolate critical model inputs and surrogate model construction for simulation codes that are too complex for direct statistical analysis. All topics will be motivated by examples arising in engineering, biology, and economics.
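
    The forward-propagation step mentioned above reduces, in its simplest Monte Carlo form, to the sketch below: draw inputs from assumed distributions, run them through a cheap stand-in for the simulation code, and read off a prediction interval. The model and input distributions are placeholders.

        import numpy as np

        rng = np.random.default_rng(4)

        def model(k, q):                    # stand-in for an hours-long simulation code
            return 300 + 40 * np.sqrt(k) + 5 * q

        k = rng.lognormal(mean=0.0, sigma=0.3, size=50_000)   # uncertain input 1
        q = rng.normal(loc=10.0, scale=2.0, size=50_000)      # uncertain input 2
        out = model(k, q)

        lo, hi = np.percentile(out, [2.5, 97.5])
        print(f"mean prediction {out.mean():.1f}, 95% interval ({lo:.1f}, {hi:.1f})")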

    Speaker Info:

    Ralph Smith

    North Carolina State University

  • Uncertainty Quantification and Analysis at The Boeing Company

    Abstract:

    The Boeing Company is assessing uncertainty quantification methodologies across many phases of aircraft design in order to establish confidence in computational fluid dynamics-based simulations of aircraft performance. This presentation provides an overview of several of these efforts. First, the uncertainty in aerodynamic performance metrics of a commercial aircraft at transonic cruise due to turbulence model and flight condition variability is assessed using 3D CFD with non-intrusive polynomial chaos and second order probability. Second, a sample computation of uncertainty in increments is performed for an engineering trade study, leading to the development of a new method for propagating input-uncontrolled uncertainties as well as input-controlled uncertainties. This type of consideration is necessary to account for variability associated with grid convergence on different configurations, for example. Finally, progress toward applying the computed uncertainties in forces and moments into an aerodynamic database used for flight simulation will be discussed. This approach uses a combination of Gaussian processes and multiple-fidelity Kriging meta-modeling to synthesize the required data.

    Speaker Info:

    John Schaefer

    Sandia National Laboratories

  • Uncertainty Quantification with Mixed Uncertainty Sources

    Abstract:

    Over the past decade, uncertainty quantification has become an integral part of engineering design and analysis. Both NASA and the DoD are making significant investments to advance the science of uncertainty quantification, increase the knowledge base, and strategically expand its use. This increased use of uncertainty-based results improves investment strategies and decision making. However, in complex systems, many challenges still exist when dealing with uncertainty in cases that have sparse, unreliable, poorly understood, and/or conflicting data. Oftentimes, assumptions are made regarding the statistical nature of data that may not be well grounded, and the impact of those assumptions is not well understood, which can lead to ill-informed decision making. This talk will focus on the quantification of uncertainty when both well-characterized (aleatory) and poorly known (epistemic) uncertainty sources exist. Particular focus is given to the treatment and management of epistemic uncertainty. A summary of non-probabilistic methods will be presented along with the propagation of mixed uncertainty and optimization under uncertainty. A discussion of decision making under uncertainty is also included to illustrate the use of uncertainty quantification.
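
    One common way to keep the two uncertainty types separate is nested (second-order) Monte Carlo, sketched below with invented numbers: the outer loop steps through an epistemic parameter known only to an interval, the inner loop samples the aleatory variability, and the result is a family of CDFs (a probability box) rather than a single distribution.

        import numpy as np

        rng = np.random.default_rng(6)

        def response(x, bias):
            return x**2 + bias                       # toy model

        aleatory_sd = 1.0                            # well-characterized variability
        epistemic_bias = (-0.5, 0.5)                 # poorly known; an interval, no distribution assumed

        grid = np.linspace(-2, 8, 200)
        cdfs = []
        for bias in np.linspace(*epistemic_bias, 11):            # outer (epistemic) loop
            x = rng.normal(2.0, aleatory_sd, 20_000)             # inner (aleatory) loop
            y = response(x, bias)
            cdfs.append(np.mean(y[:, None] <= grid, axis=0))

        lower, upper = np.min(cdfs, axis=0), np.max(cdfs, axis=0)   # p-box bounds
        idx = grid.searchsorted(5.0)
        print("P(Y <= 5) lies between", round(float(lower[idx]), 3), "and", round(float(upper[idx]), 3))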

    Speaker Info:

    Tom West

  • What is Bayesian Experimental Design?

    Abstract:

    In an experiment with a single factor with three levels, treatments A, B, and C, a single treatment is to be applied to each of several experimental units selected from some set of units. The response variable is continuous, and differences in its value show the relative effectiveness of the treatments. An experimental design will dictate which treatment is applied to which units. Since differences in the response variable are used to judge differences between treatments, the most important goal of the design is to prevent the treatment effect from being masked by some unrelated property of the experimental units. A second important function of the design is to ensure power, that is, that if the treatments are not equally effective, the differences in the response variable are likely to be larger than background noise. Classical experimental design theory uses three principles: replication, randomization, and blocking, to produce an experimental design. Replication refers to how many units are used, blocking is a possible grouping of the units to reduce between-unit heterogeneity, and randomization governs the assignment of units to treatment. Classical experimental designs are balanced as much as possible, that is, the three treatments are applied the same number of times, in each potential block of units. Bayesian experimental design aims to make use of additional related information, often called prior information, to produce a design. The information may be in the form of related experimental results, for example, treatments A and B may have been previously studied. It could be additional information about the experimental units, or about the response variable. This additional information could be used to change the usual blocking, to reduce the number of units assigned to treatments A and B compared to C, and/or reduce the total number of units needed to ensure power. This talk aims to explain Bayesian design concepts and illustrate them on realistic examples.

    Speaker Info:

    Blaza Toman

    Statistical Engineering Division, NIST

  • Workforce Analytics

    Abstract:

    Several statistical methods have been used effectively to model workforce behavior, specifically attrition due to retirement and voluntary separation[1]. Additionally, various authors have introduced career development[2] as a meaningful aspect of workforce planning. While both general and more specific attrition modeling techniques yield useful results, only limited success has followed attempts to quantify career stage transition probabilities. A complete workforce model would include quantifiable flows both vertically and horizontally in the network described pictorially here at a single time point in Figure 1. The horizontal labels in Figure 1 convey one possible meaning assignable to career stage transition – in this case, competency. More formal examples might include rank within a hierarchy such as in a military organization or grade in a civil service workforce. In the case of the Nuclear Weapons labs, knowing that the specialized, classified knowledge needed to deal with Stockpile Stewardship is being preserved, as evidenced by the production of Masters (individuals capable of independent technical work), is also of interest to governmental oversight. In this paper we examine the allocation of labor involved in a specific Life Extension program at LLNL. This growing workforce is described by discipline and career stage to determine how well the Norden-Rayleigh development cost model[3] fits the data. Since this model underlies much budget estimation within both DOD and NNSA, the results should be of general interest. Data is also examined as a possible basis for quantifying horizontal flows in Figure 1.

    Speaker Info:

    William Romine

    Lawrence Livermore National Laboratory

  • XPCA: A Copula-based Generalization of PCA for Ordinal Data

    Abstract:

    Principal Component Analysis is a standard tool in an analyst’s toolbox. The standard practice of rescaling each column can be reframed as a copula-based decomposition in which the marginal distributions are fit with a univariate Gaussian distribution and the joint distribution is modeled with a Gaussian copula. In this light, we present an alternative to traditional PCA we call XPCA by relaxing the marginal Gaussian assumption and instead fit each marginal distribution with the empirical distribution function. Interval-censoring methods are used to account for the discrete nature of the empirical distribution function when fitting the Gaussian copula model. In this talk, we derive the XPCA estimator and inspect the differences in fits on both simulated and real data applications.
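
    The core idea can be sketched in a few lines, leaving out the interval-censoring refinement that distinguishes the full XPCA estimator: map each column through its empirical CDF to normal scores, then run ordinary PCA on the scores. The simulated ordinal data below are for illustration only.

        import numpy as np
        from scipy.stats import norm, rankdata

        rng = np.random.default_rng(9)
        # Illustrative ordinal-ish data: 200 rows, 4 correlated columns rounded to integers.
        cov = np.full((4, 4), 0.6) + 0.4 * np.eye(4)
        X = np.round(2 * rng.multivariate_normal(np.zeros(4), cov, size=200))

        # Empirical-CDF transform to normal scores, column by column.
        n = X.shape[0]
        U = rankdata(X, axis=0) / (n + 1)            # ranks mapped into (0, 1)
        Z = norm.ppf(U)

        # PCA on the normal scores via SVD of the centered matrix.
        Zc = Z - Z.mean(axis=0)
        _, s, _ = np.linalg.svd(Zc, full_matrices=False)
        print("variance explained by component:", np.round(s**2 / np.sum(s**2), 3))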

    Speaker Info:

    Cliff Anderson-Bergman

    Sandia National Laboratories

  • A Study to Investigate the Use of CFD as a Surrogate for Wind Tunnel Testing in the High Supersonic Speed Regime

    Speaker Info:

    Eric Walker

    NASA

  • A Study to Investigate the Use of CFD as a Surrogate for Wind Tunnel Testing in the High Supersonic Speed Regime

    Speaker Info:

    Joseph Morrison

    NASA

  • Allocating Information Gathering Efforts for Selection Decisions

    Abstract:

    Selection decisions, such as procurement decisions, are often based on multiple performance attributes whose values are estimated using data (samples) collected through experimentation. Because the sampling (measurement) process has uncertainty, more samples provide better information. With a limited test budget to collect information to support such a selection decision, determining the number of samples to observe from each alternative and attribute is a critical information gathering decision. In this talk we present a sequential allocation scheme that uses Bayesian updating and maximizes the probability of selecting the true best alternative when the attribute value samples contain Gaussian measurement error. In this sequential approach, the test-designer uses the current knowledge of the attribute values to identify which attribute and alternative to sample next; after that sample, the test-designer chooses another attribute and alternative to sample, and this continues until no more samples can be made. We present the results of a simulation study that illustrates the performance advantage of the proposed sequential allocation scheme over simpler and more common fixed allocation approaches.
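
    The sketch below conveys the flavor of sequential allocation with Gaussian measurement error, but it substitutes a deliberately simple rule (sample wherever the posterior variance is largest) for the authors' probability-of-correct-selection criterion; the priors, error variance, true values, and equal-weight scoring are all assumptions made for illustration.

        import numpy as np

        rng = np.random.default_rng(10)
        true_vals = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.4]])   # 3 alternatives x 2 attributes
        meas_sd = 0.5

        post_mean = np.zeros_like(true_vals)          # prior means
        post_var = np.full_like(true_vals, 4.0)       # prior variances

        for _ in range(30):                           # test budget of 30 samples
            i, j = np.unravel_index(np.argmax(post_var), post_var.shape)
            obs = true_vals[i, j] + rng.normal(0, meas_sd)
            # Conjugate normal-normal update for cell (i, j).
            w = post_var[i, j] / (post_var[i, j] + meas_sd**2)
            post_mean[i, j] += w * (obs - post_mean[i, j])
            post_var[i, j] *= (1 - w)

        overall = post_mean.sum(axis=1)               # equal-weight value score per alternative (assumed)
        print("posterior attribute means:\n", np.round(post_mean, 2))
        print("selected alternative:", int(np.argmax(overall)))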

    Speaker Info:

    Dennis Leber

    NIST

  • Allocating Information Gathering Efforts for Selection Decisions

    Abstract:

    Selection decisions, such as procurement decisions, are often based on multiple performance attributes whose values are estimated using data (samples) collected through experimentation. Because the sampling (measurement) process has uncertainty, more samples provide better information. With a limited test budget to collect information to support such a selection decision, determining the number of samples to observe from each alternative and attribute is a critical information gathering decision. In this talk we present a sequential allocation scheme that uses Bayesian updating and maximizes the probability of selecting the true best alternative when the attribute value samples contain Gaussian measurement error. In this sequential approach, the test-designer uses the current knowledge of the attribute values to identify which attribute and alternative to sample next; after that sample, the test-designer chooses another attribute and alternative to sample, and this continues until no more samples can be made. We present the results of a simulation study that illustrates the performance advantage of the proposed sequential allocation scheme over simpler and more common fixed allocation approaches.

    Speaker Info:

    Jeffrey Herrmann

    University of Maryland

  • Augmenting Definitive Screening Designs

    Abstract:

    Jones and Nachtsheim (2011) introduced a class of three-level screening designs called definitive screening designs (DSDs). The structure of these designs results in the statistical independence of main effects and two-factor interactions; the absence of complete confounding among two-factor interactions; and the ability to estimate all quadratic effects. Because quadratic effects can be estimated, DSDs can allow for the screening and optimization of a system to be performed in one step, but only when the number of terms found to be active during the screening phase of analysis is less than about half the number of runs required by the DSD (Errore et al., 2016). Otherwise, estimation of second-order models requires augmentation of the DSD. In this paper we explore the construction of a series of augmented designs, moving from the starting DSD to designs capable of estimating the full second-order model. We use power calculations, model-robustness criteria, and model-discrimination criteria to determine the number of runs by which to augment in order to identify the active second-order effects with high probability.

    Speaker Info:

    Abby Nachtsheim

    ASU

  • Automated Software Testing Best Practices and Framework: A STAT COE Project

    Abstract:

    The process for testing military systems that are largely software-intensive involves techniques and procedures often different from those for hardware-based systems. Much of the testing can be performed in laboratories at many of the acquisition stages, up to operational testing. Testing software systems is no different from testing hardware-based systems in that testing earlier and more intensively benefits the acquisition program in the long run. Automated testing of software systems enables more frequent and more extensive testing, allowing for earlier discovery of errors and faults in the code. Automated testing is beneficial for unit, integrated, functional and performance testing, but there are costs associated with automation tool license fees, specialized manpower, and the time to prepare and maintain the automation scripts. This presentation discusses some of the features unique to automated software testing and offers a framework organizations can implement to make the business case for, to organize for, and to execute and benefit from automating the right aspects of their testing needs. Automation has many benefits in saving time and money, but is most valuable in freeing test resources to perform higher-value tasks.

    Speaker Info:

    Jim Simpson

    JK Analytics

  • Automated Software Testing Best Practices and Framework: A STAT COE Project

    Abstract:

    The process for testing military systems that are largely software-intensive involves techniques and procedures often different from those for hardware-based systems. Much of the testing can be performed in laboratories at many of the acquisition stages, up to operational testing. Testing software systems is no different from testing hardware-based systems in that testing earlier and more intensively benefits the acquisition program in the long run. Automated testing of software systems enables more frequent and more extensive testing, allowing for earlier discovery of errors and faults in the code. Automated testing is beneficial for unit, integrated, functional and performance testing, but there are costs associated with automation tool license fees, specialized manpower, and the time to prepare and maintain the automation scripts. This presentation discusses some of the features unique to automated software testing and offers a framework organizations can implement to make the business case for, to organize for, and to execute and benefit from automating the right aspects of their testing needs. Automation has many benefits in saving time and money, but is most valuable in freeing test resources to perform higher-value tasks.

    Speaker Info:

    Jim Wisnowski

    Adsurgo

  • Background of NASA’s Juncture Flow Validation Test

    Speaker Info:

    Joseph Morrison

    NASA

  • Big Data, Big Think

    Abstract:

    The NASA Big Data, Big Think team jump-starts coordination, strategy, and progress for NASA applications of Big Data Analytics techniques, fosters collaboration and teamwork among centers and improves agency-wide understanding of Big Data research techniques & technologies and their application to NASA mission domains. The effort brings the Agency’s Big Data community together and helps define near term projects and leverages expertise throughout the agency. This presentation will share examples of Big Data activities from the Agency and discuss knowledge areas and experiences, including data management, data analytics and visualization.

    Speaker Info:

    Robert Beil

    NASA

  • Blast Noise Event Classification from a Spectrogram

    Abstract:

    Spectrograms (i.e., squared magnitude of short-time Fourier transform) are commonly used as features to classify audio signals in the same way that social media companies (e.g., Google, Facebook, Yahoo) use images to classify or automatically tag people in photos. However, a serious problem arises when using spectrograms to classify acoustic signals, in that the user must choose the input parameters (hyperparameters), and such choices can have a drastic effect on the accuracy of the resulting classifier. Further, considering all possible combinations of the hyperparameters is a computationally intractable problem. In this study, we simplify the problem making it computationally tractable, explore the utility of response surface methods for sampling the hyperparameter space, and find that response surface methods are a computationally efficient means of identifying the hyperparameter combinations that are likely to give the best classification results.
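
    The hyperparameter sensitivity described above can be seen directly in a short sketch: the same synthetic signal produces feature matrices of very different shapes (and resolutions) as the window length and overlap change. The signal and candidate settings are arbitrary.

        import numpy as np
        from scipy.signal import spectrogram

        fs = 2000                                       # Hz
        t = np.arange(0, 2.0, 1 / fs)
        x = np.sin(2 * np.pi * 60 * t) + 0.5 * np.random.default_rng(0).normal(size=t.size)

        for nperseg, noverlap in [(128, 64), (512, 256), (1024, 512)]:   # candidate hyperparameters
            f, tt, Sxx = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
            print(f"nperseg={nperseg:5d}: {Sxx.shape[0]} frequency bins x {Sxx.shape[1]} time frames")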

    Speaker Info:

    Edward Nykaza

    Army Engineer Research and Development Center, Construction Engineering Research Laboratory

  • Carrier Reliability Model Validation

    Abstract:

    Model Validation for Simulations of CVN-78 Sortie Generation

    As part of the test planning process, IDA is examining flight operations on the Navy’s newest carrier, CVN-78. The analysis uses a model, the IDA Virtual Carrier Model (IVCM), to examine sortie generation rates and whether aircraft can complete missions on time. Before using IVCM, it must be validated. However, CVN-78 has not been delivered to the Navy, and data from actual operations are not yet available to validate the model. Consequently, we will validate IVCM by comparing it to another model. This is a reasonable approach when a model is used in general analyses such as test planning, but is not acceptable when a model is used in the assessment of system effectiveness and suitability. The presentation examines the use of various statistical tools – the Wilcoxon Rank Sum Test, the Kolmogorov-Smirnov Test, and lognormal regression – to examine whether the two models provide similar results and to quantify the magnitude of any differences. From the analysis, IDA concluded that locations and distribution shapes are consistent, and that the differences between the models are less than 15 percent, which is acceptable for test planning.
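
    The two-sample comparisons mentioned above reduce to a few library calls, sketched here on simulated placeholder outputs rather than the actual IVCM results.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(12)
        times_model_a = rng.lognormal(mean=3.0, sigma=0.25, size=400)    # placeholder model outputs
        times_model_b = rng.lognormal(mean=3.05, sigma=0.25, size=400)

        w_stat, w_p = stats.ranksums(times_model_a, times_model_b)       # location comparison
        ks_stat, ks_p = stats.ks_2samp(times_model_a, times_model_b)     # distribution comparison
        ratio = np.exp(np.mean(np.log(times_model_b)) - np.mean(np.log(times_model_a)))
        print(f"rank-sum p = {w_p:.3f}, KS p = {ks_p:.3f}, geometric-mean ratio = {ratio:.2f}")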

    Speaker Info:

    Dean Thomas

    IDA

  • Censored Data Analysis for Performance Data

    Abstract:

    Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy to interpret test outcomes. However, they are information-poor metrics and extremely expensive to test. The standard power calculations to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. In addition to being expensive, the test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics. A solution is to convert to a continuous variable, such as miss distance or time-to-detect. The common objection to switching to a continuous variable is that the hit/miss or detect/non-detect binomial information is lost, when the fraction of misses/no-detects is often the most important aspect of characterizing system performance. Furthermore, the new continuous metric appears to no longer be connected to the requirements document, which was stated in terms of a probability. These difficulties can be overcome with the use of censored data analysis. This presentation will illustrate the concepts and benefits of this approach, and will demonstrate a simple analysis with data, including power calculations to show the cost savings for employing the methodology.
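
    A bare-bones version of the censored analysis described above is sketched below with fabricated data: record continuous time-to-detect, treat non-detections as right-censored at the end of the trial window, fit a lognormal model by maximum likelihood, and recover the probability-of-detect metric the requirement is written against.

        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import norm

        T_END = 300.0                                           # seconds available per trial
        times = np.array([45., 80., 120., 150., 210., 300., 300., 300.])
        detected = times < T_END                                # False = censored (no detection)

        def neg_log_lik(params):
            mu, log_sigma = params
            sigma = np.exp(log_sigma)
            z = (np.log(times) - mu) / sigma
            ll_obs = norm.logpdf(z[detected]) - np.log(sigma * times[detected])
            ll_cens = norm.logsf(z[~detected])                  # P(T > T_END) for non-detections
            return -(ll_obs.sum() + ll_cens.sum())

        res = minimize(neg_log_lik, x0=[np.log(150.0), 0.0], method="Nelder-Mead")
        mu, sigma = res.x[0], np.exp(res.x[1])
        p_detect = norm.cdf((np.log(T_END) - mu) / sigma)       # P(detect within the trial window)
        print(f"estimated P(detect within {T_END:.0f} s) = {p_detect:.2f}")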

    Speaker Info:

    Bram Lillard

    IDA

  • Combinatorial Testing

    Abstract:

    Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost. Combinatorial testing takes advantage of the interaction rule, which is based on analysis of thousands of software failures. The rule states that most failures are induced by single-factor faults or by the joint combinatorial effect (interaction) of two factors, with progressively fewer failures induced by interactions between three or more factors. Therefore, if all faults in a system can be induced by a combination of t or fewer parameters, then testing all t-way combinations of parameter values is pseudo-exhaustive and provides a high rate of fault detection. The talk explains the background, method, and tools available for combinatorial testing. New results on using combinatorial methods for oracle-free testing of certain types of applications will also be introduced.
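
    As a toy illustration of the t-way coverage idea (here t = 2), the sketch below builds a pairwise test suite with a simple greedy heuristic; it is not one of the tools referenced in the talk, and the factors and levels are invented.

        import itertools

        def pairwise_suite(levels):
            """Greedy construction of a 2-way (pairwise) covering test suite.
            levels: one list of values per factor."""
            k = len(levels)
            uncovered = {                       # value pairs still uncovered, per factor pair
                (i, j): set(itertools.product(levels[i], levels[j]))
                for i, j in itertools.combinations(range(k), 2)
            }
            suite = []
            while any(uncovered.values()):
                best, best_gain = None, -1
                for cand in itertools.product(*levels):       # fine for small factor spaces
                    gain = sum((cand[i], cand[j]) in uncovered[(i, j)] for i, j in uncovered)
                    if gain > best_gain:
                        best, best_gain = cand, gain
                suite.append(best)
                for i, j in uncovered:
                    uncovered[(i, j)].discard((best[i], best[j]))
            return suite

        factors = [["a", "b", "c"], ["x", "y", "z"], [0, 1], [0, 1]]      # hypothetical parameters
        tests = pairwise_suite(factors)
        print(len(tests), "tests cover all 2-way combinations, versus", 3 * 3 * 2 * 2, "exhaustive")
        for row in tests:
            print(row)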

    Speaker Info:

    Raghu Kacker

    NIST

  • Combinatorial Testing

    Abstract:

    Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost. Combinatorial testing takes advantage of the interaction rule, which is based on analysis of thousands of software failures. The rule states that most failures are induced by single-factor faults or by the joint combinatorial effect (interaction) of two factors, with progressively fewer failures induced by interactions between three or more factors. Therefore, if all faults in a system can be induced by a combination of t or fewer parameters, then testing all t-way combinations of parameter values is pseudo-exhaustive and provides a high rate of fault detection. The talk explains the background, methods, and tools available for combinatorial testing. New results on using combinatorial methods for oracle-free testing of certain types of applications will also be introduced.

    Speaker Info:

    Rick Kuhn

    NIST

  • Combinatorial Testing for Link-16 Developmental Test and Evaluation

    Abstract:

    Due to small Tactical Data Link testing windows, only commonly used messages are tested, resulting in the evaluation of only a small subset of all possible Link 16 messages. To increase the confidence that software design and implementation issues are discovered in the earliest phases of government acceptance testing, the Marine Corps Tactical Systems Support Activity (MCTSSA) Instrumentation and Data Management Section (IDMS) successfully implemented an extension of the traditional form of Design of Experiments (DOE) called Combinatorial Testing (CT). CT was utilized to reduce the human bias and inconsistencies involved in Link 16 testing and replace them with a thorough test that can validate a system's ability to properly consume all of the possible valid combinations of Link 16 message field values. MCTSSA's unique team of subject matter experts was able to bring together the tenets of virtualization, automation, C4I Air systems testing, tactical data link testing, and Design of Experiments methodology to invent a testing paradigm that will exhaustively evaluate tactical Air systems. This presentation will give an overview of how CT was implemented for the test.

    Speaker Info:

    Tim McLean

    MCTSSA

  • Communicating Complex Statistical Methodologies to Leadership

    Abstract:

    More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions.

    Speaker Info:

    Jane Pinelis

    Johns Hopkins University Applied Physics Laboratory (JHU/APL)

  • Communicating Complex Statistical Methodologies to Leadership

    Abstract:

    More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions.

    Speaker Info:

    Paul Johnson

    MCOTEA

  • Communication in Statistics & the Five Hardest Concepts

    Speaker Info:

    Jennifer Van Mullekom

    Virginia Tech

  • Comparing Experimental Designs

    Abstract:

    This tutorial will show how to compare and choose experimental designs based on multiple criteria. Answers to questions like "Which Design of Experiments (DOE) is better/best?" will be provided by looking at both data and graphics that show the relative performance of the designs against several criteria: power of the designs for different model terms; how well the designs minimize predictive variance across the design space; the degree to which model terms are confounded or correlated; and the relative efficiencies that measure how well coefficients are estimated or how well predictive variance is minimized. Many case studies of screening, response surface, and screening designs augmented to response surface designs will be compared. Designs with both continuous and categorical factors, and with constraints on the experimental region, will also be compared.
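
    A minimal sketch of two of the comparison criteria mentioned above, computed with NumPy for two made-up 8-run designs in three factors: the correlation between model-term columns (a confounding/aliasing check) and a D-criterion value from the scaled information matrix. A real comparison would also include power and prediction-variance profiles.

    import numpy as np

    def model_matrix(design):
        """Main effects plus two-factor interactions for a coded (+/-1) design."""
        a, b, c = design.T
        return np.column_stack([np.ones(len(design)), a, b, c, a*b, a*c, b*c])

    # Design 1: 2^3 full factorial; Design 2: a made-up alternative with a duplicated corner
    full_fact = np.array([[i, j, k] for i in (-1, 1) for j in (-1, 1) for k in (-1, 1)], float)
    alt = full_fact.copy()
    alt[7] = [-1, -1, -1]        # duplicate a run, sacrificing balance

    for name, d in [("full factorial", full_fact), ("alternative", alt)]:
        X = model_matrix(d)
        # Column correlations among non-intercept terms flag confounded/correlated terms
        corr = np.corrcoef(X[:, 1:], rowvar=False)
        max_offdiag = np.max(np.abs(corr - np.eye(corr.shape[0])))
        # D-criterion: determinant of the scaled information matrix
        d_value = np.linalg.det(X.T @ X / len(d)) ** (1 / X.shape[1])
        print(f"{name}: max |correlation| = {max_offdiag:.2f}, D-value = {d_value:.3f}")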

    Speaker Info:

    Tom Donnelly

    JMP

  • Data Farming

    Abstract:

    This tutorial is designed for newcomers to simulation-based experiments. Data farming is the process of using computational experiments to “grow” data, which can then be analyzed using statistical and visualization techniques to obtain insight into complex systems. The focus of the tutorial will be on gaining practical experience with setting up and running simulation experiments, leveraging recent advances in large-scale simulation experimentation pioneered by the Simulation Experiments & Efficient Designs (SEED) Center for Data Farming at the Naval Postgraduate School (http://harvest.nps.edu). Participants will be introduced to fundamental concepts, and jointly explore simulation models in an interactive setting. Demonstrations and written materials will supplement guided, hands-on activities through the setup, design, data collection, and analysis phases of an experiment-driven simulation study.
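
    A minimal sketch of the "grow data, then analyze" loop, using a space-filling Latin hypercube design from SciPy and a toy queueing-style function standing in for a real simulation model; the factor names and ranges are illustrative only.

    import numpy as np
    from scipy.stats import qmc

    rng = np.random.default_rng(3)

    def toy_simulation(arrival_rate, service_rate, servers):
        """Stand-in simulation: rough average backlog from a crude utilization heuristic."""
        rho = arrival_rate / (service_rate * servers)
        noise = rng.normal(0, 0.05)
        return max(rho / max(1e-6, 1 - min(rho, 0.99)), 0) + noise

    # Space-filling design over three factors: arrival rate, service rate, number of servers
    sampler = qmc.LatinHypercube(d=3, seed=3)
    unit = sampler.random(n=50)
    lows, highs = [0.5, 0.8, 1], [5.0, 3.0, 8]
    design = qmc.scale(unit, lows, highs)
    design[:, 2] = np.round(design[:, 2])        # integer number of servers

    # "Grow" the data by running the simulation at every design point
    responses = np.array([toy_simulation(*row) for row in design])

    # Quick analysis: rank factors by correlation with the response
    for name, col in zip(["arrival", "service", "servers"], design.T):
        print(f"{name}: corr with backlog = {np.corrcoef(col, responses)[0, 1]:+.2f}")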

    Speaker Info:

    Susan Sanchez

    Naval Postgraduate School

  • Data Visualization

    Abstract:

    Data visualization allows us to quickly explore and discover relationships graphically and interactively. We will provide the foundations for creating better graphical information to accelerate the insight discovery process and enhance the understandability of reported results. First principles and the “human as part of the system” aspects of information visualization from multiple leading sources such as Harvard Business Review, Edward Tufte, and Stephen Few will be explored using representative example data sets. We will discuss best practices for graphical excellence to most effectively, clearly, and efficiently communicate your story. We will explore visualizations applicable across the conference themes (computational modeling, DOE, statistical engineering, modeling & simulation, and reliability) for univariate, multivariate, time-dependent, and geographical data.

    Speaker Info:

    Jim Wisnowski

    Adsurgo

  • Data Visualization

    Abstract:

    Teams of people with many different talents and skills work together at NASA to
    improve our understanding of our planet Earth, our Sun and solar system, and the
    Universe. The Earth system is made up of complex interactions and dependencies
    among its solar, oceanic, terrestrial, atmospheric, and living components. Solar storms
    have been recognized as a cause of technological problems on Earth since the invention
    of the telegraph in the 19th century. Solar flares, coronal holes, and coronal mass
    ejections (CMEs) can emit large bursts of radiation, high-speed electrons and
    protons, and other highly energetic particles that are released from the Sun and are
    sometimes directed at Earth. These particles and radiation can damage satellites in space,
    shut down power grids on Earth, cause GPS outages, and pose serious health risks to humans
    flying at high altitudes as well as astronauts in space. NASA builds and operates a fleet
    of satellites to study the Sun and a fleet of satellites and aircraft to observe the Earth system.
    NASA combines these observations with numerical models to understand
    how these systems work. Using satellite observations alongside computer models, we can combine
    many pieces of information to form a coherent view of Earth and the Sun. NASA research helps us
    understand how processes combine to affect life on Earth, including severe weather, health,
    changes in climate, and space weather. The Scientific Visualization Studio (SVS) wants you to learn
    about NASA programs through visualization. The SVS works closely with scientists
    in the creation of data visualizations, animations, and images in order to promote
    a greater understanding of Earth and space science research activities at NASA
    and within the academic research community supported by NASA.

    Speaker Info:

    Lori Perkins

    NASA

  • Design & Analysis of a Computer Experiment for an Aerospace Conformance Simulation Study

    Abstract:

    Within NASA's Air Traffic Management Technology Demonstration # 1 (ATD-1), Interval Management (IM) is a flight deck tool that enables pilots to achieve or maintain a precise in-trail spacing behind a target aircraft. Previous research has shown that violations of aircraft spacing requirements can occur between an IM aircraft and its surrounding non-IM aircraft when it is following a target on a separate route. This talk focuses on the experimental design and analysis of a computer experiment which models the airspace configuration of interest in order to determine airspace/aircraft conditions leading to spacing violations during IM operation. We refer to multi-layered nested continuous factors as those that are continuous and ordered in their selection; they can only be selected sequentially with a level selected for one factor affecting the range of possible values for each subsequently nested factor. While each factor is nested within another factor, the exact nesting relationships have no closed form solution. In this talk, we describe our process of engineering an appropriate space-filling design for this situation. Using this space-filling design and Gaussian process modeling, we found that aircraft delay assignments and wind profiles significantly impact the likelihood of spacing violations and the interruption of IM operations.
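
    As a generic illustration of the space-filling-design-plus-Gaussian-process workflow described above (not the actual ATD-1 simulation), the sketch below fits a GP surrogate to a toy deterministic computer model over two inputs and predicts with uncertainty at a new point; scikit-learn's GaussianProcessRegressor is used here as one possible implementation, and the model and factor names are invented.

    import numpy as np
    from scipy.stats import qmc
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def computer_model(x):
        """Toy stand-in for an expensive simulation (e.g., spacing error vs. delay and wind)."""
        delay, wind = x[:, 0], x[:, 1]
        return np.sin(3 * delay) + 0.5 * wind**2

    # Space-filling design on [0, 1]^2, then run the "simulation" at each design point
    X = qmc.LatinHypercube(d=2, seed=0).random(n=30)
    y = computer_model(X)

    kernel = ConstantKernel(1.0) * RBF(length_scale=[0.2, 0.2])
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    x_new = np.array([[0.4, 0.7]])
    mean, std = gp.predict(x_new, return_std=True)
    print(f"prediction {mean[0]:.3f} +/- {2*std[0]:.3f} (truth {computer_model(x_new)[0]:.3f})")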

    Speaker Info:

    David Edwards

    Virginia Commonwealth University

  • Deterministic System Design of Experiments Based Frangible Joint Design Reliability Estimation

    Abstract:

    Frangible joints are linear pyrotechnic devices used to separate launch vehicle and spacecraft stages and fairings. Advantages of these systems include low mass, low dynamic shock, and low debris. However, the primary disadvantage for human space flight applications is the design’s use of a single explosive cord to effect function, rendering the device zero-fault-tolerant. Commercial company proposals to utilize frangible joints in human space flight applications spurred a NASA Engineering and Safety Center (NESC) assessment of the reliability of frangible joints. Empirical testing and LS-DYNA-based finite element analysis were used to understand and assess the design and function, and a deterministic system Design of Experiments (dsDOE) study was conducted to assess the sensitivity of function to frangible joint design variables and predict the device’s design reliability. The collaboration between statistical engineering experts and LS-DYNA analysis experts enabled a comprehensive understanding of these devices.

    Speaker Info:

    Scott West

    Aerospace Corporation

  • Deterministic System Design of Experiments Based Frangible Joint Design Reliability Estimation

    Abstract:

    Frangible joints are linear pyrotechnic devices used to separate launch vehicle and spacecraft stages and fairings. Advantages of these systems include low mass, low dynamic shock, and low debris. However, the primary disadvantage for human space flight applications is the design’s use of a single explosive cord to effect function, rendering the device zero-fault-tolerant. Commercial company proposals to utilize frangible joints in human space flight applications spurred a NASA Engineering and Safety Center (NESC) assessment of the reliability of frangible joints. Empirical testing and LS-DYNA-based finite element analysis were used to understand and assess the design and function, and a deterministic system Design of Experiments (dsDOE) study was conducted to assess the sensitivity of function to frangible joint design variables and predict the device’s design reliability. The collaboration between statistical engineering experts and LS-DYNA analysis experts enabled a comprehensive understanding of these devices.

    Speaker Info:

    Martin Annett

    Aerospace Corporation

  • Deterministic System Design of Experiments Based Frangible Joint Design Reliability Estimation

    Abstract:

    Frangible joints are linear pyrotechnic devices used to separate launch vehicle and spacecraft stages and fairings. Advantages of these systems include low mass, low dynamic shock, and low debris. However, the primary disadvantage for human space flight applications is the design’s use of a single explosive cord to effect function, rendering the device zero-fault-tolerant. Commercial company proposals to utilize frangible joints in human space flight applications spurred a NASA Engineering and Safety Center (NESC) assessment of the reliability of frangible joints. Empirical testing and LS-DYNA-based finite element analysis were used to understand and assess the design and function, and a deterministic system Design of Experiments (dsDOE) study was conducted to assess the sensitivity of function to frangible joint design variables and predict the device’s design reliability. The collaboration between statistical engineering experts and LS-DYNA analysis experts enabled a comprehensive understanding of these devices.

    Speaker Info:

    James Womach

    Aerospace Corporation

  • Developmental Test and Evaluation

    Speaker Info:

    Brian Hall

    Principal Deputy Director

    DoD

    Dr. Hall was appointed to the Senior Executive Service in November 2014 as the Principal Deputy Director for Developmental Test and Evaluation, and is currently serving as the Principal Deputy Director of the Test Resource Management Center (TRMC). In this position, he oversees matters concerning the Nation's critical test range infrastructure, science and technology efforts, development of the biennial Strategic Plan for DoD Test and Evaluation (T&E) resources, as well as certification of the Services' T&E budgets.

    Prior to this position, Dr. Hall was the Technical Advisor for Operational Test and Evaluation (OT&E) of all Land and Expeditionary Warfare systems in OSD-DOT&E. In this position, he advised the highest authorities in DoD for OT&E, observed operational testing, and coauthored Operational Assessments and Beyond Low-Rate Initial Production reports submitted to the four Congressional Defense Committees.
    Prior to serving on the OSD staff, Dr. Hall was the Division Chief for Aviation, Missiles, and C4ISR Systems in the Army Test and Evaluation Command (ATEC), where he was responsible for supervising the development of test plans and evaluations that directly supported milestone decision reviews and materiel fielding/production decisions of more than 300 Army programs. While with the Army, Dr. Hall was one of the leading reliability experts that helped establish the Center for Reliability Growth at Aberdeen Proving Ground, as well as develop and administer the ATEC/AMSAA 3-day reliability course to improve defense acquisition learning.

    Over his career, Dr. Hall has led studies, developed methodologies, presented research, published papers, crafted policy, and authored policy implementation guides. He has also developed staff, advised numerous defense programs, and served as an executive member of, or invited contributor to: tri-service DoD Blue Ribbon Panels; National Academy of Science studies; and DoD working-groups to improve system reliability.

    Dr. Hall is a Senior Service College graduate and has earned advanced degrees in Applied Mathematics, Reliability Engineering, and Strategic Studies. He is a domain expert in Reliability Engineering, as well as in Reliability Growth Management and Methodology. He has developed statistical methods and reliability growth models that have been: published in international journals, incorporated into Military Handbooks, adopted by Operational Test Agency policy, and utilized to shape growth plans and assess reliability maturity of numerous DoD systems.

  • Do Asymmetries in Nuclear Arsenals Matter?

    Abstract:

    The importance of the nuclear balance vis-a-vis our principal adversary has been the subject of intense but unresolved debate in the international security community for almost seven decades. Perspectives on this question underlie national security policies regarding potential unilateral reductions in strategic nuclear forces, the imbalance of nonstrategic nuclear weapons in Europe, nuclear crisis management, nuclear proliferation, and nuclear doctrine.

    The overwhelming majority of past studies of the role of the nuclear balance in nuclear crisis evolution and outcome have been qualitative and focused on the relative importance of the nuclear balance and national resolve. Some recent analyses have invoked statistical methods; however, these quantitative studies have generated intense controversy because of concerns with analytic rigor. We apply a multi-disciplinary approach that combines historical case study, international relations theory, and appropriate statistical analysis. This approach results in defensible findings on causal mechanisms that regulate nuclear crisis resolution. Such findings should inform national security policy choices facing the Trump administration.

    Speaker Info:

    Jane Pinelis

    Johns Hopkins University Applied Physics Laboratory (JHU/APL)

  • DOE Case Studies in Aerospace Research and Development

    Abstract:

    This presentation will provide a high level view of recent DOE applications to aerospace research. Two broad categories are defined, aerodynamic force measurement system calibrations and aircraft model wind tunnel aerodynamic characterization. Each case study will outline the application of DOE principles including design choices, accommodations for deviations from classical DOE approaches, discoveries, and practical lessons learned.

    Case Studies

    Aerodynamic Force Measurement System Calibrations

    Large External Wind Tunnel Balance Calibration
    - Fractional factorial
    - Working with non-ideal factor settings
    - Customer driven uncertainty assessment

    Internal Balance Calibration Including Temperature
    - Restrictions to randomization – split plot design requirements
    - Constraints to basic force model
    - Crossed design approach

    Aircraft Model Wind Tunnel Aerodynamic Characterization

    The NASA/Boeing X-48B Blended Wing Body Low-Speed Wind Tunnel Test
    - Overcoming a culture of OFAT
    - General approach to investigating a new aircraft configuration
    - Use of automated wind tunnel models and randomization
    - Power of residual analysis in detecting problems

    NASA GL–10 UAV Aerodynamic Characterization
    - Use of the Nested-Face Centered Design for aerodynamic characterization
    - Issues working with over 20 factors
    - Discoveries

    Speaker Info:

    Drew Landman

    Old Dominion University

  • Dose-Response Model of Recent Sonic Boom Community Annoyance Data

    Abstract:

    To enable quiet supersonic passenger flight overland, NASA is providing national and international noise regulators with a low-noise sonic boom database. The database will consist of dose-response curves, which quantify the relationship between low-noise sonic boom exposure and community annoyance. The recently-updated international standard for environmental noise assessment, ISO 1996-1:2016, references multiple fitting methods for dose-response analysis. One of these fitting methods, Fidell’s community tolerance level method, is based on theoretical assumptions that fix the slope of the curve, allowing only the intercept to vary. This fitting method is applied to an existing pilot sonic boom community annoyance data set from 2011 with a small sample size. The purpose of this exercise is to develop data collection and analysis recommendations for future sonic boom community annoyance surveys.
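
    A minimal sketch of the fixed-slope idea (an illustration only, not the exact ISO 1996-1 community tolerance level formulation): with the slope of a logistic dose-response curve held constant, only the intercept is estimated by maximum likelihood from synthetic exposure/annoyance data. The slope value, dose range, and data are all invented.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import expit  # logistic function

    rng = np.random.default_rng(11)
    # Synthetic survey data: sonic boom dose (dB) and whether each respondent was highly annoyed
    dose = rng.uniform(65, 90, size=300)
    true_p = expit(0.15 * (dose - 80.0))
    annoyed = rng.random(300) < true_p

    FIXED_SLOPE = 0.15   # assumed/theoretical slope, held constant during fitting

    def neg_loglik(intercept):
        p = expit(intercept + FIXED_SLOPE * dose)
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -np.sum(annoyed * np.log(p) + (~annoyed) * np.log(1 - p))

    fit = minimize_scalar(neg_loglik, bounds=(-30, 10), method="bounded")
    intercept_hat = fit.x
    # Dose at which 50% of respondents are predicted to be highly annoyed
    dose_50 = -intercept_hat / FIXED_SLOPE
    print(f"estimated intercept {intercept_hat:.2f}, 50%-annoyance dose {dose_50:.1f} dB")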

    Speaker Info:

    Jonathan Rathsam

    NASA

  • Estimating the Distribution of an Extremum using a Peaks-Over-Threshold Model and Monte Carlo Simulation

    Abstract:

    Estimating the probability distribution of an extremum (maximum or minimum), for some fixed amount of time, using a single time series typically recorded for a shorter amount of time, is important in many application areas, e.g., structural design, reliability, quality, and insurance. When designing structural members, engineers are concerned with maximum wind effects, which are functions of wind speed. With respect to reliability and quality, extremes experienced during storage or transport, e.g., extreme temperatures, may substantially impact product quality, lifetime, or both. Insurance companies are of course concerned about very large claims.

    In this presentation, a method to estimate the distribution of an extremum using a well-known peaks-over-threshold (POT) model and Monte Carlo simulation is presented. Since extreme values have long been a subject of study, some brief history is first discussed. The POT model that underlies the approach is then laid out. A description of the algorithm follows. It leverages pressure data collected on scale models of buildings in a wind tunnel for context. Essentially, the POT model is fitted to the observed data and then used to simulate many time series of the desired length. The empirical distribution of the extrema is obtained from the simulated series. Uncertainty in the estimated distribution is quantified by a bootstrap algorithm. Finally, an R package implementing the computations is discussed.
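
    A compact sketch of the algorithm described above, using SciPy's generalized Pareto distribution in place of the presenter's R package: exceedances over a threshold are fitted, synthetic series of the target length are simulated, and the empirical distribution of their maxima is summarized. The threshold choice, sample sizes, and data are all illustrative, and the bootstrap step is omitted.

    import numpy as np
    from scipy.stats import genpareto

    rng = np.random.default_rng(5)
    # Stand-in for an observed record (e.g., wind-induced pressures), length n_obs
    n_obs, n_target = 5_000, 50_000          # simulate extrema for a 10x longer exposure
    series = rng.gumbel(loc=0.0, scale=1.0, size=n_obs)

    threshold = np.quantile(series, 0.95)
    exceed = series[series > threshold] - threshold
    rate = len(exceed) / n_obs               # exceedances per observation

    # Fit the peaks-over-threshold (generalized Pareto) model to the exceedances
    shape, _, scale = genpareto.fit(exceed, floc=0.0)

    # Monte Carlo: simulate many series of the target length and record each maximum
    maxima = []
    for _ in range(2_000):
        k = rng.poisson(rate * n_target)     # number of exceedances in a simulated series
        peaks = threshold + genpareto.rvs(shape, scale=scale, size=k, random_state=rng)
        maxima.append(peaks.max() if k > 0 else threshold)
    maxima = np.array(maxima)

    print(f"median of maxima {np.median(maxima):.2f}, 90th percentile {np.quantile(maxima, 0.9):.2f}")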

    Speaker Info:

    Adam Pintar

    NIST

  • Experimental Design for Composite Pressure Vessel Life Prediction

    Abstract:

    One of the major pillars of experimental design is sequential learning. The experimental design should not be viewed as a “one-shot” effort, but rather as a series of experiments where each stage builds upon information learned from the previous study. It is within this realm of sequential learning that experimentation soundly supports the application of the scientific method.
    This presentation illustrates the value of sequential experimentation and also the connection between the scientific method and experimentation through a discussion of a multi-stage project supported by NASA’s Engineering Safety Center (NESC) where the objective was to assess the safety of composite overwrapped pressure vessels (COPVs). The analytical team was tasked with devising a test plan to model stress rupture failure risk in carbon fiber strands that encase the COPVs with the goal of understanding the reliability of the strands at use conditions for the expected mission life. This presentation highlights the recommended experimental design for the strand tests and then discusses the benefits that resulted from the suggested sequential testing protocol.

    Speaker Info:

    Anne Driscoll

    Virginia Tech

  • Flight Test and Evaluation of Airborne Spacing Application

    Abstract:

    NASA’s Airspace Technology Demonstration (ATD) project was developed to facilitate the transition of mature air traffic management technologies from the laboratory to operational use. The first ATD focused on an integrated set of advanced NASA technologies to enable efficient arrival operations in high-density terminal airspace. This integrated arrival solution was validated and verified in laboratories and transitioned to a field prototype for an operational demonstration. Within NASA, this was a collaborative effort between Ames and Langley Research Centers involving a multi-year iterative experimentation process consisting of a series of sequential batch computer simulations and human-in-the-loop experiments, culminating in a flight test. Designing and analyzing the flight test involved a number of statistical challenges. There were several variables which are known to impact the performance of the system, but which could not be controlled in an operational environment. Changes in the schedule due to weather and the dynamic positioning of the aircraft on the arrival routes resulted in the need for a design that could be modified in real-time. This presentation describes a case study from a recent NASA flight test, highlights statistical challenges, and discusses lessons learned.

    Speaker Info:

    Sara Wilson

    NASA

  • How do the Framework and Design of Experiments Fundamentally Help?

    Abstract:

    The Military Global Positioning System (GPS) User Equipment (MGUE) program is the user segment of the GPS Enterprise—a program on the Deputy Assistant Secretary of Defense for Developmental Test and Evaluation (DASD(DT&E)) Space and Missile Defense Systems portfolio. The MGUE program develops and tests GPS cards capable of using Military-Code (M Code) and legacy signals.
    The program’s DT&E strategy is challenging. The GPS cards provide new, untested capabilities. Milestone A was approved in 2012, with sole-source contracts released to three vendors for Increment 1. An Acquisition Decision Memorandum directs the program to support a Congressional Mandate to provide GPS M Code-capable equipment for use after FY17. Increment 1 provides GPS receiver form factors for the ground domain interface as well as for the aviation and maritime domain interfaces.
    When reviewing the DASD(DT&E) Milestone B (MS B) Assessment Report, Mr. Kendall expressed curiosity about how the Developmental Evaluation Framework (DEF) and Design of Experiments (DOE) help.
    This presentation describes how the DEF and DOE methods help produce more informative and more economical developmental tests than what was originally under consideration by the test community—decision-quality information with a 60% reduction in test cycle time. It provides insight into how the integration of the DEF and DOE improved the overall effectiveness of the DT&E strategy, illustrates the role of modeling and simulation (M&S) in the test design process, provides examples of experiment designs for different functional and performance areas, and illustrates the logic involved in balancing risks and test resources. The DEF and DOE methods enable the DT&E strategy to fully exploit early discovery, to maximize verification and validation opportunities, and to characterize system behavior across the technical requirements space.

    Speaker Info:

    Luis A. Cortes

  • How do the Framework and Design of Experiments Fundamentally Help?

    Abstract:

    The Military Global Positioning System (GPS) User Equipment (MGUE) program is the user segment of the GPS Enterprise—a program on the Deputy Assistant Secretary of Defense for Developmental Test and Evaluation (DASD(DT&E)) Space and Missile Defense Systems portfolio. The MGUE program develops and tests GPS cards capable of using Military-Code (M Code) and legacy signals.
    The program’s DT&E strategy is challenging. The GPS cards provide new, untested capabilities. Milestone A was approved in 2012, with sole-source contracts released to three vendors for Increment 1. An Acquisition Decision Memorandum directs the program to support a Congressional Mandate to provide GPS M Code-capable equipment for use after FY17. Increment 1 provides GPS receiver form factors for the ground domain interface as well as for the aviation and maritime domain interfaces.
    When reviewing the DASD(DT&E) Milestone B (MS B) Assessment Report, Mr. Kendall expressed curiosity about how the Developmental Evaluation Framework (DEF) and Design of Experiments (DOE) help.
    This presentation describes how the DEF and DOE methods help produce more informative and more economical developmental tests than what was originally under consideration by the test community—decision-quality information with a 60% reduction in test cycle time. It provides insight into how the integration of the DEF and DOE improved the overall effectiveness of the DT&E strategy, illustrates the role of modeling and simulation (M&S) in the test design process, provides examples of experiment designs for different functional and performance areas, and illustrates the logic involved in balancing risks and test resources. The DEF and DOE methods enable the DT&E strategy to fully exploit early discovery, to maximize verification and validation opportunities, and to characterize system behavior across the technical requirements space.

    Speaker Info:

    Mike Sheehan

    MITRE

  • Improving Sensitivity Experiments

    Abstract:

    This presentation will provide a brief overview of sensitivity testing and emphasize applications to several products and systems of importance to defense as well as private industry, including insensitive energetics, ballistic testing of protective armor, testing of munition fuzes and microelectromechanical systems (MEMS) components, safety testing of high-pressure test ammunition, and packaging for high-value materials.

    Speaker Info:

    Douglas Ray

    US Army RDECOM ARDEC

  • Improving Sensitivity Experiments

    Abstract:

    This presentation will provide a brief overview of sensitivity testing and emphasize applications to several products and systems of importance to defense as well as private industry, including insensitive energetics, ballistic testing of protective armor, testing of munition fuzes and microelectromechanical systems (MEMS) components, safety testing of high-pressure test ammunition, and packaging for high-value materials.

    Speaker Info:

    Kevin Singer

    US Army

  • Improving the Rigor of Navy M&S VV&A through the application of Design of Experiments Methodologies and related Statistical Techniques

    Speaker Info:

    Stargel Doane

    COTF

  • Integrated Uncertainty Quantification for Risk and Resource Management: Building Confidence in Design

    Speaker Info:

    Eric Walker

    NASA

  • Introduction to Bayesian Statistics

    Abstract:

    One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. For example, this feature can be particularly valuable in assessing the reliability of systems where testing is limited for some reason (e.g., expense, treaty). At their most basic, Bayesian methods for reliability develop informative prior distributions using expert judgment or similar systems. Appropriate models allow the incorporation of many other sources of information, including historical data, information from similar systems, and computer models. I will introduce the approach and then consider examples from defense acquisition and lifecycle extension, focusing on the strengths and weaknesses of the Bayesian analyses.
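
    A minimal sketch of combining information sources in a Bayesian reliability setting, using a conjugate beta-binomial model: prior testing (or expert judgment about a similar system) sets the beta prior, a small new test adds binomial data, and the posterior summarizes both. The counts below are invented.

    from scipy.stats import beta

    # Prior: roughly equivalent to 18 successes in 20 earlier or similar-system trials
    a_prior, b_prior = 18, 2

    # New (expensive, limited) test data on the current system
    successes, failures = 9, 1

    # Conjugate update: posterior is Beta(a_prior + successes, b_prior + failures)
    post = beta(a_prior + successes, b_prior + failures)

    print(f"posterior mean reliability: {post.mean():.3f}")
    lo, hi = post.interval(0.90)
    print(f"90% credible interval: ({lo:.3f}, {hi:.3f})")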

    Speaker Info:

    Alyson Wilson

    North Carolina State University

  • Introduction To Design of Experiments

    Abstract:

    Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance.

    Speaker Info:

    Doug Montgomery

    Professor

    Arizona State University

  • Introduction To Design of Experiments

    Abstract:

    Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance.

    Speaker Info:

    Brad Jones

    Professor

    Arizona State University

  • Introduction to Human Measurement

    Speaker Info:

    Cynthia Null

    NASA

  • Machine Learning: Overview and Applications to Test

    Abstract:

    Machine learning is quickly gaining importance in being able to infer meaning from large, high-dimensional datasets. It has even demonstrated performance meeting or exceeding human capabilities in conducting a particular set of tasks such as speech recognition and image recognition. Employing these machine learning capabilities can lead to increased efficiency in data collection, processing, and analysis. Presenters will provide an overview of common examples of supervised and unsupervised learning tasks and algorithms as an introduction to those without experience in machine learning.

    Presenters will also provide motivation for machine learning tasks and algorithms in a variety of test and evaluation settings. For example, in both developmental and operational test, restrictions on instrumentation, number of sorties, and the amount of time allocated to analyze collected data make data analysis challenging. When instrumentation is unavailable or fails, a common back-up data source is an over-the-shoulder video recording or recordings of aircraft intercom and radio transmissions, which traditionally are tedious to analyze. Machine learning based image and speech recognition algorithms can assist in extracting information quickly from hours of video and audio recordings. Additionally, unsupervised learning techniques may be used to aid in the identification of influences of logged or uncontrollable factors in many test and evaluation settings. Presenters will provide a potential example for the application of unsupervised learning techniques to test and evaluation.
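
    As a toy illustration of the unsupervised idea mentioned above (not the presenters' example), the sketch below clusters invented per-run flight-test summaries with k-means; the recovered groups could point an analyst toward an uncontrolled factor such as flight regime. All features and values are fabricated.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(9)
    # Invented per-run summary features (e.g., altitude, speed, time-to-detect)
    runs = np.vstack([
        rng.normal([30_000, 450, 8.0], [2_000, 20, 1.0], size=(40, 3)),
        rng.normal([5_000, 300, 14.0], [1_000, 25, 2.0], size=(40, 3)),
    ])

    # Unsupervised grouping may reveal an unlogged factor (here, two flight regimes)
    features = StandardScaler().fit_transform(runs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    print("runs per discovered cluster:", np.bincount(labels))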

    Speaker Info:

    Takayuki Iguchi

    AFOTEC

  • Machine Learning: Overview and Applications to Test

    Abstract:

    Machine learning is quickly gaining importance in being able to infer meaning from large, high-dimensional datasets. It has even demonstrated performance meeting or exceeding human capabilities in conducting a particular set of tasks such as speech recognition and image recognition. Employing these machine learning capabilities can lead to increased efficiency in data collection, processing, and analysis. Presenters will provide an overview of common examples of supervised and unsupervised learning tasks and algorithms as an introduction to those without experience in machine learning.

    Presenters will also provide motivation for machine learning tasks and algorithms in a variety of test and evaluation settings. For example, in both developmental and operational test, restrictions on instrumentation, number of sorties, and the amount of time allocated to analyze collected data make data analysis challenging. When instrumentation is unavailable or fails, a common back-up data source is an over-the-shoulder video recording or recordings of aircraft intercom and radio transmissions, which traditionally are tedious to analyze. Machine learning based image and speech recognition algorithms can assist in extracting information quickly from hours of video and audio recordings. Additionally, unsupervised learning techniques may be used to aid in the identification of influences of logged or uncontrollable factors in many test and evaluation settings. Presenters will provide a potential example for the application of unsupervised learning techniques to test and evaluation.

    Speaker Info:

    Megan Lewis

    AFOTEC

  • Model Based Systems Engineering Panel Discussion

    Abstract:

    This panel will share status, experiences, and expectations within DoD and NASA for transitioning systems engineering to a more integrated digital engineering domain. A wide range of perspectives will be provided, covering the implementation waterfront of practitioner, management, research, and strategy. Panelists will also be prepared to discuss more focused areas of digital systems engineering, such as test and evaluation and engineering statistics.

    Speaker Info:

    John Holladay

    NASA

  • Model Uncertainty and its Inclusion in Testing Results

    Abstract:

    Answers to real world questions are often based on the use of judiciously chosen mathematical/statistical/physical models. In particular, assessment of failure probabilities of physical systems rely heavily on such models. Since no model describes the real world exactly, sensitivity analyses are conducted to examine influences of (small) perturbations of an assumed model. In this talk we present a structured approach, using an "Assumptions Lattice" and corresponding "Uncertainty Pyramid", for transparently conveying the influence of various assumptions on analysis conclusions. We illustrate this process in the context of a simple multicomponent system.

    Speaker Info:

    Steve Lund

    NIST

  • Optimal Multi-Response Designs

    Abstract:

    The problem of constructing a design for an experiment when multiple responses are of interest does not have a clear answer, particularly when the response variables are of different types. Planning an experiment for an air-to-air missile simulation, for example, might have the following responses simultaneously: hit or miss the target (a binary response) and the time to acquire the target (a continuous response). With limited time and resources and only one experiment possible, the question of selecting an appropriate design to model both responses is important. In this presentation, we discuss a method for creating designs when two responses, each with a different distribution (normal, binomial, or Poisson), are of interest. We demonstrate the proposed method using various weighting schemes for the two models to show how the designs change as the weighting scheme changes. In addition, we explore the effect of the specified priors for the nonlinear models on these designs.

    Speaker Info:

    Sarah Burke

    STAT COE

  • Overview of Statistical Validation Tools

    Abstract:

    When Modeling and Simulation (M&S) is used as part of operational evaluations of effectiveness, suitability, survivability, or lethality, the M&S capability should first be rigorously validated to ensure it is representing the real world accurately enough for the intended use. Specifically, we need to understand and characterize the usefulness and limitations of the M&S, especially in terms of uncertainty. Many statistical techniques are available to compare M&S output with live test data. This presentation will describe and present results from a simulation study conducted to determine which techniques provide the highest statistical power to detect differences in mean and variance between live and simulated data for a variety of data types and sizes.
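
    A toy version of the kind of simulation study described above, assuming normal data: for one candidate pair of techniques (Welch's t-test on means and Levene's test on spread), the power to detect a given live-versus-simulation difference is estimated by repeated sampling. Sample sizes and effect sizes are placeholders.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_live, n_sim, n_reps, alpha = 20, 200, 2_000, 0.05
    mean_shift, sd_ratio = 0.5, 1.5          # hypothesized live-vs-sim differences

    hits_mean = hits_var = 0
    for _ in range(n_reps):
        live = rng.normal(mean_shift, sd_ratio, size=n_live)
        sim = rng.normal(0.0, 1.0, size=n_sim)
        if stats.ttest_ind(live, sim, equal_var=False).pvalue < alpha:
            hits_mean += 1
        if stats.levene(live, sim).pvalue < alpha:
            hits_var += 1

    print(f"power to detect mean shift:     {hits_mean / n_reps:.2f}")
    print(f"power to detect variance ratio: {hits_var / n_reps:.2f}")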

    Speaker Info:

    Kelly McGinnity

    IDA

  • Project Data Flow Is an Engineered System

    Abstract:

    Data within a project, investigation, or test series are often seen as a bunch of numbers that were produced. While this is part of the story, it forgets the most important part: the data’s users. A more powerful process begins with an early focus on planning, executing, and managing data flow within a test or project as a system, treating each handoff between internal and external stakeholders as a system interface. This presentation will argue that data production should be replaced by the idea of a data supply chain focused on goals and customers, and the presenter will outline how this could be achieved in your team. The talk is aimed not only at project and data managers, but also at team members who produce or use data. Retooling team thinking and processes along these lines will improve communication; facilitate the availability, display, and understanding of data by any stakeholder; make data verification, validation, and analysis easier; and help keep team members focused on what is necessary and important: solving the problem at hand.

    Speaker Info:

    Ken Johnson

    NASA

  • Range Adversarial Planning Tool for Autonomy Test and Evaluation

    Speaker Info:

    Chad Hawthorne

    JHU/APL

  • Recent Advances in Measuring Display Clutter

    Abstract:

    Display clutter has been defined as an unintended effect of display imagery that obscures or confuses other information or that may not be relevant to the task at hand. Negative effects of clutter on user performance have been documented; however, some work suggests differential effects with workload variations and measurement method. Existing measures of clutter focus either on physical display characteristics or on user perceptions, and they generally exhibit weak correlations with task performance, limiting their utility for application in safety-critical domains. These observations have led to a new integrated measure of clutter accounting for display data, user knowledge, and patterns of visual attention. Due to limited research on clutter effects in domains other than aviation, empirical studies have been conducted to evaluate the new measure in automobile driving. Data-driven measures and subjective perceptions of clutter were collected along with patterns of visual attention allocation when drivers searched ‘high’ and ‘low’ clutter navigation displays. The experimental paradigm was manipulated to include both presentation-based trials with static display images and use of a dynamic driving simulator. The new integrated measure was more strongly correlated with driver performance than other, previously developed measures of clutter. Results also revealed clutter to significantly alter attention and degrade performance with static displays but to have little to no effect in driving simulation. Findings corroborate trends in the literature that clutter has its greatest effects on behavior in domains requiring extended attention to displays, such as map search, compared to the use of displays to support secondary tasks, such as navigation aids in driving. Integrating display data and user knowledge factors with patterns of attention shows promise for clutter measurement.

    Speaker Info:

    David Kaber

    NCSU

  • Recent Advances in Measuring Display Clutter

    Abstract:

    Display clutter has been defined as an unintended effect of display imagery that obscures or confuses other information or that may not be relevant to the task at hand. Negative effects of clutter on user performance have been documented; however, some work suggests differential effects with workload variations and measurement method. Existing measures of clutter focus either on physical display characteristics or on user perceptions, and they generally exhibit weak correlations with task performance, limiting their utility for application in safety-critical domains. These observations have led to a new integrated measure of clutter accounting for display data, user knowledge, and patterns of visual attention. Due to limited research on clutter effects in domains other than aviation, empirical studies have been conducted to evaluate the new measure in automobile driving. Data-driven measures and subjective perceptions of clutter were collected along with patterns of visual attention allocation when drivers searched ‘high’ and ‘low’ clutter navigation displays. The experimental paradigm was manipulated to include both presentation-based trials with static display images and use of a dynamic driving simulator. The new integrated measure was more strongly correlated with driver performance than other, previously developed measures of clutter. Results also revealed clutter to significantly alter attention and degrade performance with static displays but to have little to no effect in driving simulation. Findings corroborate trends in the literature that clutter has its greatest effects on behavior in domains requiring extended attention to displays, such as map search, compared to the use of displays to support secondary tasks, such as navigation aids in driving. Integrating display data and user knowledge factors with patterns of attention shows promise for clutter measurement.

    Speaker Info:

    Carl Pankok

  • Reflections on Statistical Engineering and its Application

    Speaker Info:

    Geoff Vining

    Professor

    Virginia Tech

    Geoff Vining is a Professor of Statistics at Virginia Tech. From 1999 – 2006, he also was the department head. He currently is the ASQ Treasurer for 2016 and Past-Chair of the ASQ Technical Communities Council. He is a Fellow of the ASQ, a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute.

    Dr. Vining served as Editor of the Journal of Quality Technology from 1998 – 2000 and as Editor-in-Chief of Quality Engineering from 2008-2009. He also has served as Chair of the ASQ Publications Management Board, as Chair of the ASQ Statistics Division, and as Chair of the ASA Quality and Productivity Section.

    Dr. Vining won the 2010 Shewhart Medal, the ASQ career award given annually to the person not previously so honored who has demonstrated the most outstanding technical leadership in the field of modern quality control, especially through the development to its theory, principles, and techniques. He also received the 2015 Box Medal from the European Network for Business and Industrial Statistics (ENBIS). This medal recognizes each year an extraordinary statistician who has remarkably contributed with his/her work to the development and the application of statistical methods in European business and industry. In 2013, he received an Engineering Excellence Award from the NASA Engineering and Safety Center. He received the 2011 William G. Hunter Award from the ASQ Statistics Division for excellence in statistics as a communicator, a consultant, an educator, an innovator, an integrator of statistics with other disciplines and an implementer who obtains meaningful results. He won the 1990 ASQ Brumbaugh Award for the paper published in an ASQ journal that made the greatest contribution to the development of industrial applications of quality control and the 2005 Lloyd Nelson Award from the Statistics Division for the paper published in the Journal of Quality Technology that had the greatest immediate impact to practitioners.

    Dr. Vining is the author of three textbooks. He is an internationally recognized expert in the use of experimental design for quality, productivity, and reliability improvement and in the application of statistical process control. He has extensive consulting experience, most recently with NASA and the U.S. Department of Defense.

  • Reliability Growth Modeling

    Abstract:

    Several optimization models are described for allocating resources to different testing activities in a system’s reliability growth program. These models assume availability of an underlying reliability growth model for the system, and capture the tradeoffs associated with focusing testing resources at various levels (e.g., system, subsystem, component) and/or how to divide resources within a given level. In order to demonstrate insights generated by solving the model, we apply the optimization models to an example series-parallel system in which reliability growth is assumed to follow the Crow/AMSAA reliability growth model. We then demonstrate how the optimization models can be extended to incorporate uncertainty in Crow/AMSAA parameters.
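
    For readers unfamiliar with the underlying growth model, the sketch below fits the Crow/AMSAA (power-law NHPP) model to invented failure times from a time-truncated test using the standard maximum likelihood estimators and reports the demonstrated intensity at the end of test; it does not implement the resource-allocation optimization itself.

    import numpy as np

    # Invented cumulative failure times (hours) from a single system; test truncated at T hours
    failure_times = np.array([11, 35, 62, 130, 220, 290, 410, 480, 700, 880], float)
    T = 1_000.0
    n = len(failure_times)

    # Crow/AMSAA (power-law NHPP) MLEs for a time-truncated test
    beta_hat = n / np.sum(np.log(T / failure_times))
    lam_hat = n / T**beta_hat

    # Demonstrated (instantaneous) failure intensity and MTBF at end of test
    intensity_T = lam_hat * beta_hat * T**(beta_hat - 1)
    print(f"beta = {beta_hat:.2f} (<1 indicates growth), lambda = {lam_hat:.4f}")
    print(f"current MTBF at {T:.0f} h: {1/intensity_T:.0f} h")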

    Speaker Info:

    Kelly Sullivan

    University of Arkansas

  • Resampling Methods

    Abstract:

    Resampling Methods: This tutorial presents widely used resampling methods to include bootstrapping, cross-validation, and permutation tests. Underlying theories will be presented briefly, but the primary focus will be on applications. A new graph-theoretic approach to change detection will be discussed as a specific application of permutation testing. Examples will be demonstrated in R; participants are encouraged to bring their own portable computers to follow along using datasets provided by the instructor.
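
    Although the tutorial's examples are in R, the same ideas translate directly; the Python sketch below shows a bootstrap percentile interval for a median and a simple permutation test for a difference in means, using invented data.

    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.exponential(2.0, size=30)          # invented sample for the bootstrap
    a = rng.normal(0.0, 1.0, size=25)          # two invented groups for the permutation test
    b = rng.normal(0.6, 1.0, size=25)

    # Bootstrap: percentile interval for the median of x
    boot_medians = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                             for _ in range(5_000)])
    lo, hi = np.percentile(boot_medians, [2.5, 97.5])

    # Permutation test: how often does a random relabeling beat the observed mean difference?
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(5_000):
        perm = rng.permutation(pooled)
        count += abs(perm[:len(a)].mean() - perm[len(a):].mean()) >= abs(observed)
    p_value = count / 5_000

    print(f"bootstrap 95% CI for median: ({lo:.2f}, {hi:.2f}); permutation p = {p_value:.3f}")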

    Speaker Info:

    David Ruth

    United States Naval Academy

  • Retooling Design and Development

    Speaker Info:

    Chris Singer

    NASA Deputy Chief Engineer

    NASA

    Christopher (Chris) E. Singer is the NASA Deputy Chief Engineer, responsible for integrating engineering across the Agency's 10 field centers. Prior to this appointment in April 2016, he served as the Engineering Director at NASA's Marshall Space Flight Center in Huntsville, Alabama. Appointed in 2011, Mr. Singer led an organization of 1,400 civil service and 1,200 support contractor employees responsible for the design, testing, evaluation, and operation of hardware and software associated with space transportation, spacecraft systems, science instruments, and payloads under development at the Marshall Center. The Engineering Directorate also manages NASA's Payload Operations Center at Marshall, which is the command post for scientific research activities on board the International Space Station.

    Mr. Singer began his NASA career in 1983 as a rocket engine specialist. In 1992, he served a one-year assignment at NASA Headquarters in Washington, DC, as senior manager for the space shuttle main engine and external tank in the Space Shuttle Support Office. In 1994, Mr. Singer supervised the development and implementation of safety improvements and upgrades to shuttle propulsion components. In 2000, he was appointed chief engineer in the Space Transportation Directorate then was selected as deputy director of Marshall's Engineering Directorate from 2004 to 2011.

    Mr. Singer is an AIAA Associate Fellow. In 2006, he received the Presidential Rank Award for Meritorious Executives — the highest honor for career federal employees. He was awarded the NASA Outstanding Leadership Medal in 2001 and 2008 for his leadership. In 1989, he received the prestigious Silver Snoopy Award from the Astronaut Corps for his contributions to the success of human spaceflight missions.

    A native of Nashville, Tennessee, Mr. Singer earned a bachelor's degree in mechanical engineering in 1983 from Christian Brothers University in Memphis, Tennessee.

    Chris enjoys woodworking, fishing, and hang gliding. Chris is married to the former Jody Adams of Hartselle, Alabama. They have three children and live in Huntsville, Alabama.

  • Sample Size and Considerations for Statistical Power

    Abstract:

    Sample size drives the resources and supports the conclusions of operational test. Power analysis is a common statistical methodology used in planning efforts to justify the number of samples. Power analysis is sensitive to extreme performance (e.g. 0.1% correct responses or 99.999% correct responses) relative to a threshold value, extremes in response variable variability, numbers of factors and levels, system complexity, and a myriad of other design- and system-specific criteria. This discussion will describe considerations (correlation/aliasing, operational significance, thresholds, etc.) and relationships (design, difference to detect, noise, etc.) associated with power. The contribution of power to design selection or adequacy must often be tempered when significant uncertainty or test resources constraints exist. In these situations, other measures of merit and alternative analytical approaches become at least as important as power in the development of designs that achieve the desired technical adequacy. In conclusion, one must understand what power is, what factors influence the calculation, and when to leverage alternative measures of merit.
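
    As a small concrete example of the sensitivity discussed above (with illustrative numbers, not any specific program), the sketch below uses statsmodels to show how the required sample size for a two-sample comparison changes with the difference to detect and the assumed noise.

    from statsmodels.stats.power import TTestIndPower

    power_calc = TTestIndPower()
    alpha, power = 0.05, 0.80

    # Effect size = (difference to detect) / (response standard deviation)
    for diff, sd in [(1.0, 2.0), (1.0, 1.0), (2.0, 1.0)]:
        effect = diff / sd
        n_per_group = power_calc.solve_power(effect_size=effect, alpha=alpha, power=power)
        print(f"detect a difference of {diff} with sd {sd}: about {n_per_group:.0f} runs per group")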

    Speaker Info:

    Vance Oas

    AFOTEC

  • Sample Size and Considerations for Statistical Power

    Abstract:

    Sample size drives the resources and supports the conclusions of operational test. Power analysis is a common statistical methodology used in planning efforts to justify the number of samples. Power analysis is sensitive to extreme performance (e.g. 0.1% correct responses or 99.999% correct responses) relative to a threshold value, extremes in response variable variability, numbers of factors and levels, system complexity, and a myriad of other design- and system-specific criteria. This discussion will describe considerations (correlation/aliasing, operational significance, thresholds, etc.) and relationships (design, difference to detect, noise, etc.) associated with power. The contribution of power to design selection or adequacy must often be tempered when significant uncertainty or test resources constraints exist. In these situations, other measures of merit and alternative analytical approaches become at least as important as power in the development of designs that achieve the desired technical adequacy. In conclusion, one must understand what power is, what factors influence the calculation, and when to leverage alternative measures of merit.

    Speaker Info:

    Nick Garcia

    AFOTEC

  • Search for Extended Test Design Methods for Complex Systems of Systems

    Speaker Info:

    Alex Alaniz

    AFOTEC

  • Sequential Experimentation for a Binary Response - The Break Separation Method

    Abstract:

    Binary response experiments are common in epidemiology, biostatistics as well as in military applications. The Up and Down method, Langlie’s Method, Neyer’s method, K in a Row method and 3 Phase Optimal Design are methods used for sequential experimental design when there is a single continuous variable and a binary response. During this talk, we will discuss a new sequential experimental design approach called the Break Separation Method (BSM). BSM provides an algorithm for determining sequential experimental trials that will be used to find a median quantile and fit a logistic regression model using Maximum Likelihood estimation. BSM results in a small sample size and is designed to efficiently compute the median quantile.
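
    The BSM algorithm itself is the subject of the talk; as background for the final estimation step it describes, the sketch below fits a logistic regression by maximum likelihood to invented stimulus/response pairs and reports the estimated median quantile (the stimulus level with a 50% response probability).

    import numpy as np
    import statsmodels.api as sm

    # Invented sequential trial results: stimulus level and binary respond/no-respond outcome
    levels = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 2.2, 2.4, 2.6, 2.8, 2.3, 2.7])
    response = np.array([0,   0,   0,   1,   1,   1,   0,   0,   1,   1,   1,   1])

    X = sm.add_constant(levels)
    fit = sm.Logit(response, X).fit(disp=False)
    b0, b1 = fit.params

    # Median quantile: the level at which the fitted response probability equals 0.5
    median_quantile = -b0 / b1
    print(f"estimated median quantile: {median_quantile:.2f}")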

    Speaker Info:

    Darsh Thakkar

    RIT-S

  • Sequential Experimentation for a Binary Response - The Break Separation Method

    Abstract:

    Binary response experiments are common in epidemiology, biostatistics as well as in military applications. The Up and Down method, Langlie’s Method, Neyer’s method, K in a Row method and 3 Phase Optimal Design are methods used for sequential experimental design when there is a single continuous variable and a binary response. During this talk, we will discuss a new sequential experimental design approach called the Break Separation Method (BSM). BSM provides an algorithm for determining sequential experimental trials that will be used to find a median quantile and fit a logistic regression model using Maximum Likelihood estimation. BSM results in a small sample size and is designed to efficiently compute the median quantile.

    Speaker Info:

    Rachel Silvestrini

    RIT-S

  • Software Reliability Modeling

    Abstract:

    Many software reliability models characterize the number of faults detected during the testing process as a function of testing time, which is performed over multiple stages. Typically, the later stages are progressively more expensive because of the increased number of personnel and equipment required to support testing as the system nears completion. Such transitions from one stage of testing to the next change the operational environment. One statistical approach to combining software reliability growth models in a manner capable of characterizing multi-stage testing is the concept of a change-point process, where the intensity of a process experiences a distinct change at one or more discrete times during testing. Thus, change-point processes can be used to model change in the failure rate of software due to changes in the testing strategy and environment, integration testing, and resource allocation as it proceeds through multiple stages of testing.

    This presentation generalizes change-point models to the heterogeneous case, where fault detection before and after a change-point can be characterized by distinct nonhomogeneous Poisson processes (NHPP). Experimental results suggest that heterogeneous change-point models better characterize some failure data sets, which can improve the applicability of software reliability models to large-scale software systems that are tested over multiple stages.
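    For readers unfamiliar with change-point mean value functions, the R sketch below (parameter values assumed purely for illustration) plots a heterogeneous model in which a Goel-Okumoto NHPP before the change point tau hands off to a second NHPP with distinct parameters afterward:

        mvf <- function(t, a1, b1, a2, b2, tau) {
          m_tau <- a1 * (1 - exp(-b1 * tau))               # expected faults found by tau
          ifelse(t <= tau,
                 a1 * (1 - exp(-b1 * t)),
                 m_tau + a2 * (1 - exp(-b2 * (t - tau))))  # second testing stage after tau
        }
        curve(mvf(x, a1 = 80, b1 = 0.05, a2 = 40, b2 = 0.02, tau = 30),
              from = 0, to = 100, xlab = "testing time", ylab = "expected faults detected")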

    Speaker Info:

    Lance Fiondella

    UMASS

  • Software Test Techniques

    Abstract:

    In recent years, software testing techniques based on formal methods have made their way into industrial practice as a supplement to system and unit testing. I will discuss three core techniques that have proven particularly amenable to transition: 1) Concolic execution, which enables the automatic generation of high-coverage test suites; 2) Property-based randomized testing, which automatically checks sequences of API calls to ensure that expected high-level behavior occurs; and 3) Bounded model checking, which enables systematic exploration of both concrete systems and high-level models to check temporal properties, including ordering of events and timing requirements.

    Speaker Info:

    Jose Calderon

    Galois

  • Split-Plot and Restricted Randomization Designs

    Abstract:

    Have you ever built what you considered to be the ideal designed experiment, then passed it along to be run, only to learn later that your recommended run order was ignored? Or perhaps you were part of a test execution team and learned too late that one or more of your experimental factors are difficult or time-consuming to change. We all recognize that the best possible guard against lurking background noise is complete randomization, but often we find that a randomized run order is extremely impractical or even infeasible. Split-plot design and analysis methods have been around for over 80 years, but only in the last several years have the methods fully matured and been made available in commercial software. This class will introduce you to the world of practical split-plot design and analysis methods. We’ll provide you the skills to effectively build designs appropriate to your specific needs and demonstrate proper analysis techniques using general linear models available in standard statistical software. Topics include split-plots for 2-level and mixed-level factor sets, for first- and second-order models, as well as split-split-plot designs.
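    As a taste of the analysis side, the R sketch below (simulated data with invented factor names; the class itself demonstrates commercial software rather than this code) fits a split-plot model by treating each whole plot as a random block, so the hard-to-change factor is judged against whole-plot error rather than run-to-run error:

        library(lme4)
        set.seed(8)
        d <- expand.grid(wholeplot = factor(1:8), subplot = 1:4)
        d$temp  <- ifelse(as.integer(d$wholeplot) <= 4, -1, 1)   # hard-to-change factor
        d$speed <- sample(c(-1, 1), nrow(d), replace = TRUE)     # easy-to-change factor
        d$y <- 5 + 2 * d$temp + 1 * d$speed +
               rnorm(8)[d$wholeplot] + rnorm(nrow(d))            # whole-plot error + run-to-run error
        fit <- lmer(y ~ temp * speed + (1 | wholeplot), data = d)
        summary(fit)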

    Speaker Info:

    Jim Simpson

    JK Analytics

  • Statistical Methods for Programmatic Assessment and Anomaly Detection

    Speaker Info:

    Douglas Brown

    BAH

  • Statistical Methods for Programmatic Assessment and Anomaly Detection

    Speaker Info:

    Ray McCollum

    BAH

  • Statistical Models for Reliability Data

    Abstract:

    Engineers in manufacturing industries require data-driven reliability information for making business, product-design, and engineering decisions. The owners and operators of fleets of systems also need reliability information to make good decisions. The course will focus on concepts, examples, models, data analysis, and interpretation of reliability data analyses. Examples and exercises will include product field (maintenance or warranty) data and accelerated life test data. After completing this course, participants will be able to recognize and properly deal with different kinds of reliability data and properly interpret important reliability metrics. Topics will include the use of probability plots to identify appropriate distributional models (e.g., Weibull and lognormal distributions), estimating important quantities like distribution quantiles and failure probabilities, the analysis of data with multiple failure modes, and the analysis of recurrence data from a fleet of systems or a reliability growth program.
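    A minimal R sketch of one such topic, assuming simulated right-censored data rather than any real product data, fits a Weibull model and estimates the 0.1 quantile (B10 life):

        library(survival)
        set.seed(1)
        t_fail <- rweibull(30, shape = 2, scale = 1000)   # simulated failure times (hours)
        t_obs  <- pmin(t_fail, 800)                       # test ends at 800 hours
        status <- as.numeric(t_fail <= 800)               # 1 = failure observed, 0 = censored
        fit <- survreg(Surv(t_obs, status) ~ 1, dist = "weibull")
        predict(fit, type = "quantile", p = 0.1)[1]       # estimated B10 life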

    Speaker Info:

    William Meeker

    Professor

    Iowa State University

  • Structured Decision Making

    Abstract:

    Difficult choices are often required in a decision-making process where resources and budgets are increasingly constrained. This talk demonstrates a structured decision-making approach using layered Pareto fronts to prioritize the allocation of funds between munitions stockpiles based on their estimated reliability, the urgency of needing available units, and the consequences if adequate numbers of units are not available. This case study illustrates the process of first identifying appropriate metrics that summarize important dimensions of the decision, and then eliminating non-contenders from further consideration in an objective stage. The final subjective stage incorporates subject matter expert priorities to select the four stockpiles to receive additional maintenance and surveillance funds, based on understanding the trade-offs and robustness to various user priorities.

    Speaker Info:

    Christine Anderson-Cook

    LANL

  • Test and Evaluation Matters for the Warfighter and We Need Your Help to Make it Better

    Speaker Info:

    Catherine Warner

    Science Advisor, Office of the Secretary of Defense

    DOT&E

    Dr. Catherine Warner serves as the Science Advisor for the Director, Operational Test and Evaluation within the Office of the Secretary of Defense. Dr. Warner has been involved with operational test and evaluation since 1991, when she became a research staff member at the Institute for Defense Analyses (IDA). In that capacity, Dr. Warner performed and directed analysis of operational tests for Army, Navy, and Air Force systems in support of DOT&E. From 2005 – 2010, Dr. Warner was an Assistant Director at IDA and also served as the lead for the Air Warfare group. Her analysis portfolio included major aircraft systems such as the F-22, F/A-18E/F, V-22, and H-1. Prior to that, Dr. Warner was the lead analyst for Unmanned Aerial Vehicle (UAV) systems including Predator, Shadow, Hunter, and Global Hawk. In 2013, at the request of the Defense Information Systems Agency (DISA), Dr. Warner deployed to Kabul, Afghanistan for 16 months in support of NATO’s International Security Assistance Force (ISAF) and the US Operation Enduring Freedom (OEF). She led a team of Information Technology specialists in advising Afghanistan’s Ministry of Communications and Information Technology on enhancing national communication capabilities for the security and economic growth of the country. The primary focus of this team included supporting the completion of Afghanistan’s National Fiber Optic Ring, Spectrum Management, and Cyber Security. Dr. Warner previously worked at the Lawrence Livermore National Laboratory. She grew up in Albuquerque, New Mexico, and attended the University of New Mexico and San Jose State University, where she earned both B.S. and M.S. degrees in Chemistry. She also earned both M.A. and Ph.D. degrees in Chemistry from Princeton University.

  • Testing and Estimation in Sequential High-Dimension Data

    Abstract:

    Many modern processes generate complex data records not readily analyzed by traditional techniques. For example, a single observation from a process might be a radar signal consisting of n pairs of bivariate data described via some functional relation between reflection and direction. Methods are examined here for detecting changes in such sequences from some known or estimated nominal state. Additionally, estimates of the degree of change (scale, location, frequency, etc.) are desirable and discussed. The proposed methods are designed to take advantage of all available data in a sequence. This can become unwieldy for long sequences of large-sized observations, so dimension reduction techniques are needed. In order for these methods to be as widely applicable as possible, we make limited distributional assumptions and so we propose new nonparametric and Bayesian tools to implement these estimators.

    Speaker Info:

    Eric Chicken

    Florida State University

  • The (Empirical) Case for Analyzing Likert-Type Data with Parametric Tests

    Abstract:

    Surveys are commonly used to evaluate the quality of human-system interactions during the operational testing of military systems. Testers use Likert-type response options to measure the intensity of operators’ subjective experiences (e.g., usability, workload) while operating the system. Recently, appropriate methods for analyzing Likert data have become a point of contention within the operational test community. Some argue that Likert data can be analyzed with parametric techniques, whereas others argue that only non-parametric techniques should be used. However, the reasons stated for holding a particular view are rarely tied to findings in the empirical literature. This presentation sheds light on this debate by reviewing existing research on how parametric statistics affect the conclusions drawn from Likert data and by debunking common myths and misunderstandings about the nature of Likert data within the operational test community and academia.
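    A quick way to see the practical question, using simulated 5-point responses rather than any operational data, is to run a parametric and a non-parametric comparison side by side in R and compare the conclusions:

        set.seed(2)
        group_a <- sample(1:5, 40, replace = TRUE, prob = c(.05, .10, .25, .35, .25))
        group_b <- sample(1:5, 40, replace = TRUE, prob = c(.15, .25, .30, .20, .10))
        t.test(group_a, group_b)        # parametric comparison of mean ratings
        wilcox.test(group_a, group_b)   # rank-based comparison (ties handled approximately)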

    Speaker Info:

    Heather Wojton

    Research Staff Member

    IDA

  • The Future of Engineering at NASA Langley

    Abstract:

    In May 2016, the NASA Langley Research Center’s Engineering Director stood up a group consisting of employees within the directorate to assess the current state of engineering being done by the organization. The group was chartered to develop ideas, through investigation and benchmarking of other organizations within and outside of NASA, for how engineering should look in the future. This effort would include brainstorming, development of recommendations, and some detailed implementation plans which could be acted upon by the directorate leadership as part of an enduring activity. The group made slow and sporadic progress in several specific, self-selected areas including: training and development; incorporation of non-traditional engineering disciplines; capturing and leveraging historical data and knowledge; revolutionizing project documentation; and more effective use of design reviews.
    The design review investigations have made significant progress by leveraging lessons learned and techniques gained by collaboration with operations research analysts within the local Lockheed Martin Center for Innovation (the “Lighthouse”) and pairing those techniques with advanced data analysis tools available through the IBM Watson Content Analytics environment. Trials with these new techniques are underway but show promising results for the future of providing objective, quantifiable data from the design review environment – an environment which to this point has remained essentially unchanged for the past 50 years.

    Speaker Info:

    Joe Gasbarre

    NASA

  • The System Usability Scale: A Measurement Instrument Should Suit the Measurement Needs

    Abstract:

    The System Usability Scale (SUS) was developed by John Brooke in 1986 “to take a quick measurement of how people perceived the usability of (office) computer systems on which they were working.” The SUS is a 10-item, generic usability scale that is assumed to be system agnostic, and it results in a numerical score that ranges from 0-100. It has been widely employed and researched with non-military systems. More recently, it has been strongly recommended for use with military systems in operational test and evaluation, in part because of its widespread commercial use, but largely because it produces a numerical score that makes it amenable to statistical operations.
    Recent lessons learned with SUS in operational test and evaluation strongly question its use with military systems, most of which differ radically from non-military systems. More specifically, (1) usability measurement attributes need to be tailored to the specific system under test and meet the information needs of system users, and (2) a SUS numerical cutoff score of 70—a common benchmark with non-military systems—does not accurately reflect “system usability” from an operator or test team perspective. These findings will be discussed in a psychological and human factors measurement context, and an example of system-specific usability attributes will be provided as a viable way forward. In the event that the SUS is used in operational test and evaluation, some recommendations for interpreting the outcomes will be provided.
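    For reference, the standard SUS scoring rule published by Brooke can be written in a few lines of R; the example ratings below are hypothetical:

        sus_score <- function(ratings) {            # ratings: ten item responses on a 1-5 scale
          odd  <- ratings[c(1, 3, 5, 7, 9)] - 1     # odd items contribute (rating - 1)
          even <- 5 - ratings[c(2, 4, 6, 8, 10)]    # even items contribute (5 - rating)
          2.5 * sum(odd, even)                      # scale the 0-40 total to 0-100
        }
        sus_score(c(4, 2, 5, 1, 4, 2, 4, 2, 5, 1))  # hypothetical respondent: score of 85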

    Speaker Info:

    Keith Kidder

    AFOTEC

  • Trust in Automation

    Abstract:

    This brief talk will focus on the process of human-machine trust in context of automated intelligence tools. The trust process is multifaceted and this talk will define concepts such as trust, trustworthiness, trust behavior, and will examine how these constructs might be operationalized in user studies. The talk will walk through various aspects of what might make an automated intelligence tool more or less trustworthy. Further, the construct of transparency will be discussed as a mechanism to foster shared awareness and shared intent between humans and machines.

    Speaker Info:

    Joseph Lyons

    Technical Advisor

    Air Force Research Laboratory

  • Uncertainty Quantification: What is it and Why it is Important to Test, Evaluation, and Modeling and Simulation in Defense and Aerospace

    Abstract:

    Uncertainty appears in many aspects of systems design including stochastic design parameters, simulation inputs, and forcing functions. Uncertainty Quantification (UQ) has emerged as the science of quantitative characterization and reduction of uncertainties in both simulation and test results. UQ is a multidisciplinary field with a broad base of methods including sensitivity analysis, statistical calibration, uncertainty propagation, and inverse analysis. Because of their ability to bring greater degrees of confidence to decisions, uncertainty quantification methods are playing a greater role in test, evaluation, and modeling and simulation in defense and aerospace. The value of UQ comes with better understanding of risk from assessing the uncertainty in test and modeling and simulation results.
    The presentation will provide an overview of UQ and then discuss the use of some advanced statistical methods, including DOEs and emulation for multiple simulation solvers and statistical calibration, for efficiently quantifying uncertainties. These statistical methods effectively link test, evaluation, and modeling and simulation by coordinating the evaluation of uncertainties and simplifying verification and validation activities.
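    As a bare-bones illustration of forward uncertainty propagation (with a stand-in function and made-up input distributions, not any of the solvers or calibration methods discussed in the talk):

        set.seed(3)
        n      <- 10000
        thrust <- rnorm(n, mean = 100, sd = 5)           # assumed uncertain inputs
        drag   <- rlnorm(n, meanlog = 3, sdlog = 0.1)
        model  <- function(thrust, drag) thrust / drag   # stand-in for a real simulation
        out    <- model(thrust, drag)
        quantile(out, c(0.05, 0.5, 0.95))                # induced uncertainty in the response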

    Speaker Info:

    Peter Qian

    University of Wisconsin and SmartUQ

  • Updating R and Reliability Training with Bill Meeker

    Abstract:

    Since its publication, Statistical Methods for Reliability Data by W. Q. Meeker and L. A. Escobar has been recognized as a foundational resource for analyzing failure-time and survival data. Along with the text, the authors provided an S-Plus software package, called SPLIDA, to help readers utilize the methods presented in the text. Today, R is the most popular statistical computing language in the world, largely supplanting S-Plus. The SMRD package is the result of a multi-year effort to completely rebuild SPLIDA to take advantage of the improved graphics and workflow capabilities available in R. This presentation introduces the SMRD package, outlines the improvements, and shows how the package works seamlessly with the rmarkdown and shiny packages to dramatically speed up your workflow. The presentation concludes with a discussion of what improvements still need to be made prior to publishing the package on CRAN.

    Speaker Info:

    Jason Freels

    AFIT

  • Validation of AIM-9X Modeling and Simulation

    Abstract:

    One use for Modeling and Simulation (M&S) in Test and Evaluation (T&E) is to produce weapon miss distances to evaluate the effectiveness of a weapon. This is true for the Air Intercept Missile-9X (AIM-9X) T&E community. Since flight testing is expensive, the test program uses relatively few flight tests at critical conditions, and supplements those data with large numbers of miss distances from simulated tests across the weapon's operational space. However, before the model and simulation is used to predict performance it must first be validated. Validation is an especially daunting task when working with a limited number of live test data. In this presentation we show that even with a limited number of live test points (e.g., 16 missile fires), we can still perform a statistical analysis for the validation. Specifically, we introduce a validation technique known as Fisher’s Combined Probability Test and we show how to apply Fisher’s test to validate the AIM-9X model and simulation.
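    Fisher’s Combined Probability Test itself is short enough to sketch in R; the p-values below are hypothetical placeholders, one per live-fire comparison:

        p_values <- c(0.42, 0.08, 0.65, 0.31, 0.22)                # hypothetical per-shot p-values
        X2 <- -2 * sum(log(p_values))                              # Fisher's combined statistic
        pchisq(X2, df = 2 * length(p_values), lower.tail = FALSE)  # overall validation p-value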

    Speaker Info:

    Rebecca Dickinson

    Research Staff Member

    IDA

  • VV&UQ - Uncertainty Quantification for Model-Based Engineering of DoD Systems

    Abstract:

    The US Army ARDEC has recently established an initiative to integrate statistical and probabilistic techniques into engineering modeling and simulation (M&S) analytics typically used early in the design lifecycle to guide technology development. DOE-driven Uncertainty Quantification techniques, including statistically rigorous model verification and validation (V&V) approaches, enable engineering teams to identify, quantify, and account for sources of variation and uncertainties in design parameters, and identify opportunities to make technologies more robust, reliable, and resilient earlier in the product’s lifecycle. Several recent armament engineering case studies - each with unique considerations and challenges - will be discussed.

    Speaker Info:

    Douglas Ray

    US Army RDECOM ARDEC

  • VV&UQ - Uncertainty Quantification for Model-Based Engineering of DoD Systems

    Abstract:

    The US Army ARDEC has recently established an initiative to integrate statistical and probabilistic techniques into engineering modeling and simulation (M&S) analytics typically used early in the design lifecycle to guide technology development. DOE-driven Uncertainty Quantification techniques, including statistically rigorous model verification and validation (V&V) approaches, enable engineering teams to identify, quantify, and account for sources of variation and uncertainties in design parameters, and identify opportunities to make technologies more robust, reliable, and resilient earlier in the product’s lifecycle. Several recent armament engineering case studies - each with unique considerations and challenges - will be discussed.

    Speaker Info:

    Melissa Jablonski

    US Army

  • Speaker Info:

    Dave Duma

    Acting Director, Operational Test and Evaluation, Office of the Secretary of Defense

    DOT&E

    Mr. Duma is the Acting Director, Operational Test and Evaluation as of January 20, 2017. Mr. Duma was appointed as the Principal Deputy Director, Operational Test and Evaluation in January 2002. In this capacity he is responsible for all functional areas assigned to the office. He participates in the formulation, development, advocacy, and oversight of policies of the Secretary of Defense and in the development and implementation of test and test resource programs. He oversees the planning, conduct, analysis, evaluation, and reporting of operational and live fire testing. He serves as the Appropriation Director and Comptroller for the Operational Test and Evaluation, Defense Appropriation and coordinates all Planning, Programming, and Budgeting Execution matters. He previously served as Acting Director, Operational Test and Evaluation from February 2005 to July 2007 and again from May 2009 to September 2009.

    Mr. Duma also served as the Acting Deputy Director, Operational Test and Evaluation from January 1992 to June 1994. In this capacity he was responsible for oversight of the planning, conduct, analysis, and reporting of operational test and evaluation for all major conventional weapons systems in the Department of Defense. He supervised the development of evaluation plans and test program strategies, observed the conduct of operational test events, evaluated operational field tests of all armed services and submitted final reports for Congress.

    Mr. Duma returned to government service from the commercial sector. In private industry he worked on a variety of projects involving test and evaluation; requirements generation; command, control, communications, intelligence, surveillance and reconnaissance; modeling and simulation; and software development.

    Mr. Duma has 30 years of naval experience during which he was designated as a Joint Service Officer. He served as the Director, Test and Evaluation Warfare Systems for the Chief of Naval Operations, the Deputy Commander, Submarine Squadron TEN, and he commanded the nuclear powered submarine USS SCAMP (SSN 588).

    Mr. Duma holds Master of Science degrees in National Security and Strategic Studies and in Management. He holds a Bachelor of Science degree in Nuclear Engineering. He received the U.S. Presidential Executive Rank Award on two occasions: in 2008, the Meritorious Executive Award, and in 2015, the Distinguished Executive Rank Award. He is a member of the International Test and Evaluation Association.

  • "High Velocity Analytics for NASA JPL Mars Rover Experimental Design"

    Abstract:

    Rigorous characterization of system capabilities is essential for defensible decisions in test and evaluation (T&E). Analysis of designed experiments is not usually associated with “big” data analytics, as there are typically a modest number of runs, factors, and responses. The Mars Rover program has recently conducted several disciplined DOEs on prototype coring drill performance with approximately 10 factors along with scores of responses and hundreds of recorded covariates. The goal is to characterize the ‘at-this-time’ capability to confirm what the scientists and engineers already know about the system, answer specific performance and quality questions across multiple environments, and inform future tests to optimize performance. A ‘rigorous’ characterization required that not just one analytical path be taken, but a combination of interactive data visualization, classic DOE analysis screening methods, and newer methods from predictive analytics such as decision trees. With hundreds of response surface models across many test series and qualitative factors, the methods used had to efficiently find the signals hidden in the noise. Participants will be guided through an end-to-end analysis workflow with actual data from many tests (often Definitive Screening Designs) of the Rover prototype coring drill. We will show data assembly, data cleaning (e.g., missing values and outliers), data exploration with interactive graphical displays, variable screening, response partitioning, data tabulation, model building with stepwise and other methods, and model diagnostics. Software packages such as R and JMP will be used.
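    One step of such a workflow, model reduction by stepwise selection, might look like the following R sketch; the data, factor names, and model form are invented stand-ins, not the actual drill test data:

        set.seed(4)
        d <- data.frame(weight_on_bit = runif(60, 50, 150),
                        rpm           = runif(60, 100, 300),
                        rock          = factor(sample(c("basalt", "sandstone"), 60, TRUE)))
        d$rate <- with(d, 0.02 * weight_on_bit + 0.01 * rpm +
                          ifelse(rock == "basalt", -1, 1) + rnorm(60))
        full <- lm(rate ~ (weight_on_bit + rpm + rock)^2 + I(weight_on_bit^2) + I(rpm^2), d)
        step(full, direction = "both", trace = FALSE)    # AIC-based model reduction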

    Speaker Info:

    Jim Wisnowski

    Co-founder/Principal

    Adsurgo

    James Wisnowski provides training and consulting services in Design of Experiments, Predictive Analytics, Reliability Engineering, Quality Engineering, Text Mining, Data Visualization, and Forecasting to government and industry. Previously, he spent a career in analytics for the government. He retired from the Air Force having had leadership positions at the Pentagon, Air Force Academy, Air Force Operational Test and Evaluation Center, and units across the Air Force. He has published numerous papers in technical journals and presented several invited conference presentations. He was co-author of the Design and Analysis of Experiments by Douglas Montgomery: A Supplement for using JMP.

  • "High Velocity Analytics for NASA JPL Mars Rover Experimental Design"

    Abstract:

    Rigorous characterization of system capabilities is essential for defensible decisions in test and evaluation (T&E). Analysis of designed experiments is not usually associated with “big” data analytics, as there are typically a modest number of runs, factors, and responses. The Mars Rover program has recently conducted several disciplined DOEs on prototype coring drill performance with approximately 10 factors along with scores of responses and hundreds of recorded covariates. The goal is to characterize the ‘at-this-time’ capability to confirm what the scientists and engineers already know about the system, answer specific performance and quality questions across multiple environments, and inform future tests to optimize performance. A ‘rigorous’ characterization required that not just one analytical path be taken, but a combination of interactive data visualization, classic DOE analysis screening methods, and newer methods from predictive analytics such as decision trees. With hundreds of response surface models across many test series and qualitative factors, the methods used had to efficiently find the signals hidden in the noise. Participants will be guided through an end-to-end analysis workflow with actual data from many tests (often Definitive Screening Designs) of the Rover prototype coring drill. We will show data assembly, data cleaning (e.g., missing values and outliers), data exploration with interactive graphical displays, variable screening, response partitioning, data tabulation, model building with stepwise and other methods, and model diagnostics. Software packages such as R and JMP will be used.

    Speaker Info:

    Heath Rushing

    Co-founder/Principal

    Adsurgo

    Heath Rushing is the cofounder of Adsurgo and author of the book Design and Analysis of Experiments by Douglas Montgomery: A Supplement for using JMP. Previously, he was the JMP Training Manager at SAS, a quality engineer at Amgen, an assistant professor at the Air Force Academy, and a scientific analyst for OT&E in the Air Force. In addition, over the last six years, he has taught Science of Tests (SOT) courses to T&E organizations throughout the DoD.

  • A Statistical Tool for Efficient and Information-Rich Testing

    Abstract:

    Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy-to-interpret test outcomes. However, they are information-poor metrics and extremely expensive to test. The standard power calculations to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. In addition to being expensive, the test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics. A solution is to convert to a continuous variable, such as miss distance or time-to-detect. The common objection to switching to a continuous variable is that the hit/miss or detect/non-detect binomial information is lost, when the fraction of misses/no-detects is often the most important aspect of characterizing system performance. Furthermore, the new continuous metric appears to no longer be connected to the requirements document, which was stated in terms of a probability. These difficulties can be overcome with the use of censored data analysis. This presentation will illustrate the concepts and benefits of this approach, and will illustrate a simple analysis with data, including power calculations to show the cost savings of employing the methodology.
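    A minimal R sketch of the censored-data idea, using invented time-to-detect values with non-detects censored at a 10-second trial window, shows how the probability-based requirement can still be recovered from the continuous fit:

        library(survival)
        time   <- c(3.1, 5.4, 7.9, 10.0, 2.2, 10.0, 6.5, 10.0)   # hypothetical; 10.0 = end of window
        detect <- c(1, 1, 1, 0, 1, 0, 1, 0)                      # 1 = detected, 0 = censored
        fit <- survreg(Surv(time, detect) ~ 1, dist = "lognormal")
        plnorm(10, meanlog = coef(fit), sdlog = fit$scale)       # estimated P(detect within window)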

    Speaker Info:

    Bram Lillard

    Research Staff Member

    IDA

  • Acceptability of Radiation Detection Systems

    Abstract:

    The American National Standards Institute (ANSI) maintains a set of test standards that provide methods to characterize and determine the acceptability of radiation detection systems for use in Homeland security. With a focus on the environmental, electromagnetic, and mechanical functionality tests, we describe the test formulation and discuss challenges faced in administering the standard to include the assurance of comparable evaluations across multiple test facilities and the handling of systems that provide a non-standard, unit-less response. We present proposed solutions to these difficulties that are currently being considered in updated versions of the ANSI standards. We briefly describe a decision analytic approach that could allow for the removal of minimum performance requirements from the standards and enable the end user to determine system suitability based on operation-specific requirements.

    Speaker Info:

    Dennis Leber

    NIST

  • Advanced Regression

    Abstract:

    This course assumes that the students have previous exposure to simple and multiple linear regression topics, at least from their undergraduate education; however, the course does not assume that the students are very current with this material. The course goal is to provide more insights and details into some of the more important topics. The presentation emphasizes the use of software for performing the analysis.

    Speaker Info:

    Geoff Vining

    Professor

    Virginia Tech University

    Geoff Vining is a Professor of Statistics at Virginia Tech. From 1999 - 2006, he also was the department head. He currently is a member of the American Society for Quality (ASQ) Board of Directors and Past-Chair of the ASQ Technical Communities Council. In 2016, he serves as the ASQ Treasurer. He is a Fellow of the ASQ, a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute. Dr. Vining served as Editor of the Journal of Quality Technology from 1998 - 2000 and as Editor-in-Chief of Quality Engineering from 2008-2009. Dr. Vining has authored or co-authored three textbooks, all in multiple editions. He has won several of the most important awards in industrial statistics/quality engineering, including the ASQ's Shewhart Medal, Brumbaugh Award, and Hunter Award, along with the ENBIS Box Medal. He is an internationally recognized expert in the use of experimental design for quality and productivity improvement and in the application of statistical process control. He has extensive consulting experience, most recently with NASA and the U.S. Department of Defense.

  • Aerospace Measurement and Experimental System Development Characterization

    Abstract:

    Co-Authors: Sean A. Commo, Ph.D., P.E., and Peter A. Parker, Ph.D., P.E. (NASA Langley Research Center, Hampton, Virginia, USA); Austin D. Overmeyer, Philip E. Tanner, and Preston B. Martin, Ph.D. (U.S. Army Research, Development, and Engineering Command, Hampton, Virginia, USA). The application of statistical engineering to helicopter wind-tunnel testing was explored during two powered rotor entries. The U.S. Army Aviation Development Directorate Joint Research Program Office and the NASA Revolutionary Vertical Lift Project performed these tests jointly at the NASA Langley Research Center. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small segment of the overall tests devoted to developing case studies of a statistical engineering approach. Data collected during each entry were used to estimate response surface models characterizing vehicle performance, a novel contribution of statistical engineering applied to powered rotor-wing testing. Additionally, a 16- to 47-times reduction in the number of data points required was estimated when comparing a statistically-engineered approach to a conventional one-factor-at-a-time approach.

    Speaker Info:

    Ray Rhew

    NASA

  • An Engineer and a Statistician Walk into a Bar: A Statistical Engineering Negotiation

    Abstract:

    A question arose as to whether the effect of post-processing on the flammability of a given metal in an oxygen-enriched environment could be detected with the sensitivity of a standard test that is typically used to measure differences in flammability between different metals. The principal investigator was familiar with design of experiments (DOE) and wanted to optimize the value of the information to be gained. This talk will focus on the interchange between the engineer and a statistician. Their negotiations will illustrate the process of clarifying the problem, planning the test (including choosing factors and responses), running the experiment, and analyzing and reporting on the data. It will focus on the choices made to help squeeze extra knowledge from each test run and leverage statistics to produce increased engineering insight.

    Speaker Info:

    Ken Johnson

    NASA

  • Application of Statistical Engineering to Mixture Problems With Process Variables

    Abstract:

    Statistical engineering has been defined as: “The study of how to best utilize statistical concepts, methods, and tools, and integrate them with IT and other relevant sciences to generate improved results.” A key principle is that significant untapped benefits are often achievable by integrating multiple methods in novel ways to address a problem, without having to invent new statistical techniques. In this presentation, we discuss the application of statistical engineering to the problem of design and analysis of mixture experiments when process variables are also involved. In such cases, models incorporating interaction between the mixture and process variables have been developed, but tend to require large designs and models. By considering models nonlinear in the parameters, also well known in the literature, we demonstrate how experimenters can utilize an alternative, iterative strategy in attacking such problems. We show that this strategy potentially saves considerable experimental time and effort, while producing models that are nearly as accurate as much larger linear models. These results are illustrated using two published data sets and one new data set, all involving interactive mixture and process variable problems.

    Speaker Info:

    Roger Hoerl

    Union College

  • Bayesian Adaptive Design for Conformance Testing with Bernoulli Trials

    Abstract:

    Co-authors: Adam L. Pintar, Blaza Toman, and Dennis Leber. A task of the Domestic Nuclear Detection Office (DNDO) is the evaluation of radiation and nuclear (rad/nuc) detection systems used to detect and identify illicit rad/nuc materials. To obtain estimated system performance measures, such as probability of detection, and to determine system acceptability, the DNDO sometimes conducts large-scale field tests of these systems at great cost. Typically, non-adaptive designs are employed where each rad/nuc test source is presented to each system under test a predetermined and fixed number of times. This approach can lead to unnecessary cost if the system is clearly acceptable or unacceptable. In this presentation, an adaptive design with Bayesian decision theoretic foundations is discussed as an alternative to, and contrasted with, the more common single stage design. Although the basis of the method is Bayesian decision theory, designs may be tuned to have desirable type I and II error rates. While the focus of the presentation is a specific DNDO example, the method is applicable widely. Further, since constructing the designs is somewhat compute-intensive, software in the form of an R package will be shown and is available upon request.
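    A deliberately simplified R sketch of the adaptive idea (a plain Beta-Binomial update with an ad hoc stopping rule, not the decision-theoretic design in the talk or the R package mentioned above):

        set.seed(5)
        a <- 1; b <- 1                                       # Beta(1, 1) prior on detection probability
        for (trial in 1:200) {
          y <- rbinom(1, 1, 0.97)                            # simulated detection outcome
          a <- a + y; b <- b + (1 - y)                       # posterior update
          p_meets <- pbeta(0.90, a, b, lower.tail = FALSE)   # P(detection prob > 0.90 requirement)
          if (p_meets > 0.95 || p_meets < 0.05) break        # stop once clearly acceptable or not
        }
        c(trials_used = trial, posterior_prob_meets = p_meets)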

    Speaker Info:

    Adam Pintar

    NIST

  • Bayesian Data Analysis in R/STAN

    Abstract:

    In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that ignore all available information. The Bayesian paradigm is tailor made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&E. This tutorial briefly introduces the basic concepts of Bayesian Statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Common Analytical Laboratory System (CALS) with different agents and matrices. Examples are also presented using RStan, a high-performance open-source software for Bayesian inference on multi-level models.
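    For attendees who have not seen RStan syntax, a toy version of the kind of model the tutorial builds (a single pass/fail reliability probability, with made-up data) looks like this:

        library(rstan)
        model_code <- "
          data { int<lower=0> n; int<lower=0, upper=n> y; }
          parameters { real<lower=0, upper=1> p; }
          model { p ~ beta(2, 2); y ~ binomial(n, p); }
        "
        fit <- stan(model_code = model_code, data = list(n = 20, y = 18),
                    iter = 2000, chains = 4)                 # hypothetical: 18 successes in 20 trials
        print(fit, pars = "p")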

    Speaker Info:

    Kassandra Fronczyk

    IDA

  • Bayesian Data Analysis in R/STAN

    Abstract:

    In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that ignore all available information. The Bayesian paradigm is tailor made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&E. This tutorial briefly introduces the basic concepts of Bayesian Statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Common Analytical Laboratory System (CALS) with different agents and matrices. Examples are also presented using RStan, a high-performance open-source software for Bayesian inference on multi-level models.

    Speaker Info:

    James Brownlow

    U.S. Air Force 812TSS/ENT

  • Bayesian Estimation of Reliability Growth

    Speaker Info:

    Jim Brownlow

    U.S. Air Force 812TSS/ENT

  • Case Studies for Statistical Engineering Applied to Powered Rotorcraft Wind-Tunnel Tests

    Abstract:

    Co-Authors: Sean A. Commo, Ph.D., P.E., and Peter A. Parker, Ph.D., P.E. (NASA Langley Research Center, Hampton, Virginia, USA); Austin D. Overmeyer, Philip E. Tanner, and Preston B. Martin, Ph.D. (U.S. Army Research, Development, and Engineering Command, Hampton, Virginia, USA). The application of statistical engineering to helicopter wind-tunnel testing was explored during two powered rotor entries. The U.S. Army Aviation Development Directorate Joint Research Program Office and the NASA Revolutionary Vertical Lift Project performed these tests jointly at the NASA Langley Research Center. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small segment of the overall tests devoted to developing case studies of a statistical engineering approach. Data collected during each entry were used to estimate response surface models characterizing vehicle performance, a novel contribution of statistical engineering applied to powered rotor-wing testing. Additionally, a 16- to 47-times reduction in the number of data points required was estimated when comparing a statistically-engineered approach to a conventional one-factor-at-a-time approach.

    Speaker Info:

    Sean Commo

    NASA

  • Combining Information for Reliability Assessment - Tuesday Morning

    Speaker Info:

    Alyson Wilson

    North Carolina State University

  • Design and Analysis of Margin Testing in Support of Product Qualification for High Reliability Systems

    Speaker Info:

    Justin Newcomer

    Sandia National Lab

  • Experiences in Reliability Analysis

    Abstract:

    Reliability assurance processes in manufacturing industries require data-driven information for making product-design decisions. Life tests, accelerated life tests, and accelerated degradation tests are commonly used to collect reliability data. Data from products in the field provide another important source of useful reliability information. Due to complications like censoring, multiple failure modes, and the need for extrapolation, these reliability studies typically yield data that require special statistical methods. This presentation will describe the analyses of a collection of different life data analysis applications in the area of product reliability. Methods used in the analyses include Weibull and lognormal analysis, analysis of data with multiple failure modes, accelerated test analysis, analysis of both repeated measures and destructive degradation data and the analysis of recurrence data from repairable systems.

    Speaker Info:

    Bill Meeker

    Professor

    Iowa State University

    William Meeker is Professor of Statistics and Distinguished Professor of Liberal Arts and Sciences at Iowa State University. He is a Fellow of the American Statistical Association, the American Society for Quality, and the American Association for the Advancement of Science, and a past Editor of Technometrics. He is co-author of the books Statistical Methods for Reliability Data with Luis Escobar (1998), and Statistical Intervals with Gerald Hahn (1991), and of numerous publications in the engineering and statistical literature. He has won numerous awards for his research and service, including the Brumbaugh, Hunter, Sacks, Shewhart, Youden, and Wilcoxon awards. He has done research and consulted extensively on problems in reliability data analysis, warranty analysis, accelerated testing, nondestructive evaluation, and statistical computing.

  • Importance of Modeling and Simulation in Testing and the Need for Rigorous Validation

    Abstract:

    Modeling and simulation (M&S) is often an important element of operational evaluations of effectiveness, suitability, survivability, and lethality. For example, the testing of new systems designed to operate against advanced foreign threats, as well as the testing of systems of systems, will involve the use of M&S to examine scenarios that cannot be created using live testing. In order to have an adequate understanding of, and confidence in, the results obtained from M&S, statistically rigorous techniques should be applied to the validation process wherever possible. Design of experiments methodologies should be employed to determine what live and simulation data are needed to support rigorous validation, and formal statistical tests should be used to compare live and simulated data. This talk will discuss the importance of M&S in operational testing through a few examples, and outline several statistically rigorous techniques for validation.

    Speaker Info:

    Kelly McGinnity

    IDA

  • Introduction to Bayesian

    Abstract:

    This course will cover the basics of the Bayesian approach to practical and coherent statistical inference. Particular attention will be paid to computational aspects, including MCMC. Examples will run the gamut from toy illustrations to real-world data analyses from all areas of science, with R implementations provided.

    Speaker Info:

    Robert Gramacy

    Associate Professor

    University of Chicago

    Professor Gramacy is an Associate Professor of Econometrics and Statistics in the Booth School of Business, and a fellow of the Computation Institute at The University of Chicago. His research interests include Bayesian modeling methodology, statistical computing, Monte Carlo inference, nonparametric regression, sequential design, and optimization under uncertainty. He specializes in areas of real-data analysis where the ideal modeling apparatus is impractical, or where the current solutions are inefficient and thus skimp on fidelity.

  • Introduction to Survey Design

    Abstract:

    Surveys are a common tool for assessing user experiences with systems in various stages of development. This mini-tutorial introduces the social and cognitive processes involved in survey measurement and addresses best practices in survey design. Clarity of question wording, appropriate scale use, and methods for reducing survey-fatigue are emphasized. Attendees will learn practical tips to maximize the information gained from user surveys and should bring paper and pencils to practice writing and evaluating questions.

    Speaker Info:

    Heather Wojton

    Research Staff Member

    IDA

  • Introduction to Survey Design

    Abstract:

    Surveys are a common tool for assessing user experiences with systems in various stages of development. This mini-tutorial introduces the social and cognitive processes involved in survey measurement and addresses best practices in survey design. Clarity of question wording, appropriate scale use, and methods for reducing survey-fatigue are emphasized. Attendees will learn practical tips to maximize the information gained from user surveys and should bring paper and pencils to practice writing and evaluating questions.

    Speaker Info:

    Justin Mary

    Research Staff Member

    IDA

  • Introduction to Survey Design

    Abstract:

    Surveys are a common tool for assessing user experiences with systems in various stages of development. This mini-tutorial introduces the social and cognitive processes involved in survey measurement and addresses best practices in survey design. Clarity of question wording, appropriate scale use, and methods for reducing survey-fatigue are emphasized. Attendees will learn practical tips to maximize the information gained from user surveys and should bring paper and pencils to practice writing and evaluating questions.

    Speaker Info:

    Jonathan Snavely

    IDA

  • Leveraging Design for Variation to Improve both Testing and Design - A Case Study on Probabilistic Design of Bearings

    Abstract:

    In this case study we demonstrate an application of Pratt & Whitney’s “Design for Variation” discipline applied to the task of a roller bearing design. The ultimate goal, in this application, was to utilize test data from the “real world” to calibrate a computer model used for design and ensure that roller bearing designs obtained from this model were optimized for maximum robustness to major sources of variation in bearing manufacture and operation. The “Design for Variation” process provides engineers with many useful analysis results even before real world data is applied: high fidelity sensitivity analysis, uncertainty analysis (quantifying a baseline risk of failing to meet design intent) and model verification. The combining of real world data and Bayesian statistical methods that Design for Variation employs to calibrate models, however, goes a step further, validating the accuracy of the model’s outputs and quantifying any bias between the model and the real world. As a result of this application, the designers were able to identify the sources of the bias and correct the model’s physics-based aspects to more accurately model reality. The improved model is now integrated into all successive bearing design activities. The benefits of this method, which only required a small amount of high quality test data, are now available to all present and future roller bearing designs.

    Speaker Info:

    Jaime O'Connell

    Pratt and Whitney

  • Lunch with Keynote Leadership Perspective

    Speaker Info:

    Jon Hollday

    NASA

  • Lunch With Leadership Perspective - Wednesday PM

    Speaker Info:

    David Brown

    Deputy Assistant Secretary of Defense

    Developmental Test & Evaluation

  • Making Statistically Defensible Testing 'The way we do things around here'

    Abstract:

    For the past 7 years, USAF DT&E has been exploring ways to adapt the principles of experimental design to rapidly evolving developmental test articles and test facilities, often with great success. This paper discusses three case studies that span the range of USAF DT&E activities from EW to ground test to flight test and shows the truly revolutionary impact Fisher's DOE can have on development.
    The Advanced Strategic and Tactical Expendable (ASTE) testing began in 1990 to develop, enhance, and test new IR flares, flare patterns, and dispense tactics. More than 60 aircraft and flare types have been tested. The typical output is a "pancake plot" of 200+ "cases" of flare, aspect angle, range, elevation, and flare effectiveness using a stop-light chart (red-yellow-green) approach. Usual testing involves roughly 3,000 flare engagements and costs $1M to participate. The response, troublingly enough, is binary: 15-30 binomial response trials measuring P(decoy), and binary responses are information-poor. Legacy testing does not assess statistical power in reporting P(decoy) results. Analysts investigated replacing P(decoy) with continuous metrics, e.g., time to decoy; this research is ongoing. We found we could spread the replicates out to examine 3x to 5x more test conditions without affecting power materially. Analysis with the generalized linear model (GLZ) replaced the legacy "cases" analysis with a 75% improvement to confidence intervals using the same data. We are seeking to build a Monte Carlo simulation to estimate how many runs are required in a logistic regression model to achieve adequate power, and we hope to reduce customer expenditures for flare information by as much as 50%. Co-authors: J. Higdon, B. Knight.
    AEDC completed a major upgrade with new nozzle hardware to vary Mach. The Arnold Transonic Wind Tunnel 4T spans the range of flows from subsonic to approximately M9. The new wind tunnel was to be computer controlled, and a number of key instrumentation improvements were made at the same time. The desire was to calibrate the resulting modified tunnel; the last calibration of 4T was 25 years ago, in 1990. The calibration ranged across the full range of Mach and pressure capabilities, spanning a four-dimensional space: pressure, Mach, wall angle, and wall porosity. Both the traditional OFAT effort (vary one factor at a time) and a parallel DOE effort were run to compare design, execution, modeling, and prediction capabilities against cost and time to run. The robust embedded face-centered CCD DOE design (J. Simpson and D. Landman '05) employed 75 runs vs. 176 OFAT runs, a 57% run savings. Due to an admirable discipline in randomization during the DOE trials, the smaller design required longer to run. As a result of using the DOE approach, engineers found it easier to predict off-condition tunnel operating characteristics using RSM models and to optimize facility flow quality for any given test condition. In addition, the RSM regression models support future "spot check" calibration by comparing predictions to measured values: if the measurement falls within the prediction interval, the existing calibration is still appropriate. AEDC is also using split-plot style designs for current wind tunnel probe calibration. Co-author: Dr. Doug Garrard.
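    The Monte Carlo power study mentioned for the logistic regression model can be sketched in a few lines of R; the effect size, factor coding, and run counts below are placeholders, not ASTE values:

        set.seed(6)
        power_sim <- function(n, beta1 = 1.0, reps = 500) {
          mean(replicate(reps, {
            aspect <- runif(n, -1, 1)                          # coded test-condition factor
            y <- rbinom(n, 1, plogis(0.5 + beta1 * aspect))    # simulated decoy outcomes
            fit <- glm(y ~ aspect, family = binomial)
            summary(fit)$coefficients["aspect", "Pr(>|z|)"] < 0.05
          }))
        }
        sapply(c(50, 100, 200), power_sim)                     # estimated power vs. number of runs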

    Speaker Info:

    Greg Hutto

    Air Force 96th Test Wing

  • Managing Uncertainty in the Context of Risk Acceptance Decision Making at NASA: Thinking Beyond the Model

    Abstract:

    NASA has instituted requirements for establishing Agency-level safety thresholds and goals that define “long-term targeted and maximum tolerable levels of risk to the crew as guidance to developers in evaluating ‘how safe is safe enough’ for a given type of mission.” With the adoption of this policy for human space flight and with ongoing Agency efforts to increase formality in the development and review of the basis for risk acceptance, the decision-support demands placed on risk models are becoming more stringent at NASA. While these models play vital roles in informing risk acceptance decisions, they are vulnerable to incompleteness of risk identification, as well as to incomplete understanding of probabilities of occurrence, potentially leaving a substantial portion of the actual risk unaccounted for, especially for new systems. This presentation argues that management of uncertainty about the “actual” safety performance of a system must take into account the contribution of unknown and/or underappreciated (UU) risk. Correspondingly, responsible risk-acceptance decision-making requires the decision-maker to think beyond the model and address factors (e.g., organizational and management factors) that live outside traditional engineering risk models. This presentation advocates the use of a safety-case approach to risk acceptance decision-making.

    Speaker Info:

    Homayoon Dezfuli

    NASA

  • Operational Cybersecurity Testing

    Abstract:

    The key to acquiring a cybersecure system is the ability to drive considerations about security from the operational level into the tactics and procedures. For T&E to support development and acquisition decisions, we must also adopt the perspective that a cyberattack is an attack on the mission using technology. A well-defined process model linking tools, tasks, and operators to mission performance supports this perspective. We will discuss an approach based on best practices learned from various DHS programs.

    Speaker Info:

    Alex Hoover

    DHS

  • Overview of Design of Experiments

    Abstract:

    Well-designed experiments are a powerful tool for developing and validating cause-and-effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. This course is focused on helping you and your organization make the most effective utilization of DOX. Software usage is fully integrated into the course.

    Speaker Info:

    Doug Montgomery

    Professor

    Arizona State University

    Douglas C. Montgomery, Ph.D., is Regent's Professor of Industrial Engineering and Statistics and ASU Foundation Professor of Engineering at Arizona State University. He was the John M. Fluke Distinguished Professor of Engineering, Director of Industrial Engineering and Professor of Mechanical Engineering at the University of Washington in Seattle. He was a Professor of Industrial and Systems Engineering at the Georgia Institute of Technology. He holds BSIE, MS and Ph.D. degrees from Virginia Tech. Dr. Montgomery's industrial experience includes engineering assignments with Union Carbide Corporation and Eli Lilly and Company. He also has extensive consulting experience. Dr. Montgomery's professional interests focus on industrial statistics, including design of experiments, quality and reliability engineering, applications of linear models, and time series analysis and forecasting. He also has interests in operations research and statistical methods applied to modeling and analyzing manufacturing systems. He was a Visiting Professor of Engineering at the Monterey Institute of Technology in Monterey, Mexico, and a University Distinguished Visitor at the University of Manitoba. Dr. Montgomery has conducted basic research in empirical stochastic modeling, process control, and design of experiments. The Department of Defense, the Office of Naval Research, the National Science Foundation, the United States Army, and private industry have sponsored his research. He has supervised 66 doctoral dissertations and over 40 MS theses and MS Statistics Projects.

  • Power Analysis Concepts

    Speaker Info:

    Jim Simpson

    JK Analytics

  • Presenting Complex Statistical Methodologies to Military Leadership

    Abstract:

    More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions.

    Speaker Info:

    Jane Pinelis

    Johns Hopkins University, Applied Physics Lab

  • Recent use of Statistical Methods in NASA Aeroscience Testing Research and Development Activities

    Abstract:

    Over the past 10 years, a number of activities have incorporated statistical methods for the purpose of database development and associated uncertainty modeling. This presentation will highlight approaches taken in aerodynamic database development for space vehicle projects, specifically the Orion spacecraft and abort vehicle, and the Space Launch System (SLS) launch vehicle. Additionally, statistical methods have been incorporated into the Commercial Supersonic Transport Project for test technique development and optimization as well as a certification prediction methodology, which is planned to be verified with the Low-Boom Flight Demonstrator data. Discussion will conclude with the use of statistical methods for quality control and assurance in the NASA Langley Research Center ground testing facilities related to our Check Standard Project and characterization and calibration testing.

    Speaker Info:

    Eric Walker

    NASA

  • Reliability Growth in T&E - Summary of National Research Council's Committee on National Statistics Report Findings-Tuesday Morning

    Speaker Info:

    Art Fries

    Research Staff Member

    IDA

  • Risk Analysis for Orbital Debris Wire Harness Failure Assessment for the Joint Polar Satellite System

    Abstract:

    This paper presents the results of two hypervelocity impact failure probabilistic risk assessments for critical wire bundles exposed aboard the Joint Polar Satellite System (JPSS-1) to an increased orbital debris environment at its 824 km, 98.8 deg inclination orbit. The first “generic” approach predicted the number of wires broken by orbital debris ejecta emerging from normal impact with multi-layer insulation (MLI) covering 36-, 18-, and 6-strand wire bundles at a 5 cm standoff, using a smoothed particle hydrodynamics (SPH) code. It also included a mathematical method for computing the probability that redundant wires were impacted and then severed within the bundle. Based in part on the high computed risk of a critical wire bundle failure from the generic approach, an enhanced orbital debris protection design was examined, consisting of betacloth-reinforced MLI suspended at a 5 cm standoff over a seven-layer betacloth and Kevlar blanket, draped over the exposed wire bundles. A second SPH-based risk assessment was conducted that also included the beneficial effects of the high (75 degree) obliquity of orbital debris impact and shadowing by other spacecraft components, and it resulted in a considerably reduced likelihood of critical wire bundle failure compared to the original baseline design.
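
    As a deliberately simplified illustration of that kind of redundancy calculation (not the assessment's actual method), the snippet below assumes a single ejecta event breaks k strands chosen uniformly at random from an N-strand bundle and computes the chance that both wires of one redundant pair are severed; the broken-strand counts are notional.

        # Combinatorial probability that a specific redundant wire pair is severed (illustrative only).
        from math import comb

        def p_redundant_pair_severed(n_strands, n_broken):
            """P(both wires of one specific redundant pair are among the broken strands)."""
            n_broken = min(n_broken, n_strands)
            if n_broken < 2:
                return 0.0
            return comb(n_strands - 2, n_broken - 2) / comb(n_strands, n_broken)

        # Bundle sizes come from the abstract; the broken-strand counts are purely notional.
        for bundle in (36, 18, 6):
            probs = [round(p_redundant_pair_severed(bundle, k), 4) for k in (2, 4, 8)]
            print(f"{bundle}-strand bundle, k = 2, 4, 8 broken: {probs}")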

    Speaker Info:

    Joel Williamsen

    IDA

  • Sensitivity Experiments

    Abstract:

    A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile’s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size and show how sensitivity experiments can mitigate this problem. We review eight different single-covariate sensitivity experiments and present a comparison of these designs using simulation. Additionally, we cover sensitivity experiments for cases that include more than one covariate and highlight recent research in this area. The mini-tutorial concludes with a case study by Greg Hutto on Army grenade fuze testing, titled “Preventing Premature ZAP: EMPATHY Capacitive Design With 3 Phase Optimal Design (3pod).”
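
    As a minimal sketch of the kind of single-covariate design the tutorial reviews, the snippet below simulates a classic up-and-down procedure on purely notional armor-penetration data and fits a logistic model to the binary outcomes; the true response curve, step size, and sample size are assumptions chosen for illustration only.

        # Notional up-and-down sensitivity experiment with a logistic fit (illustrative only).
        import numpy as np
        import statsmodels.api as sm
        from scipy.stats import norm

        rng = np.random.default_rng(1)
        true_mu, true_sigma = 2500.0, 75.0      # notional velocity (ft/s) at 50% penetration, and spread

        def shoot(velocity):
            """Simulate one binary outcome: True = penetration, False = no penetration."""
            return rng.random() < norm.cdf(velocity, true_mu, true_sigma)

        # Up-and-down rule: step the velocity down after a penetration, up after a non-penetration.
        step, v = 50.0, 2400.0
        velocities, outcomes = [], []
        for _ in range(30):                     # deliberately limited sample size
            y = shoot(v)
            velocities.append(v)
            outcomes.append(int(y))
            v = v - step if y else v + step

        # Fit a binary-response GLM and report the estimated 50% penetration velocity.
        X = sm.add_constant(np.array(velocities))
        fit = sm.GLM(np.array(outcomes), X, family=sm.families.Binomial()).fit()
        b0, b1 = fit.params
        print(f"Estimated 50% point: {-b0 / b1:.0f} ft/s (true value {true_mu:.0f})")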

    Speaker Info:

    Thomas Johnson

    Research Staff Member

    IDA

  • Sensitivity Experiments

    Abstract:

    A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile’s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size and show how sensitivity experiments can mitigate this problem. We review eight different single-covariate sensitivity experiments and present a comparison of these designs using simulation. Additionally, we cover sensitivity experiments for cases that include more than one covariate and highlight recent research in this area. The mini-tutorial concludes with a case study by Greg Hutto on Army grenade fuze testing, titled “Preventing Premature ZAP: EMPATHY Capacitive Design With 3 Phase Optimal Design (3pod).”

    Speaker Info:

    Greg Hutto

    U.S. Air Force, 96 Test Wing

  • STAT Engineering Keynote-Wednesday AM

    Speaker Info:

    Christine Anderson-Cook

    Statistics

    Los Alamos National Lab

    At Los Alamos National Laboratory, Christine Anderson-Cook has led projects in the areas of complex system reliability, non-proliferation, malware detection, and statistical process control. Before joining LANL, she was a faculty member in the Department of Statistics at Virginia Tech for 8 years. Her research areas include response surface methodology, design of experiments, reliability, multiple criterion optimization, and graphical methods. She has authored more than 130 articles in peer-reviewed statistics and quality journals and has been a long-time contributor to the Quality Progress Statistics Roundtable column. In 2012, she edited a special issue of Quality Engineering on statistical engineering with Lu Lu. She is an elected fellow of the American Statistical Association and the American Society for Quality. In 2012 she was honored with the ASQ Statistics Division William G. Hunter Award, and in 2011 she received the 26th Annual Governor's Award for Outstanding New Mexico Women.

  • Statistical Models for Combining Information: Stryker Reliability Case Study

    Speaker Info:

    Rebecca Dickinson

    IDA

  • Statistically Defensible Experiment Design for Wind Tunnel Characterization of Subscale Parachutes for Mission to Mars

    Abstract:

    https://s3.amazonaws.com/workshop-archives-2016/IDA+Workshop+2016/testsciencemeeting.ida.org/pdfs/1b-ExperimentalDesignMethodsandApplications.pdf

    Speaker Info:

    Drew Landman

    Old Dominion University

  • Supersaturated Designs: Construction and Analysis

    Abstract:

    An important property of any experimental design is its ability to detect active factors. For supersaturated designs, in which model parameters outnumber experimental runs, power is even more critical. In this talk, we review several popular supersaturated design construction criteria and analysis methods. We then demonstrate how simulation studies can help practitioners select a supersaturated design with regard to power to detect active factors. One of our findings, based on an extensive simulation study, is that although differences clearly exist among analysis methods, most supersaturated design construction methods are indistinguishable in terms of power. This conclusion can be reassuring for practitioners, as supersaturated designs can then be sensibly chosen based on convenience. For instance, Bayesian D-optimal supersaturated designs can be easily constructed in JMP and SAS for any run size and number of factors. On the other hand, software for constructing E(s²)-optimal supersaturated designs is not as accessible.
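
    A minimal sketch of such a power simulation appears below; it assumes a hypothetical 14-run, 24-factor scenario with three active factors and a lasso-based analysis, and the design is just a random two-level matrix rather than one built from the construction criteria reviewed in the talk.

        # Simulation-based estimate of power to detect active factors in a supersaturated setting.
        import numpy as np
        from sklearn.linear_model import LassoCV

        rng = np.random.default_rng(0)
        n_runs, n_factors = 14, 24              # more factors than runs: supersaturated
        active = [0, 1, 2]                      # indices of the truly active factors
        betas = np.zeros(n_factors)
        betas[active] = 3.0                     # assumed effect size for each active factor

        def one_rep():
            X = rng.choice([-1.0, 1.0], size=(n_runs, n_factors))   # random two-level design
            y = X @ betas + rng.normal(scale=1.0, size=n_runs)
            coef = LassoCV(cv=5).fit(X, y).coef_
            declared = np.flatnonzero(np.abs(coef) > 1e-8)          # factors the lasso keeps
            return np.mean([a in declared for a in active])         # fraction of actives detected

        power = np.mean([one_rep() for _ in range(200)])
        print(f"Estimated power to detect an active factor: {power:.2f}")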

    Speaker Info:

    David Edwards

    Virginia Commonwealth University

  • Technical Leadership Panel-Tuesday Afternoon

    Speaker Info:

    Catherine Warner

    Science Advisor

    DOT&E

  • Technical Leadership Panel-Tuesday Afternoon

    Speaker Info:

    Paul Roberts

    Chief Engineer

    NASA Engineering and Safety Center

  • Technical Leadership Panel-Tuesday Afternoon

    Speaker Info:

    Frank Peri

    Deputy Director

    Langley Engineering Directorate

  • Technical Leadership Panel-Tuesday Afternoon

    Speaker Info:

    Peter Matisoo

    Technical Director

    COTF

  • Technical Leadership Panel-Tuesday Afternoon

    Speaker Info:

    Jeff Olinger

    Technical Director

    AFOTEC

  • The Bootstrap World

    Abstract:

    Bootstrapping is a powerful tool for statistical estimation and inference. In this tutorial, we will use operational test scenarios to provide context when exploring examples ranging from the simple (estimating a sample mean) to the complex (estimating a confidence interval for system availability). Areas of focus will include point estimates, confidence intervals, parametric bootstrapping and hypothesis testing with the bootstrap. The strengths and weaknesses of bootstrapping will also be discussed.
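
    A minimal sketch of the nonparametric percentile bootstrap applied to the system-availability example mentioned above, using notional uptime and repair-time data rather than results from any operational test:

        # Percentile bootstrap confidence interval for availability = uptime / (uptime + downtime).
        import numpy as np

        rng = np.random.default_rng(7)
        uptimes = rng.exponential(scale=40.0, size=25)    # notional hours between failures
        downtimes = rng.exponential(scale=3.0, size=25)   # notional repair times (hours)

        def availability(up, down):
            return up.sum() / (up.sum() + down.sum())

        point_estimate = availability(uptimes, downtimes)

        # Resample paired uptime/downtime records with replacement and recompute the statistic.
        B = 5000
        boot = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, len(uptimes), size=len(uptimes))
            boot[b] = availability(uptimes[idx], downtimes[idx])

        lo, hi = np.percentile(boot, [2.5, 97.5])
        print(f"Availability = {point_estimate:.3f}, 95% percentile bootstrap CI ({lo:.3f}, {hi:.3f})")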

    Speaker Info:

    Matt Avery

    Research Staff Member

    IDA

  • The Sixth Sense: Clarity through Statistical Engineering

    Abstract:

    Two responses to an expensive, time-consuming test on a final product will be referred to as “adhesion” and “strength”. A screening test was performed on compounds that comprise the final product. These screening tests are multivariate profile measurements. Previous models for predicting the results of the expensive, time-consuming test lacked accuracy and precision. Data visualization was used to guide a statistical engineering model that makes use of multiple statistical techniques. The modeling approach raised some interesting statistical questions for partial least squares models regarding over-fitting and cross-validation. Ultimately, the model interpretation and the visualization both made engineering sense and led to interesting insights regarding the product development program and the screening compounds.
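
    A minimal sketch of the kind of cross-validation check that the over-fitting question above calls for, choosing the number of partial least squares components on synthetic profile data rather than the adhesion and strength measurements described in the talk:

        # Cross-validated selection of PLS component count as a guard against over-fitting.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(3)
        n, p = 40, 60                                     # more profile measurements than samples
        X = rng.normal(size=(n, p))
        y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

        for k in range(1, 9):
            mse = -cross_val_score(PLSRegression(n_components=k), X, y,
                                   cv=5, scoring="neg_mean_squared_error").mean()
            print(f"{k} components: cross-validated MSE = {mse:.2f}")
        # The component count minimizing cross-validated error is the natural choice; error that
        # keeps shrinking only on the training data is the over-fitting symptom in question.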

    Speaker Info:

    Jennifer Van Mullekom

    DuPont

  • Three Case Studies Comparing Traditional versus Modern Test Designs

    Abstract:

    There are many testing situations that historically involve a large number of runs. The use of experimental design methods can reduce the number of runs required to obtain the desired information. Example applications include wind tunnel test campaigns, computational experiments, and live fire tests. In this work we present three case studies, conducted under the auspices of the Science of Test Research Consortium, comparing the information obtained via a historical experimental approach with the information obtained via an experimental design approach. The first case study involves a large-scale wind tunnel experimental campaign. The second involves a computational fluid dynamics model of a missile across various speeds and angles of attack. The third involves ongoing live fire hot surface testing. In each case, the results suggest a tremendous opportunity to reduce experimental test effort without losing test information.

    Speaker Info:

    Ray Hill

    Air Force Institute of Technology

  • Using Sequential Testing to Address Complex Full-Scale Live Fire Test and Evaluation

    Abstract:

    Co-authors: Dr. Darryl Ahner, Director, STAT COE; Dr. Lenny Truett, STAT COE; Mr. Scott Wacker, 96 TG/OL-ACS. This presentation will describe the benefits of sequential testing and demonstrate how it can be used to address complex test conditions by developing well-controlled early experiments to explore basic questions before proceeding to full-scale testing. This approach can result in increased knowledge and decreased cost. As of FY13 the Air Force had spent an estimated $47M on dry bay fire testing, making fire the largest cost contributor for Live Fire Test and Evaluation (LFT&E) programs. There is currently an estimated 60% uncertainty in total platform vulnerable area (Av) driven by probability of kill (PK) due to ballistically ignited fires. A large part of this uncertainty comes from the fact that current spurt modeling does not predict fuel spurt delay with reasonable accuracy despite a large amount of test data. A low-cost sequential approach was developed to improve spurt models. Initial testing used a spherical projectile to test 10 different factors in a definitive screening design. Once the list of factors was refined, a second phase of testing determined whether a suitable methodology could be developed to scale results using water as a surrogate for JP-8 fuel. Finally, testing was performed with cubical projectiles to evaluate the effect of fragment orientation. The entire cost for this effort was less than the cost of one or two typical full-scale live fire tests.

    Speaker Info:

    Darryl Ahner

    AFIT, STAT COE

  • Welcoming & Opening Keynote-Tuesday AM

    Speaker Info:

    Mike Gilmore

    Director

    DOT&E

  • Speaker Info:

    Tye Botting

    Research Staff Member

    IDA

  • Abstract:

    This tutorial will provide attendees with a live demonstration of an open-source software reliability tool that automatically applies models to data. Functionality to be illustrated includes how to: select and view data in time-between-failures, cumulative-failures, and failure-intensity formats; apply trend tests to determine whether a data set exhibits reliability growth, which is a prerequisite for applying software reliability growth models; apply models to a data set; apply measures of model goodness of fit to obtain quantitative guidance for selecting one or more models based on the needs of the user; and query model results to determine the additional testing time required to achieve a desired reliability. Following the live demonstration, an overview of the underlying mathematical theory will be presented, including representation of failure data formats, the Laplace trend test and running arithmetic average, maximum likelihood estimation, failure-rate and failure-counting software reliability models, and the Akaike information criterion and predictive sum of squares error.
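
    A minimal sketch of the Laplace trend test portion of that theory, applied to invented cumulative failure times rather than output from the tool itself; a clearly negative statistic suggests reliability growth, the prerequisite noted above for fitting growth models.

        # Laplace trend test on notional cumulative failure times (hours), failure-truncated form.
        import math

        failure_times = [12, 30, 55, 90, 140, 205, 290, 400, 530, 700]   # gaps growing over time

        def laplace_statistic(times):
            """Laplace test statistic when observation ends at the last recorded failure."""
            n = len(times)
            t_n = times[-1]
            mean_earlier = sum(times[:-1]) / (n - 1)
            return (mean_earlier - t_n / 2) / (t_n * math.sqrt(1 / (12 * (n - 1))))

        u = laplace_statistic(failure_times)
        print(f"Laplace statistic u = {u:.2f} (strongly negative values indicate reliability growth)")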

    Speaker Info:

    Lance Fiondella

    University of Massachusetts Dartmouth